
remove spaces in file name but not in path - howto?

  • Alec Burgess
    Message 1 of 30 , Jun 19, 2011
      Can anyone figure out a regular expression that when working on a
      <pathname>filename.ext
      (eg. C:\Program Files\name with spaces.txt) will remove the space from
      the filename but not the path.

      why needed: MultiRen 3 from PCMag has regexp rename capability but (as
      I just discovered from a question in the PCMag forum) does *NOT* have
      the ability to exclude the path name from the file spec.
      so the simple minded Find=\x20 Replace=<nothing> would convert the above
      file to:
      C:\ProgramFiles\namewithspaces.txt - *NOT* what is desired!

      Find=(.*\\[^\x20]*)\x20 Replace=$1 will eat the first (only) space but
      not the remaining ones and
      Find=(.*\\[^\x20]*)\x20([^\x20]*)\x20 Replace=$1$2 will eat two spaces
      (could be extended to more) but will not eat only one space if that's
      all there are.
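
The limitation described above can be reproduced outside MultiRen. The following is a minimal sketch in Python, assuming its `re` module behaves like the PCRE-style engine discussed here for this pattern (Python writes the replacement as `\1` rather than `$1`). It only illustrates why a single pass removes one space.

```python
import re

path = r"C:\Program Files\name with spaces.txt"

# The first attempt from the message: capture everything up to a space
# that follows the last backslash, then put the captured text back.
one_space = re.compile(r"(.*\\[^\x20]*)\x20")

# One pass removes only the first space of the file name: after that match,
# the rest of the string contains no backslash for the pattern to anchor on.
result = one_space.sub(r"\1", path)
print(result)
```

Re-running the same substitution until nothing changes is not a safe workaround: once the file name has no spaces left, the pattern backtracks to an earlier backslash and starts removing spaces from the path.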

      I was trying to figure out something that might work with positive or
      negative look ahead/behind and or the \K w/o success.

      Note: Find=(.*\\[^\x20]*)\K\x20 does correctly select only the first
      space in file name but if I attempt to replace with <null> the Replace
      dialog says nothing to replace. Seems like a bug to me?

      If interested see:
      http://discuss.pcmag.com/forums/1004432069/ShowThread.aspx#1004432069

      --
      Regards ... Alec (buralex@gmail& WinLiveMess - alec.m.burgess@skype)
    • flo.gehrke
      Message 2 of 30 , Jun 19, 2011
        --- In ntb-clips@yahoogroups.com, Alec Burgess <buralex@...> wrote:
        >
        > Can anyone figure out a regular expression that when working on a
        > <pathname>filename.ext
        > (eg. C:\Program Files\name with spaces.txt) will remove the
        > space from the filename but not the path.

        Alec,

        Try this...

        ^!Replace "\x20(?=[^.\\]+\.txt)" >> "" WARS

        It replaces any space '\x20' at a position defined by the lookahead assertion '(?=[^.\\]+\.txt)'. That is, at a position where, looking ahead, you see one or more characters that are neither a literal dot nor a backslash, followed by a literal dot and the string 'txt'.
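
For readers without NoteTab, here is a minimal sketch of the same lookahead in Python. The function name `strip_txt_name_spaces` and the sample paths are made up for illustration; only the pattern comes from the message.

```python
import re

# Same pattern as in the ^!Replace line: a space that is followed by
# characters other than '.' and '\', and then by ".txt".
LOOKAHEAD_TXT = re.compile(r"\x20(?=[^.\\]+\.txt)")

def strip_txt_name_spaces(path: str) -> str:
    """Remove spaces from the file name of a .txt path (hypothetical helper)."""
    return LOOKAHEAD_TXT.sub("", path)

print(strip_txt_name_spaces(r"C:\Program Files\name with spaces.txt"))
```

Because the class excludes the dot, a name with an extra dot before the extension (for example `my file.v2.txt`) is left untouched, and only `.txt` files are covered.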

        Another approach would be: First select the file name, and, inside the selection, replace all spaces with nothing. For example...

        ^!Jump Doc_Start
        ^!Find "\\[^\\]+\.txt" RS
        ^!IfError Skip_2
        ^!Replace "\x20" >> "" HARS
        ^!Goto Skip_-3
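
The select-then-replace idea of this clip can be sketched in Python with a replacement function. This is an illustration rather than a translation of the clip: `strip_spaces_in_filenames` is a made-up name, and the character class additionally excludes line breaks (an assumption, so that a match cannot run across lines).

```python
import re

# Step 1 of the clip: find "\name.txt", the last path component.
# \r and \n are excluded here as an added assumption.
FILENAME_TXT = re.compile(r"\\[^\\\r\n]+\.txt")

def strip_spaces_in_filenames(text: str) -> str:
    """Remove spaces only inside each matched file name."""
    # Step 2 of the clip: replace spaces within the matched text only.
    return FILENAME_TXT.sub(lambda m: m.group(0).replace(" ", ""), text)

listing = "C:\\Program Files\\name with spaces.txt\nD:\\My Docs\\a b.txt"
print(strip_spaces_in_filenames(listing))
```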

        > Note: Find=(.*\\[^\x20]*)\K\x20 does correctly select only the
        > first space in file name but if I attempt to replace with
        > <null> the Replace dialog says nothing to replace. Seems like a
        > bug to me?

        Sorry, I can't reproduce this.

        Regards,
        Flo
      • acmewebwerks
        Message 3 of 30 , Jun 19, 2011
          if I were going to build a clip to do this I would:

          1. convert all complete names to a list - one per row
          2. select from line end backwards to the first /
          3. make your replacements within selected text
          4. do it for all rows

          maybe you could apply this to your regex logic.

          regards

          tf
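
The four steps above can be sketched without any regex. The following Python is only an illustration; it assumes Windows-style backslash separators (as in the original example path) and a made-up function name.

```python
def strip_spaces_per_line(listing: str) -> str:
    """Apply the list / select-from-end / replace steps to every row."""
    cleaned = []
    for line in listing.splitlines():          # steps 1 and 4: one name per row
        # step 2: split at the last separator, searching from the line end
        head, sep, tail = line.rpartition("\\")
        # step 3: replace only within the file-name part
        cleaned.append(head + sep + tail.replace(" ", ""))
    return "\n".join(cleaned)

rows = "C:\\Program Files\\name with spaces.txt\nD:\\My Docs\\a b.txt"
print(strip_spaces_per_line(rows))
```

A row without any backslash is treated as a bare file name, so all of its spaces are removed.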
        • acmewebwerks
          Message 4 of 30 , Jun 19, 2011
            if I were going to build a clip to do this I would:

            1. convert all complete names to a list - one per row
            2. select from line end backwards to the first /
            3. make your replacements within selected text
            4. do it for all rows

            maybe you could apply this to your regex logic.

            regards

            tf
          • Alec Burgess
            Message 5 of 30 , Jun 19, 2011
              Flo: I posted your suggestion with slight modification on the thread:
              http://discuss.pcmag.com/forums/1004432073/ShowThread.aspx#1004432073
              > This regex Search Pattern=\x20(?=[^\\]*$) Replace Pattern=<null> (ie.
              > leave it empty) will replace *all* spaces with nothing in the file
              > name but not in the path name.
              > If you wish instead to replace spaces by (say) an underscore just put
              > it in the Replace Pattern.
              > Note: In English, the above Search Pattern says: look for a space followed
              > by (but not included in what is matched - a positive lookahead) a
              > string of zero or more characters which do NOT include a back-slash
              > followed by end of string - the $ sign
              > Credit to Flo Gehrke on the notetab-clips list
              > http://tech.groups.yahoo.com/group/ntb-clips/ who came up with this
              > approach in response to my question there.
              Thanks very much for your suggestion!
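
A quick way to check the posted pattern on a single path is a short Python sketch. The helper name `fix_name` and its `replacement` parameter are invented for illustration; the search pattern is the one quoted above.

```python
import re

# A space followed by zero or more non-backslash characters up to the
# end of the string, i.e. a space inside the last path component.
SPACE_IN_NAME = re.compile(r"\x20(?=[^\\]*$)")

def fix_name(path: str, replacement: str = "") -> str:
    """Replace spaces in the file name only (hypothetical helper)."""
    return SPACE_IN_NAME.sub(replacement, path)

sample = r"C:\Program Files\name with spaces.txt"
print(fix_name(sample))
print(fix_name(sample, "_"))
```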

              On 2011-06-19 10:30, flo.gehrke wrote:
              > --- In ntb-clips@yahoogroups.com <mailto:ntb-clips%40yahoogroups.com>,
              > Alec Burgess <buralex@...> wrote:
              > >
              > > Can anyone figure out a regular expression that when working on a
              > > <pathname>filename.ext
              > > (eg. C:\Program Files\name with spaces.txt) will remove the
              > > space from the filename but not the path.
              >
              > Alec,
              >
              > Try this...
              >
              > ^!Replace "\x20(?=[^.\\]+\.txt)" >> "" WARS
              >
              > It replaces any space '\x20' at a position defined by the
              > lookahead assertion '(?=[^.\\]+\.txt)'. That is, at a position where,
              > looking ahead, you see one or more characters that are neither a
              > literal dot nor a backslash, followed by a literal dot and the
              > string 'txt'.
              >
              > Another approach would be: First select the file name, and, inside
              > the selection, replace all spaces with nothing. For example...
              >
              > ^!Jump Doc_Start
              > ^!Find "\\[^\\]+\.txt" RS
              > ^!IfError Skip_2
              > ^!Replace "\x20" >> "" HARS
              > ^!Goto Skip_-3
              >
              > > Note: Find=(.*\\[^\x20]*)\K\x20 does correctly select only the
              > > first space in file name but if I attempt to replace with
              > > <null> the Replace dialog says nothing to replace. Seems like a
              > > bug to me?
              >
              > Sorry, I can't reproduce this.


            • bruce.somers@web.de
              Message 6 of 30 , Jun 20, 2011
                Can NoteTab determine the path to a given file with the volume label 'prepended'?

                When using removable media, the drive letter is not of great use in lists of file names, for example, as another drive letter is likely to be assigned when the device is mounted anew.

                     volume02:/directoryaaa/filename.ext

                is thus more significant and useful than

                     h:/directoryaaa/filename.ext
              • Eb
                Message 7 of 30 , Jun 22, 2011
                  It's not real clear what you want from the volume label. It's not part of a file name or path.

                  If you simply want to make sure you process the correct file, or find it, there are a number of ways to do so. For example:

                  ^!Set %drv%=DCEFGH; %i%=0
                  :LOOP
                  ^!Inc %i%
                  ^!IfFileExist "^$StrIndex(^%drv%;^%i%)$:\FilePathName" PROCESS_FILE
                  ^!If ^%i%<6 LOOP else END
                  :PROCESS_FILE
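
The same probing loop, sketched in Python for comparison. This is only an illustration of the idea: `find_on_drives` and its `exists` parameter are made-up names, and the default letter list simply mirrors the one in the clip.

```python
import os
from typing import Callable, Optional

def find_on_drives(relative_path: str,
                   letters: str = "DCEFGH",
                   exists: Callable[[str], bool] = os.path.exists) -> Optional[str]:
    """Return the first '<letter>:\\<relative_path>' that exists, else None."""
    for letter in letters:                      # same order as the clip's %drv%
        candidate = f"{letter}:\\{relative_path}"
        if exists(candidate):
            return candidate
    return None
```

The `exists` parameter is there only so the function can be tried without real drives, for example `find_on_drives("data\\file.txt", exists=lambda p: p.startswith("E:"))`.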

                  Cheers

                  Eb


                  --- In ntb-clips@yahoogroups.com, bruce.somers@... wrote:
                  >
                  > Can NoteTab determine the path to a given file with the volume label 'prepended'?
                  >
                  > When using removable media, the drive letter is not of great use in lists of file names, for example, as another drive letter is likely to be assigned when the device is mounted anew.
                  >
                  >      volume02:/directoryaaa/filename.ext
                  >
                  > is thus more significant and useful than
                  >
                  >      h:/directoryaaa/filename.ext
                  >
                • flo.gehrke
                  Message 8 of 30 , Jun 28, 2011
                    --- In ntb-clips@yahoogroups.com, "acmewebwerks" <frank@...> wrote:
                    >
                    > if I were going to build a clip to do this I would:
                    >
                    > 1. convert all complete names to a list - one per row
                    > 2. select from line end backwards to the first /
                    > 3. make your replacements within selected text
                    > 4. do it for all rows
                    >
                    > maybe you could apply this to your regex logic.
                    >
                    > regards
                    >
                    > tf

                    Sorry, I'm unsure whether this is a question or just a statement. Since the same message has already been posted on June 19 this might be a reminder of an unanswered question. Well then...

                    The following clip tries to translate your concept. For outputting the files in a list you could try something like this...


                    ^!Set %Types%=(txt|bmp|html)
                    ^!SetClipboard ^$GetDocListAll("C:\\.+?\.^%Types%";"$0\r\n")$
                    ^!Toolbar New Document
                    ^$GetClipboard$
                    ^!Jump Doc_Start
                    ^!Find "\\[^\\]+$" RIS
                    ^!IfError Skip_2
                    ^!Replace "\x20" >> "" HARS
                    ^!Goto Skip_-3


                    With a RegEx similar to the one I've posted before you could write...


                    ^!SetClipboard ^$GetDocListAll("C:\\.+?(txt|bmp|html)";"$0\r\n")$
                    ^!Toolbar New Document
                    ^$GetClipboard$
                    ^!Replace "\x20(?=[^\\]+$)" >> "" WARS
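
As a rough Python counterpart of the second clip: first collect the paths from running text, then strip the spaces from each file name. This is a sketch under assumptions: `list_cleaned_paths` is a made-up name, the extension pattern keeps the literal dot used in the first clip, and the sample sentence is invented.

```python
import re

# Lazy match from "C:\" to the first listed extension, like the
# GetDocListAll pattern above (with the dot before the extension).
PATH_RE = re.compile(r"C:\\.+?\.(?:txt|bmp|html)")
# The lookahead from the ^!Replace line: a space with no backslash after it.
SPACE_IN_NAME = re.compile(r"\x20(?=[^\\]+$)")

def list_cleaned_paths(text: str) -> list:
    """Extract paths from full text and remove spaces from their file names."""
    return [SPACE_IN_NAME.sub("", p) for p in PATH_RE.findall(text)]

doc = r"See C:\My Docs\a b.txt and C:\pics\my pic.bmp here."
print(list_cleaned_paths(doc))
```

As noted in the message, the extraction depends on the structure of the document: the lazy match stops at the first occurrence of one of the extensions.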


                    The search always depends on the structure of your document, of course. I'm assuming that we are dealing with paths and file names in a full-text document.

                    Regards,
                    Flo
                  • Alec Burgess
                    Message 9 of 30 , Jun 28, 2011
                      Hi Frank/Flo:

                      Yes this was my original question. The requirement was a single regexp
                      (find / replace pair) that could be used in another application - *NOT*
                      in a Notetab clip.

                      The Input string against which the regexp must work is a complete
                      <pathname>\<filename>.<ext> where both path-name and filename *MAY*
                      contain spaces.

                      The answer (supplied by Flo and tweaked by me) basically consists of:
                      Find a space followed by a positive lookahead that does *NOT*
                      contain a backslash and terminates at end of string.

                      Thus the first space in the filename is matched and replaced by
                      <null>, and the same thing is done for each remaining space.
                      Because the lookahead does NOT allow backslashes after the <space>,
                      any spaces in the pathname are left as is.

                      This is the regex we came up with: Find=\x20(?=[^\\]*$) Replace=<empty>

                      If someone else wanted to do it in NoteTab (I don't), where the input
                      buffer contains a list of path-file-ext entries, the above could be done
                      in the Find/Replace dialog by clicking [Replace All].
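
For a whole list of paths, one per line, the same pattern can be tried in Python with the multi-line flag, so that `$` matches at every line end. This is a sketch under that assumption; the two sample paths are invented.

```python
import re

# Same Find pattern; re.MULTILINE lets "$" match at the end of each line.
SPACE_IN_NAME_ML = re.compile(r"\x20(?=[^\\]*$)", re.MULTILINE)

paths = "C:\\Program Files\\name with spaces.txt\nD:\\My Docs\\a b c.txt"

# Equivalent of [Replace All] with an empty replacement.
cleaned = SPACE_IN_NAME_ML.sub("", paths)
print(cleaned)
```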

                      **** I should have been clearer when I initially posted the question
                      that I was *NOT* looking for a clip but simply a Regex pattern. ****

                      Regards ... Alec (buralex@gmail& WinLiveMess - alec.m.burgess@skype)



                      On 2011-06-28 11:41, flo.gehrke wrote:
                      > --- In ntb-clips@yahoogroups.com, "acmewebwerks"<frank@...> wrote:
                      >> if I were going to build a clip to do this I would:
                      >>
                      >> 1. convert all complete names to a list - one per row
                      >> 2. select from line end backwards to the first /
                      >> 3. make your replacements within selected text
                      >> 4. do it for all rows
                      >>
                      >> maybe you could apply this to your regex logic.
                      >>
                      >> regards
                      >>
                      >> tf
                      > Sorry, I'm unsure whether this a question or just a statement. Since the same message has already been posted on June 19 this might be a reminder of an unanswered question. Well then...
                      >
                      > The following clip tries to translate your concept. For outputting the files in a list you could try something like this...
                      >
                      >
                      > ^!Set %Types%=(txt|bmp|html)
                      > ^!SetClipboard ^$GetDocListAll("C:\\.+?\.^%Types%";"$0\r\n")$
                      > ^!Toolbar New Document
                      > ^$GetClipboard$
                      > ^!Jump Doc_Start
                      > ^!Find "\\[^\\]+$" RIS
                      > ^!IfError Skip_2
                      > ^!Replace "\x20">> "" HARS
                      > ^!Goto Skip_-3
                      >
                      >
                      > With a RegEx similar to the one I've posted before you could write...
                      >
                      >
                      > ^!SetClipboard ^$GetDocListAll("C:\\.+?(txt|bmp|html)";"$0\r\n")$
                      > ^!Toolbar New Document
                      > ^$GetClipboard$
                      > ^!Replace "\x20(?=[^\\]+$)">> "" WARS
                      >
                      >
                      > The search always depends on the structure of your document, of course. I'm assuming that we are dealing with paths and file names in a full-text document.
                      >
                      > Regards,
                      > Flo
                      >
                      >
                      >
                      >
                    • Don
                      Message 10 of 30 , Jun 28, 2011
                        > **** I should have been clearer when I initially posted the question
                        > that I was *NOT* looking for a clip but simply a Regex pattern. ****

                        I can use a clip for this project, but I wonder if a regex could do it.

                        I want to essentially ask for data that I do not want to find in a line
                        and if that data is not in the line, delete the line.

                        We concluded the other day that negated character classes match one
                        character at a time (I think).

                        I was trying something like this (not expecting it to quite work obviously):
                        ^!Set %DataStuff%=org|net
                        ^!Replace ".*[^%DataStuff].*\r\n" >> "" RAWH

                        For the moment assume I want to remove all .com email addresses from a
                        list (in the end I want to prompt for the search phrase).

                        I was doing it with a stepped clip, but it keeps jumping out of the loop
                        half way through a job. I think it may be a keyboard delay issue as I
                        use the backspace key.

                        Input:
                        john@...
                        jane@...
                        jeff@...
                        fred@...

                        Output:
                        jane@...
                        fred@...

                        Existing clip:
                        ^!Jump Doc_Start
                        ^!Set %DataStuff%=^?{Term to Search For, Pipe Separated}
                        :Loop
                        ^!Select Eol
                        ^!If "^$GetSelection$" <> "" Skip_2
                        ^!Keyboard DELETE
                        ^!Goto Advance

                        ^!Find "^%DataStuff%" TIHRS
                        ^!IfError Next ELSE JumpLine
                        ^!DeleteLine
                        ^!If ^$GetRow$ = ^$GetLinecount$ End
                        ^!Goto Advance

                        :JumpLine
                        ^!If ^$GetRow$ = ^$GetLinecount$ Next ELSE Skip_2
                        ^!DeleteLine
                        ^!Goto End
                        ^!Jump +1

                        :Advance
                        ;end at end of file
                        ^!Goto Loop


                        This clip works pretty well but derails occasionally -- again I think a
                        keyboard issue. It also misses the last line so I need to think that
                        through better.
                      • Axel Berger
                        Message 11 of 30 , Jun 28, 2011
                          Alec Burgess wrote:
                          > The Input string against which the regexp must work is a complete
                          > <pathname>\<filename>.<ext> where both path-name and filename *MAY*
                          > contain spaces.

                          Is there any reason why no one so far has suggested making use of the
                          intrinsic functions ^$GetFileName(FileName)$ and ^$GetPath(FileName)$ ?
                          With these it's easy to separate the name and only to work on that.
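
Outside NoteTab the same split-then-edit idea is available in most languages. As a minimal sketch, Python's standard `ntpath` module splits Windows-style paths on any platform; the function name below is made up for illustration.

```python
import ntpath

def strip_name_spaces(full_path: str) -> str:
    """Split off the file name, remove its spaces, and rejoin."""
    folder, name = ntpath.split(full_path)      # path part and file-name part
    return ntpath.join(folder, name.replace(" ", ""))

print(strip_name_spaces(r"C:\Program Files\name with spaces.txt"))
```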

                          Axel
                        • diodeom
                          Message 12 of 30 , Jun 28, 2011
                            Don wrote:
                            >
                            > I want to essentially ask for data that I do not want to find in a line
                            > and if that data is not in the line, delete the line.
                            >

                            If you want to remove the lines that don't contain certain terms (e.g. either .org or .net), you could avoid any hassle of actually looking for them by instead just slurping the ones containing the "keeper" strings and pasting them over the selection.

                            For example:

                            ^!Select All
                            ^!SetListDelimiter ^p
                            ^$GetDocMatchAll("^.*\.(org|net).*$")$
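
The "collect the keepers" idea is easy to try in Python as well. In this sketch the function name and the `example` addresses are invented (the addresses in the original post are elided), and the terms are escaped so they are treated as literal text.

```python
import re

def keep_lines_with(text: str, terms: list) -> str:
    """Keep only the lines that contain '.<term>' for one of the terms."""
    alternatives = "|".join(re.escape(t) for t in terms)
    pattern = re.compile(r"^.*\.(?:%s).*$" % alternatives, re.MULTILINE)
    # findall returns whole lines because the group is non-capturing.
    return "\n".join(pattern.findall(text))

sample = "a@example.com\nb@example.org\nc@example.net\nd@example.com"
print(keep_lines_with(sample, ["org", "net"]))
```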
                          • Alec Burgess
                            Message 13 of 30 , Jun 28, 2011
                              Axel: see my previous response ... it's *NOT* for NoteTab and hence,
                              those functions are not available. Even if it *was* for NoteTab, those
                              functions would require either a loop or line-by-line processing. CORRECT?

                              Dollars to Donuts (Euros to appfel strudel?) the single regex would be
                              faster <grin>

                              On 2011-06-28 14:36, Axel Berger wrote:
                              > Alec Burgess wrote:
                              > > The Input string against which the regexp must work is a complete
                              > > <pathname>\<filename>.<ext> where both path-name and filename *MAY*
                              > > contain spaces.
                              >
                              > Is there any reason why noone so far has suggested making use of the
                              > intrinsic functions ^$GetFileName(FileName)$ and ^$GetPath(FileName)$ ?
                              > With these it's easy to separate the name and only to work on that.
                              Actually I think someone did but haven't re-checked the thread.
                            • Axel Berger
                              Message 14 of 30 , Jun 28, 2011
                                Alec Burgess wrote:
                                > its *NOT* for Notetab

                                Got it, and will shut up now.

                                Axel
                              • Don
                                Message 15 of 30 , Jun 28, 2011
                                  On 6/28/2011 2:38 PM, diodeom wrote:
                                  > ^!Select All
                                  > ^!SetListDelimiter ^p
                                  > ^$GetDocMatchAll("^.*\.(org|net).*$")$

                                  You are brilliant and I thank you! I can't say how many times people on
                                  this list, you included as often as anyone, except perhaps the dear
                                  departed Jody and Sheri, have pointed out the obvious way of doing
                                  something I sit there head scratching over.

                                  ^!Set %DataTested%=^?{RegEx Term to Search For, Pipe Separated "or"}
                                  ^!Select All
                                  ^!SetListDelimiter ^p
                                  ^!Set %DataOutput%="^$GetDocMatchAll("^.*(^%DataTested%).*$")$"
                                  ^!InsertText ^%DataOutput%

                                  Perfect!
                                • flo.gehrke
                                  Message 16 of 30 , Jun 29, 2011
                                    --- In ntb-clips@yahoogroups.com, Alec Burgess <buralex@...> wrote:
                                    >
                                    > Hi Frank/Flo:
                                    > Yes this was my original question....
                                    > **** I should have been clearer when I initially
                                    > posted the question that I was *NOT* looking for
                                    > a clip but simply a Regex pattern. ****

                                    Alec,

                                    No problem -- your posting was perfectly clear.

                                    My last message (#21834), however, was a reply to "tf/Frank/acmewebwerks". It didn't address your original question but an issue that was posted by Frank on June 19 (#21816). For an unknown reason, he posted the same message again on June 28 (#21827) -- or was this a "Yahoo trick"?

                                    Maybe he could clarify this confusion and answer whether those clips are matching *his* needs or not.

                                    Regards,
                                    Flo
                                  • diodeom
                                    Message 17 of 30 , Jun 29, 2011
                                      Alec Burgess wrote:
                                      >
                                      > (...) Even if it *was* for notetab, those functions would require
                                      > either a loop on line-by-line processing. CORRECT?
                                      >
                                      > Dollars to Donuts (Euros to appfel strudel?) the single regex would
                                      > be faster <grin>
                                      >

                                      I'm not much for pastry, but I wouldn't mind high-rolling on crunchy Reibekuchen. <dreamy smile>

                                      I'd propose that the apparent efficiency of PCRE, a streamlined C library versus comparatively slow iterations of highly interpreted Clip lingo has many of us "speed junkies" afflicted by LAS, the loop avoidance syndrome. In my severe case (and within the tunnel vision of my needs) I often see Clips just as a mere convenient interface to the mesmerizing powers of the "proper beast of burden," RegEx. And I ain't much apologetic about it. Only sporadic inklings of broader perspective remind me that, apples to apples, there is nothing slow about loops... when written in C. And even though within the contrastingly "lethargic" realm of Clips a disdain for these indispensable constructs often has pragmatic justification, I'm afraid that LAS makes me occasionally miss out on conceptually more straightforward solutions. Is there a pill for that?

                                      To follow your digression "Even if it *was* for notetab," I think one would still need to cycle through a directory of files to rename them one by one, whether the new names were acquired from some RegEx-manipulated list or not. In this context, right along Axel's observation, the core iterated portion of a simple file-renaming clip (that you didn't ask for to begin with, I know, I know) could look as follows:

                                      ^!RenameFile "^%f%" "^%p%^$StrReplace(" ";"";^$GetFileName(^%f%)$;0;0)$"

                                      ... where %f% stands for complete filename and %p% for its path portion. I believe this line framed in a loop of either of the GetFile methods could quite efficiently and elegantly accomplish the task delegated to MultiRen you were originally helping out with.
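
For comparison, the core of such a rename step might look like this in Python. It is a sketch, not a MultiRen or NoteTab equivalent: `rename_without_spaces` is a made-up name, and no check is made for an existing file with the target name.

```python
from pathlib import Path

def rename_without_spaces(path: Path) -> Path:
    """Rename a file so its name has no spaces; the folder part is untouched."""
    target = path.with_name(path.name.replace(" ", ""))
    if target != path:                  # nothing to do if the name had no spaces
        path.rename(target)
    return target
```

Framed in a loop over a directory, for example `for f in Path(folder).iterdir(): rename_without_spaces(f)`, this covers the same ground as the clip line above.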
                                    • flo.gehrke
                                      Message 18 of 30 , Jun 29, 2011
                                        --- In ntb-clips@yahoogroups.com, "diodeom" <diomir@...> wrote:
                                        >
                                        > Don wrote:
                                        > >
                                        > > I want to essentially ask for data that I do not want to find in a line
                                        > > and if that data is not in the line, delete the line.
                                        > >
                                        >
                                        > If you want to remove the lines that don't contain certain terms (e.g. either .org or .net), you could avoid any hassle of actually looking for them by instead just slurping the ones containing the "keeper" strings and pasting them over the selection.
                                        >
                                        > For example:
                                        >
                                        > ^!Select All
                                        > ^!SetListDelimiter ^p
                                        > ^$GetDocMatchAll("^.*\.(org|net).*$")$
                                        >

                                        diodeom,

                                        You are absolutely right, of course. But allow me some hair-splitting here ;-)

                                        To delete a line "if that data is not in the line (Don)" would be a kind of *negative* definition of search criteria.

                                        According to Don's sample, we wouldn't search positively for 'org' or 'net', but we literally would delete all lines which do not contain 'com' or 'ru'. So, based on your approach, another and possibly more precise solution could be...

                                        ^!Select All
                                        ^!SetListDelimiter ^p
                                        ^$GetDocMatchAll("^.+$(?<!com|ru)")$

                                        It also avoids the '.*' that we see in your line...

                                        ^$GetDocMatchAll("^.*\.(org|net).*$")$

                                        which tests for something that, in Don's list, actually never occurs between the domain and the end of line. I assume this was meant as a trick to work around the error that NT (Pro 6.2) makes with

                                        ^$GetDocMatchAll("^.*\.(org|net)$")$

                                        In this case, NT stumbles over the '$' inside the parentheses. To avoid any error that might be caused by something that possibly follows the domain we could change your line to...

                                        ^$GetDocMatchAll("^.*\.(org|net)^%Dollar%")$

                                        BTW, also a simple one-liner might perform well...

                                        ^!Replace "^.+(ru|com)(\R|\Z)" >> "" WARS

                                        Regards,
                                        Flo
                                      • diodeom
                                        Message 19 of 30 , Jun 29, 2011
                                          Flo wrote:
                                          >
                                          > BTW, also a simple one-liner might perform well...
                                          >
                                          > ^!Replace "^.+(ru|com)(\R|\Z)" >> "" WARS
                                          >

                                          Well, of course -- provided that you somehow know all the terms that would mark lines for deletion. I understand the objective to be removal of lines in which certain terms are absent.
                                        • Axel Berger
                                          Message 20 of 30 , Jun 29, 2011
                                            diodeom wrote:
                                            > has many of us "speed junkies" afflicted by LAS,
                                            > the loop avoidance syndrome.

                                            Actually I have never noticed loops slowing things down.

                                            I have one file that's updated from time to time and now has 1.1 MB in
                                            30,000 lines. It is big enough for consecutive ^!Replace commands to
                                            become noticeable. The one single thing that really slows clips down is
                                            frequent (several thousand in this case) ^!InsertSelect and ^!InsertText
                                            commands over a selection.

                                            But then I have thought about how to write an editor, just as a mental
                                            exercise, and replacing one string by another of different length in the
                                            middle of a text at any speed at all tied my brains in a knot.
                                            Considering that, NoteTab's speed is quite remarkable as it is.

                                            Axel
                                          • diodeom
                                            Message 21 of 30 , Jun 29, 2011
                                              I wrote:
                                              >
                                              > I understand the objective to be removal of lines in which certain
                                              > terms are absent.
                                              >

                                              A funky one-line take could be something like:

                                              ^!Replace "^((.*\.(org|net).*)|.++)(\R|\Z)(?(2)\K)" >> "" WARS

                                              ... where the "(?(2)\K)" bit is a conditional subpattern that checks if $2, that is, the "(.*\.(org|net).*)" substring, was captured, and if so, resets the whole match with \K to nothing, so the empty replacement leaves this line intact. When $2 isn't captured, the selection remains ready for its subsequent wipe-out.
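
For readers who want to try the keep-or-wipe logic outside NoteTab, here is a rough Python sketch. It is not the same pattern: Python's re module has no \K or \R, so a replacement callback stands in for the conditional, and "\n" line endings are assumed. The sample addresses are made up for illustration.

```python
import re

# Group 2 takes part in the match only when the line contains ".org" or ".net".
# Assumption: "\n" line endings; (\n|\Z) stands in for PCRE's (\R|\Z).
LINE = re.compile(r"^((.*\.(?:org|net).*)|.+)(\n|\Z)", re.MULTILINE)

def keep_org_net_lines(text: str) -> str:
    # Put the match back unchanged when group 2 was captured, else wipe the line.
    return LINE.sub(lambda m: m.group(0) if m.group(2) is not None else "", text)

sample = "a@example.org\nb@example.com\nc@example.net\n"
print(keep_org_net_lines(sample), end="")
```

The callback plays the part of "(?(2)\K)": a line with a keeper term is written back as it was, any other non-empty line is replaced with nothing.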
                                            • diodeom
                                              Message 22 of 30 , Jun 30, 2011
                                                Flo wrote:
                                                >
                                                > BTW, also a simple one-liner might perform well...
                                                >
                                                > ^!Replace "^.+(ru|com)(\R|\Z)" >> "" WARS
                                                >

                                                I have to admit that I didn't consider the condition where the "keeper" terms are always at the line's end -- despite the provided sample data. Your look-behind is a beautifully simple solution for this case.
                                              • diodeom
                                                Message 23 of 30 , Jun 30, 2011
                                                  I wrote:
                                                  >
                                                  > I have to admit that I didn't consider the condition where the "keeper" terms are always at the line's end -- despite the provided sample data. Your look-behind is a beautifully simple solution for this case.
                                                  >

                                                  Sorry, Flo -- I quoted the wrong fragment. Here's your pattern I'm referring to, placed in a swap statement:

                                                  ^!Replace "^.+$(?<!org|net)(\R|\Z)" >> "" WARS
                                                • Don
                                                  Message 24 of 30 , Jun 30, 2011
                                                    On 6/30/2011 7:49 AM, diodeom wrote:
                                                    > I wrote:
                                                    >>
                                                    >> I have to admit that I didn't consider the condition where the "keeper" terms are always at the line's end -- despite the provided sample data. Your look-behind is a beautifully simple solution for this case.
                                                    >>
                                                    >
                                                    > Sorry, Flo -- I quoted the wrong fragment. Here's your pattern I'm referring to, placed in a swap statement:
                                                    >
                                                    > ^!Replace "^.+$(?<!org|net)(\R|\Z)" >> "" WARS

                                                    To be clearer ;-)

                                                    I only provided a sample, it might be much different and I don't know
                                                    the data going in -- so I literally only know what I want, not what I
                                                    don't want. There are hundreds of lines if not thousands sometimes
                                                    where I use this. Often it is for race results of running races where I
                                                    want to extract a particular team. So we cannot count on \R immediately
                                                    after the search term. I assume \Z is file end? Have to look that one
                                                    up. So it may appear ANYWHERE IN THE LINE, not only at the end.

                                                    I misplaced Flo's email before I responded to it. I'll dig it out again
                                                    later as it did have a lot of good stuff in it.
                                                  • flo.gehrke
                                                    Message 25 of 30 , Jun 30, 2011
                                                      --- In ntb-clips@yahoogroups.com, "diodeom" <diomir@...> wrote:
                                                      >
                                                      > Sorry, Flo -- I quoted the wrong fragment. Here's your pattern I'm referring to, placed in a swap statement:
                                                      >
                                                      > ^!Replace "^.+$(?<!org|net)(\R|\Z)" >> "" WARS

                                                      Thanks, diodeom! We have often been asked to explain such patterns to members who are less acquainted with RegEx. So let me append...

                                                      ^.+$(?<!org|net)(\R|\Z)

                                                      ^ = assertion matching at the start of line
                                                      .+ = one or more characters of any type (except NL)
                                                      $ = end of line

                                                      When arriving at the end of line the RegEx Engine tests...

                                                      (?<!org|net) = Negative Lookbehind Assertion matching a position where you do NOT see 'org' or 'net' when looking behind

                                                      (\R|\Z) = alternation matching a line break (such as CRLF) or the end of the string

                                                      In this case, we have genuine negative search criteria in the sense of Don's original question ("Removing lines not containing something"). So to speak, the RegEx is able "to find something that is not there" ;-)

                                                      Regards,
                                                      Flo
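
The breakdown above can be tried out with a small Python sketch. It is only an approximation of the NoteTab command: Python's re module has no \R, so "\n" line endings are assumed, and the full addresses are invented because the sample in the thread is elided.

```python
import re

# Python stand-in for: ^!Replace "^.+$(?<!org|net)(\R|\Z)" >> "" WARS
# Assumption: "\n" line endings; (?:\n|\Z) replaces PCRE's (\R|\Z).
PATTERN = re.compile(r"^.+$(?<!org|net)(?:\n|\Z)", re.MULTILINE)

def drop_unless_ends_in_org_or_net(text: str) -> str:
    # The lookbehind is tested at the end of each line, after .+ has run to $.
    return PATTERN.sub("", text)

# Hypothetical addresses, one per line.
sample = "john@example.org\njane@example.com\njeff@example.net\nfred@example.ru\n"
print(drop_unless_ends_in_org_or_net(sample), end="")
```

Python accepts this lookbehind only because 'org' and 'net' have the same length; alternatives of different lengths, such as 'com|ru', would need a different formulation there.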
                                                    • diodeom
                                                      Message 26 of 30 , Jun 30, 2011
                                                        Flo wrote:
                                                        >
                                                        > (?<!org|net) = Negative Lookbehind Assertion matching a position where you do NOT see 'org' or 'net' when looking behind
                                                        >

                                                        Life would be perfect if lookbehinds could accept variable-length patterns...

                                                        To "not capture" the term located anywhere in the line (e.g. in "John Doe;john@...;555-555-5555") a lookahead could offer the necessary flexibility:

                                                        ^!Replace "^(?!.*(org|net).*$).+(\R|\Z)" >> "" WARS
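
A hedged Python sketch of the same lookahead idea, for experimenting outside NoteTab. It assumes "\n" line endings (Python's re has no \R), and the two records are invented sample data.

```python
import re

# The negative lookahead scans the rest of the line for a keeper term;
# only lines where none is found are matched and removed.
# Assumption: "\n" line endings; (?:\n|\Z) replaces PCRE's (\R|\Z).
NO_KEEPER = re.compile(r"^(?!.*(?:org|net)).+(?:\n|\Z)", re.MULTILINE)

def drop_lines_without_org_or_net(text: str) -> str:
    return NO_KEEPER.sub("", text)

# Hypothetical records with the term in the middle of the line.
sample = (
    "John Doe;john@example.org;555-555-5555\n"
    "Jane Roe;jane@example.com;555-555-0000\n"
)
print(drop_lines_without_org_or_net(sample), end="")
```

Because "." does not cross a line break, the lookahead never looks past the current line, so the term may sit anywhere in it.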
                                                      • flo.gehrke
                                                        Message 27 of 30 , Jun 30, 2011
                                                          --- In ntb-clips@yahoogroups.com, Don <don@...> wrote:

                                                          > To be clearer ;-)
                                                          >
                                                          > I only provided a sample, it might be much different and I
                                                          > don't know the data going in...

                                                          Hi Don,

                                                          You gave us two conditions now...

                                                          > I want to essentially ask for data that I do not want to find
                                                          > in a line... (#21836)

                                                          and

                                                          > ...so I literally only know what I want, not what I
                                                          > don't want. (#21850).

                                                          Sorry, that's a little bit inconsistent, isn't it?

                                                          Never mind! There's certainly a way to resolve your task -- even if it differs from the sample data in your first message and if your search criteria have to match "anywhere in the line".

                                                          You know that it would be helpful to see some more sample data...

                                                          Regards,
                                                          Flo
                                                        • diodeom
                                                          Message 28 of 30 , Jun 30, 2011
                                                            Flo wrote:
                                                            >
                                                            > There's certainly a way to resolve your task -- even if it differs from the sample data in your first message and if your search criteria have to match "anywhere in the line".
                                                            >

                                                            Don's justly universal "anywhere in the line" intent is probably most apparent in his outline ".*[^%DataStuff%].*\r\n" (which I followed with the same .* accommodations in each offered solution).
                                                          • Don
                                                            Message 29 of 30 , Jun 30, 2011
                                                              > Hi Don,
                                                              >
                                                              > You gave us two conditions now...
                                                              >
                                                              >> I want to essentially ask for data that I do not want to find
                                                              >> in a line... (#21836)
                                                              >
                                                              > and
                                                              >
                                                              >> ...so I literally only know what I want, not what I
                                                              >> don't want. (#21850).

                                                              You may be right -- this is hard to explain. I guess ... I think ...
                                                              maybe ... what I want is to say in the end leave all lines that contain
                                                              one or more pieces of data, and delete/remove all other lines.

                                                              I tried to come up with an easy example, but didn't mean to limit the
                                                              project to that particular set of data. I hope this can be universal.

                                                              I think we are getting really close.

                                                              In fact the three line clip I posted yesterday or so does what I want
                                                              ... I think.

                                                              But this negative and positive look about stuff is interesting and worth
                                                              discussion and my learning even more.

                                                              Your look behind works only if it is at the end of the line.

                                                              I will give some examples:
                                                              http://michigancrosscountry.com/wp-content/uploads/Region-1-1-Boys.txt

                                                              Say I want everyone listed from Grand Blanc and Alpena ...

                                                              so I use:
                                                              ^!Replace "^(?!.*(Grand Blanc|Alpena).*$).+(\R|\Z)" >> "" WARS

                                                              1 Omar Kaddurah 12 Grand Blanc 15:37.28 1
                                                              6 Drake Carr 12 Grand Blanc 16:18.58 6
                                                              8 Zachary Kughn 11 Grand Blanc 16:25.06 8
                                                              13 Jalen Payne 12 Grand Blanc 16:40.68 13
                                                              22 Scott Baughan 12 Grand Blanc 16:50.48 22
                                                              23 Nicholas Lefler 12 Grand Blanc 16:51.02 23
                                                              25 Carson Truesdell 10 Grand Blanc 16:56.10 25
                                                              33 Ethan Crowell 10 Alpena 17:10.37 33
                                                              46 R.J. Centala 9 Alpena 17:22.42 46
                                                              50 Jared Labarge 11 Alpena 17:36.46 50
                                                              51 Travis LaCross 11 Alpena 17:37.60 51
                                                              53 Jacob Benson 10 Alpena 17:40.54 53
                                                              69 Alexander Guzman 11 Alpena 18:14.48 69
                                                              71 Nathan LaBarge 12 Alpena 18:17.02 71



                                                              1 Grand Blanc 50 1 6 8 13 22 23 25
                                                              10 Alpena 233 33 46 50 51 53 69 71


                                                              Answer is I believe correct.

                                                              So now I make it a two liner:
                                                              ^!Set %DataTested%=^?{RegEx Term to Search For, Pipe Separated "or"}
                                                              ^!Replace "^(?!.*(^%DataTested%).*$).+(\R|\Z)" >> "" WARS

                                                              Now what if Grand Blanc is the first or last thing on a line ....?

                                                              Seems to still work:
                                                              1 Omar Kaddurah 12 Grand Blanc 15:37.28 1
                                                              6 Drake Carr 12 Grand Blanc 16:18.58 6
                                                              Grand Blanc 8 Zachary Kughn 11 Grand Blanc 16:25.06 8
                                                              13 Jalen Payne 12 Grand Blanc
                                                              22 Scott Baughan 12 Grand Blanc 16:50.48 22
                                                              [deleted rest]

                                                              I think I now have a universal "delete all lines not containing [fill in
                                                              the blank using regex terms]" clip.
                                                              It has no keyboard commands and appears to be blinding fast.
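
The two-line clip can be mimicked in Python roughly as follows. This is a sketch under assumptions, not a translation: the function name and the "Elsewhere" row are made up, "\n" line endings are assumed, and the terms are inserted unescaped so that they behave as regex, as they would through the clip variable.

```python
import re

def keep_lines_containing(text: str, terms: str) -> str:
    """Delete every non-empty line that matches none of the pipe-separated regex terms."""
    # terms is interpolated as-is on purpose, so "Grand Blanc|Alpena" acts as an alternation.
    pattern = re.compile(rf"^(?!.*(?:{terms})).+(?:\n|\Z)", re.MULTILINE)
    return pattern.sub("", text)

results = (
    "1 Omar Kaddurah 12 Grand Blanc 15:37.28 1\n"
    "2 Some Runner 11 Elsewhere 15:40.00 2\n"  # invented row that should be removed
    "33 Ethan Crowell 10 Alpena 17:10.37 33\n"
)
print(keep_lines_containing(results, "Grand Blanc|Alpena"), end="")
```

Since ".+" needs at least one character, empty lines are left in place by this pattern; a term that contains regex metacharacters would have to be escaped by whoever types it in.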
                                                            • flo.gehrke
                                                              Message 30 of 30 , Jun 30, 2011
                                                                --- In ntb-clips@yahoogroups.com, Don <don@...> wrote:

                                                                > Your look behind works only if it is at the end of the line.

                                                                Yes, Don, that's the way it works because it was adapted to the data in your first message...

                                                                > Input:
                                                                > john@...
                                                                > jane@...
                                                                > jeff@...
                                                                > fred@...

                                                                where the strings in question are positioned at the end of line.

                                                                Unlike that data, the substrings in question ('Grand Blanc' or 'Alpena') are now found not at the end of the line but at any position in the line. So the command to delete all lines that do NOT contain 'Grand Blanc' or 'Alpena' can be a little bit shorter...

                                                                ^!Replace "^(?!.*(Grand Blanc|Alpena)).*(\R|\Z)" >> "" WARS

                                                                However, your latest information shows that this job is based more on *positive* criteria (find 'Grand Blanc' or 'Alpena') than on *negative* criteria (exclude 'com' or 'ru' in your first message). So, in this case, there is probably no advantage in working with a Lookaround. Maybe it will suffice just to run something like...

                                                                ^!SetClipboard ^$GetDocListAll("^.*(Grand Blanc|Alpena).*(\R|\Z)";$0)$
                                                                ^!Toolbar New Document
                                                                ^$GetClipboard$

                                                                If you want to overwrite the original list you could try...

                                                                ^!Select All
                                                                ^$GetDocListAll("^.*(Grand Blanc|Alpena).*(\R|\Z)";$0)$

                                                                Regards,
                                                                Flo
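
The positive approach (collect the wanted lines instead of deleting the others) might look like this in Python. It is a sketch only: the function name is hypothetical, "\n" line endings are assumed, and re.findall plays the role that ^$GetDocListAll$ plays in the clip.

```python
import re

def lines_containing(text: str, terms: str) -> str:
    """Collect whole lines that contain any of the pipe-separated regex terms."""
    # Only non-capturing groups are used, so findall returns the full matched lines.
    pattern = re.compile(rf"^.*(?:{terms}).*(?:\n|\Z)", re.MULTILINE)
    return "".join(pattern.findall(text))

results = (
    "1 Omar Kaddurah 12 Grand Blanc 15:37.28 1\n"
    "2 Some Runner 11 Elsewhere 15:40.00 2\n"  # invented row that should be skipped
    "33 Ethan Crowell 10 Alpena 17:10.37 33\n"
)
print(lines_containing(results, "Grand Blanc|Alpena"), end="")
```

Unlike the deleting variants, this one also drops empty lines, because only lines containing a term are ever collected.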