
Re: [Clip] regex to delete all lines not containing

  • Don
    Message 1 of 9 , Sep 3, 2011
      This works fine on 500 and 1000 lines, but not on 1500 lines.

      It removes all lines not containing a given term, which can be a set of
      alternate words (since it's a regex), so there must be a limitation.

      I used to do it with a clip that used several loops.

      On 9/3/2011 10:54 AM, John Shotsky wrote:
      > There is no file size limitation, but what are you trying to do?
      > If it's what I think, I'd probably just use a replace to tag, with a special character, the beginnings of the lines
      > that DO contain what you want, then delete all the others. Then remove the tag.
      > Three lines of fast-running code.
      >
      > Regards,
      > John
      >
      > From: ntb-clips@yahoogroups.com [mailto:ntb-clips@yahoogroups.com] On Behalf Of Don
      > Sent: Saturday, September 03, 2011 07:46
      > To: ntb-clips@yahoogroups.com
      > Subject: [Clip] regex to delete all lines not containing
      >
      >
      > Suddenly not working ... thought it did.
      >
      > File has 1300 lines, is there a file size max on this working?
      >
      > ^!Set %DataTested%=^?{RegEx Term to Search For, Pipe Separated "or"}
      > ^!Select All
      > ^!SetListDelimiter ^P
      > ^!Set %DataOutput%="^$GetDocMatchAll("^.*(^%DataTested%).*$")$"
      > ^!InsertText ^%DataOutput%
      >
      >
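The tag-and-delete idea quoted above can be sketched outside NoteTab. This is a minimal illustration in Python's `re` module, not John's actual clip: the function name, the marker character and the sample data are all assumptions made for the example.

```python
import re

# Hypothetical marker character; assumed never to occur in the real data.
TAG = "\x01"

def keep_lines_by_tagging(text: str, alternatives: str) -> str:
    """Keep only lines containing one of the pipe-separated alternatives."""
    # Step 1: tag the start of every line that contains an alternative.
    tagged = re.sub(rf"(?m)^(?=.*(?:{alternatives}))", TAG, text)
    # Step 2: delete every line that does not start with the tag.
    kept = re.sub(rf"(?m)^(?!{TAG}).*(?:\r?\n|\Z)", "", tagged)
    # Step 3: remove the tags again.
    return kept.replace(TAG, "")

sample = "Lions 3\nTigers 1\nBears 2\n"
print(keep_lines_by_tagging(sample, "Lions|Bears"))
```

Each of the three steps corresponds to one replace operation, which is presumably what "three lines of fast running code" refers to.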
    • John Shotsky
      Message 2 of 9 , Sep 3, 2011
        I have a 900K regex library, and I run it on 100,000 line files with no such limitations. I suspect there is something
        else going on – computer memory, swap file, unexpected stuff in the file, etc. Since you appear to not want to provide
        an example, all we can do is guess.

        Regards,
        John

      • Axel Berger
        Message 3 of 9 , Sep 3, 2011
          John Shotsky wrote:
          > and I run it on 100,000 line files with no such limitations.

          Don saves all his lines in an array. 1300 lines at, say, 60 characters
          makes 78000 bytes. This smells like a 64 kB array limit.

          Axel
        • flo.gehrke
          Message 4 of 9 , Sep 3, 2011
            --- In ntb-clips@yahoogroups.com, Don <don@...> wrote:
            >
            > Suddenly not working ... thought it did.
            >
            > File has 1300 lines, is there a file size max on this working?
            >
            > ^!Set %DataTested%=^?{RegEx Term to Search For, Pipe Separated "or"}
            > ^!Select All
            > ^!SetListDelimiter ^P
            > ^!Set %DataOutput%="^$GetDocMatchAll("^.*(^%DataTested%).*$")$"
            > ^!InsertText ^%DataOutput%
            >

            Don,

            For me, your clip works fine. But occasionally we've seen ^$GetDocMatchAll$ get into trouble with the '$' sign. In that case, you had better replace '$' with ^%Dollar%...

            ^!Set %DataOutput%="^$GetDocMatchAll("^.*(^%DataTested%).*^%Dollar%")$"

            (cf. the P.S. in message #18321 of Sep 6, 2008).

            On the other hand, '^$GetDocListAll' has proved to be more reliable...

            ^!Set %DataOutput%="^$GetDocListAll("^.*(^%DataTested%).*$";"$0\r\n")$"

            (no '^!SetListDelimiter' needed here).

            Regards,
            Flo
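For readers who want to check what the pattern in these clips selects, here is a rough Python equivalent of "collect every matching line and join the results with CRLF". It only imitates the effect of the "$0\r\n" output format; the function name and sample data are assumptions, and Python's `re` is not the regex engine NoteTab uses.

```python
import re

def collect_matching_lines(text: str, alternatives: str) -> str:
    """Return every line containing one of the alternatives, CRLF-terminated."""
    # MULTILINE makes ^ and $ anchor at each line, as in the clip's pattern.
    pattern = re.compile(rf"^.*(?:{alternatives}).*$", re.MULTILINE)
    # '.' can swallow a trailing '\r' in CRLF files, so strip it before re-adding.
    lines = [m.group(0).rstrip("\r") for m in pattern.finditer(text)]
    return "".join(line + "\r\n" for line in lines)

sample = "Lions 3\nTigers 1\nBears 2\n"
print(repr(collect_matching_lines(sample, "Lions|Bears")))
```

Because the matches are written out one by one, no list delimiter has to be configured beforehand, which mirrors the remark that '^!SetListDelimiter' is not needed.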
          • flo.gehrke
            Message 5 of 9 , Sep 4, 2011
              --- In ntb-clips@yahoogroups.com, "John Shotsky" <jshotsky@...> wrote:

              > I'd probably just use a replace to tag the beginnings of lines with
              > a special character that DO contain what you want, then delete all
              > the others. Then remove the tag.

              John,

              Complicated, isn't it? Why don't you write...

              ^!Set %Del%=^?{Remove lines not containing:}
              ^!Replace "^(?!.*(?:^%Del%)\b).*(\R|\Z)" >> "" WARS

              Watch the '\b' -- it makes sure that, for example, 'Alfred' is matched but not 'Alfredo'.

              > There is no file size limitation.

              Agreed! (I tested it with 50,000 lines.)

              Regards,
              Flo
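The effect of this one-line replace, including the '\b' behavior, can be tried in Python. This is a sketch under assumptions: Python's `re` has no '\R', so `\r?\n` is substituted for it, and the function name and sample lines are invented for illustration.

```python
import re

def remove_lines_not_containing(text: str, alternatives: str) -> str:
    """Delete every line that lacks all of the pipe-separated alternatives."""
    pattern = re.compile(
        # Negative lookahead: the line must NOT contain an alternative
        # followed by a word boundary; then consume the line and its break.
        rf"^(?!.*(?:{alternatives})\b).*(?:\r?\n|\Z)",
        re.MULTILINE,
    )
    return pattern.sub("", text)

sample = "Alfred 3\nAlfredo 1\nBruno 2\n"
print(remove_lines_not_containing(sample, "Alfred"))
```

With "Alfred" as the search term, the "Alfredo" line is removed as well, because no word boundary follows "Alfred" inside "Alfredo".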
            • Don
              Message 6 of 9 , Sep 4, 2011
                Okay, so I switched over to yesterday's suggestion of GetDocListAll.
                Seems to work.

                As for the sample file: it was suggested that I seemed unwilling to
                provide one. Quite the opposite; it was just a bunch of delimited text,
                nothing special about it, so I didn't see a need to send a sample.

                We found solutions.

                I will say that I am typically cleaning up team results, so team names
                are unique and the \b in today's suggestion would not typically come
                into play for me, but it's good to have there. What does the ?! at the
                beginning do, however? Negative lookahead? I don't fully get lookaheads
                just yet: when to use them, what to do with them.

                I appreciate those who helped. Should it fail again, of course I'll
                write again and let your wise minds and willing spirits come back into play.

              • flo.gehrke
                Message 7 of 9 , Sep 5, 2011
                  --- In ntb-clips@yahoogroups.com, Don <don@...> wrote:

                  >> ^!Set %Del%=^?{Remove lines not containing:}
                  >> ^!Replace "^(?!.*(?:^%Del%)\b).*(\R|\Z)" >> "" WARS

                  > What does the ?! at the beginning do however?
                  > Negative look ahead? I don't get look aheads
                  > fully just yet, when to use them, what to do with them.

                  It says: Find a line where, beginning at the start of the line ('^'), you do NOT see the search string (^%Del%) at any position when looking ahead. Inside the lookahead, the search string may be preceded by any character 0 or more times ('.*'); the second '.*' then matches the whole line. If this is true, replace that line, including its line break ('\R'), with an empty string (i.e. delete it). With '\Z', it also matches at the end of the subject string where no line break follows.

                  Regards,
                  Flo
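One point that often trips people up with lookaheads is that they only test what follows and consume nothing. The following Python sketch isolates the lookahead part of the pattern to show this; the helper name and sample lines are assumptions for illustration, and Python's `re` stands in for NoteTab's regex engine.

```python
import re

def lookahead_check(line: str, alternatives: str):
    """Report whether the negative lookahead succeeds at the start of a line.

    Returns (succeeded, end_position). A successful lookahead ends at
    position 0, i.e. it has not consumed any characters.
    """
    m = re.match(rf"(?!.*(?:{alternatives})\b)", line)
    if m is None:
        # The search string was seen ahead, so the assertion fails
        # and the line would be left alone.
        return (False, None)
    # The search string was not seen; the rest of the pattern ('.*' plus
    # the line break) is what actually matches and removes the line.
    return (True, m.end())

for line in ["Lions 3", "Tigers 1", "Bears 2"]:
    print(line, lookahead_check(line, "Lions|Bears"))
```

A typical use for lookaheads is exactly this kind of condition: "match here only if something does (or does not) appear later", without making that something part of the match.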