
A little help on look behinds

  • John Shotsky
    Message 1 of 21, Oct 6, 2011
      I always have trouble making these things work.

      I enclose groups of text with [[[[group of text]]]] with up to thousands of instances. Only the very first one may be
      missing its [[[ part.

      I want to delete the ]]] that is not preceded by a [[[ plus a group of text. It always occurs after the start of the
      file (\A) and before any [[[. I apparently don't know how to apply multiline to a lookbehind to detect this situation.

      An example:

      This is the first line in the file.

      This is the second line in the file.

      ]]] this is the line I want removed.

      [[[ this is a valid line
      any number of lines here are valid lines
      ]]] this is a valid line.

      The only one to remove is the first one.

      Any ideas? This has me stumped. I know how to do it the hard way, by permitting sentences that don't start with ] after
      the file beginning, but I know there is a better way. Besides, I don't always know how many lines may exist between the
      top of the file and the first ]]] that is not preceded by a [[[.

      Thanks,
      John
    • Alec Burgess
      Message 2 of 21, Oct 6, 2011
        Hi John:
        This appears to work from Replace dialog - or in a clip
        Find=\A[^\[]*\K\]{3}.*?\R
        Replace=<empty>

        Note - the \K throws away everything before it (i.e. the beginning of the
        buffer through any length of characters not containing a "["). The pattern
        then matches the first ]]] and the remainder up to the first line break.
        This gets around the normal lookbehind requirement that it be of fixed length.

        Note - another way to do it would be to match both parts but discard the
        second (replacing both by just the first) (not tested)
        Find=(\A[^\[]*)(\]{3}.*?\R)
        Replacement=$1
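
For anyone who wants to try the capture-group variant outside NoteTab, here is a minimal Python sketch. It is a rough translation under assumptions, not the clip itself: Python's `re` module has no `\R`, so the line break is spelled out as `(?:\r\n|\n|\r)`, and the function name `drop_leading_orphan` is made up for illustration.

```python
import re

# Rough equivalent of Find=(\A[^\[]*)(\]{3}.*?\R) with Replace=$1.
# Group 1: everything from the start of the text that contains no "[".
# Group 2: the "]]]" plus the rest of its line, including the line break.
PATTERN = re.compile(r"\A([^\[]*)(\]{3}.*?(?:\r\n|\n|\r))")


def drop_leading_orphan(text: str) -> str:
    """Remove a ']]]' line that occurs before any '[' in the text."""
    return PATTERN.sub(r"\1", text, count=1)


sample = (
    "This is the first line in the file.\n\n"
    "This is the second line in the file.\n\n"
    "]]] this is the line I want removed.\n\n"
    "[[[ this is a valid line\n"
    "any number of lines here are valid lines\n"
    "]]] this is a valid line.\n"
)
print(drop_leading_orphan(sample))
```

Because group 1 may not contain any "[" at all, a single stray "[" ahead of the orphaned ]]] is enough to stop this version from matching.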

        --
        Regards ... Alec (buralex@gmail & WinLiveMess - alec.m.burgess@skype)

      • Sheri
        Message 3 of 21, Oct 6, 2011
          In case there might be any valid opening brackets in the text prior to
          the ]]] line that needs deleting:

          ^!Replace "(?s)\A.+?\R\K(?=\]\]\]|\[\[\[)(*COMMIT)(?-s)\]\]\].*\R" >> "" WARS

          Regards,
          Sheri
        • John Shotsky
          Message 4 of 21, Oct 7, 2011
            As a matter of fact, both opening and closing may appear between \A and the wanted removal. Just not multiples. I think
            this does it, thank you! I have never used *COMMIT before, so I'll go see what that does.

            Regards,
            John



            ------------------------------------

            Fookes Software: http://www.fookes.com/
            NoteTab website: http://www.notetab.com/
            NoteTab Discussion Lists: http://www.notetab.com/groups.php

            ***
            Yahoo! Groups Links
          • John Shotsky
            Message 5 of 21, Oct 15, 2011
              It turns out that this does not do as needed. For reference, there may be, at most, one instance where ]]] is not
              preceded anywhere in the file by a [[[. That is the only ]]] that should be removed. It is always the first one. So in
              essence it is:
              ;Start at beginning of file \A
              ;no instances of ([[[) permitted in search
              ;one instance of (]]]) gets removed, if present. There may be lots of text and various [] preceding, but not multiples
              like [[[ ]]].

              Sample:
              This is the beginning of the file. There may be paragraphs here that include [ and/or ] but not in multiples. This text
              is to remain. The next line is to disappear.
              ]]]
              [[[
              some text
              ]]]
              [[[
              some more text
              ]]]

              Only the first ]]] needs to be removed.
              Regards,
              John


            • Sheri
              Message 6 of 21, Oct 15, 2011
                I'm using 6.2 and it works on your sample. You might want to add a
                prompt to tell you when it's done.

                It will never remove more than one, but I suppose it would be more
                efficient to omit the "A" option. The result is the same without it,
                but after the single replacement, it doesn't need to travel through the
                rest of the document futilely looking for another \A.

                ^!Replace "(?s)\A.+?\R\K(?=\]\]\]|\[\[\[)(*COMMIT)(?-s)\]\]\].*\R" >> "" WRS

                It requires the ]]] or [[[ to be at the start of a line (which cannot be the first line), as that was my understanding from this and previous samples. So even if your entire message, with the explanatory paragraph above the sample, is in the document, only the one line in the sample portion is removed.

                To be clear, here is what the pattern does:
                It consumes all characters from the very beginning of the (Whole) document until it sees, directly ahead, a line that begins with ]]] or [[[.

                At that point the rest of the pattern must match, or there is no match and no backtracking. The critical rest of the pattern is ]]] plus any characters up to the end of the line plus a line break.

                If there is a match, the \K excludes all the text from the beginning of the document up to the beginning of the ]]] line from being part of the text that gets deleted.

                There can only ever be one match anchored to the start of the text \A (even with the All option).

                If there was another ]]] line before a [[[ one, you would have to run the clip line again to get it (even if you had the "A" (replace All) option).
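
The steps described above can be approximated in Python as a two-stage match. This is only a sketch under assumptions: Python's `re` module supports neither `\K`, `\R` nor `(*COMMIT)`, so `\R` is replaced by `(?:\r\n|\n|\r)` and the "commit" is imitated by simply not retrying once the first stage has succeeded. The names `PREFIX`, `ORPHAN_LINE` and `remove_orphan_once` are invented for this illustration.

```python
import re

NEWLINE = r"(?:\r\n|\n|\r)"  # stand-in for PCRE's \R (common cases only)

# Stage 1: from the start of the text, lazily consume up to the first line
# break that is directly followed by "]]]" or "[[[".
PREFIX = re.compile(r"\A.+?" + NEWLINE + r"(?=\]\]\]|\[\[\[)", re.DOTALL)

# Stage 2: what must match once stage 1 has succeeded: a "]]]" line.
ORPHAN_LINE = re.compile(r"\]\]\][^\r\n]*" + NEWLINE)


def remove_orphan_once(text: str) -> str:
    prefix = PREFIX.match(text)
    if prefix is None:
        return text  # no line starting with ]]] or [[[ after the first line
    line = ORPHAN_LINE.match(text, prefix.end())
    if line is None:
        return text  # first marker line is [[[ : give up, no second attempt
    # The prefix is kept (the role of \K); only the ]]] line is dropped.
    return text[:line.start()] + text[line.end():]


sample = (
    "Intro text with [ and ] in it.\n"
    "]]]\n[[[\nsome text\n]]]\n[[[\nsome more text\n]]]\n"
)
print(remove_orphan_once(sample))
```

Calling the function a second time on its own output leaves the text unchanged, since the first marker line is then a [[[ line.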

                Regards,
                Sheri
              • John Shotsky
                Message 7 of 21, Oct 15, 2011
                  Ok, thanks. I don't know why, but this one works fine, and the first one didn't. I had already done the bulldozer
                  approach of tagging all the matched ]]] first, then deleting any not tagged, then removing the tags. This is exactly
                  what I wanted, and will be used in other areas as well. Thanks for the explanation, too; I really like to understand how
                  new things work.

                  I did add extra ]]]'s to the top of the file, and also mid-document to see if those would be removed. All worked as
                  expected/wanted. Perfect.
                  Regards,
                  John


                • Alec Burgess
                  Message 8 of 21, Oct 15, 2011
                    On 2011-10-15 13:52, Sheri wrote:
                    > ^!Replace "(?s)\A.+?\R\K(?=\]\]\]|\[\[\[)(*COMMIT)(?-s)\]\]\].*\R">> "" WRS
                    Sheri - what does the (*COMMIT) in the above regexp do? (And if possible
                    a link to where it is in PCRE documentation)

                    --
                    Regards ... Alec (buralex@gmail & WinLiveMess - alec.m.burgess@skype)
                  • Sheri
                    Message 9 of 21, Oct 15, 2011
                      On 10/15/2011 4:26 PM, Alec Burgess wrote:
                      > Sheri - what does the (*COMMIT) in the above regexp do? (And if possible
                      > a link to where it is in PCRE documentation)
                      Hi Alec,

                      (*COMMIT) says the rest of the pattern must match from here without
                      backtracking. It's in the regex.chm help file in the section on
                      Backtracking Control.

                      I guess you could say it creates an anchor in the middle of the pattern.

                      Regards,
                      Sheri
                    • flo.gehrke
                      Message 10 of 21, Oct 16, 2011
                        --- In ntb-clips@yahoogroups.com, "John Shotsky" <jshotsky@...> wrote:
                        >
                        > ...there may be, at most, one instance where ]]] is not
                        > preceded anywhere in the file by a [[[. That is the only ]]] that
                        > should be removed. It is always the first one...

                        John,

                        I think if it's always the first ']]]' in the string you could just write...

                        ^!Replace "(?s).{1,}?\K\]{3}\R" >> "" WRS

                        The beginning of file is asserted with the 'W' option -- so there's no '\A' necessary. Without the 'A' option, only the first occurrence gets removed.

                        If "no instances of ([[[) [is] permitted in search" you could exclude that with a negative Lookahead...

                        ^!Replace "(?s)(?!\[{3}).{1,}?\K\]{3}\R" >> "" WRS

                        ...if necessary.
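
One thing to keep in mind with `(?!\[{3}).{1,}?`: the lookahead is tested only at the position where a match attempt starts, so by itself it does not stop a [[[ from appearing somewhere between that position and the ]]]. If the stricter reading is wanted (no [[[ anywhere between the start of the file and the ]]] to be removed), the lookahead can be applied to every character that gets skipped. A minimal Python sketch of that idea follows. It assumes the capture-group substitute for `\K`, the spelled-out line break instead of `\R`, and an optional rest-of-line after the ]]]; the function name `remove_unopened_close` is hypothetical.

```python
import re

# From the start of the text, step forward one character at a time, but never
# step onto the beginning of a "[[[" (per-character negative lookahead).
# The first "]]]" reached this way, plus the rest of its line, is dropped.
PATTERN = re.compile(
    r"\A((?:(?!\[{3}).)*?)\]{3}[^\r\n]*(?:\r\n|\n|\r)",
    re.DOTALL,
)


def remove_unopened_close(text: str) -> str:
    """Drop the first ]]] line only if no [[[ occurs anywhere before it."""
    return PATTERN.sub(r"\1", text, count=1)


with_orphan = "Intro with [ and ] here.\n]]]\n[[[\nsome text\n]]]\n"
balanced = "Intro with [ and ] here.\n[[[\nsome text\n]]]\n"

print(remove_unopened_close(with_orphan))
print(remove_unopened_close(balanced) == balanced)
```

With this version a file that has no orphaned ]]] is left alone, because the scan cannot get past the first [[[.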

                        Regards,
                        Flo
                      • flo.gehrke
                        Message 11 of 21, Oct 16, 2011
                          --- In ntb-clips@yahoogroups.com, Sheri <silvermoonwoman@...> wrote:
                          >
                          > (*COMMIT) says the rest of the pattern must match from here without
                          > backtracking...I guess you could say it creates an anchor in
                          > the middle of the pattern.

                          Sheri,

                          I would be grateful for some more explanations about that verb '(*COMMIT)'.

                          I've tested your clip...

                          (?s)\A.+?\R\K(?=\]\]\]|\[\[\[)(*COMMIT)(?-s)\]\]\].*\R

                          against the following text which is quite similar to John's first sample. For our discussion, I've added line numbers (to be removed when testing):

                          1 First line
                          2
                          3 [valid line]
                          4
                          5 ]]] remove
                          6
                          7 ]]] remove
                          8
                          9 [[[ valid line
                          10 more valid lines
                          11 ]]] valid line.
                          12
                          13 [[[ valid line

                          It's quite clear to me why the clip removes line #5 and #7 but not #9. But I still can't see why it doesn't remove line #11.

                          If we omit the '\K' we can see two matches:

                          - 1. from start of string to end of line #5

                          - 2. line #6 till end of line #7

                          Next, line #8 and #9 are not matched because line #9 doesn't start with ']]]'.

                          But WHY doesn't the clip jump over that mismatch and move on to selecting line #10 and #11? IMHO, line #10 should be matched with '(?s)\A.+?\R\K(?=\]\]\]|\[\[\[)' (with or without '\A'), and the following '(?-s)\]\]\].*\R'. Why on earth is '(*COMMIT)' preventing this?

                          Thanks for any light you can shed on this!

                          Flo
                        • John Shotsky
                          Message 12 of 21, Oct 16, 2011
                            I found some fairly extensive explanations and examples of this and other such verbs in the PCRE text manual. Search for
                            commit to find all the bits.
                            http://www.pcre.org/pcre.txt

                            Regards,
                            John

                          • John Shotsky
                            Message 13 of 21, Oct 16, 2011
                              Thanks, Flo, the second example does what is needed. It's also easier for me to understand… :-)

                              Regards,
                              John

                            • flo.gehrke
                              Message 14 of 21, Oct 16, 2011
                                --- In ntb-clips@yahoogroups.com, "John Shotsky" <jshotsky@...> wrote:
                                >
                                > I found some fairly extensive explanations and examples of this
                                > and other such verbs in the PCRE text manual. Search for
                                > commit to find all the bits. http://www.pcre.org/pcre.txt

                                John,

                                Thanks for that hint, but I can't find an answer there to my question.

                                It is noticeable that these explanations are rather poor. Maybe they pay little attention to these Backtracking Control Verbs because they are regarded as "experimental" only.

                                Still hoping for an answer from Sheri or any other expert,

                                Flo
                              • diodeom
                                Message 15 of 21, Oct 16, 2011
                                  Flo wrote:
                                  >
                                  > --- In ntb-clips@yahoogroups.com, Sheri <silvermoonwoman@> wrote:
                                  > >
                                  > > (*COMMIT) says the rest of the pattern must match from here without
                                  > > backtracking...I guess you could say it creates an anchor in
                                  > > the middle of the pattern.
                                  >
                                  > Sheri,
                                  >
                                  > I would be grateful for some more explanations about that verb '(*COMMIT).
                                  >
                                  > I've tested your clip...
                                  >
                                  > (?s)\A.+?\R\K(?=\]\]\]|\[\[\[)(*COMMIT)(?-s)\]\]\].*\R
                                  >
                                  > against the following text which is quite similar to John's first sample. For our discussion, I've added line numbers (to be removed when testing):
                                  >
                                  > 1 First line
                                  > 2
                                  > 3 [valid line]
                                  > 4
                                  > 5 ]]] remove
                                  > 6
                                  > 7 ]]] remove
                                  > 8
                                  > 9 [[[ valid line
                                  > 10 more valid lines
                                  > 11 ]]] valid line.
                                  > 12
                                  > 13 [[[ valid line
                                  >
                                  > It's quite clear to me why the clip removes line #5 and #7 but not #9. But I still can't see why it doesn't remove line #11.
                                  >
                                  > If we omit the '\K' we can see two matches:
                                  >
                                  > - 1. from start of string to end of line #5
                                  >
                                  > - 2. line #6 till end of line #7
                                  >
                                  > Next, line #8 and #9 are not matched because line #9 doesn't start with ']]]'.
                                  >
                                  > But WHY doesn't the clip jump over that mismatch and move on to select lines #10 and #11? IMHO, line #10 should be matched with '(?s)\A.+?\R\K(?=\]\]\]|\[\[\[)' (with or without '\A'), and the following '(?-s)\]\]\].*\R'. Why on earth is '(*COMMIT)' preventing this?
                                  >
                                  > Thanks for any light you can shed on this!
                                  >


                                  I'd guess you're running this pattern in the (Ctrl+R) dialog box instead of in a clip -- where it's meant to ***capture or fail*** only once (on the very first instance of either [[[ or ]]]).

                                  If you click "Find Next" after #5 and #7, notice that your beginning position for the next attempt is on or after line #7. After the first available alternative "[[[" is spotted by the look-ahead now on line #9, (*COMMIT) demands that at this very location either "]]]" should be found or else the whole pattern should abandon any further matching attempts. Obviously, "[[[" ain't the required "]]]" so the pattern fails by design.
                                • Sheri
                                  Message 16 of 21 , Oct 16, 2011
                                    Hi Dio, Flo, everyone,

                                    That seems to say it well Dio.

                                    In retrospect, this is likely sufficient:

                                    ^!Replace "^(?=\Q]]]\E|\Q[[[\E)(*COMMIT)\Q]]]\E.*\R" >> "" WRS

                                    If the "A" (replace all) option were added, it would remove both
                                    "remove" lines from Flo's sample and quit upon seeing a line that begins
                                    with [[[.
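For readers without NoteTab at hand, the effect of the replace-all version can be pictured with a rough Python sketch. Python's built-in re module has no (*COMMIT), so this is a line-by-line emulation of the behaviour described above, not the PCRE pattern itself. The function name and the LF-only sample are assumptions made for illustration, and unlike the clip (which needs a trailing line break) it would also drop a final ]]] line that has none.

```python
def strip_leading_closers(text):
    """Drop lines starting with ]]] until the first line starting with [[[."""
    kept = []
    committed = False  # becomes True once a line starting with [[[ is seen
    for line in text.splitlines(keepends=True):
        if not committed and line.startswith("[[["):
            committed = True   # mirrors (*COMMIT) failing: no further removals
        if not committed and line.startswith("]]]"):
            continue           # stray closer before any opener: remove the line
        kept.append(line)
    return "".join(kept)


# Flo's sample with the discussion line numbers taken out.
SAMPLE = (
    "First line\n\n[valid line]\n\n]]] remove\n\n]]] remove\n\n"
    "[[[ valid line\nmore valid lines\n]]] valid line.\n\n[[[ valid line\n"
)

if __name__ == "__main__":
    print(strip_leading_closers(SAMPLE), end="")
```

Both "remove" lines disappear, while the "]]] valid line." that follows the first [[[ is left alone.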

                                    I agree that the PCRE (*VERB) documentation is poor. In addition it
                                    should be noted that there have been bug fixes and enhancements to verb
                                    processing in the three PCRE updates that have occurred since the
                                    version built into NoteTab 6.2.

                                    Yet another PCRE update is pending (PCRE 8.20) and I hope Eric will take
                                    note when it's available.

                                    That said, the best way to understand what the verbs do is to experiment
                                    with them.

                                    Regards,
                                    Sheri
                                  • flo.gehrke
                                    Message 17 of 21 , Oct 17, 2011
                                      --- In ntb-clips@yahoogroups.com, "diodeom" <diomir@...> wrote:
                                      >
                                      > I'd guess you're running this pattern in the (Ctrl+R) dialog
                                      > box instead of in a clip...

                                      Thanks, diodeom!

                                      It was clear, however, why the expression fails at line #9. But the question was: Why doesn't it match line #10 and #11? Why doesn't the engine just skip the mismatch?

                                      If we start anew from the beginning of line #10 then line #11 will be selected. But when starting from the beginning of the subject string the verb seems to nail the cursor to the beginning of line #8.

                                      Moreover, it doesn't seem to be a matter of running it in the dialog box or in a clip. For example...

                                      ^!Info ^$GetDocListAll("(?s).+?\R\K(?=\]\]\]|\[\[\[)(*COMMIT)(?-s)\]\]\].*\R";"$0\r\n")$

                                      achieves only two matches as well: line #5 and #7 (I've omitted '\A' here because it doesn't change the result no matter if used in the dialog or a clip).

                                      Well, I don't want to tax your patience too much with my slow-wittedness. Don't ask -- just be surprised! Obviously, that's the way the verb is designed to work. It prevents the engine from making any further attempt at all once it has failed at any position. I hope I've learned the lesson...

                                      Flo

                                      PS Also thanks to Sheri for her latest reply!

                                    • John Shotsky
                                      Message 18 of 21 , Oct 17, 2011
                                        Flo,

                                        It turns out that your suggestion fails at times, and takes out ]]] which IS preceded by a [[[ somewhere above it in the
                                        text. If [[[ were the first thing in the file, it should do nothing.
                                        ^!Replace "(?s)(?!\[{3}).{1,}?\K\]{3}\R" >> "" WRS

                                        I didn't take time to troubleshoot it, as the '*COMMIT' version does not fail. I just mention it in case you want to
                                        play with it some more.

                                        Regards,
                                        John

                                      • Sheri
                                        Message 19 of 21 , Oct 17, 2011
                                          Flo, do you remember the \G? I think (*COMMIT) is like that, except the
                                          match position within the subject is established dynamically after
                                          matching what's before the (*COMMIT).

                                          Remember that PCRE does not itself find multiple matches. NoteTab's
                                          functions and commands that find or replace multiple matches require
                                          NoteTab to execute PCRE multiple times at different starting positions.
                                          NoteTab's general behavior in doing so is to advance the cursor after a
                                          successful match (to find more matches past that match). NoteTab only
                                          advances the cursor and continues looking for more matches after a
                                          successful match; it doesn't do so after a "No Match" result.

                                          I believe \A matches only at the very start of a subject. Don't have
                                          time to play with GetDocListAll til later, but I think the only way more
                                          than one match could be found using a pattern starting with \A would be
                                          if NoteTab were sending PCRE different subject strings on each execution
                                          (not just different starting positions). Would surprise me if it is.
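The \A point can be tried outside NoteTab. Below is a small Python sketch, on the assumption that the start offset of Python's re module behaves like a PCRE start offset for this purpose. The helper names and the test string are made up for illustration. Searching the same subject from later offsets yields one match, whereas handing over a fresh, shortened subject each time yields several.

```python
import re

PATTERN = re.compile(r"\A\w+\s*")


def matches_with_offset(text):
    """Same subject every time; only the start offset moves forward."""
    found, pos = [], 0
    while pos < len(text):
        m = PATTERN.search(text, pos)
        if m is None or m.end() == m.start():
            break              # "No Match": stop, as NoteTab is said to do
        found.append(m.group())
        pos = m.end()          # advance only after a successful match
    return found


def matches_with_slicing(text):
    """A new, shorter subject string on every execution."""
    found = []
    while text:
        m = PATTERN.search(text)
        if m is None or m.end() == 0:
            break
        found.append(m.group())
        text = text[m.end():]  # \A now matches at the start of the remainder
    return found


if __name__ == "__main__":
    print(matches_with_offset("one two three"))
    print(matches_with_slicing("one two three"))
```

If GetDocListAll returns more than one hit for a pattern that starts with \A, that would point to the second style of driver.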

                                          Regards,
                                          Sheri
                                        • flo.gehrke
                                          Message 20 of 21 , Oct 17, 2011
                                            --- In ntb-clips@yahoogroups.com, "John Shotsky" <jshotsky@...> wrote:
                                            >
                                            > Flo,
                                            >
                                            > It turns out that your suggestion fails at times, and takes out ]]]
                                            > which IS preceded by a [[[ somewhere above it in the
                                            > text....

                                            John,

                                            No surprise -- I took your message literally. In #22150, you spoke of "one instance where ]]] is NOT preceded anywhere in the file by a [[[. That is the only ]]] that should be removed. It is always the first one."

                                            Well, here's another idea: It removes any line (empty or not) starting with ']]]' which is NOT preceded by '[[['. All lines starting with '[[[' and being followed somewhere by a closing ']]]' are left untouched.

                                            ^!Replace "(?s)^\[{3}.*?\]{3}\K|(?-s)^\]{3}.*(\R{1,}|\Z)" >> "" WARS

                                            Tested with...

                                            Beginning of file
                                            [This text is to remain]
                                            ]]]
                                            ]]] remove
                                            [[[ valid line
                                            valid line ]]]
                                            [[[ valid line ]]]
                                            [[[
                                            valid line
                                            ]]]
                                            ]]] remove

                                            Lines #3, #4, and #11 will be removed.
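The same idea can be approximated in Python for anyone who wants to test it outside NoteTab. This is only a sketch under stated assumptions: Python's re module has neither \K nor \R, so a complete [[[ ... ]]] group is captured and put back instead of being skipped with \K, and line breaks are assumed to be plain \n. The function name is hypothetical.

```python
import re

# Alternative 1 captures a complete [[[ ... ]]] group so it can be put back.
# Alternative 2 matches a stray line starting with ]]] plus its line break(s).
PATTERN = re.compile(r"^(\[{3}[\s\S]*?\]{3})|^\]{3}.*(?:\n+|\Z)", re.MULTILINE)


def remove_stray_closers(text):
    # Re-insert group 1 when a full group matched; otherwise delete the match.
    return PATTERN.sub(lambda m: m.group(1) or "", text)


SAMPLE = (
    "Beginning of file\n"
    "[This text is to remain]\n"
    "]]]\n"
    "]]] remove\n"
    "[[[ valid line\n"
    "valid line ]]]\n"
    "[[[ valid line ]]]\n"
    "[[[\n"
    "valid line\n"
    "]]]\n"
    "]]] remove\n"
)

if __name__ == "__main__":
    print(remove_stray_closers(SAMPLE), end="")
```

Consuming each complete group first is what keeps its closing ]]] from ever being tested against the second alternative.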

                                            Regards,
                                            Flo
                                          • flo.gehrke
                                            Message 21 of 21 , Oct 17, 2011
                                              --- In ntb-clips@yahoogroups.com, Sheri <silvermoonwoman@...> wrote:
                                              >
                                              > Flo, do you remember the \G? I think (*COMMIT) is like that,...

                                              Oh yes, I do remember '\G'! Great discussion in Oct 2008 (see #18566)

                                              Probably, this could explain why, at times, they call '(*COMMIT)' an anchor.

                                              Thanks again for your explanations. It's always a pleasure to learn more about NT's hidden secrets from you :-)

                                              Flo