
Re: [Clip] A little help on look behinds

  • flo.gehrke
    Message 1 of 21 , Oct 16, 2011
      --- In ntb-clips@yahoogroups.com, "John Shotsky" <jshotsky@...> wrote:
      >
      > ...there may be, at most, one instance where ]]] is not
      > preceded anywhere in the file by a [[[. That is the only ]]] that
      > should be removed. It is always the first one...

      John,

      I think if it's always the first ']]]' in the string you could just write...

      ^!Replace "(?s).{1,}?\K\]{3}\R" >> "" WRS

      The beginning of file is asserted with the 'W' option -- so there's no '\A' necessary. Without the 'A' option, only the first occurrence gets removed.

      If "no instances of ([[[) [is] permitted in search" you could exclude that with a negative Lookahead...

      ^!Replace "(?s)(?!\[{3}).{1,}?\K\]{3}\R" >> "" WRS

      ...if necessary.

      Regards,
      Flo
    • flo.gehrke
      Message 2 of 21 , Oct 16, 2011
        --- In ntb-clips@yahoogroups.com, Sheri <silvermoonwoman@...> wrote:
        >
        > (*COMMIT) says the rest of the pattern must match from here without
        > backtracking...I guess you could say it creates an anchor in
        > the middle of the pattern.

        Sheri,

        I would be grateful for some more explanation of the verb '(*COMMIT)'.

        I've tested your clip...

        (?s)\A.+?\R\K(?=\]\]\]|\[\[\[)(*COMMIT)(?-s)\]\]\].*\R

        against the following text which is quite similar to John's first sample. For our discussion, I've added line numbers (to be removed when testing):

        1 First line
        2
        3 [valid line]
        4
        5 ]]] remove
        6
        7 ]]] remove
        8
        9 [[[ valid line
        10 more valid lines
        11 ]]] valid line.
        12
        13 [[[ valid line

        It's quite clear to me why the clip removes lines #5 and #7 but not #9. But I still can't see why it doesn't remove line #11.

        If we omit the '\K' we can see two matches:

        - 1. from start of string to end of line #5

        - 2. line #6 till end of line #7

        Next, line #8 and #9 are not matched because line #9 doesn't start with ']]]'.

        But WHY doesn't the clip jump over that mismatch and move on to select lines #10 and #11? IMHO, line #10 should be matched by '(?s)\A.+?\R\K(?=\]\]\]|\[\[\[)' (with or without '\A'), and line #11 by the following '(?-s)\]\]\].*\R'. Why on earth is '(*COMMIT)' preventing this?

        Thanks for any light you can shed on this!

        Flo
      • John Shotsky
        Message 3 of 21 , Oct 16, 2011
          I found some fairly extensive explanations and examples of this and other such verbs in the PCRE text manual. Search for "commit" to find all the bits.
          http://www.pcre.org/pcre.txt

          Regards,
          John

        • John Shotsky
          Message 4 of 21 , Oct 16, 2011
            Thanks, Flo, the second example does what is needed. It's also easier for me to understand… :-)

            Regards,
            John

          • flo.gehrke
            Message 5 of 21 , Oct 16, 2011
              --- In ntb-clips@yahoogroups.com, "John Shotsky" <jshotsky@...> wrote:
              >
              > I found some fairly extensive explanations and examples of this
              > and other such verbs in the PCRE text manual. Search for
              > commit to find all the bits. http://www.pcre.org/pcre.txt

              John,

              Thanks for that hint, but I can't find an answer there to my question.

              It is noticeable that these explanations are rather thin. Maybe little attention is paid to these Backtracking Control Verbs because they are regarded as "experimental" only.

              Still in hope for an answer from Sheri or any other expert,

              Flo
            • diodeom
              Message 6 of 21 , Oct 16, 2011
                Flo wrote:
                >
                > But WHY doesn't the clip jump over that mismatch and moves on selecting line #10 and #11? IMHO, line #10 should be matched with '(?s)\A.+?\R\K(?=\]\]\]|\[\[\[)' (with or without '\A'), and the following '(?-s)\]\]\].*\R'. Why on earth is '(*COMMIT)' preventing this?
                >
                > Thanks for any light you can shed on this!
                >


                I'd guess you're running this pattern in the (Ctrl+R) dialog box instead of in a clip -- where it's meant to ***capture or fail*** only once (on the very first instance of either [[[ or ]]]).

                If you click "Find Next" after #5 and #7, notice that your beginning position for the next attempt is on or after line #7. After the first available alternative "[[[" is spotted by the look-ahead now on line #9, (*COMMIT) demands that at this very location either "]]]" should be found or else the whole pattern should abandon any further matching attempts. Obviously, "[[[" ain't the required "]]]" so the pattern fails by design.
              • Sheri
                Message 7 of 21 , Oct 16, 2011

                  Hi Dio, Flo, everyone,

                  That seems to say it well Dio.

                  In retrospect, this is likely sufficient:

                  ^!Replace "^(?=\Q]]]\E|\Q[[[\E)(*COMMIT)\Q]]]\E.*\R" >> "" WRS

                  If the "A" (replace all) option were added, it would remove both
                  "remove" lines from Flo's sample and quit upon seeing a line that begins
                  with [[[.

                  I agree that the PCRE (*VERB) documentation is poor. In addition it
                  should be noted that there have been bug fixes and enhancements to verb
                  processing in the three PCRE updates that have occurred since the
                  version built into NoteTab 6.2.

                  Yet another PCRE update is pending (PCRE 8.20) and I hope Eric will take
                  note when it's available.

                  That said, the best way to understand what the verbs do is to experiment
                  with them.

                  Regards,
                  Sheri
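Editorial sketch of the behaviour described above. Python's `re` module has no `(*COMMIT)`, `\K`, or `\R`, so this only emulates the idea: `committed_search` and `remove_orphan_closers` are hypothetical helper names, `\R` is approximated by `\r?\n`, and the loop that advances only after a successful match is an assumption about how NoteTab drives PCRE, not taken from its documentation.

```python
import re

# A line start followed by either marker; stands in for ^(?=]]]|[[[).
MARKER = re.compile(r"^(?=\]\]\]|\[\[\[)", re.MULTILINE)
# The part after (*COMMIT): a ]]] line plus its line break (\R approximated).
CLOSER_LINE = re.compile(r"\]\]\].*\r?\n")


def committed_search(text, start=0):
    """Emulate one run of ^(?=]]]|[[[)(*COMMIT)]]].*\\R from offset `start`."""
    marker = MARKER.search(text, start)
    if marker is None:
        return None  # no marker line at all: ordinary "no match"
    # Committed: the ]]] line must match right here, or the whole search
    # fails without trying any later position.
    return CLOSER_LINE.match(text, marker.start())


def remove_orphan_closers(text):
    """Emulate replace-all: continue only after a successful match."""
    kept, pos = [], 0
    while (m := committed_search(text, pos)) is not None:
        kept.append(text[pos:m.start()])
        pos = m.end()
    kept.append(text[pos:])
    return "".join(kept)


# Flo's sample, without the line numbers.
SAMPLE = (
    "First line\n\n[valid line]\n\n]]] remove\n\n]]] remove\n\n"
    "[[[ valid line\nmore valid lines\n]]] valid line.\n\n[[[ valid line\n"
)
print(remove_orphan_closers(SAMPLE))
```

With this emulation the two "remove" lines disappear, and the search gives up at the first line that begins with `[[[`, so the later `]]] valid line.` is never reached.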
                • flo.gehrke
                  Message 8 of 21 , Oct 17, 2011
                    --- In ntb-clips@yahoogroups.com, "diodeom" <diomir@...> wrote:
                    >
                    > I'd guess you're running this pattern in the (Ctrl+R) dialog
                    > box instead of in a clip...

                    Thanks, diodeom!

                    It was clear, however, why the expression fails at line #9. But the question was: Why doesn't it match line #10 and #11? Why doesn't the engine just skip the mismatch?

                    If we start anew from the beginning of line #10 then line #11 will be selected. But when starting from the beginning of the subject string the verb seems to nail the cursor to the beginning of line #8.

                    Moreover, it doesn't seem to be a matter of running it in the dialog box or in a clip. For example...

                    ^!Info ^$GetDocListAll("(?s).+?\R\K(?=\]\]\]|\[\[\[)(*COMMIT)(?-s)\]\]\].*\R";"$0\r\n")$

                    achieves only two matches as well: line #5 and #7 (I've omitted '\A' here because it doesn't change the result no matter if used in the dialog or a clip).

                    Well, I don't want to tax your patience too much with my slow-wittedness. Don't ask -- just be surprised! Obviously, that's the way the verb is designed to work: once the engine has passed '(*COMMIT)' and the rest of the pattern fails, it makes no further attempt at any later position. I hope I've learned the lesson...

                    Flo

                    PS Also thanks to Sheri for her latest reply!

                  • John Shotsky
                    Message 9 of 21 , Oct 17, 2011
                      Flo,

                      It turns out that your suggestion fails at times, and takes out ]]] which IS preceded by a [[[ somewhere above it in the
                      text. If [[[ were the first thing in the file, it should do nothing.
                      ^!Replace "(?s)(?!\[{3}).{1,}?\K\]{3}\R" >> "" WRS

                      I didn't take time to troubleshoot it, as the '*COMMIT' version does not fail. I just mention it in case you want to
                      play with it some more.

                      Regards,
                      John
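Editorial sketch of one likely reason for the failure John reports, shown with Python's `re` rather than NoteTab. The assumption is that the negative lookahead `(?!\[{3})` is tested only at the position where a match attempt starts, so the engine can start one character later, or let `.{1,}?` run across a `[[[` further down. `\K` is replaced here by a capture group and `\R` by `\r?\n`. The `guarded` pattern is an illustration that is not from the thread: it repeats the lookahead before every character and anchors the attempt at the start of the text.

```python
import re

# Flo's second pattern, adapted: the lookahead guards one position only.
start_only = re.compile(r"(?!\[{3})(.+?)\]{3}\r?\n", re.DOTALL)

# Illustrative alternative (not from the thread): check for [[[ before
# every consumed character, and allow only an attempt at the very start.
guarded = re.compile(r"\A((?:(?!\[{3}).)*?)\]{3}\r?\n", re.DOTALL)

text_a = "[[[ block\ntext\n]]]\n"        # [[[ comes first: nothing to remove
text_b = "intro\n]]]\n[[[ block\n]]]\n"  # first ]]] has no [[[ above it

# The attempt at offset 0 is rejected, but the one at offset 1 succeeds,
# so the valid closing ]]] is removed anyway.
print(repr(start_only.sub(r"\1", text_a, count=1)))

# The guarded variant leaves text_a alone and removes only the stray ]]].
print(repr(guarded.sub(r"\1", text_a, count=1)))
print(repr(guarded.sub(r"\1", text_b, count=1)))
```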

                    • Sheri
                      Message 10 of 21 , Oct 17, 2011
                        Flo, do you remember the \G? I think (*COMMIT) is like that, except the
                        match position within the subject is established dynamically after
                        matching what's before the (*COMMIT).

                        Remember that PCRE does not itself find multiple matches. NoteTab's
                        functions and commands that find or replace multiple matches require
                        NoteTab to execute PCRE multiple times at different starting positions.
                        NoteTab's general behavior in doing so is to advance the cursor after a
                        successful match (to find more matches past that match). NoteTab only
                        advances the cursor and continues looking for more matches after a
                        successful match; it doesn't do so after a "No Match" result.

                        I believe \A matches only at the very start of a subject. I don't have
                        time to play with GetDocListAll until later, but I think the only way more
                        than one match could be found using a pattern starting with \A would be
                        if NoteTab were sending PCRE different subject strings on each execution
                        (not just different starting positions). It would surprise me if it were.

                        Regards,
                        Sheri
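Editorial sketch of the distinction drawn above between a new starting position and a new subject string. It uses Python's `re`, where `\A` behaves the same way with a start offset, and says nothing about what NoteTab actually passes to PCRE. The subject text is made up for the demonstration.

```python
import re

pattern = re.compile(r"\A\w+")
subject = "alpha beta"

# Same subject, later starting position: \A still means offset 0,
# so no match is possible from offset 6.
same_subject = pattern.search(subject, 6)

# A new, shorter subject string: \A now means the start of the slice.
new_subject = pattern.search(subject[6:])

print(same_subject)         # None
print(new_subject.group())  # beta
```

So a pattern that begins with `\A` can match more than once only if the caller hands over a fresh subject each time, which is the case Sheri considers unlikely.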
                      • flo.gehrke
                        Message 11 of 21 , Oct 17, 2011
                          --- In ntb-clips@yahoogroups.com, "John Shotsky" <jshotsky@...> wrote:
                          >
                          > Flo,
                          >
                          > It turns out that your suggestion fails at times, and takes out ]]]
                          > which IS preceded by a [[[ somewhere above it in the
                          > text....

                          John,

                          No surprise -- I took your message literally. In #22150, you spoke of "one instance where ]]] is NOT preceded anywhere in the file by a [[[. That is the only ]]] that should be removed. It is always the first one."

                          Well, here's another idea: It removes any line (empty or not) starting with ']]]' which is NOT preceded by '[[['. All lines starting with '[[[' and followed somewhere by a closing ']]]' are left untouched.

                          ^!Replace "(?s)^\[{3}.*?\]{3}\K|(?-s)^\]{3}.*(\R{1,}|\Z)" >> "" WARS

                          Tested with...

                          Beginning of file
                          [This text is to remain]
                          ]]]
                          ]]] remove
                          [[[ valid line
                          valid line ]]]
                          [[[ valid line ]]]
                          [[[
                          valid line
                          ]]]
                          ]]] remove

                          Lines #3, #4, and #11 will be removed.

                          Regards,
                          Flo
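Editorial sketch of the same two-alternative idea in Python, for readers without NoteTab. Python's `re` has no `\K`, so the `[[[ ... ]]]` block is captured and put back by a replacement function, while the second alternative is replaced with nothing. `strip_stray_closers` is a hypothetical name, and the sketch assumes `\n` line endings.

```python
import re

PATTERN = re.compile(
    r"^(\[{3}[\s\S]*?\]{3})"     # a [[[ ... ]]] block, captured so it is kept
    r"|^\]{3}[^\n]*(?:\n+|\Z)",  # a stray ]]] line plus trailing line breaks
    re.MULTILINE,
)


def strip_stray_closers(text):
    # Group 1 is set only when the block alternative matched; keep it then,
    # otherwise drop the stray ]]] line.
    return PATTERN.sub(lambda m: m.group(1) or "", text)


SAMPLE = (
    "Beginning of file\n"
    "[This text is to remain]\n"
    "]]]\n"
    "]]] remove\n"
    "[[[ valid line\n"
    "valid line ]]]\n"
    "[[[ valid line ]]]\n"
    "[[[\n"
    "valid line\n"
    "]]]\n"
    "]]] remove\n"
)
print(strip_stray_closers(SAMPLE))
```

Because the block alternative consumes everything from `[[[` up to its closing `]]]`, those closers are never seen by the second alternative; only the stray ones on lines 3, 4, and 11 of the sample are removed.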
                        • flo.gehrke
                          Message 12 of 21 , Oct 17, 2011
                            --- In ntb-clips@yahoogroups.com, Sheri <silvermoonwoman@...> wrote:
                            >
                            > Flo, do you remember the \G? I think (*COMMIT) is like that,...

                            Oh yes, I do remember '\G'! Great discussion in Oct 2008 (see #18566)

                            Probably, this could explain why, at times, they call '(*COMMIT)' an anchor.

                            Thanks again for your explanations. It's always a pleasure to learn more about NT's hidden secrets from you :-)

                            Flo