[Clip] Re: FIND in NTL V7pr3 still acting odd

  • joy8388608
    Message 1 of 15, Apr 14 7:16 AM
      Wow, I think I actually understand enough about this to answer the question...

      If the \s matches NLs and is greedy, it should never match anything ending with \R since the match to \R would already have been eaten by the \s. This is not always happening so something is goofy.

      As a reminder, isn't anyone interested in all the problems I reported about FIND? I've reported three different issues so far and my reports just seem to keep falling into never-land. It makes no sense that everything works fine for me and I have these intermittent issues with V7. I've been doing this long enough to have a good feel for this type of thing and I'd put money on the fact that there is a problem here... there just are not enough people doing the things I'm doing or REPORTING the problems they see IF they, in fact, even notice them. When the wrong characters are selected during a find, this will obviously affect clips that use what is selected after a find. I know I did not test to see if what is selected on my screen is actually what NT THINKs is selected, but that's not the point.

      Of course, if someone IS looking into this, it would be nice to be told so I don't bring up issues that are already being addressed.

      Thanks, everyone. I know I've said it before, but the people on this group are very nice.

      Joy



      --- In ntb-clips@yahoogroups.com, "John Shotsky" <jshotsky@...> wrote:
      >
      > Hmmm…I'm not sure I understand the argument. All I said is that if \s* precedes a \R, the \R would never capture
      > anything, because the \s would capture any following contiguous whitespace. If you are saying that is not true, I can't
      > see why it would not be true.
      >
      > Regards,
      > John
      > RecipeTools Web Site: http://recipetools.gotdns.com/
      >
      > From: ntb-clips@yahoogroups.com [mailto:ntb-clips@yahoogroups.com] On Behalf Of flo.gehrke
      > Sent: Saturday, April 14, 2012 04:19
      > To: ntb-clips@yahoogroups.com
      > Subject: [Clip] Re: FIND in NTL V7pr3 still acting odd
      >
      >
      > --- In ntb-clips@yahoogroups.com, "John Shotsky" <jshotsky@> wrote:
      > >
      > > \R is included in \s, so you need no \R if you have \s present.
      > > You will never capture a \R following a \s, because the
      > > \s will capture it first.
      >
      > > You will never capture a \R following a \s, because the
      > > \s will capture it first.
      >
      > So how do you explain that the two CRNL between 'xxx' and 'yyy' in...
      >
      > www
      > xxx
      >
      > yyy
      > zzz
      >
      > are both matched with '\R(\s)*\R' or '\R(\s)+\R' in NT 6.2 and NT 7.0 as well? And why, in NT 6.2, are both CRNL matched
      > with '\R\s*\R' or '\R\s+\R'?
      >
      > So, IMHO, your statement is not fully convincing (this pertains to Alec #22593 too). See the PCRE documentation on "PCRE
      > Pattern / Newline sequences":
      >
      > "Outside a character class, by default, the escape sequence \R matches any Unicode newline sequence...In non-UTF-8 mode
      > \R is equivalent to the following: (?>\r\n|\n|\x0b|\f|\r|\x85)"
      >
      > Since CRNL, in Windows, consists of CR + NL, the sequence of two CRNL actually represents four characters. So this
      > explains the abovementioned matching. You can compare it with 'xxxx' being matched with 'x{1,2}xxx{1,2}'.
      >
      > If I'm not mistaken, the crux in Joy's problem was the DIFFERENCE between the behavior of NT 6.2 and NT 7.0 (see my
      > humble contribution in #22591). I think this difference needs some more explanation...
      >
      > Regards,
      > Flo
      >
      >
      >
      >
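Flo's point in the quoted message above (two CRNL are four characters, so '\R(\s)*\R' can match both of them) can be checked outside NoteTab. The following is only a rough sketch in Python, whose re module is not PCRE and has no \R. The name NEWLINE is an assumption made for this example: a hand-written stand-in for the non-UTF-8 definition of \R quoted from the PCRE documentation. The sample text is assumed to use Windows line endings.

```python
import re

# Python's re module has no \R. NEWLINE is a hand-written stand-in for the
# non-UTF-8 definition quoted above; the (?!\n) guard keeps a CR from being
# split off a CRLF pair, which is what PCRE's atomic group (?>...) ensures.
NEWLINE = r"(?:\r\n|\r(?!\n)|[\n\x0b\f\x85])"

# Sample text from the thread, assuming Windows line endings (CR + NL).
text = "www\r\nxxx\r\n\r\nyyy\r\nzzz"

pattern = re.compile(NEWLINE + r"(\s)*" + NEWLINE)
matches = [(m.span(), m.group(0)) for m in pattern.finditer(text)]
print(matches)  # [((8, 12), '\r\n\r\n')]

# Flo's comparison: the same give-and-take lets this pattern cover 'xxxx'.
x_match = re.fullmatch(r"x{1,2}xxx{1,2}", "xxxx")
print(x_match is not None)  # True
```

The single match covers exactly the two CRNL between 'xxx' and 'yyy': the first newline takes one CRNL, and '(\s)*' has to hand the final NL back so the trailing newline can match.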
    • flo.gehrke
      Message 2 of 15, Apr 14 7:34 AM
        --- In ntb-clips@yahoogroups.com, "John Shotsky" <jshotsky@...> wrote:
        >
        > Hmmm…I'm not sure I understand the argument. All I said is that
        > if \s* precedes a \R, the \R would never capture anything,
        > because the \s would capture any following contiguous whitespace.
        > If you are saying that is not true, I can't see why it would
        > not be true.

        John,

        If you test '\s*\R' (NT 6.2) against

        xxx
        yyy
        zzz

        the CRNL at the end of the first and the second line are matched.

        That is: We, obviously, can't say "if \s* precedes a \R, the \R would never capture anything, because the \s would capture any following contiguous whitespace".

        If '\s*' (in NT 6.2) or '(\s)*' (in NT 7.0) consumed all the whitespace between those lines, the RegEx engine would find no following '\R' and fail -- but it doesn't!

        The reason is backtracking: First, '\s*' matches both the CR and the NL contained in CRNL. Next, the engine tests the '\R'. Since nothing is left for it to match, the engine backtracks and forces '\s*' to give back one whitespace character. So '\R' is satisfied with the NL (since it matches both CRNL and NL) -- and the whole pattern matches.
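The backtracking step described above can be made visible by capturing the two parts of the pattern separately. This is a minimal sketch in Python, not NoteTab's PCRE/DIRegEx engine. Python's re module has no \R, so the name NEWLINE is an assumption made for this example: a hand-written stand-in for the PCRE definition of \R. The sample text is assumed to use Windows line endings.

```python
import re

# Stand-in for PCRE's \R (non-UTF-8 definition). The (?!\n) guard prevents a
# lone CR from being matched when it is the first half of a CRLF pair.
NEWLINE = r"(?:\r\n|\r(?!\n)|[\n\x0b\f\x85])"

# The three test lines from the message, assuming CR + NL line endings.
text = "xxx\r\nyyy\r\nzzz"

# Group 1 is what \s* keeps after backtracking, group 2 is what \R gets.
pattern = re.compile(r"(\s*)(" + NEWLINE + r")")

first = pattern.search(text)
ws_part, nl_part = first.group(1), first.group(2)
print(repr(ws_part), repr(nl_part))  # '\r' '\n'

# Every line end is still found as a whole CRNL.
all_matches = [m.group(0) for m in pattern.finditer(text)]
print(all_matches)  # ['\r\n', '\r\n']
```

Greedy '\s*' first takes both characters, then gives the NL back so the newline part can succeed, which is why the overall match is still the complete CRNL.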

        Regards,
        Flo
      • Axel Berger
        Message 3 of 15, Apr 14 8:04 AM
          "flo.gehrke" wrote:
          > The reason is in backtracking

          I just wanted to say thank you to all you diligent testers for the
          benefit of us late adopters, who then have all the bugs fixed
          beforehand.

          Danke
          Axel
        • John Shotsky
          Message 4 of 15, Apr 14 8:57 AM
            I found an interesting thing while testing this:
            When finding with just \s* on the pattern below, it stops at each character position until it gets to a whitespace
            character, then finds that. Just keep clicking Find until it finally finds a whitespace. I tried adding a space after
            the letters, and found that it would find both the space and the CR, but only after it had stopped at each character
            position first. This seems wrong; it is the kind of result I would expect from \s?. So \s* does not find a space or a CR until it has stepped
            through character positions to the point of the whitespace.

            In the past, I have noticed some erratic behavior with \s* in my clip library, and couldn't understand what was causing
            it. I changed my clips to not use \s in some of those cases, and now don't use it much. At least this little exercise
            demonstrates that there is some unexpected behavior, even if it is as intended.

            Regards,
            John
            RecipeTools Web Site: http://recipetools.gotdns.com/

            From: ntb-clips@yahoogroups.com [mailto:ntb-clips@yahoogroups.com] On Behalf Of flo.gehrke
            Sent: Saturday, April 14, 2012 07:35
            To: ntb-clips@yahoogroups.com
            Subject: [Clip] Re: FIND in NTL V7pr3 still acting odd


            --- In ntb-clips@yahoogroups.com, "John Shotsky" <jshotsky@...> wrote:
            >
            > Hmmm…I'm not sure I understand the argument. All I said is that
            > if \s* precedes a \R, the \R would never capture anything,
            > because the \s would capture any following contiguous whitespace.
            > If you are saying that is not true, I can't see why it would
            > not be true.

            John,

            If you test '\s*\R' (NT 6.2) against

            xxx
            yyy
            zzz

            the CRNL at the end of the first and the second line are matched.

            That is: We, obviously, can't say "if \s* precedes a \R, the \R would never capture anything, because the \s would
            capture any following contiguous whitespace".

            If '\s*' (in NT 6.2) or '(\s)*' (in NT 7.0) consumed all the whitespace between those lines, the RegEx engine would
            find no following '\R' and fail -- but it doesn't!

            The reason is backtracking: First, '\s*' matches both the CR and the NL contained in CRNL. Next, the engine tests the
            '\R'. Since nothing is left for it to match, the engine backtracks and forces '\s*' to give back one whitespace character. So '\R' is
            satisfied with the NL (since it matches both CRNL and NL) -- and the whole pattern matches.

            Regards,
            Flo



          • Eric Fookes
            Message 5 of 15, Apr 16 2:52 AM
              Hi John,

              > I found an interesting thing while testing this: When finding with
              > just \s* on the pattern below, it stops at each character position
              > until it gets to a whitespace character, then finds that. Just keep
              > clicking Find until it finally finds a whitespace. I tried adding a
              > space after the letters, and found that it would find both the space
              > and the CR, but only after it had stopped at each character position
              > first. This seems wrong, such as \s? might produce. So \s* does not
              > find a space or a CR until it has stepped through character positions
              > to the point of the whitespace.

              This works as designed. When a successful match has a length of zero
              characters, NoteTab then shifts position by one character on repeating
              the search. This is necessary to avoid getting stuck in an infinite loop
              when running a Replace All operation.

              Using \s+ avoids this issue because a successful match must be at least
              one character.
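The stepping behaviour described above can be reproduced with a small loop. This is only a sketch under stated assumptions: it uses Python's re module rather than PCRE/DIRegEx, the helper name repeated_find is hypothetical, and it is not NoteTab's actual code. It mimics clicking Find repeatedly and moves on by one character whenever a match has zero length.

```python
import re

def repeated_find(pattern, text):
    """Hypothetical helper: repeat a search, stepping one character
    forward after every zero-length match to avoid looping forever."""
    regex = re.compile(pattern)
    spans = []
    pos = 0
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        spans.append(m.span())
        # Without the +1, an empty match would be found again at the same spot.
        pos = m.end() + 1 if m.end() == m.start() else m.end()
    return spans

# Two letters, a space, a CRNL, then two more letters.
text = "ab \r\ncd"

star_spans = repeated_find(r"\s*", text)
plus_spans = repeated_find(r"\s+", text)
print(star_spans)  # [(0, 0), (1, 1), (2, 5), (5, 5), (6, 6), (7, 7)]
print(plus_spans)  # [(2, 5)]
```

With '\s*' the search reports an empty match at each non-whitespace position before and after the real whitespace run, while '\s+' reports only the run itself.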

              > In the past, I have noticed some erratic behavior with \s* in my clip
              > library, and couldn't understand what was causing it. I changed my
              > clips to not use \s in some of those cases, and now don't use it
              > much. At least this little exercise demonstrates that there is some
              > unexpected behavior, even if it is as intended.

              I think in most cases it is best to avoid running RE searches that can
              produce 0-character matches.

              --
              Regards,

              Eric Fookes
              http://www.fookes.com/
            • Eric Fookes
              Message 6 of 15, Apr 16 2:56 AM
                Here's an extract of the explanation I sent to Flo in private mail:

                NoteTab embeds the PCRE engine through the DIRegEx library developed by
                Ralf Junker:

                http://www.yunqa.de/delphi/doku.php/products/regex/index

                Although bugs can appear in the NoteTab code that integrates and invokes
                DIRegEx, it seems not to be the case with the latest issues you found.
                You can verify this by downloading and testing Ralf's DIRegEx_Workbench
                program from here (no install required):

                http://www.fookes.com/ftp/DIRegEx_Workbench.zip

                The default RE options are the same as those used in NoteTab.

                If you find a search that works in DIRegEx_Workbench but not in NoteTab,
                then the bug is probably in NoteTab. However, if a search fails in both
                programs, then the bug is probably in PCRE or DIRegEx.

                Or you have a mistake in your pattern.

                --
                Regards,

                Eric Fookes
                http://www.fookes.com/

              • flo.gehrke
                Message 7 of 15, Apr 22 4:31 AM
                  --- In ntb-clips@yahoogroups.com, Eric Fookes <egroups@...> wrote:
                  >
                  > NoteTab embeds the PCRE engine through the DIRegEx library
                  > (...). Although bugs can appear in the NoteTab code that
                  > integrates and invokes DIRegEx, it seems not to be the case
                  > with the latest issues you found. You can verify this by
                  > downloading and testing Ralf's DIRegEx_Workbench
                  > program (...) If you find a search that works in
                  > DIRegEx_Workbench but not in NoteTab, then the bug is probably
                  > in NoteTab. However, if a search fails in both programs, then
                  > the bug is probably in PCRE or DIRegEx.

                  Since that workbench shows the same behavior as NTb 7.0, the problem is not in NTb 7.0.

                  According to information from the DIRegEx developer, PCRE is responsible for this inconsistency. But they have already found a solution, and a forthcoming version of DIRegEx will be adjusted.

                  Please see the complete topic (#22589 ff) to see the work-arounds that have been discussed in this group...

                  Regards,
                  Flo