
Re: [NTS] Re: NTB or RegEx Bug

  • Art Kocsis
    Message 1 of 11 , Sep 28, 2012
      Yes, a zero length match satisfies a{0,}. So do all of the other
      remaining partial matches. But the whole point is that a* is supposed to
      be greedy. Look at the definition of greedy - it clearly specifies the
      MAXIMUM match, not the minimum. A zero length match is the minimum not the
      maximum possibility. The question mark "lazy" metacharacter specifies the
      minimum match. There should be a difference between greedy and non-greedy.
      Even Eric agreed.

      Also, a non-match should not move the cursor. Try any other search that
      fails and observe the status bar cursor location. It does not move.
      Contrariwise, a
      normal search starts at the current cursor position and searches forward
      (to the end of the file if necessary), looking for a possible match. a*
      doesn't seem to want to get off its duff to even start. [Does that mean
      that a* is lazy? <g>]

      BTW - You don't have to go thru all the hassle of defining, closing and
      reopening the Find window. Clicking on Find Next works quite well. As does
      F3 with the window open.

      BTW2 - Thanks for the heads up on DIRegEx. I will look at it. I assume that
      is the one you use. Is it freeware or shareware? I can't find any
      registration info on the site. What other RegEx apps have you tried?
      [http://www.yunqa.de/delphi/doku.php/products/regex/index#diregex]

      Art

      At 9/28/2012 07:56 AM, Flo wrote:
      >--- In ntb-scripts@yahoogroups.com,
      >Art Kocsis <artkns@...> wrote:
      > >
      > > However, for the string: aaaaaaaa
      > > a* nothing!!!
      > >
      > > Is this a bug in RegEx or in NTB? (tested both NT
      > > Std 5.8/fv & 6.2/fv)
      >
      >No bug -- neither in PCRE nor in NTb.
      >
      >'a*' is equivalent to a{0,}. So, at the beginning of the subject string,
      >the engine achieves a match of zero length because in...
      >
      >However...aaa
      >
      >it finds a 'H' at the beginning, i.e. the absence of 'a'. Since this
      >doesn't consume any character you don't see it.
      >
      >Try this: Open the Find dialog, enter 'a*', and close it again. Now press
      >F3 repeatedly and watch the cursor moving forward. It stops at any
      >position where the pattern is true, i.e. where an 'a' is absent or where
      >the engine doesn't see an 'a' when looking to the right. Ntb will match
      >'aaa' as soon as the engine reaches that string.
      >
      >There is a PCRE_NotEmpty match option that changes this behavior. You can
      >test this with the Workbench for DIRegEx (the embedding of PCRE into Ntb).
      >But there is no way to activate that option in Ntb. When choosing that
      >option, the engine immediately selects 'aaa'.
      >
      >Regards,
      >Flo
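
The zero-length matches Flo describes can be reproduced outside NoteTab. What follows is only a sketch using Python's `re` module, which is a different engine from the PCRE build embedded in NoteTab, so it illustrates the principle rather than NoteTab's exact behavior. The helper name `first_non_empty` is made up for this illustration; skipping empty matches in a loop merely imitates what a "not empty" match option would do.

```python
import re

SUBJECT = "However...aaa"
PATTERN = re.compile(r"a*")

# A plain search succeeds right at offset 0 with an empty match,
# because 'a*' is allowed to match zero characters in front of the 'H'.
first = PATTERN.search(SUBJECT)
print(first.span(), repr(first.group()))


def first_non_empty(pattern, subject):
    """Return the first match that consumes at least one character.

    Hypothetical helper: it imitates a "not empty" option by
    discarding zero-length matches.
    """
    for match in pattern.finditer(subject):
        if match.end() > match.start():
            return match
    return None


hit = first_non_empty(PATTERN, SUBJECT)
print(hit.span(), repr(hit.group()))
```

The first search reports an empty match at the very start of the line, while the helper walks past the empty matches and returns 'aaa'.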
    • flo.gehrke
      Message 2 of 11 , Sep 29, 2012
        I understand that you are testing a subject string like...

        However ... aaa ...

        starting from the beginning of the line. So 'aaa' is not at the start of line but somewhere behind 'However...'.

        > Yes, a zero length match satisfies a{0,}

        OK, so the difference between "no match" and "match of zero length" (or "zero match") can be taken as clear.

        > But the whole point is that a* is supposed to
        > be greedy.

        No doubt, but a "sequence of zero matches" at the same position is not imaginable, so greediness doesn't matter at the first positions. Greediness first matters when the engine reaches 'aaa'. Only at that position does the pattern match all the 'a' characters, because it is greedy.
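
To make the point about greediness concrete, here is a small sketch in Python. It uses Python's `re` module as a stand-in, not the PCRE engine inside NoteTab, and the subject string 'However...aaa' is the one assumed in this thread. It anchors the match attempt at two offsets and compares the greedy `a*` with the lazy `a*?`.

```python
import re

SUBJECT = "However...aaa"
START_OF_AAA = SUBJECT.index("aaa")  # offset of the first 'a'

greedy = re.compile(r"a*")
lazy = re.compile(r"a*?")

# At offset 0 there is no 'a' to take, so both variants match nothing.
greedy_at_start = greedy.match(SUBJECT, 0).group()
lazy_at_start = lazy.match(SUBJECT, 0).group()

# At the offset of 'aaa' the greedy form takes all three characters,
# while the lazy form is still satisfied with zero.
greedy_at_aaa = greedy.match(SUBJECT, START_OF_AAA).group()
lazy_at_aaa = lazy.match(SUBJECT, START_OF_AAA).group()

print(repr(greedy_at_start), repr(lazy_at_start))
print(repr(greedy_at_aaa), repr(lazy_at_aaa))
```

So greedy and lazy do differ, but only at a position where there is something to be greedy about.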

        > Also, a non-match should not move the cursor.

        Possibly, there are two misunderstandings here: 1. Again, the engine doesn't achieve "non-matches" but matches of zero length. 2. The cursor isn't moved here by "non-matches" but by repeatedly re-starting Find.

        > Contrariwise, a normal search starts at the current cursor
        > position and searches forward (to the end of the file if
        > necessary)

        No doubt, and that's exactly what the engine does, even in this case. Compare it with...

        ^!Replace "a*" >> "!" WARS

        tested against 'xxxxxx'. The result is '!x!x!x!x!x!x!', i.e., the engine achieves seven zero matches, one at every position where "zero 'a'" is true.
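
The same replacement can be tried in Python as a rough cross-check. This is a sketch with Python's `re` module, not with NoteTab's ^!Replace command or its PCRE engine, so it only shows that another regex engine treats the empty matches the same way for this input.

```python
import re

SUBJECT = "xxxxxx"

# Replace every match of 'a*' with '!'. There is no 'a' anywhere,
# so every match is empty: one before each 'x' and one at the end.
result = re.sub(r"a*", "!", SUBJECT)

# Collect the matches themselves to count them.
matches = re.findall(r"a*", SUBJECT)

print(result)        # !x!x!x!x!x!x!
print(len(matches))  # 7
```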

        Another question is: Is there a work-around in Ntb for that issue?

        You could use 'a*a' instead of 'a*'. In this case, you are forcing the engine to find zero or more 'a' being followed by another 'a'. Thus the zero matches don't act as a "brake" any more, and the engine will immediately find and select 'aaa'.
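
A quick way to see the effect of this work-around is to compare both patterns on the same subject string. The sketch below uses Python's `re` module as an assumption-laden stand-in for the PCRE engine in NoteTab; the subject 'However...aaa' is the example used in this thread.

```python
import re

SUBJECT = "However...aaa"

# 'a*' alone is satisfied by an empty match at the very first position.
plain = re.search(r"a*", SUBJECT)

# 'a*a' needs at least one real 'a', so the search has to move on
# until it reaches the run of 'a' characters.
workaround = re.search(r"a*a", SUBJECT)

print(plain.span(), repr(plain.group()))            # (0, 0) ''
print(workaround.span(), repr(workaround.group()))  # (10, 13) 'aaa'
```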

        > Thanks for the heads up on DIRegEx. I will look at it. I assume
        > that is the one you use. Is it freeware or shareware?

        For a short time, it was available for betatesters but now the link doesn't work any more.

        > What other RegEx apps have you tried?

        So far, I don't know any app that fully supports PCRE -- except that workbench and Ntb itself. Sometimes helpful is http://weitz.de/regex-coach/ but it isn't updated to the latest version of PCRE and doesn't support some PCRE features either. I think that's the same problem with RegexBuddy which has often been recommended by Alec Burgess. Please correct me if I'm wrong.

        Regards,
        Flo

      • Axel Berger
        Message 3 of 11 , Sep 29, 2012
          "flo.gehrke" wrote:
          > You could use 'a*a' instead of 'a*'.

          I think in practice a star quantifier on its own is meaningless, there
          must be at least one other thing in the pattern. If you're interested in
          one character only, then the quantifier has to be at least "+". All
          Art's "a*x" examples worked fine and so do all cases where I use the "*"
          or "?" quantifier.
          What can the possible use be for something that matches anywhere in
          anything?

          Axel
        • flo.gehrke
          Message 4 of 11 , Sep 30, 2012
            --- In ntb-scripts@yahoogroups.com, Axel Berger <Axel-Berger@...> wrote:
            >
            > "flo.gehrke" wrote:
            > > You could use 'a*a' instead of 'a*'.
            > I think in practice a star quantifier on its own is
            > meaningless, there must be at least one other thing in the
            > pattern...What can the possible use be for something that
            > matches anywhere in anything?

            In this context, I understood that 'a' is just an element that, "in practice", would primarily represent an element in a more complex pattern. In this respect, I agree with the objection you made.

            In order to match just a sequence of literal 'a', a pattern like 'a*a' wouldn't make much sense, indeed. And 'a{1,}' or 'a+' would certainly be more appropriate solutions.

            But to prevent any misunderstanding among beginners, we should stress that something like 'a*' is by no means ALWAYS useless. Quite often, we have to express that an element 'a' may or may not be present.

            For example: (?<=<xxx>)\d*(?=</xxx>) matching the position between '>' and '<' in strings like...

            <xxx>12</xxx>
            <xxx></xxx>
            <xxx>9</xxx>

            no matter if there is a number or no number.
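
For readers who want to try this outside NoteTab, here is a minimal sketch in Python. It assumes Python's `re` module, which accepts this pattern because the lookbehind has a fixed width; it is not the PCRE engine NoteTab uses. The three test lines are the ones listed above.

```python
import re

# Digits (possibly none) enclosed by <xxx> and </xxx>; the tags
# themselves are only looked at, not consumed.
PATTERN = re.compile(r"(?<=<xxx>)\d*(?=</xxx>)")

LINES = ["<xxx>12</xxx>", "<xxx></xxx>", "<xxx>9</xxx>"]

results = []
for line in LINES:
    match = PATTERN.search(line)
    results.append((match.group(), match.span()))

for line, (text, span) in zip(LINES, results):
    print(line, "->", repr(text), span)
```

The second line yields an empty match at the position between '>' and '<', which is exactly the case where '\d*' earns its keep and '\d+' would find nothing.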

            Regards,
            Flo
          • John Shotsky
            Message 5 of 11 , Sep 30, 2012
              I agree, and use the star heavily in my clip libraries.

              Regards,
              John
              RecipeTools Web Site: http://recipetools.gotdns.com/
