
Re: [Clip] Re: \W and underscore

  • Don
    Message 1 of 15 , Mar 16 8:59 PM
      I think it should be as is. A _ is not a word boundary ... it is used
      to join the words.

      As Flo points out, they gave you a solution and as Axel points out there
      is another easy solution. If they happened to conclude that you were
      right, that would require all manner of recoding ... which is what you
      are disinclined to do here for your libraries apparently and yet the
      entire world would have to do so if your thought carries the day.

      I'd say it matters not what we think, because as Flo says, it has roots
      in Perl.


      On 3/16/2013 11:26 PM, John Shotsky wrote:
      > I would rather rename all underscores in the beginning to avoid this problem than have to convert my whole library to use that
      > nomenclature when all I want is for \w to work as it should. For example, I could convert the underscores to [_] (including the
      > brackets) and then all would work as expected.
      >
    • John Shotsky
      Message 2 of 15 , Mar 16 9:10 PM
        Yet it works with \b as a word boundary. If it is treated as a word boundary, it is NOT being treated as a letter or number in THAT
        case. That is, a \b detects that a word ends, but \w includes the [_]. I don't care about history; PCRE is already different than
        Perl. It is not selfish to think that \w, which is defined as all letters and numbers, should actually BE all numbers and letters
        AND NOT the underscore. Nowhere else, in all of PCRE (as far as I know) does a non-letter and non-number count as a letter or a
        number. That is just wrong.
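
The distinction argued over here can be checked with a short script. This is only a sketch: it uses Python's `re` module as a stand-in for the PCRE engine in NoteTab (an assumption, though both treat `\w` the same way for plain ASCII text), and the sample string is made up for illustration. The class `[^\W_]` is a common way to say "a word character that is not the underscore".

```python
import re

# Hypothetical sample text, not taken from the thread.
sample = "aaa_bbb ccc42"

# \w covers letters, digits, and the underscore, so "aaa_bbb" is one run.
word_runs = re.findall(r"\w+", sample)

# [^\W_] is "anything that is neither a non-word character nor an underscore",
# i.e. letters and digits only, so the run is split at the underscore.
alnum_runs = re.findall(r"[^\W_]+", sample)

print(word_runs)
print(alnum_runs)
```

With a class like this, existing underscores would not have to be renamed, and `\w` itself keeps its standard meaning.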

        Regards,
        John
        RecipeTools Web Site: http://recipetools.gotdns.com/
        John's Mags Yahoo Group: http://groups.yahoo.com/group/johnsmags/

      • flo.gehrke
        Message 3 of 15 , Mar 16 11:10 PM
          --- In ntb-clips@yahoogroups.com, "John Shotsky" <jshotsky@...> wrote:
          >
          > This also complicates the use of \b for word boundaries,
          > because \b DOES treat this character as a word boundary.

          This is misleading. A single character like the underscore can never represent a word boundary. '\b' is an assertion that matches at a position where a word character is preceded or followed by a non-word character (or by the start or end of the subject). Thus it signifies a position of zero length, not a character.

          As discussed here, the underscore is defined as a normal word character. So '\bJohn' doesn't match the string 'aaa _John', for example, because 'John' is not preceded by a word boundary in this case.
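
The '\bJohn' example can be reproduced as follows. This is a sketch in Python's `re` module rather than in NoteTab itself (an assumption, but `\b` behaves the same way on ASCII text in both). The third pattern is not from the thread: it is one possible way to get a boundary that does treat the underscore as a separator, using a lookbehind on `[^\W_]`.

```python
import re

pattern = re.compile(r"\bJohn")

# '_' and 'J' are both word characters, so there is no boundary between them.
with_underscore = pattern.search("aaa _John")

# A space is a non-word character, so a boundary exists before 'J'.
with_space = pattern.search("aaa John")

# Illustrative alternative: require that the preceding character is not
# a letter or digit, which lets an underscore count as a separator.
underscore_as_separator = re.search(r"(?<![^\W_])John", "aaa _John")

print(with_underscore)
print(with_space.group())
print(underscore_as_separator.group())
```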

          Flo
        • Axel Berger
          Message 4 of 15 , Mar 16 11:50 PM
            "flo.gehrke" wrote:
            > As discussed here, the underscore is defined as a normal word character.

            You're absolutely right. I had taken John by his word and not tested
            this.

            In the text

            aaabbbccc
            aaa bbbccc aaabbb ccc aaa bbb ccc
            aaa_bbbccc aaabbb_ccc aaa_bbb_ccc
            aaa _bbb ccc aaa bbb_ ccc aaa _bbb_ ccc
            aaa_ bbb_ccc aaa_bbb _ccc aaa_ bbb _ccc

            the pattern "\bbbb\b" (b was a bad letter choice in hindsight) matches
            the last string in the second and in the fifth line, nothing else.
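
For anyone who wants to repeat this test outside NoteTab, here is a rough sketch in Python's `re` module (an assumption: it is not the same engine, but `\b` and `\w` agree with PCRE on this ASCII sample). The variable names are made up for illustration, and the sample text assumes single spaces between the strings.

```python
import re

text = """aaabbbccc
aaa bbbccc aaabbb ccc aaa bbb ccc
aaa_bbbccc aaabbb_ccc aaa_bbb_ccc
aaa _bbb ccc aaa bbb_ ccc aaa _bbb_ ccc
aaa_ bbb_ccc aaa_bbb _ccc aaa_ bbb _ccc"""

# Collect the 1-based line number of every match of \bbbb\b.
matching_lines = []
for line_no, line in enumerate(text.splitlines(), start=1):
    for match in re.finditer(r"\bbbb\b", line):
        matching_lines.append(line_no)

print(matching_lines)
```

Only a "bbb" with non-word characters on both sides matches; every "bbb" that touches an underscore is skipped because the underscore is a word character.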

            Axel
          • John Shotsky
            Message 5 of 15 , Mar 17 4:21 AM
              You're right, I was not paying attention. It was selecting the last character, which was the underscore, and the boundary was at the
              following character. If you do your test with a space following the underscore, you will see what I mean.

              Regards,
              John
              RecipeTools Web Site: http://recipetools.gotdns.com/
              John's Mags Yahoo Group: http://groups.yahoo.com/group/johnsmags/
