
regular expression:word boundary

  • flashqa1
    Message 1 of 3 , May 30, 2006
      Hello everyone:

      I am having trouble understanding how the word boundary assertions \b and \B work.

      I searched the internet and found a good explanation of word boundaries:

      There are four different positions that qualify as word boundaries:

      a) Before the first character in the string, if the first
      character is a word character.
      b)After the last character in the string, if the last character is
      a word character.
      c)Between a word character and a non-word character following
      right after the word character.
      d)Between a non-word character and a word character following
      right after the non-word character.

      I understand a and b, but have trouble figuring out c and d.

      If you can give me an illustration of how cases c) and d) work, I will
      appreciate your help.
      ###############################################
      The example I use for case a and b is:

      1. "Charles the Brit raced his moped through the park."
      2. "The Park Ranger watched Charles do this."

      var reg3 = /\bt/; // " the" and " through" but not "Brit" or "watched"
      var reg4 = /\Bt/; // "Brit" or "watched" but not " the" or " through"

      \bt - the pattern matches a word boundary followed by the character 't'.
      "the" gets matched because there is a white space before the 't', so the
      position between them is a separation point between a non-word and a
      word character.
      "Brit" won't match because the 'i' that precedes the 't' is a word
      character, so there is no word boundary before the 't'.


      \Bt - the pattern matches a non-word-boundary followed by the character 't'.

      "the" won't match because the position between the white space and the
      't' is a word boundary, so it fails.

      "Brit" matches because 'i' is a word character, so the position before
      the 't' is not a word boundary.
    • Jonas Raoni
      Message 2 of 3 , May 30, 2006
        On 5/30/06, flashqa1 <flashqa1@...> wrote:
        > c)Between a word character and a non-word character following
        > right after the word character.
        > d)Between a non-word character and a word character following
        > right after the non-word character.

        c) alert("x+".match(/\w\b\W/));
        d) alert("+x".match(/\W\b\w/));


        --
        Jonas Raoni Soares Silva
        http://www.jsfromhell.com
      • liorean
        Message 3 of 3 , May 30, 2006
          On 30/05/06, flashqa1 <flashqa1@...> wrote:
          > Hello everyone:
          > There are four different positions that qualify as word boundaries:
          >
          > a) Before the first character in the string, if the first
          > character is a word character.
          > b)After the last character in the string, if the last character is
          > a word character.
          > c)Between a word character and a non-word character following
          > right after the word character.
          > d)Between a non-word character and a word character following
          > right after the non-word character.
          >
          > I understand first a and b, but have trouble to figure out C and D.

          Let's try explaining it:
          First, something about strings... You've been taught string characters
          are 0-indexed, right? The first character is character number 0?
          Forget that. The index doesn't represent a character - the index
          represents the place between characters. Index 0 is the place before
          any characters in the string. Index 1 is the place between the first
          character and the second, and so on. The last index is the same as the
          length of the string. This index is the place after the last character
          in the string.
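
To make the "index is the place between characters" model concrete, here is a minimal sketch. The helper name `gaps` is made up for this illustration and is not part of any library.

```javascript
// Hypothetical helper: describe every index of a string as the gap between
// the character on its left and the character on its right.
function gaps(str) {
  const out = [];
  for (let i = 0; i <= str.length; i++) {
    out.push({
      index: i,
      left: i > 0 ? str[i - 1] : null,       // null: nothing before index 0
      right: i < str.length ? str[i] : null  // null: nothing after the last index
    });
  }
  return out;
}

const testGaps = gaps("ab");
console.log(testGaps);
```

A two-character string has three such indexes: 0, 1 and 2, the last one being equal to the string's length.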

          "Why should I be thinking like this? I thought the old model worked
          just fine..."
          Well, the thing is, with regex you can match either a character before
          or after the index - or you can match the index. In other words you
          can match the place between characters. Start of string, end of
          string, start of line, end of line, word boundary. These are all
          examples of matching between characters.
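
A short sketch of what "matching the place between characters" looks like in practice: a \b match consumes no characters, so replacing every match only inserts text. The sample string is an arbitrary choice for this illustration.

```javascript
const text = "one two";

// With the g flag, replace() visits every word boundary. Each match is zero
// characters long, so "|" is inserted and nothing is removed.
const marked = text.replace(/\b/g, "|");
console.log(marked); // |one| |two|

// The first match is the empty string found at index 0.
const first = /\b/.exec(text);
console.log(first[0].length, first.index); // 0 0
```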

          So, we're matching the places between characters and not the actual
          characters. How does the matching work? Well it's a boolean test
          really:

          isWordBoundary(index) = isWordCharacter(characterLeftOf(index))
          XOR isWordCharacter(characterRightOf(index))

          Which translated to English would read: if, and only if, exactly one
          of the character to the left of the index and the character to the
          right of the index is a word character, this index is a word boundary.
          If both or neither are word characters, this index is not a word
          boundary.
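
The XOR rule can be sketched in JavaScript and cross-checked against the regex engine. The function names follow the pseudocode above and are not built-ins; `engineSaysBoundary` is a hypothetical helper that relies on the sticky (y) flag, which is assumed to be available (ES2015 or later).

```javascript
// A missing character (before index 0 or after the last index) counts as a
// non-word character.
function isWordCharacter(ch) {
  return ch !== undefined && /^\w$/.test(ch);
}

function isWordBoundary(str, index) {
  const left = isWordCharacter(str[index - 1]);
  const right = isWordCharacter(str[index]);
  return left !== right; // XOR on two booleans
}

// Ask the engine whether \b matches exactly at the given index.
function engineSaysBoundary(str, index) {
  const re = /\b/y;
  re.lastIndex = index;
  return re.test(str);
}

const sample = "x+ +x";
const agree = [];
for (let i = 0; i <= sample.length; i++) {
  agree.push(isWordBoundary(sample, i) === engineSaysBoundary(sample, i));
}
console.log(agree.every(Boolean)); // true
```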

          And a results table for that would look like this:

                              Left is \w    Left isn't \w
                              ----------    -------------
          Right is \w:        \B            \b
          Right isn't \w:     \b            \B

          > If you can give me a illustration how case c) and d) works I will
          > appreciate your help.
          > ###############################################
          > The example I use for case a and b is:

          These examples actually show case d rather than a or b: the boundary
          before each matched 't' lies between a space and a word character.
          Cases a and b only occur if the character you're looking at is the
          first or last character of the string, respectively.

          > 1. "Charles the Brit raced his moped through the park."
          > 2. "The Park Ranger watched Charles do this."
          >
          > var reg3 = /\bt/; // " the" and " through" but not "Brit" or "watched"

          /\bt/ = <A word boundary followed by a 't' character>
          Since 't' is a word character you can translate this into:
          /\bt/ = <A 't' character which directly follows a non-word character>

          Space is non-word character, thus the 't' characters in "the" and
          "through" both match. 'i' and 'a' are word characters though, thus the
          't' characters in "Brit" and "watched" don't match.

          > var reg4 = /\Bt/; // "Brit" or "watched" but not " the" or " through"

          /\Bt/ = <A non-word-boundary followed by a 't' character>
          Since 't' is a word character you can translate this into:
          /\Bt/ = <A 't' character which directly follows a word character>

          'i' and 'a' are word characters, thus the 't' characters in "Brit" and
          "watched" match. Space is a non-word character though, thus the 't'
          characters in "the" and "through" don't match.
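
Both patterns can be checked against the first sentence by collecting the index of every match. This is only a sketch; `matchIndexes` is a hypothetical helper written for this illustration.

```javascript
const sentence = "Charles the Brit raced his moped through the park.";

// Hypothetical helper: return the index of every match of a global regex.
function matchIndexes(re, str) {
  const found = [];
  let m;
  while ((m = re.exec(str)) !== null) {
    found.push(m.index);
  }
  return found;
}

const atWordStart = matchIndexes(/\bt/g, sentence); // 't' preceded by a non-word character
const insideWord = matchIndexes(/\Bt/g, sentence);  // 't' preceded by a word character

console.log(atWordStart); // [ 8, 33, 41 ]  "the", "through", "the"
console.log(insideWord);  // [ 15 ]         the 't' in "Brit"
```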


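          Cases a and b:
          var
              str='test',
              re0=/\bt/,  // 't' right after a word boundary
              re1=/\Bt/,  // 't' right after a non-word-boundary
              re2=/t\b/,  // 't' right before a word boundary
              re3=/t\B/;  // 't' right before a non-word-boundary

          // Show every enumerable property of an exec() result, one per line.
          // Newer engines may list additional entries such as groups.
          function report(arr){
              var a=[],i;
              for(i in arr)
                  a.push(i+': '+arr[i]);
              alert(a.join('\n'));
          }

          report(re0.exec(str)); // => 0:'t' index:0 input:'test'
          report(re1.exec(str)); // => 0:'t' index:3 input:'test'
          report(re2.exec(str)); // => 0:'t' index:3 input:'test'
          report(re3.exec(str)); // => 0:'t' index:0 input:'test'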

          Here you see that start of string and end of string are entered in the
          results table I listed above as "isn't \w". Since 't' is a word
          character, /\bt/ and /t\b/ match where the 't' is preceded by
          start of string or followed by end of string respectively. /\Bt/ and
          /t\B/ are the opposite and don't match where the 't' is preceded by
          start of string or followed by end of string respectively.
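
The "if the first/last character is a word character" condition in cases a and b can be checked directly. In this sketch, `^\b` asks whether index 0 is a word boundary and `\b$` asks the same about the last index; the sample strings are arbitrary.

```javascript
const startWord = /^\b/.test("test");     // true: start of string, then the word character 't'
const startNonWord = /^\b/.test("+test"); // false: start of string, then the non-word character '+'
const endWord = /\b$/.test("test");       // true: the word character 't', then end of string
const endNonWord = /\b$/.test("test!");   // false: the non-word character '!', then end of string
const emptyString = /\b/.test("");        // false: no word character on either side

console.log(startWord, startNonWord, endWord, endNonWord, emptyString);
```
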
          --
          David "liorean" Andersson
          <uri:http://liorean.web-graphics.com/>