
RE: [NTB] Regular Expression: find between two words

  • John Shotsky
    Message 1 of 15, Feb 23, 2012
      I don't think that \R will work properly in a negated class in all cases. It is a character class, and \R can include
      more than one character. I have had a number of errors occur because of this. It is better to use \r\n, as neither of
      those characters will then be included. If you can't duplicate it, just trust me: under certain circumstances, you can
      end up with single character line ends, when you expect the Windows standard, of \r\n.
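
For anyone who wants to see the effect outside NoteTab, here is a minimal Python sketch. It is only an illustration under assumptions: Python's standard `re` module has no `\R` at all, so it can only show the `[^\r\n]` side of the advice, and the sample string with mixed line endings is made up.

```python
import re

# Hypothetical sample with mixed line endings:
# Windows (\r\n), Unix (\n) and old Mac (\r).
sample = "first\r\nsecond\nthird\rfourth"

# A negated class that lists \r and \n separately stops at either
# character, so neither one ends up inside a match.
lines = re.findall(r"[^\r\n]+", sample)
print(lines)

# For contrast: excluding only \n leaves a stray \r in the first
# match, and the lone \r no longer splits "third" from "fourth".
stray = re.findall(r"[^\n]+", sample)
print(stray)
```

The first list contains four clean lines; the second shows how a single-character exclusion can leave half of a `\r\n` pair in the match.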

      As this is actually a clip issue, I copied the clip group, where any further discussion of this functionality should be
      continued, for the benefit of other clippers.

      Regards,
      John
      RecipeTools Web Site: http://recipetools.gotdns.com/

      From: notetab@yahoogroups.com [mailto:notetab@yahoogroups.com] On Behalf Of Eb
      Sent: Thursday, February 23, 2012 11:21
      To: notetab@yahoogroups.com
      Subject: Re: [NTB] Regular Expression: find between two words


      James,

      I think you probably just want the text between the two words. That would require a search phrase like:

      "(?<=Word1 )[^\R]*(?= Word2)"

      In the search menu, you will not need the quotes.

      This expression is one of the most useful ways to use regular expressions, so I added a bit of explanation below (note
      the order of the explanations is different from the order of the expressions):

      1. "(?<=Word1 )" finds this word (and a following space) as a LEFT anchor (<=) to the desired text, but does not include
      it in the match.

      3. "(?= Word2)" is the RIGHT anchor to the target text, and will not be included. See NoteTab's Help on regular
      expressions, and look for "Assertions".

      2. "[^\R]*" between the two anchors matches any run of characters other than a newline sequence. Without the newline
      exclusion, the search may find Word2 on some other line, and include everything up to THAT Word2, instead of the one on
      the line you're interested in.
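
The three parts can be tried out with a short Python sketch. This is an illustration, not NoteTab itself: Python's `re` does not support `\R`, so the class is written as `[^\r\n]`, and the two-line sample text is made up.

```python
import re

# Hypothetical sample: both lines contain Word2, only the first has Word1.
text = "Word1 some text or numbers-2 here Word2\r\nanother line Word2"

# (?<=Word1 )  left anchor, required but not part of the match
# [^\r\n]*     any run of characters that stays on the current line
# (?= Word2)   right anchor, required but not part of the match
same_line = re.search(r"(?<=Word1 )[^\r\n]*(?= Word2)", text)
between = same_line.group(0) if same_line else None
print(repr(between))

# Without the newline exclusion ([\s\S] matches anything, newlines
# included) the match runs on to the Word2 on the second line.
any_line = re.search(r"(?<=Word1 )[\s\S]*(?= Word2)", text)
overrun = any_line.group(0) if any_line else None
print(repr(overrun))
```

The first search returns only the text between the two words on the first line; the second one spills over the line break.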

      If you're going to learn anything about regular expressions, THIS is the one I would recommend.

      Cheers,

      Eb

      --- In notetab@yahoogroups.com, Don <don@...> wrote:
      >
      > Hit find and then tick the regular expression box and then search for this:
      > Word1 (.*) Word2
      >
      > That will find it and assign it to a temporary spot where if you were
      > using a replace you could insert it with $1.
      >
      > On 2/23/2012 12:42 PM, book7reader wrote:
      > > Hello,
      > >
      > > I want to find any text or numbers that exist between two words (the two words are always present in the line, but
      > > what is between them changes). For example, in the line...:
      > >
      > > "Word1 some text or numbers-2 here Word2"
      > >
      > > ...I want to find:
      > >
      > > " some text or numbers-2 here " portion of the text based on the "Word1" and "Word2" delimiters.
      > >
      > > Thanks for your help,
      > >
      > > James



    • Eb
      Message 2 of 15, Feb 23, 2012
        James,

        CORRECTION to the regular expression I posted. John Shotsky has rightly corrected my previous post.

        Use

        "(?<=Word1 )[^\r\n]*(?= Word2)"

        as the search pattern described earlier.

        Eb
      • Don
        Message 3 of 15, Feb 23, 2012
          By the way, I believe using .* would be greedy. Is yours greedy? Or, if
          Word2 also appeared inside the text to be captured, would the match stop prematurely?

          On 2/23/2012 3:09 PM, Eb wrote:
          > James,
          >
          > CORRECTION to the regular expression I posted. John Shotzky has correctly corrected my previous post.
          >
          > Use
          >
          > "(?<=Word1 )[^\r\n]*(?= Word2)"
          >
          > as the search pattern described earlier.
          >
          > Eb
        • John Wallace
          Message 4 of 15, Feb 23, 2012
            What would it be if you wanted Word1 to the 1st occurrence of Word2?

            It seems to go to the last occurrence of Word2 now.

            John

            _____

            From: notetab@yahoogroups.com [mailto:notetab@yahoogroups.com] On Behalf Of Eb
            Sent: Thursday, February 23, 2012 3:10 PM
            To: notetab@yahoogroups.com
            Subject: Re: [NTB] Regular Expression: find between two words




            James,

            CORRECTION to the regular expression I posted. John Shotzky has correctly corrected my previous post.

            Use

            "(?<=Word1 )[^\r\n]*(?= Word2)"

            as the search pattern described earlier.

            Eb






          • book7reader
            Message 5 of 15, Feb 24, 2012
              ...corrected my previous post.
              >
              > Use
              >
              > "(?<=Word1 )[^\r\n]*(?= Word2)"
              >
              > as the search pattern described earlier.

              If this is the actual target text:

              word1  some text here word2

              And I paste in:

              (?<=word1 )[^\r\n]*(?= word2)

              ...and click the Notetabpro Regular Expression check box in Find,

              It does not find " some text here "

              ?????

              James
            • Eb
              Message 6 of 15, Feb 25, 2012
                Ahhh!

                The two (or more) spaces pose a problem. Many computer algorithms interpret multiple spaces as one. NoteTab is confused here. Part of the regex engine sees the first space individually, but another part interprets the two spaces as one. Since that second part does not have a space left in the target string, it cannot match the string.


                The easy solution is to include one or more spaces in your match.

                If you're ok with including the spaces in the matched pattern,
                i.e. " some text here ", just eliminate the space from the assertions (anchors), and use "(?<=word1)[^\r\n]*(?=word2)". Or "(?<=word1).*(?=word2)"

                Excluding a variable amount of whitespace from the match is very tricky without replacing the original text. It is better to include 1 space in each assertion, and to include any remaining spaces in the matched pattern.

                i.e. "(?<=word1 ) *[^\r\n]* *(?= word2)"
                The space followed by an asterisk will allow for zero or more spaces.


                Eb


                --- In notetab@yahoogroups.com, "book7reader" <jim@...> wrote:
                >
                > If this is the actual target text:
                >
                > word1 some text here word2
                >
                > And I paste in:
                >
                > (?<=word1 )[^\r\n]*(?= word2)
                >
                > ...and click the Notetabpro Regular Expression check box in Find,
                >
                > It does not find " some text here "
                >
                > ?????
                >
                > James
                >
              • Eb
                Message 7 of 15, Feb 25, 2012
                  John,

                  Finding the last occurrence is the default with regular expressions, because quantifiers are greedy. If that is undesirable, follow the quantifier '*' with a question mark, i.e. "*?". Then it will stop at the first occurrence.

                  If a match can occur multiple times on the same line / paragraph, and you want to restrict the match, change the pattern

                  from "(?<=Word1 )[^\r\n]*(?= Word2)"
                  to "(?<=Word1 ).*?(?= Word2)"
                  or "(?<=Word1 ) *.*? *(?= Word2)"
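
The difference between the greedy and the lazy quantifier can be checked with a small Python sketch. The sample line, with Word2 occurring twice, is made up for the illustration, and Python's `re` stands in for NoteTab's engine here.

```python
import re

# Hypothetical line in which Word2 occurs twice.
text = "Word1 alpha Word2 beta Word2"

# Greedy: * takes as much as it can, so the match ends at the LAST " Word2".
greedy = re.search(r"(?<=Word1 )[^\r\n]*(?= Word2)", text).group(0)

# Lazy: *? takes as little as it can, so the match ends at the FIRST " Word2".
lazy = re.search(r"(?<=Word1 ).*?(?= Word2)", text).group(0)

print(repr(greedy))
print(repr(lazy))
```

Note that `.` does not match a newline unless the engine is told otherwise, so the lazy pattern still stays on one line.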


                  Eb


                  --- In notetab@yahoogroups.com, "John Wallace" <johnta1@...> wrote:
                  >
                  > What would it be if you wanted Word1 to the 1st occurance of the Word2?
                  >
                  > It seems to go to the last occurrance of Word2 now.
                  >
                  > John
                • flo.gehrke
                  Message 8 of 15, Feb 25, 2012
                    --- In notetab@yahoogroups.com, "Eb" <ebbtidalflats@...> wrote:
                    >
                    > Ahhh!
                    >
                    > The two (or more) spaces pose a problem. Many computer algorithms
                    > interpret multiple spaces as one. NoteTab is confused here. Part
                    > of the regex engine sees the first space individually, but another
                    > part interprets the two spaces as one. Since that second part does
                    > not have a space left in the target string, it cannot match the
                    > string.

                    There is no "interpreting the two spaces as one" here. NT is exactly following PCRE rules: The Lookbehind is matching the first space after 'word1', and the Character Class '[^\r\n]' matches the second space following 'word1'.

                    As you wrote, you just have to remove the spaces from both Assertions in order to include all spaces into the match.
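
A minimal Python sketch of this reading, assuming a target line with two spaces after 'word1' and one space before 'word2'. It shows how a PCRE-style engine divides up the spaces. It says nothing about what a particular NoteTab build does.

```python
import re

# Assumed target: two spaces after word1, one space before word2.
text = "word1  some text here word2"

# Spaces inside the assertions: the lookbehind accounts for the first
# space, the character class then matches the second one.
with_spaces = re.search(r"(?<=word1 )[^\r\n]*(?= word2)", text)
start = with_spaces.start()
inner = with_spaces.group(0)
print(start, repr(inner))

# Spaces removed from both assertions: every space between the two
# words becomes part of the match.
without_spaces = re.search(r"(?<=word1)[^\r\n]*(?=word2)", text)
outer = without_spaces.group(0)
print(repr(outer))
```

With the spaces in the assertions, the match begins at the second space and ends before the space in front of 'word2'.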

                    Regards,
                    Flo





                    >
                    >
                    > The easy solution is to include one or more spaces in your match.
                    >
                    > If you're ok with including the space in the matches pattern,
                    > i.e. " some text here ", just eliminate the space from the assertion (anchors), and use "(?<=word1)[^\r\n]*(?=word2)". Or "(?<=word1).*(?=word2)"
                    >
                    > To eliminate a variable size of white space from the match is very tricky without replacing the original text. Better would be to include 1 space in the assertion, and include any remaining spaces in the matched pattern.
                    >
                    > i.e. "(?<=word1 ) *[^\r\n]* *(?= word2)"
                    > The space followed by an asterisk will allow for zero or more spaces.
                    >
                    >
                    > Eb
                    >
                    >
                    > --- In notetab@yahoogroups.com, "book7reader" <jim@> wrote:
                    > >
                    > > If this is the actual target text:
                    > >
                    > > word1 some text here word2
                    > >
                    > > And I paste in:
                    > >
                    > > (?<=word1 )[^\r\n]*(?= word2)
                    > >
                    > > ...and click the Notetabpro Regular Expression check box in Find,
                    > >
                    > > It does not find " some text here "
                    > >
                    > > ?????
                    > >
                    > > James
                    > >
                    >
                  • Eb
                    Message 9 of 15, Feb 29, 2012
                      Flo,

                      James reports that the expression

                      "(?<=word1 )[^\r\n]*(?= word2)"

                      fails to find "  some text here "
                      (containing a double space on the left).

                      Tests on my PC confirm this bug.

                      Are you saying YOUR NoteTab succeeds in finding the pattern?

                      That would imply that the bug is an INCONSISTENCY between systems rather than a fault in regular expression handling.

                      On TWO of _my_ systems, NoteTab FAILS to find the double space pattern. This implies that somewhere between finding the first space in the assertion, and the second space, NoteTab loses track of spaces.



                      Eb


                      --- In notetab@yahoogroups.com, "flo.gehrke" <flo.gehrke@...> wrote:
                      >
                      > --- In notetab@yahoogroups.com, "Eb" <ebbtidalflats@> wrote:
                      > > ... NoteTab is confused here.
                      >
                      > There is no "interpreting the two spaces as one" here. NT is exactly following PCRE rules: The Lookbehind is matching the first space after 'word1', and the Character Class '[^\r\n]' matches the second space following 'word1'.


                      > > --- In notetab@yahoogroups.com, "book7reader" <jim@> wrote:
                      > > >
                      > > > If this is the actual target text:
                      > > >
                      > > > word1 some text here word2

                      > > > (?<=word1 )[^\r\n]*(?= word2)


                      > It does not find " some text here "
                    • John Shotsky
                      Message 10 of 15, Feb 29, 2012
                        I'm probably getting into this late, but I'm not understanding the problem exactly. I would implement this slightly
                        differently though.

                        If these are the strings: (Notice the extra spaces in the top one)
                        word1 some text here word2
                        word1 some text here word2

                        I would use this:
                        ^!Replace "\bword1\b\K[^\r\n]*(?=\x20word2\b)" >> "" AIRSW
                        Which removes everything between word1 and [sp]word2, leaving this:
                        word1 word2

                        The \b identifies word boundaries, preventing partial matches inside larger words. Everything preceding \K is kept.
                        Everything between that point and [sp]word2 is removed as unwanted. Not sure why an assertion would be used on the
                        first part. Is there a reason for doing it that way?
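
For readers without NoteTab, the same idea can be sketched in Python. This is an approximation under assumptions: Python's standard `re` has no `\K`, so a capture group that is put back with `\1` stands in for it. The sample lines are made up, and the clip's AIRSW options are not reproduced.

```python
import re

# Hypothetical sample lines; the first has extra spaces.
lines = [
    "word1   some text here   word2",
    "word1 some text here word2",
]

# (\bword1\b)        captured and restored, standing in for \K
# [^\r\n]*           the unwanted text, limited to the current line
# (?=\x20word2\b)    stop right before the last space in front of word2
pattern = re.compile(r"(\bword1\b)[^\r\n]*(?=\x20word2\b)")

cleaned = [pattern.sub(r"\1", line) for line in lines]
print(cleaned)
```

Both lines collapse to the two words separated by the single space that the lookahead leaves in place.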

                        Regards,
                        John
                        RecipeTools Web Site: http://recipetools.gotdns.com/

                        From: notetab@yahoogroups.com [mailto:notetab@yahoogroups.com] On Behalf Of Eb
                        Sent: Wednesday, February 29, 2012 05:17
                        To: notetab@yahoogroups.com
                        Subject: Re: [NTB] Regular Expression: find between two words

                         
                        Flo,

                        James reports, that the expression

                        "(?<=word1 )[^\r\n]*(?= word2)"

                        fails to find " some text here "
                        (containing a double space on the left.

                        Tests on my PC confirm this bug.

                        Are you saying, YOUR NoteTab succeeds in finding the pattern?

                        That would imply the bug being INCONSISTENCY rather than regular expression handling.

                        On TWO of _my_ systems, NoteTab FAILS to find the double space pattern. This implies, that somewhere between finding the
                        first space in the assertion, and the second space, NoteTab loses track of spaces.

                        Eb

                        --- In notetab@yahoogroups.com, "flo.gehrke" <flo.gehrke@...> wrote:
                        >
                        > --- In notetab@yahoogroups.com, "Eb" <ebbtidalflats@> wrote:
                        > > ... NoteTab is confused here.
                        >
                        > There is no "interpreting the two spaces as one" here. NT is exactly following PCRE rules: The Lookbehind is matching
                        > the first space after 'word1', and the Character Class '[^\r\n]' matches the second space following 'word1'.

                        > > --- In notetab@yahoogroups.com, "book7reader" <jim@> wrote:
                        > > >
                        > > > If this is the actual target text:
                        > > >
                        > > > word1 some text here word2

                        > > > (?<=word1 )[^\r\n]*(?= word2)

                        > It does not find " some text here "
                      • Don
                        Message 11 of 15, Feb 29, 2012
                          I'm not sure that s/he was looking for a clip and I'm not sure that s/he
                          is even a member of clips. This Thread is a bit advanced for the
                          general group and has been going both places ... I apologize for not
                          knowing where to send it.

                          You suggest this:
                          (?=\x20word2\b)

                          I think that, in addition to \b, you need to provide for end of file or a
                          return right after word2? Or are those covered by \b? The help entry for \b
                          simply says: "word boundary (only ASCII letters recognized)."
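
A quick way to probe this question is a Python sketch like the one below. It is only an approximation of what NoteTab does: Python's `\b` is Unicode-aware by default, so the `re.ASCII` flag is used to get closer to the ASCII-only behaviour the help text describes, and the sample strings are made up.

```python
import re

# The lookahead from the suggestion above, tested on its own.
lookahead = re.compile(r"(?=\x20word2\b)", re.ASCII)

# Hypothetical samples: word2 at end of text, before a line break,
# and as the start of a longer word.
samples = {
    "end of text": "some text word2",
    "before CRLF": "some text word2\r\nnext line",
    "longer word": "some text word2abc",
}

# True where \b is satisfied right after word2.
results = {name: bool(lookahead.search(s)) for name, s in samples.items()}
print(results)
```

In this sketch `\b` succeeds both at the end of the text and in front of a line break, since neither is a word character, and fails only when more word characters follow.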

                          I learn a lot in these discussions so thanks to all participating.
                        • flo.gehrke
                          Message 12 of 15, Feb 29, 2012
                            --- In notetab@yahoogroups.com, "Eb" <ebbtidalflats@...> wrote:
                            >
                            > Flo,
                            >
                            > James reports, that the expression
                            >
                            > "(?<=word1 )[^\r\n]*(?= word2)"
                            >
                            > fails to find " some text here "
                            > (containing a double space on the left.
                            >
                            > Tests on my PC confirm this bug.
                            >
                            > Are you saying, YOUR NoteTab succeeds in finding the pattern?

                            Eb,

                            I take for granted that 'word1' and 'word2' are part of the string; without them the RegEx makes no sense. Although a Lookaround doesn't consume any characters, those substrings must be there, of course.

                            So we have to test the RegEx against...

                            'word1  some text here word2'

                            (two spaces after 'word1', one space preceding 'word2').

                            As we've said before, after removing the spaces from the Assertions all characters between 'word1' and 'word2', including all spaces, are matched with '(?<=word1)[^\r\n]*(?=word2)'. This is - if I didn't misunderstand him - what James aimed at, isn't it?

                            For me, this works in NT 6.2 Pro / Win XP.

                            Regards,
                            Flo