
Re: [NTB] Regular Expression: find between two words

  • Don
    Message 1 of 15, Feb 23, 2012
      Hit find and then tick the regular expression box and then search for this:
      Word1 (.*) Word2

      That will find it and assign it to a temporary spot where if you were
      using a replace you could insert it with $1.
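For readers who want to try the capture-group idea outside NoteTab, here is a minimal sketch in Python. It assumes Python's built-in `re` module, which is not the engine NoteTab uses, and the sample line is taken from the question quoted below. Python writes the first group as `\1` in a replacement where the message above uses `$1`.

```python
import re

line = "Word1 some text or numbers-2 here Word2"

# The parentheses capture whatever sits between the two delimiter words.
pattern = re.compile(r"Word1 (.*) Word2")

match = pattern.search(line)
captured = match.group(1) if match else None

# Re-insert the captured text in a replacement; the square brackets are
# only there to make the inserted group visible.
swapped = pattern.sub(r"Word1 [\1] Word2", line)

print(captured)
print(swapped)
```

Note that the delimiters themselves are part of the overall match here; only group 1 holds the text in between.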

      On 2/23/2012 12:42 PM, book7reader wrote:
      > Hello,
      >
      > I want to find any text or numbers that exist between two words (the two words are always present in the line but what is between the changes.) For example in the line...:
      >
      > "Word1 some text or numbers-2 here Word2"
      >
      > ...I want to find:
      >
      > " some text or numbers-2 here " portion of the text based on the "Word1" and "Word2" delimiters.
      >
      > Thanks for your help,
      >
      > James
    • Eb
      Message 2 of 15, Feb 23, 2012
        James,

        I think you probably just want the text between the two words. That would require a search phrase like:

        "(?<=Word1 )[^\R]*(?= Word2)"

        In the search menu, you will not need the quotes.

        This expression is one of the most useful ways to use regular expressions, so I added a bit of explanation below (note the order of the explanations is different from the order of the expressions):

        1. "(?<=Word1 )" finds this word (and a following space) as a LEFT anchor (<=) to the desired text, but does not include it in the match.

        3. "(?= Word2)" is the RIGHT anchor to the target text, and will not be included. See NoteTab's Help on regular expressions, and look for "Assertions".

        2. "[^\R]*" between the two anchors searches for any and all characters other than a newline sequence. Without the newline exclusion, the search may find Word2 on some other line, and include everything up to THAT Word2, instead of the one on the line you're interested in.
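The three parts described above can be tried in a small Python sketch. This assumes Python's `re` module rather than NoteTab's engine, uses `[^\r\n]` for the middle part because Python has no `\R`, and uses made-up sample text. The second pattern uses `re.DOTALL` purely to reproduce the "wrong Word2 on another line" problem that the newline exclusion prevents.

```python
import re

text = "Word1 first value Word2\nWord1 second value Word2\n"

# Lookbehind and lookahead assert the delimiters without consuming them,
# so only the text in between ends up in the match.
same_line = re.compile(r"(?<=Word1 )[^\r\n]*(?= Word2)")

# With re.DOTALL the dot crosses newlines, so a greedy match runs on
# to the Word2 of a later line.
cross_line = re.compile(r"(?<=Word1 ).*(?= Word2)", re.DOTALL)

per_line = same_line.findall(text)
runaway = cross_line.search(text).group(0)

print(per_line)
print(repr(runaway))
```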


        If you're going to learn anything about regular expressions, THIS is the one I would recommend.


        Cheers,


        Eb

      • John Shotsky
        Message 3 of 15, Feb 23, 2012
          I don't think that \R will work properly in a negated class in all cases. A character class matches one character at a
          time, and \R can match more than one character. I have had a number of errors occur because of this. It is better to use
          \r\n in the class, as neither of those characters will then be included. If you can't duplicate it, just trust me: under
          certain circumstances, you can end up with single-character line ends when you expect the Windows standard of \r\n.

          As this is actually a clip issue, I copied the clip group, where any further discussion of this functionality should be
          continued, for the benefit of other clippers.

          Regards,
          John
          RecipeTools Web Site: http://recipetools.gotdns.com/

        • Eb
          Message 4 of 15, Feb 23, 2012
            James,

            CORRECTION to the regular expression I posted. John Shotsky was right to correct my previous post.

            Use

            "(?<=Word1 )[^\r\n]*(?= Word2)"

            as the search pattern described earlier.

            Eb
          • Don
            Message 5 of 15, Feb 23, 2012
              I believe .* is greedy, by the way. Is yours greedy too? Or, if Word2
              also appeared inside the text to be captured, would the match stop prematurely?

            • John Wallace
              Message 6 of 15, Feb 23, 2012
                What would it be if you wanted Word1 to the first occurrence of Word2?

                It seems to go to the last occurrence of Word2 now.

                John

              • book7reader
                Message 7 of 15, Feb 24, 2012
                  ...corrected my previous post.
                  >
                  > Use
                  >
                  > "(?<=Word1 )[^\r\n]*(?= Word2)"
                  >
                  > as the search pattern described earlier.

                  If this is the actual target text:

                  word1  some text here word2

                  And I paste in:

                  (?<=word1 )[^\r\n]*(?= word2)

                  ...and click the NoteTab Pro Regular Expression check box in Find,

                  It does not find " some text here "

                  ?????

                  James
                • Eb
                  Message 8 of 15, Feb 25, 2012
                    Ahhh!

                    The two (or more) spaces pose a problem. Many computer algorithms interpret multiple spaces as one. NoteTab is confused here. Part of the regex engine sees the first space individually, but another part interprets the two spaces as one. Since that second part does not have a space left in the target string, it cannot match the string.


                    The easy solution is to include one or more spaces in your match.

                    If you're OK with including the spaces in the matched pattern,
                    i.e. " some text here ", just remove the spaces from the assertions (anchors) and use "(?<=word1)[^\r\n]*(?=word2)" or "(?<=word1).*(?=word2)".

                    Eliminating a variable amount of white space from the match is very tricky without replacing the original text. It would be better to include one space in each assertion and let any remaining spaces fall into the matched pattern,

                    i.e. "(?<=word1 ) *[^\r\n]* *(?= word2)"
                    The space followed by an asterisk allows for zero or more spaces.


                    Eb


                  • Eb
                    Message 9 of 15, Feb 25, 2012
                      John,

                      Finding the last occurrence is the default with regular expressions, because quantifiers such as '*' are greedy and take as much text as they can. If that is undesirable, follow the quantifier '*' with a question mark, i.e. "*?". Then it will stop at the first occurrence.

                      If a match can occur multiple times on the same line / paragraph, and you want to restrict the match, change the pattern

                      from "(?<=Word1 )[^\r\n]*(?= Word2)"
                      to "(?<=Word1 ).*?(?= Word2)"
                      or "(?<=Word1 ) *.*? *(?= Word2)"
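The difference between the greedy and the lazy form can be seen in a small Python sketch. It assumes Python's `re` module rather than NoteTab's engine, and the sample line with two occurrences of Word2 is invented for illustration.

```python
import re

# One Word1 followed by two occurrences of Word2 on the same line.
line = "Word1 one Word2 two Word2"

# Greedy: the class takes as much as possible, so the match runs
# up to the LAST Word2 on the line.
greedy = re.search(r"(?<=Word1 )[^\r\n]*(?= Word2)", line).group(0)

# Lazy: ".*?" takes as little as possible, so the match stops
# before the FIRST Word2.
lazy = re.search(r"(?<=Word1 ).*?(?= Word2)", line).group(0)

print(greedy)
print(lazy)
```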


                      Eb


                    • flo.gehrke
                      Message 10 of 15, Feb 25, 2012
                        --- In notetab@yahoogroups.com, "Eb" <ebbtidalflats@...> wrote:
                        >
                        > Ahhh!
                        >
                        > The two (or more) spaces pose a problem. Many computer algorithms
                        > interpret multiple spaces as one. NoteTab is confused here. Part
                        > of the regex engine sees the first space individually, but another
                        > part interprets the two spaces as one. Since that second part does
                        > not have a space left in the target string, it cannot match the
                        > string.

                        There is no "interpreting the two spaces as one" here. NT is exactly following PCRE rules: The Lookbehind is matching the first space after 'word1', and the Character Class '[^\r\n]' matches the second space following 'word1'.

                        As you wrote, you just have to remove the spaces from both Assertions in order to include all spaces in the match.
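The behaviour described here can be checked in a minimal Python sketch. It assumes Python's `re` module, which handles lookarounds in the same Perl-style way but is not NoteTab's engine, so it says nothing about what a particular NoteTab installation does. The test line has two spaces after 'word1' and one space before 'word2'.

```python
import re

# Two spaces after word1, one space before word2.
line = "word1  some text here word2"

# Spaces inside the assertions: the lookbehind covers the first space,
# so the second space becomes the first character of the match.
with_spaces = re.search(r"(?<=word1 )[^\r\n]*(?= word2)", line)

# No spaces inside the assertions: every space between the two words
# ends up inside the match.
without_spaces = re.search(r"(?<=word1)[^\r\n]*(?=word2)", line)

kept_one_space = with_spaces.group(0)
kept_all_spaces = without_spaces.group(0)

print(repr(kept_one_space))
print(repr(kept_all_spaces))
```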

                        Regards,
                        Flo





                      • Eb
                        Message 11 of 15, Feb 29, 2012
                          Flo,

                          James reports that the expression

                          "(?<=word1 )[^\r\n]*(?= word2)"

                          fails to find " some text here "
                          (containing a double space on the left).

                          Tests on my PC confirm this bug.

                          Are you saying that YOUR NoteTab succeeds in finding the pattern?

                          That would imply that the bug is one of INCONSISTENCY rather than of regular expression handling.

                          On TWO of _my_ systems, NoteTab FAILS to find the double-space pattern. This implies that somewhere between finding the first space in the assertion and the second space, NoteTab loses track of spaces.



                          Eb


                        • John Shotsky
                          Message 12 of 15, Feb 29, 2012
                            I'm probably getting into this late, but I'm not understanding the problem exactly. I would implement this slightly
                            differently though.

                            If these are the strings: (Notice the extra spaces in the top one)
                            word1 some text here word2
                            word1 some text here word2

                            I would use this:
                            ^!Replace "\bword1\b\K[^\r\n]*(?=\x20word2\b)" >> "" AIRSW
                            Which removes everything between word1 and [sp]word2, leaving this:
                            word1 word2

                            The \b identifies word breaks, preventing partial matches with larger words. Everything before \K is kept, because
                            \K resets the start of the match. Everything between that point and [sp]word2 is removed as unwanted. Not sure why
                            an assertion would be used on the first part. Is there a reason for doing it that way?
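For comparison outside NoteTab, here is a rough Python equivalent of the replace above. Python's `re` module does not support \K, so this sketch swaps in a different technique: it captures the kept prefix in a group and puts it back with `\1`. The number of extra spaces in the first sample line is an assumption made for illustration.

```python
import re

lines = [
    "word1   some   text here word2",  # extra spaces (count assumed)
    "word1 some text here word2",
]

# Group 1 holds the prefix that \K would have kept; the lookahead leaves
# the space and word2 in place.
pattern = re.compile(r"(\bword1\b)[^\r\n]*(?=\x20word2\b)")

cleaned = [pattern.sub(r"\1", line) for line in lines]

print(cleaned)
```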

                            Regards,
                            John
                            RecipeTools Web Site: http://recipetools.gotdns.com/

                          • Don
                            Message 13 of 15, Feb 29, 2012
                              I'm not sure that s/he was looking for a clip and I'm not sure that s/he
                              is even a member of clips. This Thread is a bit advanced for the
                              general group and has been going both places ... I apologize for not
                              knowing where to send it.

                              You suggest this:
                              (?=\x20word2\b)

                              I think in addition to \b there you need to provide for end of file or
                              return at the end of word2? Or are those included in \b? \b in help
                              simply says: "word boundary (only ASCII letters recognized)."
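The question about \b can be explored with a small Python sketch. It assumes Python's `re` module, where \b also matches between a word character and the end of the text or a line break; NoteTab's engine may differ in details, so treat this as an illustration rather than a statement about NoteTab. The sample strings are made up.

```python
import re

pattern = re.compile(r"\x20word2\b")

# word2 is the very last thing in the text.
at_end_of_text = pattern.search("some text word2") is not None

# word2 is followed by a Windows line break.
before_line_break = pattern.search("some text word2\r\nnext line") is not None

# word2 is only the start of a longer word, so \b does not match.
inside_longer_word = pattern.search("some text word2nd") is not None

print(at_end_of_text, before_line_break, inside_longer_word)
```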

                              I learn a lot in these discussions so thanks to all participating.
                            • flo.gehrke
                              Message 14 of 15, Feb 29, 2012
                                --- In notetab@yahoogroups.com, "Eb" <ebbtidalflats@...> wrote:
                                >
                                > Flo,
                                >
                                > James reports, that the expression
                                >
                                > "(?<=word1 )[^\r\n]*(?= word2)"
                                >
                                > fails to find " some text here "
                                > (containing a double space on the left.
                                >
                                > Tests on my PC confirm this bug.
                                >
                                > Are you saying, YOUR NoteTab succeeds in finding the pattern?

                                Eb,

                                I take for granted that 'word1' and 'word2' are part of the string -- without them the RegEx makes no sense. Although a Lookaround doesn't consume any characters, those substrings must be there, of course.

                                So we have to test the RegEx against...

                                'word1  some text here word2'

                                (two spaces after 'word1', one space preceding 'word2').

                                As we've said before, after removing the spaces from the Assertions all characters between 'word1' and 'word2', including all spaces, are matched with '(?<=word1)[^\r\n]*(?=word2)'. This is - if I didn't misunderstand him - what James aimed at, isn't it?

                                For me, this is valid for NT 6.2 Pro / Win XP.

                                Regards,
                                Flo