
Regular Expression: find between two words

  • book7reader
    Message 1 of 15, Feb 23, 2012
      Hello,

      I want to find any text or numbers that exist between two words (the two words are always present in the line, but what is between them changes). For example, in the line...:

      "Word1 some text or numbers-2 here Word2"

      ...I want to find:

      " some text or numbers-2 here " portion of the text based on the "Word1" and "Word2" delimiters.

      Thanks for your help,

      James
    • Don
      Message 2 of 15, Feb 23, 2012
        Hit find and then tick the regular expression box and then search for this:
        Word1 (.*) Word2

        That will find it and assign it to a temporary spot where if you were
        using a replace you could insert it with $1.
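
For readers who want to try the capture-group idea outside NoteTab, here is a minimal sketch in Python's `re` module (an assumption for illustration; NoteTab itself uses a PCRE-based engine). It shows group 1 holding the text between the delimiters. Note that Python writes the back-reference as `\1` where NoteTab's replace uses `$1`.

```python
import re

line = "Word1 some text or numbers-2 here Word2"

# Group 1 captures whatever sits between the two delimiter words
# (the single spaces next to the delimiters are outside the group).
match = re.search(r"Word1 (.*) Word2", line)
between = match.group(1) if match else None
print(repr(between))

# In a replacement, the captured text is re-inserted with \1.
swapped = re.sub(r"Word1 (.*) Word2", r"[\1]", line)
print(swapped)
```

With this approach the whole `Word1 ... Word2` span is the match; only the group isolates the middle part.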

      • Eb
        Message 3 of 15, Feb 23, 2012
          James,

          I think you probably just want the text between the two words. That would require a search phrase like:

          "(?<=Word1 )[^\R]*(?= Word2)"

          In the search menu, you will not need the quotes.

          This expression is one of the most useful ways to use regular expressions, so I added a bit of explanation below (note the order of the explanations is different from the order of the expressions):

          1. "(?<=Word1 )" finds this word (and a following space) as a LEFT anchor (<=) to the desired text, but does not include it in the match.

          3. "(?= Word2)" is the RIGHT anchor to the target text, and will not be included. See NoteTab's Help on regular expressions, and look for "Assertions".

          2. "[^\R]*" between the two anchors searches for any and all characters other than a newline sequence. Without the newline exclusion, the search may find Word2 on some other line, and include everything up to THAT Word2, instead of the one on the line you're interested in.


          If you're going to learn anything about regular expressions, THIS is the one I would recommend.
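
The lookbehind / lookahead idea can be sketched in Python's `re` module (an assumption for illustration, not NoteTab's engine). Python has no `\R`, so the sketch excludes line breaks with `[^\r\n]`. The second pattern, using `[\s\S]`, is a hypothetical "no newline exclusion" variant that shows how the match can run on to a Word2 on another line.

```python
import re

# Lookbehind and lookahead act as anchors and are not part of the match.
line_bound = re.compile(r"(?<=Word1 )[^\r\n]*(?= Word2)")
# Hypothetical variant without the newline exclusion: [\s\S] matches anything.
any_char = re.compile(r"(?<=Word1 )[\s\S]*(?= Word2)")

same_line = line_bound.search("Word1 some text or numbers-2 here Word2").group(0)
print(repr(same_line))

# Here Word2 only appears on the following line.
two_lines = "Word1 alpha\nbeta Word2"
print(line_bound.search(two_lines))               # stays on one line: no match
print(repr(any_char.search(two_lines).group(0)))  # runs across the line break
```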


          Cheers,


          Eb

        • John Shotsky
          Message 4 of 15, Feb 23, 2012
            I don't think that \R will work properly in a negated class in all cases. It is a character class, and \R can include
            more than one character. I have had a number of errors occur because of this. It is better to use \r\n, as neither of
            those characters will then be included. If you can't duplicate it, just trust me: under certain circumstances, you can
            end up with single-character line ends when you expect the Windows standard of \r\n.
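
As a rough illustration of why listing `\r` and `\n` individually is robust, the following Python sketch (Python's `re`, assumed here only for demonstration) runs the pattern over a sample text that mixes Windows `\r\n` line ends with a lone `\n`. The sample text is made up for this example.

```python
import re

pattern = re.compile(r"(?<=Word1 )[^\r\n]*(?= Word2)")

# First line ends with \r\n, second with a single \n, third has no line end.
mixed = "Word1 a Word2\r\nWord1 b Word2\nWord1 c Word2"

# Neither \r nor \n can be part of a match, so each line is handled separately.
found = pattern.findall(mixed)
print(found)
```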

            As this is actually a clip issue, I copied the clip group, where any further discussion of this functionality should be
            continued, for the benefit of other clippers.

            Regards,
            John
            RecipeTools Web Site: http://recipetools.gotdns.com/

          • Eb
            Message 5 of 15, Feb 23, 2012
              James,

              CORRECTION to the regular expression I posted. John Shotsky has correctly pointed out the error in my previous post.

              Use

              "(?<=Word1 )[^\r\n]*(?= Word2)"

              as the search pattern described earlier.

              Eb
            • Don
              Message 6 of 15, Feb 23, 2012
                I believe using .* would be greedy, by the way. Is yours greedy? Or,
                if Word2 also appeared inside the text to be captured, would it stop prematurely?

              • John Wallace
                Message 7 of 15, Feb 23, 2012
                  What would it be if you wanted Word1 to the first occurrence of Word2?

                  It seems to go to the last occurrence of Word2 now.

                  John

                • book7reader
                  Message 8 of 15, Feb 24, 2012
                    If this is the actual target text:

                    word1  some text here word2

                    And I paste in:

                    (?<=word1 )[^\r\n]*(?= word2)

                    ...and tick the NoteTab Pro Regular Expression check box in Find,

                    it does not find " some text here "

                    ?????

                    James
                  • Eb
                    Message 9 of 15, Feb 25, 2012
                      Ahhh!

                      The two (or more) spaces pose a problem. Many computer algorithms interpret multiple spaces as one. NoteTab is confused here. Part of the regex engine sees the first space individually, but another part interprets the two spaces as one. Since that second part does not have a space left in the target string, it cannot match the string.


                      The easy solution is to include one or more spaces in your match.

                      If you're OK with including the spaces in the matched pattern,
                      i.e. " some text here ", just eliminate the space from the assertions (anchors), and use "(?<=word1)[^\r\n]*(?=word2)" or "(?<=word1).*(?=word2)".

                      To eliminate a variable size of white space from the match is very tricky without replacing the original text. Better would be to include 1 space in the assertion, and include any remaining spaces in the matched pattern.

                      i.e. "(?<=word1 ) *[^\r\n]* *(?= word2)"
                      The space followed by an asterisk will allow for zero or more spaces.
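
To see what these variants actually return, here is a small sketch in Python's `re` (used only for illustration; results in NoteTab may differ). The test string assumes two spaces after word1 and one space before word2.

```python
import re

text = "word1  some text here word2"  # assumed: two spaces after word1

# Variant 1: no spaces in the assertions, so every space ends up in the match.
no_spaces = re.search(r"(?<=word1)[^\r\n]*(?=word2)", text).group(0)

# Variant 2: one space in each assertion, optional spaces inside the match.
padded = re.search(r"(?<=word1 ) *[^\r\n]* *(?= word2)", text).group(0)

print(repr(no_spaces))
print(repr(padded))
```

In this sketch the second variant still returns the extra leading space, because " *" is part of the match rather than part of the assertion.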


                      Eb


                    • Eb
                      Message 10 of 15, Feb 25, 2012
                        John,

                        Finding the last occurrence is the default with regular expressions, because quantifiers are greedy. If that is undesirable, follow the quantifier '*' with a question mark, i.e. "*?". Then it will find the first occurrence.

                        If a match can occur multiple times on the same line / paragraph, and you want to restrict the match, change the pattern

                        from "(?<=Word1 )[^\r\n]*(?= Word2)"
                        to "(?<=Word1 ).*?(?= Word2)"
                        or "(?<=Word1 ) *.*? *(?= Word2)"
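
The difference between the greedy and the lazy quantifier can be checked with a short Python sketch (Python's `re`, assumed here for illustration). The sample line with two occurrences of Word2 is made up.

```python
import re

line = "Word1 a Word2 b Word2"  # Word2 occurs twice on the same line

# Greedy: the class grabs as much as possible, so the match ends at the LAST Word2.
greedy = re.search(r"(?<=Word1 )[^\r\n]*(?= Word2)", line).group(0)

# Lazy: *? grabs as little as possible, so the match ends at the FIRST Word2.
lazy = re.search(r"(?<=Word1 ).*?(?= Word2)", line).group(0)

print(repr(greedy))
print(repr(lazy))
```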


                        Eb


                      • flo.gehrke
                        Message 11 of 15, Feb 25, 2012
                          --- In notetab@yahoogroups.com, "Eb" <ebbtidalflats@...> wrote:
                          >
                          > Ahhh!
                          >
                          > The two (or more) spaces pose a problem. Many computer algorithms
                          > interpret multiple spaces as one. NoteTab is confused here. Part
                          > of the regex engine sees the first space individually, but another
                          > part interprets the two spaces as one. Since that second part does
                          > not have a space left in the target string, it cannot match the
                          > string.

                          There is no "interpreting the two spaces as one" here. NT is exactly following PCRE rules: The Lookbehind is matching the first space after 'word1', and the Character Class '[^\r\n]' matches the second space following 'word1'.

                          As you wrote, you just have to remove the spaces from both Assertions in order to include all spaces into the match.
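
The step-by-step account above can be traced with a sketch in Python's `re` module (an assumption for illustration; it follows the same lookaround rules but says nothing about how a particular NoteTab version behaves). The test string assumes two spaces after word1.

```python
import re

text = "word1  some text here word2"  # assumed: two spaces after word1

m = re.search(r"(?<=word1 )[^\r\n]*(?= word2)", text)

# The lookbehind covers "word1" plus the first space (indexes 0-5),
# so the match itself starts at index 6, which is the second space.
start, end = m.span()
matched = m.group(0)
print(start, end, repr(matched))
```

The space before word2 is claimed by the lookahead, so it is not part of the match.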

                          Regards,
                          Flo





                        • Eb
                          Message 12 of 15, Feb 29, 2012
                            Flo,

                            James reports that the expression

                            "(?<=word1 )[^\r\n]*(?= word2)"

                            fails to find " some text here "
                            (containing a double space on the left).

                            Tests on my PC confirm this bug.

                            Are you saying, YOUR NoteTab succeeds in finding the pattern?

                            That would imply the bug being INCONSISTENCY rather than regular expression handling.

                            On TWO of _my_ systems, NoteTab FAILS to find the double-space pattern. This implies that somewhere between finding the first space in the assertion and the second space, NoteTab loses track of spaces.



                            Eb


                          • John Shotsky
                            Message 13 of 15, Feb 29, 2012
                              I'm probably getting into this late, but I'm not understanding the problem exactly. I would implement this slightly
                              differently though.

                              If these are the strings: (Notice the extra spaces in the top one)
                              word1  some text here word2
                              word1 some text here word2

                              I would use this:
                              ^!Replace "\bword1\b\K[^\r\n]*(?=\x20word2\b)" >> "" AIRSW
                              Which removes everything between word1 and [sp]word2, leaving this:
                              word1 word2

                              The \b identifies word breaks, preventing partial matches with larger words. Everything preceding \K is kept. Everything
                              between \K and [sp]word2 is removed as unwanted. Not sure why an assertion would be used on the first part. Is there a reason
                              for doing it that way?
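
Python's standard `re` module does not support `\K`, so the sketch below is only an approximation of the clip above: it keeps the left delimiter with a capture group and puts it back with `\1`. The two-line sample text (extra space in the first line) is an assumption for illustration.

```python
import re

text = "word1  some text here word2\nword1 some text here word2"

# (\bword1\b) is captured and re-inserted, which stands in for \K.
# The lookahead leaves the space and word2 in place.
pattern = re.compile(r"(\bword1\b)[^\r\n]*(?=\x20word2\b)")
result = pattern.sub(r"\1", text)
print(result)
```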

                              Regards,
                              John
                              RecipeTools Web Site: http://recipetools.gotdns.com/

                            • Don
                              Message 14 of 15, Feb 29, 2012
                                I'm not sure that s/he was looking for a clip and I'm not sure that s/he
                                is even a member of clips. This Thread is a bit advanced for the
                                general group and has been going both places ... I apologize for not
                                knowing where to send it.

                                You suggest this:
                                (?=\x20word2\b)

                                In addition to \b there, I think you need to provide for end of file or
                                a return at the end of word2? Or are those included in \b? The help entry for \b
                                simply says: "word boundary (only ASCII letters recognized)."
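
One way to explore this question is a quick experiment. In Python's `re` module (an assumption for illustration; NoteTab's engine should be checked separately), `\b` also matches between a word character and the end of the string or a line break, so no extra alternative is needed there.

```python
import re

pattern = re.compile(r"(?<=word1)[^\r\n]*(?=\x20word2\b)")

# word2 is the very last thing in the text.
at_end = pattern.search("word1 some text here word2")
# word2 is followed by a line break.
before_newline = pattern.search("word1 some text here word2\nnext line")
# word2 is only the start of a longer word, so \b must reject it.
longer_word = pattern.search("word1 some text here word2x")

print(at_end is not None, before_newline is not None, longer_word is None)
```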

                                I learn a lot in these discussions so thanks to all participating.
                              • flo.gehrke
                                Message 15 of 15, Feb 29, 2012
                                  --- In notetab@yahoogroups.com, "Eb" <ebbtidalflats@...> wrote:
                                  >
                                  > Flo,
                                  >
                                  > James reports, that the expression
                                  >
                                  > "(?<=word1 )[^\r\n]*(?= word2)"
                                  >
                                  > fails to find " some text here "
                                  > (containing a double space on the left.
                                  >
                                  > Tests on my PC confirm this bug.
                                  >
                                  > Are you saying, YOUR NoteTab succeeds in finding the pattern?

                                  Eb,

                                  I take for granted that 'word1' and 'word2' are part of the string -- without them the RegEx makes no sense. Although a Lookaround doesn't consume any characters, those substrings must be there, of course.

                                  So we have to test the RegEx against...

                                  'word1  some text here word2'

                                  (two spaces after 'word1', one space preceding 'word2').

                                  As we've said before, after removing the spaces from the Assertions, all characters between 'word1' and 'word2', including all spaces, are matched with '(?<=word1)[^\r\n]*(?=word2)'. This is - if I didn't misunderstand him - what James aimed at, isn't it?

                                  For me, this is valid for NT 6.2 Pro / Win XP.

                                  Regards,
                                  Flo