Can a Reg Exp handle 123 AND not a|b|c followed by x?

  • mycroftj
    Message 1 of 13, May 9, 2012
      Is there a relatively simple reg exp to find the following?

      Date in format mm/dd/yyyy OR yyyy/mm/dd followed by a space followed by an open paren followed by three characters that are *NOT* MON|TUE|WED|THU|FRI|SAT|SUN followed by a closing paren?

      I've gotten as far as the following that does all the above except for checking for the closing paren.

      ^!Find "(?:(\d\d/\d\d/(?:19|20)\d\d)|((?:19|20)\d\d/\d\d/\d\d))\x20\((?!(MON|TUE|WED|THU|FRI|SAT|SUN))" RSTI

      Note: (?: is for a non-capturing group

      Example:

      05/12/1995 (Fri) <-- This should not match
      05/13/1995 (Sat) <-- This should not match
      05/14/1995 (zzzday) <-- This should not match
      05/15/1995 (Mox) <-- ***This should match***
      05/16/1995 (Tue) <-- This should not match
      05/17/2005 (Tue) <-- This should not match

      I did get the clip working using the above FIND and a few extra lines, but now I'd like to know if this is (easily) possible to do with one FIND.

      Thanks!

      Joy
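To see why the lookahead on its own is not enough, the pattern as posted can be tried outside NoteTab. This is only a minimal sketch in Python, assuming Python's re module treats this pattern the same way NoteTab's PCRE does and that the I option corresponds to re.IGNORECASE; the names PARTIAL, SAMPLES and partial_hits are made up for illustration.

```python
import re

# The pattern as posted above; NoteTab's I option is assumed to map to re.IGNORECASE.
PARTIAL = re.compile(
    r"(?:(\d\d/\d\d/(?:19|20)\d\d)|((?:19|20)\d\d/\d\d/\d\d))\x20\("
    r"(?!(MON|TUE|WED|THU|FRI|SAT|SUN))",
    re.IGNORECASE,
)

# The example lines from the post, without the "<--" annotations.
SAMPLES = [
    "05/12/1995 (Fri)",
    "05/13/1995 (Sat)",
    "05/14/1995 (zzzday)",
    "05/15/1995 (Mox)",
    "05/16/1995 (Tue)",
    "05/17/2005 (Tue)",
]


def partial_hits(lines):
    """Return the lines in which the posted pattern finds a match."""
    return [line for line in lines if PARTIAL.search(line)]


print(partial_hits(SAMPLES))
```

Because nothing after the lookahead requires exactly three characters and a closing paren, the "(zzzday)" line is found along with the wanted "(Mox)" line.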
    • John Shotsky
      Message 2 of 13, May 9, 2012
        Since negative character classes don't support strings, I would approach this by first inserting a special character in
        front of each of the day strings, then using a negative character class that includes that special character. Afterwards,
        I'd remove the special character.

        I wish there were a 'negative string' feature in PCRE, but the substitution of special characters does work well.

        Regards,
        John
        RecipeTools Web Site: http://recipetools.gotdns.com/
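The three steps John describes (mark, search with a negated character class, unmark) could look roughly like this. It is a sketch in Python rather than a NoteTab clip, and the marker character "\x01", the constants MARK and DAYS, and the function find_non_days are all assumptions chosen for illustration, not anything taken from the post.

```python
import re

MARK = "\x01"  # hypothetical marker; assumed never to occur in the real text
DAYS = r"MON|TUE|WED|THU|FRI|SAT|SUN"


def find_non_days(text):
    """Return (hits, restored_text) using the mark / search / unmark approach."""
    # Step 1: put the marker in front of every parenthesised day name.
    marked = re.sub(
        r"\((" + DAYS + r")\)", "(" + MARK + r"\1)", text, flags=re.IGNORECASE
    )
    # Step 2: search with a negated character class that excludes the marker.
    hits = re.findall(
        r"(?:\d{2}/\d{2}/\d{4}|\d{4}/\d{2}/\d{2})\x20\([^" + MARK + r"].{2}\)",
        marked,
    )
    # Step 3: take the markers out again.
    restored = marked.replace(MARK, "")
    return hits, restored


sample = "05/12/1995 (Fri)\n05/15/1995 (Mox)\n05/14/1995 (zzzday)"
print(find_non_days(sample))
```

The text has to be modified and restored around the search, which is the main cost of this workaround.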

      • Alec Burgess
        Message 3 of 13, May 9, 2012
          On 2012-05-09 21:55, mycroftj wrote:
          > Attempt:
          > ^!Find "(?:(\d\d/\d\d/(?:19|20)\d\d)|((?:19|20)\d\d/\d\d/\d\d))\x20\((?!(MON|TUE|WED|THU|FRI|SAT|SUN))" RSTI
          >
          > Test data:
          > 05/12/1995 (Fri) <-- This should not match
          > 05/13/1995 (Sat) <-- This should not match
          > 05/14/1995 (zzzday) <-- This should not match
          > 05/15/1995 (Mox) <-- ***This should match***
          > 05/16/1995 (Tue) <-- This should not match
          > 05/17/2005 (Tue) <-- This should not match
          Developed and tested in RegexBuddy, not (yet) in NoteTab.
          I added these three test lines to make sure the yyyy-first format was handled
          correctly:
          > 2005/05/16 (Mox)
          > 2005/05/16 (Mon)
          > 05/12/1995 (Mon)
          The free-format regex (it is always easier to see how the bits and pieces work
          together by enabling and disabling parts with a leading "#"):
          (?i)
          (?:\d{2}/\d{2}/\d{4}|\d{4}/\d{2}/\d{2})\x20\(
          (?!
          mon|tue|wed|thu|fri|sat|sun
          )
          .{3}
          \)

          Changed to the equivalent single-line regexp (hopefully this will not trigger
          Yahoo line wrapping):
          (?i)(?:\d{2}/\d{2}/\d{4}|\d{4}/\d{2}/\d{2})\x20\((?!mon|tue|wed|thu|fri|sat|sun).{3}\)


          Key thing I think you were missing is to do a negative look-ahead for
          day names BEFORE doing a MUST MATCH on any 3 characters followed by
          closing ")" parenthesis.

          Later: it works in the NoteTab Find dialog, and the Replace dialog gives the
          expected count of 2.
          So here is the ^!Find statement:
          ^!Find "(?i)(?:\d{2}/\d{2}/\d{4}|\d{4}/\d{2}/\d{2})\x20\((?!mon|tue|wed|thu|fri|sat|sun).{3}\)" RIS
          Note: AFAIK the T modifier has no effect when using the R option.

          --
          Regards ... Alec (buralex@gmail & WinLiveMess - alec.m.burgess@skype)
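The count of 2 can be cross-checked outside NoteTab. Below is a minimal sketch in Python, assuming the re module behaves like NoteTab's PCRE for this pattern and using re.IGNORECASE in place of (?i) and the I option; FIND, TEST_DATA and matches are illustrative names only.

```python
import re

# Alec's single-line pattern; case-insensitivity is supplied as a flag here.
FIND = re.compile(
    r"(?:\d{2}/\d{2}/\d{4}|\d{4}/\d{2}/\d{2})\x20\("
    r"(?!mon|tue|wed|thu|fri|sat|sun).{3}\)",
    re.IGNORECASE,
)

# Joy's six example lines plus the three yyyy-first test lines.
TEST_DATA = """\
05/12/1995 (Fri)
05/13/1995 (Sat)
05/14/1995 (zzzday)
05/15/1995 (Mox)
05/16/1995 (Tue)
05/17/2005 (Tue)
2005/05/16 (Mox)
2005/05/16 (Mon)
05/12/1995 (Mon)
"""


def matches(text):
    """Return every substring the pattern matches, in order."""
    return [m.group(0) for m in FIND.finditer(text)]


print(matches(TEST_DATA))
```

Only the two "(Mox)" lines are found: the lookahead rejects the day names, and the ".{3}" followed by the closing paren rejects "(zzzday)".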
        • flo.gehrke
          Message 4 of 13, May 10, 2012
            --- In ntb-scripts@yahoogroups.com, Alec Burgess <buralex@...> wrote:
            > So here is the ^!Find statement
            > ^!Find
            > "(?i)(?:\d{2}/\d{2}/\d{4}|\d{4}/\d{2}/\d{2})\x20\((?!mon|tue|wed|thu|fri|sat|sun).{3}\)"
            > RIS
            > Note: AFAIK the T modifier has no effect when using R option.

            With 'I' and '(?i)', the 'ignore case' option is applied twice in your pattern -- although, like 'T', it isn't needed here. The RegEx matches 'MON' or 'mon' as well.

            If there's no need to capture anything, you could make not only the date but the whole pattern non-capturing...

            (?:(\d{2}/\d{2}/\d{4}|\d{4}/\d{2}/\d{2})\x20\((?!mon|tue|wed|thu|fri|sat|sun).{3}\))

            or enclose the whole pattern in an Atomic Group...

            ^!Find "^(?>\d{2,4}/?){3}\x20\((?!Mon|Tue|Wed|Thu|Fri|Sat|Sun).{3}\)" RS

            Regards,
            Flo
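A side note on capturing: in the "(?:( ... )" variant above, the outer group is non-capturing but the inner parentheses around the two date formats still capture. A small Python sketch, assuming the group-counting rules of Python's re module match PCRE's here, makes that visible; AS_POSTED, ALL_NON_CAPTURING and group_count are hypothetical names, and ALL_NON_CAPTURING is one possible rewrite, not something posted in the thread.

```python
import re

# The variant as posted: outer (?: ... ) plus an inner capturing group.
AS_POSTED = (
    r"(?:(\d{2}/\d{2}/\d{4}|\d{4}/\d{2}/\d{2})\x20\("
    r"(?!mon|tue|wed|thu|fri|sat|sun).{3}\))"
)

# One possible rewrite in which the inner date group is non-capturing too.
ALL_NON_CAPTURING = (
    r"(?:(?:\d{2}/\d{2}/\d{4}|\d{4}/\d{2}/\d{2})\x20\("
    r"(?!mon|tue|wed|thu|fri|sat|sun).{3}\))"
)


def group_count(pattern):
    """Number of capturing groups the compiled pattern defines."""
    return re.compile(pattern).groups


print(group_count(AS_POSTED), group_count(ALL_NON_CAPTURING))
```

Both variants match the same text; they differ only in whether the date is available afterwards as group 1.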
          • flo.gehrke
            Message 5 of 13, May 10, 2012
              --- In ntb-scripts@yahoogroups.com, "flo.gehrke" <flo.gehrke@...> wrote:
              >
              > or enclose the whole pattern in an Atomic Group...
              >
              > ^!Find "^(?>\d{2,4}/?){3}\x20\((?!Mon|Tue|Wed|Thu|Fri|Sat|Sun).{3}\)" RS

              Upps, sorry, I meant...

              ^(?>(\d{2,4}/?){3}\x20\((?!Mon|Tue|Wed|Thu|Fri|Sat|Sun).{3}\))

              of course ;-)

              Flo
            • John Shotsky
              Message 6 of 13, May 10, 2012
                I am not understanding something here. The criterion was:
                three characters that are *NOT* MON|TUE|WED|THU|FRI|SAT|SUN

                How is this avoiding those strings? I've wanted to match text that doesn't contain a certain string on multiple
                occasions.

                Regards,
                John
                RecipeTools Web Site: http://recipetools.gotdns.com/

              • flo.gehrke
                Message 7 of 13, May 10, 2012
                  --- In ntb-scripts@yahoogroups.com, "John Shotsky" <jshotsky@...> wrote:
                  >
                  > I am not understanding something here – The criteria was:
                  > three characters that are *NOT* MON|TUE|WED|THU|FRI|SAT|SUN
                  >
                  > How is this avoiding those strings? I've wanted to do this text
                  > that didn't contain a certain string on multiple occasions.
                  >

                  John,

                  The second part of that RegEx...

                  \((?!Mon|Tue|Wed|Thu|Fri|Sat|Sun).{3}\)

                  matches an opening and a closing literal bracket '(...)' embracing three digits '.{3}' that are NOT 'Mon', 'Tue' etc, as Joy demanded.

                  The 3-digit-days are excluded with a Negative Lookahead. Since a Lookahead doesn't consume any character, any different 3-digit-string will match at the same position between the opening and the closing bracket. That's why, for example...

                  'John' is matched with '(?!Mary)John'

                  that is: Find 'John' at a position where you don't see 'Mary' when looking ahead.

                  Regards,
                  Flo
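The point that a lookahead checks a position without consuming anything can be tried directly. A minimal sketch in Python, assuming the re module's lookahead behaves like PCRE's for these patterns; first_match is a made-up helper name.

```python
import re


def first_match(pattern, text):
    """Return (matched_text, (start, end)) for the first match, or None."""
    m = re.search(pattern, text)
    return None if m is None else (m.group(0), m.span())


# The lookahead sees no 'Mary' at position 0, so 'John' matches from there.
print(first_match(r"(?!Mary)John", "John"))

# After '(' the lookahead rejects 'Mon'; '.{3}' then matches at that same position.
print(first_match(r"\((?!Mon).{3}\)", "(Mox)"))
print(first_match(r"\((?!Mon).{3}\)", "(Mon)"))
```

The match spans start at the opening paren and cover all three characters, which shows that the lookahead itself took up no room in the match.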
                • John Shotsky
                  Message 8 of 13, May 10, 2012
                    Flo,
                    Thank you. Always nice to learn something new. I will play around with this until I have it fully internalized. I have
                    needed this function quite a few times and have 'tokenized' and then used a character class instead. (And then
                    untokenized.) This is obviously a better way to do it.

                    Regards,
                    John
                    RecipeTools Web Site: http://recipetools.gotdns.com/

                  • Art Kocsis
                    Message 9 of 13, May 10, 2012
                      At 5/10/2012 07:46 AM, Flo wrote:
                      >The second part of that RegEx...
                      >
                      >\((?!Mon|Tue|Wed|Thu|Fri|Sat|Sun).{3}\)
                      >
                      >matches an opening and a closing literal bracket '(...)' embracing three
                      >digits '.{3}' that are NOT 'Mon', 'Tue' etc, as Joy demanded.

                      Don't you mean three CHARACTERS instead of three DIGITS? The "." matches
                      any character; "\d" is used to match any digit.

                      Just trying to keep things clear for future readers. ;)

                      Namaste', Art
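The difference between "." and "\d" is easy to check. A tiny sketch in Python, assuming the re module's meaning of both tokens matches PCRE's for plain ASCII input; is_three_chars and is_three_digits are illustrative names.

```python
import re


def is_three_chars(s):
    """True if s is exactly three characters of any kind (newline excluded)."""
    return re.fullmatch(r".{3}", s) is not None


def is_three_digits(s):
    """True if s is exactly three digits."""
    return re.fullmatch(r"\d{3}", s) is not None


for sample in ("Mox", "707"):
    print(sample, is_three_chars(sample), is_three_digits(sample))
```

"Mox" passes only the first check, while "707" passes both, since digits are characters too.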
                    • flo.gehrke
                      Message 10 of 13, May 10, 2012
                        --- In ntb-scripts@yahoogroups.com, Art Kocsis <artkns@...> wrote:
                        >
                        >> matches an opening and a closing literal bracket '(...)' embracing
                        >> three digits '.{3}'...
                        >
                        > Don't you mean three CHARACTERS instead of three DIGITS?

                        Art,

                        Thanks for correcting my bad English! Of course, '.{3}' means any character, not numbers (digits) only.

                        Please give me a helping hand: '2012' is a four-digit number, 'Peter' is a four-letter name -- correct? But what is 'Boeing-707'? A four-letter name, a four-digit string? :-(

                        Flo
                      • Art Kocsis
                        Message 11 of 13, May 10, 2012
                          At 5/10/2012 03:05 PM, Flo wrote:
                          >--- In ntb-scripts@yahoogroups.com,
                          >Art Kocsis <artkns@...> wrote:
                          > >
                          > >> matches an opening and a closing literal bracket '(...)' embracing
                          > >> three digits '.{3}'...
                          > >
                          > > Don't you mean three CHARACTERS instead of three DIGITS?
                          >
                          >Art,
                          >
                          >Thanks for correcting my bad English! Of course, '.{3}' means any
                          >character, not numbers (digits) only.
                          >
                          >Please give me a helping hand: '2012' is a four-digit number,
                          Correct.

                          >'Peter' is a four-letter name -- correct?
                          No, it has five letters.

                          >But what is 'Boeing-707'? A four-letter name, a four-digit string? :-(
                          Expensive!!!


                          He he he.

                          Namaste', Art

                          Young at heart
                          Slightly older in other places
                          A wise ass all throughout!
                        • Computerhusky
                          Message 12 of 13, May 10, 2012
                            Hi Flo,
                            I'd call 'Boeing-747' a 10-character string (or a large aeroplane).
                            And 'Peter' is a 5 letter name (or string) :-)
                            Kind regards
                            Thomas

                            Sent from iPad

                          • mycroftj
                            Message 13 of 13, May 12, 2012

                              Flo,

                              Thank you so much for that. I was close! For some reason, I never realized a look-behind or look-ahead could come in the middle of a regexp. I don't recall ever seeing that in an example. But it can, it works perfectly, and I learned a very important thing.

                              Thank you always for your answers and remember how many people learn by seeing others discuss EVERYTHING here.

                              Regards

                              Joy