Re: [PBML] Compact digital range RegEx - And complement?

  • Allan Dystrup
    Message 1 of 20 , Dec 1, 2004
      Hi JE,

      No, unfortunately not.

      I have to pass the "negated" RE as a new RE to the parsing program,
      so I'll have to come up with a "complement RE" for:

      (CX36(5|6))|(JA30[0-2])|(JA3(([2-8]\d)|(9[0-4])))|(JA5.*)|
      (JA6((0\d)|(1[0-3])))|(JA64[7-9])|(JA687.*)|(JA74[0-3])|(JB5.*)|
      (JY(((1|2)\d\d)|(3[0-3]\d)))|(JY[3-9][5-9]\d)|(JZ51(3|4)00.*)

      If that proves too hard, I'll have to hack the parser,
      but I'd rather not...

      Allan
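
For readers who want to try the combined pattern, here is a minimal sketch of it written with the /x modifier. Two things are assumptions rather than anything stated in the post: the pattern is anchored with ^, and the sample codes are made up purely to exercise a few branches. The extra closing parenthesis in the JY branch is dropped so the pattern compiles.

```perl
use strict;
use warnings;

# Combined pattern from the post. The leading ^ is an assumption; the
# parsing program may apply the pattern differently.
my $combined = qr/
    ^(?:
        CX36[56]                    # CX365-CX366
      | JA30[0-2]                   # JA300-JA302
      | JA3(?:[2-8]\d|9[0-4])       # JA320-JA394
      | JA5.*
      | JA6(?:0\d|1[0-3])           # JA600-JA613
      | JA64[7-9]                   # JA647-JA649
      | JA687.*
      | JA74[0-3]                   # JA740-JA743
      | JB5.*
      | JY(?:[12]\d\d|3[0-3]\d)     # JY100-JY339
      | JY[3-9][5-9]\d
      | JZ51[34]00.*
    )
/x;

# Hypothetical sample codes, not taken from the original data.
my @codes = qw(CX365 CX367 JA301 JA395 JB523 JB423 JY339 JY340);
my @hits  = grep { $_ =~ $combined } @codes;
print "matched: @hits\n";
```

Non-capturing groups (?: ) are used here only to keep the sketch short; they match the same strings as the capturing groups in the post.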


      --- In perl-beginner@yahoogroups.com, "J.E. Cripps" <cycmn@n...>
      wrote:
      >
      > > --------------------------------
      > > And then i face another problem:
      > > --------------------------------
      >
      > > Given the combined regex (infra) how do i express the "complement"
      > > (ie all strings NOT MATCHING this regex) as a new regex ?
      >
      > as a new regex? the !~ op. won't do what you want?
      >
      > > COMBINED REGEX:
    • J.E. Cripps
      Message 2 of 20 , Dec 1, 2004
        On Wed, 1 Dec 2004, Allan Dystrup wrote:

        > I have to pass the "negated" RE as a new RE to the parsing program,
        > so i'll have to come up with a "complement RE" for :
        >
        > (CX36(5|6))|(JA30[0-2])|(JA3(([2-8]\d)|(9[0-4])))|(JA5.*)|
        > (JA6((0\d)|(1[0-3])))|(JA64[7-9])|(JA687.*)|(JA74[0-3])|(JB5.*)|
        > (JY(((1|2)\d\d)|(3[0-3]\d)))|(JY[3-9][5-9]\d)|(JZ51(3|4)00.*)

        Hmmm... here's what I have so far:

        # Looking for selected matches on a five character string
        # (What's the name of this string? What does it mean?)

        # 2 letters 3 digits e.g. JB523
        # would be a good idea to show the original matches but
        # I do not have the original post. IOW,

        # To match all strings except CX365-CX366 ... etc. etc.

        # using the x modifier to include comments and spacing
        # see page 57 of the Camel book and perlretut

        ############# This is the start of the complement regexp #################

        / # The opening /

        (([AB]|[D-I]|[K-Z])\w)(\d{3}) | # match all but initial J, C
        # then match the three digits
        # the \w will match any alphanumeric,
        # do you have to worry about anomalous data e.g. Kz301 L0500
        # or maybe
        # ([^CJ]\w{1})(\d{3})|

        # or maybe
        # ([^CJ])(...)
        # or ([^CJ])(.*)
        # since you have a similar
        # regexp in one of
        # the J cases

        (C^[X])(\d{3})| # match the initial C other
        # than those followed by X
        # or (C^[X])(...)

        ((CX) ( ([0-2]|[4-9]) \d{1})) | # match CX followed by
        # any digit except 3

        (CX3)([0-57-9]\d{1})| # match CX3 followed by
        # any digit except 6

        (CX36([0-47-9]))| # match CX36 except when
        # followed by 5 or 6

        (J^[ABYZ]))(\d{1})| # match J followed by
        # letters other than ABYZ
        # or (J^[ABYZ])(...)

        (JA)([012489])(..)| # match JA followed by
        # digits other than
        # 3, 5, 6 or 7

        # this one is not the next in your original regexp
        (JB[^5])(..)| # JB except if
        # followed by a 5
        # skipping the J cases that remain
        # I do not have the original post and cannot reconstruct the
        # target data from memory

        # the J cases for which I have not tried to do a complement:
        #(JA30[0-2])|
        #(JA3(([2-8]\d)|(9[0-4])))|
        #(JA6((0\d)|(1[0-3])))|
        #(JA64[7-9])|
        #(JA687.*)|
        #(JA74[0-3])|
        #(JB5.*)|
        #(JY(((1|2)\d\d)|(3[0-3]\d)))|
        #(JY[3-9][5-9]\d)|
        #(JZ51(3|4)00.*)
        # (JZ51(3|4)00.*) # does this have more than three digits in it?
        # # what are those 00s?

        # let's pretend all the complement matches are written
        # and we'll close with the /x

        /x # The closing /x
        ############ This is the end of the Complement Regexp #################
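
A hand-built complement like this is easy to get subtly wrong, so one way to check a branch is to enumerate every code in its family and confirm that exactly one of the two patterns matches each. The sketch below does that for the CX family only. It assumes the codes are exactly two letters plus three digits, and the anchors and the `$complement` pattern are my own rendering of the three CX branches, not the poster's final text.

```perl
use strict;
use warnings;

my $original   = qr/^CX36[56]$/;
# CX + a first digit other than 3, or CX3 + a digit other than 6,
# or CX36 + a digit other than 5 or 6.
my $complement = qr/^CX(?:[0-24-9]\d\d|3[0-57-9]\d|36[0-47-9])$/;

my @bad;
for my $n (0 .. 999) {
    my $code = sprintf "CX%03d", $n;
    my $in_original   = $code =~ $original   ? 1 : 0;
    my $in_complement = $code =~ $complement ? 1 : 0;
    # A correct complement matches exactly when the original does not.
    push @bad, $code if $in_original == $in_complement;
}
print @bad ? "disagreements: @bad\n" : "complement is exact for CX000-CX999\n";
```

The same loop can be pointed at the JA, JB, JY and JZ families as those branches get written.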
      • J.E. Cripps
        Message 3 of 20 , Dec 1, 2004
          errata:

          #((CX) ( ([0-2]|[4-9]) \d{1})) | # match CX followed by
          # any digit except 3
          #should be
          ((CX) (([0-2]|[4-9]) \d{2})) | # or end with (..)


          #(J^[ABYZ]))(\d{1})| # match J followed by
          # letters other than ABYZ
          #should be
          (J^[ABYZ])(\d{3})|
        • Allan Dystrup
          Message 4 of 20 , Dec 1, 2004
            Hi JE,

            Yes, I can follow your line of reasoning here, building up the
            "negated" regexes step by step.

            It is workable (actually more 'work' than 'able' to my taste...), so
            what I was really looking for was a built-in regex
            operator/metacharacter like [^ ] for char. classes or (?! ) for
            lookahead, a feature that would just "invert" any given regex, so to
            speak.

            I haven't been able to uncover such a feature though; instead I've
            now opened the source of the parser and reversed the Perl matching
            operator (=~ to !~). This is a hack, and doesn't solve the general
            problem of feeding the parser any RE (in a one-line textbox),
            including any complemented RE. I'm still chewing on that one.

            Thanks,
            Allan


          • J.E. Cripps
            Message 5 of 20 , Dec 1, 2004
              On Wed, 1 Dec 2004, Allan Dystrup wrote:

              > Yes i can follow your line of reasoning here, step by step building up
              > the "negated" Regex'es. ...
              > It is workable (actually more 'work' than 'able' to my taste...), so
              > what i was really looking for was a built-in Regex
              > operator/metacharacter like [^ ] for char. classes or (?! ) for
              > lookahead, a feature that would just "invert" any given Regex, so to
              > speak.

              Everything I've seen indicates that there isn't any such feature,
              and complementing a regexp tends to be laborious, messy or both.


              another error in a previous message of mine:

              > > (C^[X])(\d{3})| # match the initial C

              the ^ should be _inside_ the [ ], i.e.

              (C[^X])(\d{3})

              > > (J^[ABYZ]))(\d{1})| # match J followed by

              which should be

              (J[^ABYZ])(\d{3})
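
The difference the caret position makes can be seen in a short test. Outside the brackets ^ is a start-of-string anchor, so a pattern such as C^[X] asks for the start of the string after a C has already been consumed and cannot match without the /m modifier. As the first character inside the brackets it negates the class. The sample codes are made up for illustration.

```perl
use strict;
use warnings;

my $misplaced = qr/C^[X]\d{3}/;   # ^ outside the class: an anchor
my $negated   = qr/C[^X]\d{3}/;   # ^ first inside the class: negation

for my $code (qw(CA123 CX123)) {  # hypothetical sample codes
    printf "%s  misplaced=%d  negated=%d\n",
        $code,
        ($code =~ $misplaced ? 1 : 0),
        ($code =~ $negated   ? 1 : 0);
}
```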
            • Jonathan Paton
              Message 6 of 20 , Dec 1, 2004
                Dear Allan,

                I think you are looking for:

                "(?!pattern)"
                A zero-width negative look-ahead assertion. For example
                "/foo(?!bar)/" matches any occurrence of "foo" that isn't
                followed by "bar". Note however that look-ahead and
                look-behind are NOT the same thing. You cannot use this
                for look-behind.

                ...

                from perldoc perlre

                You might need to wrap the regex with ^ and $ assertions.

                Jonathan Paton
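
A small sketch of the perlre example, plus the anchored form hinted at in the last remark. The test strings are arbitrary.

```perl
use strict;
use warnings;

# The perlre example: "foo" not immediately followed by "bar".
for my $s (qw(foobar foobaz barfoo)) {
    my $verdict = $s =~ /foo(?!bar)/ ? "match" : "no match";
    print "$s: $verdict\n";
}

# Anchored with ^, the look-ahead tests the start of the string, which
# turns it into "does not begin with foo".
my @kept = grep { /^(?!foo)/ } qw(foobar foobaz barfoo);
print "kept: @kept\n";
```

Without the ^ anchor the look-ahead can succeed at any later position in the string, so the anchor is what makes it usable as a whole-string negation.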
              • Allan Dystrup
                Message 7 of 20 , Dec 1, 2004
                  Hi Jonathan ,

                  Yes indeed, I've reached the same conclusion.
                  The (?!pattern) can solve the issue in as clean a way
                  as is probably possible with regexes. Eg:

                  Range        RegEx                    Complement
                  -----------  -----------------------  ----------------------------
                  CX365-CX366  CX36(5|6)                ^(?!CX36(5|6))
                  JA300-JA302  JA30[0-2]                ^(?!JA30[0-2])
                  JA320-JA394  JA3(([2-8]\d)|(9[0-4]))  ^(?!JA3(([2-8]\d)|(9[0-4])))
                  etc.

                  Thanks a lot,
                  Allan
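
One caveat when combining the rows of the table above into a single pattern: a code belongs to the overall complement only if it fails every range, so the individual complements have to hold together and cannot simply be joined with |. A minimal sketch, assuming the original alternation is wrapped in one negative look-ahead; only the three tabulated branches are included, and the sample codes are hypothetical.

```perl
use strict;
use warnings;

# The three branches from the table; the remaining ones would be added
# to the alternation in the same way.
my $original   = qr/CX36[56]|JA30[0-2]|JA3(?:[2-8]\d|9[0-4])/;
my $complement = qr/^(?!$original)/;

my @codes = qw(CX365 CX364 JA302 JA310 JA394 JA395);  # hypothetical samples
my @out   = grep { $_ =~ $complement } @codes;
print "not in any range: @out\n";

# Joining per-range complements with | is too permissive: CX365 passes
# because it is not in JA300-JA302.
my $ored = qr/^(?!CX36[56])|^(?!JA30[0-2])/;
print "CX365 slips through the OR-ed form\n" if 'CX365' =~ $ored;

# Writing the look-aheads one after another requires all of them to hold.
my $anded = qr/^(?!CX36[56])(?!JA30[0-2])/;
print "CX365 is rejected by the AND-ed form\n" if 'CX365' !~ $anded;
```

Since the result is an ordinary pattern string, it should fit a one-line pattern field, provided the parsing program's regex engine supports look-ahead.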

