
Re: [PBML] Compact digital range RexEx - And complement?

  • J.E. Cripps
    Message 1 of 20, Nov 30, 2004
      > --------------------------------
      > And then i face another problem:
      > --------------------------------

      > Given the combined regex (infra) how do i express the "complement" (ie
      > all strings NOT MATCHING this regex) as a new regex ?

      As a new regex? Won't the !~ operator do what you want?

      > COMBINED REGEX:
      > (CX36(5|6))|(JA30[0-2])|(JA3(([2-8]\d)|(9[0-4])))|(JA5.*)|
      > (JA6((0\d)|(1[0-3])))|(JA64[7-9])|(JA687.*)|(JA74[0-3])|(JB5.*)|
      > (JY(((1|2)\d\d)|(3[0-3]\d)))|(JY[3-9][5-9]\d)|(JZ51(3|4)00.*)
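For readers following along, here is a minimal sketch of what the !~ suggestion amounts to when the match can be negated on the Perl side. The shortened three-branch pattern, the ^ anchor and the sample codes are assumptions chosen for illustration, not the poster's actual data.

```perl
use strict;
use warnings;

# Shortened stand-in for the combined regex (first three alternatives only).
# Anchoring at the start of the string is an assumption.
my $combined = qr/^(?:CX36[56]|JA30[0-2]|JA3(?:[2-8]\d|9[0-4]))/;

# Hypothetical sample codes.
my @codes = qw(CX365 CX364 JA301 JA395 JB100);

# !~ is true exactly when =~ would be false, so this keeps the
# codes that the combined regex does NOT match.
my @not_matching = grep { $_ !~ $combined } @codes;

print "@not_matching\n";
```

This prints "CX364 JA395 JB100". It only helps when the calling code can be changed; it does not produce a new regex that can be handed to another program.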
    • Allan Dystrup
      Message 2 of 20, Dec 1, 2004
        Hi JE,

        No, unfortunately not.

        I have to pass the "negated" RE to the parsing program as a new RE,
        so I'll have to come up with a "complement RE" for:

        (CX36(5|6))|(JA30[0-2])|(JA3(([2-8]\d)|(9[0-4])))|(JA5.*)|
        (JA6((0\d)|(1[0-3])))|(JA64[7-9])|(JA687.*)|(JA74[0-3])|(JB5.*)|
        (JY(((1|2)\d\d)|(3[0-3]\d)))|(JY[3-9][5-9]\d)|(JZ51(3|4)00.*)

        If that proves too hard, I'll have to hack the parser,
        but I'd rather not...

        Allan


        --- In perl-beginner@yahoogroups.com, "J.E. Cripps" <cycmn@n...>
        wrote:
        >
        > > --------------------------------
        > > And then i face another problem:
        > > --------------------------------
        >
        > > Given the combined regex (infra) how do i express the "complement" (ie
        > > all strings NOT MATCHING this regex) as a new regex ?
        >
        > as a new regex? the !~ op. won't do what you want?
        >
        > > COMBINED REGEX:
      • J.E. Cripps
        Message 3 of 20, Dec 1, 2004
          On Wed, 1 Dec 2004, Allan Dystrup wrote:

          > I have to pass the "negated" RE as a new RE to the parsing program,
          > so i'll have to come up with a "complement RE" for :
          >
          > (CX36(5|6))|(JA30[0-2])|(JA3(([2-8]\d)|(9[0-4])))|(JA5.*)|
          > (JA6((0\d)|(1[0-3])))|(JA64[7-9])|(JA687.*)|(JA74[0-3])|(JB5.*)|
          > (JY(((1|2)\d\d)|(3[0-3]\d)))|(JY[3-9][5-9]\d)|(JZ51(3|4)00.*)

          Hmmm... here's what I have so far:

          # Looking for selected matches on a five character string
          # (What's the name of this string? What does it mean?)

          # 2 letters 3 digits e.g. JB523
          # it would be a good idea to show the original matches, but
          # I do not have the original post. IOW,

          # To match all strings except JB360-JB356 ... etc. etc.

          # using the x modifier to include comments and spacing
          # see page 57 of the Camel books and perlretut

          ############# This is the start of the complement regexp #################

          / # The opening /

          (([AB]|[D-I]|[K-Z])\w)(\d{3}) | #match all but initial J, C
          # then match the three digits
          # the \w will match any alphanumeric,
          # do you have to worry about anomalous data e.g. Kz301 L0500
          #or maybe
          # (^[CJ]\w{1})(\d{3})|

          # or maybe
          # (^[CJ])(...)
          # or (^[CJ])(.*)
          # since you have a similar
          # regexp in one of
          # the J cases

          (C^[X])(\d{3})| # match the initial C other
          # than those followed by X
          # or (C^[X])(...)

          ((CX) ( ([0-2]|[4-9]) \d{1})) | # match CX followed by
          # any digit except 3

          (CX3)([0-57-9]\d{1})| # match CX3 followed by
          # any digit except 6

          (CX36([0-47-9]))| # match CX36 except when
          # followed by 5 or 6

          (J^[ABYZ]))(\d{1})| # match J followed by
          # letters other than ABYZ
          # or (J^[ABYZ])(...)

          (JA)([012489])(..)| # match JA followed by
          # digits other than
          # 3, 5, 6 or 7

          # this one is not the next in your original regexp
          (JB^[5])(..)| # JB except if
          # followed by a 5
          # skipping the J cases that remain
          # I do not have the original post and cannot reconstruct the
          # target data from memory

          # the J cases for which I have not tried to do a complement:
          #(JA30[0-2])|
          #(JA3(([2-8]\d)|(9[0-4])))|
          #(JA6((0\d)|(1[0-3])))|
          #(JA64[7-9])|
          #(JA687.*)|
          #(JA74[0-3])|
          #(JB5.*)|
          #(JY(((1|2)\d\d)|(3[0-3]\d)))|
          #(JY[3-9][5-9]\d)|
          #(JZ51(3|4)00.*)
          # (JZ51(3|4)00.*) # does this have more than three digits in it?
          # # what are those 00s?

          # let's pretend all the complement matches are written
          # and we'll close with the /x

          /x # The closing /x
          ############ This is the end of the Complement Regexp #################
        • J.E. Cripps
          Message 4 of 20, Dec 1, 2004
            errata:

            #((CX) ( ([0-2]|[4-9]) \d{1})) | # match CX followed by
            # any digit except 3
            #should be
            ((CX) (([0-2]|[4-9]) \d{2})) | # or end with (..)


            #(J^[ABYZ]))(\d{1})| # match J followed by
            # letters other than ABYZ
            #should be
            (J^[ABYZ])(\d{3})|
          • Allan Dystrup
            Message 5 of 20, Dec 1, 2004
              Hi JE,

              Yes, I can follow your line of reasoning here, building up the
              "negated" regexes step by step.

              It is workable (though more 'work' than 'able' for my taste...), but
              what I was really looking for was a built-in regex
              operator/metacharacter, like [^ ] for character classes or (?! ) for
              lookahead: a feature that would simply "invert" any given regex, so to
              speak.

              I haven't been able to uncover such a feature, though. Instead I've
              now opened the source of the parser and reversed the Perl matching
              operator (=~ to !~). This is a hack, and it doesn't solve the general
              problem of feeding the parser any RE (in a one-line textbox),
              including any complemented RE. I'm still chewing on that one.

              Thanks,
              Allan


            • J.E. Cripps
              Message 6 of 20, Dec 1, 2004
                On Wed, 1 Dec 2004, Allan Dystrup wrote:

                > Yes i can follow your line of reasoning here, step by step building up
                > the "negated" Regex'es. ...
                > It is workable (actually more 'work' than 'able' to my taste...), so
                > what i was really looking for was a built-in Regex
                > operator/metacharacter like [^ ] for char. classes or (?! ) for
                > lookahead, a feature that would just "invert" any given Regex, so to
                > speak.

                Everything I've seen indicates that there isn't any such feature,
                and complementing a regexp tends to be laborious, messy or both.


                another error in a previous message of mine:

                > > (C^[X])(\d{3})| # match the initial C

                the ^ should be _inside_ the [ ], i.e.

                (C[^X])(\d{3})

                > > (J^[ABYZ]))(\d{1})| # match J followed by

                which should be (with \d{3}, as in the earlier erratum, and
                without the stray closing parenthesis)

                (J[^ABYZ])(\d{3})
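A hand-built complement like this is easy to get subtly wrong, so it can help to check it by brute force against the negated original. The sketch below does that for the C-prefix branches only, with the caret moved inside the brackets and the character classes written out as "any digit except ...". It assumes the codes are exactly C, one uppercase letter and three digits; that domain and the variable names are illustrative assumptions. The same caret fix would presumably apply to the other ^[...] forms in the draft.

```perl
use strict;
use warnings;

# The original branch being complemented, anchored to a five-character code.
my $original = qr/^CX36[56]$/;

# Hand-written complement for codes of the form C + letter + 3 digits.
my $by_hand = qr/
    ^(?:
        C[^X]\d{3}        # C followed by any character other than X
      | CX[0-24-9]\d{2}   # CX followed by any digit except 3
      | CX3[0-57-9]\d     # CX3 followed by any digit except 6
      | CX36[0-47-9]      # CX36 followed by any digit except 5 or 6
    )$
/x;

# Enumerate every code in the assumed domain and compare the two answers.
my ($mismatches, $accepted) = (0, 0);
for my $letter ('A' .. 'Z') {
    for my $n (0 .. 999) {
        my $code = sprintf 'C%s%03d', $letter, $n;
        my $hand = $code =~ $by_hand  ? 1 : 0;   # complement regex says "yes"
        my $neg  = $code !~ $original ? 1 : 0;   # original regex says "no"
        $accepted++   if $hand;
        $mismatches++ if $hand != $neg;
    }
}

print "accepted=$accepted mismatches=$mismatches\n";
```

Over the 26,000 codes in this domain the hand-written pattern should accept everything except CX365 and CX366, i.e. 25,998 codes with no mismatches. The check says nothing about strings outside the assumed domain.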
              • Jonathan Paton
                Message 7 of 20, Dec 1, 2004
                  Dear Allan,

                  I think you are looking for:

                  "(?!pattern)"
                  A zero-width negative look-ahead assertion. For example
                  "/foo(?!bar)/" matches any occurrence of "foo" that isn't
                  followed by "bar". Note however that look-ahead and look-
                  behind are NOT the same thing. You cannot use this for
                  look-behind.

                  ...

                  from perldoc perlre

                  You might need to wrap the regex with ^ and $ assertions.

                  Jonathan Paton
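A small sketch of why the anchors matter when a negative look-ahead is used to express "does not match". The sample strings are made up for illustration.

```perl
use strict;
use warnings;

my @samples = qw(foo food bar);

# Unanchored: succeeds if there is ANY position not followed by "foo",
# which is true for practically every string (including "foo" itself).
my @loose = grep { /(?!foo)/ } @samples;

# Anchored at the start: strings that do not begin with "foo".
my @prefix = grep { /^(?!foo)/ } @samples;

# Anchored at both ends: strings that are not exactly "foo".
my @exact = grep { /^(?!foo$)/ } @samples;

print "loose:  @loose\n";
print "prefix: @prefix\n";
print "exact:  @exact\n";
```

The unanchored form keeps all three strings, the ^ form keeps only "bar", and the ^ ... $ form keeps "food" and "bar". Which anchoring is right depends on how the consuming program applies the regex.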
                • Allan Dystrup
                  Message 8 of 20, Dec 1, 2004
                    Hi Jonathan ,

                    Yes indeed, I've reached the same conclusion.
                    (?!pattern) solves the issue about as cleanly as
                    is probably possible with regexes. E.g.:

                    Range        RegEx                    Complement
                    -----------  -----------------------  ----------------------------
                    CX365-CX366  CX36(5|6)                ^(?!CX36(5|6))
                    JA300-JA302  JA30[0-2]                ^(?!JA30[0-2])
                    JA320-JA394  JA3(([2-8]\d)|(9[0-4]))  ^(?!JA3(([2-8]\d)|(9[0-4])))
                    etc.

                    Thanks a lot,
                    Allan
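Rather than complementing each range separately, the whole alternation can probably be wrapped in one negative look-ahead. The sketch below assumes the codes sit at the start of the string (as the ^ in the table implies), balances the parentheses of the first JY branch so that the pattern compiles, and uses a made-up list of test codes. It then compares the result with plain !~ negation.

```perl
use strict;
use warnings;

# The combined regex from the thread, split over lines with the x modifier.
# Assumption: the first JY branch is written with balanced parentheses.
my $combined = qr/
      (CX36(5|6)) | (JA30[0-2]) | (JA3(([2-8]\d)|(9[0-4]))) | (JA5.*)
    | (JA6((0\d)|(1[0-3]))) | (JA64[7-9]) | (JA687.*) | (JA74[0-3]) | (JB5.*)
    | (JY(((1|2)\d\d)|(3[0-3]\d))) | (JY[3-9][5-9]\d) | (JZ51(3|4)00.*)
/x;

# One negative look-ahead around the whole alternation, anchored at the start.
my $complement = qr/^(?!$combined)/;

# Hypothetical test codes on both sides of several range boundaries.
my @codes = qw(CX365 CX367 JA302 JA303 JA500 JA613 JA614
               JB523 JB423 JY339 JY340 JZ513001);

my @via_lookahead = grep { $_ =~ $complement } @codes;
my @via_negation  = grep { $_ !~ $combined }   @codes;

print "lookahead: @via_lookahead\n";
print "negation:  @via_negation\n";
```

Both lines list CX367 JA303 JA614 JB423 JY340 for these inputs. The two approaches can differ for strings where a code appears somewhere other than the start, because !~ against the unanchored pattern searches the whole string while ^(?!...) only looks at the beginning.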

