Re: [PBML] Compact digital range RegEx - And complement?

  • J.E. Cripps
    Message 1 of 20 , Dec 1, 2004
      On Wed, 1 Dec 2004, Allan Dystrup wrote:

      > I have to pass the "negated" RE as a new RE to the parsing program,
      > so i'll have to come up with a "complement RE" for :
      >
      > (CX36(5|6))|(JA30[0-2])|(JA3(([2-8]\d)|(9[0-4])))|(JA5.*)|
      > (JA6((0\d)|(1[0-3])))|(JA64[7-9])|(JA687.*)|(JA74[0-3])|(JB5.*)|(JY
      > (((1|2)\d\d)|(3[0-3]\d))))|(JY[3-9][5-9]\d)|(JZ51(3|4)00.*)

      Hmmm... here's what I have so far:

      # Looking for selected matches on a five character string
      # (What's the name of this string? What does it mean?)

      # 2 letters 3 digits e.g. JB523
      # would be a good idea to show the original matches but
      # I do not have the original post. IOW,

      # To match all strings except JB360-JB356 ... etc. etc.

      # using the x modifier to include comments and spacing
      # see page 57 of the Camel books and perlretut

      ############# This is the start of the complement regexp #################

      / # The opening /

      (([AB]|[D-I]|[K-Z])\w)(\d{3}) | #match all but initial J, C
      # then match the three digits
      # the \w will match any alphanumeric,
      # do you have to worry about anomalous data e.g. Kz301 L0500
      #or maybe
      # (^[CJ]\w{1})(\d{3})|

      # or maybe
      # (^[CJ])(...)
      # or (^[CJ])(.*)
      # since you have a similar
      # regexp in one of
      # the J cases

      (C^[X])(\d{3})| # match the initial C other
      # than those followed by X
      # or (C^[X])(...)

      ((CX) ( ([0-2]|[4-9]) \d{1})) | # match CX followed by
      # any digit except 3

      (CX3)([0-57-9]\d{1})| # match CX3 followed by
      # any digit except 6

      (CX36([0-47-9]))| # match CX36 except when
      # followed by 5 or 6

      (J^[ABYZ]))(\d{1})| # match J followed by
      # letters other than ABYZ
      # or (J^[ABYZ])(...)

      (JA)([012489])(..)| # match JA followed by
      # digits other than
      # 3, 5, 6 or 7

      # this one is not the next in your original regexp
      (JB[^5])(..)| # JB except if
      # followed by a 5
      # skipping the J cases that remain
      # I do not have the original post and cannot reconstruct the
      # target data from memory

      # the J cases for which I have not tried to do a complement:
      #(JA30[0-2])|
      #(JA3(([2-8]\d)|(9[0-4])))|
      #(JA6((0\d)|(1[0-3])))|
      #(JA64[7-9])|
      #(JA687.*)|
      #(JA74[0-3])|
      #(JB5.*)|
      #(JY(((1|2)\d\d)|(3[0-3]\d))))|
      #(JY[3-9][5-9]\d)|
      #(JZ51(3|4)00.*)
      # (JZ51(3|4)00.*) # does this have more than three digits in it?
      # # what are those 00s?

      # let's pretend all the complement matches are written
      # and we'll close with the /x

      /x # The closing /x
      ############ This is the end of the Complement Regexp #################
    • J.E. Cripps
      Message 2 of 20 , Dec 1, 2004
        errata:

        #((CX) ( ([0-2]|[4-9]) \d{1})) | # match CX followed by
        # any digit except 3
        #should be
        ((CX) (([0-2]|[4-9]) \d{2})) | # or end with (..)


        #(J^[ABYZ]))(\d{1})| # match J followed by
        # letters other than ABYZ
        #should be
        (J^[ABYZ])(\d{3})|
      • Allan Dystrup
        Message 3 of 20 , Dec 1, 2004
          Hi JE,

          Yes, I can follow your line of reasoning here, building up the
          "negated" regexes step by step.

          It is workable (actually more 'work' than 'able' to my taste...), so
          what I was really looking for was a built-in regex
          operator/metacharacter, like [^ ] for character classes or (?! ) for
          lookahead: a feature that would just "invert" any given regex, so to
          speak.

          I haven't been able to uncover such a feature, though. Instead I've
          now opened the source of the parser and reversed the Perl matching
          operator (=~ to !~). This is a hack, and it doesn't solve the general
          problem of feeding the parser any RE (in a one-line text box),
          including any complemented RE. I'm still chewing on that one.

          Thanks,
          Allan
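For readers following along, the operator swap described above can be sketched as follows. This is only an illustration: the pattern is a subset of the alternatives from the original post, and `outside_ranges` is a hypothetical helper name, not anything from Allan's parser.

```perl
use strict;
use warnings;

# A subset of the alternatives from the original post (assumption: the
# remaining alternatives would be joined in with | in the same way).
my $ranges = qr/CX36(5|6)|JA30[0-2]|JA3(([2-8]\d)|(9[0-4]))|JB5.*/;

# Hypothetical helper: true when the code is NOT in any listed range.
sub outside_ranges {
    my ($code) = @_;
    return ($code !~ $ranges) ? 1 : 0;    # !~ is the negated form of =~
}

for my $code (qw(CX365 CX367 JA301 JA395 JB523 KZ301)) {
    printf "%s %s\n", $code, outside_ranges($code) ? 'kept' : 'excluded';
}
```

With these sample codes, CX365, JA301 and JB523 are reported as excluded and the other three as kept. The drawback Allan mentions remains: the negation lives in the program, not in the RE itself.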


        • J.E. Cripps
          Message 4 of 20 , Dec 1, 2004
            On Wed, 1 Dec 2004, Allan Dystrup wrote:

            > Yes i can follow your line of reasoning here, step by step building up
            > the "negated" Regex'es. ...
            > It is workable (actually more 'work' than 'able' to my taste...), so
            > what i was really looking for was a built-in Regex
            > operator/metacharacter like [^ ] for char. classes or (?! ) for
            > lookahead, a feature that would just "invert" any given Regex, so to
            > speak.

            Everything I've seen indicates that there isn't any such feature,
            and that complementing a regexp tends to be laborious, messy or both.


            another error in a previous message of mine:

            > > (C^[X])(\d{3})| # match the initial C

            the ^ should be _inside_ the [ ], i.e.

            (C[^X])(\d{3})

            > > (J^[ABYZ]))(\d{1})| # match J followed by

            which should be (with \d{3}, per the earlier erratum)

            (J[^ABYZ])(\d{3})
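To show how the hand-built approach looks once the character classes are written with the ^ inside the brackets, here is a minimal sketch for the CX branch only. It assumes five-character codes (C, one more character, three digits); `by_hand_match` is a hypothetical helper name, and the other branches are not attempted here.

```perl
use strict;
use warnings;

# Hand-built complement of CX36[56], assuming codes of the form
# C + one character + three digits.
my $by_hand = qr/
    ^(?:
        C[^X]\d{3}          # C followed by anything but X
      | CX[0-24-9]\d{2}     # CX, then a first digit other than 3
      | CX3[0-57-9]\d       # CX3, then a second digit other than 6
      | CX36[0-47-9]        # CX36, then a last digit other than 5 or 6
    )$
/x;

# Hypothetical helper wrapping the match as a 0 or 1 result.
sub by_hand_match {
    my ($code) = @_;
    return $code =~ $by_hand ? 1 : 0;
}

for my $code (qw(CX365 CX366 CX364 CX355 CX265 CA365)) {
    printf "%s %d\n", $code, by_hand_match($code);
}
```

Each alternative peels off one more character of the excluded prefix, which is why the effort grows so quickly for the longer J cases.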
          • Jonathan Paton
            Message 5 of 20 , Dec 1, 2004
              Dear Allan,

              I think you are looking for:

              "(?!pattern)"
              A zero-width negative look-ahead assertion. For example
              "/foo(?!bar)/" matches any occurrence of "foo" that isn't
              followed by "bar". Note however that look-ahead and look-
              behind are NOT the same thing. You cannot use this for
              look-behind.

              ...

              from perldoc perlre

              You might need to wrap the regex with ^ and $ assertions.

              Jonathan Paton
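A small sketch of why the ^ anchor matters with a negative look-ahead. The first sub is the perlre example quoted above; the JB5 prefix is borrowed from the original post, and all three sub names are hypothetical.

```perl
use strict;
use warnings;

# The perlre example: "foo" not immediately followed by "bar".
sub foo_without_bar {
    my ($text) = @_;
    return $text =~ /foo(?!bar)/ ? 1 : 0;
}

# Unanchored: succeeds as soon as ANY position is not followed by JB5.
sub unanchored_not_jb5 {
    my ($code) = @_;
    return $code =~ /(?!JB5)/ ? 1 : 0;
}

# Anchored: the assertion is tested only at the start of the string.
sub anchored_not_jb5 {
    my ($code) = @_;
    return $code =~ /^(?!JB5)/ ? 1 : 0;
}

for my $code (qw(JB523 JB423)) {
    printf "%s unanchored=%d anchored=%d\n",
        $code, unanchored_not_jb5($code), anchored_not_jb5($code);
}
```

Without the anchor, JB523 still matches, because the engine simply moves on to the second character, where "JB5" no longer follows. Only the anchored form rejects it.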
            • Allan Dystrup
              Message 6 of 20 , Dec 1, 2004
                Hi Jonathan,

                Yes indeed, I've reached the same conclusion.
                The (?!pattern) construct solves the issue about as cleanly
                as is probably possible with regexes. E.g.:

                Range        RegEx                    Complement
                -----------  -----------------------  ----------------------------
                CX365-CX366  CX36(5|6)                ^(?!CX36(5|6))
                JA300-JA302  JA30[0-2]                ^(?!JA30[0-2])
                JA320-JA394  JA3(([2-8]\d)|(9[0-4]))  ^(?!JA3(([2-8]\d)|(9[0-4])))
                etc.

                Thanks a lot,
                Allan
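One caveat when turning the per-range complements into a single RE for a one-line text box: they cannot simply be joined with |, because a code that fails one look-ahead nearly always passes another. All ranges have to go inside one look-ahead (or be chained as consecutive look-aheads). A minimal sketch, using only the three ranges from the table; the variable and sub names are hypothetical.

```perl
use strict;
use warnings;

# The three ranges from the table, as pattern strings
# (single quotes keep the \d intact).
my @ranges = ('CX36(5|6)', 'JA30[0-2]', 'JA3(([2-8]\d)|(9[0-4]))');

# One look-ahead over the alternation of all ranges.
my $complement = '^(?!' . join('|', map { "(?:$_)" } @ranges) . ')';

# Tempting but wrong: OR-ing the individual complements.
my $or_joined = join '|', map { "^(?!$_)" } @ranges;

sub in_complement { my ($code) = @_; return $code =~ /$complement/ ? 1 : 0 }
sub in_or_joined  { my ($code) = @_; return $code =~ /$or_joined/  ? 1 : 0 }

print "$complement\n";
for my $code (qw(CX365 JA302 JA394 JA395 JB100)) {
    printf "%s complement=%d or_joined=%d\n",
        $code, in_complement($code), in_or_joined($code);
}
```

The OR-joined version accepts every sample code, including CX365, while the single look-ahead rejects CX365, JA302 and JA394 and accepts the rest.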

