
Re: [Clip] Need help figuring out special RegExpr rule

  • Alec Burgess
    Message 1 of 8 , May 7, 2013
      This turns out to require a neat use of positive lookaheads. I think
      it was sometime in the last year that I gained the "AHA" insight that a
      positive lookahead can precede the desired match rather than follow
      it (from Flo? or perhaps the RegexBuddy forum?) and so be used to
      discard potential matches that fit only a less restrictive description.

      Here's the regexp to identify matching 8 character groups (on multiple
      lines for readability):
      \b
      (?=.{0,8}[a-z].{0,8}[a-z]) # within 2 runs of up to 8 characters insist
                                 # on 2 lowercase else NOMATCH
      (?=.{0,8}[A-Z].{0,8}[A-Z]) # ditto for uppercase else NOMATCH
      (?=.{0,8}[0-9])            # within 1 run of 8 characters insist on 1 digit (more
                                 # than 1 is still allowed but not mandatory) else NOMATCH
      ([a-zA-Z0-9]{8})           # finally match the 8 upper/lower/digit combo subject
                                 # to preceding checks.
      \b

      Astute readers will notice this is not quite perfect, e.g. the second
      lowercase letter could actually be matched in a second word, because
      the . in each lookahead can run past the end of the 8 character group.
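
The caveat can be reproduced with a short Python sketch. LOOSE is the one-line pattern from this post; STRICT is a hypothetical tightened variant (not something proposed in the thread) whose lookaheads may only step over alphanumerics, so they cannot wander into the next word. Python's `re` module is assumed here purely for testing; the target filter program may behave differently.

```python
import re

# The one-line pattern from the post, unchanged.
LOOSE = re.compile(
    r"\b(?=.{0,8}[a-z].{0,8}[a-z])(?=.{0,8}[A-Z].{0,8}[A-Z])"
    r"(?=.{0,8}[0-9])([a-zA-Z0-9]{8})\b"
)

# Hypothetical stricter variant (an assumption, not from the thread):
# each lookahead may only cross alphanumerics, so it stays inside the group.
STRICT = re.compile(
    r"\b(?=[a-zA-Z0-9]*[a-z][a-zA-Z0-9]*[a-z])"
    r"(?=[a-zA-Z0-9]*[A-Z][a-zA-Z0-9]*[A-Z])"
    r"(?=[a-zA-Z0-9]*[0-9])([a-zA-Z0-9]{8})\b"
)

def first_group(pattern, text):
    """Return the first captured 8-character group, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(1) if m else None

# "ABCDEF1g" contains only one lowercase letter; the second one that the
# loose lookahead finds is borrowed from the following word "hello".
text = "ABCDEF1g hello"
print(first_group(LOOSE, text))        # the loose pattern accepts the group
print(first_group(STRICT, text))       # the strict variant rejects it
print(first_group(STRICT, "tR4hGGUK")) # a genuine sample still matches
```

Inside the URL wrapping the group is followed by /index.html, so the loose lookaheads can borrow lowercase letters from "index" in the same way.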

      On one line suitable for use in a clip:
      \b(?=.{0,8}[a-z].{0,8}[a-z])(?=.{0,8}[A-Z].{0,8}[A-Z])(?=.{0,8}[0-9])([a-zA-Z0-9]{8})\b

      And the same thing within the wrapping that requires the match to be
      within a URL, as Wizcrafts requested (probably one long line):
      "http://[a-z0-9-_.]+\.[a-z]{2,4}/(?=.{0,8}[a-z].{0,8}[a-z])(?=.{0,8}[A-Z].{0,8}[A-Z])(?=.{0,8}[0-9])([a-zA-Z0-9]{8})/index\.html"

      This matches: --> "http://somedomain.comx/aaAA112b/index.html"
      but not ------------> "http://somedomain.comx/Mytrip12/index.html"
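
A quick way to check the wrapped rule against these two URLs and Wizcrafts' three sample groups is sketched below in Python. It assumes the surrounding double quotes are delimiters for the filter program rather than part of the expression, and that Python's `re` treats the pattern the same way the filter does; `flagged_dir` and the somedomain.comx host are just illustrative names.

```python
import re

# The URL rule from the post, with the outer double quotes dropped
# (assumption: they are delimiters, not part of the regular expression).
URL_RULE = re.compile(
    r"http://[a-z0-9-_.]+\.[a-z]{2,4}/"
    r"(?=.{0,8}[a-z].{0,8}[a-z])(?=.{0,8}[A-Z].{0,8}[A-Z])(?=.{0,8}[0-9])"
    r"([a-zA-Z0-9]{8})/index\.html"
)

def flagged_dir(url):
    """Return the 8-character directory name if the rule fires, else None."""
    m = URL_RULE.search(url)
    return m.group(1) if m else None

# The test directory from the post, the three spam samples, and the
# ordinary-looking name that must not trigger the filter.
for name in ["aaAA112b", "tR4hGGUK", "2UJeiy9m", "WbNDSk9e", "Mytrip12"]:
    url = f"http://somedomain.comx/{name}/index.html"
    print(name, "->", flagged_dir(url))
```

Only Mytrip12 should come back as None, since it has a single uppercase letter.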

      Note that \.[a-z]{2,4}, meant to match .com, .uk, .net etc., isn't
      quite correct since it misses top-level domains longer than four
      letters (.museum etc.), but it could be tweaked if needed.

      On 2013-05-07 18:16, Don Daugherty wrote:
      > On 5/26/2012 12:43 AM, Wizcrafts wrote:
      > > I am stumped trying to create a regular expression filter that will
      > > match all of the following conditions. It is part of a larger spam filter.
      > >
      > > Conditions:
      > >
      > > 1: There is a continuous group consisting of exactly 8 alphanumeric
      > > characters, all standard ASCII. Inside this group are the following
      > > conditions:
      > >
      > > 2: There are at least 2 uppercase letters
      > >
      > > 3: There are at least 2 lowercase letters
      > >
      > > 4: there is at least one number between 0-9 (usually 1 or 2)
      > >
      > > 5: There are no other characters of any kind (no spaces,
      > > punctuation, etc)
      > >
      > > Here are 3 examples of the group I am trying to match:
      > >
      > > tR4hGGUK
      > > 2UJeiy9m
      > > WbNDSk9e
      > >
      > > These groups of 8 are all different from one another and are used in
      > > spam runs leading to the BlackHole Exploit Kit. I need a specific
      > > Regular Expression that will detect this type of mixed case
      > > alphanumeric directory name and trigger my filter.
      > >
      > > Right now I am using the simplistic condition: (?i)[a-zA-Z0-9]{8}
      > >
      > > The group rests inside the following filter:
      > >
      > > "http://[a-z0-9-_.]+\.[a-z]{2,4}/(?i)[a-zA-Z0-9]{8}/index\.html"
      > >
      > > The only switch I can use is the case switch: (?i)
      > >
      > > The trailing letter switches appended in NoteTab REs will not work
      > > in the program I write the filters for (no RAWS, etc).
      > >
      > > Unfortunately, my current rule would also match Mytrip12, which is
      > > nothing like the gibberish characters shown above. In all of the URLs
      > > I have analyzed, they always have a mix of upper and lower case
      > > letters and one or two numbers, probably formed by a random character
      > > generator. I have not seen a recognizable word yet.
      > >
      > > Thanks in advance for any help.
      > >
      > >
      > I recently came across your message from almost a year ago. Did you get
      > a solution?
    • flo.gehrke
      Message 2 of 8 , May 10, 2013
        --- In ntb-clips@yahoogroups.com, Alec Burgess <buralex@...> wrote:
        >
        > On one line suitable for use in a clip:
        > \b(?=.{0,8}[a-z].{0,8}[a-z])(?=.{0,8}[A-Z].{0,8}[A-Z])(?=.{0,8}[0-9])([a-zA-Z0-9]{8})\b

        Alec,

        That's an interesting alternative to the pattern I posted with...

        http://tech.groups.yahoo.com/group/ntb-clips/message/22759

        on May 28, 2012.

        However, it causes some backtracking that you may want to avoid. Tested with Regex Coach against 'tR4hGGUK', your pattern needs 120 steps to achieve a match.

        My proposal of May 2012 -- without Posix character classes and changed a bit...

        \b(?=.*?[a-z].*?[a-z])(?=.*?[A-Z].*?[A-Z])(?=[^\d]*\d)[a-zA-Z0-9]{8}\b

        needs only 45 steps in this case. This could also be written as...

        \b(?=.*?([[:lower:]]).*?(?1))(?=.*?([[:upper:]]).*?(?2))(?=[^\d]*\d.*$)[[:alnum:]]{8}\b
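
The first of these two patterns can be tried out in Python as sketched below. As far as I know, Python's `re` module does not accept the POSIX classes or the (?1) subroutine calls of the second form, so only the plain version is exercised; a PCRE-style engine would be needed for the other. The step counts quoted above come from Regex Coach and are not reproduced here; `flo_matches` is just an illustrative helper name.

```python
import re

# The plain (non-POSIX) form of the pattern from this message, unchanged.
FLO = re.compile(
    r"\b(?=.*?[a-z].*?[a-z])(?=.*?[A-Z].*?[A-Z])(?=[^\d]*\d)[a-zA-Z0-9]{8}\b"
)

def flo_matches(text):
    """True if the pattern finds a qualifying 8-character group in text."""
    return FLO.search(text) is not None

# Wizcrafts' three sample groups, then the name that must be rejected.
samples = ["tR4hGGUK", "2UJeiy9m", "WbNDSk9e"]
print([flo_matches(s) for s in samples])
print(flo_matches("Mytrip12"))
```

All three samples should be accepted and Mytrip12 rejected, because it contains only one uppercase letter.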

        For checking...

        > This matches: --> "http://somedomain.comx/aaAA112b/index.html"
        > but not ------------> "http://somedomain.comx/Mytrip12/index.html"

        we would have to use...

        http://somedomain.comx/(?=.*?([[:lower:]]).*?(?1))(?=.*?([[:upper:]]).*?(?2))(?=[^\d]*\d.*$)[[:alnum:]]{8}/index.html

        (All patterns in one long line -- no line breaks!)

        Regards,
        Flo