Re: [Clip] Need help figuring out special RegExpr rule

  • Don Daugherty
    Message 1 of 8 , May 7, 2013
      On 5/26/2012 12:43 AM, Wizcrafts wrote:
      > I am stumped trying to create a regular expression filter that will match all of the following conditions. It is part of a larger spam filter.
      >
      > Conditions:
      >
      > 1: There is a continuous group consisting of exactly 8 alphanumeric characters, all standard ASCII. Inside this group are the following conditions:
      >
      > 2: There are at least 2 uppercase letters
      >
      > 3: There are at least 2 lowercase letters
      >
      > 4: there is at least one number between 0-9 (usually 1 or 2)
      >
      > 5: There are no other characters of any kind (no spaces, punctuation, etc)
      >
      > Here are 3 examples of the group I am trying to match:
      >
      > tR4hGGUK
      > 2UJeiy9m
      > WbNDSk9e
      >
      > These groups of 8 are all different from one another and are used in spam runs leading to the BlackHole Exploit Kit. I need a specific Regular Expression that will detect this type of mixed case alphanumeric directory name and trigger my filter.
      >
      > Right now I am using the simplistic condition: (?i)[a-zA-Z0-9]{8}
      >
      > The group rests inside the following filter:
      >
      > "http://[a-z0-9-_.]+\.[a-z]{2,4}/(?i)[a-zA-Z0-9]{8}/index\.html"
      >
      > The only switch I can use is the case switch: (?i)
      >
      > The trailing letter switches appended in NoteTab REs will not work in the program I write the filters for (no RAWS, etc).
      >
      > Unfortunately, my current rule would also match Mytrip12, which is nothing like the gibberish characters shown above. In all of the URLs I have analyzed, they always have a mix of upper and lower case letters and one or two numbers, probably formed by a random character generator. I have not seen a recognizable word yet.
      >
      > Thanks in advance for any help.
      >
      >
      I recently came across your message from almost a year ago. Did you get
      a solution?
    • Alec Burgess
      Message 2 of 8 , May 7, 2013
        This turns out to require a neat use of positive lookaheads. Sometime
        in the last year I had the "AHA" insight (from Flo? or perhaps the
        RegexBuddy forum?) that a positive lookahead can precede the desired
        match rather than follow it, and can then be used to discard potential
        matches that a less restrictive pattern would accept.

        Here's the regexp to identify matching 8 character groups (on multiple
        lines for readability):
        \b
        (?=.{0,8}[a-z].{0,8}[a-z]) # within 2 runs of up to 8 characters insist on 2 lowercase else NOMATCH
        (?=.{0,8}[A-Z].{0,8}[A-Z]) # ditto for uppercase else NOMATCH
        (?=.{0,8}[0-9]) # within 1 run of up to 8 characters insist on 1 digit (more than 1 is allowed but not mandatory) else NOMATCH
        ([a-zA-Z0-9]{8}) # finally match the 8 upper/lower/digit combo subject to preceding checks
        \b

        Astute readers will notice this is not quite perfect - e.g. the second
        lowercase letter could actually be matched in a second word, because
        the "." in the lookaheads is not restricted to the 8-character group.
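
The multi-line form and its caveat can be tried in any flavor with a free-spacing mode. What follows is a minimal sketch in Python, which is an assumption on my part since the thread targets NoteTab and an unnamed filter program. The names GROUP and STRICT and the test string "MYTRIp12 x" are made up for illustration; STRICT is one possible tightening, not something proposed in the thread.

```python
import re

# The multi-line pattern; re.VERBOSE ignores whitespace and "#" comments,
# so every comment has to stay on a single line.
GROUP = re.compile(r"""
    \b
    (?=.{0,8}[a-z].{0,8}[a-z])   # insist on 2 lowercase letters
    (?=.{0,8}[A-Z].{0,8}[A-Z])   # insist on 2 uppercase letters
    (?=.{0,8}[0-9])              # insist on 1 digit
    ([a-zA-Z0-9]{8})             # the 8-character group itself
    \b
""", re.VERBOSE)

samples = ["tR4hGGUK", "2UJeiy9m", "WbNDSk9e"]
matched = [s for s in samples if GROUP.search(s)]
print(matched)                    # the three sample groups
print(GROUP.search("Mytrip12"))   # None: only one uppercase letter

# The caveat: "MYTRIp12" has a single lowercase letter, but the lookahead
# finds its second one in the following word.
leak = GROUP.search("MYTRIp12 x")
print(leak.group(1) if leak else None)   # MYTRIp12

# Hypothetical tightening: let the lookaheads walk over alphanumerics only,
# so they cannot leave the group.
STRICT = re.compile(
    r"\b(?=[a-zA-Z0-9]*[a-z][a-zA-Z0-9]*[a-z])"
    r"(?=[a-zA-Z0-9]*[A-Z][a-zA-Z0-9]*[A-Z])"
    r"(?=[a-zA-Z0-9]*[0-9])"
    r"([a-zA-Z0-9]{8})\b"
)
print(STRICT.search("MYTRIp12 x"))       # None
```

The tightened lookaheads may still read past the eighth character of a longer alphanumeric run, but then the closing \b fails, so only groups of exactly 8 get through.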

        On one line suitable for use in a clip:
        \b(?=.{0,8}[a-z].{0,8}[a-z])(?=.{0,8}[A-Z].{0,8}[A-Z])(?=.{0,8}[0-9])([a-zA-Z0-9]{8})\b

        And the same thing within the wrapping that requires the match to be
        within a URL, as Wizcrafts requested (probably a long line):
        "http://[a-z0-9-_.]+\.[a-z]{2,4}/(?=.{0,8}[a-z].{0,8}[a-z])(?=.{0,8}[A-Z].{0,8}[A-Z])(?=.{0,8}[0-9])([a-zA-Z0-9]{8})/index\.html"

        This matches: --> "http://somedomain.comx/aaAA112b/index.html"
        but not ------------> "http://somedomain.comx/Mytrip12/index.html"
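
The two URLs above can be checked mechanically. This is a rough sketch in Python (an assumption; the target filter program is not named in the thread), with the surrounding double quotes of the filter dropped and the hypothetical names URL_RULE and flagged introduced for illustration.

```python
import re

# The URL-wrapped rule, split over several string literals for readability.
URL_RULE = re.compile(
    r"http://[a-z0-9-_.]+\.[a-z]{2,4}/"
    r"(?=.{0,8}[a-z].{0,8}[a-z])"
    r"(?=.{0,8}[A-Z].{0,8}[A-Z])"
    r"(?=.{0,8}[0-9])"
    r"([a-zA-Z0-9]{8})/index\.html"
)

def flagged(url: str) -> bool:
    """Return True when the rule finds a match anywhere in the URL."""
    return URL_RULE.search(url) is not None

print(flagged("http://somedomain.comx/aaAA112b/index.html"))  # True
print(flagged("http://somedomain.comx/Mytrip12/index.html"))  # False
print(flagged("http://somedomain.comx/tR4hGGUK/index.html"))  # True
```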

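        Note \.[a-z]{2,4} to match .com, .uk, .net etc isn't quite correct since
        it misses TLDs longer than four letters such as .museum (multi-part
        suffixes like .on.ca still get through, because the preceding class
        includes the dot), but it could be tweaked if needed.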
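
Which suffixes the domain part of the rule accepts can be probed in isolation. A small sketch in Python (an assumption, as is the name DOMAIN; the example.* hosts are placeholders, not URLs from the spam runs):

```python
import re

# Only the scheme-and-host part of the filter, up to the first slash.
DOMAIN = re.compile(r"http://[a-z0-9-_.]+\.[a-z]{2,4}/")

hosts = [
    "http://example.com/",
    "http://example.on.ca/",
    "http://example.museum/",
]
accepted = {h: DOMAIN.match(h) is not None for h in hosts}
for host, ok in accepted.items():
    print(host, ok)
# http://example.com/ True
# http://example.on.ca/ True      (the class before \. also matches dots)
# http://example.museum/ False    (final label is longer than 4 letters)
```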

      • flo.gehrke
        Message 3 of 8 , May 10, 2013
          --- In ntb-clips@yahoogroups.com, Alec Burgess <buralex@...> wrote:
          >
          > On one line suitable for use in a clip:
          > \b(?=.{0,8}[a-z].{0,8}[a-z])(?=.{0,8}[A-Z].{0,8}[A-Z])(?=.{0,8}[0-9])([a-zA-Z0-9]{8})\b

          Alec,

          That's an interesting alternative to the pattern I posted with...

          http://tech.groups.yahoo.com/group/ntb-clips/message/22759

          on May 28, 2012.

          However, it causes some backtracking that you may want to avoid. Tested with Regex Coach against 'tR4hGGUK', your pattern needs 120 steps to achieve a match.

          My proposal of May 2012 -- without POSIX character classes and changed a bit...

          \b(?=.*?[a-z].*?[a-z])(?=.*?[A-Z].*?[A-Z])(?=[^\d]*\d)[a-zA-Z0-9]{8}\b

          needs only 45 steps in this case. This could also be written as...

          \b(?=.*?([[:lower:]]).*?(?1))(?=.*?([[:upper:]]).*?(?2))(?=[^\d]*\d.*$)[[:alnum:]]{8}\b
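
To see that the two approaches agree on the thread's samples, here is a minimal sketch in Python (an assumption; the names ALEC, FLO and results are made up). Python's standard re module supports neither POSIX classes nor (?1) subroutine calls, so only the plain-class form is used, and the Regex Coach step counts cannot be reproduced this way.

```python
import re

# Bounded lookaheads (.{0,8}) versus lazy lookaheads (.*?).
ALEC = re.compile(
    r"\b(?=.{0,8}[a-z].{0,8}[a-z])(?=.{0,8}[A-Z].{0,8}[A-Z])"
    r"(?=.{0,8}[0-9])([a-zA-Z0-9]{8})\b"
)
FLO = re.compile(
    r"\b(?=.*?[a-z].*?[a-z])(?=.*?[A-Z].*?[A-Z])"
    r"(?=[^\d]*\d)[a-zA-Z0-9]{8}\b"
)

candidates = ["tR4hGGUK", "2UJeiy9m", "WbNDSk9e", "Mytrip12"]
results = {
    c: (ALEC.search(c) is not None, FLO.search(c) is not None)
    for c in candidates
}
for text, verdicts in results.items():
    print(text, verdicts)
# tR4hGGUK (True, True)
# 2UJeiy9m (True, True)
# WbNDSk9e (True, True)
# Mytrip12 (False, False)
```

Both variants share the trait that "." in a lookahead can look beyond the 8-character group; the lazy form can look as far as the end of the line.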

          For checking...

          > This matches: --> "http://somedomain.comx/aaAA112b/index.html"
          > but not ------------> "http://somedomain.comx/Mytrip12/index.html"

          we would have to use...

          http://somedomain\.comx/(?=.*?([[:lower:]]).*?(?1))(?=.*?([[:upper:]]).*?(?2))(?=[^\d]*\d.*$)[[:alnum:]]{8}/index\.html

          (All patterns in one long line -- no line breaks!)

          Regards,
          Flo