
[Clip] Re: regex help (message 20624)

  • diodeom
    May 3, 2010
      I see that you've already edited the first capturing group to account for possible periods and dashes in school names. To handle apostrophes in first names (Ja'Naye), the third group (\pL+) could be broadened as well, to ([\pL']+).

      So the whole search part would read:
      "^([\w \.\-]+?)(\p{Lu}\p{Ll}+), ([\pL']+)(\d+) ([\d\.]+)\pL?"

      To break it down, find:
      at a line's beginning (^)
      only as many "word" characters, spaces, periods and dashes as needed
      ([\w \.\-]+?)
      until you stumble upon a capitalized word (a single uppercase letter followed by as many lowercase letters as possible) just before a comma and a space
      (\p{Lu}\p{Ll}+)
      followed by as many letters and/or apostrophes as possible
      ([\pL']+)
      followed by as many digits as possible, up to a space
      (\d+)
      and, after this space, as many digits and/or dots as possible
      ([\d\.]+)
      followed by an optional single letter
      \pL?

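      Parentheses mark the substrings to capture (referred to later as $1, $2, etc.), so anything matched outside them (the comma and space after the last name, the space before the time, and the trailing letter) is left out of the replacement, which joins the captured pieces with tabs.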
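      If you want to try the search/replace outside NoteTab, here is a minimal sketch in Java. It assumes java.util.regex treats \pL, \p{Lu}, \p{Ll}, the lazy +? and the $1..$5 replacement references the way the Clip's regex engine does for these constructs; the class name RunnerSplit and the convert helper are made up for illustration. Backslashes are doubled only because the patterns sit inside Java string literals. It runs your earlier pattern and the broadened one side by side.

```java
import java.util.regex.Pattern;

// Sketch only: the thread's search/replace re-expressed with java.util.regex,
// assuming it handles these constructs like the Clip's regex engine does.
public class RunnerSplit {
    // Earlier pattern: the first-name group accepts letters only.
    static final Pattern OLD = Pattern.compile(
        "^([\\w \\.\\-]+?)(\\p{Lu}\\p{Ll}+), (\\pL+)(\\d+) ([\\d\\.]+)\\pL?");

    // Broadened pattern: the first-name group also accepts apostrophes.
    static final Pattern NEW = Pattern.compile(
        "^([\\w \\.\\-]+?)(\\p{Lu}\\p{Ll}+), ([\\pL']+)(\\d+) ([\\d\\.]+)\\pL?");

    // "\t" here is a real tab character; $1..$5 are the captured groups.
    static final String REPLACEMENT = "$1\t$2\t$3\t$4\t$5";

    // Applies the replacement; a line that does not match comes back unchanged.
    static String convert(Pattern p, String line) {
        return p.matcher(line).replaceFirst(REPLACEMENT);
    }

    public static void main(String[] args) {
        String[] lines = {
            "Day. ThurgooScott, Ja'Naye3 12.77q",
            "Trotwood-MadDavis, Alexis7 12.93q",
            "Det. MumfordGreen, Rosalynd10 13.16"
        };
        for (String line : lines) {
            // Show tabs as | so the split points are visible on the console.
            System.out.println("old: " + convert(OLD, line).replace('\t', '|'));
            System.out.println("new: " + convert(NEW, line).replace('\t', '|'));
        }
    }
}
```

      With the old pattern the Ja'Naye line comes back untouched: (\pL+) stops at the apostrophe, (\d+) then finds no digit there, and since only "Scott" can precede the comma, there is nothing else to backtrack into. The other two lines convert the same way with either pattern.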

      I'm afraid this pattern could grow quite a bit more, e.g. if we had to handle a dashed school name (Trotwood-Mad) combined with a similarly dashed runner's last name (Vasques-Ramirez)...
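      One hypothetical way to cover that case is to let the last-name group take an optional dash plus a second capitalized word, (\p{Lu}\p{Ll}+(?:-\p{Lu}\p{Ll}+)?). Below is a small Java sketch of that variant, again assuming java.util.regex behaves like the Clip's engine here; the class name DashedNames and the sample line are made up, since none of the posted lines has a dashed last name.

```java
import java.util.regex.Pattern;

// Hypothetical variant, not from the thread: the last-name group may be two
// capitalized words joined by a dash (e.g. Vasques-Ramirez).
public class DashedNames {
    static final Pattern DASHED = Pattern.compile(
        "^([\\w \\.\\-]+?)(\\p{Lu}\\p{Ll}+(?:-\\p{Lu}\\p{Ll}+)?), ([\\pL']+)(\\d+) ([\\d\\.]+)\\pL?");

    // Same tab-separated replacement as before; unmatched lines are unchanged.
    static String convert(String line) {
        return DASHED.matcher(line).replaceFirst("$1\t$2\t$3\t$4\t$5");
    }

    public static void main(String[] args) {
        // Made-up line combining a dashed school name and a dashed last name.
        String sample = "Trotwood-MadVasques-Ramirez, Maria5 12.99q";
        System.out.println(convert(sample).replace('\t', '|'));
    }
}
```

      Because the first group is still lazy, it stops at the earliest point where the rest can match, so "Trotwood-Mad" stays with the school and "Vasques-Ramirez" lands in $2; an undashed name such as "Trotwood-MadDavis" still splits as before.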
      I think that Flo's solution could avoid these issues altogether, though (as is) it doesn't look like it would know NOT to split e.g. "Oak Ridge" with a tab.

      --- Don <don@...> wrote:
      > Wish I had a clue what you just did ...
      > it worked great ... except there are some other formats I guess ... that
      > did not convert ... periods or hyphens it appears.
      > Day. ThurgooScott, Ja'Naye3 12.77q
      > Ft. Wayne WaPerry, Jasmine4 12.82q
      > Tol. BowsherFranklin, Joy6 12.91q
      > Trotwood-MadDavis, Alexis7 12.93q
      > Ft. Wayne PaHardy, Victoria8 13.04q
      > Day. DunbarCherry, Meghan9 13.14
      > Det. MumfordGreen, Rosalynd10 13.16
      > > I think I'd indeed do just what you suggested.
      > >
      > > Search for:
      > > "^([\w \.\-]+?)(\p{Lu}\p{Ll}+), (\pL+)(\d+) ([\d\.]+)\pL?"
      > >
      > > Replace with:
      > > "$1\t$2\t$3\t$4\t$5"
      > >
      > >