Re: regex help ....

  • flo.gehrke
    Message 1 of 6 , May 3, 2010
      --- In ntb-clips@yahoogroups.com, Don <don@...> wrote:
      >
      > ReynoldsburgGause, Destinee1 12.35q
      >
      > to become
      >
      > Reynoldsburg[tab]Gause[tab]Destinee[tab]1[tab]12.35
      >
      > There may or may not be a q at the end.
      >
      > Another example:
      > Oak RidgeDelong, Christina22 13.86
      >
      > to
      >
      > Oak Ridge[tab]Delong[tab]Christina[tab]22[tab]13.86
      >
      > Look back from the comma for a capital letter or something?
      >

      Just another idea (one long line)...

      ^!Replace "((?<=[[:lower:]])(?=[[:upper:]])|\x20|(?<=[[:lower:]])(?=\d))" >> "\t" AWRS
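For readers without NoteTab, here is a rough Python translation of the clip above. It is a sketch, not Flo's own code. It assumes ASCII text, so [a-z] and [A-Z] stand in for the POSIX classes [[:lower:]] and [[:upper:]], and the function name flo_split is made up for illustration. Each match is either a zero-width gap (lowercase before uppercase, or lowercase before a digit) or a single space, and every match becomes a tab:

```python
import re

# Flo's three alternatives; [a-z]/[A-Z] replace [[:lower:]]/[[:upper:]]
# on the assumption that the input is plain ASCII.
FLO = re.compile(
    r"(?<=[a-z])(?=[A-Z])"  # zero-width gap: lowercase letter, then uppercase letter
    r"| "                   # a single space (\x20 in the clip)
    r"|(?<=[a-z])(?=\d)"    # zero-width gap: lowercase letter, then digit
)


def flo_split(line: str) -> str:
    """Replace every match of the pattern with a tab (hypothetical helper)."""
    return FLO.sub("\t", line)


for sample in ("ReynoldsburgGause, Destinee1 12.35q",
               "Oak RidgeDelong, Christina22 13.86"):
    print(repr(flo_split(sample)))
```

On Don's two samples this sketch yields "Reynoldsburg[tab]Gause,[tab]Destinee[tab]1[tab]12.35q" and "Oak[tab]Ridge[tab]Delong,[tab]Christina[tab]22[tab]13.86". The comma and the trailing "q" are kept, and because every space is replaced, "Oak Ridge" is split as well.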

      Flo
    • Don
      Message 2 of 6 , May 3, 2010
        Wish I had a clue what you just did ...
        it worked great ... except there are some other formats I guess ... that
        did not convert ... periods or hyphens, it appears.
        Day. ThurgooScott, Ja'Naye3 12.77q
        Ft. Wayne WaPerry, Jasmine4 12.82q
        Tol. BowsherFranklin, Joy6 12.91q
        Trotwood-MadDavis, Alexis7 12.93q
        Ft. Wayne PaHardy, Victoria8 13.04q
        Day. DunbarCherry, Meghan9 13.14
        Det. MumfordGreen, Rosalynd10 13.16

        > I think I'd indeed do just what you suggested.
        >
        > Search for:
        > "^([\w \.\-]+?)(\p{Lu}\p{Ll}+), (\pL+)(\d+) ([\d\.]+)\pL?"
        >
        > Replace with:
        > "$1\t$2\t$3\t$4\t$5"
        >
        >
      • diodeom
        Message 3 of 6 , May 3, 2010
          I see that you've already edited the first capturing pattern to account for possible periods and dashes in school names. To handle apostrophes in first names (Ja'Naye) the third pattern (\pL+) could be broadened as well into ([\pL']+).

          So the whole search part would read:
          "^([\w \.\-]+?)(\p{Lu}\p{Ll}+), ([\pL']+)(\d+) ([\d\.]+)\pL?"

          To break it down, find:
          at a line's beginning
          ^
          only as many "word" characters, spaces and specified punctuation
          ([\w \.\-]+?)
          until you stumble upon a capitalized word (single uppercase letter followed by a maximum number of lowercase letters) just before a comma and a space
          (\p{Lu}\p{Ll}+),
          followed by a maximum number of letters and/or apostrophes
          ([\pL']+)
          followed by a maximum number of digits before a space,
          (\d+)
          and after this space, a maximum number of digits and/or dots
          ([\d\.]+)
          followed by an optional single letter
          \pL?

          Parentheses specify substrings to capture (and refer to later with $1, $2, etc.), so any undesirable fluff can be eliminated in the replacement.
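The same capture-and-reassemble idea can be sketched in Python, assuming ASCII names. Python's standard re module does not understand \p{...} classes, so [A-Z], [a-z] and [A-Za-z'] are substituted, and NoteTab's "$1\t$2\t$3\t$4\t$5" replacement is written as \1\t\2\t\3\t\4\t\5. The name split_fields is hypothetical:

```python
import re

# diodeom's search pattern with ASCII stand-ins:
# [A-Z] for \p{Lu}, [a-z] for \p{Ll}, [A-Za-z'] for [\pL'], [A-Za-z] for \pL.
FIELDS = re.compile(
    r"^([\w .\-]+?)"    # $1 school: shortest run of word chars, spaces, dots, dashes
    r"([A-Z][a-z]+), "  # $2 last name: capitalized word right before ", "
    r"([A-Za-z']+)"     # $3 first name: letters and apostrophes
    r"(\d+) "           # $4 place: digits up to the space
    r"([\d.]+)"         # $5 time: digits and dots
    r"[A-Za-z]?"        # optional trailing letter such as "q", not captured
)


def split_fields(line: str) -> str:
    """Rebuild a matching line as five tab-separated fields (hypothetical helper)."""
    # Lines the pattern does not match are returned unchanged.
    return FIELDS.sub(r"\1\t\2\t\3\t\4\t\5", line)


for sample in ("Day. ThurgooScott, Ja'Naye3 12.77q",
               "Trotwood-MadDavis, Alexis7 12.93q",
               "Det. MumfordGreen, Rosalynd10 13.16"):
    print(repr(split_fields(sample)))
```

Since non-matching lines pass through untouched, any line still lacking tabs afterwards points at a format the pattern does not cover yet.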

          I'm afraid this pattern could yet grow quite a bit, e.g. if we were to consider a possible case of a dashed school's name (Trotwood-Mad) and similarly dashed runner's last name (Vasques-Ramirez)...
          I think that Flo's solution could avoid these issues altogether, though (as is) it doesn't look like it would know NOT to split e.g. "Oak Ridge" with a tab.

        • diodeom
          Message 4 of 6 , May 3, 2010
            I wrote:
            >
            > I think that Flo's solution could avoid these issues altogether, though (as is) it doesn't look like it would know NOT to split e.g. "Oak Ridge" with a tab.
            >

            This seems to work:

            Search for:
            "\p{Ll}\K(?=\p{Lu})|, |\pL\K(?=\d)|\d\K (?=\d)"

            Replace (WARS) with:
            "\t"

            It just leaves an occasional "q" at the end of some lines to clean.
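For comparison, here is a Python sketch of this boundary-only approach. Python's built-in re module has no \K, so each "X\K(?=Y)" is rewritten as the lookbehind form "(?<=X)(?=Y)", and "\d\K (?=\d)" as "(?<=\d) (?=\d)". For lines like Don's these match the same positions. ASCII classes again stand in for \p{Ll}, \p{Lu} and \pL. The last step, which drops a "q" that directly follows a digit at the end of a line, is an assumed extra cleanup and not part of the posted pattern. The name diodeom_split is hypothetical:

```python
import re

# Boundaries where a tab goes; each "\K" is replaced by a one-character lookbehind.
BOUNDARY = re.compile(
    r"(?<=[a-z])(?=[A-Z])"   # lowercase then uppercase: school | last name
    r"|, "                   # comma and space: last name | first name
    r"|(?<=[A-Za-z])(?=\d)"  # letter then digit: first name | place
    r"|(?<=\d) (?=\d)"       # space between digits: place | time
)
# Assumed cleanup, not in the original post: a trailing "q" right after a digit.
TRAILING_Q = re.compile(r"(?<=\d)q$")


def diodeom_split(line: str) -> str:
    """Insert tabs at field boundaries, then drop a trailing qualifier 'q'."""
    return TRAILING_Q.sub("", BOUNDARY.sub("\t", line))


for sample in ("Oak RidgeDelong, Christina22 13.86",
               "Ft. Wayne WaPerry, Jasmine4 12.82q"):
    print(repr(diodeom_split(sample)))
```

A plain space is only replaced when it sits between two digits, so multi-word school names such as "Oak Ridge" and "Ft. Wayne" stay intact.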