
Re: regex help ....

  • diodeom
    Message 1 of 6, May 2, 2010
      Don <don@...> wrote:
      >
      > Oak RidgeDelong, Christina22 13.86
      >
      > to
      >
      > Oak Ridge[tab]Delong[tab]Christina[tab]22[tab]13.86
      >
      > Look back from the comma for a capital letter or something?
      >

      I think I'd indeed do just what you suggested.

      Search for:
      "^([\w ]+?)(\p{Lu}\p{Ll}+), (\pL+)(\d+) ([\d\.]+)\pL?"

      Replace with:
      "$1\t$2\t$3\t$4\t$5"
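      For trying the same idea outside NoteTab, here is a rough Python sketch of this search and replace (to_tabs is just an illustrative name). Python's built-in "re" module has no "\p{Lu}", "\p{Ll}" or "\pL", so the sketch assumes plain, unaccented names and uses "[A-Z]", "[a-z]" and "[A-Za-z]" instead; backreferences are written "\1" to "\5" rather than "$1" to "$5".

```python
import re

# ASCII stand-ins for \p{Lu}, \p{Ll} and \pL (assumption: unaccented names),
# because Python's standard re module has no Unicode property escapes.
PATTERN = re.compile(r"^([\w ]+?)([A-Z][a-z]+), ([A-Za-z]+)(\d+) ([\d.]+)[A-Za-z]?")

# Python writes backreferences as \1..\5 instead of NoteTab's $1..$5;
# \t in the replacement string becomes a real tab.
REPLACEMENT = r"\1\t\2\t\3\t\4\t\5"


def to_tabs(line: str) -> str:
    """Rewrite one results line into tab-separated fields (unchanged if no match)."""
    return PATTERN.sub(REPLACEMENT, line)


for sample in ("Oak RidgeDelong, Christina22 13.86",
               "ReynoldsburgGause, Destinee1 12.35q"):
    print(to_tabs(sample))
```

      On Don's two sample lines this yields "Oak Ridge", "Delong", "Christina", "22", "13.86" and "Reynoldsburg", "Gause", "Destinee", "1", "12.35", separated by tabs; the trailing "q" disappears because only the last, uncaptured piece of the pattern matches it.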
    • flo.gehrke
      Message 2 of 6, May 3, 2010
        --- In ntb-clips@yahoogroups.com, Don <don@...> wrote:
        >
        > ReynoldsburgGause, Destinee1 12.35q
        >
        > to become
        >
        > Reynoldsburg[tab]Gause[tab]Destinee[tab]1[tab]12.35
        >
        > There may or may not be a q at the end.
        >
        > Another example:
        > Oak RidgeDelong, Christina22 13.86
        >
        > to
        >
        > Oak Ridge[tab]Delong[tab]Christina[tab]22[tab]13.86
        >
        > Look back from the comma for a capital letter or something?
        >

        Just another idea (one long line)...

        ^!Replace "((?<=[[:lower:]])(?=[[:upper:]])|\x20|(?<=[[:lower:]])(?=\d))" >> "\t" AWRS
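        To see what this one-liner does, here is a rough Python rendering of the same search string (split_with_tabs is an illustrative name). It puts a tab at three kinds of spots: the empty gap between a lowercase and an uppercase letter, every single space, and the empty gap between a lowercase letter and a digit. Python's "re" has no POSIX "[[:lower:]]" or "[[:upper:]]", so the sketch assumes ASCII input and uses "[a-z]" and "[A-Z]".

```python
import re

# Translation of the ^!Replace search string as transcribed above.
# POSIX [[:lower:]]/[[:upper:]] become ASCII [a-z]/[A-Z] (assumption:
# unaccented input); \x20 is a literal space.
SPLIT_POINTS = re.compile(
    r"(?<=[a-z])(?=[A-Z])"  # empty gap: lowercase letter, then uppercase letter
    r"|\x20"                # or any single space
    r"|(?<=[a-z])(?=\d)"    # or empty gap: lowercase letter, then digit
)


def split_with_tabs(line: str) -> str:
    """Replace each space with a tab and insert a tab at each empty split point."""
    return SPLIT_POINTS.sub("\t", line)


print(split_with_tabs("ReynoldsburgGause, Destinee1 12.35q"))
```

        On the Reynoldsburg line this gives "Reynoldsburg", "Gause,", "Destinee", "1", "12.35q": the comma stays on the last name, the "q" is kept, and since every space becomes a tab, a two-word school such as "Oak Ridge" is split in two.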

        Flo
      • Don
        Message 3 of 6, May 3, 2010
          Wish I had a clue what you just did... It worked great, except there are
          some other formats, I guess, that did not convert: periods or hyphens, it appears.
          Day. ThurgooScott, Ja'Naye3 12.77q
          Ft. Wayne WaPerry, Jasmine4 12.82q
          Tol. BowsherFranklin, Joy6 12.91q
          Trotwood-MadDavis, Alexis7 12.93q
          Ft. Wayne PaHardy, Victoria8 13.04q
          Day. DunbarCherry, Meghan9 13.14
          Det. MumfordGreen, Rosalynd10 13.16

          > I think I'd indeed do just what you suggested.
          >
          > Search for:
          > "^([\w \.\-]+?)(\p{Lu}\p{Ll}+), (\pL+)(\d+) ([\d\.]+)\pL?"
          >
          > Replace with:
          > "$1\t$2\t$3\t$4\t$5"
          >
          >
        • diodeom
          Message 4 of 6, May 3, 2010
            I see that you've already edited the first capturing pattern to account for possible periods and dashes in school names. To handle apostrophes in first names (Ja'Naye) the third pattern (\pL+) could be broadened as well into ([\pL']+).

            So the whole search part would read:
            "^([\w \.\-]+?)(\p{Lu}\p{Ll}+), ([\pL']+)(\d+) ([\d\.]+)\pL?"

            To break it down, find:
            at a line's beginning
            ^
            only as many "word" characters, spaces and specified punctuation
            ([\w \.\-]+?)
            until you stumble upon a capitalized word (single uppercase letter followed by a maximum number of lowercase letters) just before a comma and a space
            (\p{Lu}\p{Ll}+),
            followed by a maximum number of letters and/or apostrophes
            ([\pL']+)
            followed by a maximum number of digits before a space,
            (\d+)
            and, after this space, as many digits and/or dots as possible
            ([\d\.]+)
            followed by an optional single letter
            \pL?

            Parentheses specify substrings to capture (and refer to later with $1, $2, etc.), so any undesirable fluff matched outside them, such as the comma-space and the optional trailing letter, is eliminated in the replacement.
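            To see each step of the breakdown doing its job, the broadened pattern can be written in Python's verbose mode with one commented line per step. This is only a sketch: "\p{Lu}", "\p{Ll}" and "\pL" are swapped for ASCII classes (an assumption that holds for the sample lines), spaces that must match literally are escaped as "\ " because verbose mode otherwise ignores them, and to_tabs is a made-up helper name.

```python
import re

# Verbose rendering of "^([\w \.\-]+?)(\p{Lu}\p{Ll}+), ([\pL']+)(\d+) ([\d\.]+)\pL?"
# with ASCII classes in place of the \p{...} escapes (assumption: unaccented names).
PATTERN = re.compile(
    r"""
    ^                 # at a line's beginning
    ([\w .\-]+?)      # $1 school: as few word chars, spaces, dots, dashes as possible
    ([A-Z][a-z]+),\   # $2 last name: capitalized word, then comma and space
    ([A-Za-z']+)      # $3 first name: letters and/or apostrophes
    (\d+)\            # $4 place: digits, then a space
    ([\d.]+)          # $5 time: digits and/or dots
    [A-Za-z]?         # optional trailing letter, matched but not captured
    """,
    re.VERBOSE,
)


def to_tabs(line: str) -> str:
    """Keep the five captured fields, separated by tabs."""
    return PATTERN.sub(r"\1\t\2\t\3\t\4\t\5", line)


for sample in ("Day. ThurgooScott, Ja'Naye3 12.77q",
               "Trotwood-MadDavis, Alexis7 12.93q",
               "Det. MumfordGreen, Rosalynd10 13.16"):
    print(to_tabs(sample))
```

            Note that whitespace inside a character class such as "[\w .\-]" is kept even in verbose mode, so the school group still accepts spaces.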

            I'm afraid this pattern could yet grow quite a bit, e.g. if we were to consider a possible case of a dashed school's name (Trotwood-Mad) and similarly dashed runner's last name (Vasques-Ramirez)...
            I think that Flo's solution could avoid these issues altogether, though (as is) it doesn't look like it would know NOT to split e.g. "Oak Ridge" with a tab.

          • diodeom
            Message 5 of 6, May 3, 2010
              I wrote:
              >
              > I think that Flo's solution could avoid these issues altogether, though (as is) it doesn't look like it would know NOT to split e.g. "Oak Ridge" with a tab.
              >

              This seems to work:

              Search for:
              "\p{Ll}\K(?=\p{Lu})|, |\pL\K(?=\d)|\d\K (?=\d)"

              Replace (WARS) with:
              "\t"

              It just leaves an occasional "q" at the end of some lines to clean.
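              Here is a Python approximation of this last search string, applied to a few of the sample lines (to_tabs is an illustrative name). Python's "re" does not support "\K", so each "X\K(?=Y)" is written as "(?<=X)(?=Y)", which marks the same split points on lines like these, and the "\p{...}" letter classes become ASCII ranges (assumption: unaccented names). The second regex, which removes a lone letter right after the final digit, is a hypothetical clean-up for the leftover "q" and is not part of the posted solution.

```python
import re

# \K is not available in Python's re, so "X\K(?=Y)" becomes "(?<=X)(?=Y)"
# and "\d\K (?=\d)" becomes "(?<=\d) (?=\d)"; on these lines both forms
# place the tab at the same spots. \p{Ll}, \p{Lu} and \pL become ASCII classes.
SPLIT_POINTS = re.compile(
    r"(?<=[a-z])(?=[A-Z])"   # empty gap: lowercase letter, then uppercase letter
    r"|, "                   # comma and space after the last name
    r"|(?<=[A-Za-z])(?=\d)"  # empty gap: letter, then digit
    r"|(?<=\d) (?=\d)"       # a space with a digit on each side
)

# Hypothetical clean-up (not from the thread): drop one letter that
# directly follows the last digit at the end of the line, e.g. the "q".
TRAILING_LETTER = re.compile(r"(?<=\d)[A-Za-z]$")


def to_tabs(line: str) -> str:
    """Put a tab at each split point, then strip a trailing qualifier letter."""
    tabbed = SPLIT_POINTS.sub("\t", line)
    return TRAILING_LETTER.sub("", tabbed)


for sample in ("Oak RidgeDelong, Christina22 13.86",
               "Day. ThurgooScott, Ja'Naye3 12.77q",
               "Ft. Wayne PaHardy, Victoria8 13.04q"):
    print(to_tabs(sample))
```

              Unlike a pattern that splits on every space, this one leaves "Oak Ridge" and "Ft. Wayne Pa" intact, because a space only becomes a tab when it sits between two digits.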