regex help ....

  • Don
    Message 1 of 6 , May 2 8:55 PM
      ReynoldsburgGause, Destinee1 12.35q

      to become

      Reynoldsburg[tab]Gause[tab]Destinee[tab]1[tab]12.35

      There may or may not be a q at the end.

      Another example:
      Oak RidgeDelong, Christina22 13.86

      to

      Oak Ridge[tab]Delong[tab]Christina[tab]22[tab]13.86

      Look back from the comma for a capital letter or something?
    • diodeom
      Message 2 of 6 , May 2 9:32 PM
        Don <don@...> wrote:
        >
        > Oak RidgeDelong, Christina22 13.86
        >
        > to
        >
        > Oak Ridge[tab]Delong[tab]Christina[tab]22[tab]13.86
        >
        > Look back from the comma for a capital letter or something?
        >

        I think I'd indeed do just what you suggested.

        Search for:
        "^([\w ]+?)(\p{Lu}\p{Ll}+), (\pL+)(\d+) ([\d\.]+)\pL?"

        Replace with:
        "$1\t$2\t$3\t$4\t$5"
      • flo.gehrke
        Message 3 of 6 , May 3 2:06 AM
          --- In ntb-clips@yahoogroups.com, Don <don@...> wrote:
          >
          > ReynoldsburgGause, Destinee1 12.35q
          >
          > to become
          >
          > Reynoldsburg[tab]Gause[tab]Destinee[tab]1[tab]12.35
          >
          > There may or may not be a q at the end.
          >
          > Another example:
          > Oak RidgeDelong, Christina22 13.86
          >
          > to
          >
          > Oak Ridge[tab]Delong[tab]Christina[tab]22[tab]13.86
          >
          > Look back from the comma for a capital letter or something?
          >

          Just another idea (one long line)...

          ^!Replace "((?<=[[:lower:]])(?=[[:upper:]])|\x20|(?<=[[:lower:]])(?=\d))" >> "\t" AWRS

          Flo
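
For readers without NoteTab, the clip's pattern can be tried in Python. This is only a sketch under assumptions: the function name `split_with_tabs` is made up for illustration, `[a-z]` and `[A-Z]` stand in for `[[:lower:]]` and `[[:upper:]]` (so only ASCII letters are handled), and the clip's AWRS options are not modeled.

```python
import re

# Hypothetical Python port of the clip's pattern; [a-z]/[A-Z] stand in
# for [[:lower:]]/[[:upper:]], so only ASCII letters are recognized.
SPLIT_POINTS = re.compile(
    r"(?<=[a-z])(?=[A-Z])"   # empty position between a lowercase and an uppercase letter
    r"|\x20"                 # any single space
    r"|(?<=[a-z])(?=\d)"     # empty position between a lowercase letter and a digit
)

def split_with_tabs(line):
    """Put a tab at each zero-width split point and turn each space into a tab."""
    return SPLIT_POINTS.sub("\t", line)

print(repr(split_with_tabs("ReynoldsburgGause, Destinee1 12.35q")))
print(repr(split_with_tabs("Oak RidgeDelong, Christina22 13.86")))
```

In this port the comma and any trailing q stay in the output, and every space becomes a tab, including the one inside a two-word school name.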
        • Don
          Message 4 of 6 , May 3 8:09 AM
            Wish I had a clue what you just did ...
            it worked great ... except there are some other formats, I guess, that
            did not convert ... ones with periods or hyphens, it appears.
            Day. ThurgooScott, Ja'Naye3 12.77q
            Ft. Wayne WaPerry, Jasmine4 12.82q
            Tol. BowsherFranklin, Joy6 12.91q
            Trotwood-MadDavis, Alexis7 12.93q
            Ft. Wayne PaHardy, Victoria8 13.04q
            Day. DunbarCherry, Meghan9 13.14
            Det. MumfordGreen, Rosalynd10 13.16

            > I think I'd indeed do just what you suggested.
            >
            > Search for:
            > "^([\w \.\-]+?)(\p{Lu}\p{Ll}+), (\pL+)(\d+) ([\d\.]+)\pL?"
            >
            > Replace with:
            > "$1\t$2\t$3\t$4\t$5"
            >
            >
          • diodeom
            Message 5 of 6 , May 3 9:08 AM
              I see that you've already edited the first capturing pattern to account for possible periods and dashes in school names. To handle apostrophes in first names (Ja'Naye) the third pattern (\pL+) could be broadened as well into ([\pL']+).

              So the whole search part would read:
              "^([\w \.\-]+?)(\p{Lu}\p{Ll}+), ([\pL']+)(\d+) ([\d\.]+)\pL?"

              To break it down, find:
              at a line's beginning
              ^
              only as many "word" characters, spaces and specified punctuation
              ([\w \.\-]+?)
              until you stumble upon a capitalized word (single uppercase letter followed by a maximum number of lowercase letters) just before a comma and a space
              (\p{Lu}\p{Ll}+),
              followed by a maximum number of letters and/or apostrophes
              ([\pL']+)
              followed by a maximum number of digits before a space,
              (\d+)
              and after this space, max number of digits and/or dots
              ([\d\.]+)
              followed by an optional single letter
              \pL?

              Parentheses specify substrings to capture (and refer to later with $1, $2, etc.), so any undesirable fluff can be eliminated in the replacement.
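
The breakdown above can be checked outside NoteTab with a rough Python sketch. Python's built-in `re` module has no `\p{Lu}`, `\p{Ll}` or `\pL`, so this version assumes ASCII-only names and substitutes `[A-Z]`, `[a-z]` and `[A-Za-z]`; the names `ROW` and `to_tabs` are hypothetical, and the replacement uses `\1` where the thread's replacement uses `$1`.

```python
import re

# ASCII approximation of the search pattern from this message:
#   \p{Lu} -> [A-Z], \p{Ll} -> [a-z], \pL -> [A-Za-z]
ROW = re.compile(
    r"^([\w .\-]+?)"      # 1: school, as few characters as possible
    r"([A-Z][a-z]+), "    # 2: capitalized last name right before ", "
    r"([A-Za-z']+)"       # 3: first name, letters and apostrophes
    r"(\d+) "             # 4: digits before the space
    r"([\d.]+)"           # 5: digits and dots after the space
    r"[A-Za-z]?"          # optional trailing letter, matched but not captured
)

def to_tabs(line):
    """Rebuild the line from the five captured groups, separated by tabs."""
    return ROW.sub(r"\1\t\2\t\3\t\4\t\5", line)

for sample in (
    "ReynoldsburgGause, Destinee1 12.35q",
    "Oak RidgeDelong, Christina22 13.86",
    "Day. ThurgooScott, Ja'Naye3 12.77q",
):
    print(repr(to_tabs(sample)))
```

Because the trailing letter is matched outside any capturing group, it is dropped from the rebuilt line.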

              I'm afraid this pattern could yet grow quite a bit, e.g. if we had to handle both a hyphenated school name (Trotwood-Mad) and a similarly hyphenated runner's last name (Vasques-Ramirez)...
              I think that Flo's solution could avoid these issues altogether, though (as is) it doesn't look like it would know NOT to split e.g. "Oak Ridge" with a tab.

              --- Don <don@...> wrote:
              >
              > Wish I had a clue what you just did ...
              > it worked great ... except there are some other formats, I guess, that
              > did not convert ... ones with periods or hyphens, it appears.
              > Day. ThurgooScott, Ja'Naye3 12.77q
              > Ft. Wayne WaPerry, Jasmine4 12.82q
              > Tol. BowsherFranklin, Joy6 12.91q
              > Trotwood-MadDavis, Alexis7 12.93q
              > Ft. Wayne PaHardy, Victoria8 13.04q
              > Day. DunbarCherry, Meghan9 13.14
              > Det. MumfordGreen, Rosalynd10 13.16
              >
              > > I think I'd indeed do just what you suggested.
              > >
              > > Search for:
              > > "^([\w \.\-]+?)(\p{Lu}\p{Ll}+), (\pL+)(\d+) ([\d\.]+)\pL?"
              > >
              > > Replace with:
              > > "$1\t$2\t$3\t$4\t$5"
              > >
              > >
              >
            • diodeom
              Message 6 of 6 , May 3 2:48 PM
                I wrote:
                >
                > I think that Flo's solution could avoid these issues altogether, though (as is) it doesn't look like it would know NOT to split e.g. "Oak Ridge" with a tab.
                >

                This seems to work:

                Search for:
                "\p{Ll}\K(?=\p{Lu})|, |\pL\K(?=\d)|\d\K (?=\d)"

                Replace (WARS) with:
                "\t"

                It just leaves an occasional "q" at the end of some lines to clean.
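
A possible Python rendering of this last search, for anyone who wants to test it elsewhere. Python's `re` module supports neither `\K` nor `\p{...}`, so this sketch swaps in fixed-width lookbehinds and ASCII classes, which is an assumption about the data. `split_fields` and `drop_trailing_q` are hypothetical names, and the q cleanup is an extra step that is not part of the pattern above.

```python
import re

# Lookbehinds replace \K; ASCII classes replace \p{Ll}, \p{Lu} and \pL.
SEPARATORS = re.compile(
    r"(?<=[a-z])(?=[A-Z])"      # between a lowercase and an uppercase letter
    r"|, "                      # the comma and space after the last name
    r"|(?<=[A-Za-z])(?=\d)"     # between a letter and a digit
    r"|(?<=\d) (?=\d)"          # a space with a digit on each side
)

def split_fields(line):
    """Replace every separator match with a tab."""
    return SEPARATORS.sub("\t", line)

def drop_trailing_q(line):
    """Extra cleanup step (an assumption): remove a final q that follows a digit."""
    return re.sub(r"(?<=\d)q$", "", line)

for sample in (
    "Oak RidgeDelong, Christina22 13.86",
    "Day. ThurgooScott, Ja'Naye3 12.77q",
):
    print(repr(drop_trailing_q(split_fields(sample))))
```

Spaces are only replaced after a comma or between two digits, so the space in "Oak Ridge" is left alone.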