
Re: Converting Raw Columnar Text Data to CSV

  • jane_sedgewick
    Message 1 of 11 , Apr 2, 2007
      --- In ntb-clips@yahoogroups.com, "Jon Moss" <mossjon@...> wrote:
      >
      > Basically, I have a list which was copied and pasted from a telnet
      > session. It's not tab delimited but is lined up well in Courier type
      > fonts.
      >
      > I want to convert this columnar list to a CSV file to be imported into
      > a database (probably MySQL).
      >
      > I've searched through this list and a couple of others and read through
      > some of NoteTabLite's help file, but I'm just not finding anything
      > helpful.
      >
      > Here's a short example:
      >
      > No. Name Current Rank Lvl Class Race Last On
      > --- ------------ -------------------- --- ---------- ---------- ----------
      > 1 zzzz Novice 105 Mage Half-griff 22 Mar 2007
      > 2 xxxxxx Novice 48 Thief Triton 20 Dec 2006
      > 3 yyyyyy Novice 123 Warrior Wolfen 20 Jan 2007
      > 4 wwwwww Novice 10 Thief Wolfen 22 Mar 2007
      >
      > Any suggestions would be very helpful.
      >
      > Jon Moss
      >

      If you are using the PCRE version of the Regex you should be able to
      use a regex match of:
      ^(\d+)\s(\w+)\s(\w+)\s(\w+)\s([^\d\r\n]+)\s(.+)$

      and replace with:
      $1,$2,$3,$4,$5,$6

      Using that I get
      1 zzzz Novice 105 Mage Half-griff 22 Mar 2007
      2 xxxxxx Novice 48 Thief Triton 20 Dec 2006
      3 yyyyyy Novice 123 Warrior Wolfen 20 Jan 2007
      4 wwwwww Novice 10 Thief Wolfen 22 Mar 2007

      converted to:

      1,zzzz,Novice,105,Mage Half-griff,22 Mar 2007
      2,xxxxxx,Novice,48,Thief Triton,20 Dec 2006
      3,yyyyyy,Novice,123,Warrior Wolfen,20 Jan 2007
      4,wwwwww,Novice,10,Thief Wolfen,22 Mar 2007
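
For anyone who wants to try the same match and replace outside NoteTab, here is a rough Python sketch. The function name `to_csv` and the sample string are only illustrative, and Python writes the replacement groups as \1 rather than $1; the pattern itself is unchanged.

```python
import re

# Same pattern as above. re.MULTILINE lets ^ and $ match at every line,
# so a whole pasted listing can be converted in one call.
PATTERN = re.compile(
    r"^(\d+)\s(\w+)\s(\w+)\s(\w+)\s([^\d\r\n]+)\s(.+)$",
    re.MULTILINE,
)


def to_csv(text: str) -> str:
    """Replace the single spaces between the six captured fields with commas."""
    return PATTERN.sub(r"\1,\2,\3,\4,\5,\6", text)


sample = (
    "1 zzzz Novice 105 Mage Half-griff 22 Mar 2007\n"
    "2 xxxxxx Novice 48 Thief Triton 20 Dec 2006\n"
)
print(to_csv(sample))
```

Lines that do not match the pattern (the header and the dashed rule, for instance) are left untouched.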

      \d is shorthand character class for [0-9]
      \w is shorthand character class for [a-zA-Z0-9_]
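
A quick way to check what \w will and will not accept, sketched in Python (the helper name `is_word` is made up for this example). In the ASCII sense used here \w covers letters, digits and the underscore, but not the hyphen, which is why a value like Half-griff cannot be captured with (\w+).

```python
import re


def is_word(s: str) -> bool:
    """True if s consists only of \\w characters.

    re.ASCII keeps \\w to letters, digits and underscore; without it
    Python also accepts non-ASCII letters and digits.
    """
    return re.fullmatch(r"\w+", s, re.ASCII) is not None


print(is_word("wwwwww"))      # letters only
print(is_word("a_b1"))        # underscore and digit
print(is_word("Half-griff"))  # contains a hyphen
```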

      It will handle any size number (1st field) and separates the first
      4 fields on the single whitespace character between them. If the
      columns are padded with several spaces, use \s+ in place of each \s.
      Because the 5th field (Class and Race together) can contain spaces,
      it uses the start of the date number to determine the end of the
      fifth field and the start of the sixth.
      I don't have version 5 of NoteTab so you will have to tweak it as
      necessary, but it works in other PCRE applications as well as Perl itself.
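
If the raw telnet paste still has its column padding, and Class and Race are wanted as separate CSV fields, a variant along these lines might be a starting point. This is only a sketch in Python under a few assumptions that are mine rather than from the original data: Name, Current Rank and Class are single words, Race contains no spaces, and the name `padded_to_csv` is hypothetical.

```python
import re

# [ \t]+ matches one or more spaces or tabs but never a line break,
# so a match cannot run on into the next row.
PADDED = re.compile(
    r"^[ \t]*(\d+)[ \t]+(\w+)[ \t]+(\w+)[ \t]+(\d+)"
    r"[ \t]+(\w+)[ \t]+(\S+)[ \t]+(.+?)[ \t]*$",
    re.MULTILINE,
)


def padded_to_csv(text: str) -> str:
    """Emit seven comma-separated fields: No., Name, Rank, Lvl, Class, Race, Last On."""
    return PADDED.sub(r"\1,\2,\3,\4,\5,\6,\7", text)


row = "  1 zzzz         Novice               105 Mage       Half-griff 22 Mar 2007  "
print(padded_to_csv(row))
```

The date is kept as one field because everything after Race is captured together; a multi-word rank would need a different approach, such as slicing by fixed column widths.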
      Cordially,
      Jane