Loading ...
Sorry, an error occurred while loading the content.

Converting Raw Columnarl Text Data to CSV

Expand Messages
  • Jon Moss
    Basically, I have a list which was copied and pasted from a telnet session. It s not tab delimited but is lined up well in Courier type fonts. I want to
    Message 1 of 11 , Mar 23, 2007
    • 0 Attachment
      Basically, I have a list which was copied and pasted from a telnet
      session. It's not tab delimited but is lined up well in Courier type
      fonts.

      I want to convert this columnar list to a CSV file to be imported into
      a database (probably MySQL).

      I've search through this list and a couple of others and read through
      some of NoteTabLite's help file, but I'm just not finding anything
      helpful.

      Here's a short example:

      No. Name Current Rank Lvl Class Race Last On
      --- ------------ -------------------- --- ---------- ---------- ----------
      1 zzzz Novice 105 Mage Half-griff 22 Mar
      2007
      2 xxxxxx Novice 48 Thief Triton 20 Dec
      2006
      3 yyyyyy Novice 123 Warrior Wolfen 20 Jan
      2007
      4 wwwwww Novice 10 Thief Wolfen 22 Mar
      2007

      Any suggestions would be very helpful.

      Jon Moss
    • Jeff Scism
      Hmm. Roll two D12 and take the damage, after subtracting your stamina... (Jeff who was playing Dungeons and Dragons in the mid 70 s before it had a name....
      Message 2 of 11 , Mar 23, 2007
      • 0 Attachment
        Hmm. Roll two D12 and take the damage, after subtracting your stamina...

        (Jeff who was playing Dungeons and Dragons in the mid 70's before it had
        a name.... (The Arduin Grimoire was our bible))


        Jon Moss wrote:
        >
        > Basically, I have a list which was copied and pasted from a telnet
        > session. It's not tab delimited but is lined up well in Courier type
        > fonts.
        >
        > I want to convert this columnar list to a CSV file to be imported into
        > a database (probably MySQL).
        >
        > I've search through this list and a couple of others and read through
        > some of NoteTabLite's help file, but I'm just not finding anything
        > helpful.
        >
        > Here's a short example:
        >
        > No. Name Current Rank Lvl Class Race Last On
        > --- ------------ -------------------- --- ---------- ---------- ----------
        > 1 zzzz Novice 105 Mage Half-griff 22 Mar
        > 2007
        > 2 xxxxxx Novice 48 Thief Triton 20 Dec
        > 2006
        > 3 yyyyyy Novice 123 Warrior Wolfen 20 Jan
        > 2007
        > 4 wwwwww Novice 10 Thief Wolfen 22 Mar
        > 2007
        >
        > Any suggestions would be very helpful.
        >
        > Jon Moss
        >
      • Axel Berger
        ... Do you happen to use Windows and and Microsoft Excel? As far as I know the latter does quite a decent job of reading tables like yours and it can export
        Message 3 of 11 , Mar 23, 2007
        • 0 Attachment
          Jon Moss wrote:
          > Here's a short example:

          Do you happen to use Windows and and Microsoft Excel? As far as I know
          the latter does quite a decent job of reading tables like yours and it
          can export CSV. Not a clip, but maybe worth a try.
          (Can try it, no Excel here - no stolen programs at all ind facht and
          Excel is too expensive for what it could offer me.)
          Axel
        • Don - HtmlFixIt.com
          Several options for going to columnar data. 1. find and explore the notetab clip for such data (think it s on Jody s site); 2. use excel as someone suggested,
          Message 4 of 11 , Mar 23, 2007
          • 0 Attachment
            Several options for going to columnar data.

            1. find and explore the notetab clip for such data (think it's on Jody's
            site);

            2. use excel as someone suggested, use import data and select choice for
            columnar data;

            3. use a simple regex replace to replace multiple spaces with a delimiter;

            4. check out csv edit (freeware also I think) which will import and
            handle it for you.

            Happy to help you get it done if you wish to contact me.

            Don
          • Axel Berger
            ... Obvious solution and much easier than starting some humungous bloatware, why didn t I see that? But now that I have I looked and it won t work ... ^
            Message 5 of 11 , Mar 24, 2007
            • 0 Attachment
              "Don - HtmlFixIt.com" wrote:
              > 3. use a simple regex replace to replace multiple spaces
              > with a delimiter;

              Obvious solution and much easier than starting some humungous bloatware,
              why didn't I see that? But now that I have I looked and it won't work
              here. Look at:

              >1 zzzz Novice 105 Mage Half-griff 22 Mar
              ^ ^ ^

              Axel
            • Sheri
              If you have NoteTab 5.1 you can try this clip; in addition to putting commas between fields, it will quote non-numberic fields. If you want all fields quoted,
              Message 6 of 11 , Mar 24, 2007
              • 0 Attachment
                If you have NoteTab 5.1 you can try this clip; in addition to putting
                commas between fields, it will quote non-numberic fields. If you want
                all fields quoted, comment out the tests for ^$IsNumber$. If you don't
                want any fields quoted, comment out the tests for ^$IsNumber$ as well
                as the code insertions of double quotes.

                Regards,
                Sheri

                ;column widths
                ;No 4
                ;Name 12
                ;Rank 20
                ;Lvl 3
                ;Class 10
                ;Race 10
                ;Last On up to 11
                ^!SetScreenupdate off
                :Start
                ^!Find "^(.{4}) (.{12}) (.{20}) (.{3}) (.{10}) (.{10}) (.{0,11})$" RS
                ^!IfError Out
                ^!SetArray %thisrow%=^$GetReSubStrings$
                ^!Set %i%=0
                :insfields
                ^!Inc %i%
                ^!If ^%i%>^%thisrow0% Start
                ^!IfFalse ^$IsNumber(^$StrTrim(^%thisrow^%i%%)$)$ Next Else Skip_1
                ^!InsertCode "
                ^!InsertText ^$StrTrim(^%thisrow^%i%%)$
                ^!IfFalse ^$IsNumber(^$StrTrim(^%thisrow^%i%%)$)$ Next Else Skip_1
                ^!InsertCode "
                ^!If ^%i%<^%thisrow0% Next Else Skip_1
                ^!InsertText ","
                ^!Goto insfields
                :Out
                ^!Info All Done
              • buralex@gmail.com
                ... Rather than quoting string fields containing spaces, I usually just use a TAB rather than a COMMA as delimiter when working with CSV files - hence no
                Message 7 of 11 , Mar 24, 2007
                • 0 Attachment
                  > "Sheri" <silvermoonwoman@...> said on 03/24/2007 8:26:31 AM -0400
                  > If you have NoteTab 5.1 you can try this clip; in addition to putting
                  > commas between fields, it will quote non-numberic fields. If you want
                  > all fields quoted, comment out the tests for ^$IsNumber$. If you don't
                  > want any fields quoted, comment out the tests for ^$IsNumber$ as well
                  > as the code insertions of double quotes.
                  Rather than quoting string fields containing spaces, I usually just use
                  a TAB rather than a COMMA as delimiter when working with CSV files -
                  hence no quoting at all is needed. Are there any apps you are aware of
                  where this won't work? AFAIK just about everything I've seen will allow
                  you to specify TAB as delimiter rather than comma.

                  Regards ... Alec -- buralex-gmail
                  --
                • Don - HtmlFixIt.com
                  ... I also agree. I more often use the pipe character | myself, but I think that is a lot easier than introducing a bunch of quotes. ... Obvious solution and
                  Message 8 of 11 , Mar 24, 2007
                  • 0 Attachment
                    buralex@... wrote:
                    > Rather than quoting string fields containing spaces, I usually just use
                    > a TAB rather than a COMMA as delimiter when working with CSV files -
                    > hence no quoting at all is needed. Are there any apps you are aware of
                    > where this won't work? AFAIK just about everything I've seen will allow
                    > you to specify TAB as delimiter rather than comma.
                    I also agree. I more often use the pipe character | myself, but I think
                    that is a lot easier than introducing a bunch of quotes.

                    And:
                    > 3. use a simple regex replace to replace multiple spaces
                    > > with a delimiter;

                    Obvious solution and much easier than starting some humungous bloatware,
                    why didn't I see that? But now that I have I looked and it won't work
                    here. Look at:

                    > >1 zzzz Novice 105 Mage Half-griff 22 Mar
                    ^ ^ ^


                    I couldn't get a good read on your data structure so in this case it may
                    not work, but worth a shot.

                    Is Sheri's proposal covering you?
                  • Sheri
                    ... Hi Alec, the user specifically requested CSV format, which is short for comma separated values http://en.wikipedia.org/wiki/Comma-separated_values The
                    Message 9 of 11 , Mar 25, 2007
                    • 0 Attachment
                      buralex@... wrote:
                      >> "Sheri" <silvermoonwoman@...> said on 03/24/2007 8:26:31 AM -0400
                      >> If you have NoteTab 5.1 you can try this clip; in addition to putting
                      >> commas between fields, it will quote non-numberic fields. If you want
                      >> all fields quoted, comment out the tests for ^$IsNumber$. If you don't
                      >> want any fields quoted, comment out the tests for ^$IsNumber$ as well
                      >> as the code insertions of double quotes.
                      >>
                      > Rather than quoting string fields containing spaces, I usually just use
                      > a TAB rather than a COMMA as delimiter when working with CSV files -
                      > hence no quoting at all is needed. Are there any apps you are aware of
                      > where this won't work? AFAIK just about everything I've seen will allow
                      > you to specify TAB as delimiter rather than comma.
                      >
                      > Regards ... Alec -- buralex-gmail
                      >
                      Hi Alec, the user specifically requested CSV format, which is short for
                      "comma separated values" http://en.wikipedia.org/wiki/Comma-separated_values

                      The user could very well find that other types of delimited files would
                      suit his purposes, but tabs are not valid delimiters in CSVs.

                      Referenced page says it is necessary to escape double quotes within
                      fields containing double quotes (my clip is not doing so). Also, I don't
                      know if NoteTab's ^$IsNumber$ function would recognize a number
                      containing commas as a number. If so, the clip should be probably be
                      enhanced to remove commas from numeric fields.

                      Regards,
                      Sheri
                    • buralex@gmail.com
                      ... Good point (though your reference says: While no formal specification for CSV exists, RFC 4180 describes a common
                      Message 10 of 11 , Mar 25, 2007
                      • 0 Attachment
                        > Sheri <silvermoonwoman@...> said on 03/25/2007 12:33:32 PM -0400
                        >
                        > Hi Alec, the user specifically requested CSV format, which is short for
                        > "comma separated values" http://en.wikipedia.org/wiki/Comma-separated_values
                        >
                        Good point (though your reference says: While no formal specification
                        for CSV exists, RFC 4180 <http://tools.ietf.org/html/rfc4180> describes
                        a common format and establishes "text/csv" as the MIME type
                        <http://en.wikipedia.org/wiki/MIME_type> registered with the IANA
                        <http://en.wikipedia.org/wiki/Internet_Assigned_Numbers_Authority>.) I
                        agree that mine was a sloppy answer. :-[

                        Out of curiosity, I Googled Tab2CSV and got a bunch of hits. One near
                        the top on a download site lead to to:
                        http://www.info-pack.com/tab2csv/

                        On it you find a program which will do exactly that for the LOW LOW
                        price of 49.95! (ie. almost twice the price of Notetab 5.1!!). However
                        it does for that price include free email support and free upgrades :-)
                        I wonder how many copies they've sold?
                        > The user could very well find that other types of delimited files would
                        > suit his purposes, but tabs are not valid delimiters in CSVs.
                        >
                        > Referenced page says it is necessary to escape double quotes within
                        > fields containing double quotes (my clip is not doing so). Also, I don't
                        > know if NoteTab's ^$IsNumber$ function would recognize a number
                        > containing commas as a number. If so, the clip should be probably be
                        > enhanced to remove commas from numeric fields.

                        Regards ... Alec -- buralex-gmail
                        --



                        [Non-text portions of this message have been removed]
                      • jane_sedgewick
                        ... If you are using the PCRE version of the Regex you should be able to use a regex match of: ^( d+) s( w+) s( w+) s( w+) s([^ d r n]+) s(.+)$ and replace
                        Message 11 of 11 , Apr 2, 2007
                        • 0 Attachment
                          --- In ntb-clips@yahoogroups.com, "Jon Moss" <mossjon@...> wrote:
                          >
                          > Basically, I have a list which was copied and pasted from a telnet
                          > session. It's not tab delimited but is lined up well in Courier type
                          > fonts.
                          >
                          > I want to convert this columnar list to a CSV file to be imported into
                          > a database (probably MySQL).
                          >
                          > I've search through this list and a couple of others and read through
                          > some of NoteTabLite's help file, but I'm just not finding anything
                          > helpful.
                          >
                          > Here's a short example:
                          >
                          > No. Name Current Rank Lvl Class Race Last On
                          > --- ------------ -------------------- --- ---------- ----------
                          ----------
                          > 1 zzzz Novice 105 Mage Half-griff 22 Mar
                          > 2007
                          > 2 xxxxxx Novice 48 Thief Triton 20 Dec
                          > 2006
                          > 3 yyyyyy Novice 123 Warrior Wolfen 20 Jan
                          > 2007
                          > 4 wwwwww Novice 10 Thief Wolfen 22 Mar
                          > 2007
                          >
                          > Any suggestions would be very helpful.
                          >
                          > Jon Moss
                          >

                          If you are using the PCRE version of the Regex you should be able to
                          use a regex match of:
                          ^(\d+)\s(\w+)\s(\w+)\s(\w+)\s([^\d\r\n]+)\s(.+)$

                          and replace with:
                          $1,$2,$3,$4,$5,$6

                          Using that I get
                          1 zzzz Novice 105 Mage Half-griff 22 Mar 2007
                          2 xxxxxx Novice 48 Thief Triton 20 Dec 2006
                          3 yyyyyy Novice 123 Warrior Wolfen 20 Jan 2007
                          4 wwwwww Novice 10 Thief Wolfen 22 Mar 2007

                          converted to:

                          1,zzzz,Novice,105,Mage Half-griff,22 Mar 2007
                          2,xxxxxx,Novice,48,Thief Triton,20 Dec 2006
                          3,yyyyyy,Novice,123,Warrior Wolfen,20 Jan 2007
                          4,wwwwww,Novice,10,Thief Wolfen,22 Mar 2007

                          \d is shorthand character class for [0-9]
                          \w is shorthand character class for [a-zA-Z0-9]

                          It will handle any size number (1st field) and separates the first
                          4 fields on the spaces between them. Because the 5th field can contain
                          spaces it uses the start of the date number to determine the end of
                          the fifth field and start of the sixth.
                          I don't have version 5 of Notetab so you will have to tweak it as
                          necessary, but it works in other PCRE applications as well as PERL itself.
                          Cordially,
                          Jane
                        Your message has been successfully submitted and would be delivered to recipients shortly.