Loading ...
Sorry, an error occurred while loading the content.

Re: [Clip] Create CSV file

Expand Messages
  • hsavage
    ... mlafount, You also may want to consider downloading CSVed , for handling, altering and creating CSV databases. The link is,
    Message 1 of 17 , Dec 6, 2007
    • 0 Attachment
      mlafount wrote:
      > Hi All,
      >
      > I'm looking for a clip that someone has probably already written :-)
      >
      > Let's say I got a string of tab delimited data like this:
      >
      > ABD072132 52 LaFountain, Michael
      > ABD073243 49 Smith, Sherri
      > ABD074354 12 Johnson, Marci
      >
      > I need a clip that will convert it to a CSV file like this:
      >
      > 'ABD072132', 52, 'LaFountain, Michael'
      > 'ABD073243', 49, 'Smith, Sherri'
      > 'ABD074354', 12, 'Johnson, Marci'
      >
      > The data above is just an example. I would like the clip to be
      > intelligent enough to recognize or be told a delimiter, add quotes
      > to any number of alpha fields and replace the delimiter with a comma
      > and space.
      >
      > Anyone got something like this laying around?
      >
      mlafount,

      You also may want to consider downloading 'CSVed', for handling,
      altering and creating CSV databases.

      The link is, http://home.hccnet.nl/s.j.francke/software/software.htm

      --
      ·············································
      ºvº SL-12-43 -created- 2007.12.06 - 07.22.28

      Great Truths Children Have Learned:
      "Never ask your 3-year old brother to hold a tomato."
      ¤ ø ¤ hrs ø hsavage@...
    • Sheri
      ... Here s one that quotes the nonnumeric fields ... This will convert the specific example, including quoting of nonnumeric fields. It allows commas and
      Message 2 of 17 , Dec 6, 2007
      • 0 Attachment
        --- In ntb-clips@yahoogroups.com, "mlafount" <mlafount@...> wrote:
        >
        > Hi All,
        >
        > I'm looking for a clip that someone has probably already written :-)
        >
        > Let's say I got a string of tab delimited data like this:
        >
        > ABD072132 52 LaFountain, Michael
        > ABD073243 49 Smith, Sherri
        > ABD074354 12 Johnson, Marci
        >
        > I need a clip that will convert it to a CSV file like this:
        >
        > 'ABD072132', 52, 'LaFountain, Michael'
        > 'ABD073243', 49, 'Smith, Sherri'
        > 'ABD074354', 12, 'Johnson, Marci'

        Here's one that quotes the nonnumeric fields
        >
        > The data above is just an example. I would like the clip to be
        > intelligent enough to recognize or be told a delimiter, add quotes
        > to any number of alpha fields and replace the delimiter with a comma
        > and space.
        >
        > Anyone got something like this laying around?
        >

        This will convert the specific example, including quoting of
        nonnumeric fields. It allows commas and periods in the numeric fields
        and numeric fields can start with a minus sign.

        Regards,
        Sheri

        ^!Replace "(^|\t)\K[^\t\r\n][^\t\r\n]*(?=(\t|$))" >> "'$0'" RAWS
        ^!Replace "(?:^|\t)\K'(\-?(?:[\d,.]+))'" >> "$1" RAWS
        ^!Replace "^T" >> ", " AWS
      • mlafount
        Thanks for all the suggestions. I really needed something that is fairly flexible. I often need to move data from an excel spreadsheet to a mainframe. The
        Message 3 of 17 , Dec 6, 2007
        • 0 Attachment
          Thanks for all the suggestions. I really needed something that is
          fairly flexible. I often need to move data from an excel spreadsheet
          to a mainframe. The easiest way I have found is via an SQL insert
          statement. Excel will let you mess with text qualifiers and other
          parameters when importing a file but it is limited when saving a CSV
          file. So, I needed the following clip.

          The idea is to copy all of the columns from a spreadsheet into
          NoteTab. This creates a tab delimited document. Run the clip and it
          creates a new document based on the formatting parameters. The first
          line is used as a model for the formatting of the subsequent lines.
          If the columns in the first line are: Alpha, Alpha, Numeric, Alpha,
          Numeric then all of the other lines will be formatted like the first
          line. If a row is missing fields, they will be filled and delimited
          as nulls. If there are extra columns in a row, they will be
          truncated.

          Maybe someone else will find it useful too.

          -Mike LaFountain


          Test Data:
          aaaaaaa bbbbbbb 111 ddd 4444 ffff
          bbbbbbb cccccc -222
          ddd eeee 333 gggg -6666 jjj kkkk 1211
          cccccc ddd 333 ffff -6666 hhh


          Clip:
          ; CREATE CSV

          ;=====================
          ; Get input variables
          ;=====================
          ^!Continue This clip will take the active tab (which should be a
          delimited file) and format it according to the selected options. The
          first row is used as a model for all columns. Continue?
          ^!ClearVariables
          ^!Set %LP%=(; %LB%=[; %LC%={; %LN=^%Empty%
          ^!Set %DelimiterIn%=^?[--Input Options-- Field delimiter in=_Tab^=^%
          TAB%|Comma^=,|Pipe^=^%VBar%|Semi-colon^=;|Space^=^%SPACE%|Equal^==]
          ^!Set %DelimiterOut%=^?[--Output Options-- Field delimiter=Tab^=^%
          TAB%|_Comma^=,|Comma/Space^=,^%Space%|Pipe^=^%VBar%|Semi-
          colon^=;|Space^=^%SPACE%|Equal^==]
          ^!Set %TxtQual%=^?[Text qualifier=_Apostrophe^='|Quote^="|None^=^%
          Empty%]
          ^!Set %NullAlpha%=^?[Null alpha field=_Null^=^%Empty%|Blank^=^%
          Space%]
          ^!Set %NullNumeric%=^?[Null numeric field=_Null^=^%Empty%|Zero^=0]
          ^!Set %Negative%=^?[Negative sign before or after
          number=Before^=B|_After^=A]
          ^!Set %EncloseL%=^?[Enclose Line=No^=^%LN%|_Parentheses^=^%LP%
          |Brackets^=^%LB%|Braces^=^%LC%]
          ^!Set %EndOfLine%=^?[Add to end of line=None^=^%Empty%|Tab^=^%TAB%
          |Comma^=,|_Space/Comma/Space^=^%Space%,^%Space%|Pipe^=^%VBar%|Semi-
          colon^=;|Space^=^%SPACE%|Equal^==]
          ^!IfCancel Exit

          ;================
          ; Set Delimiters
          ;================
          ^!SetListDelimiter=^%DelimiterIn%
          ^!SetDelimiter=^%DelimiterIn%

          ;===========================================
          ; Get maximum row count, set Row and Column
          ; Setup "right side" of enclose line
          ;===========================================
          ^!Set %MaxRow%=^$GetTextLineCount$
          ^!Set %r%=1; %c%=1
          ^!If ^%EncloseL%=^%LN% ^!Set %EncloseR%=^%LN%
          ^!If ^%EncloseL%=^%LP% ^!Set %EncloseR%=)
          ^!If ^%EncloseL%=^%LB% ^!Set %EncloseR%=]
          ^!If ^%EncloseL%=^%LC% ^!Set %EncloseR%=}

          ;================
          ; Get first line
          ;================
          ^!Jump Doc_Start
          ^!Select Line
          ^!SetArray %Format%=^$GetSelection$
          ^!Set %MaxColumn%=^%Format0%
          ^!Select 0

          ;====================================================================
          =
          ; Capture format from first line for each column (alpha or numeric).
          ; All subsequent lines will be forced to match the first line
          format.
          ;====================================================================
          =
          :FormatLoop
          ^!If ^$IsNumber(^%Format^%c%%)$ = 1 FormatN Else FormatA
          :FormatN
          ^!Set %Format^%c%%=N
          ^!Goto FormatC
          :FormatA
          ^!Set %Format^%c%%=A
          :FormatC
          ^!Inc %c%
          ^!If ^%c% <= ^%MaxColumn% FormatLoop

          ;====================
          ; Loop thru each row
          ;====================
          ^!Set %r%=1; %c%=1
          :RowLoop
          ^!Set %RowData%=^%Empty%

          ;=====================================
          ; Loop thru each column and get field
          ;=====================================
          ^!Set %c%=1
          :ColumnLoop
          ^!Set %Field%=^$GetField(^%r%;^%c%)$
          ^!If ^%Format^%c%=A FormatAlpha Else FormatNumeric

          ;=======================
          ; Format field as alpha
          ;=======================
          :FormatAlpha
          ^!Set %Field%=^%TxtQual%^%Field%^%TxtQual%
          ^!If ^%Field%=^%TxtQual%^%TxtQual% ^!Set %Field%=^%TxtQual%^%
          NullAlpha%^%TxtQual%
          ^!Goto AddDelimiter

          ;=========================
          ; Format field as numeric
          ;=========================
          :FormatNumeric
          ^!If ^%Field%=^%Empty% ^!Set %Field%=^%NullNumeric%
          ^!If ^%Field%<0 Next Else AddDelimiter
          ^!If ^%Field%=^%Empty% AddDelimiter
          ^!If ^%Negative%=B AddDelimiter
          ^!Set %Field%=^$Calc(^%Field%*-1)$-

          ;==============================
          ; Add delimiter, append to row
          ;==============================
          :AddDelimiter
          ^!If ^%c%<^%MaxColumn% ^!Set %Field%=^%Field%^%DelimiterOut%
          ^!Set %RowData%=^%RowData%^%Field%

          ;=================
          ; Get next column
          ;=================
          ^!Inc %c%
          ^!If ^%c% <= ^%MaxColumn% ColumnLoop

          ;==============
          ; Get next row
          ;==============
          ^!Set %RowData%=^%EncloseL%^%RowData%^%EncloseR%^%EndOfLine%^%NL%
          ^!Set %Table%=^%Table%^%RowData%
          ^!Inc %r%
          ^!If ^%r% <= ^%MaxRow% RowLoop

          ;===================================
          ; Write edited data to new document
          ;===================================
          ^!Toolbar New Document
          ^!InsertText ^%Table%
        • buralex@gmail.com
          Don - HtmlFixIt.com said on Dec 06, 2007 7:14 ... Alec not Alex :-) ... ^!replace ( r n)+ r n rwais or ^!replace R+ r n
          Message 4 of 17 , Dec 6, 2007
          • 0 Attachment
            "Don - HtmlFixIt.com" <don@...> said on Dec 06, 2007 7:14
            -0500 (in part):
            > good start Alex ...
            Alec not Alex :-)
            > ; remove all empty lines
            > ; how to do with regex? anyone?
            > :Loop
            > ^!Replace "^P^P" >> "" ACIWS
            > ^!IfError Next ELSE Loop
            ^!replace "(\r\n)+" >> "\r\n" rwais

            or

            ^!replace "\R+" >> "\r\n" rwais

            ie. change one or more occurrences of <RETURN> to just one on entire
            document. Your "^P^P" requires extra passes based on whether there is
            odd or even numbers of extra blank lines. Since one by itself on the end
            of the line with non-blank on next line is OK it *might* be faster to
            use: "(\r\n){2,}" or "\R{2,} but Sheri is the only one who might be able
            to answer that authoritatively :-)

            btw: (someone else asked about use of flags in
            replace/find/regex/non-regex)
            I always include "i"-case insensitive flag even when not needed as in
            above two lines.
            And *almost* always "w"-whole doc, not just from cursor, "a"-all
            occurrences in line not just first

            btw: Mike - does it make any difference wrt. your end use of the
            converted spreadsheet whether numeric fields are un-quoted or not, or
            whether or not extra trailing null fields (TABS) get converted to
            unnecessary sets of (,"","","")
            My perhaps erroneous belief had always been that CSV requires
            surrounding quote marks only when a field actually contains interior
            comma(s).
            Isn't:

            * bob,alec, dave jones, bill is just as valid as
            * "bob","alec", "dave jones","bill"?

            Regards ... Alec -- buralex-gmail
            --



            [Non-text portions of this message have been removed]
          • Sheri
            ... It would be more efficient IMO to replace two or more with one than to replace one or more with one. It would be doing unnecessary work if you match and
            Message 5 of 17 , Dec 7, 2007
            • 0 Attachment
              --- In ntb-clips@yahoogroups.com, buralex@... wrote:
              >
              > "Don - HtmlFixIt.com" <don@...> said on Dec 06, 2007 7:14
              > -0500 (in part):
              > > good start Alex ...
              > Alec not Alex :-)
              > > ; remove all empty lines
              > > ; how to do with regex? anyone?
              > > :Loop
              > > ^!Replace "^P^P" >> "" ACIWS
              > > ^!IfError Next ELSE Loop
              > ^!replace "(\r\n)+" >> "\r\n" rwais
              >
              > or
              >
              > ^!replace "\R+" >> "\r\n" rwais
              >
              > ie. change one or more occurrences of <RETURN> to just one on
              > entire document. Your "^P^P" requires extra passes based on
              > whether there is odd or even numbers of extra blank lines. Since
              > one by itself on the end of the line with non-blank on next line
              > is OK it *might* be faster to use: "(\r\n){2,}" or "\R{2,} but
              > Sheri is the only one who might be able to answer that
              > authoritatively :-)

              It would be more efficient IMO to replace two or more with one than to
              replace one or more with one. It would be doing unnecessary work if
              you match and replace the single occurrences. However the time
              difference would probably not be noticeable.

              Regards,
              Sheri
            • Don - HtmlFixIt.com
              ... Actually I tried this very combination before writing the above. It does not work! If you have a blank line as the last line in the file, it is not
              Message 6 of 17 , Dec 7, 2007
              • 0 Attachment
                Sheri wrote:
                > --- In ntb-clips@yahoogroups.com, buralex@... wrote:
                >> "Don - HtmlFixIt.com" <don@...> said on Dec 06, 2007 7:14
                >> -0500 (in part):
                >>> good start Alex ...
                >> Alec not Alex :-)
                >>> ; remove all empty lines
                >>> ; how to do with regex? anyone?
                >>> :Loop
                >>> ^!Replace "^P^P" >> "" ACIWS
                >>> ^!IfError Next ELSE Loop


                >> ^!replace "(\r\n)+" >> "\r\n" rwais
                >>

                Actually I tried this very combination before writing the above. It
                does not work! If you have a blank line as the last line in the file,
                it is not removed. Even though it shows two paragraphs symbols in a
                row. Mine doesn't work either (the ^P^P) will not remove the last blank
                line. Perhaps this is a bug?

                I sometimes use this at the end of the file:
                :ConfirmNoBlankLine
                ^!Jump Doc_End
                ^!Select Bol
                ^!If "^$GetSelection$" <> "" DoIt
                ^!Keyboard BACKSPACE
              • Flo
                ... Don, The A and z sequences are doing the job. The possible positions of CR/NL are: - at doc start matched with A - a CR/NL followed by CR/NL - at doc
                Message 7 of 17 , Dec 8, 2007
                • 0 Attachment
                  --- In ntb-clips@yahoogroups.com, "Don - HtmlFixIt.com" <don@...>
                  wrote:

                  > ^!replace "(\r\n)+" >> "\r\n" rwais
                  > Actually I tried this very combination before writing the above.
                  > It does not work! If you have a blank line as the last line in
                  > the file, it is not removed...

                  Don,

                  The \A and \z sequences are doing the job. The possible positions of
                  CR/NL are:

                  - at doc start matched with \A
                  - a CR/NL followed by CR/NL
                  - at doc end matched with \z

                  This will remove double CR/NL at any position:

                  ^!Replace "\R(?=\R)|\A\R|\R\z" >> "" AWRS

                  Regards,
                  Flo
                • Sheri
                  ... The end-of-file marker is not a line break. A line break is a series of actual control characters (carriage return and line feed). They can actually be
                  Message 8 of 17 , Dec 8, 2007
                  • 0 Attachment
                    Flo wrote:
                    > --- In ntb-clips@yahoogroups.com, "Don - HtmlFixIt.com" <don@...>
                    > wrote:
                    >
                    >
                    >> ^!replace "(\r\n)+" >> "\r\n" rwais
                    >> Actually I tried this very combination before writing the above.
                    >> It does not work! If you have a blank line as the last line in
                    >> the file, it is not removed...
                    >>
                    >
                    > Don,
                    >
                    > The \A and \z sequences are doing the job. The possible positions of
                    > CR/NL are:
                    >
                    > - at doc start matched with \A
                    > - a CR/NL followed by CR/NL
                    > - at doc end matched with \z
                    >
                    > This will remove double CR/NL at any position:
                    >
                    > ^!Replace "\R(?=\R)|\A\R|\R\z" >> "" AWRS
                    >
                    > Regards,
                    > Flo
                    >
                    >

                    The end-of-file marker is not a line break. A line break is a series of
                    actual control characters (carriage return and line feed). They can
                    actually be selected, copied, pasted, etc. NoteTab Pro has a feature to
                    display hidden characters, but it makes line breaks appear to be one
                    character when it is actually two.

                    It is usually best if each line in the document is terminated with a
                    carriage return/line feed. That means there will hopefully be no text on
                    the "line" containing the end-of-file marker.

                    When the last line of text is followed immediately by the end-of-file
                    marker, clips to process that file often require special processing just
                    for the last line.

                    Regards,
                    Sheri
                  • Don - HtmlFixIt.com
                    Ok why does a guy named Alec have Alex in his email address? Anyway, you are correct, the quotes are only needed in comma separated value files when/if there
                    Message 9 of 17 , Dec 8, 2007
                    • 0 Attachment
                      Ok why does a guy named Alec have Alex in his email address? Anyway, you
                      are correct, the quotes are only needed in comma separated value files
                      when/if there are delimiters (usually commas) in the data itself.
                      However having extra quotes causes no problems.

                      > btw: Mike - does it make any difference wrt. your end use of the
                      > converted spreadsheet whether numeric fields are un-quoted or not, or
                      > whether or not extra trailing null fields (TABS) get converted to
                      > unnecessary sets of (,"","","")
                      > My perhaps erroneous belief had always been that CSV requires
                      > surrounding quote marks only when a field actually contains interior
                      > comma(s).
                      > Isn't:
                      >
                      > * bob,alec, dave jones, bill is just as valid as
                      > * "bob","alec", "dave jones","bill"?
                      >
                      > Regards ... Alec -- buralex-gmail
                    • Don - HtmlFixIt.com
                      ... Thanks Sheri and Flo (and AleC) for moving this discussion along. Flo that removes all empty lines as promised. That is one I need to save. Perhaps,
                      Message 10 of 17 , Dec 8, 2007
                      • 0 Attachment
                        Flo wrote:
                        >> - at doc start matched with \A
                        >> - a CR/NL followed by CR/NL
                        >> - at doc end matched with \z
                        >>
                        >> This will remove double CR/NL at any position:
                        >>
                        >> ^!Replace "\R(?=\R)|\A\R|\R\z" >> "" AWRS
                        >>
                        >> Regards,
                        >> Flo
                        >>
                        >>
                        > Sheri wrote:
                        > The end-of-file marker is not a line break. A line break is a series of
                        > actual control characters (carriage return and line feed). They can
                        > actually be selected, copied, pasted, etc. NoteTab Pro has a feature to
                        > display hidden characters, but it makes line breaks appear to be one
                        > character when it is actually two.
                        >
                        > It is usually best if each line in the document is terminated with a
                        > carriage return/line feed. That means there will hopefully be no text on
                        > the "line" containing the end-of-file marker.
                        >
                        > When the last line of text is followed immediately by the end-of-file
                        > marker, clips to process that file often require special processing just
                        > for the last line.
                        >
                        > Regards,
                        > Sheri
                        >

                        Thanks Sheri and Flo (and AleC) for moving this discussion along.

                        Flo that removes all "empty lines" as promised. That is one I need to
                        save. Perhaps, when showing hidden characters, notetab should really
                        reflect the last "return" as a file end mark of some type to distinguish
                        it from another return? I'll be honest I have never thought about it
                        because I am only now beginning to understand and use regex. I always
                        did it manually. But it all makes sense now, that last hidden character
                        is really a file end (the \z) and returns are really two characters the
                        (\r\n) as I caught on to a while back when I started using regex. I
                        guess I have always known there was a file end but never gave it a
                        second thought. In our context the issue is whether the file end is on
                        the last line or on the next line then if I have it right now.

                        Don

                        saved here: http://htmlfixit.com/blog/?p=361
                      • Sheri
                        ... Hi Don, Not to nit pick with your blog, but the end-of-file marker is not a character. z is an assertion for the position of end of the file. It has a
                        Message 11 of 17 , Dec 8, 2007
                        • 0 Attachment
                          Don - HtmlFixIt.com wrote:
                          > Flo wrote:
                          >
                          >>> - at doc start matched with \A
                          >>> - a CR/NL followed by CR/NL
                          >>> - at doc end matched with \z
                          >>>
                          >>> This will remove double CR/NL at any position:
                          >>>
                          >>> ^!Replace "\R(?=\R)|\A\R|\R\z" >> "" AWRS
                          >>>
                          >>> Regards,
                          >>> Flo
                          >>>
                          >>>
                          >>>
                          >> Sheri wrote:
                          >> The end-of-file marker is not a line break. A line break is a series of
                          >> actual control characters (carriage return and line feed). They can
                          >> actually be selected, copied, pasted, etc. NoteTab Pro has a feature to
                          >> display hidden characters, but it makes line breaks appear to be one
                          >> character when it is actually two.
                          >>
                          >> It is usually best if each line in the document is terminated with a
                          >> carriage return/line feed. That means there will hopefully be no text on
                          >> the "line" containing the end-of-file marker.
                          >>
                          >> When the last line of text is followed immediately by the end-of-file
                          >> marker, clips to process that file often require special processing just
                          >> for the last line.
                          >>
                          >> Regards,
                          >> Sheri
                          >>
                          >>
                          >
                          > Thanks Sheri and Flo (and AleC) for moving this discussion along.
                          >
                          > Flo that removes all "empty lines" as promised. That is one I need to
                          > save. Perhaps, when showing hidden characters, notetab should really
                          > reflect the last "return" as a file end mark of some type to distinguish
                          > it from another return? I'll be honest I have never thought about it
                          > because I am only now beginning to understand and use regex. I always
                          > did it manually. But it all makes sense now, that last hidden character
                          > is really a file end (the \z) and returns are really two characters the
                          > (\r\n) as I caught on to a while back when I started using regex. I
                          > guess I have always known there was a file end but never gave it a
                          > second thought. In our context the issue is whether the file end is on
                          > the last line or on the next line then if I have it right now.
                          >
                          > Don
                          >
                          > saved here: http://htmlfixit.com/blog/?p=361
                          >
                          >
                          Hi Don,

                          Not to nit pick with your blog, but the end-of-file marker is not a
                          character. \z is an assertion for the position of end of the file. It
                          has a width of zero characters. Ditto for \A, \Z, \z, ^ and $.

                          Regards,
                          Sheri
                        • buralex@gmail.com
                          Don - HtmlFixIt.com said on Dec 08, 2007 10:39 ... My given name is Alexander but my parents always called me Alec after my uncle and
                          Message 12 of 17 , Dec 8, 2007
                          • 0 Attachment
                            "Don - HtmlFixIt.com" <don@...> said on Dec 08, 2007 10:39
                            -0500 (in part):
                            > Ok why does a guy named Alec have Alex in his email address?
                            My given name is Alexander but my parents always called me Alec after my
                            uncle and grandfather.
                            More recently ... I have an email filter that puts any email I receive
                            that contains the string "alec" in to a "Look-at-me first" folder. So I
                            stick the alex in my email address as BURgess+ALEX. If I'd used BURALEC
                            then all the newsletters that contain stuff like : "you subscribed a
                            buralex@..." would be mistakenly filtered to the "Look-at-me
                            first" folder.

                            Regards ... Alec -- buralex-gmail
                            --



                            [Non-text portions of this message have been removed]
                          Your message has been successfully submitted and would be delivered to recipients shortly.