Loading ...
Sorry, an error occurred while loading the content.

Re: Create CSV file

Expand Messages
  • Sheri
    ... Here s one that quotes the nonnumeric fields ... This will convert the specific example, including quoting of nonnumeric fields. It allows commas and
    Message 1 of 17 , Dec 6, 2007
    • 0 Attachment
      --- In ntb-clips@yahoogroups.com, "mlafount" <mlafount@...> wrote:
      >
      > Hi All,
      >
      > I'm looking for a clip that someone has probably already written :-)
      >
      > Let's say I got a string of tab delimited data like this:
      >
      > ABD072132 52 LaFountain, Michael
      > ABD073243 49 Smith, Sherri
      > ABD074354 12 Johnson, Marci
      >
      > I need a clip that will convert it to a CSV file like this:
      >
      > 'ABD072132', 52, 'LaFountain, Michael'
      > 'ABD073243', 49, 'Smith, Sherri'
      > 'ABD074354', 12, 'Johnson, Marci'

      Here's one that quotes the nonnumeric fields
      >
      > The data above is just an example. I would like the clip to be
      > intelligent enough to recognize or be told a delimiter, add quotes
      > to any number of alpha fields and replace the delimiter with a comma
      > and space.
      >
      > Anyone got something like this laying around?
      >

      This will convert the specific example, including quoting of
      nonnumeric fields. It allows commas and periods in the numeric fields
      and numeric fields can start with a minus sign.

      Regards,
      Sheri

      ^!Replace "(^|\t)\K[^\t\r\n][^\t\r\n]*(?=(\t|$))" >> "'$0'" RAWS
      ^!Replace "(?:^|\t)\K'(\-?(?:[\d,.]+))'" >> "$1" RAWS
      ^!Replace "^T" >> ", " AWS
    • mlafount
      Thanks for all the suggestions. I really needed something that is fairly flexible. I often need to move data from an excel spreadsheet to a mainframe. The
      Message 2 of 17 , Dec 6, 2007
      • 0 Attachment
        Thanks for all the suggestions. I really needed something that is
        fairly flexible. I often need to move data from an excel spreadsheet
        to a mainframe. The easiest way I have found is via an SQL insert
        statement. Excel will let you mess with text qualifiers and other
        parameters when importing a file but it is limited when saving a CSV
        file. So, I needed the following clip.

        The idea is to copy all of the columns from a spreadsheet into
        NoteTab. This creates a tab delimited document. Run the clip and it
        creates a new document based on the formatting parameters. The first
        line is used as a model for the formatting of the subsequent lines.
        If the columns in the first line are: Alpha, Alpha, Numeric, Alpha,
        Numeric then all of the other lines will be formatted like the first
        line. If a row is missing fields, they will be filled and delimited
        as nulls. If there are extra columns in a row, they will be
        truncated.

        Maybe someone else will find it useful too.

        -Mike LaFountain


        Test Data:
        aaaaaaa bbbbbbb 111 ddd 4444 ffff
        bbbbbbb cccccc -222
        ddd eeee 333 gggg -6666 jjj kkkk 1211
        cccccc ddd 333 ffff -6666 hhh


        Clip:
        ; CREATE CSV

        ;=====================
        ; Get input variables
        ;=====================
        ^!Continue This clip will take the active tab (which should be a
        delimited file) and format it according to the selected options. The
        first row is used as a model for all columns. Continue?
        ^!ClearVariables
        ^!Set %LP%=(; %LB%=[; %LC%={; %LN=^%Empty%
        ^!Set %DelimiterIn%=^?[--Input Options-- Field delimiter in=_Tab^=^%
        TAB%|Comma^=,|Pipe^=^%VBar%|Semi-colon^=;|Space^=^%SPACE%|Equal^==]
        ^!Set %DelimiterOut%=^?[--Output Options-- Field delimiter=Tab^=^%
        TAB%|_Comma^=,|Comma/Space^=,^%Space%|Pipe^=^%VBar%|Semi-
        colon^=;|Space^=^%SPACE%|Equal^==]
        ^!Set %TxtQual%=^?[Text qualifier=_Apostrophe^='|Quote^="|None^=^%
        Empty%]
        ^!Set %NullAlpha%=^?[Null alpha field=_Null^=^%Empty%|Blank^=^%
        Space%]
        ^!Set %NullNumeric%=^?[Null numeric field=_Null^=^%Empty%|Zero^=0]
        ^!Set %Negative%=^?[Negative sign before or after
        number=Before^=B|_After^=A]
        ^!Set %EncloseL%=^?[Enclose Line=No^=^%LN%|_Parentheses^=^%LP%
        |Brackets^=^%LB%|Braces^=^%LC%]
        ^!Set %EndOfLine%=^?[Add to end of line=None^=^%Empty%|Tab^=^%TAB%
        |Comma^=,|_Space/Comma/Space^=^%Space%,^%Space%|Pipe^=^%VBar%|Semi-
        colon^=;|Space^=^%SPACE%|Equal^==]
        ^!IfCancel Exit

        ;================
        ; Set Delimiters
        ;================
        ^!SetListDelimiter=^%DelimiterIn%
        ^!SetDelimiter=^%DelimiterIn%

        ;===========================================
        ; Get maximum row count, set Row and Column
        ; Setup "right side" of enclose line
        ;===========================================
        ^!Set %MaxRow%=^$GetTextLineCount$
        ^!Set %r%=1; %c%=1
        ^!If ^%EncloseL%=^%LN% ^!Set %EncloseR%=^%LN%
        ^!If ^%EncloseL%=^%LP% ^!Set %EncloseR%=)
        ^!If ^%EncloseL%=^%LB% ^!Set %EncloseR%=]
        ^!If ^%EncloseL%=^%LC% ^!Set %EncloseR%=}

        ;================
        ; Get first line
        ;================
        ^!Jump Doc_Start
        ^!Select Line
        ^!SetArray %Format%=^$GetSelection$
        ^!Set %MaxColumn%=^%Format0%
        ^!Select 0

        ;====================================================================
        =
        ; Capture format from first line for each column (alpha or numeric).
        ; All subsequent lines will be forced to match the first line
        format.
        ;====================================================================
        =
        :FormatLoop
        ^!If ^$IsNumber(^%Format^%c%%)$ = 1 FormatN Else FormatA
        :FormatN
        ^!Set %Format^%c%%=N
        ^!Goto FormatC
        :FormatA
        ^!Set %Format^%c%%=A
        :FormatC
        ^!Inc %c%
        ^!If ^%c% <= ^%MaxColumn% FormatLoop

        ;====================
        ; Loop thru each row
        ;====================
        ^!Set %r%=1; %c%=1
        :RowLoop
        ^!Set %RowData%=^%Empty%

        ;=====================================
        ; Loop thru each column and get field
        ;=====================================
        ^!Set %c%=1
        :ColumnLoop
        ^!Set %Field%=^$GetField(^%r%;^%c%)$
        ^!If ^%Format^%c%=A FormatAlpha Else FormatNumeric

        ;=======================
        ; Format field as alpha
        ;=======================
        :FormatAlpha
        ^!Set %Field%=^%TxtQual%^%Field%^%TxtQual%
        ^!If ^%Field%=^%TxtQual%^%TxtQual% ^!Set %Field%=^%TxtQual%^%
        NullAlpha%^%TxtQual%
        ^!Goto AddDelimiter

        ;=========================
        ; Format field as numeric
        ;=========================
        :FormatNumeric
        ^!If ^%Field%=^%Empty% ^!Set %Field%=^%NullNumeric%
        ^!If ^%Field%<0 Next Else AddDelimiter
        ^!If ^%Field%=^%Empty% AddDelimiter
        ^!If ^%Negative%=B AddDelimiter
        ^!Set %Field%=^$Calc(^%Field%*-1)$-

        ;==============================
        ; Add delimiter, append to row
        ;==============================
        :AddDelimiter
        ^!If ^%c%<^%MaxColumn% ^!Set %Field%=^%Field%^%DelimiterOut%
        ^!Set %RowData%=^%RowData%^%Field%

        ;=================
        ; Get next column
        ;=================
        ^!Inc %c%
        ^!If ^%c% <= ^%MaxColumn% ColumnLoop

        ;==============
        ; Get next row
        ;==============
        ^!Set %RowData%=^%EncloseL%^%RowData%^%EncloseR%^%EndOfLine%^%NL%
        ^!Set %Table%=^%Table%^%RowData%
        ^!Inc %r%
        ^!If ^%r% <= ^%MaxRow% RowLoop

        ;===================================
        ; Write edited data to new document
        ;===================================
        ^!Toolbar New Document
        ^!InsertText ^%Table%
      • buralex@gmail.com
        Don - HtmlFixIt.com said on Dec 06, 2007 7:14 ... Alec not Alex :-) ... ^!replace ( r n)+ r n rwais or ^!replace R+ r n
        Message 3 of 17 , Dec 6, 2007
        • 0 Attachment
          "Don - HtmlFixIt.com" <don@...> said on Dec 06, 2007 7:14
          -0500 (in part):
          > good start Alex ...
          Alec not Alex :-)
          > ; remove all empty lines
          > ; how to do with regex? anyone?
          > :Loop
          > ^!Replace "^P^P" >> "" ACIWS
          > ^!IfError Next ELSE Loop
          ^!replace "(\r\n)+" >> "\r\n" rwais

          or

          ^!replace "\R+" >> "\r\n" rwais

          ie. change one or more occurrences of <RETURN> to just one on entire
          document. Your "^P^P" requires extra passes based on whether there is
          odd or even numbers of extra blank lines. Since one by itself on the end
          of the line with non-blank on next line is OK it *might* be faster to
          use: "(\r\n){2,}" or "\R{2,} but Sheri is the only one who might be able
          to answer that authoritatively :-)

          btw: (someone else asked about use of flags in
          replace/find/regex/non-regex)
          I always include "i"-case insensitive flag even when not needed as in
          above two lines.
          And *almost* always "w"-whole doc, not just from cursor, "a"-all
          occurrences in line not just first

          btw: Mike - does it make any difference wrt. your end use of the
          converted spreadsheet whether numeric fields are un-quoted or not, or
          whether or not extra trailing null fields (TABS) get converted to
          unnecessary sets of (,"","","")
          My perhaps erroneous belief had always been that CSV requires
          surrounding quote marks only when a field actually contains interior
          comma(s).
          Isn't:

          * bob,alec, dave jones, bill is just as valid as
          * "bob","alec", "dave jones","bill"?

          Regards ... Alec -- buralex-gmail
          --



          [Non-text portions of this message have been removed]
        • Sheri
          ... It would be more efficient IMO to replace two or more with one than to replace one or more with one. It would be doing unnecessary work if you match and
          Message 4 of 17 , Dec 7, 2007
          • 0 Attachment
            --- In ntb-clips@yahoogroups.com, buralex@... wrote:
            >
            > "Don - HtmlFixIt.com" <don@...> said on Dec 06, 2007 7:14
            > -0500 (in part):
            > > good start Alex ...
            > Alec not Alex :-)
            > > ; remove all empty lines
            > > ; how to do with regex? anyone?
            > > :Loop
            > > ^!Replace "^P^P" >> "" ACIWS
            > > ^!IfError Next ELSE Loop
            > ^!replace "(\r\n)+" >> "\r\n" rwais
            >
            > or
            >
            > ^!replace "\R+" >> "\r\n" rwais
            >
            > ie. change one or more occurrences of <RETURN> to just one on
            > entire document. Your "^P^P" requires extra passes based on
            > whether there is odd or even numbers of extra blank lines. Since
            > one by itself on the end of the line with non-blank on next line
            > is OK it *might* be faster to use: "(\r\n){2,}" or "\R{2,} but
            > Sheri is the only one who might be able to answer that
            > authoritatively :-)

            It would be more efficient IMO to replace two or more with one than to
            replace one or more with one. It would be doing unnecessary work if
            you match and replace the single occurrences. However the time
            difference would probably not be noticeable.

            Regards,
            Sheri
          • Don - HtmlFixIt.com
            ... Actually I tried this very combination before writing the above. It does not work! If you have a blank line as the last line in the file, it is not
            Message 5 of 17 , Dec 7, 2007
            • 0 Attachment
              Sheri wrote:
              > --- In ntb-clips@yahoogroups.com, buralex@... wrote:
              >> "Don - HtmlFixIt.com" <don@...> said on Dec 06, 2007 7:14
              >> -0500 (in part):
              >>> good start Alex ...
              >> Alec not Alex :-)
              >>> ; remove all empty lines
              >>> ; how to do with regex? anyone?
              >>> :Loop
              >>> ^!Replace "^P^P" >> "" ACIWS
              >>> ^!IfError Next ELSE Loop


              >> ^!replace "(\r\n)+" >> "\r\n" rwais
              >>

              Actually I tried this very combination before writing the above. It
              does not work! If you have a blank line as the last line in the file,
              it is not removed. Even though it shows two paragraphs symbols in a
              row. Mine doesn't work either (the ^P^P) will not remove the last blank
              line. Perhaps this is a bug?

              I sometimes use this at the end of the file:
              :ConfirmNoBlankLine
              ^!Jump Doc_End
              ^!Select Bol
              ^!If "^$GetSelection$" <> "" DoIt
              ^!Keyboard BACKSPACE
            • Flo
              ... Don, The A and z sequences are doing the job. The possible positions of CR/NL are: - at doc start matched with A - a CR/NL followed by CR/NL - at doc
              Message 6 of 17 , Dec 8, 2007
              • 0 Attachment
                --- In ntb-clips@yahoogroups.com, "Don - HtmlFixIt.com" <don@...>
                wrote:

                > ^!replace "(\r\n)+" >> "\r\n" rwais
                > Actually I tried this very combination before writing the above.
                > It does not work! If you have a blank line as the last line in
                > the file, it is not removed...

                Don,

                The \A and \z sequences are doing the job. The possible positions of
                CR/NL are:

                - at doc start matched with \A
                - a CR/NL followed by CR/NL
                - at doc end matched with \z

                This will remove double CR/NL at any position:

                ^!Replace "\R(?=\R)|\A\R|\R\z" >> "" AWRS

                Regards,
                Flo
              • Sheri
                ... The end-of-file marker is not a line break. A line break is a series of actual control characters (carriage return and line feed). They can actually be
                Message 7 of 17 , Dec 8, 2007
                • 0 Attachment
                  Flo wrote:
                  > --- In ntb-clips@yahoogroups.com, "Don - HtmlFixIt.com" <don@...>
                  > wrote:
                  >
                  >
                  >> ^!replace "(\r\n)+" >> "\r\n" rwais
                  >> Actually I tried this very combination before writing the above.
                  >> It does not work! If you have a blank line as the last line in
                  >> the file, it is not removed...
                  >>
                  >
                  > Don,
                  >
                  > The \A and \z sequences are doing the job. The possible positions of
                  > CR/NL are:
                  >
                  > - at doc start matched with \A
                  > - a CR/NL followed by CR/NL
                  > - at doc end matched with \z
                  >
                  > This will remove double CR/NL at any position:
                  >
                  > ^!Replace "\R(?=\R)|\A\R|\R\z" >> "" AWRS
                  >
                  > Regards,
                  > Flo
                  >
                  >

                  The end-of-file marker is not a line break. A line break is a series of
                  actual control characters (carriage return and line feed). They can
                  actually be selected, copied, pasted, etc. NoteTab Pro has a feature to
                  display hidden characters, but it makes line breaks appear to be one
                  character when it is actually two.

                  It is usually best if each line in the document is terminated with a
                  carriage return/line feed. That means there will hopefully be no text on
                  the "line" containing the end-of-file marker.

                  When the last line of text is followed immediately by the end-of-file
                  marker, clips to process that file often require special processing just
                  for the last line.

                  Regards,
                  Sheri
                • Don - HtmlFixIt.com
                  Ok why does a guy named Alec have Alex in his email address? Anyway, you are correct, the quotes are only needed in comma separated value files when/if there
                  Message 8 of 17 , Dec 8, 2007
                  • 0 Attachment
                    Ok why does a guy named Alec have Alex in his email address? Anyway, you
                    are correct, the quotes are only needed in comma separated value files
                    when/if there are delimiters (usually commas) in the data itself.
                    However having extra quotes causes no problems.

                    > btw: Mike - does it make any difference wrt. your end use of the
                    > converted spreadsheet whether numeric fields are un-quoted or not, or
                    > whether or not extra trailing null fields (TABS) get converted to
                    > unnecessary sets of (,"","","")
                    > My perhaps erroneous belief had always been that CSV requires
                    > surrounding quote marks only when a field actually contains interior
                    > comma(s).
                    > Isn't:
                    >
                    > * bob,alec, dave jones, bill is just as valid as
                    > * "bob","alec", "dave jones","bill"?
                    >
                    > Regards ... Alec -- buralex-gmail
                  • Don - HtmlFixIt.com
                    ... Thanks Sheri and Flo (and AleC) for moving this discussion along. Flo that removes all empty lines as promised. That is one I need to save. Perhaps,
                    Message 9 of 17 , Dec 8, 2007
                    • 0 Attachment
                      Flo wrote:
                      >> - at doc start matched with \A
                      >> - a CR/NL followed by CR/NL
                      >> - at doc end matched with \z
                      >>
                      >> This will remove double CR/NL at any position:
                      >>
                      >> ^!Replace "\R(?=\R)|\A\R|\R\z" >> "" AWRS
                      >>
                      >> Regards,
                      >> Flo
                      >>
                      >>
                      > Sheri wrote:
                      > The end-of-file marker is not a line break. A line break is a series of
                      > actual control characters (carriage return and line feed). They can
                      > actually be selected, copied, pasted, etc. NoteTab Pro has a feature to
                      > display hidden characters, but it makes line breaks appear to be one
                      > character when it is actually two.
                      >
                      > It is usually best if each line in the document is terminated with a
                      > carriage return/line feed. That means there will hopefully be no text on
                      > the "line" containing the end-of-file marker.
                      >
                      > When the last line of text is followed immediately by the end-of-file
                      > marker, clips to process that file often require special processing just
                      > for the last line.
                      >
                      > Regards,
                      > Sheri
                      >

                      Thanks Sheri and Flo (and AleC) for moving this discussion along.

                      Flo that removes all "empty lines" as promised. That is one I need to
                      save. Perhaps, when showing hidden characters, notetab should really
                      reflect the last "return" as a file end mark of some type to distinguish
                      it from another return? I'll be honest I have never thought about it
                      because I am only now beginning to understand and use regex. I always
                      did it manually. But it all makes sense now, that last hidden character
                      is really a file end (the \z) and returns are really two characters the
                      (\r\n) as I caught on to a while back when I started using regex. I
                      guess I have always known there was a file end but never gave it a
                      second thought. In our context the issue is whether the file end is on
                      the last line or on the next line then if I have it right now.

                      Don

                      saved here: http://htmlfixit.com/blog/?p=361
                    • Sheri
                      ... Hi Don, Not to nit pick with your blog, but the end-of-file marker is not a character. z is an assertion for the position of end of the file. It has a
                      Message 10 of 17 , Dec 8, 2007
                      • 0 Attachment
                        Don - HtmlFixIt.com wrote:
                        > Flo wrote:
                        >
                        >>> - at doc start matched with \A
                        >>> - a CR/NL followed by CR/NL
                        >>> - at doc end matched with \z
                        >>>
                        >>> This will remove double CR/NL at any position:
                        >>>
                        >>> ^!Replace "\R(?=\R)|\A\R|\R\z" >> "" AWRS
                        >>>
                        >>> Regards,
                        >>> Flo
                        >>>
                        >>>
                        >>>
                        >> Sheri wrote:
                        >> The end-of-file marker is not a line break. A line break is a series of
                        >> actual control characters (carriage return and line feed). They can
                        >> actually be selected, copied, pasted, etc. NoteTab Pro has a feature to
                        >> display hidden characters, but it makes line breaks appear to be one
                        >> character when it is actually two.
                        >>
                        >> It is usually best if each line in the document is terminated with a
                        >> carriage return/line feed. That means there will hopefully be no text on
                        >> the "line" containing the end-of-file marker.
                        >>
                        >> When the last line of text is followed immediately by the end-of-file
                        >> marker, clips to process that file often require special processing just
                        >> for the last line.
                        >>
                        >> Regards,
                        >> Sheri
                        >>
                        >>
                        >
                        > Thanks Sheri and Flo (and AleC) for moving this discussion along.
                        >
                        > Flo that removes all "empty lines" as promised. That is one I need to
                        > save. Perhaps, when showing hidden characters, notetab should really
                        > reflect the last "return" as a file end mark of some type to distinguish
                        > it from another return? I'll be honest I have never thought about it
                        > because I am only now beginning to understand and use regex. I always
                        > did it manually. But it all makes sense now, that last hidden character
                        > is really a file end (the \z) and returns are really two characters the
                        > (\r\n) as I caught on to a while back when I started using regex. I
                        > guess I have always known there was a file end but never gave it a
                        > second thought. In our context the issue is whether the file end is on
                        > the last line or on the next line then if I have it right now.
                        >
                        > Don
                        >
                        > saved here: http://htmlfixit.com/blog/?p=361
                        >
                        >
                        Hi Don,

                        Not to nit pick with your blog, but the end-of-file marker is not a
                        character. \z is an assertion for the position of end of the file. It
                        has a width of zero characters. Ditto for \A, \Z, \z, ^ and $.

                        Regards,
                        Sheri
                      • buralex@gmail.com
                        Don - HtmlFixIt.com said on Dec 08, 2007 10:39 ... My given name is Alexander but my parents always called me Alec after my uncle and
                        Message 11 of 17 , Dec 8, 2007
                        • 0 Attachment
                          "Don - HtmlFixIt.com" <don@...> said on Dec 08, 2007 10:39
                          -0500 (in part):
                          > Ok why does a guy named Alec have Alex in his email address?
                          My given name is Alexander but my parents always called me Alec after my
                          uncle and grandfather.
                          More recently ... I have an email filter that puts any email I receive
                          that contains the string "alec" in to a "Look-at-me first" folder. So I
                          stick the alex in my email address as BURgess+ALEX. If I'd used BURALEC
                          then all the newsletters that contain stuff like : "you subscribed a
                          buralex@..." would be mistakenly filtered to the "Look-at-me
                          first" folder.

                          Regards ... Alec -- buralex-gmail
                          --



                          [Non-text portions of this message have been removed]
                        Your message has been successfully submitted and would be delivered to recipients shortly.