Loading ...
Sorry, an error occurred while loading the content.
 

CSV

Expand Messages
  • johnta1
    I have a csv file that I want to delete a , in the 5th and 7th position and replace with a space. Example: 01 , 111 , some text , more text , text before
    Message 1 of 16 , Jan 28, 2010
      I have a csv file that I want to delete a "," in the 5th and 7th position and replace with a space.

      Example:
      "01","111","some text","more text","text before replace","text after replace","extra text before replace","extra text after replace","text","text2"


      What it should look like:
      "01","111","some text","more text","text before replace text after replace","extra text before replace extra text after replace","text","text2"

      How do you replace them, by count?
      Or use an array?

      Have about 5000 lines like this in the list.

      Thanks,
      John.
    • diodeom
      Here s an approach that reproduces your example: ^!Replace ^(.+? , ){4}.+? K , WARS ^!Replace ^(.+? , ){5}.+? K , WARS
      Message 2 of 16 , Jan 28, 2010
        Here's an approach that reproduces your example:

        ^!Replace "^(.+?","){4}.+?\K","" >> " " WARS
        ^!Replace "^(.+?","){5}.+?\K","" >> " " WARS

        --- In ntb-clips@yahoogroups.com, "johnta1" <johnta1@...> wrote:
        >
        > I have a csv file that I want to delete a "," in the 5th and 7th position and replace with a space.
        >
        > Example:
        > "01","111","some text","more text","text before replace","text after replace","extra text before replace","extra text after replace","text","text2"
        >
        >
        > What it should look like:
        > "01","111","some text","more text","text before replace text after replace","extra text before replace extra text after replace","text","text2"
        >
        > How do you replace them, by count?
        > Or use an array?
        >
        > Have about 5000 lines like this in the list.
        >
        > Thanks,
        > John.
        >
      • diodeom
        Or this: ^!Replace ^(.+? , ){4}.+? K , (.+? , .+?) , $2 WARS
        Message 3 of 16 , Jan 28, 2010
          Or this:

          ^!Replace "^(.+?","){4}.+?\K","(.+?",".+?)","" >> " $2 " WARS

          --- In ntb-clips@yahoogroups.com, "diodeom" <diomir@...> wrote:
          >
          > Here's an approach that reproduces your example:
          >
          > ^!Replace "^(.+?","){4}.+?\K","" >> " " WARS
          > ^!Replace "^(.+?","){5}.+?\K","" >> " " WARS
          >
          > --- In ntb-clips@yahoogroups.com, "johnta1" <johnta1@> wrote:
          > >
          > > I have a csv file that I want to delete a "," in the 5th and 7th position and replace with a space.
          > >
          > > Example:
          > > "01","111","some text","more text","text before replace","text after replace","extra text before replace","extra text after replace","text","text2"
          > >
          > >
          > > What it should look like:
          > > "01","111","some text","more text","text before replace text after replace","extra text before replace extra text after replace","text","text2"
          > >
          > > How do you replace them, by count?
          > > Or use an array?
          > >
          > > Have about 5000 lines like this in the list.
          > >
          > > Thanks,
          > > John.
          > >
          >
        • Don - HtmlFixIt.com
          Cool stuff ... do you mind breaking it out? I learn from these examples. In the first two I mostly get it. ^ at the start of a line (.+? , ) is capturing
          Message 4 of 16 , Jan 28, 2010
            Cool stuff ... do you mind breaking it out? I learn from these
            examples. In the first two I mostly get it.
            ^ at the start of a line
            (.+?",") is capturing anything followed by a comma -- and the ? means
            don't be greedy
            {4} means repeat above four times
            .+? means anything but it cannot be an empty field -- assuming that fits
            the data pattern and you never have the two commas back to back
            \K -- this one I don't know yet ...

            because you put the comma in quotes that is all that is replaced?

            W=whole
            A=all
            R=regex
            S=silent

            diodeom wrote:
            > --- In ntb-clips@yahoogroups.com, "diodeom" <diomir@...> wrote:
            >> Here's an approach that reproduces your example:
            >>
            >> ^!Replace "^(.+?","){4}.+?\K","" >> " " WARS
            >> ^!Replace "^(.+?","){5}.+?\K","" >> " " WARS
          • diodeom
            K allows to diregard whatever was matched prior to its placement. In this case the regex grabs only what comes after, which is the desired instance of ,
            Message 5 of 16 , Jan 28, 2010
              \K allows to diregard whatever was matched prior to its placement. In this case the regex grabs only what comes after, which is the desired instance of "," (where quotes are a vital part of the match) to swap it for a space.

              And to account for possible empty fields * instead of + should make the pattern more universal:

              ^!Replace "^(.*?","){4}.*?\K","" >> " " WARS
              ^!Replace "^(.*?","){5}.*?\K","" >> " " WARS

              --- In ntb-clips@yahoogroups.com, "Don - HtmlFixIt.com" <don@...> wrote:
              >
              > Cool stuff ... do you mind breaking it out? I learn from these
              > examples. In the first two I mostly get it.
              > ^ at the start of a line
              > (.+?",") is capturing anything followed by a comma -- and the ? means
              > don't be greedy
              > {4} means repeat above four times
              > .+? means anything but it cannot be an empty field -- assuming that fits
              > the data pattern and you never have the two commas back to back
              > \K -- this one I don't know yet ...
              >
              > because you put the comma in quotes that is all that is replaced?
              >
              > W=whole
              > A=all
              > R=regex
              > S=silent
              >
              > diodeom wrote:
              > > --- In ntb-clips@yahoogroups.com, "diodeom" <diomir@> wrote:
              > >> Here's an approach that reproduces your example:
              > >>
              > >> ^!Replace "^(.+?","){4}.+?\K","" >> " " WARS
              > >> ^!Replace "^(.+?","){5}.+?\K","" >> " " WARS
              >
            • Don - HtmlFixIt.com
              That is cool stuff, thanks for explaining. I am still working on RegEx stuff. Probably one of my greater weaknesses, but I m growing. Used to be I d avoid
              Message 6 of 16 , Jan 28, 2010
                That is cool stuff, thanks for explaining. I am still working on RegEx
                stuff. Probably one of my greater weaknesses, but I'm growing. Used to
                be I'd avoid them, now I can write one from scratch often and on the
                first try :-)

                I wonder instead of {4} you could use {^%count} and then inc that and
                repeat it. Have to test that when I get a minute. Would allow you to
                enter a list of delimiters to dump out (in this case I know he had a
                fixed number) and then do them with a one liner.

                diodeom wrote:
                > \K allows to diregard whatever was matched prior to its placement. In this case the regex grabs only what comes after, which is the desired instance of "," (where quotes are a vital part of the match) to swap it for a space.
                >
                > And to account for possible empty fields * instead of + should make the pattern more universal:
                >
                > ^!Replace "^(.*?","){4}.*?\K","" >> " " WARS
                > ^!Replace "^(.*?","){5}.*?\K","" >> " " WARS
              • John Wallace
                Thanks, that is great. Like Don, Ifind this interesting. It worked on all the lines in no time at all. Thanks, again. John _____ Subject: [Clip] Re: CSV Or
                Message 7 of 16 , Jan 28, 2010
                  Thanks, that is great.

                  Like Don, Ifind this interesting.
                  It worked on all the lines in no time at all.

                  Thanks, again.

                  John


                  _____

                  Subject: [Clip] Re: CSV




                  Or this:

                  ^!Replace "^(.+?","){4}.+?\K","(.+?",".+?)","" >> " $2 " WARS

                  --- In ntb-clips@yahoogrou <mailto:ntb-clips%40yahoogroups.com> ps.com,
                  "diodeom" <diomir@...> wrote:
                  >
                  > Here's an approach that reproduces your example:
                  >
                  > ^!Replace "^(.+?","){4}.+?\K","" >> " " WARS
                  > ^!Replace "^(.+?","){5}.+?\K","" >> " " WARS
                  >
                  > --- In ntb-clips@yahoogrou <mailto:ntb-clips%40yahoogroups.com> ps.com,
                  "johnta1" <johnta1@> wrote:
                  > >
                  > > I have a csv file that I want to delete a "," in the 5th and 7th
                  position and replace with a space.
                  > >
                  > > Example:
                  > > "01","111","some text","more text","text before replace","text after
                  replace","extra text before replace","extra text after
                  replace","text","text2"
                  > >
                  > >
                  > > What it should look like:
                  > > "01","111","some text","more text","text before replace text after
                  replace","extra text before replace extra text after replace","text","text2"
                  > >
                  > > How do you replace them, by count?
                  > > Or use an array?
                  > >
                  > > Have about 5000 lines like this in the list.
                  > >
                  > > Thanks,
                  > > John.
                  > >
                  >




                  [Non-text portions of this message have been removed]
                • diodeom
                  ... Same here, Don — I m far from any RegEx competency. I m just content when I m able to utilize it to the extent of my needs/wants. And K was a like
                  Message 8 of 16 , Jan 28, 2010
                    --- In ntb-clips@yahoogroups.com, "Don - HtmlFixIt.com" <don@...> wrote:
                    >
                    > That is cool stuff, thanks for explaining. I am still working on RegEx
                    > stuff. Probably one of my greater weaknesses, but I'm growing. Used to
                    > be I'd avoid them, now I can write one from scratch often and on the
                    > first try :-)


                    Same here, Don — I'm far from any RegEx competency. I'm just content when I'm able to utilize it to the extent of my needs/wants. And \K was a like breath of fresh air to me just where the usual look-behind assertions weren't overly eager to cooperate.

                    \K is actually somewhat similar to the recently "tortured" method of removing array elements in the very aspect you brought up: when \K says "forget about what's before me," one can still reference any parenthesized fragment(s) persistent in the otherwise disregarded section of the match.

                    To illustrate:

                    Sample string contains: 2010-01-28
                    RegEx search pattern: "(\d{4})-\K\d\d-\d\d"
                    (Matching substring: 01-28)
                    Replacement pattern: "$1"
                    (It references: 2010)
                    Result: 2010-2010

                    The first four digits, which are accessible by $1 even though \K determines that they are not a part of the match, will be in this case placed where the actual (practical?) match happens (at month and day digits, dividing dash included).

                    It is this apparent here ability to seemingly "have it both ways at once" which makes me wonder what possibilities are there for the similar behavior in the method of decreasing arrays by their element count, where again — for all practical purposes — the "tricked" arrays act exactly as they should, and being able to still access their removed (superficially or not) elements looks to me like an added and quite explorable advantage.
                  • diodeom
                    ... A csv Field Merge clip? Maybe: - verify csv peculiarities (delimiter, et al.) - ask which fields should be merged - calc position of the delimiter to
                    Message 9 of 16 , Jan 28, 2010
                      --- In ntb-clips@yahoogroups.com, "Don - HtmlFixIt.com" <don@...> wrote:
                      >
                      > I wonder instead of {4} you could use {^%count} and then inc that and
                      > repeat it. Have to test that when I get a minute. Would allow you to
                      > enter a list of delimiters to dump out (in this case I know he had a
                      > fixed number) and then do them with a one liner.


                      A csv Field Merge clip? Maybe:

                      - verify csv peculiarities (delimiter, et al.)
                      - ask which fields should be merged
                      - calc position of the delimiter to remove
                      - feed it to the replace pattern

                      It wouldn't surprise me to learn that a similar clip already resides somewhere within standard libraries. I won't bother to look though... :D
                    • John Wallace
                      Quote: A csv Field Merge clip? Maybe: I d be interested in that. ... I do a lot of csv files to use in my website. I usually mess up and get some extraneous
                      Message 10 of 16 , Jan 29, 2010
                        Quote: A csv Field Merge clip? Maybe:

                        I'd be interested in that.
                        :)

                        I do a lot of csv files to use in my website.
                        I usually mess up and get some extraneous quotes or commas.
                        Also need to do a lot of 'cutting' of fields.

                        In this case combining fields.

                        This worked great.

                        At some point in time I'd like to combine all my data to one file to make a
                        huge search database for me.

                        Thanks again,
                        John


                        _____

                        From: ntb-clips@yahoogroups.com [mailto:ntb-clips@yahoogroups.com] On Behalf
                        Of diodeom
                        Sent: Friday, January 29, 2010 12:24 AM
                        To: ntb-clips@yahoogroups.com
                        Subject: [Clip] Re: CSV




                        --- In ntb-clips@yahoogrou <mailto:ntb-clips%40yahoogroups.com> ps.com, "Don
                        - HtmlFixIt.com" <don@...> wrote:
                        >
                        > I wonder instead of {4} you could use {^%count} and then inc that and
                        > repeat it. Have to test that when I get a minute. Would allow you to
                        > enter a list of delimiters to dump out (in this case I know he had a
                        > fixed number) and then do them with a one liner.

                        A csv Field Merge clip? Maybe:

                        - verify csv peculiarities (delimiter, et al.)
                        - ask which fields should be merged
                        - calc position of the delimiter to remove
                        - feed it to the replace pattern

                        It wouldn't surprise me to learn that a similar clip already resides
                        somewhere within standard libraries. I won't bother to look though... :D






                        [Non-text portions of this message have been removed]
                      • diodeom
                        ... I m glad you found the previous quicky useful. Below is what I cooked up just for simple column removal. If I find time, I will try to improve it and maybe
                        Message 11 of 16 , Jan 29, 2010
                          --- In ntb-clips@yahoogroups.com, "John Wallace" <johnta1@...> wrote:
                          >
                          > Quote: A csv Field Merge clip? Maybe:
                          >
                          > I'd be interested in that.
                          > :)
                          >
                          > I do a lot of csv files to use in my website.
                          > I usually mess up and get some extraneous quotes or commas.
                          > Also need to do a lot of 'cutting' of fields.
                          >
                          > In this case combining fields.
                          >
                          > This worked great.
                          >
                          > At some point in time I'd like to combine all my data to one file to make a
                          > huge search database for me.
                          >
                          > Thanks again,
                          > John


                          I'm glad you found the previous quicky useful.

                          Below is what I cooked up just for simple column removal. If I find time, I will try to improve it and maybe play next with column rearranging.

                          (Please note: this clip assumes that all your fields are surrounded by double-quotes and comma is the separator.)

                          And for less trivial manipulations, before a decent complex NT clip surfaces, I'd pragmatically (and perhaps sacrilegiously) suggest an immediately available dedicated csv editor (like CsvEd ==> csved.sjfrancke.nl).

                          Dio

                          h="CSV Column Remover"

                          ;Get number of columns
                          ^!Set %Col%=^$StrCount("","";"^$GetParagraph$";False;False)$
                          ^!If ^%Col%=0 Oops
                          ^!Inc %Col%
                          ;Build list of choices
                          ^!SetDelimiter |
                          ^!SetArray %Choice%=_Column 1^=1
                          ^!Set %Count%=2
                          :Loop
                          ^!If ^%Count%>^%Col% Ask
                          ^!Append %Choice%=|Column ^%Count%^=^%Count%
                          ^!Inc %Count%
                          ^!Goto Loop
                          :Ask
                          ^!SetWizardWidth 58
                          ^!Set %Choice%=^?{(T=C)Remove==^%Choice%}
                          ^!If ^%Choice%=1 First
                          ^!If ^%Choice%=^%Col% Last
                          ^!Set %Col%=(^.*?",)(.*?,){^$Calc(^%Choice%-2)$}\K.*?,
                          ^!Goto Del
                          :First
                          ^!Set %Col%=^.*?",
                          ^!Goto Del
                          :Last
                          ^!Set %Col%=^.+\K,".*?"$
                          :Del
                          ^!Replace "^%Col%" >> "" WARS
                          ^!ClearVariable %Count%
                          ^!ClearVariable %Choice%
                          ^!ClearVariable %Col%
                          ^!Jump TEXT_START
                          ^!Goto End
                          :Oops
                          ;Begin long line
                          ^!Prompt Sorry, but this clip will work only on a^%nl%comma-delimited double-quoted csv^%nl%with at least two columns.
                          ;End long line
                          ^!ClearVariable %Col%
                        • diodeom
                          ... Here s the first erratum: although the fifth line under :Ask happens to work as is, it should read: ^!Set %Col%=^.*? ,(.*?,){^$Calc(^%Choice%-2)$} K.*?,
                          Message 12 of 16 , Jan 29, 2010
                            --- In ntb-clips@yahoogroups.com, "diodeom" <diomir@...> wrote:
                            >
                            > h="CSV Column Remover"
                            >
                            > ;Get number of columns
                            > ^!Set %Col%=^$StrCount("","";"^$GetParagraph$";False;False)$
                            > ^!If ^%Col%=0 Oops
                            > ^!Inc %Col%
                            > ;Build list of choices
                            > ^!SetDelimiter |
                            > ^!SetArray %Choice%=_Column 1^=1
                            > ^!Set %Count%=2
                            > :Loop
                            > ^!If ^%Count%>^%Col% Ask
                            > ^!Append %Choice%=|Column ^%Count%^=^%Count%
                            > ^!Inc %Count%
                            > ^!Goto Loop
                            > :Ask
                            > ^!SetWizardWidth 58
                            > ^!Set %Choice%=^?{(T=C)Remove==^%Choice%}
                            > ^!If ^%Choice%=1 First
                            > ^!If ^%Choice%=^%Col% Last
                            > ^!Set %Col%=(^.*?",)(.*?,){^$Calc(^%Choice%-2)$}\K.*?,
                            > ^!Goto Del
                            > :First
                            > ^!Set %Col%=^.*?",
                            > ^!Goto Del
                            > :Last
                            > ^!Set %Col%=^.+\K,".*?"$
                            > :Del
                            > ^!Replace "^%Col%" >> "" WARS
                            > ^!ClearVariable %Count%
                            > ^!ClearVariable %Choice%
                            > ^!ClearVariable %Col%
                            > ^!Jump TEXT_START
                            > ^!Goto End
                            > :Oops
                            > ;Begin long line
                            > ^!Prompt Sorry, but this clip will work only on a^%nl%comma-delimited double-quoted csv^%nl%with at least two columns.
                            > ;End long line
                            > ^!ClearVariable %Col%
                            >
                            Here's the first erratum: although the fifth line under ":Ask" happens to work as is, it should read:
                            ^!Set %Col%=^.*?",(.*?,){^$Calc(^%Choice%-2)$}\K.*?,
                          • thefrankwmx
                            hello john et.al. another option is to replace offending chars with HTML special chars within the text before converting to csv (e.g. to ")... this will
                            Message 13 of 16 , Jan 30, 2010
                              hello john et.al.

                              another option is to replace offending chars with HTML special chars within the text before converting to csv (e.g. " to ")...

                              this will add filesize to your text and will impact storage limits if they exist. that is why many, including myself, no longer use comma-separated-values but prefer tab-delimited-values. works great unless tabs are ever included within your text, which in my cases are not.

                              with tab-delimited there is no need to wrap strings in quotes either, therefore no need to worry about commas, quotes, or apostrophes within your text. just be sure there are no tabs within the text!

                              then if you want to collapse 2 columns, start a the beginning of the row, count over to the tab between those columns, and replace it with a space.

                              regards,

                              tf
                            • diodeom
                              ... Yay to tabs; that s how I do it too. For converting a true csv files, which may have all sorts of challenging quotes (or none) and commas, I use the
                              Message 14 of 16 , Jan 30, 2010
                                --- In ntb-clips@yahoogroups.com, "thefrankwmx" <tf@...> wrote:
                                >
                                > hello john et.al.
                                >
                                > another option is to replace offending chars with HTML special chars within the text before converting to csv (e.g. " to ")...
                                >
                                > this will add filesize to your text and will impact storage limits if they exist. that is why many, including myself, no longer use comma-separated-values but prefer tab-delimited-values. works great unless tabs are ever included within your text, which in my cases are not.
                                >
                                > with tab-delimited there is no need to wrap strings in quotes either, therefore no need to worry about commas, quotes, or apostrophes within your text. just be sure there are no tabs within the text!
                                >
                                > then if you want to collapse 2 columns, start a the beginning of the row, count over to the tab between those columns, and replace it with a space.
                                >
                                > regards,
                                >
                                > tf
                                >

                                Yay to tabs; that's how I do it too.
                                For converting a true csv files, which may have all sorts of challenging quotes (or none) and commas, I use the following:

                                ^!Replace ",(?=(?:[^"]*"[^"]*")*(?![^"]*"))" >> "\t" WARS

                                Thus a sample csv record line:

                                0398,Diodeom "Dio" Doe,"01 BP 112, Deou",Burkina Faso

                                will read after the replacement:

                                0398<\t>Diodeom "Dio" Doe<\t>"01 BP 112, Deou"<\t>Burkina Faso

                                were for clarity I substituted proper tabs with <\t>

                                Dio
                              • Alec Burgess
                                ... If you happen to have Excel available, you can open a CSV file in it and then copy paste a subset of the spread-sheet into Notetab where it becomes tab
                                Message 15 of 16 , Jan 30, 2010
                                  thefrankwmx (tf@...) wrote (in part) (on 2010-01-30 at 15:44):
                                  > another option is to replace offending chars with HTML special chars
                                  > within the text before converting to csv (e.g. " to ")...
                                  >
                                  > this will add filesize to your text and will impact storage limits if
                                  > they exist. that is why many, including myself, no longer use
                                  > comma-separated-values but prefer tab-delimited-values. works great
                                  > unless tabs are ever included within your text, which in my cases are
                                  > not.
                                  >
                                  > with tab-delimited there is no need to wrap strings in quotes either,
                                  > therefore no need to worry about commas, quotes, or apostrophes
                                  > within your text. just be sure there are no tabs within the text!
                                  >
                                  > then if you want to collapse 2 columns, start a the beginning of the
                                  > row, count over to the tab between those columns, and replace it with
                                  > a space.

                                  If you happen to have Excel available, you can open a CSV file in it and
                                  then copy paste a subset of the spread-sheet into Notetab where it
                                  becomes tab delimited.

                                  When working with CSV and or spread-sheet data I frequently do the
                                  "stuff" that's easiest in Excel (eg. insert delete columns, special
                                  sorts) there and copy-paste back to Notetab for ""stuff" like running
                                  regexp clips.

                                  --
                                  Regards ... Alec (buralex@gmail & WinLiveMess - alec.m.burgess@skype)




                                  [Non-text portions of this message have been removed]
                                • Don - HtmlFixIt.com
                                  First, great to see thefrank posting. I haven t notice his posts for several years. Always in the middle of great discussions. Second, I use either tab or
                                  Message 16 of 16 , Jan 30, 2010
                                    First, great to see "thefrank" posting. I haven't notice his posts for
                                    several years. Always in the middle of great discussions.

                                    Second, I use either tab or pipes "|" neither of which typically appear
                                    in my stuff. I too often copy and paste back and forth between excel.
                                    I also have written several clips to divide stuff in a column like names
                                    .... most names are two groups of letters separated by one space, but
                                    some names are three groups of letters with two spaces or even four
                                    groups of letters with three spaces ...

                                    Mary Smith
                                    Mary Smith Jones
                                    Mary Jane Smith
                                    Mary Jane Smith Jones

                                    My clip asks if there are three "names" whether the extra space goes in
                                    the first name or last name .... I then paste back to excel after dividing.

                                    Of course if all names were two with one space, I could use text to
                                    columns ... but when you add extra spaces ... back to notetab and my clip.

                                    Don

                                    Alec Burgess wrote:
                                    > thefrankwmx (tf@...) wrote (in part) (on 2010-01-30 at 15:44):
                                    >> another option is to replace offending chars with HTML special chars
                                    >> within the text before converting to csv (e.g. " to ")...
                                    >>
                                    >> this will add filesize to your text and will impact storage limits if
                                    >> they exist. that is why many, including myself, no longer use
                                    >> comma-separated-values but prefer tab-delimited-values. works great
                                    >> unless tabs are ever included within your text, which in my cases are
                                    >> not.
                                    >>
                                    >> with tab-delimited there is no need to wrap strings in quotes either,
                                    >> therefore no need to worry about commas, quotes, or apostrophes
                                    >> within your text. just be sure there are no tabs within the text!
                                    >>
                                    >> then if you want to collapse 2 columns, start a the beginning of the
                                    >> row, count over to the tab between those columns, and replace it with
                                    >> a space.
                                    >
                                    > If you happen to have Excel available, you can open a CSV file in it and
                                    > then copy paste a subset of the spread-sheet into Notetab where it
                                    > becomes tab delimited.
                                    >
                                    > When working with CSV and or spread-sheet data I frequently do the
                                    > "stuff" that's easiest in Excel (eg. insert delete columns, special
                                    > sorts) there and copy-paste back to Notetab for ""stuff" like running
                                    > regexp clips.
                                    >
                                  Your message has been successfully submitted and would be delivered to recipients shortly.