
Deleting duplicate lines

  • Mike Breiding - Morgantown WV
    Message 1 of 11, Aug 30, 2013
      Greetings,
      I have a CSV file with numerous duplicate lines.

      Any clips out there to delete the duplicate lines?

      Thanks,
      Mike
    • Axel Berger
      Message 2 of 11, Aug 30, 2013
        Mike Breiding - Morgantown WV wrote:
        > Any clips out there to delete the duplicate lines?

        If two identical lines come right one after the other it's easy.
        Otherwise you'd need a fully featured CSS interpreter. All my CSS files
        contain many identical lines in different contexts. For example
        identical text colour means different things in different places.

        Axel
      • hsavage
        Message 3 of 11, Aug 30, 2013
          On 8/30/2013 2:03 PM, Mike Breiding - Morgantown WV wrote:
          > Greetings,
          > I have a CSV file with numerous duplicate lines.
          >
          > Any clips out there to delete the duplicate lines?
          >
          > Thanks,
          > Mike

          Mike,

          I use CSVed now and then for CSV files. But, if you're correct and the
          referenced lines are identical, that means each 'record' or series of
          'fields' is followed by a carriage return.

          If this is the case you can eliminate the dupes by loading into NTP and
          sorting. This, of course, will sort all records in the database
          alphabetically, either ascending or descending. You make the call.

          ···············································
          ¤• JD#.242 - ¤• SL.968/@4>4 - 13.08.30~18.50.20

          • My Dog Is Worried About The Economy
          • Because Alpo Is Up To $3.00 A Can.
          • That's Almost $21.00 In Dog Money.

          • --Joe Weinstein

          € hrs € hsavage € pobox € com
        • flo.gehrke
          Message 4 of 11, Aug 31, 2013
            --- In ntb-clips@yahoogroups.com, hsavage <hsavage@...> wrote:
            >
            > (...) you can eliminate the dupes by loading into NTP and
            > sorting. This, of course will sort all records in the database
            > alphabetically, either ascending or descending.

            Harvey,

            In order to avoid the sorting you could try...

            ^!Set %Nr%=0
            :Loop
            ^!Inc %Nr%
            ^!If ^%Nr% > ^$GetTextLineCount$ End
            ^!Jump ^%Nr%
            ^!Set %Line%=^$GetParagraph$
            ^!Replace "^P"^%Line%"" >> "" AS
            ^!Goto Loop

            In a CSV file like...

            "A";"B";"C"
            "D";"E";"F"
            "G";"H";"I"
            "A";"B";"C"
            "A";"B";"C"
            "J";"K";"L"

            the clip will remove lines #4 and #5 (duplicates of line #1) without sorting.

            Note: You may have to adapt the search pattern to a different data structure.
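
For anyone who wants to check the expected result outside NoteTab, here is a minimal Python sketch of the same idea: keep the first occurrence of each line and drop later copies without sorting. It is not a translation of the clip above; the function name `dedupe_keep_first` is made up for this illustration, and the sample is the six-line CSV from this message.

```python
def dedupe_keep_first(lines):
    """Return the lines with later duplicates dropped, original order kept."""
    seen = set()   # lines already emitted
    kept = []      # output, in first-seen order
    for line in lines:
        if line not in seen:
            seen.add(line)
            kept.append(line)
    return kept


# The sample CSV from the message above.
sample = [
    '"A";"B";"C"',
    '"D";"E";"F"',
    '"G";"H";"I"',
    '"A";"B";"C"',
    '"A";"B";"C"',
    '"J";"K";"L"',
]

result = dedupe_keep_first(sample)
print("\n".join(result))
```

On the sample, lines #4 and #5 are dropped and the remaining four lines keep their order. Comparison here is exact and case-sensitive, so two records that differ only in spacing or quoting count as different lines.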

            Regards,
            Flo
          • Adrian Worsfold
            Message 5 of 11, Aug 31, 2013
              I've already got this:

              ^!ClearVariables
              ^!Jump Doc_End
              ^!InsertText ^p^p
              ^!Jump 1
              ^!SetWizardLabel Dups, Trips, Quats removed ;)
              ^!Set %Prompt%=^?{Prompt before deleting?==Yes^=1|_No^=0}; %Save%=Lines deleted:^%nl%^%nl%
              ^!IfFalse ^$IsWordWrap$ Start
              ^!ToolBar Toggle Word Wrap
              ^!Set %WrapOn%=1
              :Start
              ^!Select Eol
              ^!Set %Row%=^$GetRow$; %Data%=^$GetSelection$
              :Find
              ^!Find "^%Data%" IS
              ^!IfError NextRow
              ^!If "^$GetLineSize$" > "^$GetSelSize$" Find
              ^!Append %Save%=^%Data%^%nl%
              ^!If "^$GetRow$" = "^$GetLineCount$" Wrap
              ^!IfTrue ^%Prompt% Skip_2
              ^!DeleteLine
              ^!Goto Find
              ^!Skip Delete this line?
              ^!Goto Find
              ^!DeleteLine
              ^!If "^$GetRow$" = "^$GetLineCount$" Next Else Find
              :NextRow
              ^!If ^$GetRow$ = ^$GetLineCount$ Wrap
              ^!Inc %Row%
              ^!Jump ^%Row%
              ^!Goto Start
              :Wrap
              ^!IfTrue ^%WrapOn% Next else Skip
              ^!ToolBar Toggle Word Wrap
              ^!Jump Doc_End
              :Loop1
              ^!IfFalse ^$IsEmpty(^$GetLine$)$ Skip_2
              ^!Replace "^p" >> "" SB
              ^!Goto Loop1
              ^!InsertText ^p
              :Info
              ^!SetWizardTitle Duplicates Removed
              ^!SetClipboard ^%Save%
              ^!Toolbar Paste New
              :Loop2
              ^!IfFalse ^$IsEmpty(^$GetLine$)$ Skip_2
              ^!Replace "^p" >> "" SB
              ^!Goto Loop2
              ^!InsertText ^p




              Adrian Worsfold

              http://www.pluralist.co.uk
              http://pluralistspeaks.blogspot.com
              pluralist@...
              31-08-2013
            • John Shotsky
              Message 6 of 11, Aug 31, 2013
                To simply remove all duplicate lines, you can use this:
                ^!Replace "(?s)^([^\r\n]+\R)(.*\R+)\1" >> "" ARSW
                ^!IfError Next Else Skip_-1

                If the lines are otherwise identical but differ in case, add an 'I' to the options.
                Regards,
                John
                RecipeTools Web Site: http://recipetools.gotdns.com/
                John's Mags Yahoo Group: http://groups.yahoo.com/group/johnsmags/

              • Adrian Worsfold
                Message 7 of 11, Aug 31, 2013
                  Hello John Shotsky

                  Using

                  ^!Replace "(?s)^([^\r\n]+\R)(.*\R+)\1" >> "" ARSW
                  ^!IfError Next Else Skip_-1

                  Means

                  Fred was a job man
                  John was a job man
                  Bill enjoyed work
                  Fred was a job man
                  Fred had a job
                  Fred was a job man
                  Bill enjoyed work
                  Bob was unemployed

                  Becomes

                  Bill enjoyed work
                  Bob was unemployed

                  Which is incorrect





                • John Shotsky
                  Message 8 of 11, Aug 31, 2013
                    Yes, of course it did. I sent it before I made the last change - Note the $2 in the replace side.
                    ^!Replace "(?s)^([^\r\n]+\R)(.*\R+)\1" >> "$2" ARSW
                    ^!IfError Next Else Skip_-1
                    Regards,
                    John
                  • flo.gehrke
                    Message 9 of 11, Sep 1, 2013
                      --- In ntb-clips@yahoogroups.com, "John Shotsky" <jshotsky@...> wrote:
                      >
                      > Yes, of course it did. I sent it before I made the last change -
                      > Note the $2 in the replace side.
                      > ^!Replace "(?s)^([^\r\n]+\R)(.*\R+)\1" >> "$2" ARSW
                      > ^!IfError Next Else Skip_-1
                      > Regards,
                      > John

                      I understand that this is about unsorted lists, and the job is to remove duplicates without changing the order of lines. In this case, you had better be cautious with that clip. For example, a list like...

                      BBB
                      111
                      111
                      BBB
                      222
                      DDD
                      222
                      FFF

                      is changed to...

                      111
                      111
                      DDD
                      FFF

                      Certainly, this is not the expected result.
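
Both failures reported in this thread can be reproduced outside NoteTab with a rough Python sketch. It makes two assumptions: Python's `re` module has no `\R`, so `\n` stands in for it, and the clip's replace-all plus `Skip_-1` loop is modelled as repeating `re.sub` until nothing changes. The helper name `apply_until_stable` is invented for this illustration.

```python
import re

# The clip's pattern with \R written as \n (an approximation, see above).
PATTERN = r"(?s)^([^\r\n]+\n)(.*\n+)\1"


def apply_until_stable(pattern, repl, text):
    """Repeat a replace-all pass until the text stops changing."""
    while True:
        new_text = re.sub(pattern, repl, text, flags=re.MULTILINE)
        if new_text == text:
            return text
        text = new_text


jobs = (
    "Fred was a job man\n"
    "John was a job man\n"
    "Bill enjoyed work\n"
    "Fred was a job man\n"
    "Fred had a job\n"
    "Fred was a job man\n"
    "Bill enjoyed work\n"
    "Bob was unemployed\n"
)
letters = "BBB\n111\n111\nBBB\n222\nDDD\n222\nFFF\n"

# First version: empty replacement deletes the whole matched span.
print(apply_until_stable(PATTERN, "", jobs))

# Second version: "$2" in the clip corresponds to r"\2" in Python.
print(apply_until_stable(PATTERN, r"\2", letters))
```

The first call leaves only the last two lines of the job list, and the second leaves 111, 111, DDD, FFF, as reported. The match spans the first copy, everything in between, and a later copy. Replacing it with nothing deletes all three parts. Replacing it with group 2 keeps only the middle, so the first copy is lost as well. Adjacent duplicates such as 111/111 never match because `(.*\n+)` needs at least one line break between the two copies.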

                      Regards,
                      Flo
                    • John Shotsky
                      Message 10 of 11, Sep 1, 2013
                        Yep, you're right. It worked on my list, but yours included things I didn't test for.
                        Now it works on your list too.
                        ^!Replace "(?s)^([^\r\n]+\R)\K(.+\R)*\1" >> "$2" ARSW
                        ^!IfError Next Else Skip_-1

                        BBB
                        111
                        222
                        DDD
                        FFF

                        Regards,
                        John
                      • Adrian Worsfold
                        Message 11 of 11, Sep 1, 2013
                          Hello

                          ^!Replace "(?s)^([^\r\n]+\R)\K(.+\R)*\1" >> "$2" ARSW
                          ^!IfError Next Else Skip_-1

                          It needs a line break at the end of the final line; otherwise it doesn't remove a duplicate that is the last line.

                          David took a service
                          Paul didn't take a service
                          Geoff didn't take a service
                          Janet didn't take a service
                          Janet didn't take a service
                          David took a service
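
The trailing-line-break effect can be shown with a small Python sketch. This is not the clip's pattern: Python's `re` supports neither `\K` nor `\R`, so the sketch assumes a stand-in pattern that captures a line together with its line break and removes a later copy. The names `KEEP_FIRST`, `dedupe_with_regex` and `ensure_final_newline` are hypothetical.

```python
import re

# Stand-in pattern: \1 is a full line including its line break,
# \2 is whatever lies between it and the next copy of that line.
KEEP_FIRST = r"(?s)^([^\r\n]+\n)(.*?)^\1"


def dedupe_with_regex(text):
    """Drop later copies of a line, repeating until nothing changes."""
    while True:
        new_text = re.sub(KEEP_FIRST, r"\1\2", text, flags=re.MULTILINE)
        if new_text == text:
            return text
        text = new_text


def ensure_final_newline(text):
    """Append a line break if the last line lacks one."""
    return text if text.endswith("\n") else text + "\n"


# The sample above, with no line break after the last line.
services = (
    "David took a service\n"
    "Paul didn't take a service\n"
    "Geoff didn't take a service\n"
    "Janet didn't take a service\n"
    "Janet didn't take a service\n"
    "David took a service"
)

without_fix = dedupe_with_regex(services)
with_fix = dedupe_with_regex(ensure_final_newline(services))
print(without_fix.splitlines())
print(with_fix.splitlines())
```

Without the final line break the last "David took a service" survives, because the captured line includes its line break and the last line has none to match. Adding the line break first lets the duplicate be removed.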




