
Re: [Clip] Deleting duplicate lines

  • hsavage
    Message 1 of 11 , Aug 30, 2013
      On 8/30/2013 2:03 PM, Mike Breiding - Morgantown WV wrote:
      > Greetings,
      > I have a CSV file with numerous duplicate lines.
      >
      > Any clips out there to delete the duplicate lines?
      >
      > Thanks,
      > Mike

      Mike,

      I use CSVed now and then for CSV files. But if you're correct and the
      referenced lines are identical, that means each 'record' or series of
      'fields' is followed by a carriage return.

      If this is the case, you can eliminate the dupes by loading the file into
      NTP and sorting it. This, of course, will sort all records in the database
      alphabetically, either ascending or descending. You make the call.

      ···············································
      ¤• JD#.242 - ¤• SL.968/@4>4 - 13.08.30~18.50.20

      • My Dog Is Worried About The Economy
      • Because Alpo Is Up To $3.00 A Can.
      • That's Almost $21.00 In Dog Money.

      • --Joe Weinstein

      € hrs € hsavage € pobox € com
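Harvey's suggestion works because sorting puts identical lines next to each other, so each copy only has to be compared with the line just before it. Here is a minimal Python sketch of that sort-then-drop-adjacent-copies idea, as an illustration outside NoteTab. The function name `sort_and_dedupe` is hypothetical, and the sketch compares whole lines exactly (case-sensitive).

```python
def sort_and_dedupe(lines, descending=False):
    """Sort the lines, then keep only the first line of each run of identical lines."""
    result = []
    for line in sorted(lines, reverse=descending):
        # After sorting, duplicates are adjacent, so one comparison per line suffices.
        if not result or result[-1] != line:
            result.append(line)
    return result


records = ['"D";"E";"F"', '"A";"B";"C"', '"A";"B";"C"', '"G";"H";"I"']
print(sort_and_dedupe(records))
```

As Harvey notes, the original line order is lost. Passing `descending=True` gives the reverse alphabetical order.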
    • flo.gehrke
      Message 2 of 11 , Aug 31, 2013
        --- In ntb-clips@yahoogroups.com, hsavage <hsavage@...> wrote:
        >
        > (...) you can eliminate the dupes by loading into NTP and
        > sorting. This, of course will sort all records in the database
        > alphabetically, either ascending or descending.

        Harvey,

        To avoid the sorting, you could try...

        ^!Set %Nr%=0
        :Loop
        ^!Inc %Nr%
        ^!If ^%Nr% > ^$GetTextLineCount$ End
        ^!Jump ^%Nr%
        ^!Set %Line%=^$GetParagraph$
        ^!Replace "^P"^%Line%"" >> "" AS
        ^!Goto Loop

        In a CSV file like...

        "A";"B";"C"
        "D";"E";"F"
        "G";"H";"I"
        "A";"B";"C"
        "A";"B";"C"
        "J";"K";"L"

        the clip will remove lines #4 and #5 (duplicates of line #1) without sorting.

        Note: You may have to adapt the search pattern for a different data structure.

        Regards,
        Flo
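Flo's clip walks the document from the top. For each line, it removes the identical lines further down, so the first occurrence keeps its position. Below is a rough Python sketch of that keep-first loop. The names are hypothetical, and the sketch compares whole lines as Python strings rather than running NoteTab's ^!Replace.

```python
def remove_later_copies(lines):
    """Keep the first occurrence of each line and delete later copies, preserving order."""
    lines = list(lines)  # work on a copy of the input
    nr = 0
    # The line count shrinks as copies are removed, so re-check the bound each pass.
    while nr < len(lines):
        current = lines[nr]
        # Keep everything up to and including line `nr`; drop exact copies below it.
        lines = lines[: nr + 1] + [other for other in lines[nr + 1 :] if other != current]
        nr += 1
    return lines


csv_lines = [
    '"A";"B";"C"',
    '"D";"E";"F"',
    '"G";"H";"I"',
    '"A";"B";"C"',
    '"A";"B";"C"',
    '"J";"K";"L"',
]
print(remove_later_copies(csv_lines))
```

Like the clip, this rescans the remaining lines once per line, so the work grows roughly with the square of the line count. For large files, a set of already-seen lines does the same job in a single pass.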
      • Adrian Worsfold
        Message 3 of 11 , Aug 31, 2013
          I've already got this:

          ^!ClearVariables
          ^!Jump Doc_End
          ^!InsertText ^p^p
          ^!Jump 1
          ^!SetWizardLabel Dups, Trips, Quats removed ;)
          ^!Set %Prompt%=^?{Prompt before deleting?==Yes^=1|_No^=0}; %Save%=Lines deleted:^%nl%^%nl%
          ^!IfFalse ^$IsWordWrap$ Start
          ^!ToolBar Toggle Word Wrap
          ^!Set %WrapOn%=1
          :Start
          ^!Select Eol
          ^!Set %Row%=^$GetRow$; %Data%=^$GetSelection$
          :Find
          ^!Find "^%Data%" IS
          ^!IfError NextRow
          ^!If "^$GetLineSize$" > "^$GetSelSize$" Find
          ^!Append %Save%=^%Data%^%nl%
          ^!If "^$GetRow$" = "^$GetLineCount$" Wrap
          ^!IfTrue ^%Prompt% Skip_2
          ^!DeleteLine
          ^!Goto Find
          ^!Skip Delete this line?
          ^!Goto Find
          ^!DeleteLine
          ^!If "^$GetRow$" = "^$GetLineCount$" Next Else Find
          :NextRow
          ^!If ^$GetRow$ = ^$GetLineCount$ Wrap
          ^!Inc %Row%
          ^!Jump ^%Row%
          ^!Goto Start
          :Wrap
          ^!IfTrue ^%WrapOn% Next else Skip
          ^!ToolBar Toggle Word Wrap
          ^!Jump Doc_End
          :Loop1
          ^!IfFalse ^$IsEmpty(^$GetLine$)$ Skip_2
          ^!Replace "^p" >> "" SB
          ^!Goto Loop1
          ^!InsertText ^p
          :Info
          ^!SetWizardTitle Duplicates Removed
          ^!SetClipboard ^%Save%
          ^!Toolbar Paste New
          :Loop2
          ^!IfFalse ^$IsEmpty(^$GetLine$)$ Skip_2
          ^!Replace "^p" >> "" SB
          ^!Goto Loop2
          ^!InsertText ^p

          Adrian Worsfold

          http://www.pluralist.co.uk
          http://pluralistspeaks.blogspot.com
          pluralist@...
          31-08-2013
          ----- Received the following content -----
          From: flo.gehrke
          Receiver: ntb-clips
          Time: 2013-08-31, 12:49:48
          Subject: Re: [Clip] Deleting duplicate lines
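Adrian's clip also collects every deleted line in `%Save%` under a "Lines deleted:" heading and pastes that list into a new document. The Python sketch below illustrates the same idea. It keeps first occurrences in order and returns the removed copies as a separate report. The function name is hypothetical, the prompt option is left out, and matching is exact and case-sensitive.

```python
def dedupe_with_report(lines):
    """Return (kept, removed).

    kept: first occurrences, in their original order.
    removed: every later copy, in document order (a report like %Save%).
    """
    seen = set()
    kept, removed = [], []
    for line in lines:
        if line in seen:
            removed.append(line)
        else:
            seen.add(line)
            kept.append(line)
    return kept, removed


lines = [
    "Fred was a job man",
    "John was a job man",
    "Bill enjoyed work",
    "Fred was a job man",
    "Fred had a job",
    "Fred was a job man",
    "Bill enjoyed work",
    "Bob was unemployed",
]
kept, removed = dedupe_with_report(lines)
print("\n".join(kept))
print()
print("Lines deleted:")
print("\n".join(removed))
```

The report lists removed copies in document order. This may differ from the order in which the clip appends them to `%Save%`.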
        • John Shotsky
          Message 4 of 11 , Aug 31, 2013
            To simply remove all duplicate lines, you can use this:
            ^!Replace "(?s)^([^\r\n]+\R)(.*\R+)\1" >> "" ARSW
            ^!IfError Next Else Skip_-1

            If lines differ only in letter case and should still count as duplicates, add an 'I' to the options.
            Regards,
            John
            RecipeTools Web Site: http://recipetools.gotdns.com/
            John's Mags Yahoo Group: http://groups.yahoo.com/group/johnsmags/

            From: ntb-clips@yahoogroups.com [mailto:ntb-clips@yahoogroups.com] On Behalf Of Adrian Worsfold
            Sent: Saturday, August 31, 2013 10:23
            To: ntb-clips
            Subject: [Clip] Deleting duplicate lines
          • Adrian Worsfold
            Message 5 of 11 , Aug 31, 2013
              Hello John Shotsky

              Using

              ^!Replace "(?s)^([^\r\n]+\R)(.*\R+)\1" >> "" ARSW
              ^!IfError Next Else Skip_-1

              Means

              Fred was a job man
              John was a job man
              Bill enjoyed work
              Fred was a job man
              Fred had a job
              Fred was a job man
              Bill enjoyed work
              Bob was unemployed

              Becomes

              Bill enjoyed work
              Bob was unemployed

              Which is incorrect

              Adrian Worsfold

              http://www.pluralist.co.uk
              http://pluralistspeaks.blogspot.com
              pluralist@...
              31-08-2013
              ----- Received the following content -----
              From: John Shotsky
              Receiver: ntb-clips
              Time: 2013-08-31, 18:48:03
              Subject: RE: [Clip] Deleting duplicate lines
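Adrian's result can be reproduced outside NoteTab with a Python sketch of the first version of the pattern. The sketch makes three assumptions:

- NoteTab's `^` matches at the start of every line, hence `(?m)`.
- `\R` is written as `\n`, because Python's `re` module has no `\R`.
- The "replace all" option plus `^!IfError Next Else Skip_-1` behaves like repeating `re.subn` until nothing changes.

The helper name is hypothetical.

```python
import re

# First version of the pattern, with \R written as \n for Python's re module.
FIRST_VERSION = re.compile(r"(?ms)^([^\r\n]+\n)(.*\n+)\1")


def replace_until_no_match(pattern, replacement, text):
    """Repeat a replace-all until it finds nothing (the ^!IfError ... Skip_-1 loop)."""
    while True:
        text, count = pattern.subn(replacement, text)
        if count == 0:
            return text


sample = (
    "Fred was a job man\n"
    "John was a job man\n"
    "Bill enjoyed work\n"
    "Fred was a job man\n"
    "Fred had a job\n"
    "Fred was a job man\n"
    "Bill enjoyed work\n"
    "Bob was unemployed\n"
)
print(replace_until_no_match(FIRST_VERSION, "", sample), end="")
```

The greedy `.*` stretches from the first "Fred was a job man" to its last copy. The empty replacement then deletes that whole span, including the distinct lines in between.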
            • John Shotsky
              Message 6 of 11 , Aug 31, 2013
                Yes, of course it did. I sent it before I made the last change. Note the $2 on the replace side.
                ^!Replace "(?s)^([^\r\n]+\R)(.*\R+)\1" >> "$2" ARSW
                ^!IfError Next Else Skip_-1
                Regards,
                John
                RecipeTools Web Site: http://recipetools.gotdns.com/
                John's Mags Yahoo Group: http://groups.yahoo.com/group/johnsmags/

                From: ntb-clips@yahoogroups.com [mailto:ntb-clips@yahoogroups.com] On Behalf Of Adrian Worsfold
                Sent: Saturday, August 31, 2013 11:51
                To: ntb-clips
                Subject: Re: RE: [Clip] Deleting duplicate lines
              • flo.gehrke
                Message 7 of 11 , Sep 1, 2013
                  --- In ntb-clips@yahoogroups.com, "John Shotsky" <jshotsky@...> wrote:
                  >
                  > Yes, of course it did. I sent it before I made the last change -
                  > Note the $2 in the replace side.
                  > ^!Replace "(?s)^([^\r\n]+\R)(.*\R+)\1" >> "$2" ARSW
                  > ^!IfError Next Else Skip_-1
                  > Regards,
                  > John

                  I understand that this is about unsorted lists and that the job is to remove duplicates without changing the order of lines. In that case, you'd better be cautious with that clip. For example, a list like...

                  BBB
                  111
                  111
                  BBB
                  222
                  DDD
                  222
                  FFF

                  is changed to...

                  111
                  111
                  DDD
                  FFF

                  Certainly, this is not the expected result.

                  Regards,
                  Flo
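Flo's result can be reproduced outside NoteTab with a Python sketch of the `$2` version. The sketch makes these assumptions:

- `(?m)` makes `^` match at each line start.
- `\n` stands in for `\R`, which Python's `re` module lacks.
- A loop of `re.subn` models the repeated replace.

`\2` plays the role of `$2`. The helper name is hypothetical.

```python
import re

# The "$2" version of the pattern; \R is written as \n for Python's re module.
SECOND_VERSION = re.compile(r"(?ms)^([^\r\n]+\n)(.*\n+)\1")


def replace_until_no_match(pattern, replacement, text):
    """Repeat a replace-all until nothing matches (the ^!IfError ... Skip_-1 loop)."""
    while True:
        text, count = pattern.subn(replacement, text)
        if count == 0:
            return text


flo_list = "BBB\n111\n111\nBBB\n222\nDDD\n222\nFFF\n"
print(replace_until_no_match(SECOND_VERSION, r"\2", flo_list), end="")
```

Replacing the match with group 2 keeps only the lines between the two copies, so the first "BBB" disappears along with the second. In addition, `(.*\n+)` must consume at least one line break after the first copy. As a result, two identical lines that are directly adjacent, like the "111" pair, never match.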
                • John Shotsky
                  Message 8 of 11 , Sep 1, 2013
                    Yep, you're right. It worked on my list, but yours included things I didn't test for.
                    Now it works on your list too.
                    ^!Replace "(?s)^([^\r\n]+\R)\K(.+\R)*\1" >> "$2" ARSW
                    ^!IfError Next Else Skip_-1

                    BBB
                    111
                    222
                    DDD
                    FFF

                    Regards,
                    John
                    RecipeTools Web Site: http://recipetools.gotdns.com/
                    John's Mags Yahoo Group: http://groups.yahoo.com/group/johnsmags/

                    From: ntb-clips@yahoogroups.com [mailto:ntb-clips@yahoogroups.com] On Behalf Of flo.gehrke
                    Sent: Sunday, September 01, 2013 09:23
                    To: ntb-clips@yahoogroups.com
                    Subject: Re: [Clip] Deleting duplicate lines
                  • Adrian Worsfold
                    Message 9 of 11 , Sep 1, 2013
                      Hello

                      ^!Replace "(?s)^([^\r\n]+\R)\K(.+\R)*\1" >> "$2" ARSW
                      ^!IfError Next Else Skip_-1

                      It needs a line break after the final line; otherwise it doesn't remove a duplicate on the last line, as in:

                      David took a service
                      Paul didn't take a service
                      Geoff didn't take a service
                      Janet didn't take a service
                      Janet didn't take a service
                      David took a service

                      Adrian Worsfold

                      http://www.pluralist.co.uk
                      http://pluralistspeaks.blogspot.com
                      pluralist@...
                      01-09-2013
                      ----- Received the following content -----
                      From: John Shotsky
                      Receiver: ntb-clips
                      Time: 2013-09-01, 18:08:30
                      Subject: RE: [Clip] Deleting duplicate lines
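The effect of the final pattern, and the trailing line break Adrian mentions, can be checked with a Python sketch. Python's `re` module supports neither `\K` nor `\R`, so the sketch assumes the following:

- Line breaks are `\n`.
- `\K` is emulated by capturing the first copy and writing it back with `\1\2`.
- `(.+\R)*` is replaced with the single optional group `((?:.+\n)?)`. Under `(?s)` it matches the same text without nested repetition.

The option that appends a missing final line break is an addition of the sketch, not part of John's clip. The function name is hypothetical.

```python
import re

# John's final pattern, adapted for Python's re module (assumptions of this sketch):
#   * \R is written as \n, and (?m) makes ^ match at every line start;
#   * \K is emulated by capturing the first copy in group 1 and writing it back;
#   * (.+\R)* becomes ((?:.+\n)?), which matches the same text under (?s)
#     but avoids nested repetition.
FINAL_VERSION = re.compile(r"(?ms)^([^\r\n]+\n)((?:.+\n)?)\1")


def remove_duplicate_lines(text, ensure_final_newline=True):
    """Drop later copies of each line, keeping the first copy and the line order."""
    if ensure_final_newline and not text.endswith("\n"):
        text += "\n"  # without this, a copy on the last line can never match \1
    while True:
        text, count = FINAL_VERSION.subn(r"\1\2", text)
        if count == 0:
            return text


services = (
    "David took a service\n"
    "Paul didn't take a service\n"
    "Geoff didn't take a service\n"
    "Janet didn't take a service\n"
    "Janet didn't take a service\n"
    "David took a service"  # no line break after the last line
)
print(remove_duplicate_lines(services, ensure_final_newline=False))
print(remove_duplicate_lines(services))
```

The first call keeps both "David took a service" lines, because the copy on the last line has no line break for `\1` to match. The second call adds a final line break and removes the copy.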