RE: remove duplicate lines

  • David Fishburn
    Message 1 of 15, Jun 3, 2003
      Here is a posting from a very long thread.
      I think this is pretty well the final version.
      So you can add this command to your vimrc, then map a key stroke to it,
      or visually select a range and type
      :Uniq

      Courtesy of Preben 'Peppe' Guldberg and Piet Delport

      HTH,
      Dave

      ****************************
      Oops. Here's the right one:

      :command -range=% Uniq <line1>,<line2>s/\v^(.*)(%<<line2>l\n\1)+$/\1/e

      It dodges the need for arithmetic by matching the \%<Nl before the \n.

      This approach still has a slight flaw, though. The end of the range is
      inserted once, as a literal line number, into the substitution command.
      The problem appears when enough lines at the start of the range are
      deleted that new duplicate lines get shifted into the range from the
      bottom, where they eventually get operated on as well.

      Another approach has just struck me, though:

      :command -range=% Uniq <line1>,<line2>g/^\%<<line2>l\(.*\)\n\1$/d

      This works by deleting all lines that are followed by a duplicate of themselves.
      Because of the way :g works (matching/marking the lines in one pass,
      then deleting them in a second pass), the shifting problem is avoided;
      and it's shorter, too. :-)

      I now officially declare this the shortest Uniq implementation in Vim.

      *waits for the inevitable correction*

      --
      Piet Delport
      Today's subliminal thought is:
      ****************************
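A quick way to check the :g version is to run it on a scratch buffer. The sketch below assumes Vim 7 or later (for passing a list to setline() and getline()), which postdates this thread; the sample lines are invented for illustration.

```vim
" Define the command as posted (the ! only makes re-sourcing harmless).
command! -range=% Uniq <line1>,<line2>g/^\%<<line2>l\(.*\)\n\1$/d

" Fill a scratch buffer with some adjacent duplicates.
new
call setline(1, ['apple', 'apple', 'banana', 'banana', 'banana', 'cherry'])

" Run it over the whole buffer, then show what is left.
%Uniq
echo getline(1, '$')
" The buffer now holds: apple, banana, cherry
```

Because the pattern compares each line only with the line that follows it, only adjacent duplicates are removed, so unsorted input may need sorting first.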


      -----Original Message-----
      From: jonah [mailto:jonahgoldstein@...]
      Sent: Monday, June 02, 2003 11:34 PM
      To: vim
      Subject: remove duplicate lines


      does anyone have a script that will
      remove all duplicate lines in a file?

      thanks,
      jonah
    • jonah
      Message 2 of 15, Jun 3, 2003
        That is very clever.

        Just out of curiosity, is there an equally
        clever solution for sorting lines in vim (w/out using
        the external "sort" program)?

        Jonah

      • Piet Delport
        Message 3 of 15, Jun 3, 2003
          On Tue, 03 Jun 2003 at 09:28:46 -0700, jonah wrote:
          [snip Vim Uniq implementation]
          >
          > That is very clever.
          >
          > Just out of curiosity, is there an equally clever solution for sorting
          > lines in vim (w/out using the external "sort" program)?

          Quick copy&paste from a script i'm (sporadically) working on:

          " binary insertion sort implementation, which works better in Vim than
          " quick sort
          function BinaryInsertSort(start, end)
            let i = a:start + 1
            while i <= a:end
              " find insertion point via binary search
              let i_val = getline(i)
              let lo = a:start
              let hi = i
              while lo < hi
                let mid = (lo + hi) / 2
                let mid_val = getline(mid)
                if i_val < mid_val
                  let hi = mid
                else
                  let lo = mid + 1
                  if i_val == mid_val | break | endif
                endif
              endwhile
              " do insert
              if lo < i
                exec i.'d_'
                call append(lo - 1, i_val)
              endif
              let i = i + 1
            endwhile
          endfunction

          " function wrapper (for range semantics)
          function Sort() range
            call BinaryInsertSort(a:firstline, a:lastline)
          endfunction

          command -range=% Sort <line1>,<line2>call Sort()

          --
          Piet Delport
          Today's subliminal thought is:
        • Benji Fisher
          Message 4 of 15, Jun 3, 2003
            Piet Delport wrote:
            >
            > Quick copy&paste from a script i'm (sporadically) working on:
            >
            > " binary insertion sort implementation, which works better in Vim than
            > " quick sort
            > function BinaryInsertSort(start, end)
            [snip]
            > endfunction
            >
            > " function wrapper (for range semantics)
            > function Sort() range
            > call BinaryInsertSort(a:firstline, a:lastline)
            > endfunction
            >
            > command -range=% Sort <line1>,<line2>call Sort()

            Personal preference: I do not like function wrappers. If you agree,
            there are a few ways to get around it. Here is one:

            function Sort(...) range
              let start = a:firstline
              let end = a:lastline
              if a:0 > 0
                let start = a:1
                if a:0 > 1
                  let end = a:2
                endif
              endif
              ...
            endfunction

            --Benji Fisher
          • Piet Delport
            Message 5 of 15, Jun 3, 2003
              On Tue, 03 Jun 2003 at 22:57:15 -0400, Benji Fisher wrote:
              > Piet Delport wrote:
              >>
              [snip]
              >>
              >> " function wrapper (for range semantics)
              >> function Sort() range
              >> call BinaryInsertSort(a:firstline, a:lastline)
              >> endfunction
              >>
              >> command -range=% Sort <line1>,<line2>call Sort()
              >
              > Personal preference: I do not like function wrappers. If you agree,
              > there are a few ways to get around it. Here is one:
              >
              > function Sort(...) range
              [snip]
              > endfunction

              The problem with putting the range flag directly on the sort function is
              that it can make it very messy to pass the function any range that can't
              be expressed in terms of Ex addresses. For example, instead of doing a
              recursive call like this:

              call QuickSort(a:start, hi)
              call QuickSort(lo, a:end)

              ...you're forced to incant something like:

              exec a:start.','.hi.'call QuickSort()'
              exec lo.','.a:end.'call QuickSort()'

              Readers who have been paying attention might point out that the function
              in question doesn't recurse, but the point is still valid if the
              function is ever to be called from other scripts, for example.

              --
              Piet Delport
              Today's subliminal thought is:
            • Benji Fisher
              Message 6 of 15, Jun 4, 2003
                Piet Delport wrote:
                > On Tue, 03 Jun 2003 at 22:57:15 -0400, Benji Fisher wrote:
                >
                >>Piet Delport wrote:
                >>
                > [snip]
                >
                >>>" function wrapper (for range semantics)
                >>>function Sort() range
                >>> call BinaryInsertSort(a:firstline, a:lastline)
                >>>endfunction
                >>>
                >>>command -range=% Sort <line1>,<line2>call Sort()
                >>
                >> Personal preference: I do not like function wrappers. If you agree,
                >>there are a few ways to get around it. Here is one:
                >>
                >>function Sort(...) range
                >
                > [snip]
                >
                >>endfunction
                >
                >
                > The problem with putting the range flag directly on the sort function is
                > that it can make it very messy to pass the function any range that can't
                > be expressed in terms of Ex addresses. For example, instead of doing a
                > recursive call like this:
                >
                > call QuickSort(a:start, hi)
                > call QuickSort(lo, a:end)
                >
                > ...you're forced to incant something like:
                >
                > exec a:start.','.hi.'call QuickSort()'
                > exec lo.','.a:end.'call QuickSort()'

                I do not think so. I declared Sort() to have a variable number of
                arguments. Look again at the part you snipped: if you :call Sort(a:start, hi)
                then the arguments you supply are used and the range is ignored.

                --Benji Fisher
              • Piet Delport
                Message 7 of 15, Jun 5, 2003
                  On Wed, 04 Jun 2003 at 08:37:05 -0400, Benji Fisher wrote:
                  > Piet Delport wrote:
                  >> On Tue, 03 Jun 2003 at 22:57:15 -0400, Benji Fisher wrote:
                  >>>
                  >>> Personal preference: I do not like function wrappers. If you agree,
                  >>> there are a few ways to get around it. Here is one:
                  [snip--see below]
                  >>
                  >> The problem with putting the range flag directly on the sort function is
                  >> that it can make it very messy to pass the function any range that can't
                  >> be expressed in terms of Ex addresses. For example, instead of doing a
                  >> recursive call like this:
                  >>
                  >> call QuickSort(a:start, hi)
                  >> call QuickSort(lo, a:end)
                  >>
                  >> ...you're forced to incant something like:
                  >>
                  >> exec a:start.','.hi.'call QuickSort()'
                  >> exec lo.','.a:end.'call QuickSort()'
                  >
                  > I do not think so. I declared Sort() to have a variable number of
                  > arguments. Look again at the part you snipped: if you :call Sort(a:start,
                  > hi) then the arguments you supply are used and the range is ignored.

                  On Tue, 03 Jun 2003 at 22:57:15 -0400, Benji Fisher wrote:
                  > function Sort(...) range
                  > let start = a:firstline
                  > let end = a:lastline
                  > if a:0 > 0
                  > let start = a:1
                  > if a:0 > 1
                  > let end = a:2
                  > endif
                  > endif
                  > ...
                  > endfunction

                  Sorry; i merely glanced at the function body before snipping it. It's a
                  pretty cool trick/technique.

                  Still, ~8 lines per method is quite a bit of boilerplate code. Maybe
                  something like:

                  function Sort(...) range
                    let start = a:0>1 ? a:1 : a:firstline
                    let end = a:0>1 ? a:2 : a:lastline
                    [...]
                  endfunction

                  ...is a bit easier to read and (arguably) understand? I also prefer the
                  test to be a:0 > 1 in both cases, so that you know the function is
                  either going to get its range from the arguments, or from the built-in
                  range, but never both.
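To see both calling styles side by side, here is a minimal sketch. ShowRange is a hypothetical helper that only reports the range it would act on, and the scratch-buffer setup assumes Vim 7 or later (list argument to setline()).

```vim
" Hypothetical helper: reports which line range it would operate on.
function! ShowRange(...) range
  " Two explicit arguments win; otherwise fall back to the Ex range.
  let start = a:0 > 1 ? a:1 : a:firstline
  let end   = a:0 > 1 ? a:2 : a:lastline
  echo 'lines ' . start . ' to ' . end
endfunction

" A scratch buffer with enough lines for the range below to be valid.
new
call setline(1, ['one', 'two', 'three', 'four'])

" Called with an Ex range: a:firstline and a:lastline are used.
2,4call ShowRange()
" echoes: lines 2 to 4

" Called with arguments: the implicit range is ignored.
call ShowRange(10, 20)
" echoes: lines 10 to 20
```

Because of the range flag, the first call runs the function once for the whole range rather than once per line.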

                  --
                  Piet Delport
                  Today's subliminal thought is:
                • jonah
                  Message 8 of 15, Jun 6, 2003
                    Just out of curiosity,

                    Do you know how the speed of
                    sorting in vim (using this Sort function)
                    compares with the speed of the unix
                    sort utility?

                    Thanks,
                    jonah

                  • Piet Delport
                    Message 9 of 15, Jun 11, 2003
                      On Fri, 06 Jun 2003 at 09:30:16 -0700, jonah wrote:
                      >
                      > Just out of curiosity,
                      >
                      > Do you know how the speed of
                      > sorting in vim (using this Sort function)
                      > compares with the speed of the unix
                      > sort utility?

                      In a word, it's pitiful. :-) This thread:

                      http://marc.theaimsgroup.com/?t=105178088700001

                      goes into more detail, if you're interested, but remember that it's
                      hardly a fair comparison when you consider all the overhead and
                      abstraction involved in executing VimL and accessing a buffer. It's a
                      bit like cutting down a tree with a scalpel, instead of a chainsaw.

                      Still, it performs well enough that you shouldn't really notice a
                      difference when you're only sorting up to a few thousand lines or so.
                      And you also get host-OS-independence and the ability to use a custom
                      comparison function. But if you just want to plainly sort big, multi-
                      megabyte files, use sort(1).
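As an illustration of the custom-comparison point, the sketch below adapts the binary insertion sort from earlier in the thread so that it asks a named function how two lines compare. SortWith and CompareNumeric are hypothetical names, not part of the original script, and the code assumes Vim 7 or later (call(), str2nr(), list arguments).

```vim
" Hypothetical comparator: orders lines by their leading number.
" A negative result means a:a sorts before a:b.
function! CompareNumeric(a, b)
  return str2nr(a:a) - str2nr(a:b)
endfunction

" Binary insertion sort over buffer lines start..end, delegating every
" comparison to the function named by a:cmp.
function! SortWith(cmp, start, end)
  let i = a:start + 1
  while i <= a:end
    " Binary search for the insertion point of line i.
    let i_val = getline(i)
    let lo = a:start
    let hi = i
    while lo < hi
      let mid = (lo + hi) / 2
      if call(a:cmp, [i_val, getline(mid)]) < 0
        let hi = mid
      else
        let lo = mid + 1
      endif
    endwhile
    " Move the line only if it is not already in place.
    if lo < i
      execute i . 'delete _'
      call append(lo - 1, i_val)
    endif
    let i += 1
  endwhile
endfunction

" Try it on a scratch buffer; a plain string sort would put '10 ten' first.
new
call setline(1, ['10 ten', '9 nine', '100 hundred'])
call SortWith('CompareNumeric', 1, line('$'))
echo getline(1, '$')
" The buffer now holds: 9 nine, 10 ten, 100 hundred
```

Lines that compare equal keep their original relative order here, since the search always moves past equal entries before inserting.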

                      HTH,

                      --
                      Piet Delport
                      Today's subliminal thought is: