Find duplicated lines

  • Alessandro Antonello
    Message 1 of 7, Mar 8, 2012
      Hi, all.

      I have a file with the following output:

      pass1 key: 9534 1CFF A92D 76B9 B52C 79E5 1D10 85E5
      pass2 key: 6C66 D635 3922 1D99 6FCE 8366 7992 C3DE
      passN key: F906 930C 2FD3 6B4B 7A2C 1AF5 C314 D62C

      There are many such groups of three lines. I could ':sort' the file to find
      duplicated lines, but what I really need to know is whether any 'pass1 key'
      value is equal to a 'pass2 key' or 'passN key' value. I have 3 files with
      more than 8000 lines each, so doing this visually is tedious and error-prone.
      I need a little help, please.

      Alessandro

      --
      You received this message from the "vim_use" maillist.
      Do not top-post! Type your reply below the text you are replying to.
      For more information, visit http://www.vim.org/maillist.php
    • Tim Chase
      Message 2 of 7, Mar 8, 2012
        On 03/08/12 06:45, Alessandro Antonello wrote:
        > pass1 key: 9534 1CFF A92D 76B9 B52C 79E5 1D10 85E5
        > pass2 key: 6C66 D635 3922 1D99 6FCE 8366 7992 C3DE
        > passN key: F906 930C 2FD3 6B4B 7A2C 1AF5 C314 D62C
        >
        > There are several of that 3 lines. I could ':sort' the file to find duplicated
        > lines but, what I really need to know is if there are binary data of 'pass1
        > key' equal to 'pass2 key' or 'passN key'. I have 3 files with more than 8000
        > lines each. So, visually do this is tedious and error prone. I need a little
        > help, please.

        If you don't mind changing the file-order, I'd sort it, then use
        a regexp to find the duplicate lines:

        :%sort /key: /
        /^.*\(key: .*\)\n.*\1$

        It's a little easier to spot if you turn on search highlighting

        :set hls

        which should then highlight them all.

        It's a lot uglier & slower if you want to leave the file unsorted
        because it has to check every line with every subsequent line.

        It might look something like

        /^.*\(key: .*\)\ze\n\%(.*\n\)*.*\1$

        It was fast on my dummy 5-line file using your data above
        (duplicating one line and changing the pass#, along with a blank
        line), but I suspect it would get progressively slower as your
        file grows.

        -tim
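
A possible alternative to the unsorted search, sketched outside Vim: instead of comparing every line with every later line, a single pass can remember each key value in a lookup table and report a line as soon as its value has been seen before. The shell function below is an illustration only. The name `repeated_keys` and the ` <-> ` output format are made up here, and it assumes every line has the `passN key: VALUE` shape shown above.

```shell
# Hypothetical helper: print each line whose key value already appeared on an
# earlier line, paired with that earlier line. The input order is preserved.
repeated_keys() {
    awk '{
        key = $0
        sub(/^[^:]*: /, "", key)      # drop the "passN key: " prefix
        if (key in first) {
            print first[key] " <-> " $0
        } else {
            first[key] = $0           # remember the first line using this value
        }
    }' "$@"
}
```

Run as `repeated_keys passfile`, it reads the file once and leaves it unchanged, so the original line order stays intact.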


      • Christian Brabandt
        Message 3 of 7, Mar 8, 2012
          On Thu, March 8, 2012 13:45, Alessandro Antonello wrote:
          > Hi, all.
          >
          > I have a file with the following output:
          >
          > pass1 key: 9534 1CFF A92D 76B9 B52C 79E5 1D10 85E5
          > pass2 key: 6C66 D635 3922 1D99 6FCE 8366 7992 C3DE
          > passN key: F906 930C 2FD3 6B4B 7A2C 1AF5 C314 D62C
          >
          > There are several of that 3 lines. I could ':sort' the file to find
          > duplicated
          > lines but, what I really need to know is if there are binary data of
          > 'pass1
          > key' equal to 'pass2 key' or 'passN key'. I have 3 files with more than
          > 8000
          > lines each. So, visually do this is tedious and error prone. I need a
          > little
          > help, please.

          Put the following into ~/.vim/plugin/dupes.vim:
          fu! s:Duplicates()
              let res={}
              for line in range(1,line('$'))
                  let key=matchstr(getline(line), '^[^:]*: \zs.*$')
                  let key = '\('.key.'\)'
                  let res[key] = get(res, key) + 1
              endfor
              call filter(res, 'v:val > 1')
              call matchadd('TODO', join(keys(res), '\|'))
          endfu
          com! Dupes :call s:Duplicates()

          Then restart Vim and run :Dupes, which should highlight
          all duplicates.

          regards,
          Christian

        • Reid Thompson
          Message 4 of 7, Mar 8, 2012
            On Thu, 2012-03-08 at 09:45 -0300, Alessandro Antonello wrote:
            > Hi, all.
            >
            > I have a file with the following output:
            >
            > pass1 key: 9534 1CFF A92D 76B9 B52C 79E5 1D10 85E5
            > pass2 key: 6C66 D635 3922 1D99 6FCE 8366 7992 C3DE
            > passN key: F906 930C 2FD3 6B4B 7A2C 1AF5 C314 D62C
            >
            > There are several of that 3 lines. I could ':sort' the file to find duplicated
            > lines but, what I really need to know is if there are binary data of 'pass1
            > key' equal to 'pass2 key' or 'passN key'. I have 3 files with more than 8000
            > lines each. So, visually do this is tedious and error prone. I need a little
            > help, please.
            >
            > Alessandro
            >

            $ cp -p passfile passfile.orig
            $ cat passfile # 3 repetitions of your example lines above
            pass1 key: 9534 1CFF A92D 76B9 B52C 79E5 1D10 85E5
            pass2 key: 6C66 D635 3922 1D99 6FCE 8366 7992 C3DE
            pass3 key: F906 930C 2FD3 6B4B 7A2C 1AF5 C314 D62C
            pass4 key: 9534 1CFF A92D 76B9 B52C 79E5 1D10 85E5
            pass5 key: 6C66 D635 3922 1D99 6FCE 8366 7992 C3DE
            pass6 key: F906 930C 2FD3 6B4B 7A2C 1AF5 C314 D62C
            pass7 key: 9534 1CFF A92D 76B9 B52C 79E5 1D10 85E5
            pass8 key: 6C66 D635 3922 1D99 6FCE 8366 7992 C3DE
            pass9 key: F906 930C 2FD3 6B4B 7A2C 1AF5 C314 D62C
            rthompso@raker2>~
            $ sort -k2 -t: -b passfile | uniq -f2
            pass2 key: 6C66 D635 3922 1D99 6FCE 8366 7992 C3DE
            pass1 key: 9534 1CFF A92D 76B9 B52C 79E5 1D10 85E5
            pass3 key: F906 930C 2FD3 6B4B 7A2C 1AF5 C314 D62C
            [08:29:35] rthompso@raker2>~
            $ sort -k2 -t: -b passfile.orig | uniq -f2 > passfile
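
When the filtered result should replace the original file, it is safer not to redirect the pipeline straight onto a file that `sort` still has to read, because the shell truncates the redirect target before the pipeline runs. A minimal sketch that goes through a temporary file instead (the function name `dedupe_in_place` is hypothetical, and `mktemp` is assumed to be available):

```shell
# Hypothetical helper: rewrite a file so that each key value appears once.
dedupe_in_place() {
    file=$1
    tmp=$(mktemp) || return 1
    # Sort on the text after the first ":"; uniq -f2 then skips the "passN"
    # and "key:" fields, so lines are compared on the key value alone.
    if sort -t: -k2 -b "$file" | uniq -f2 > "$tmp"; then
        cat "$tmp" > "$file"      # overwrite only after the pipeline finished
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Called as `dedupe_in_place passfile`, it leaves the file untouched if the pipeline fails.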

          • Alessandro Antonello
            Message 5 of 7, Mar 8, 2012
              2012/3/8 Christian Brabandt <cblists@...>:
              > On Thu, March 8, 2012 13:45, Alessandro Antonello wrote:
              >> Hi, all.
              >>
              >> I have a file with the following output:
              >>
              >> pass1 key: 9534 1CFF A92D 76B9 B52C 79E5 1D10 85E5
              >> pass2 key: 6C66 D635 3922 1D99 6FCE 8366 7992 C3DE
              >> passN key: F906 930C 2FD3 6B4B 7A2C 1AF5 C314 D62C
              >>
              >> There are several of that 3 lines. I could ':sort' the file to find
              >> duplicated
              >> lines but, what I really need to know is if there are binary data of
              >> 'pass1
              >> key' equal to 'pass2 key' or 'passN key'. I have 3 files with more than
              >> 8000
              >> lines each. So, visually do this is tedious and error prone. I need a
              >> little
              >> help, please.
              >
              > Put the following into ~/.vim/plugins/dupes.vim
              > fu! s:Duplicates()
              >    let res={}
              >    for line in range(1,line('$'))
              >        let key=matchstr(getline(line), '^[^:]*: \zs.*$')
              >        let key = '\('.key.'\)'
              >        let res[key] = get(res, key) + 1
              >    endfor
              >    call filter(res, 'v:val > 1')
              >    call matchadd('TODO', join(keys(res), '\|'))
              > endfu
              > com! Dupes :call s:Duplicates()
              >

              Hi, Christian.

              I tried it and it gave me the error "E51: Too many \(".
              I have a lot of duplicated values, which is why the job is so error-prone.
              What I need to know is whether a pass1 value is equal to a pass2 or passN
              value. Having several duplicated pass1 values is not an issue, and pass2
              and passN values may likewise be duplicated among themselves. But a pass2
              value must never equal a passN or pass1 value, and vice versa.

              Thanks a lot for your help.
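
The requirement described here (repeats within one pass type are fine, the same value under two different pass types is not) can be checked directly by remembering which pass label first used each value. A rough sketch, assuming the label is the first whitespace-separated word on each line. The name `cross_pass_dupes` and its output format are invented for illustration:

```shell
# Hypothetical helper: report key values that occur under more than one
# pass label. Repeats under the same label are ignored.
cross_pass_dupes() {
    awk '{
        label = $1                    # e.g. "pass1"
        key = $0
        sub(/^[^:]*: /, "", key)      # text after "passN key: "
        if (!(key in owner)) {
            owner[key] = label        # first label seen for this value
        } else if (owner[key] != label && !((key, label) in reported)) {
            reported[key, label] = 1  # report each value/label pair once
            print key ": " owner[key] " and " label
        }
    }' "$@"
}
```

An empty result from `cross_pass_dupes passfile` would mean no value is shared between pass types, under the assumptions above.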

            • Alessandro Antonello
              Message 6 of 7, Mar 8, 2012
                2012/3/8 Tim Chase <vim@...>:
                > On 03/08/12 06:45, Alessandro Antonello wrote:
                >>
                >> pass1 key: 9534 1CFF A92D 76B9 B52C 79E5 1D10 85E5
                >> pass2 key: 6C66 D635 3922 1D99 6FCE 8366 7992 C3DE
                >> passN key: F906 930C 2FD3 6B4B 7A2C 1AF5 C314 D62C
                >>
                >> There are several of that 3 lines. I could ':sort' the file to find
                >> duplicated
                >> lines but, what I really need to know is if there are binary data of
                >> 'pass1
                >> key' equal to 'pass2 key' or 'passN key'. I have 3 files with more than
                >> 8000
                >> lines each. So, visually do this is tedious and error prone. I need a
                >> little
                >> help, please.
                >
                >
                > If you don't mind changing the file-order, I'd sort it, then use a regexp to
                > find the duplicate lines:
                >
                >  :%sort /key: /
                >  /^.*\(key: .*\)\n.*\1$
                >
                > It's a little easier to spot if you turn on search highlighting
                >
                >  :set hls
                >
                > which should then highlight them all.
                >
                > It's a lot uglier & slower if you want to leave the file unsorted because it
                > has to check every line with every subsequent line.
                >
                > It might look something like
                >
                >  /^.*\(key: .*\)\ze\n\%(.*\n\)*.*\1$
                >
                > It was fast on my dummy 5-line file using your data above (duplicating one
                > line and changing the pass#, along with a blank line), but I suspect it
                > would get progressively slower as your file grows.
                >
                > -tim
                >
                >

                Hi, all.

                Thanks a lot for your help. I think I solved the problem, with your help, of
                course. Please correct me if I'm wrong.

                I used Reid Thompson's idea to sort and filter the file, removing duplicated
                pass1, pass2 and passN values. Then I used Tim Chase's search regex to find
                any remaining duplicates that differ only in the "pass" type. The search
                found nothing, which is a good thing because I cannot have the same value for
                different passes.

                Thanks a lot!
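
One caveat about combining the two steps, offered as a reading of the commands and not of the real data: `uniq -f2` skips the `passN` and `key:` fields when it compares lines, so a value that occurs under both pass1 and pass2 is reduced to a single line by the filter, and a later search for repeated values then finds nothing either way. A sketch of an ordering that keeps such collisions visible, by dropping only exact duplicate lines first (the function name `collisions_after_dedupe` is hypothetical):

```shell
# Hypothetical helper: remove exact duplicate lines (label included), then
# sort on the text after ":" and print neighbouring lines with equal values.
collisions_after_dedupe() {
    sort -u "$@" |
    sort -t: -k2 -b |
    awk '{
        key = $0
        sub(/^[^:]*: /, "", key)      # text after "passN key: "
        # Exact copies are gone, so equal neighbours differ in their prefix.
        if (NR > 1 && key == prev_key) {
            print prev_line " | " $0
        }
        prev_key = key
        prev_line = $0
    }'
}
```

If `collisions_after_dedupe passfile.orig` prints nothing on the unfiltered data, that would support the conclusion that no value is shared between pass types.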

              • Christian Brabandt
                Message 7 of 7, Mar 9, 2012
                  Hi Alessandro!

                  On Thu, 08 Mar 2012, Alessandro Antonello wrote:

                  > 2012/3/8 Christian Brabandt <cblists@...>:
                  > > On Thu, March 8, 2012 13:45, Alessandro Antonello wrote:
                  > >> Hi, all.
                  > >>
                  > >> I have a file with the following output:
                  > >>
                  > >> pass1 key: 9534 1CFF A92D 76B9 B52C 79E5 1D10 85E5
                  > >> pass2 key: 6C66 D635 3922 1D99 6FCE 8366 7992 C3DE
                  > >> passN key: F906 930C 2FD3 6B4B 7A2C 1AF5 C314 D62C
                  > >>
                  > >> There are several of that 3 lines. I could ':sort' the file to find
                  > >> duplicated
                  > >> lines but, what I really need to know is if there are binary data of
                  > >> 'pass1
                  > >> key' equal to 'pass2 key' or 'passN key'. I have 3 files with more than
                  > >> 8000
                  > >> lines each. So, visually do this is tedious and error prone. I need a
                  > >> little
                  > >> help, please.
                  > >
                  > > Put the following into ~/.vim/plugins/dupes.vim
                  > > fu! s:Duplicates()
                  > >    let res={}
                  > >    for line in range(1,line('$'))
                  > >        let key=matchstr(getline(line), '^[^:]*: \zs.*$')
                  > >        let key = '\('.key.'\)'
                  > >        let res[key] = get(res, key) + 1
                  > >    endfor
                  > >    call filter(res, 'v:val > 1')
                  > >    call matchadd('TODO', join(keys(res), '\|'))
                  > > endfu
                  > > com! Dupes :call s:Duplicates()
                  > >
                  >
                  > Hi, Christian.
                  >
                  > I tried it and it gave me the error "E51 Too many \(".
                  > I have a lot of duplicated values. That's why the job is so error
                  > prone.

                  Hm, you could try to use '\%(' instead of '\('. But you seem to be done
                  with the task anyway, so I won't update the script here now.

                  > I need to know
                  > if I have a pass1 value equal to a pass2 or passN. Having several pass1 values
                  > duplicated is not an issue. Also, I can have pass2 values duplicated and passN
                  > values duplicated. But I cannot have a pass2 equal to a passN or pass1
                  > and vise-versa.

                  Best regards,
                  Christian
                  --
                  The shopkeeper who weighs something moves the unknown quantities
                  to one side and the known ones to the other just as well as the
                  algebraist does.
                  -- Georg Christoph Lichtenberg
