Re: How to check if I have lines with different length in a database extract

  • Tim Chase
    Message 1 of 9, Nov 2, 2005
      > Each type of file has some fixed length of a line. Before we can pass
      > on this file for further processing I would like to check that all
      > lines are of equal length. Is there a simple way /pattern by which I
      > can identify lines with differing length.

      Well, a couple ideas stand out to me. Depending on your file size (in
      lines), it could be as simple as

      :set list

      and then scrolling down, watching the right margin to see if any of the
      "$" characters dance out of position.

      If, however, you've got a large file (or more lines than you reasonably
      care to scroll through), you can use something like

      :v/\%40c.$/#

      which will return a list of each of the lines that *don't* have 40
      characters in them, along with their line numbers. Ideally, you'd
      get back the "error"

      Pattern found in every line

      which means every line is the right length. Otherwise, each line
      listed is one to fix: type its line number followed by "G" and
      you'll jump to the line in question.
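
For checking an extract outside Vim, the same report can be sketched in Python. This is only a minimal sketch, not part of the command above: the function name and the sample records are made up for illustration, and it counts characters (code points), not bytes.

```python
def lines_with_wrong_length(lines, expected):
    """Return (line_number, line) pairs for lines whose length is not `expected`.

    Line numbers start at 1, like the numbers Vim prints.
    Length is counted in characters (code points), not bytes.
    """
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if len(line) != expected
    ]


if __name__ == "__main__":
    sample = ["abcd", "abc", "abcd", "abcde"]  # hypothetical 4-character records
    for number, line in lines_with_wrong_length(sample, 4):
        print(number, line)
```

An empty result corresponds to the "Pattern found in every line" case.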

      If you just want to filter these errant lines out of your file by
      deleting them completely, change the "#" to a "d" in the above
      command:

      :v/\%40c.$/d

      and it will delete any of the problematic lines.

      If order doesn't matter, you can take a pre-processing pass and move
      them all to the bottom of the file with

      :v/\%40c.$/m$

      where you can edit them all or deal with them accordingly.
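
The delete and move-to-bottom variants amount to splitting the lines into two groups. A rough Python equivalent, assuming the lines are already in a list and that length means character count (both function names are hypothetical):

```python
def split_by_length(lines, expected):
    """Split lines into (good, errant), keeping the original order in each list."""
    good = [line for line in lines if len(line) == expected]
    errant = [line for line in lines if len(line) != expected]
    return good, errant


def move_errant_to_bottom(lines, expected):
    """Return the lines with all wrong-length lines moved to the end."""
    good, errant = split_by_length(lines, expected)
    return good + errant


if __name__ == "__main__":
    records = ["aaaa", "bb", "cccc", "ddddd"]  # hypothetical 4-character records
    good, errant = split_by_length(records, 4)
    print("kept:", good)
    print("errant:", errant)
```

Keeping only `good` corresponds to the "d" variant; `move_errant_to_bottom` corresponds to the "m$" variant.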

      Hope this gives you something to work with.

      -tim
    • Jürgen Krämer
      Message 2 of 9, Nov 2, 2005
        Hi,

        sanjeev.g.sapre@... wrote:
        >
        > I have some data extraction program which creates comma separated flat
        > files.
        >
        > Each type of file has some fixed length of a line. Before we can pass on
        > this file for further processing I would like to check that all lines are
        > of equal length. Is there a simple way /pattern by which I can identify
        > lines with differing length.

        The following mapping searches for the next line whose length
        differs from that of the current line:

        nnoremap \d /^\(.\{0,<c-r>=strlen(getline('.'))-1<cr>\}\\|.\{<c-r>=strlen(getline('.'))+1<cr>,\}\)$<cr>
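
The idea behind the mapping (take the current line's length, then look for the next line that is shorter or longer) can be sketched in Python as follows. This is an illustration only: the function name is made up, indices are 0-based, lengths are counted in characters, and the search does not wrap around the end of the file.

```python
def next_line_with_different_length(lines, current):
    """Return the index of the first line after `current` whose length differs
    from that of lines[current], or None if there is no such line."""
    reference = len(lines[current])
    for index in range(current + 1, len(lines)):
        if len(lines[index]) != reference:
            return index
    return None


if __name__ == "__main__":
    records = ["aaa", "bbb", "cc", "ddd"]  # hypothetical sample lines
    print(next_line_with_different_length(records, 0))
```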

        Regards,
        Jürgen

        --
        Jürgen Krämer Softwareentwicklung
        HABEL GmbH & Co. KG mailto:jkr@...
        Hinteres Öschle 2 Tel: +49 / 74 61 / 93 53 - 15
        78604 Rietheim-Weilheim Fax: +49 / 74 61 / 93 53 - 99
      • John Love-Jensen
        Message 3 of 9, Nov 2, 2005
          Hi Sanjeev,

          This is what I did...

          :%s/././g
          :%!sort | uniq -c

          Maybe that would work for you.

          Note: somewhat destructive. Save the file before doing this, and restore
          it afterwards.
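
A non-destructive way to get the same kind of summary (how many lines there are of each length) is to count lengths directly instead of rewriting the buffer. A minimal Python sketch, with a made-up function name and lengths counted in characters:

```python
from collections import Counter


def length_histogram(lines):
    """Count how many lines have each length, without modifying the input."""
    return Counter(len(line) for line in lines)


if __name__ == "__main__":
    records = ["abcd", "abc", "abcd", "abcde"]  # hypothetical sample lines
    for length, count in sorted(length_histogram(records).items()):
        print(count, "line(s) of length", length)
```

If the histogram has more than one entry, the file contains lines of differing length.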

          HTH,
          --Eljay
        • sanjeev.g.sapre@Cummins.com
          Message 4 of 9, Nov 2, 2005
            Thanks, Tim.

            That was very useful.
            That's exactly what I was looking for.


            Regards
            Sanjeev
            Holset, Huddersfield
            Direct: +44 - 01484 440 365










          • Bertilo Wennergren
            Message 5 of 9, Nov 2, 2005
              On 11/2/05, Tim Chase <vim@...> wrote:

              > If, however, you've got a large file (or more lines than you
              > reasonably care to scroll through), you can use something like

              > :v/\%40c.$/#
              >
              > which will return a list of each of the lines that *don't* have 40
              > characters in them,

              That actually seems to count bytes, not characters. I tried it using
              UTF-8, and my two-byte characters counted as two, at least sometimes.
              The results were not consistent!

              E.g.:

              oooo
              oooö
              oooo

              :v/\%4c.$/#
              Pattern found in every line: \%4c.$

              But:

              oooo
              ooöo
              oooo

              :v/\%4c.$/#
              2 ooöo

              Have I stumbled on a bug? This was in Vim 6.4 in Linux (Kubuntu).
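
These results are consistent with \%4c addressing a byte column, not a character column. In UTF-8, "ö" takes two bytes, so in "oooö" it starts at byte 4, while in "ooöo" byte 4 falls in the middle of "ö" and the final "o" starts at byte 5. The Python sketch below only illustrates that arithmetic; the helper names are made up, and it is not how Vim implements the match.

```python
def byte_columns(line, encoding="utf-8"):
    """Return the 1-based byte column at which each character of `line` starts."""
    columns = []
    column = 1
    for char in line:
        columns.append(column)
        column += len(char.encode(encoding))
    return columns


def last_char_starts_at(line, column, encoding="utf-8"):
    """True if the last character of `line` starts at the given byte column."""
    return bool(line) and byte_columns(line, encoding)[-1] == column


if __name__ == "__main__":
    for text in ["oooo", "ooo\u00f6", "oo\u00f6o"]:  # \u00f6 is "ö"
        print(text, byte_columns(text), last_char_starts_at(text, 4))
```

Under this reading, the second example is reported because its last character starts at byte column 5, even though the line has four characters.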

              --
              Bertilo Wennergren <http://bertilow.com>
            • Tim Chase
              Message 6 of 9, Nov 2, 2005
                > That actually seems to count bytes, not characters. I
                > tried it using UTF-8, and my two-byte characters counted
                > as two, at least sometimes. The results were not
                > consistent!
                >
                > E.g.:
                >
                > oooo
                > oooö
                > oooo
                >
                > :v/\%4c.$/#
                > Pattern found in every line: \%4c.$
                >
                > But:
                >
                > oooo
                > ooöo
                > oooo
                >
                > :v/\%4c.$/#
                > 2 ooöo
                >
                > Have I stumbled on a bug? This was in Vim 6.4 in Linux
                > (Kubuntu).

                How strange. Bug? Perhaps, or fixable with some
                option-obscura. I can't be of much help here, as I don't
                use UTF-8 or multi-byte character sets/encodings for
                anything, other than occasionally trying out some of the
                crazy-good ideas by folks like Tony on the list, who are
                much better versed in the ins and outs of this dark corner
                of Vim.

                -tim
              • James Vega
                Message 7 of 9, Nov 2, 2005
                  On Wed, Nov 02, 2005 at 10:36:33PM +0900, Bertilo Wennergren wrote:
                  > On 11/2/05, Tim Chase <vim@...> wrote:
                  >
                  > > If, however, you've got a large file (or more lines than you
                  > > reasonably care to scroll through), you can use something like
                  >
                  > > :v/\%40c.$/#
                  > >
                  > > which will return a list of each of the lines that *don't* have 40
                  > > characters in them,
                  >
                  > That actually seems to count bytes, not characters. I tried it using
                  > UTF-8, and my two-byte characters counted as two, at least sometimes.
                  > The results were not consistent!
                  >
                  > E.g.:
                  >
                  > oooo
                  > oooö
                  > oooo
                  >
                  > :v/\%4c.$/#
                  > Pattern found in every line: \%4c.$
                  >
                  > But:
                  >
                  > oooo
                  > ooöo
                  > oooo
                  >
                  > :v/\%4c.$/#
                  > 2 ooöo
                  >
                  > Have I stumbled on a bug? This was in Vim 6.4 in Linux (Kubuntu).

                  Use \%4v instead of \%4c.

                  :he /\%c
                  :he /\%v
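
Since \%c counts bytes and \%v counts virtual (screen) columns, the two can disagree with each other and with a plain character count, for example when a line contains tabs. The Python sketch below compares the three measures. The width calculation is only an approximation made for illustration (tab stops every 8 columns, East Asian wide characters counted as 2, combining marks as 0), not Vim's own algorithm, and the function names are hypothetical.

```python
import unicodedata


def display_width(line, tabstop=8):
    """Approximate the number of screen columns a line occupies."""
    width = 0
    for char in line:
        if char == "\t":
            # A tab advances to the next multiple of `tabstop`.
            width += tabstop - (width % tabstop)
        elif unicodedata.combining(char):
            # Combining marks are drawn on top of the previous character.
            continue
        elif unicodedata.east_asian_width(char) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width


def measure(line, encoding="utf-8"):
    """Return the byte count, character count, and approximate screen width."""
    return {
        "bytes": len(line.encode(encoding)),
        "chars": len(line),
        "columns": display_width(line),
    }


if __name__ == "__main__":
    for text in ["oooo", "oo\u00f6o", "a\tb"]:  # \u00f6 is "ö"
        print(repr(text), measure(text))
```

For a comma-separated extract with no tabs or wide characters, the character count and the screen width agree, which is the case where \%v behaves like a character count.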

                  James
                  --
                  GPG Key: 1024D/61326D40 2003-09-02 James Vega <jamessan@...>
                • Charles E. Campbell, Jr.
                  Message 8 of 9, Nov 2, 2005
                    sanjeev.g.sapre@... wrote:

                    >...Each type of file has some fixed length of a line. Before we can pass on
                    >this file for further processing I would like to check that all lines are
                    >of equal length. Is there a simple way /pattern by which I can identify
                    >lines with differing length.
                    >

                    :v/^.*\%74c.$/p

                    will display all lines that don't have 74 characters in them. I just
                    picked 74 out of the air, of course -- adjust it to whatever you need.

                    Regards,
                    Chip Campbell