How to check if I have lines with different length in a database extract

  • sanjeev.g.sapre@Cummins.com
    Message 1 of 9, Nov 2, 2005
      List,

      I have a data extraction program which creates comma-separated flat
      files.

      Each type of file has a fixed line length. Before we can pass on
      this file for further processing I would like to check that all lines are
      of equal length. Is there a simple way or pattern by which I can identify
      lines with differing length?

      Thanks in advance.

      Regards
      Sanjeev
      Holset, Huddersfield
      Direct: +44 - 01484 440 365
    • Tim Chase
      Message 2 of 9, Nov 2, 2005
        > Each type of file has some fixed length of a line. Before we can pass
        > on this file for further processing I would like to check that all
        > lines are of equal length. Is there a simple way /pattern by which I
        > can identify lines with differing length.

        Well, a couple ideas stand out to me. Depending on your file size (in
        lines), it could be as simple as

        :set list

        and then scrolling down, watching the right margin to see if any of the
        "$" characters dance out of position.

        If, however, you've got a large file (or more lines than you reasonably
        care to scroll through), you can use something like

        :v/\%40c.$/#

        which will return a list of each of the lines that *don't* have 40
        characters in them, along with their line numbers. Your desire would be
        to get back the "error"

        Pattern found in every line: \%40c.$

        However, if there are lines that don't have 40 characters, it will
        return them along with their line number. If you want to make changes
        to each line, just type the line number followed by "G" and you'll jump
        to the line in question.
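
        For anyone who would rather run the same check outside Vim, here is a
        minimal Python sketch of the idea. The function name and the sample
        data are made up for illustration, and it counts characters rather
        than bytes, so treat it as a starting point only.

```python
def lines_with_wrong_length(lines, expected):
    """Return (line_number, text) pairs whose length differs from expected.

    Line numbers are 1-based, like the listing printed by Vim's :# command.
    Trailing CR/LF characters are stripped before the length is measured.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if len(text) != expected:
            bad.append((number, text))
    return bad


# Hypothetical sample: lines 2 and 4 are not 40 characters long.
sample = ["a" * 40 + "\n", "b" * 39 + "\n", "c" * 40 + "\n", "d" * 41 + "\n"]
for number, text in lines_with_wrong_length(sample, 40):
    print(number, text)
```

        An empty result plays the same role as the "Pattern found in every
        line" message: nothing needs fixing.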

        If you just want to filter these errant lines out of your file by
        deleting them completely, change the "#" to a "d" in the
        above command, such as

        :v/\%40c.$/d

        and it will delete any of the problematic lines.

        If order doesn't matter, you can take a pre-processing pass and move
        them all to the bottom of the file with

        :v/\%40c.$/m$

        where you can edit them all or deal with them accordingly.
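
        The delete and move variants can be pictured as splitting the file
        into two lists. The following is only a sketch in Python with a
        hypothetical helper name and sample data: dropping the second list
        corresponds to the "d" form, and appending it to the first
        corresponds to the "m$" form.

```python
def split_by_length(lines, expected):
    """Split lines into (good, bad) by whether their length equals expected.

    Trailing CR/LF characters are ignored when measuring, and the relative
    order of lines is preserved inside each list.
    """
    good, bad = [], []
    for line in lines:
        if len(line.rstrip("\r\n")) == expected:
            good.append(line)
        else:
            bad.append(line)
    return good, bad


# Hypothetical sample with an expected length of 4.
sample = ["aaaa\n", "bb\n", "cccc\n", "ddddd\n", "eeee\n"]
good, bad = split_by_length(sample, 4)

deleted = good            # comparable to the "d" variant
moved = good + bad        # comparable to the "m$" variant
print(moved)
```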

        Hope this gives you something to work with.

        -tim
      • Jürgen Krämer
        Message 3 of 9, Nov 2, 2005
          Hi,

          sanjeev.g.sapre@... wrote:
          >
          > I have some data extraction program which creates comma separated flat
          > files.
          >
          > Each type of file has some fixed length of a line. Before we can pass on
          > this file for further processing I would like to check that all lines are
          > of equal length. Is there a simple way /pattern by which I can identify
          > lines with differing length.

          the following mapping will search for the next line that has a different
          length than the current line:

          nnoremap \d /^\(.\{0,<c-r>=strlen(getline('.'))-1<cr>\}\\|.\{<c-r>=strlen(getline('.'))+1<cr>,\}\)$<cr>
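
          The logic behind the mapping (take the length of the current line,
          then look forward for a line that is shorter or longer) can be
          sketched in Python as below. The function name is invented for the
          example, the lines are assumed to have their newlines already
          removed, and unlike a Vim search this sketch does not wrap around
          to the top of the file.

```python
def next_differing_line(lines, current):
    """Return the index of the first line after `current` whose length
    differs from that of lines[current], or None if there is none.

    Indexes are 0-based; the search stops at the end of the list.
    """
    reference = len(lines[current])
    for index in range(current + 1, len(lines)):
        if len(lines[index]) != reference:
            return index
    return None


# Hypothetical sample: the third line is shorter than the others.
lines = ["abcd", "efgh", "ij", "klmn"]
print(next_differing_line(lines, 0))
```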

          Regards,
          Jürgen

          --
          Jürgen Krämer Softwareentwicklung
          HABEL GmbH & Co. KG mailto:jkr@...
          Hinteres Öschle 2 Tel: +49 / 74 61 / 93 53 - 15
          78604 Rietheim-Weilheim Fax: +49 / 74 61 / 93 53 - 99
        • John Love-Jensen
          Message 4 of 9, Nov 2, 2005
            Hi Sanjeev,

            This is what I did...

            :%s/././g
            :%!sort | uniq -c

            Maybe that would work for you.

            Note: somewhat destructive. Save the file before doing this, and restore
            it afterwards.
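
            A non-destructive way to get the same kind of summary is to
            count line lengths directly. This is a rough Python sketch, not
            what was run above; the helper name and sample lines are
            assumptions for illustration.

```python
from collections import Counter


def length_histogram(lines):
    """Count how many lines have each length (trailing CR/LF ignored)."""
    return Counter(len(line.rstrip("\r\n")) for line in lines)


# Hypothetical sample: three lines of length 4 and one of length 2.
sample = ["aaaa\n", "bbbb\n", "cc\n", "dddd\n"]
hist = length_histogram(sample)

# Print "count length" pairs, similar in spirit to the output of uniq -c.
for length, count in sorted(hist.items()):
    print(count, length)
```

            A file with uniform line length produces exactly one entry.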

            HTH,
            --Eljay
          • sanjeev.g.sapre@Cummins.com
            Message 5 of 9, Nov 2, 2005
              Thanks Tim..

              That was very useful.
              That's exactly what I was looking for.


              Regards
              Sanjeev
              Holset, Huddersfield
              Direct: +44 - 01484 440 365




            • Bertilo Wennergren
              Message 6 of 9, Nov 2, 2005
                On 11/2/05, Tim Chase <vim@...> wrote:

                > If, however, you've got a large file (or more lines than you
                > reasonably care to scroll through), you can use something like

                > :v/\%40c.$/#
                >
                > which will return a list of each of the lines that *don't* have 40
                > characters in them,

                That actually seems to count bytes, not characters. I tried it using
                UTF-8, and my two-byte characters counted as two, at least sometimes.
                The results were not consistent!

                E.g.:

                oooo
                oooö
                oooo

                :v/\%4c.$/#
                Pattern found in every line: \%4c.$

                But:

                oooo
                ooöo
                oooo

                :v/\%4c.$/#
                2 ooöo

                Have I stumbled on a bug? This was in Vim 6.4 in Linux (Kubuntu).

                --
                Bertilo Wennergren <http://bertilow.com>
              • Tim Chase
                Message 7 of 9, Nov 2, 2005
                  > That actually seems to count bytes, not characters. I
                  > tried it using UTF-8, and my two-byte characters counted
                  > as two, at least sometimes. The results were not
                  > consistent!
                  >
                  > E.g.:
                  >
                  > oooo
                  > oooö
                  > oooo
                  >
                  > :v/\%4c.$/#
                  > Pattern found in every line: \%4c.$
                  >
                  > But:
                  >
                  > oooo
                  > ooöo
                  > oooo
                  >
                  > :v/\%4c.$/#
                  > 2 ooöo
                  >
                  > Have I stumbled on a bug? This was in Vim 6.4 in Linux
                  > (Kubuntu).

                  How strange. Bug? perhaps, or fixable with some
                  option-obscura. I can't be of much help here as I don't use
                  UTF-8 or multi-byte character sets/encodings for
                  anything...other than occasionally trying out some of the
                  crazy-good ideas by folks like Tony on the list who are much
                  more well-versed in the ins and outs of this dark corner of
                  Vim.

                  -tim
                • James Vega
                  Message 8 of 9, Nov 2, 2005
                    On Wed, Nov 02, 2005 at 10:36:33PM +0900, Bertilo Wennergren wrote:
                    > On 11/2/05, Tim Chase <vim@...> wrote:
                    >
                    > > If, however, you've got a large file (or more lines than you
                    > > reasonably care to scroll through), you can use something like
                    >
                    > > :v/\%40c.$/#
                    > >
                    > > which will return a list of each of the lines that *don't* have 40
                    > > characters in them,
                    >
                    > That actually seems to count bytes, not characters. I tried it using
                    > UTF-8, and my two-byte characters counted as two, at least sometimes.
                    > The results were not consistent!
                    >
                    > E.g.:
                    >
                    > oooo
                    > oooö
                    > oooo
                    >
                    > :v/\%4c.$/#
                    > Pattern found in every line: \%4c.$
                    >
                    > But:
                    >
                    > oooo
                    > ooöo
                    > oooo
                    >
                    > :v/\%4c.$/#
                    > 2 ooöo
                    >
                    > Have I stumbled on a bug? This was in Vim 6.4 in Linux (Kubuntu).

                    Use \%4v instead of \%4c.

                    :he /\%c
                    :he /\%v
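
                    As far as I understand it, \%c counts byte columns, which
                    would explain the two results quoted above: what matters
                    is the byte column at which the last character starts.
                    The Python sketch below only illustrates that arithmetic
                    for UTF-8; the helper name is made up, and it does not
                    model \%v, which counts screen columns.

```python
def byte_column_of_last_char(text, encoding="utf-8"):
    """Return the 1-based byte column at which the last character starts."""
    return len(text[:-1].encode(encoding)) + 1


# "\u00f6" is the precomposed o-umlaut, two bytes in UTF-8.
for text in ["oooo", "ooo\u00f6", "oo\u00f6o"]:
    chars = len(text)
    size = len(text.encode("utf-8"))
    print(ascii(text), chars, size, byte_column_of_last_char(text))
```

                    In the first two strings the last character starts at
                    byte column 4, but in the third it starts at byte column
                    5, even though all three are four characters long.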

                    James
                    --
                    GPG Key: 1024D/61326D40 2003-09-02 James Vega <jamessan@...>
                  • Charles E. Campbell, Jr.
                    Message 9 of 9, Nov 2, 2005
                      sanjeev.g.sapre@... wrote:

                      >...Each type of file has some fixed length of a line. Before we can pass on
                      >this file for further processing I would like to check that all lines are
                      >of equal length. Is there a simple way /pattern by which I can identify
                      >lines with differing length.
                      >
                      >
                      :v/^.*\%74c.$/p

                      will display all lines that don't have 74 characters in them. I just
                      picked 74 out of the air, of course -- adjust it to whatever you need.

                      Regards,
                      Chip Campbell