Re: How to check if I have lines with different length in a database extract

  • sanjeev.g.sapre@Cummins.com
    Message 1 of 9 , Nov 2, 2005
      Thanks Tim..

      That was very useful.
      That's exactly what I was looking for.


      Regards
      Sanjeev
      Holset, Huddersfield
      Direct: +44 - 01484 440 365




      Tim Chase <vim@...>
      02/11/2005 12:23


      To: sanjeev.g.sapre@...
      cc: vim@...
      Subject: Re: How to check if I have lines with different length in a database
      extract


      > Each type of file has some fixed length of a line. Before we can pass
      > on this file for further processing I would like to check that all
      > lines are of equal length. Is there a simple way /pattern by which I
      > can identify lines with differing length.

      Well, a couple ideas stand out to me. Depending on your file size (in
      lines), it could be as simple as

      :set list

      and then scrolling down, watching the right margin to see if any of the
      "$" characters dance out of position.

      If, however, you've got a large file (or more lines than you reasonably
      care to scroll through), you can use something like

      :v/\%40c.$/#

      which will list each of the lines that *don't* have 40
      characters in them, along with their line numbers. Ideally, you
      would get back the "error"

      Pattern found in every line: \%40c.$

      However, if there are lines that don't have 40 characters, it will
      return them along with their line number. If you want to make changes
      to each line, just type the line number followed by "G" and you'll jump
      to the line in question.

      If you just want to filter these errant lines out of your file by
      deleting them completely, change the "#" to a "d" in the above
      command, such as

      :v/\%40c.$/d

      and it will delete any of the problematic lines.

      If order doesn't matter, you can take a pre-processing pass and move
      them all to the bottom of the file with

      :v/\%40c.$/m$

      where you can edit them all or deal with them accordingly.

      Hope this gives you something to work with.

      -tim
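
For extracts too large to open comfortably in Vim, the same check as `:v/\%40c.$/#` can be sketched outside the editor. This is only a minimal Python illustration, not part of the original reply: the function name `lines_with_wrong_length` and the sample data are made up, and it counts characters (code points) rather than bytes.

```python
def lines_with_wrong_length(lines, expected):
    """Return (line_number, text) pairs whose character count differs from expected."""
    bad = []
    for number, line in enumerate(lines, start=1):
        # Strip the line terminator so it does not count towards the length.
        text = line.rstrip("\r\n")
        if len(text) != expected:
            bad.append((number, text))
    return bad


# Hypothetical sample: lines 2 and 4 are one character short and long.
sample = ["a" * 40 + "\n", "b" * 39 + "\n", "c" * 40 + "\n", "d" * 41 + "\n"]

for number, text in lines_with_wrong_length(sample, 40):
    print(number, len(text), text)
```

An empty result plays the role of Vim's "Pattern found in every line" message: every line has the expected length.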






    • Bertilo Wennergren
      Message 2 of 9 , Nov 2, 2005
        On 11/2/05, Tim Chase <vim@...> wrote:

        > If, however, you've got a large file (or more lines than you
        > reasonably care to scroll through), you can use something like

        > :v/\%40c.$/#
        >
        > which will return a list of each of the lines that *don't* have 40
        > characters in them,

        That actually seems to count bytes, not characters. I tried it using
        UTF-8, and my two-byte characters counted as two, at least sometimes.
        The results were not consistent!

        E.g.:

        oooo
        oooö
        oooo

        :v/\%4c.$/#
        Pattern found in every line: \%4c.$

        But:

        oooo
        ooöo
        oooo

        :v/\%4c.$/#
        2 ooöo

        Have I stumbled on a bug? This was in Vim 6.4 in Linux (Kubuntu).

        --
        Bertilo Wennergren <http://bertilow.com>
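
One possible reading of the two results above, assuming `\%4c` addresses a byte column rather than a character column (as the report itself suggests): in UTF-8, `ö` takes two bytes. In `oooö`, byte column 4 is where `ö` starts, so `.` matches it and the line ends there. In `ooöo`, byte column 4 falls in the middle of `ö`, so no character starts there and the pattern cannot match. The Python sketch below emulates that assumed rule; `matches_byte_column_end` is a made-up name, not anything Vim provides.

```python
def matches_byte_column_end(line, col):
    """Emulate the assumed behaviour of \\%{col}c.$ on a UTF-8 encoded line."""
    data = line.encode("utf-8")
    offset = col - 1  # Vim columns are 1-based
    if offset < 0 or offset >= len(data):
        return False
    # UTF-8 continuation bytes look like 10xxxxxx: no character starts here.
    if data[offset] & 0xC0 == 0x80:
        return False
    # The character starting at this byte must be the last one on the line.
    return len(data[offset:].decode("utf-8")) == 1


for text in ["oooo", "oooö", "ooöo"]:
    print(text, len(text.encode("utf-8")), matches_byte_column_end(text, 4))
```

Under that assumption, only `ooöo` fails the check, which agrees with the output reported above even though both `ö` lines are five bytes long.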
      • Tim Chase
        Message 3 of 9 , Nov 2, 2005
          > That actually seems to count bytes, not characters. I
          > tried it using UTF-8, and my two-byte characters counted
          > as two, at least sometimes. The results were not
          > consistent!
          >
          > E.g.:
          >
          > oooo
          > oooö
          > oooo
          >
          > :v/\%4c.$/#
          > Pattern found in every line: \%4c.$
          >
          > But:
          >
          > oooo
          > ooöo
          > oooo
          >
          > :v/\%4c.$/#
          > 2 ooöo
          >
          > Have I stumbled on a bug? This was in Vim 6.4 in Linux
          > (Kubuntu).

          How strange. Bug? Perhaps, or fixable with some
          option-obscura. I can't be of much help here as I don't use
          UTF-8 or multi-byte character sets/encodings for
          anything...other than occasionally trying out some of the
          crazy-good ideas by folks like Tony on the list who are much
          more well-versed in the ins and outs of this dark corner of
          Vim.

          -tim
        • James Vega
          Message 4 of 9 , Nov 2, 2005
            On Wed, Nov 02, 2005 at 10:36:33PM +0900, Bertilo Wennergren wrote:
            > On 11/2/05, Tim Chase <vim@...> wrote:
            >
            > > If, however, you've got a large file (or more lines than you
            > > reasonably care to scroll through), you can use something like
            >
            > > :v/\%40c.$/#
            > >
            > > which will return a list of each of the lines that *don't* have 40
            > > characters in them,
            >
            > That actually seems to count bytes, not characters. I tried it using
            > UTF-8, and my two-byte characters counted as two, at least sometimes.
            > The results were not consistent!
            >
            > E.g.:
            >
            > oooo
            > oooö
            > oooo
            >
            > :v/\%4c.$/#
            > Pattern found in every line: \%4c.$
            >
            > But:
            >
            > oooo
            > ooöo
            > oooo
            >
            > :v/\%4c.$/#
            > 2 ooöo
            >
            > Have I stumbled on a bug? This was in Vim 6.4 in Linux (Kubuntu).

            Use \%4v instead of \%4c.

            :he /\%c
            :he /\%v

            James
            --
            GPG Key: 1024D/61326D40 2003-09-02 James Vega <jamessan@...>
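
A caveat worth keeping in mind with `\%4v`: a virtual column is a screen column, so a tab or a double-width character occupies more than one column even though it is a single character. The Python sketch below is a rough approximation of that idea, not Vim's actual algorithm: `virtual_width` is a hypothetical helper, the tabstop of 8 is an assumption, and combining characters are ignored.

```python
import unicodedata


def virtual_width(line, tabstop=8):
    """Approximate the number of screen columns a line occupies."""
    col = 0
    for ch in line:
        if ch == "\t":
            # A tab advances to the next multiple of tabstop.
            col += tabstop - (col % tabstop)
        elif unicodedata.east_asian_width(ch) in ("W", "F"):
            # Wide and fullwidth characters take two screen cells.
            col += 2
        else:
            col += 1
    return col


for text in ["oooo", "ooöo", "a\tb", "日本"]:
    print(repr(text), len(text), virtual_width(text))
```

For extracts that contain only single-width characters and no tabs, the character count and the screen width agree, which is the case where `\%4v` behaves like a character count.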
          • Charles E. Campbell, Jr.
            Message 5 of 9 , Nov 2, 2005
              sanjeev.g.sapre@... wrote:

              >...Each type of file has some fixed length of a line. Before we can pass on
              >this file for further processing I would like to check that all lines are
              >of equal length. Is there a simple way /pattern by which I can identify
              >lines with differing length.
              >
              >
              :v/^.*\%74c.$/p

              will display all lines that don't have 74 characters in them. I just
              picked 74 out of the air, of course -- adjust it to whatever you need.

              Regards,
              Chip Campbell