Re: How to check if I have lines with different length in a database extract

  • Jürgen Krämer
    Message 1 of 9, Nov 2, 2005
      Hi,

      sanjeev.g.sapre@... wrote:
      >
      > I have some data extraction program which creates comma separated flat
      > files.
      >
      > Each type of file has some fixed length of a line. Before we can pass on
      > this file for further processing I would like to check that all lines are
      > of equal length. Is there a simple way /pattern by which I can identify
      > lines with differing length.

      the following mapping will search for the next line that has a different
      length than the current line:

      nnoremap \d /^\(.\{0,<c-r>=strlen(getline('.'))-1<cr>\}\\|.\{<c-r>=strlen(getline('.'))+1<cr>,\}\)$<cr>
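For readers who want to see what pattern the mapping ends up searching for, here is a minimal Python sketch of the same idea, run outside Vim. The function names are hypothetical, lines are assumed to carry no trailing newline, and unlike a Vim search this sketch does not wrap around at the end of the file.

```python
import re

def differing_length_pattern(reference):
    """Build a regex that matches any line whose length differs from len(reference)."""
    n = len(reference)
    longer = ".{%d,}" % (n + 1)          # strictly longer than the reference line
    if n == 0:
        # An empty reference line has no "shorter" alternative.
        return re.compile("^(?:%s)$" % longer)
    shorter = ".{0,%d}" % (n - 1)        # strictly shorter than the reference line
    return re.compile("^(?:%s|%s)$" % (shorter, longer))

def next_differing_line(lines, start):
    """Return the index of the first line after `start` whose length differs
    from lines[start], or None if there is no such line."""
    pattern = differing_length_pattern(lines[start])
    for index in range(start + 1, len(lines)):
        if pattern.fullmatch(lines[index]):
            return index
    return None
```

Calling `next_differing_line(lines, i)` repeatedly, each time from the index it returned, roughly mimics pressing the mapping again from the new cursor line.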

      Regards,
      Jürgen

      --
      Jürgen Krämer Softwareentwicklung
      HABEL GmbH & Co. KG mailto:jkr@...
      Hinteres Öschle 2 Tel: +49 / 74 61 / 93 53 - 15
      78604 Rietheim-Weilheim Fax: +49 / 74 61 / 93 53 - 99
    • John Love-Jensen
      Message 2 of 9, Nov 2, 2005
        Hi Sanjeev,

        This is what I did...

        :%s/././g
        :%!sort | uniq -c

        Maybe that would work for you.

        Note: somewhat destructive. Save the file before doing this, and restore
        it afterwards.
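A non-destructive variant of the same counting idea can be sketched in Python, outside Vim. This is only an illustration: `length_histogram` and the sample data are hypothetical, and the printed layout merely resembles `uniq -c` output rather than reproducing it exactly.

```python
from collections import Counter

def length_histogram(lines):
    """Map each distinct line length to the number of lines that have it."""
    return Counter(len(line.rstrip("\n")) for line in lines)

# Hypothetical sample: three lines of length 5 and one of length 3.
sample = ["a,b,c\n", "d,e,f\n", "g,h\n", "i,j,k\n"]

for length, count in sorted(length_histogram(sample).items()):
    # One row per distinct length: the count, then that many dots.
    print(f"{count:7d} {'.' * length}")
```

If the histogram has more than one entry, the file contains lines of differing length, and the original buffer is never modified.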

        HTH,
        --Eljay
      • sanjeev.g.sapre@Cummins.com
        Message 3 of 9, Nov 2, 2005
          Thanks, Tim.

          That was very useful.
          That's exactly what I was looking for.


          Regards
          Sanjeev
          Holset, Huddersfield
          Direct: +44 - 01484 440 365




          Tim Chase <vim@...>
          02/11/2005 12:23


          To: sanjeev.g.sapre@...
          cc: vim@...
          Subject: Re: How to check if I have lines with different length in a database
          extract


          > Each type of file has some fixed length of a line. Before we can pass
          > on this file for further processing I would like to check that all
          > lines are of equal length. Is there a simple way /pattern by which I
          > can identify lines with differing length.

          Well, a couple ideas stand out to me. Depending on your file size (in
          lines), it could be as simple as

          :set list

          and then scrolling down, watching the right margin to see if any of the
          "$" characters dance out of position.

          If, however, you've got a large file (or more lines than you reasonably
          care to scroll through), you can use something like

          :v/\%40c.$/#

          which lists every line that does *not* have exactly 40 characters,
          along with its line number. Ideally, you get back the "error"

          Pattern found in every line

          which means every line has the expected length. To fix a line that
          is listed, type its line number followed by "G" and you'll jump to
          the line in question.

          If you would rather filter these errant lines out of your file by
          deleting them completely, change the "#" to a "d" in the above
          command, such as

          :v/\%40c.$/d

          and it will delete any of the problematic lines.

          If order doesn't matter, you can take a pre-processing pass and move
          them all to the bottom of the file with

          :v/\%40c.$/m$

          where you can edit them all or deal with them accordingly.
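The list, delete, and move-to-bottom variants above can also be sketched outside Vim. The following Python is a rough analogue only: the function names are hypothetical, lines are assumed to have no trailing newline, and `len()` counts characters, which matches the Vim commands only for single-byte data.

```python
def lines_with_wrong_length(lines, expected=40):
    """Return (line_number, text) pairs, 1-based, for lines whose length
    is not `expected`; an empty list means every line is fine."""
    return [
        (number, text)
        for number, text in enumerate(lines, start=1)
        if len(text) != expected
    ]

def drop_wrong_length(lines, expected=40):
    """Keep only the lines that have the expected length."""
    return [text for text in lines if len(text) == expected]

def move_wrong_length_to_end(lines, expected=40):
    """Keep well-formed lines in order, then append the errant ones."""
    good = [text for text in lines if len(text) == expected]
    bad = [text for text in lines if len(text) != expected]
    return good + bad
```

For example, `lines_with_wrong_length(open("extract.csv").read().splitlines(), expected=40)` would report the offending lines of a hypothetical `extract.csv` without touching the file.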

          Hope this gives you something to work with.

          -tim






        • Bertilo Wennergren
          Message 4 of 9, Nov 2, 2005
            On 11/2/05, Tim Chase <vim@...> wrote:

            > If, however, you've got a large file (or more lines than you
            > reasonably care to scroll through), you can use something like

            > :v/\%40c.$/#
            >
            > which will return a list of each of the lines that *don't* have 40
            > characters in them,

            That actually seems to count bytes, not characters. I tried it using
            UTF-8, and my two-byte characters counted as two, at least sometimes.
            The results were not consistent!

            E.g.:

            oooo
            oooö
            oooo

            :v/\%4c.$/#
            Pattern found in every line: \%4c.$

            But:

            oooo
            ooöo
            oooo

            :v/\%4c.$/#
            2 ooöo

            Have I stumbled on a bug? This was in Vim 6.4 in Linux (Kubuntu).

            --
            Bertilo Wennergren <http://bertilow.com>
          • Tim Chase
            Message 5 of 9, Nov 2, 2005
              > That actually seems to count bytes, not characters. I
              > tried it using UTF-8, and my two-byte characters counted
              > as two, at least sometimes. The results were not
              > consistent!
              >
              > E.g.:
              >
              > oooo
              > oooö
              > oooo
              >
              > :v/\%4c.$/#
              > Pattern found in every line: \%4c.$
              >
              > But:
              >
              > oooo
              > ooöo
              > oooo
              >
              > :v/\%4c.$/#
              > 2 ooöo
              >
              > Have I stumbled on a bug? This was in Vim 6.4 in Linux
              > (Kubuntu).

              How strange. Bug? perhaps, or fixable with some
              option-obscura. I can't be of much help here as I don't use
              UTF-8 or multi-byte character sets/encodings for
              anything...other than occasionally trying out some of the
              crazy-good ideas by folks like Tony on the list who are much
              more well-versed in the ins and outs of this dark corner of
              Vim.

              -tim
            • James Vega
              Message 6 of 9, Nov 2, 2005
                On Wed, Nov 02, 2005 at 10:36:33PM +0900, Bertilo Wennergren wrote:
                > On 11/2/05, Tim Chase <vim@...> wrote:
                >
                > > If, however, you've got a large file (or more lines than you
                > > reasonably care to scroll through), you can use something like
                >
                > > :v/\%40c.$/#
                > >
                > > which will return a list of each of the lines that *don't* have 40
                > > characters in them,
                >
                > That actually seems to count bytes, not characters. I tried it using
                > UTF-8, and my two-byte characters counted as two, at least sometimes.
                > The results were not consistent!
                >
                > E.g.:
                >
                > oooo
                > oooö
                > oooo
                >
                > :v/\%4c.$/#
                > Pattern found in every line: \%4c.$
                >
                > But:
                >
                > oooo
                > ooöo
                > oooo
                >
                > :v/\%4c.$/#
                > 2 ooöo
                >
                > Have I stumbled on a bug? This was in Vim 6.4 in Linux (Kubuntu).

                Use \%4v instead of \%4c.

                :he /\%c
                :he /\%v
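The quoted results are consistent with \%c addressing byte columns rather than characters. Here is a small Python sketch of that reading, outside Vim. It assumes the text is UTF-8 and that "ö" is the precomposed character U+00F6 (two bytes); the function names are hypothetical, and the second function only approximates what matching \%4c.$ would do.

```python
def char_byte_columns(text, encoding="utf-8"):
    """Return the 1-based byte column at which each character of `text` starts."""
    columns = []
    column = 1
    for char in text:
        columns.append(column)
        column += len(char.encode(encoding))
    return columns

def last_char_starts_at_byte_column(text, column, encoding="utf-8"):
    # Rough analogue of a byte-column match followed by "any character,
    # then end of line": the last character must begin exactly at `column`.
    return bool(text) and char_byte_columns(text, encoding)[-1] == column

for line in ["oooo", "ooo\u00f6", "oo\u00f6o"]:
    print(
        line,
        len(line),                     # characters
        len(line.encode("utf-8")),     # bytes
        last_char_starts_at_byte_column(line, 4),
    )
```

Under these assumptions all three lines have four characters, but in "ooöo" the last character starts at byte column 5, so only that line fails the byte-column test, just as in the quoted output. A virtual-column test such as \%4v counts screen columns instead, so it is not thrown off by multi-byte characters of width one.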

                James
                --
                GPG Key: 1024D/61326D40 2003-09-02 James Vega <jamessan@...>
              • Charles E. Campbell, Jr.
                Message 7 of 9, Nov 2, 2005
                  sanjeev.g.sapre@... wrote:

                  >...Each type of file has some fixed length of a line. Before we can pass on
                  >this file for further processing I would like to check that all lines are
                  >of equal length. Is there a simple way /pattern by which I can identify
                  >lines with differing length.
                  >
                  >
                  :v/^.*\%74c.$/p

                  will display all lines that don't have 74 characters in them. I just
                  picked 74 out of the air, of course -- adjust it to whatever you need.

                  Regards,
                  Chip Campbell