
Re: improving the :join command

  • Christian Brabandt
    Message 1 of 10, Sep 1, 2009
      Hi Gene!

      On Di, 01 Sep 2009, Gene Kwiecinski wrote:

      > No disrespect intended, but *why* in B'Harni's Dark Name would you
      > want to join >10000 lines into 1?!?

      There might be use cases. Data is growing rapidly today, and I myself
      had to manage automatically generated text files of several hundred MB
      in size. Plus there have occasionally been questions on this list
      regarding joining lines.

      Well just one simple test:

      #v+
      ~$ for i in 1 2 4 8 16 32 64 128; do
      seq 1 $(($i*1000)) > tempfile
      echo "joining $i kilo lines"
      time vim -u NONE -N -c ':%join|:q!' tempfile;
      done
      #v-

      and compare the timings yourself. Doesn't this look like a bug to you?

      > Any 'vi' variant is a *line*-based editor, which presumed a modest
      > line-size for each. Juggling lines back and forth is easy, but heaving
      > huge MB-sized chunks o' text is just obscene. Add to that syntax-based
      > highlighting, multiple colors, etc., and all the processing required for
      > just *1* line adds exponentially to the amount of work involved, let
      > alone cursor motions, etc.

      Well, Vim is an editor. Shouldn't it be able to properly join millions
      of lines, even if that sounds strange? The power of Vim comes from
      the fact that you can do many different manipulations very
      efficiently and that it does not limit you.

      Plus :h limits does not talk about joining only a couple of lines ;)

      > Dunno, but to me, that seems like using a text editor to edit a .jpg or
      > .gif or something, ie, not the right tool for the job, even if, through
      > herculean contortions and torturing the editor's functionality, it *can*
      > be done.

      Exactly. It can. And it might be done by someone.

      > I'd, if anything, edit the file as needed, save it, then use 'sed',
      > 'tr', etc., to post-process it accordingly. No overhead for syntax,
      > colorschemes, etc. Ie, use the right tool for the job.

      Yeah, but sed, tr, awk, perl, $language are not always available. And
      Vim should be able to do it right.

      What was the reason again to add :vimgrep to vim when grep is
      available?

      regards,
      Christian

      --~--~---------~--~----~------------~-------~--~----~
      You received this message from the "vim_use" maillist.
      For more information, visit http://www.vim.org/maillist.php
      -~----------~----~----~----~------~----~------~--~---
    • Gene Kwiecinski
      Message 2 of 10, Sep 1, 2009
        >>No disrespect intended, but *why* in B'Harni's Dark Name would you
        >>want to join >10000 lines into 1?!?

        >There might be use cases. Data is growing rapidly today, and I myself
        >had to manage automatically generated text files of several hundred MB
        >in size. Plus there have occasionally been questions on this list
        >regarding joining lines.

        Even so, something which I can understand, eg, a logfile, should be
        delineated by linebreaks. Raw xml/sgml/etc., should be edited as a
        sequence of modestly-sized lines, then if necessary, joined to a single
        line after saving (or before saving, if you have the time :D ).


        >Well just one simple test:

        >#v+
        >~$ for i in 1 2 4 8 16 32 64 128; do
        > seq 1 $(($i*1000)) >tempfile
        > echo "joining $i kilo lines"
        > time vim -u NONE -N -c ':%join|:q!' tempfile;
        >done
        >#v-

        >and compare the timings yourself. Doesn't this look like a bug to you?

        I have no idea, as I didn't run it yet. Offhand, an exponential
        increase wouldn't be out of the question, ie, e^n.

        Don't forget *physical* limits such as available memory. Once you bang
        your head on that memory-ceiling and start having to swap to disk, all
        bets are off, and processing time can increase by order*s* of magnitude,
        depending how bad it is. Hell, I run into that in *perl*, let alone
        'gvim', when intentionally joining huge files to a single line to c&p
        whole sections of the file! And I'm not even dealing with syntax
        highlighting, colorschemes, and the like.


        >>Any 'vi' variant is a *line*-based editor, which presumed a modest
        >>line-size for each. Juggling lines back and forth is easy, but heaving
        >>huge MB-sized chunks o' text is just obscene. Add to that syntax-based
        >>highlighting, multiple colors, etc., and all the processing required for
        >>just *1* line adds exponentially to the amount of work involved, let
        >>alone cursor motions, etc.

        >Well Vim is an editor. Shouldn't it be able to join properly millions
        >of lines, even if that sounds strange? The power of vim comes from

        Sure, it should be able to be pushed to its limits and do so, but not
        necessarily *efficiently*. Ie, it may hit that aforementioned ceiling
        and then start hitting the disk to do so, and pretty much require you to
        leave it running overnight to go and join a brazillion lines into 1
        Uberline. That's not necessarily a "bug", just an unexpected excursion
        of its performance envelope. The fact that it can create a huge
        Uberline without *crashing* is a testament to the robustness of the
        code. An old version of 'vi' I had would vomit on lines >300chars or
        so.

        Point being, *line*-editors are meant to be used with *lines*, and lines
        of a modest size. The fact that it *can* handle Uberlines is great, but
        you can't expect it to be handled "efficiently". The kind of advice I
        might give would be along the lines of the guy who sees his doctor:

        guy: "Doc, it hurts when I do this."
        doc: "So don't do that."


        >the fact, that you can do many different manipulations very
        >efficiently and does not limit you.

        Absolutely, but again, recall that it's intended to be a *line*-editor.
        Not to appear facetious in repeating that again, but that's what
        'vim'/'gvim' happens to be, a *line*-editor. You yank and put *lines*.
        You add *lines*. You delete *lines*. Hell, syntax highlighting becomes
        downright painful for overly-long lines that people wrote add-ons to
        stop highlighting after N columns! That should be Clue #1 that
        overly-long lines are not "natural" to a *line*-editor.


        >Plus :h limits does not talk about joining only a couple of lines ;)

        Of course not. I can 'ls' a list of filenames into a file, do a ':%j'
        to get them into a single line, then prepend a command to run (with the
        filelist as the list of files to operate on) and make an instant
        batchfile. Works great. But there's a huge difference between a
        batch-/shell-command that's 1000chars long, and a 1-line file with a
        100Mchar Uberline.


        >>Dunno, but to me, that seems like using a text editor to edit a .jpg or
        >>.gif or something, ie, not the right tool for the job, even if, through
        >>herculean contortions and torturing the editor's functionality, it *can*
        >>be done.

        >Exactly. It can. And it might be done by someone.

        And if he has the luxury of letting it run overnight, great. :D


        >>I'd, if anything, edit the file as needed, save it, then use 'sed',
        >>'tr', etc., to post-process it accordingly. No overhead for syntax,
        >>colorschemes, etc. Ie, use the right tool for the job.

        >Yeah, but sed, tr, awk, perl, $language is not always available. And
        >Vim should be able to do it right.

        >What was the reason again to add :vimgrep to vim when grep is
        >available?

        I have no idea, as I don't recall ever using it. <shrug/>


        To reiterate, I *don't* want to appear to be argumentative, but I'm just
        saying that handling Uberlines is something that's *possible* in
        'vim'/'gvim', but don't expect it to be handled "efficiently", not if
        it's well outside the usual performance envelope of file-editing.

      • Christian Brabandt
        Message 3 of 10, Sep 1, 2009
          Hi Gene!

          On Di, 01 Sep 2009, Gene Kwiecinski wrote:

          > To reiterate, I *don't* want to appear to be argumentative, but I'm just
          > saying that handling Uberlines is something that's *possible* in
          > 'vim'/'gvim', but don't expect it to be handled "efficiently", not if
          > it's well outside the usual performance envelope of file-editing.

          Well, then don't use it. Nobody forces you to use the plugin. But
          someone might think it's useful.

          regards,
          Christian

        • sssstefan@gmail.com
          Message 4 of 10, Sep 4, 2009
            On 2009-09-01 16:31 -0400, Gene Kwiecinski wrote:

            > /.../ I'm just saying that handling Uberlines is something that's
            > *possible* in 'vim'/'gvim', but don't expect it to be handled
            > "efficiently", not if it's well outside the usual performance envelope
            > of file-editing.

            On the other hand, if the OP's plugin is actually faster than Vim
            "native", it is probably an indication that the native implementation
            could easily (?) be improved.

            --
            Stefan


          • Christian Brabandt
            Message 5 of 10, Sep 4, 2009
              Hi sssstefan!

              On Fr, 04 Sep 2009, sssstefan@... wrote:

              > On the other hand, if the OP's plugin is actually faster than Vim
              > "native", it is probably an indication that the native implementation
              > could easily (?) be improved.

              It is. Try it out. Or look at the timings given on the script's plugin
              page or in its help, or at the timings given at
              http://article.gmane.org/gmane.editors.vim/80315 (the plugin uses a
              similar, though slightly more complex, algorithm to make it behave
              really like :join).

              regards,
              Christian
              --
              No children may attend school with their breath smelling of "wild onions."
              [real standing law in West Virginia, United States of America]

            • Tony Mechelynck
              Message 6 of 10, Sep 9, 2009
                On 01/09/09 22:31, Gene Kwiecinski wrote:
                [...]
                > Absolutely, but again, recall that it's intended to be a *line*-editor.
                > Not to appear facetious in repeating that again, but that's what
                > 'vim'/'gvim' happens to be, a *line*-editor. You yank and put *lines*.
                > You add *lines*. You delete *lines*. Hell, syntax highlighting becomes
                > downright painful for overly-long lines that people wrote add-ons to
                > stop highlighting after N columns! That should be Clue #1 that
                > overly-long lines are not "natural" to a *line*-editor.

                Not exactly an add-on; it's an option, viz. 'synmaxcol', and a rather
                recent one at that: it was new in Release 7.0. At first I thought it
                was a bug ("Hey! I don't get any highlighting past the n-th character
                on the line!"), and Bram had to point me to the new option. (I see the
                present default is 3000, but ISTR it was originally 1000.)

                [...]
                >> What was the reason again to add :vimgrep to vim when grep is
                >> available?

                The ":helpgrep" command appeared first (at patchlevel 6.1.423), to help
                us find our lost needles in the huge haystack of help text. Once that
                was there, adding a general-purpose internal grep (in version 7.0)
                wasn't much additional work, and it was a help for platforms like
                Windows, where the OS doesn't install an external grep -- sure, there
                are GnuWin32, unxutils, MinGW, and the like, but you have to fetch one
                of these grep versions yourself if you want to have one. With
                ":vimgrep", no need to check whether or not an external grep is
                available; and, I repeat, from ":helpgrep" to ":vimgrep" it wasn't a big
                step.

                Another advantage is that ":vimgrep" is guaranteed to use exactly the
                same regular expressions as are used everywhere else in Vim, while egrep
                may use something just subtly different, and certainly not documented in
                the Vim help.

                >
                > I have no idea, as I don't recall ever using it.<shrug/>

                I did, about as soon as it appeared, with some "snapshot" of Vim 7.0aa
                alpha.

                >
                >
                > To reiterate, I *don't* want to appear to be argumentative, but I'm just
                > saying that handling Uberlines is something that's *possible* in
                > 'vim'/'gvim', but don't expect it to be handled "efficiently", not if
                > it's well outside the usual performance envelope of file-editing.

                I agree with that. Searching a file tens of megabytes long for a
                not-too-complex regexp already takes some measurable time (maybe tens of
                seconds); if you want to join a gigabazillion lines into one, Vim can do
                it -- but expect it to take hours or even days rather than seconds, and
                at, oh, let's say 99.5% CPU time. It's not even a hang -- it's just that
                you gave it an enormous task to which it is ill-suited, and it's
                uncomplainingly and patiently grinding at it; when it's finished, it will
                wait for the next command -- if by then you haven't lost patience and
                interrupted it or killed it. To efficiently join all lines of a big file,
                use some program such as tr, which looks at the file characterwise
                rather than linewise and makes a single pass through it. You can even
                use it from within Vim, of course (see ":help filter").
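
                A minimal sketch of the single-pass approach described above, assuming a
                POSIX shell with `paste` and `seq` installed. The function name
                `join_lines` is hypothetical, and `tempfile` is borrowed from the
                benchmark loop earlier in the thread. `paste -s` puts all lines of a file
                on one line and `-d ' '` separates them with a single space, which only
                approximates `:%join` (it does not reproduce :join's handling of leading
                whitespace).

```shell
# join_lines FILE: print every line of FILE on one line, separated by single spaces.
# paste -s reads the input sequentially and writes it out as a single line.
join_lines() {
    paste -s -d ' ' "$1"
}

# Same 1000-line input as the smallest case in the benchmark loop.
seq 1 1000 > tempfile

# Count the output lines to confirm everything ended up on one line.
join_lines tempfile | wc -l
```

                From within Vim, the same filter could presumably be applied to the whole
                buffer with `:%!paste -s -d ' ' -` (see ":help filter").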


                Best regards,
                Tony.
                --
                There once was a Scot named McAmeter
                With a tool of prodigious diameter.
                It was not the size
                That caused such surprise;
                'Twas his rhythm -- iambic pentameter.

              • Christian Brabandt
                Message 7 of 10, Sep 14, 2009
                  Hi Tony!

                  On Mi, 09 Sep 2009, Tony Mechelynck wrote:


                  > tens of seconds); if you want to join a gigabazillion lines into
                  > one, Vim can do it -- but expect it to take hours or even days
                  > rather than seconds, and at, oh, let's say 99.5% CPU time. It's not
                  > even a hang -- it's just that you gave it an enormous task to which
                  > it is ill-suited, and it's uncomplainingly and patiently grinding at
                  > it; when it's finished, it will wait for the next command -- if by
                  > then you haven't lost patience and interrupted it or killed it. To

                  Tony, please read the documentation provided with the plugin or on the
                  plugin page, and try out the test case I have given. And then tell
                  me again that this does not sound like a bug.

                  The mere fact that it is possible to improve on the join command using
                  Vim's scripting language is strong evidence that it is indeed a bug.

                  > efficiently join all lines of a big file, use some program such as
                  > tr, which looks at the file characterwise rather than linewise and
                  > makes a single pass through it. You can even use it from within Vim,
                  > of course (see ":help filter").

                  see above and please read the provided documentation.

                  regards,
                  Christian
                  --
                  :wq!

                • bill lam
                  Message 8 of 10, Sep 14, 2009
                    On Mon, 14 Sep 2009, Christian Brabandt wrote:
                    >
                    > Hi Tony!
                    >
                    > On Mi, 09 Sep 2009, Tony Mechelynck wrote:
                    >
                    >
                    > > tens of seconds); if you want to join a gigabazillion lines into
                    > > one, Vim can do it -- but expect it to take hours or even days
                    > > rather than seconds, and at, oh, let's say 99.5% CPU time. It's not
                    > > even a hang -- it's just that you gave it an enormous task to which
                    > > it is ill-suited, and it's uncomplainingly and patiently grinding at
                    > > it; when it's finished, it will wait for the next command -- if by
                    > > then you haven't lost patience and interrupted it or killed it. To
                    >
                    > Tony, please read the documentation provided with the plugin or at the
                    > plugin page and try out the test case I have been given. And then tell
                    > me again, that this does not sound like a bug.
                    >
                    > Just the fact, that it is possible to improve the join command using vim
                    > scripting language gives strong evidence, that it is a bug indeed.

                    IMO it is not a bug; it simply does not scale. It is like computing
                    1+1+1+1 as ((1+1)+1)+1:
                    each operation (+) is independent of the others, so the whole
                    calculation needs quadratic time. It is a trade-off between
                    time and space. IMO allocating 50% more space than is needed,
                    so that some of these (:join) operations can be done in place, should
                    improve performance. (untested)
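
                    The quadratic-versus-linear argument above can be sketched as a simple
                    cost model. This is only an illustration under stated assumptions: it
                    counts characters copied, ignores separators, and says nothing about
                    how Vim's :join is actually implemented. The function names
                    `copies_pairwise` and `copies_single_pass` are hypothetical.

```shell
# copies_pairwise N LEN: characters copied when N lines of LEN characters are
# joined one pair at a time, re-copying the accumulated line at every step.
copies_pairwise() {
    n=$1 len=$2 total=0 k=2
    while [ "$k" -le "$n" ]; do
        # After this step the joined line holds k source lines.
        total=$((total + k * len))
        k=$((k + 1))
    done
    echo "$total"
}

# copies_single_pass N LEN: characters copied when the final size is known
# up front and every source line is copied exactly once.
copies_single_pass() {
    echo $(($1 * $2))
}

# Compare both models for doubling line counts, 10 characters per line.
for n in 1000 2000 4000; do
    echo "$n lines: pairwise=$(copies_pairwise "$n" 10) single=$(copies_single_pass "$n" 10)"
done
```

                    In this model, doubling the number of lines roughly quadruples the
                    pairwise count, while the single-pass count only doubles.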

                    --
                    regards,
                    ====================================================
                    GPG key 1024D/4434BAB3 2008-08-24
                    gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
