
RE: improving the :join command

  • Gene Kwiecinski
    Message 1 of 10 , Sep 1, 2009
      >As you may know, :join suffers from a serious performance issue when
      >joining many lines (on the order of tens of thousands of lines or more).

      No disrespect intended, but *why* in B'Harni's Dark Name would you want
      to join >10000 lines into 1?!?

      Any 'vi' variant is a *line*-based editor, which presumed a modest
      line-size for each. Juggling lines back and forth is easy, but heaving
      huge MB-sized chunks o' text is just obscene. Add to that syntax-based
      highlighting, multiple colors, etc., and all the processing required for
      just *1* line adds exponentially to the amount of work involved, let
      alone cursor motions, etc.

      Dunno, but to me, that seems like using a text editor to edit a .jpg or
      .gif or something, ie, not the right tool for the job, even if, through
      herculean contortions and torturing the editor's functionality, it *can*
      be done.

      I'd, if anything, edit the file as needed, save it, then use 'sed',
      'tr', etc., to post-process it accordingly. No overhead for syntax,
      colorschemes, etc. Ie, use the right tool for the job.

      --~--~---------~--~----~------------~-------~--~----~
      You received this message from the "vim_use" maillist.
      For more information, visit http://www.vim.org/maillist.php
      -~----------~----~----~----~------~----~------~--~---
    • Christian Brabandt
      Message 2 of 10 , Sep 1, 2009
        Hi Gene!

        On Tue, 01 Sep 2009, Gene Kwiecinski wrote:

        > No disrespect intended, but *why* in B'Harni's Dark Name would you
        > want to join >10000 lines into 1?!?

        There might be use cases. Data is growing rapidly today, and I myself
        have had to manage automatically generated text files several hundred
        MB in size. Plus, there have occasionally been questions on this list
        about joining lines.

        Well just one simple test:

        #v+
        ~$ for i in 1 2 4 8 16 32 64 128; do
        seq 1 $(($i*1000)) > tempfile
        echo "joining $i kilo lines"
        time vim -u NONE -N -c ':%join|:q!' tempfile;
        done
        #v-

        and compare the timings yourself. Doesn't this look like a bug to you?

        > Any 'vi' variant is a *line*-based editor, which presumed a modest
        > line-size for each. Juggling lines back and forth is easy, but heaving
        > huge MB-sized chunks o' text is just obscene. Add to that syntax-based
        > highlighting, multiple colors, etc., and all the processing required for
        > just *1* line adds exponentially to the amount of work involved, let
        > alone cursor motions, etc.

        Well, Vim is an editor. Shouldn't it be able to join millions of
        lines properly, even if that sounds strange? The power of Vim comes
        from the fact that you can do many different manipulations very
        efficiently and that it does not limit you.

        Plus :h limits does not talk about joining only a couple of lines ;)

        > Dunno, but to me, that seems like using a text editor to edit a .jpg or
        > .gif or something, ie, not the right tool for the job, even if, through
        > herculean contortions and torturing the editor's functionality, it *can*
        > be done.

        Exactly. It can. And it might be done by someone.

        > I'd, if anything, edit the file as needed, save it, then use 'sed',
        > 'tr', etc., to post-process it accordingly. No overhead for syntax,
        > colorschemes, etc. Ie, use the right tool for the job.

        Yeah, but sed, tr, awk, perl, $language are not always available. And
        Vim should be able to do it right.

        What was the reason again to add :vimgrep to vim when grep is
        available?

        regards,
        Christian

      • Gene Kwiecinski
        Message 3 of 10 , Sep 1, 2009
          >>No disrespect intended, but *why* in B'Harni's Dark Name would you
          >>want to join >10000 lines into 1?!?

          >There might be usecases. Data is growing rapidly today, and I myself
          >had to manage automatically generated text-files of several hundred MB
          >of size. Plus there have occasionally been questions on this list
          >regarding joining lines.

          Even so, something which I can understand, eg, a logfile, should be
          delineated by linebreaks. Raw xml/sgml/etc., should be edited as a
          sequence of modestly-sized lines, then if necessary, joined to a single
          line after saving (or before saving, if you have the time :D ).


          >Well just one simple test:

          >#v+
          >~$ for i in 1 2 4 8 16 32 64 128; do
          > seq 1 $(($i*1000)) >tempfile
          > echo "joining $i kilo lines"
          > time vim -u NONE -N -c ':%join|:q!' tempfile;
          >done
          >#v-

          >and compare the timings yourself. Doesn't this look like a bug to you?

          I have no idea, as I didn't run it yet. Offhand, an exponential
          increase wouldn't be out of the question, ie, e^n.

          Don't forget *physical* limits such as available memory. Once you bang
          your head on that memory-ceiling and start having to swap to disk, all
          bets are off, and processing time can increase by order*s* of magnitude,
          depending how bad it is. Hell, I run into that in *perl*, let alone
          'gvim', when intentionally joining huge files to a single line to c&p
          whole sections of the file! And I'm not even dealing with syntax
          highlighting, colorschemes, and the like.


          >>Any 'vi' variant is a *line*-based editor, which presumed a modest
          >>line-size for each. Juggling lines back and forth is easy, but heaving
          >>huge MB-sized chunks o' text is just obscene. Add to that syntax-based
          >>highlighting, multiple colors, etc., and all the processing required for
          >>just *1* line adds exponentially to the amount of work involved, let
          >>alone cursor motions, etc.

          >Well Vim is an editor. Shouldn't it be able to join properly millions
          >of lines, even if that sounds strange? The power of vim comes from

          Sure, it should be able to be pushed to its limits and do so, but not
          necessarily *efficiently*. Ie, it may hit that aforementioned ceiling
          and then start hitting the disk to do so, and pretty much require you to
          leave it running overnight to go and join a brazillion lines into 1
          Uberline. That's not necessarily a "bug", just an unexpected excursion
          of its performance envelope. The fact that it can create a huge
          Uberline without *crashing* is a testament to the robustness of the
          code. An old version of 'vi' I had would vomit on lines >300chars or
          so.

          Point being, *line*-editors are meant to be used with *lines*, and lines
          of a modest size. The fact that it *can* handle Uberlines is great, but
          you can't expect it to be handled "efficiently". The kind of advice I
          might give would be along the lines of the guy who sees his doctor:

          guy: "Doc, it hurts when I do this."
          doc: "So don't do that."


          >the fact, that you can do many different manipulations very
          >efficiently and does not limit you.

          Absolutely, but again, recall that it's intended to be a *line*-editor.
          Not to appear facetious in repeating that again, but that's what
          'vim'/'gvim' happens to be, a *line*-editor. You yank and put *lines*.
          You add *lines*. You delete *lines*. Hell, syntax highlighting becomes
          so downright painful for overly-long lines that people wrote add-ons to
          stop highlighting after N columns! That should be Clue #1 that
          overly-long lines are not "natural" to a *line*-editor.


          >Plus :h limits does not talk about joining only a couple of lines ;)

          Of course not. I can 'ls' a list of filenames into a file, do a ':%j'
          to get them into a single line, then prepend a command to run (with the
          filelist as the list of files to operate on) and make an instant
          batchfile. Works great. But there's a huge difference between a
          batch-/shell-command that's 1000chars long, and a 1-line file with a
          100Mchar Uberline.


          >>Dunno, but to me, that seems like using a text editor to edit a .jpg or
          >>.gif or something, ie, not the right tool for the job, even if, through
          >>herculean contortions and torturing the editor's functionality, it *can*
          >>be done.

          >Exactly. It can. And it might be done by someone.

          And if he has the luxury of letting it run overnight, great. :D


          >>I'd, if anything, edit the file as needed, save it, then use 'sed',
          >>'tr', etc., to post-process it accordingly. No overhead for syntax,
          >>colorschemes, etc. Ie, use the right tool for the job.

          >Yeah, but sed, tr, awk, perl, $language is not always available. And
          >Vim should be able to do it right.

          >What was the reason again to add :vimgrep to vim when grep is
          >available?

          I have no idea, as I don't recall ever using it. <shrug/>


          To reiterate, I *don't* want to appear to be argumentative, but I'm just
          saying that handling Uberlines is something that's *possible* in
          'vim'/'gvim', but don't expect it to be handled "efficiently", not if
          it's well outside the usual performance envelope of file-editing.

        • Christian Brabandt
          Message 4 of 10 , Sep 1, 2009
            Hi Gene!

            On Tue, 01 Sep 2009, Gene Kwiecinski wrote:

            > To reiterate, I *don't* want to appear to be argumentative, but I'm just
            > saying that handling Uberlines is something that's *possible* in
            > 'vim'/'gvim', but don't expect it to be handled "efficiently", not if
            > it's well outside the usual performance envelope of file-editing.

            Well, then don't use it. Nobody forces you to use the plugin. But
            someone might think it's useful.

            regards,
            Christian

          • sssstefan@gmail.com
            Message 5 of 10 , Sep 4, 2009
              On 2009-09-01 16:31 -0400, Gene Kwiecinski wrote:

              > /.../ I'm just saying that handling Uberlines is something that's
              > *possible* in 'vim'/'gvim', but don't expect it to be handled
              > "efficiently", not if it's well outside the usual performance envelope
              > of file-editing.

              On the other hand, if the OP's plugin is actually faster than Vim
              "native", it is probably an indication that the native implementation
              could easily (?) be improved.

              --
              Stefan


            • Christian Brabandt
              Message 6 of 10 , Sep 4, 2009
                Hi sssstefan!

                On Fri, 04 Sep 2009, sssstefan@... wrote:

                > On the other hand, if the OP's plugin is actually faster than Vim
                > "native", it is probably an indication that the native implementation
                > could easily (?) be improved.

                It is. Try it out, or look at the timings given on the script's
                plugin page or in its help, or at:
                http://article.gmane.org/gmane.editors.vim/80315 (the plugin uses a
                similar, though slightly more complex, algorithm to make it behave
                just like :join).

                regards,
                Christian
                --
                No children may attend school with their breath smelling of "wild onions."
                [real standing law in West Virginia, United States of America]

              • Tony Mechelynck
                Message 7 of 10 , Sep 9, 2009
                  On 01/09/09 22:31, Gene Kwiecinski wrote:
                  [...]
                  > Absolutely, but again, recall that it's intended to be a *line*-editor.
                  > Not to appear facetious in repeating that again, but that's what
                  > 'vim'/'gvim' happens to be, a *line*-editor. You yank and put *lines*.
                  > You add *lines*. You delete *lines*. Hell, syntax highlighting becomes
                  > downright painful for overly-long lines that people wrote add-ons to
                  > stop highlighting after N columns! That should be Clue #1 that
                  > overly-long lines are not "natural" to a *line*-editor.

                  Not exactly an add-on: it's an option, viz. 'synmaxcol', and a rather
                  recent one at that; it was new in Release 7.0. At first I thought it
                  was a bug ("Hey! I don't get any highlighting past the n-th character
                  on the line!"), and Bram had to point me to the new option. (I see the
                  present default is 3000, but ISTR it was originally 1000.)

                  [...]
                  >> What was the reason again to add :vimgrep to vim when grep is
                  >> available?

                  The ":helpgrep" command appeared first (at patchlevel 6.1.423), to help
                  us find our lost needles in the huge haystack of help text. Once that
                  was there, adding a general-purpose internal grep (in version 7.0)
                  wasn't much additional work, and it was a help for platforms like
                  Windows, where the OS doesn't install an external grep -- sure, there
                  are GnuWin32, unxutils, MinGW, and the like, but you have to fetch one
                  of these grep versions yourself if you want to have one. With
                  ":vimgrep", no need to check whether or not an external grep is
                  available; and, I repeat, from ":helpgrep" to ":vimgrep" it wasn't a big
                  step.

                  Another advantage is that ":vimgrep" is guaranteed to use exactly the
                  same regular expressions as are used everywhere else in Vim, while egrep
                  may use something just subtly different, and certainly not documented in
                  the Vim help.

                  >
                  > I have no idea, as I don't recall ever using it.<shrug/>

                  I did, about as soon as it appeared, with some "snapshot" of Vim 7.0aa
                  alpha.

                  >
                  >
                  > To reiterate, I *don't* want to appear to be argumentative, but I'm just
                  > saying that handling Uberlines is something that's *possible* in
                  > 'vim'/'gvim', but don't expect it to be handled "efficiently", not if
                  > it's well outside the usual performance envelope of file-editing.

                  I agree with that. Searching a file tens of megabytes long for a
                  not-too-complex regexp already takes some measurable time (maybe tens of
                  seconds); if you want to join a gigabazillion lines into one, Vim can do
                  it -- but expect it to take hours or even days rather than seconds, and
                  at, oh, let's say 99.5% CPU time. It's not even a hang -- it's just that
                  you gave it an enormous task to which it is ill-suited, and it's
                  uncomplainingly and patiently grinding at it; when it's finished, it will
                  wait for the next command -- if by then you haven't lost patience and
                  interrupted it or killed it. To efficiently join all lines of a big file,
                  use some program such as tr, which looks at the file characterwise
                  rather than linewise and makes a single pass through it. You can even
                  use it from within Vim, of course (see ":help filter").
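
A minimal sketch of the single-pass approach described above, assuming a POSIX `paste` is installed. The helper name `join_lines` is made up for illustration, and unlike `:join` it does not strip leading whitespace from the lines it joins.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): join every line of the file
# named in $1 into a single line, reading the input once.
# "paste -s" concatenates the lines of a file serially; "-d ' '" puts one
# space where each newline between lines used to be.
join_lines() {
    paste -s -d ' ' -- "$1"
}

# Small demonstration on a five-line scratch file.
demo_file="join_demo.$$"
printf '%s\n' 1 2 3 4 5 > "$demo_file"
join_lines "$demo_file"
rm -f "$demo_file"
```

From inside Vim, the same filter could be applied to the whole buffer with `:%!paste -s -d ' ' -`, which is the ":help filter" mechanism mentioned above.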


                  Best regards,
                  Tony.
                  --
                  There once was a Scot named McAmeter
                  With a tool of prodigious diameter.
                  It was not the size
                  That caused such surprise;
                  'Twas his rhythm -- iambic pentameter.

                • Christian Brabandt
                  Message 8 of 10 , Sep 14, 2009
                    Hi Tony!

                    On Wed, 09 Sep 2009, Tony Mechelynck wrote:


                    > tens of seconds); if you want to join a gigabazillion lines into
                    > one, Vim can do it -- but expect it to take hours or even days
                    > rather than seconds, and at, oh, let's say 99.5% CPU time. It's not
                    > even a hang -- it's just that you gave it an enormous task to which
                    > it is ill-suited, and it's uncomplainingly and patiently grinding at
                    > it; when it's finished, it will wait for the next command -- if by
                    > then you haven't lost patience and interrupted it or killed it. To

                    Tony, please read the documentation provided with the plugin or on the
                    plugin page, and try out the test case I have given. And then tell
                    me again that this does not sound like a bug.

                    The mere fact that it is possible to improve on the join command using
                    the Vim scripting language is strong evidence that it is indeed a bug.

                    > efficiently join all lines of a big file, use some program such as
                    > tr, which looks at the file characterwise rather than linewise and
                    > makes a single pass through it. You can even use it from within Vim,
                    > of course (see ":help filter").

                    See above, and please read the provided documentation.

                    regards,
                    Christian
                    --
                    :wq!

                  • bill lam
                    Message 9 of 10 , Sep 14, 2009
                      On Mon, 14 Sep 2009, Christian Brabandt wrote:
                      >
                      > Hi Tony!
                      >
                      > On Wed, 09 Sep 2009, Tony Mechelynck wrote:
                      >
                      >
                      > > tens of seconds); if you want to join a gigabazillion lines into
                      > > one, Vim can do it -- but expect it to take hours or even days
                      > > rather than seconds, and at, oh, let's say 99.5% CPU time. It's not
                      > > even a hang -- it's just that you gave it an enormous task to which
                      > > it is ill-suited, and it's uncomplainingly and patiently grinding at
                      > > it; when it's finished, it will wait for the next command -- if by
                      > > then you haven't lost patience and interrupted it or killed it. To
                      >
                      > Tony, please read the documentation provided with the plugin or at the
                      > plugin page and try out the test case I have been given. And then tell
                      > me again, that this does not sound like a bug.
                      >
                      > Just the fact, that it is possible to improve the join command using vim
                      > scripting language gives strong evidence, that it is a bug indeed.

                      IMO it is not a bug; it just does not scale up. It is like computing
                      1+1+1+1 as ((1+1)+1)+1:
                      each operation (+) is independent of the others, so if every step has
                      to re-copy the whole intermediate result, the calculation needs
                      quadratic time. It would be a trade-off between time and space: IMO,
                      allocating 50% more space than is needed, so that some of these
                      (:join) steps can be done in place, should improve performance.
                      (untested)
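
The quadratic-time argument above can be sketched by counting copied characters. This is an illustration under an assumed cost model, not a description of Vim's actual code: the function names are hypothetical, and the "naive" model simply assumes that each pairwise join writes out the whole accumulated line again.

```shell
#!/bin/sh
# Illustration only: count how many characters get copied when N lines of
# LEN characters each are joined. Separators are ignored to keep it simple.

# naive_copies N LEN
# Assumed model (not confirmed for Vim's code): lines are joined one pair at
# a time, and each step writes out the whole accumulated line again.
naive_copies() {
    n=$1
    len=$2
    acc=$len      # length of the line built so far
    total=0       # characters copied so far
    i=2
    while [ "$i" -le "$n" ]; do
        acc=$((acc + len))
        total=$((total + acc))
        i=$((i + 1))
    done
    echo "$total"
}

# single_pass_copies N LEN
# Model of a join that sizes the result once and copies each character once.
single_pass_copies() {
    echo $(($1 * $2))
}

# Compare the two models for growing line counts (10 characters per line).
for lines in 1000 2000 4000; do
    echo "$lines lines: naive=$(naive_copies "$lines" 10) single-pass=$(single_pass_copies "$lines" 10)"
done
```

In this model, doubling the number of lines roughly quadruples the naive count, while the single-pass count only doubles.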

                      --
                      regards,
                      ====================================================
                      GPG key 1024D/4434BAB3 2008-08-24
                      gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
