improving the :join command

  • Christian Brabandt
    Message 1 of 10, Aug 31, 2009
      Hi,

      as you may know, :join suffers from a serious performance issue when
      joining many lines (on the order of tens of thousands of lines or more).
      Therefore I have written the following plugin¹, which tries to improve
      the performance when joining lines.

      I have tried to make the plugin behave exactly like the :join
      command, so I'd like to ask you to test whether it really works like
      :join.

      I hope this helps until the patch in the vim_extend repository gets
      into Vim mainline.

      ¹) http://www.vim.org/scripts/script.php?script_id=2766


      regards,
      Christian

      --~--~---------~--~----~------------~-------~--~----~
      You received this message from the "vim_use" maillist.
      For more information, visit http://www.vim.org/maillist.php
      -~----------~----~----~----~------~----~------~--~---
    • Gene Kwiecinski
      Message 2 of 10, Sep 1, 2009
        >as you may know, :join suffers from a serious performance issue when
        >joining many lines (on the order of tens of thousands of lines or more).

        No disrespect intended, but *why* in B'Harni's Dark Name would you want
        to join >10000 lines into 1?!?

        Any 'vi' variant is a *line*-based editor, which presumed a modest
        line-size for each. Juggling lines back and forth is easy, but heaving
        huge MB-sized chunks o' text is just obscene. Add to that syntax-based
        highlighting, multiple colors, etc., and all the processing required for
        just *1* line adds exponentially to the amount of work involved, let
        alone cursor motions, etc.

        Dunno, but to me, that seems like using a text editor to edit a .jpg or
        .gif or something, ie, not the right tool for the job, even if, through
        herculean contortions and torturing the editor's functionality, it *can*
        be done.

        I'd, if anything, edit the file as needed, save it, then use 'sed',
        'tr', etc., to post-process it accordingly. No overhead for syntax,
        colorschemes, etc. Ie, use the right tool for the job.

      • Christian Brabandt
        Message 3 of 10, Sep 1, 2009
          Hi Gene!

          On Di, 01 Sep 2009, Gene Kwiecinski wrote:

          > No disrespect intended, but *why* in B'Harni's Dark Name would you
          > want to join >10000 lines into 1?!?

          There might be use cases. Data is growing rapidly today, and I myself
          have had to manage automatically generated text files of several
          hundred MB in size. Plus, there have occasionally been questions on
          this list about joining lines.

          Well just one simple test:

          #v+
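          for i in 1 2 4 8 16 32 64 128; do
              # create a test file with i*1000 numbered lines
              seq 1 $((i*1000)) > tempfile
              echo "joining $i kilo lines"
              # -u NONE -N: skip all startup files, run in 'nocompatible' mode
              time vim -u NONE -N -c ':%join|:q!' tempfile
          done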
          #v-

          and compare the timings yourself. Doesn't this look like a bug to you?

          > Any 'vi' variant is a *line*-based editor, which presumed a modest
          > line-size for each. Juggling lines back and forth is easy, but heaving
          > huge MB-sized chunks o' text is just obscene. Add to that syntax-based
          > highlighting, multiple colors, etc., and all the processing required for
          > just *1* line adds exponentially to the amount of work involved, let
          > alone cursor motions, etc.

          Well, Vim is an editor. Shouldn't it be able to properly join millions
          of lines, even if that sounds strange? The power of Vim comes from
          the fact that you can do many different manipulations very
          efficiently and that it does not limit you.

          Plus :h limits does not talk about joining only a couple of lines ;)

          > Dunno, but to me, that seems like using a text editor to edit a .jpg or
          > .gif or something, ie, not the right tool for the job, even if, through
          > herculean contortions and torturing the editor's functionality, it *can*
          > be done.

          Exactly. It can. And it might be done by someone.

          > I'd, if anything, edit the file as needed, save it, then use 'sed',
          > 'tr', etc., to post-process it accordingly. No overhead for syntax,
          > colorschemes, etc. Ie, use the right tool for the job.

          Yeah, but sed, tr, awk, perl, $language are not always available. And
          Vim should be able to do it right.

          What was the reason again to add :vimgrep to vim when grep is
          available?

          regards,
          Christian

        • Gene Kwiecinski
          Message 4 of 10, Sep 1, 2009
            >>No disrespect intended, but *why* in B'Harni's Dark Name would you
            >>want to join >10000 lines into 1?!?

            >There might be usecases. Data is growing rapidly today, and I myself
            >had to manage automatically generated text-files of several hundred MB
            >of size. Plus there have occasionally been questions on this list
            >regarding joining lines.

            Even so, something which I can understand, eg, a logfile, should be
            delineated by linebreaks. Raw xml/sgml/etc., should be edited as a
            sequence of modestly-sized lines, then if necessary, joined to a single
            line after saving (or before saving, if you have the time :D ).


            >Well just one simple test:

            >#v+
            >~$ for i in 1 2 4 8 16 32 64 128; do
            > seq 1 $(($i*1000)) >tempfile
            > echo "joining $i kilo lines"
            > time vim -u NONE -N -c ':%join|:q!' tempfile;
            >done
            >#v-

            >and compare the timings yourself. Doesn't this look like a bug to you?

            I have no idea, as I didn't run it yet. Offhand, an exponential
            increase wouldn't be out of the question, ie, e^n.

            Don't forget *physical* limits such as available memory. Once you bang
            your head on that memory-ceiling and start having to swap to disk, all
            bets are off, and processing time can increase by order*s* of magnitude,
            depending how bad it is. Hell, I run into that in *perl*, let alone
            'gvim', when intentionally joining huge files to a single line to c&p
            whole sections of the file! And I'm not even dealing with syntax
            highlighting, colorschemes, and the like.


            >>Any 'vi' variant is a *line*-based editor, which presumed a modest
            >>line-size for each. Juggling lines back and forth is easy, but
            >>heaving huge MB-sized chunks o' text is just obscene. Add to that
            >>syntax-based highlighting, multiple colors, etc., and all the
            >>processing required for just *1* line adds exponentially to the
            >>amount of work involved, let alone cursor motions, etc.

            >Well Vim is an editor. Shouldn't it be able to join properly millions
            >of lines, even if that sounds strange? The power of vim comes from

            Sure, it should be able to be pushed to its limits and do so, but not
            necessarily *efficiently*. Ie, it may hit that aforementioned ceiling
            and then start hitting the disk to do so, and pretty much require you to
            leave it running overnight to go and join a brazillion lines into 1
            Uberline. That's not necessarily a "bug", just an unexpected excursion
            of its performance envelope. The fact that it can create a huge
            Uberline without *crashing* is a testament to the robustness of the
            code. An old version of 'vi' I had would vomit on lines >300chars or
            so.

            Point being, *line*-editors are meant to be used with *lines*, and lines
            of a modest size. The fact that it *can* handle Uberlines is great, but
            you can't expect it to be handled "efficiently". The kind of advice I
            might give would be along the lines of the guy who sees his doctor:

            guy: "Doc, it hurts when I do this."
            doc: "So don't do that."


            >the fact, that you can do many different manipulations very
            >efficiently and does not limit you.

            Absolutely, but again, recall that it's intended to be a *line*-editor.
            Not to appear facetious in repeating that again, but that's what
            'vim'/'gvim' happens to be, a *line*-editor. You yank and put *lines*.
            You add *lines*. You delete *lines*. Hell, syntax highlighting becomes
            downright painful for overly-long lines that people wrote add-ons to
            stop highlighting after N columns! That should be Clue #1 that
            overly-long lines are not "natural" to a *line*-editor.


            >Plus :h limits does not talk about joining only a couple of lines ;)

            Of course not. I can 'ls' a list of filenames into a file, do a ':%j'
            to get them into a single line, then prepend a command to run (with the
            filelist as the list of files to operate on) and make an instant
            batchfile. Works great. But there's a huge difference between a
            batch-/shell-command that's 1000chars long, and a 1-line file with a
            100Mchar Uberline.
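
The filelist workflow described above can also be sketched outside the editor. This is only a rough illustration, assuming a POSIX shell with `paste` available; the function name `make_batch_line` is made up for the example, and filenames containing whitespace would need extra quoting.

```shell
# Hypothetical helper: build a one-line command from a file that lists one
# filename per line (roughly the shell equivalent of ':%j' plus a prepended
# command). Assumes the filenames contain no whitespace.
make_batch_line() {
    cmd=$1
    list=$2
    # paste -s joins all lines of the file serially, separated by a space
    printf '%s %s\n' "$cmd" "$(paste -s -d ' ' "$list")"
}
```

For instance, `ls > filelist` followed by `make_batch_line 'wc -l' filelist > batchfile` would produce a short command line of the kind meant here, not a 100Mchar Uberline.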


            >>Dunno, but to me, that seems like using a text editor to edit a .jpg
            >>or .gif or something, ie, not the right tool for the job, even if,
            >>through herculean contortions and torturing the editor's
            >>functionality, it *can* be done.

            >Exactly. It can. And it might be done by someone.

            And if he has the luxury of letting it run overnight, great. :D


            >>I'd, if anything, edit the file as needed, save it, then use 'sed',
            >>'tr', etc., to post-process it accordingly. No overhead for syntax,
            >>colorschemes, etc. Ie, use the right tool for the job.

            >Yeah, but sed, tr, awk, perl, $language is not always available. And
            >Vim should be able to do it right.

            >What was the reason again to add :vimgrep to vim when grep is
            >available?

            I have no idea, as I don't recall ever using it. <shrug/>


            To reiterate, I *don't* want to appear to be argumentative, but I'm just
            saying that handling Uberlines is something that's *possible* in
            'vim'/'gvim', but don't expect it to be handled "efficiently", not if
            it's well outside the usual performance envelope of file-editing.

          • Christian Brabandt
            Message 5 of 10, Sep 1, 2009
              Hi Gene!

              On Di, 01 Sep 2009, Gene Kwiecinski wrote:

              > To reiterate, I *don't* want to appear to be argumentative, but I'm just
              > saying that handling Uberlines is something that's *possible* in
              > 'vim'/'gvim', but don't expect it to be handled "efficiently", not if
              > it's well outside the usual performance envelope of file-editing.

              Well, then don't use it. Nobody forces you to use the plugin. But
              someone might think it's useful.

              regards,
              Christian

            • sssstefan@gmail.com
              Message 6 of 10, Sep 4, 2009
                On 2009-09-01 16:31 -0400, Gene Kwiecinski wrote:

                > /.../ I'm just saying that handling Uberlines is something that's
                > *possible* in 'vim'/'gvim', but don't expect it to be handled
                > "efficiently", not if it's well outside the usual performance envelope
                > of file-editing.

                On the other hand, if the OP's plugin is actually faster than Vim
                "native", it is probably an indication that the native implementation
                could easily (?) be improved.

                --
                Stefan


              • Christian Brabandt
                Message 7 of 10, Sep 4, 2009
                  Hi sssstefan!

                  On Fr, 04 Sep 2009, sssstefan@... wrote:

                  > On the other hand, if the OP's plugin is actually faster than Vim
                  > "native", it is probably an indication that the native implementation
                  > could easily (?) be improved.

                  It is. Try it out. Or look at the timings given on the script's
                  plugin page or in its help, or at the timings given at:
                  http://article.gmane.org/gmane.editors.vim/80315 (the plugin uses a
                  similar, though slightly more complex, algorithm to make it really
                  behave like :join)

                  regards,
                  Christian
                  --
                  No children may attend school with their breath smelling of "wild onions."
                  [real standing law in West Virginia, United States of America]

                • Tony Mechelynck
                  Message 8 of 10, Sep 9, 2009
                    On 01/09/09 22:31, Gene Kwiecinski wrote:
                    [...]
                    > Absolutely, but again, recall that it's intended to be a *line*-editor.
                    > Not to appear facetious in repeating that again, but that's what
                    > 'vim'/'gvim' happens to be, a *line*-editor. You yank and put *lines*.
                    > You add *lines*. You delete *lines*. Hell, syntax highlighting becomes
                    > downright painful for overly-long lines that people wrote add-ons to
                    > stop highlighting after N columns! That should be Clue #1 that
                    > overly-long lines are not "natural" to a *line*-editor.

                    Not exactly an add-on, it's an option, viz., 'synmaxcol', and a
                    rather recent one at that: it was new in Release 7.0. At first I
                    thought it was a bug ("Hey! I don't get any highlighting past the
                    n-th character on the line!"). I see the present default is 3000,
                    but ISTR it was originally 1000. Bram had to point me to the new option.

                    [...]
                    >> What was the reason again to add :vimgrep to vim when grep is
                    >> available?

                    The ":helpgrep" command appeared first (at patchlevel 6.1.423), to help
                    us find our lost needles in the huge haystack of help text. Once that
                    was there, adding a general-purpose internal grep (in version 7.0)
                    wasn't much additional work, and it was a help for platforms like
                    Windows, where the OS doesn't install an external grep -- sure, there
                    are GnuWin32, unxutils, MinGW, and the like, but you have to fetch one
                    of these grep versions yourself if you want to have one. With
                    ":vimgrep", no need to check whether or not an external grep is
                    available; and, I repeat, from ":helpgrep" to ":vimgrep" it wasn't a big
                    step.

                    Another advantage is that ":vimgrep" is guaranteed to use exactly the
                    same regular expressions as are used everywhere else in Vim, while egrep
                    may use something just subtly different, and certainly not documented in
                    the Vim help.

                    >
                    > I have no idea, as I don't recall ever using it.<shrug/>

                    I did, about as soon as it appeared, with some "snapshot" of Vim 7.0aa
                    alpha.

                    >
                    >
                    > To reiterate, I *don't* want to appear to be argumentative, but I'm just
                    > saying that handling Uberlines is something that's *possible* in
                    > 'vim'/'gvim', but don't expect it to be handled "efficiently", not if
                    > it's well outside the usual performance envelope of file-editing.

                    I agree with that. Searching a file tens of megabytes long for a
                    not-too-complex regexp already takes some measurable time (maybe tens of
                    seconds); if you want to join a gigabazillion lines into one, Vim can do
                    it -- but expect it to take hours or even days rather than seconds, and
                    at, oh, let's say 99.5% CPU time. It's not even a hang -- it's just that
                    you gave it an enormous task to which it is ill-suited, and it's
                    uncomplainingly and patiently grinding at it; when it's finished, it will
                    wait for the next command -- if by then you haven't lost patience and
                    interrupted it or killed it. To efficiently join all lines of a big file,
                    use some program such as tr, which looks at the file characterwise
                    rather than linewise and makes a single pass through it. You can even
                    use it from within Vim, of course (see ":help filter").
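
A minimal sketch of the single-pass approach, assuming a POSIX shell with `tr` and `paste`; the function names are invented for illustration. Neither variant reproduces :join exactly (which also removes the indent of the joined lines): `tr` turns the final newline into a trailing space as well, while `paste -s` keeps a final newline.

```shell
# Replace every newline on stdin with a space, in one pass over the data.
join_with_tr() {
    tr '\n' ' '
}

# Join all lines on stdin with single spaces, keeping a final newline.
join_with_paste() {
    paste -s -d ' ' -
}
```

For example, `seq 1 128000 | join_with_paste > joined`. From within Vim, the same filter could presumably be applied to the whole buffer with `:%!tr '\n' ' '`, subject to the quoting rules of the shell in use.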


                    Best regards,
                    Tony.
                    --
                    There once was a Scot named McAmeter
                    With a tool of prodigious diameter.
                    It was not the size
                    That caused such surprise;
                    'Twas his rhythm -- iambic pentameter.

                  • Christian Brabandt
                    Message 9 of 10, Sep 14, 2009
                      Hi Tony!

                      On Mi, 09 Sep 2009, Tony Mechelynck wrote:


                      > tens of seconds); if you want to join a gigabazillion lines into
                      > one, Vim can do it -- but expect it to take hours or even days
                      > rather than seconds, and at, oh, let's say 99.5% CPU time. It's not
                      > even a hang -- it's just that you gave it an enormous task to which
                      > it is ill-suited, and it's uncomplainingly and patiently grinding at
                      > it; when it's finished, it will wait for the next command -- if by
                      > then you haven't lost patience and interrupted it or killed it. To

                      Tony, please read the documentation provided with the plugin or on the
                      plugin page and try out the test case I have given. And then tell
                      me again that this does not sound like a bug.

                      The mere fact that it is possible to improve the join command using the
                      Vim scripting language is strong evidence that it is indeed a bug.

                      > efficiently join all lines of a big file, use some program such as
                      > tr, which looks at the file characterwise rather than linewise and
                      > makes a single pass through it. You can even use it from within Vim,
                      > of course (see ":help filter").

                      see above and please read the provided documentation.

                      regards,
                      Christian
                      --
                      :wq!

                    • bill lam
                      Message 10 of 10, Sep 14, 2009
                        On Mon, 14 Sep 2009, Christian Brabandt wrote:
                        >
                        > Hi Tony!
                        >
                        > On Mi, 09 Sep 2009, Tony Mechelynck wrote:
                        >
                        >
                        > > tens of seconds); if you want to join a gigabazillion lines into
                        > > one, Vim can do it -- but expect it to take hours or even days
                        > > rather than seconds, and at, oh, let's say 99.5% CPU time. It's not
                        > > even a hang -- it's just that you gave it an enormous task to which
                        > > it is ill-suited, and it's uncomplainingly and patiently grinding at
                        > > it; when it's finished, it will wait for the next command -- if by
                        > > then you haven't lost patience and interrupted it or killed it. To
                        >
                        > Tony, please read the documentation provided with the plugin or on the
                        > plugin page and try out the test case I have given. And then tell
                        > me again that this does not sound like a bug.
                        >
                        > The mere fact that it is possible to improve the join command using the
                        > Vim scripting language is strong evidence that it is indeed a bug.

                        IMO it is not a bug; it just does not scale. It is like computing
                        1+1+1+1 as ((1+1)+1)+1:
                        each operation (+) is independent of the others, and if each one has
                        to copy everything accumulated so far, the whole calculation takes
                        quadratic time. It would be a trade-off between time and space, but
                        IMO allocating 50% more space than needed, so that some of these
                        (:join) operations can be done in place, should improve
                        performance. (untested)
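
The quadratic growth can be made concrete with a rough cost model. This is only a back-of-the-envelope sketch: it assumes that each pairwise join copies the whole accumulated line (an assumption about the implementation, not something checked against Vim's source) and it ignores separators. The function names are made up.

```shell
# Characters copied when n lines of length len are joined pairwise,
# ((l1+l2)+l3)+..., with every step copying the whole accumulated line.
pairwise_copy_cost() {
    n=$1
    len=$2
    total=0
    i=2
    while [ "$i" -le "$n" ]; do
        # after the i-th line is appended, the new line holds i*len characters
        total=$((total + i * len))
        i=$((i + 1))
    done
    echo "$total"
}

# Characters copied when every line is written once into a preallocated buffer.
single_pass_copy_cost() {
    echo $(( $1 * $2 ))
}
```

With 1000 lines of 5 characters each, the pairwise model copies 2502495 characters versus 5000 for a single pass; doubling the line count roughly quadruples the first figure but only doubles the second.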

                        --
                        regards,
                        ====================================================
                        GPG key 1024D/4434BAB3 2008-08-24
                        gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
