Loading ...
Sorry, an error occurred while loading the content.

Re: Some performance figures for the new regexp engine

Expand Messages
  • glts
    ... Sorry, it was getting late yesterday. Of course, Vim needs to read the big file into a buffer first. On my machine this takes about 0.34 seconds. -- -- You
    Message 1 of 10 , May 25, 2013
    • 0 Attachment
      On Sat, May 25, 2013 at 1:43 AM, glts <676c7473@...> wrote:
      > On Sat, May 25, 2013 at 1:17 AM, Marc Weber <marco-oweber@...> wrote:
      >> Starting vim takes less than a second, your runs are > 10 sec most of
      >> the time, so I guess that re=1 vs re=2 is representative for your cases.
      >
      > Vim startup time is <0.01s.

      Sorry, it was getting late yesterday.

      Of course, Vim needs to read the big file into a buffer first. On my
      machine this takes about 0.34 seconds.

      --
      --
      You received this message from the "vim_dev" maillist.
      Do not top-post! Type your reply below the text you are replying to.
      For more information, visit http://www.vim.org/maillist.php

      ---
      You received this message because you are subscribed to the Google Groups "vim_dev" group.
      To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
      For more options, visit https://groups.google.com/groups/opt_out.
    • Bram Moolenaar
      ... Interesting and useful. Since using short text there will be quite a bit of overhead from executing the regexp though. Did you also check for correctness?
      Message 2 of 10 , May 25, 2013
      • 0 Attachment
        Glts wrote:

        > there is simple but straightforward "Benchmark of Regex Libraries" at
        > http://lh3lh3.users.sourceforge.net/reb.shtml
        >
        > I measured search durations for both regexp engines in Vim 7.3.1010,
        > using the same regular expressions and the same data they used. I added
        > one simpler test, which is listed in the final column ("Word") below.
        >
        > Please refer to that web site for more information.
        >
        > URI Email Date Sum3 URI|Email Word
        > re=1 16.34 13.65 4.07 34.06 29.46 0.49
        > re=2 92.03 9.75 4.47 106.25 105.39 5.22
        >
        > Python 2.7.3 2.69 5.17 1.01 8.87 7.72 3.40
        > Perl 5.14.2 0.35 0.33 0.32 1.00 8.12 0.31
        > GNU egrep 2.10 0.21 0.16 0.56 0.93 10.86 0.03
        >
        > The figures (in s) are averages of five runs each, on Linux, 64-bit
        > i7-2700K CPU @ 3.50GHz x 8. (The Python, Perl, and egrep figures are
        > just for comparison with the original benchmark.)
        >
        > I made a gist with the necessary materials here:
        > https://gist.github.com/glts/5646749

        Interesting and useful. Since using short text there will be quite a
        bit of overhead from executing the regexp though.

        Did you also check for correctness? It's easy to make a regexp engine
        that's fast but wrong. And do they possibly run out of memory and
        crash? I rewrote the original Vim regexp engine to be iterative instead
        of recursive to handle that.

        I need some tests to do profiling, I could perhaps use some of this.


        --
        Our job was to build a computer information system for the branch banks. We
        were the perfect people for the job: Dean had seen a computer once, and I had
        heard Dean talk about it.
        (Scott Adams - The Dilbert principle)

        /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
        /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
        \\\ an exciting new programming language -- http://www.Zimbu.org ///
        \\\ help me help AIDS victims -- http://ICCF-Holland.org ///

        --
        --
        You received this message from the "vim_dev" maillist.
        Do not top-post! Type your reply below the text you are replying to.
        For more information, visit http://www.vim.org/maillist.php

        ---
        You received this message because you are subscribed to the Google Groups "vim_dev" group.
        To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
        For more options, visit https://groups.google.com/groups/opt_out.
      • glts
        ... The Vim regexp engines always produced identical results, as did the Perl and Python scripts. There were a few bytes of discrepancies between Vim1/Vim2,
        Message 3 of 10 , May 25, 2013
        • 0 Attachment
          On Sat, May 25, 2013 at 1:13 PM, Bram Moolenaar <Bram@...> wrote:
          > Did you also check for correctness? It's easy to make a regexp engine
          > that's fast but wrong. And do they possibly run out of memory and
          > crash? I rewrote the original Vim regexp engine to be iterative instead
          > of recursive to handle that.

          The Vim regexp engines always produced identical results, as did the
          Perl and Python scripts.

          There were a few bytes of discrepancies between Vim1/Vim2, Perl/Python,
          and egrep, but those were mostly due to invalid UTF-8 bytes in the data
          file. (I had wanted to use the exact same, flawed input data as the
          original benchmark, but of course this isn't necessary.)

          More challenging tests for memory usage etc. would be interesting but I
          have no idea how to go about this.

          > I need some tests to do profiling, I could perhaps use some of this.

          Sure, it's free for the taking.

          --
          --
          You received this message from the "vim_dev" maillist.
          Do not top-post! Type your reply below the text you are replying to.
          For more information, visit http://www.vim.org/maillist.php

          ---
          You received this message because you are subscribed to the Google Groups "vim_dev" group.
          To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
          For more options, visit https://groups.google.com/groups/opt_out.
        • Marc Weber
          If we are at it - and other tools are already faster, is there a chance to also add a engine 3 reusing the fast code? If we partially break with VimL (by
          Message 4 of 10 , May 25, 2013
          • 0 Attachment
            If we are at it - and other tools are already faster, is there a chance
            to also add a engine 3 reusing the fast code?

            If we partially break with VimL (by improving python support) why
            not also add a commonly used regex syntax if speed is different.

            I've tried debugging gnugrep which was said to be faster:
            dfasearch.c contains:
            if ((err = re_compile_pattern (p, len,
            &(patterns[pcount].regexbuf))) != NULL)

            I was not able to step into re_compile_pattern, probably because its
            using glibc's re_compile_pattern.

            The problem is that I don't think you can use those regular expressions
            to search for multiline patterns if can be found in different patterns.

            Eg there is re_search_2 which does not work with an internal "regular
            expression matching state", but concatenates two strings:

            /* Like `re_search', but search in the concatenation of STRING1 and
            STRING2. Also, stop searching at index START + STOP. */
            extern int re_search_2 (struct re_pattern_buffer *__buffer,
            const char *__string1, int __length1,
            const char *__string2, int __length2, int __start,
            int __range, struct re_registers *__regs, int __stop);


            PCRE even started jitting:
            https://github.com/hnakamur/pcre-jit-benchmark
            http://sljit.sourceforge.net/pcre.html

            and it supports multi-segment matching (which I guess would be required
            to implement vim_regexec_multi ?

            Bram, do you think its worth a try adding pcre support?

            It shouldn't be too hard to translate Vim regular expressions to pcre
            syntax before compiling.

            Why should we continue spending time on maintaining this ourselves?

            I haven't looked at all details but I think adding pcre support is
            straight forward.

            Thoughts?

            Marc Weber

            --
            --
            You received this message from the "vim_dev" maillist.
            Do not top-post! Type your reply below the text you are replying to.
            For more information, visit http://www.vim.org/maillist.php

            ---
            You received this message because you are subscribed to the Google Groups "vim_dev" group.
            To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
            For more options, visit https://groups.google.com/groups/opt_out.
          • Bram Moolenaar
            ... Adding another regexp syntax will confuse users. It s too easy to make mistakes. Translating one RE syntax into another works for simple things, it s
            Message 5 of 10 , May 26, 2013
            • 0 Attachment
              Marc Weber wrote:

              > If we are at it - and other tools are already faster, is there a chance
              > to also add a engine 3 reusing the fast code?
              >
              > If we partially break with VimL (by improving python support) why
              > not also add a commonly used regex syntax if speed is different.
              >
              > I've tried debugging gnugrep which was said to be faster:
              > dfasearch.c contains:
              > if ((err = re_compile_pattern (p, len,
              > &(patterns[pcount].regexbuf))) != NULL)
              >
              > I was not able to step into re_compile_pattern, probably because its
              > using glibc's re_compile_pattern.
              >
              > The problem is that I don't think you can use those regular expressions
              > to search for multiline patterns if can be found in different patterns.
              >
              > Eg there is re_search_2 which does not work with an internal "regular
              > expression matching state", but concatenates two strings:
              >
              > /* Like `re_search', but search in the concatenation of STRING1 and
              > STRING2. Also, stop searching at index START + STOP. */
              > extern int re_search_2 (struct re_pattern_buffer *__buffer,
              > const char *__string1, int __length1,
              > const char *__string2, int __length2, int __start,
              > int __range, struct re_registers *__regs, int __stop);
              >
              >
              > PCRE even started jitting:
              > https://github.com/hnakamur/pcre-jit-benchmark
              > http://sljit.sourceforge.net/pcre.html
              >
              > and it supports multi-segment matching (which I guess would be required
              > to implement vim_regexec_multi ?
              >
              > Bram, do you think its worth a try adding pcre support?
              >
              > It shouldn't be too hard to translate Vim regular expressions to pcre
              > syntax before compiling.
              >
              > Why should we continue spending time on maintaining this ourselves?
              >
              > I haven't looked at all details but I think adding pcre support is
              > straight forward.

              Adding another regexp syntax will confuse users. It's too easy to make
              mistakes.

              Translating one RE syntax into another works for simple things, it's
              close to impossible for more complex things. Especially selecting the
              right match out of alternatives is not something you can easily do by
              changing the pattern. Getting it right for the new engine, which was
              specifically made for Vim, already turns out to be very difficult.

              The performance of pattern matching overal is good enough. It's only
              for syntax highlighting that slowness has been reported. And there we
              are using Vim expressions. Considering how slow syntax files are
              updated, I don't expect more than a few to switch to another RE syntax.
              They would more easily gain speed by optimizing the existing patterns.

              --
              The fastest way to get an engineer to solve a problem is to declare that the
              problem is unsolvable. No engineer can walk away from an unsolvable problem
              until it's solved.
              (Scott Adams - The Dilbert principle)

              /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
              /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
              \\\ an exciting new programming language -- http://www.Zimbu.org ///
              \\\ help me help AIDS victims -- http://ICCF-Holland.org ///

              --
              --
              You received this message from the "vim_dev" maillist.
              Do not top-post! Type your reply below the text you are replying to.
              For more information, visit http://www.vim.org/maillist.php

              ---
              You received this message because you are subscribed to the Google Groups "vim_dev" group.
              To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
              For more options, visit https://groups.google.com/groups/opt_out.
            • glts
              I ran this benchmark again for Vim 7.3.1152, and I d like to share the results with you because I think they re impressive. ... Same machine, 142 patches
              Message 6 of 10 , Jun 9, 2013
              • 0 Attachment
                I ran this benchmark again for Vim 7.3.1152, and I'd like to share the
                results with you because I think they're impressive.

                On Saturday, May 25, 2013 12:56:56 AM UTC+2, I wrote:
                > URI Email Date Sum3 URI|Email Word
                > re=1 16.34 13.65 4.07 34.06 29.46 0.49
                > re=2 92.03 9.75 4.47 106.25 105.39 5.22
                >
                > Python 2.7.3 2.69 5.17 1.01 8.87 7.72 3.40
                > Perl 5.14.2 0.35 0.33 0.32 1.00 8.12 0.31
                > GNU egrep 2.10 0.21 0.16 0.56 0.93 10.86 0.03

                Same machine, 142 patches later:

                URI Email Date Sum3 URI|Email Word
                re=1 16.73 13.84 4.08 34.65 29.43 0.49
                re=2 2.94 3.07 1.52 7.53 5.61 2.31

                That's a drastic improvement. Note the results for "URI|Email", where
                Vim outperforms all of Python, Perl, and egrep. Cool!

                --
                --
                You received this message from the "vim_dev" maillist.
                Do not top-post! Type your reply below the text you are replying to.
                For more information, visit http://www.vim.org/maillist.php

                ---
                You received this message because you are subscribed to the Google Groups "vim_dev" group.
                To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
                For more options, visit https://groups.google.com/groups/opt_out.
              • Ben Fritz
                ... It also looks like we re finally at the point where the NFA engine actually is faster, except for trivial matches. I assume word means a single literal
                Message 7 of 10 , Jun 9, 2013
                • 0 Attachment
                  On Sunday, June 9, 2013 7:31:36 AM UTC-5, glts wrote:
                  >
                  > URI Email Date Sum3 URI|Email Word
                  >
                  > re=1 16.73 13.84 4.08 34.65 29.43 0.49
                  >
                  > re=2 2.94 3.07 1.52 7.53 5.61 2.31
                  >
                  >
                  >
                  > That's a drastic improvement. Note the results for "URI|Email", where
                  >
                  > Vim outperforms all of Python, Perl, and egrep. Cool!

                  It also looks like we're finally at the point where the NFA engine actually is faster, except for trivial matches. I assume "word" means a single literal word?

                  --
                  --
                  You received this message from the "vim_dev" maillist.
                  Do not top-post! Type your reply below the text you are replying to.
                  For more information, visit http://www.vim.org/maillist.php

                  ---
                  You received this message because you are subscribed to the Google Groups "vim_dev" group.
                  To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
                  For more options, visit https://groups.google.com/groups/opt_out.
                Your message has been successfully submitted and would be delivered to recipients shortly.