Loading ...
Sorry, an error occurred while loading the content.
 

Some performance figures for the new regexp engine

Expand Messages
  • glts
    Hi, there is simple but straightforward Benchmark of Regex Libraries at http://lh3lh3.users.sourceforge.net/reb.shtml I measured search durations for both
    Message 1 of 10 , May 24 3:56 PM
      Hi,

      there is simple but straightforward "Benchmark of Regex Libraries" at
      http://lh3lh3.users.sourceforge.net/reb.shtml

      I measured search durations for both regexp engines in Vim 7.3.1010,
      using the same regular expressions and the same data they used. I added
      one simpler test, which is listed in the final column ("Word") below.

      Please refer to that web site for more information.

      URI Email Date Sum3 URI|Email Word
      re=1 16.34 13.65 4.07 34.06 29.46 0.49
      re=2 92.03 9.75 4.47 106.25 105.39 5.22

      Python 2.7.3 2.69 5.17 1.01 8.87 7.72 3.40
      Perl 5.14.2 0.35 0.33 0.32 1.00 8.12 0.31
      GNU egrep 2.10 0.21 0.16 0.56 0.93 10.86 0.03

      The figures (in s) are averages of five runs each, on Linux, 64-bit
      i7-2700K CPU @ 3.50GHz x 8. (The Python, Perl, and egrep figures are
      just for comparison with the original benchmark.)

      I made a gist with the necessary materials here:
      https://gist.github.com/glts/5646749

      Best,

      --
      --
      You received this message from the "vim_dev" maillist.
      Do not top-post! Type your reply below the text you are replying to.
      For more information, visit http://www.vim.org/maillist.php

      ---
      You received this message because you are subscribed to the Google Groups "vim_dev" group.
      To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
      For more options, visit https://groups.google.com/groups/opt_out.
    • Marc Weber
      ... I think its worth mentioning that all the timings above do include startup times of the interpreter? In your gist the only defined command is the vim
      Message 2 of 10 , May 24 4:17 PM
        Excerpts from glts's message of Sat May 25 00:56:56 +0200 2013:
        > Please refer to that web site for more information.

        I think its worth mentioning that all the timings above do include
        startup times of the interpreter?

        In your gist the only defined command is the vim command line, so there
        is no way to reproduce your figures. Thus your test is almost void :(

        How to write a benchmark:
        - try to make the "startup" time neglegible, thus don't run 5 times, run
        1000 times, and do that within vim/python/.... loop so that startup
        only happens once.

        Starting vim takes less than a second, your runs are > 10 sec most of
        the time, so I guess that re=1 vs re=2 is representative for your cases.

        Marc Weber

        --
        --
        You received this message from the "vim_dev" maillist.
        Do not top-post! Type your reply below the text you are replying to.
        For more information, visit http://www.vim.org/maillist.php

        ---
        You received this message because you are subscribed to the Google Groups "vim_dev" group.
        To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
        For more options, visit https://groups.google.com/groups/opt_out.
      • glts
        Thank you for your input. ... I used a ~38M text file, startup time doesn t really enter here. ... perl script.pl pattern /dev/null python
        Message 3 of 10 , May 24 4:43 PM
          Thank you for your input.

          On Sat, May 25, 2013 at 1:17 AM, Marc Weber <marco-oweber@...> wrote:
          > I think its worth mentioning that all the timings above do include
          > startup times of the interpreter?

          I used a ~38M text file, startup time doesn't really enter here.

          > In your gist the only defined command is the vim command line, so there
          > is no way to reproduce your figures.

          perl script.pl 'pattern' </path/to/data >/dev/null
          python script.py 'pattern' </path/to/data >/dev/null
          egrep 'pattern' /path/to/data >/dev/null

          > Starting vim takes less than a second, your runs are > 10 sec most of
          > the time, so I guess that re=1 vs re=2 is representative for your cases.

          Vim startup time is <0.01s.

          --
          --
          You received this message from the "vim_dev" maillist.
          Do not top-post! Type your reply below the text you are replying to.
          For more information, visit http://www.vim.org/maillist.php

          ---
          You received this message because you are subscribed to the Google Groups "vim_dev" group.
          To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
          For more options, visit https://groups.google.com/groups/opt_out.
        • glts
          ... Sorry, it was getting late yesterday. Of course, Vim needs to read the big file into a buffer first. On my machine this takes about 0.34 seconds. -- -- You
          Message 4 of 10 , May 25 12:24 AM
            On Sat, May 25, 2013 at 1:43 AM, glts <676c7473@...> wrote:
            > On Sat, May 25, 2013 at 1:17 AM, Marc Weber <marco-oweber@...> wrote:
            >> Starting vim takes less than a second, your runs are > 10 sec most of
            >> the time, so I guess that re=1 vs re=2 is representative for your cases.
            >
            > Vim startup time is <0.01s.

            Sorry, it was getting late yesterday.

            Of course, Vim needs to read the big file into a buffer first. On my
            machine this takes about 0.34 seconds.

            --
            --
            You received this message from the "vim_dev" maillist.
            Do not top-post! Type your reply below the text you are replying to.
            For more information, visit http://www.vim.org/maillist.php

            ---
            You received this message because you are subscribed to the Google Groups "vim_dev" group.
            To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
            For more options, visit https://groups.google.com/groups/opt_out.
          • Bram Moolenaar
            ... Interesting and useful. Since using short text there will be quite a bit of overhead from executing the regexp though. Did you also check for correctness?
            Message 5 of 10 , May 25 4:13 AM
              Glts wrote:

              > there is simple but straightforward "Benchmark of Regex Libraries" at
              > http://lh3lh3.users.sourceforge.net/reb.shtml
              >
              > I measured search durations for both regexp engines in Vim 7.3.1010,
              > using the same regular expressions and the same data they used. I added
              > one simpler test, which is listed in the final column ("Word") below.
              >
              > Please refer to that web site for more information.
              >
              > URI Email Date Sum3 URI|Email Word
              > re=1 16.34 13.65 4.07 34.06 29.46 0.49
              > re=2 92.03 9.75 4.47 106.25 105.39 5.22
              >
              > Python 2.7.3 2.69 5.17 1.01 8.87 7.72 3.40
              > Perl 5.14.2 0.35 0.33 0.32 1.00 8.12 0.31
              > GNU egrep 2.10 0.21 0.16 0.56 0.93 10.86 0.03
              >
              > The figures (in s) are averages of five runs each, on Linux, 64-bit
              > i7-2700K CPU @ 3.50GHz x 8. (The Python, Perl, and egrep figures are
              > just for comparison with the original benchmark.)
              >
              > I made a gist with the necessary materials here:
              > https://gist.github.com/glts/5646749

              Interesting and useful. Since using short text there will be quite a
              bit of overhead from executing the regexp though.

              Did you also check for correctness? It's easy to make a regexp engine
              that's fast but wrong. And do they possibly run out of memory and
              crash? I rewrote the original Vim regexp engine to be iterative instead
              of recursive to handle that.

              I need some tests to do profiling, I could perhaps use some of this.


              --
              Our job was to build a computer information system for the branch banks. We
              were the perfect people for the job: Dean had seen a computer once, and I had
              heard Dean talk about it.
              (Scott Adams - The Dilbert principle)

              /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
              /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
              \\\ an exciting new programming language -- http://www.Zimbu.org ///
              \\\ help me help AIDS victims -- http://ICCF-Holland.org ///

              --
              --
              You received this message from the "vim_dev" maillist.
              Do not top-post! Type your reply below the text you are replying to.
              For more information, visit http://www.vim.org/maillist.php

              ---
              You received this message because you are subscribed to the Google Groups "vim_dev" group.
              To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
              For more options, visit https://groups.google.com/groups/opt_out.
            • glts
              ... The Vim regexp engines always produced identical results, as did the Perl and Python scripts. There were a few bytes of discrepancies between Vim1/Vim2,
              Message 6 of 10 , May 25 11:18 AM
                On Sat, May 25, 2013 at 1:13 PM, Bram Moolenaar <Bram@...> wrote:
                > Did you also check for correctness? It's easy to make a regexp engine
                > that's fast but wrong. And do they possibly run out of memory and
                > crash? I rewrote the original Vim regexp engine to be iterative instead
                > of recursive to handle that.

                The Vim regexp engines always produced identical results, as did the
                Perl and Python scripts.

                There were a few bytes of discrepancies between Vim1/Vim2, Perl/Python,
                and egrep, but those were mostly due to invalid UTF-8 bytes in the data
                file. (I had wanted to use the exact same, flawed input data as the
                original benchmark, but of course this isn't necessary.)

                More challenging tests for memory usage etc. would be interesting but I
                have no idea how to go about this.

                > I need some tests to do profiling, I could perhaps use some of this.

                Sure, it's free for the taking.

                --
                --
                You received this message from the "vim_dev" maillist.
                Do not top-post! Type your reply below the text you are replying to.
                For more information, visit http://www.vim.org/maillist.php

                ---
                You received this message because you are subscribed to the Google Groups "vim_dev" group.
                To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
                For more options, visit https://groups.google.com/groups/opt_out.
              • Marc Weber
                If we are at it - and other tools are already faster, is there a chance to also add a engine 3 reusing the fast code? If we partially break with VimL (by
                Message 7 of 10 , May 25 6:46 PM
                  If we are at it - and other tools are already faster, is there a chance
                  to also add a engine 3 reusing the fast code?

                  If we partially break with VimL (by improving python support) why
                  not also add a commonly used regex syntax if speed is different.

                  I've tried debugging gnugrep which was said to be faster:
                  dfasearch.c contains:
                  if ((err = re_compile_pattern (p, len,
                  &(patterns[pcount].regexbuf))) != NULL)

                  I was not able to step into re_compile_pattern, probably because its
                  using glibc's re_compile_pattern.

                  The problem is that I don't think you can use those regular expressions
                  to search for multiline patterns if can be found in different patterns.

                  Eg there is re_search_2 which does not work with an internal "regular
                  expression matching state", but concatenates two strings:

                  /* Like `re_search', but search in the concatenation of STRING1 and
                  STRING2. Also, stop searching at index START + STOP. */
                  extern int re_search_2 (struct re_pattern_buffer *__buffer,
                  const char *__string1, int __length1,
                  const char *__string2, int __length2, int __start,
                  int __range, struct re_registers *__regs, int __stop);


                  PCRE even started jitting:
                  https://github.com/hnakamur/pcre-jit-benchmark
                  http://sljit.sourceforge.net/pcre.html

                  and it supports multi-segment matching (which I guess would be required
                  to implement vim_regexec_multi ?

                  Bram, do you think its worth a try adding pcre support?

                  It shouldn't be too hard to translate Vim regular expressions to pcre
                  syntax before compiling.

                  Why should we continue spending time on maintaining this ourselves?

                  I haven't looked at all details but I think adding pcre support is
                  straight forward.

                  Thoughts?

                  Marc Weber

                  --
                  --
                  You received this message from the "vim_dev" maillist.
                  Do not top-post! Type your reply below the text you are replying to.
                  For more information, visit http://www.vim.org/maillist.php

                  ---
                  You received this message because you are subscribed to the Google Groups "vim_dev" group.
                  To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
                  For more options, visit https://groups.google.com/groups/opt_out.
                • Bram Moolenaar
                  ... Adding another regexp syntax will confuse users. It s too easy to make mistakes. Translating one RE syntax into another works for simple things, it s
                  Message 8 of 10 , May 26 4:12 AM
                    Marc Weber wrote:

                    > If we are at it - and other tools are already faster, is there a chance
                    > to also add a engine 3 reusing the fast code?
                    >
                    > If we partially break with VimL (by improving python support) why
                    > not also add a commonly used regex syntax if speed is different.
                    >
                    > I've tried debugging gnugrep which was said to be faster:
                    > dfasearch.c contains:
                    > if ((err = re_compile_pattern (p, len,
                    > &(patterns[pcount].regexbuf))) != NULL)
                    >
                    > I was not able to step into re_compile_pattern, probably because its
                    > using glibc's re_compile_pattern.
                    >
                    > The problem is that I don't think you can use those regular expressions
                    > to search for multiline patterns if can be found in different patterns.
                    >
                    > Eg there is re_search_2 which does not work with an internal "regular
                    > expression matching state", but concatenates two strings:
                    >
                    > /* Like `re_search', but search in the concatenation of STRING1 and
                    > STRING2. Also, stop searching at index START + STOP. */
                    > extern int re_search_2 (struct re_pattern_buffer *__buffer,
                    > const char *__string1, int __length1,
                    > const char *__string2, int __length2, int __start,
                    > int __range, struct re_registers *__regs, int __stop);
                    >
                    >
                    > PCRE even started jitting:
                    > https://github.com/hnakamur/pcre-jit-benchmark
                    > http://sljit.sourceforge.net/pcre.html
                    >
                    > and it supports multi-segment matching (which I guess would be required
                    > to implement vim_regexec_multi ?
                    >
                    > Bram, do you think its worth a try adding pcre support?
                    >
                    > It shouldn't be too hard to translate Vim regular expressions to pcre
                    > syntax before compiling.
                    >
                    > Why should we continue spending time on maintaining this ourselves?
                    >
                    > I haven't looked at all details but I think adding pcre support is
                    > straight forward.

                    Adding another regexp syntax will confuse users. It's too easy to make
                    mistakes.

                    Translating one RE syntax into another works for simple things, it's
                    close to impossible for more complex things. Especially selecting the
                    right match out of alternatives is not something you can easily do by
                    changing the pattern. Getting it right for the new engine, which was
                    specifically made for Vim, already turns out to be very difficult.

                    The performance of pattern matching overal is good enough. It's only
                    for syntax highlighting that slowness has been reported. And there we
                    are using Vim expressions. Considering how slow syntax files are
                    updated, I don't expect more than a few to switch to another RE syntax.
                    They would more easily gain speed by optimizing the existing patterns.

                    --
                    The fastest way to get an engineer to solve a problem is to declare that the
                    problem is unsolvable. No engineer can walk away from an unsolvable problem
                    until it's solved.
                    (Scott Adams - The Dilbert principle)

                    /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
                    /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
                    \\\ an exciting new programming language -- http://www.Zimbu.org ///
                    \\\ help me help AIDS victims -- http://ICCF-Holland.org ///

                    --
                    --
                    You received this message from the "vim_dev" maillist.
                    Do not top-post! Type your reply below the text you are replying to.
                    For more information, visit http://www.vim.org/maillist.php

                    ---
                    You received this message because you are subscribed to the Google Groups "vim_dev" group.
                    To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
                    For more options, visit https://groups.google.com/groups/opt_out.
                  • glts
                    I ran this benchmark again for Vim 7.3.1152, and I d like to share the results with you because I think they re impressive. ... Same machine, 142 patches
                    Message 9 of 10 , Jun 9, 2013
                      I ran this benchmark again for Vim 7.3.1152, and I'd like to share the
                      results with you because I think they're impressive.

                      On Saturday, May 25, 2013 12:56:56 AM UTC+2, I wrote:
                      > URI Email Date Sum3 URI|Email Word
                      > re=1 16.34 13.65 4.07 34.06 29.46 0.49
                      > re=2 92.03 9.75 4.47 106.25 105.39 5.22
                      >
                      > Python 2.7.3 2.69 5.17 1.01 8.87 7.72 3.40
                      > Perl 5.14.2 0.35 0.33 0.32 1.00 8.12 0.31
                      > GNU egrep 2.10 0.21 0.16 0.56 0.93 10.86 0.03

                      Same machine, 142 patches later:

                      URI Email Date Sum3 URI|Email Word
                      re=1 16.73 13.84 4.08 34.65 29.43 0.49
                      re=2 2.94 3.07 1.52 7.53 5.61 2.31

                      That's a drastic improvement. Note the results for "URI|Email", where
                      Vim outperforms all of Python, Perl, and egrep. Cool!

                      --
                      --
                      You received this message from the "vim_dev" maillist.
                      Do not top-post! Type your reply below the text you are replying to.
                      For more information, visit http://www.vim.org/maillist.php

                      ---
                      You received this message because you are subscribed to the Google Groups "vim_dev" group.
                      To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
                      For more options, visit https://groups.google.com/groups/opt_out.
                    • Ben Fritz
                      ... It also looks like we re finally at the point where the NFA engine actually is faster, except for trivial matches. I assume word means a single literal
                      Message 10 of 10 , Jun 9, 2013
                        On Sunday, June 9, 2013 7:31:36 AM UTC-5, glts wrote:
                        >
                        > URI Email Date Sum3 URI|Email Word
                        >
                        > re=1 16.73 13.84 4.08 34.65 29.43 0.49
                        >
                        > re=2 2.94 3.07 1.52 7.53 5.61 2.31
                        >
                        >
                        >
                        > That's a drastic improvement. Note the results for "URI|Email", where
                        >
                        > Vim outperforms all of Python, Perl, and egrep. Cool!

                        It also looks like we're finally at the point where the NFA engine actually is faster, except for trivial matches. I assume "word" means a single literal word?

                        --
                        --
                        You received this message from the "vim_dev" maillist.
                        Do not top-post! Type your reply below the text you are replying to.
                        For more information, visit http://www.vim.org/maillist.php

                        ---
                        You received this message because you are subscribed to the Google Groups "vim_dev" group.
                        To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
                        For more options, visit https://groups.google.com/groups/opt_out.
                      Your message has been successfully submitted and would be delivered to recipients shortly.