Loading ...
Sorry, an error occurred while loading the content.

Re: Some performance figures for the new regexp engine

Expand Messages
  • Bram Moolenaar
    ... Interesting and useful. Since using short text there will be quite a bit of overhead from executing the regexp though. Did you also check for correctness?
    Message 1 of 10 , May 25, 2013
    • 0 Attachment
      Glts wrote:

      > there is simple but straightforward "Benchmark of Regex Libraries" at
      > http://lh3lh3.users.sourceforge.net/reb.shtml
      >
      > I measured search durations for both regexp engines in Vim 7.3.1010,
      > using the same regular expressions and the same data they used. I added
      > one simpler test, which is listed in the final column ("Word") below.
      >
      > Please refer to that web site for more information.
      >
      > URI Email Date Sum3 URI|Email Word
      > re=1 16.34 13.65 4.07 34.06 29.46 0.49
      > re=2 92.03 9.75 4.47 106.25 105.39 5.22
      >
      > Python 2.7.3 2.69 5.17 1.01 8.87 7.72 3.40
      > Perl 5.14.2 0.35 0.33 0.32 1.00 8.12 0.31
      > GNU egrep 2.10 0.21 0.16 0.56 0.93 10.86 0.03
      >
      > The figures (in s) are averages of five runs each, on Linux, 64-bit
      > i7-2700K CPU @ 3.50GHz x 8. (The Python, Perl, and egrep figures are
      > just for comparison with the original benchmark.)
      >
      > I made a gist with the necessary materials here:
      > https://gist.github.com/glts/5646749

      Interesting and useful. Since using short text there will be quite a
      bit of overhead from executing the regexp though.

      Did you also check for correctness? It's easy to make a regexp engine
      that's fast but wrong. And do they possibly run out of memory and
      crash? I rewrote the original Vim regexp engine to be iterative instead
      of recursive to handle that.

      I need some tests to do profiling, I could perhaps use some of this.


      --
      Our job was to build a computer information system for the branch banks. We
      were the perfect people for the job: Dean had seen a computer once, and I had
      heard Dean talk about it.
      (Scott Adams - The Dilbert principle)

      /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
      /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
      \\\ an exciting new programming language -- http://www.Zimbu.org ///
      \\\ help me help AIDS victims -- http://ICCF-Holland.org ///

      --
      --
      You received this message from the "vim_dev" maillist.
      Do not top-post! Type your reply below the text you are replying to.
      For more information, visit http://www.vim.org/maillist.php

      ---
      You received this message because you are subscribed to the Google Groups "vim_dev" group.
      To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
      For more options, visit https://groups.google.com/groups/opt_out.
    • glts
      ... The Vim regexp engines always produced identical results, as did the Perl and Python scripts. There were a few bytes of discrepancies between Vim1/Vim2,
      Message 2 of 10 , May 25, 2013
      • 0 Attachment
        On Sat, May 25, 2013 at 1:13 PM, Bram Moolenaar <Bram@...> wrote:
        > Did you also check for correctness? It's easy to make a regexp engine
        > that's fast but wrong. And do they possibly run out of memory and
        > crash? I rewrote the original Vim regexp engine to be iterative instead
        > of recursive to handle that.

        The Vim regexp engines always produced identical results, as did the
        Perl and Python scripts.

        There were a few bytes of discrepancies between Vim1/Vim2, Perl/Python,
        and egrep, but those were mostly due to invalid UTF-8 bytes in the data
        file. (I had wanted to use the exact same, flawed input data as the
        original benchmark, but of course this isn't necessary.)

        More challenging tests for memory usage etc. would be interesting but I
        have no idea how to go about this.

        > I need some tests to do profiling, I could perhaps use some of this.

        Sure, it's free for the taking.

        --
        --
        You received this message from the "vim_dev" maillist.
        Do not top-post! Type your reply below the text you are replying to.
        For more information, visit http://www.vim.org/maillist.php

        ---
        You received this message because you are subscribed to the Google Groups "vim_dev" group.
        To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
        For more options, visit https://groups.google.com/groups/opt_out.
      • Marc Weber
        If we are at it - and other tools are already faster, is there a chance to also add a engine 3 reusing the fast code? If we partially break with VimL (by
        Message 3 of 10 , May 25, 2013
        • 0 Attachment
          If we are at it - and other tools are already faster, is there a chance
          to also add a engine 3 reusing the fast code?

          If we partially break with VimL (by improving python support) why
          not also add a commonly used regex syntax if speed is different.

          I've tried debugging gnugrep which was said to be faster:
          dfasearch.c contains:
          if ((err = re_compile_pattern (p, len,
          &(patterns[pcount].regexbuf))) != NULL)

          I was not able to step into re_compile_pattern, probably because its
          using glibc's re_compile_pattern.

          The problem is that I don't think you can use those regular expressions
          to search for multiline patterns if can be found in different patterns.

          Eg there is re_search_2 which does not work with an internal "regular
          expression matching state", but concatenates two strings:

          /* Like `re_search', but search in the concatenation of STRING1 and
          STRING2. Also, stop searching at index START + STOP. */
          extern int re_search_2 (struct re_pattern_buffer *__buffer,
          const char *__string1, int __length1,
          const char *__string2, int __length2, int __start,
          int __range, struct re_registers *__regs, int __stop);


          PCRE even started jitting:
          https://github.com/hnakamur/pcre-jit-benchmark
          http://sljit.sourceforge.net/pcre.html

          and it supports multi-segment matching (which I guess would be required
          to implement vim_regexec_multi ?

          Bram, do you think its worth a try adding pcre support?

          It shouldn't be too hard to translate Vim regular expressions to pcre
          syntax before compiling.

          Why should we continue spending time on maintaining this ourselves?

          I haven't looked at all details but I think adding pcre support is
          straight forward.

          Thoughts?

          Marc Weber

          --
          --
          You received this message from the "vim_dev" maillist.
          Do not top-post! Type your reply below the text you are replying to.
          For more information, visit http://www.vim.org/maillist.php

          ---
          You received this message because you are subscribed to the Google Groups "vim_dev" group.
          To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
          For more options, visit https://groups.google.com/groups/opt_out.
        • Bram Moolenaar
          ... Adding another regexp syntax will confuse users. It s too easy to make mistakes. Translating one RE syntax into another works for simple things, it s
          Message 4 of 10 , May 26, 2013
          • 0 Attachment
            Marc Weber wrote:

            > If we are at it - and other tools are already faster, is there a chance
            > to also add a engine 3 reusing the fast code?
            >
            > If we partially break with VimL (by improving python support) why
            > not also add a commonly used regex syntax if speed is different.
            >
            > I've tried debugging gnugrep which was said to be faster:
            > dfasearch.c contains:
            > if ((err = re_compile_pattern (p, len,
            > &(patterns[pcount].regexbuf))) != NULL)
            >
            > I was not able to step into re_compile_pattern, probably because its
            > using glibc's re_compile_pattern.
            >
            > The problem is that I don't think you can use those regular expressions
            > to search for multiline patterns if can be found in different patterns.
            >
            > Eg there is re_search_2 which does not work with an internal "regular
            > expression matching state", but concatenates two strings:
            >
            > /* Like `re_search', but search in the concatenation of STRING1 and
            > STRING2. Also, stop searching at index START + STOP. */
            > extern int re_search_2 (struct re_pattern_buffer *__buffer,
            > const char *__string1, int __length1,
            > const char *__string2, int __length2, int __start,
            > int __range, struct re_registers *__regs, int __stop);
            >
            >
            > PCRE even started jitting:
            > https://github.com/hnakamur/pcre-jit-benchmark
            > http://sljit.sourceforge.net/pcre.html
            >
            > and it supports multi-segment matching (which I guess would be required
            > to implement vim_regexec_multi ?
            >
            > Bram, do you think its worth a try adding pcre support?
            >
            > It shouldn't be too hard to translate Vim regular expressions to pcre
            > syntax before compiling.
            >
            > Why should we continue spending time on maintaining this ourselves?
            >
            > I haven't looked at all details but I think adding pcre support is
            > straight forward.

            Adding another regexp syntax will confuse users. It's too easy to make
            mistakes.

            Translating one RE syntax into another works for simple things, it's
            close to impossible for more complex things. Especially selecting the
            right match out of alternatives is not something you can easily do by
            changing the pattern. Getting it right for the new engine, which was
            specifically made for Vim, already turns out to be very difficult.

            The performance of pattern matching overal is good enough. It's only
            for syntax highlighting that slowness has been reported. And there we
            are using Vim expressions. Considering how slow syntax files are
            updated, I don't expect more than a few to switch to another RE syntax.
            They would more easily gain speed by optimizing the existing patterns.

            --
            The fastest way to get an engineer to solve a problem is to declare that the
            problem is unsolvable. No engineer can walk away from an unsolvable problem
            until it's solved.
            (Scott Adams - The Dilbert principle)

            /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
            /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
            \\\ an exciting new programming language -- http://www.Zimbu.org ///
            \\\ help me help AIDS victims -- http://ICCF-Holland.org ///

            --
            --
            You received this message from the "vim_dev" maillist.
            Do not top-post! Type your reply below the text you are replying to.
            For more information, visit http://www.vim.org/maillist.php

            ---
            You received this message because you are subscribed to the Google Groups "vim_dev" group.
            To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
            For more options, visit https://groups.google.com/groups/opt_out.
          • glts
            I ran this benchmark again for Vim 7.3.1152, and I d like to share the results with you because I think they re impressive. ... Same machine, 142 patches
            Message 5 of 10 , Jun 9, 2013
            • 0 Attachment
              I ran this benchmark again for Vim 7.3.1152, and I'd like to share the
              results with you because I think they're impressive.

              On Saturday, May 25, 2013 12:56:56 AM UTC+2, I wrote:
              > URI Email Date Sum3 URI|Email Word
              > re=1 16.34 13.65 4.07 34.06 29.46 0.49
              > re=2 92.03 9.75 4.47 106.25 105.39 5.22
              >
              > Python 2.7.3 2.69 5.17 1.01 8.87 7.72 3.40
              > Perl 5.14.2 0.35 0.33 0.32 1.00 8.12 0.31
              > GNU egrep 2.10 0.21 0.16 0.56 0.93 10.86 0.03

              Same machine, 142 patches later:

              URI Email Date Sum3 URI|Email Word
              re=1 16.73 13.84 4.08 34.65 29.43 0.49
              re=2 2.94 3.07 1.52 7.53 5.61 2.31

              That's a drastic improvement. Note the results for "URI|Email", where
              Vim outperforms all of Python, Perl, and egrep. Cool!

              --
              --
              You received this message from the "vim_dev" maillist.
              Do not top-post! Type your reply below the text you are replying to.
              For more information, visit http://www.vim.org/maillist.php

              ---
              You received this message because you are subscribed to the Google Groups "vim_dev" group.
              To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
              For more options, visit https://groups.google.com/groups/opt_out.
            • Ben Fritz
              ... It also looks like we re finally at the point where the NFA engine actually is faster, except for trivial matches. I assume word means a single literal
              Message 6 of 10 , Jun 9, 2013
              • 0 Attachment
                On Sunday, June 9, 2013 7:31:36 AM UTC-5, glts wrote:
                >
                > URI Email Date Sum3 URI|Email Word
                >
                > re=1 16.73 13.84 4.08 34.65 29.43 0.49
                >
                > re=2 2.94 3.07 1.52 7.53 5.61 2.31
                >
                >
                >
                > That's a drastic improvement. Note the results for "URI|Email", where
                >
                > Vim outperforms all of Python, Perl, and egrep. Cool!

                It also looks like we're finally at the point where the NFA engine actually is faster, except for trivial matches. I assume "word" means a single literal word?

                --
                --
                You received this message from the "vim_dev" maillist.
                Do not top-post! Type your reply below the text you are replying to.
                For more information, visit http://www.vim.org/maillist.php

                ---
                You received this message because you are subscribed to the Google Groups "vim_dev" group.
                To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
                For more options, visit https://groups.google.com/groups/opt_out.
              Your message has been successfully submitted and would be delivered to recipients shortly.