Loading ...
Sorry, an error occurred while loading the content.

Re: (patch) new regexp

Expand Messages
  • Bram Moolenaar
    ... Thank you very much for continuing the work of last summer. It s great to see that you volunteered to continue work on this. And a very good start for a
    Message 1 of 15 , Apr 1 4:06 AM
    • 0 Attachment
      Xiaozhou Liu wrote:

      > The attached patch represents the current status of my work on the new
      > NFA based regexp engine, from last summer's GSoC project, mentored by
      > Russ Cox. I have been using this new engine myself for quite some time
      > now, so I feel it's ready for a wider review audience.
      >
      > Currently the implemented patterns are:
      > \|
      > \( \)
      > *
      > \+
      > ^
      > $
      > \<
      > \>
      >
      > The new engine will fallback to the old engine if encounters an
      > unsupported pattern.
      >
      > I have incorporated Russ' code on ordered alternation, so that behavior
      > has been inherited. Multibyte is supported, with the exception of
      > composing characters, which will be high-priority next stage work.
      >
      > The new engine passes all test cases except test44 (composing chars).
      > When run on test cases it will fallback to old engine on 55% of all
      > invocations. In my daily use however the NFA engine accounts for about
      > 80% of all regexp invocations.
      >
      > Please review the code, any comments and feedback is welcomed.
      > The code is in still very early alpha stage, of course, but if you are
      > brave enough, I encourage you to give it a try ;-). You will find that
      > pathological cases
      > (e.g. matching "\v(a+)+b" against "aaaaaaaaaaaaaaaaaaaaaaaaaaa")
      > are now gone. There is also a real-life javascript file you can find at
      > http://extjs.com/deploy/dev/adapter/ext/ext-base.js which can be used to
      > compare the performance differences on syntax highlighting between the
      > old and new engines.

      Thank you very much for continuing the work of last summer. It's great
      to see that you volunteered to continue work on this. And a very good
      start for a student to continue with it this year. It's now very likely
      we will really get to the point where this can be included in Vim.

      --
      panic("Foooooooood fight!");
      -- In the kernel source aha1542.c, after detecting a bad segment list

      /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
      /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
      \\\ download, build and distribute -- http://www.A-A-P.org ///
      \\\ help me help AIDS victims -- http://ICCF-Holland.org ///

      --~--~---------~--~----~------------~-------~--~----~
      You received this message from the "vim_dev" maillist.
      For more information, visit http://www.vim.org/maillist.php
      -~----------~----~----~----~------~----~------~--~---
    • char101
      ... Hi, The patch does not work with Visual C++ 9.0 2008. The main problem is this macros +#define PUSH(s) { + if (stackp = stack_end) +
      Message 2 of 15 , Apr 2 11:41 PM
      • 0 Attachment
        On Apr 1, 3:53 pm, "Xiaozhou Liu" <sbb...@...> wrote:
        > The attached patch represents the current status of my work on the new
        > NFA based regexp engine, from last summer's GSoC project, mentored by
        > Russ Cox. I have been using this new engine myself for quite some time
        > now, so I feel it's ready for a wider review audience.

        Hi,

        The patch does not work with Visual C++ 9.0 2008. The main problem is
        this macros

        +#define PUSH(s) { \
        + if (stackp >= stack_end)\
        + return NULL; \
        + *stackp++ = s; \
        + }
        +
        +#define POP() ({ \
        + if (stackp <= stack) \
        + return NULL; \
        + *--stackp; \
        + })

        But even though I have changed those macros to

        #define PUSH(s) if (stackp >= stack_end) return NULL; *stackp =
        s; ++stackp;
        #define POP(var) if (stackp <= stack) return NULL; --stackp; var
        = *stackp;

        the resulting gvim executable seems to be losing its regex capability,
        for example typing abc and then searching for a does not work.

        Gvim source used : 7.1.291
        Features : both with mbyte and without mbyte

        ---
        Charles
        --~--~---------~--~----~------------~-------~--~----~
        You received this message from the "vim_dev" maillist.
        For more information, visit http://www.vim.org/maillist.php
        -~----------~----~----~----~------~----~------~--~---
      • Jürgen Krämer
        Hi, ... I don t see a difference in meaning between *stackp++ = s; and *stackp = s; ++stackp; and I don t know anything about the context or usage of the
        Message 3 of 15 , Apr 3 12:04 AM
        • 0 Attachment
          Hi,

          char101 wrote:
          > On Apr 1, 3:53 pm, "Xiaozhou Liu" <sbb...@...> wrote:
          >> The attached patch represents the current status of my work on the new
          >> NFA based regexp engine, from last summer's GSoC project, mentored by
          >> Russ Cox. I have been using this new engine myself for quite some time
          >> now, so I feel it's ready for a wider review audience.
          >
          > The patch does not work with Visual C++ 9.0 2008. The main problem is
          > this macros
          >
          > +#define PUSH(s) { \
          > + if (stackp >= stack_end)\
          > + return NULL; \
          > + *stackp++ = s; \
          > + }
          > +
          > +#define POP() ({ \
          > + if (stackp <= stack) \
          > + return NULL; \
          > + *--stackp; \
          > + })
          >
          > But even though I have changed those macros to
          >
          > #define PUSH(s) if (stackp >= stack_end) return NULL; *stackp =
          > s; ++stackp;
          > #define POP(var) if (stackp <= stack) return NULL; --stackp; var
          > = *stackp;
          >
          > the resulting gvim executable seems to be losing its regex capability,
          > for example typing abc and then searching for a does not work.

          I don't see a difference in meaning between "*stackp++ = s;" and
          "*stackp = s; ++stackp;" and I don't know anything about the context or
          usage of the original patch, but your change might break existing code,
          e.g.,

          if ( test )
          PUSH(anything)

          will expand differently because of the missing braces.

          Actually, I would write macros that expand to multiple statements as

          #define PUSH(s) do { \
          if (stackp >= stack_end)\
          return NULL; \
          *stackp++ = s; \
          } while (0)

          #define POP() do { \
          if (stackp <= stack) \
          return NULL; \
          *--stackp; \
          } while (0)

          so they can always be used like in

          if ( test )
          PUSH(anything);

          (note the additional semicolon), which is more in accordance with the
          normal C syntax.

          Regards,
          Jürgen

          --
          Sometimes I think the surest sign that intelligent life exists elsewhere
          in the universe is that none of it has tried to contact us. (Calvin)

          --~--~---------~--~----~------------~-------~--~----~
          You received this message from the "vim_dev" maillist.
          For more information, visit http://www.vim.org/maillist.php
          -~----------~----~----~----~------~----~------~--~---
        • char101
          ... You re right, I just split them out for clarity since I m not a C programmer. ... No, I have checked them out, the macroes are defined in a function block
          Message 4 of 15 , Apr 3 12:23 AM
          • 0 Attachment
            On Apr 3, 2:04 pm, Jürgen Krämer <jottka...@...> wrote:
            > I don't see a difference in meaning between "*stackp++ = s;" and
            > "*stackp = s; ++stackp;"

            You're right, I just split them out for clarity since I'm not a C
            programmer.

            > and I don't know anything about the context or
            > usage of the original patch, but your change might break existing code,
            > e.g.,

            No, I have checked them out, the macroes are defined in a function
            block and then undefined immediately before the functions ends. And
            these macros are not used after any if() for() while(), etc.

            >   #define PUSH(s)       do {                        \
            >                     if (stackp >= stack_end)\
            >                         return NULL;        \
            >                      *stackp++ = s;         \
            >                 } while (0)
            >
            >   #define POP()         do {                        \
            >                     if (stackp <= stack)    \
            >                         return NULL;        \
            >                     *--stackp;              \
            >                  } while (0)

            For PUSH it will be fine, but for POP I don't think that it'd work
            since it is used as an expression:

            e = POP();

            ---
            Charles
            --~--~---------~--~----~------------~-------~--~----~
            You received this message from the "vim_dev" maillist.
            For more information, visit http://www.vim.org/maillist.php
            -~----------~----~----~----~------~----~------~--~---
          • Nico Weber
            Hi, ... ({ ... }) is a gcc extension that converts statements to expressions. The expression has the value of the last statement in the block (
            Message 5 of 15 , Apr 3 5:18 AM
            • 0 Attachment
              Hi,

              > The patch does not work with Visual C++ 9.0 2008. The main problem is
              > this macros
              >
              > +#define PUSH(s) { \
              > + if (stackp >= stack_end)\
              > + return NULL; \
              > + *stackp++ = s; \
              > + }
              > +
              > +#define POP() ({ \
              > + if (stackp <= stack) \
              > + return NULL; \
              > + *--stackp; \
              > + })
              >
              > But even though I have changed those macros to
              >
              > #define PUSH(s) if (stackp >= stack_end) return NULL; *stackp =
              > s; ++stackp;
              > #define POP(var) if (stackp <= stack) return NULL; --stackp; var
              > = *stackp;
              >
              > the resulting gvim executable seems to be losing its regex capability,
              > for example typing abc and then searching for a does not work.

              ({ ... }) is a gcc extension that converts statements to expressions.
              The expression has the value of the last statement in the block ( http://gcc.gnu.org/onlinedocs/gcc-2.95.3/gcc_4.html#SEC62
              ). That is, the macro can be used as

              a = PUSH(s);

              or in other places where an expression is expected. If you convert the
              macro to a function, it should work on all compilers (and I guess it
              won't even be slower since the compiler will inline the function call
              automatically).

              Nico

              --~--~---------~--~----~------------~-------~--~----~
              You received this message from the "vim_dev" maillist.
              For more information, visit http://www.vim.org/maillist.php
              -~----------~----~----~----~------~----~------~--~---
            • Antony Scriven
              ... Looks like a gcc extension to me. Compiling with with -Wall -Wextra -ansi -pedantic picks up this sort of thing. Why not use ?: for the macros? --Antony
              Message 6 of 15 , Apr 3 5:41 AM
              • 0 Attachment
                On 03/04/2008, char101 <peacech@...> wrote:

                > On Apr 1, 3:53 pm, "Xiaozhou Liu" <sbb...@...> wrote:
                >
                > [...]
                >
                > The patch does not work with Visual C++ 9.0 2008. The
                > main problem is this macros
                >
                > +#define PUSH(s) { \
                > + if (stackp>= stack_end)\
                > + return NULL; \
                > + *stackp++ = s; \
                > + }
                > +
                > +#define POP() ({ \
                > + if (stackp <= stack) \
                > + return NULL; \
                > + *--stackp; \
                > + })

                Looks like a gcc extension to me. Compiling with with -Wall
                -Wextra -ansi -pedantic picks up this sort of thing. Why not
                use ?: for the macros? --Antony

                --~--~---------~--~----~------------~-------~--~----~
                You received this message from the "vim_dev" maillist.
                For more information, visit http://www.vim.org/maillist.php
                -~----------~----~----~----~------~----~------~--~---
              • char101
                ... After debugging the executable, it seems that the fault was 100% mine :D There is a compile error on regexp.c on this function which initial content was +
                Message 7 of 15 , Apr 3 6:16 AM
                • 0 Attachment
                  On Apr 3, 1:41 pm, char101 <peac...@...> wrote:
                  > the resulting gvim executable seems to be losing its regex capability,
                  > for example typing abc and then searching for a does not work.

                  After debugging the executable, it seems that the fault was 100%
                  mine :D

                  There is a compile error on regexp.c on this function which initial
                  content was

                  + regprog_T *
                  +vim_regcomp(expr, re_flags)
                  + char_u *expr;
                  + int re_flags;
                  +{
                  + regprog_T *prog = nfa_regengine.regcomp(expr, re_flags);
                  +
                  + return prog ? : bt_regengine.regcomp(expr, re_flags);
                  +}

                  and without much thought I just change it into

                  return prog ? NULL : bt_regengine.regcomp(expr, re_flags);

                  which should actually be

                  return prog ? bt_regengine.regcomp(expr, re_flags) : NULL;

                  So now the regexp works, but I got this error with tSkeleton plugin

                  -------------------------------------------------------- Start of
                  error
                  Error detected while processing function
                  tskeleton#PrepareBits..<SNR>23_Prepar
                  eBuffer..<SNR>23_CollectFunctions..tskeleton#Initialize:

                  line 3:

                  E714: List required

                  Error detected while processing function
                  tskeleton#PrepareBits..<SNR>23_Prepar
                  eBuffer..<SNR>23_CollectFunctions:

                  line 4:

                  E714: List required

                  Error detected while processing function
                  tskeleton#PrepareBits..<SNR>23_Prepar
                  eBuffer..<SNR>23_CollectFunctions..tskeleton#Initialize:

                  line 3:

                  E714: List required

                  Error detected while processing function
                  tskeleton#PrepareBits..<SNR>23_Prepar
                  eBuffer..<SNR>23_CollectFunctions:

                  line 4:

                  E714: List required
                  -------------------------------------- End of error

                  But I cannot look into tSkeleton file since I am having difficulty
                  reading those error lines. Can anyone help translating those error
                  messages, i.e. in which function does the error happens?

                  ---
                  Charles
                  --~--~---------~--~----~------------~-------~--~----~
                  You received this message from the "vim_dev" maillist.
                  For more information, visit http://www.vim.org/maillist.php
                  -~----------~----~----~----~------~----~------~--~---
                • James Vega
                  ... Based on previous discussions on the list, I don t think that s the intended behavior either. I m pretty sure the logic is If NFA compiler worked, return
                  Message 8 of 15 , Apr 3 6:48 AM
                  • 0 Attachment
                    On Thu, Apr 03, 2008 at 06:16:32AM -0700, char101 wrote:
                    >
                    > On Apr 3, 1:41 pm, char101 <peac...@...> wrote:
                    > > the resulting gvim executable seems to be losing its regex capability,
                    > > for example typing abc and then searching for a does not work.
                    >
                    > After debugging the executable, it seems that the fault was 100%
                    > mine :D
                    >
                    > There is a compile error on regexp.c on this function which initial
                    > content was
                    >
                    > + regprog_T *
                    > +vim_regcomp(expr, re_flags)
                    > + char_u *expr;
                    > + int re_flags;
                    > +{
                    > + regprog_T *prog = nfa_regengine.regcomp(expr, re_flags);
                    > +
                    > + return prog ? : bt_regengine.regcomp(expr, re_flags);
                    > +}
                    >
                    > and without much thought I just change it into
                    >
                    > return prog ? NULL : bt_regengine.regcomp(expr, re_flags);
                    >
                    > which should actually be
                    >
                    > return prog ? bt_regengine.regcomp(expr, re_flags) : NULL;

                    Based on previous discussions on the list, I don't think that's the
                    intended behavior either. I'm pretty sure the logic is "If NFA compiler
                    worked, return that otherwise return the BT compiled regex." This would
                    translate to:

                    return prog ? prog : bt_regengine.regcomp(expr, re_flags);

                    --
                    James
                    GPG Key: 1024D/61326D40 2003-09-02 James Vega <jamessan@...>
                  • char101
                    ... Thanks James, I changed it, works fine now Benchmark result using ext-js.vim (507 849 Bytes javascript file) Try1: Gvim.exe: 11.1533138743 Gvim-re.exe:
                    Message 9 of 15 , Apr 3 7:26 AM
                    • 0 Attachment
                      On Apr 3, 8:48 pm, James Vega <james...@...> wrote:
                      > Based on previous discussions on the list, I don't think that's the
                      > intended behavior either.  I'm pretty sure the logic is "If NFA compiler
                      > worked, return that otherwise return the BT compiled regex."  This would
                      > translate to:
                      >
                      >   return prog ? prog : bt_regengine.regcomp(expr, re_flags);

                      Thanks James, I changed it, works fine now

                      Benchmark result using ext-js.vim (507 849 Bytes javascript file)

                      Try1:
                      Gvim.exe: 11.1533138743
                      Gvim-re.exe: 39.4426857432

                      Try2:
                      Gvim.exe: 10.8661471241
                      Gvim-re.exe: 39.3700461863

                      Gvim-re is the one using the regexp patch.

                      The benchmark script (in Python)

                      import os
                      from time import clock, sleep

                      clock()
                      os.system('gvim-re.exe -c "runtime plugin/tohtml.vim" -c "wq!" ext-
                      all.js')
                      t1 = clock()
                      os.system('gvim.exe -c "runtime plugin/tohtml.vim" -c "wq!" ext-
                      all.js')
                      t2 = clock()

                      # This is try2, for try1 I just swap those commands above
                      print "Gvim.exe: ", t2-t1
                      print "Gvim-re.exe: ", t1

                      Try3:
                      Gvim.exe: 10.9457255287
                      Gvim-re.exe: 39.4854167767

                      Try3 is just running gvim.exe -c "wq!" ext-all.js

                      ---
                      Charles
                      --~--~---------~--~----~------------~-------~--~----~
                      You received this message from the "vim_dev" maillist.
                      For more information, visit http://www.vim.org/maillist.php
                      -~----------~----~----~----~------~----~------~--~---
                    • Nico Weber
                      ... Have compiled both versions yourself or is gvim.exe the official build? ... How are the results if you just do `gvim[-re].exe -u NONE -U NONE -c wq!`? Nico
                      Message 10 of 15 , Apr 4 1:43 AM
                      • 0 Attachment
                        > Benchmark result using ext-js.vim (507 849 Bytes javascript file)
                        >
                        > Try1:
                        > Gvim.exe: 11.1533138743
                        > Gvim-re.exe: 39.4426857432

                        Have compiled both versions yourself or is gvim.exe the official build?

                        > Try3 is just running gvim.exe -c "wq!" ext-all.js

                        How are the results if you just do `gvim[-re].exe -u NONE -U NONE -c
                        wq!`?

                        Nico

                        --~--~---------~--~----~------------~-------~--~----~
                        You received this message from the "vim_dev" maillist.
                        For more information, visit http://www.vim.org/maillist.php
                        -~----------~----~----~----~------~----~------~--~---
                      • Charles
                        ... I compiled both myself. Same settings, same compiler (VC++ 9). Optimization=MAXSPEED (with LTCG), processor=pentium4. Features included: OLE, Python, Mbyte
                        Message 11 of 15 , Apr 4 2:13 AM
                        • 0 Attachment
                          On Fri, Apr 4, 2008 at 3:43 PM, Nico Weber <nicolasweber@...> wrote:
                          > Have compiled both versions yourself or is gvim.exe the official build?

                          I compiled both myself. Same settings, same compiler (VC++ 9).
                          Optimization=MAXSPEED (with LTCG), processor=pentium4. Features
                          included: OLE, Python, Mbyte

                          > How are the results if you just do `gvim[-re].exe -u NONE -U NONE -c
                          > wq!`?

                          gvim.exe 0.279026812268
                          gvim-re.exe 0.263482935745

                          test.py
                          ---
                          import os
                          from time import clock

                          clock()
                          for i in range(0, 10):
                          os.system('gvim.exe -u NONE -U NONE -c q')
                          t1 = clock()
                          for i in range(0, 10):
                          os.system('gvim-re.exe -u NONE -U NONE -c q')
                          t2 = clock()

                          print "gvim.exe", t1/10
                          print "gvim-re.exe", (t2-t1)/10

                          --~--~---------~--~----~------------~-------~--~----~
                          You received this message from the "vim_dev" maillist.
                          For more information, visit http://www.vim.org/maillist.php
                          -~----------~----~----~----~------~----~------~--~---
                        • Charles
                          ... Another interpretation of your command sample: Gvim.exe: 0.395710994776 Gvim-re.exe: 0.495461553009 This is just loading the ext-all.js without filetype
                          Message 12 of 15 , Apr 4 2:23 AM
                          • 0 Attachment
                            On Fri, Apr 4, 2008 at 3:43 PM, Nico Weber <nicolasweber@...> wrote:
                            > How are the results if you just do `gvim[-re].exe -u NONE -U NONE -c
                            > wq!`?

                            Another interpretation of your command sample:

                            Gvim.exe: 0.395710994776
                            Gvim-re.exe: 0.495461553009

                            This is just loading the ext-all.js without filetype detection of
                            syntax highlighting

                            ---
                            os.system('gvim.exe -u NONE -U NONE -c "wq!" ext-all.js')
                            os.system('gvim-re.exe -u NONE -U NONE -c "wq!" ext-all.js')
                            ---

                            Gvim.exe: 5.40134745013
                            Gvim-re.exe: 19.7571365646

                            Loading with syntax highlighting independent of any (g)vimrc settings.

                            ---
                            os.system('gvim.exe -u NONE -U NONE --cmd "set guioptions+=M" --cmd
                            "filetype on" --cmd "syntax on" -c "wq!" ext-all.js')
                            os.system('gvim-re.exe -u NONE -U NONE --cmd "set guioptions+=M" --cmd
                            "filetype on" --cmd "syntax on" -c "wq!" ext-all.js')
                            ---

                            --~--~---------~--~----~------------~-------~--~----~
                            You received this message from the "vim_dev" maillist.
                            For more information, visit http://www.vim.org/maillist.php
                            -~----------~----~----~----~------~----~------~--~---
                          • Xiaozhou Liu
                            ... Thanks for your testing! I will remove gcc extension in the code and check for what makes the new regexp so slow. - Xiaozhou
                            Message 13 of 15 , Apr 5 6:13 PM
                            • 0 Attachment
                              On Fri, Apr 4, 2008 at 5:23 PM, Charles <peacech@...> wrote:
                              >
                              > On Fri, Apr 4, 2008 at 3:43 PM, Nico Weber <nicolasweber@...> wrote:
                              >
                              > > How are the results if you just do `gvim[-re].exe -u NONE -U NONE -c
                              > > wq!`?
                              >
                              > Another interpretation of your command sample:
                              >
                              > Gvim.exe: 0.395710994776
                              > Gvim-re.exe: 0.495461553009
                              >
                              > This is just loading the ext-all.js without filetype detection of
                              > syntax highlighting
                              >
                              > ---
                              > os.system('gvim.exe -u NONE -U NONE -c "wq!" ext-all.js')
                              > os.system('gvim-re.exe -u NONE -U NONE -c "wq!" ext-all.js')
                              > ---
                              >
                              > Gvim.exe: 5.40134745013
                              > Gvim-re.exe: 19.7571365646
                              >
                              > Loading with syntax highlighting independent of any (g)vimrc settings.
                              >
                              > ---
                              > os.system('gvim.exe -u NONE -U NONE --cmd "set guioptions+=M" --cmd
                              > "filetype on" --cmd "syntax on" -c "wq!" ext-all.js')
                              > os.system('gvim-re.exe -u NONE -U NONE --cmd "set guioptions+=M" --cmd
                              > "filetype on" --cmd "syntax on" -c "wq!" ext-all.js')
                              > ---


                              Thanks for your testing!
                              I will remove gcc extension in the code and check for what
                              makes the new regexp so slow.

                              - Xiaozhou

                              --~--~---------~--~----~------------~-------~--~----~
                              You received this message from the "vim_dev" maillist.
                              For more information, visit http://www.vim.org/maillist.php
                              -~----------~----~----~----~------~----~------~--~---
                            Your message has been successfully submitted and would be delivered to recipients shortly.