Loading ...
Sorry, an error occurred while loading the content.
 

Re: (patch) new regexp

Expand Messages
  • Xiaozhou Liu
    This patch is against 7.1.236 or beyond. -Xiaozhou --~--~---------~--~----~------------~-------~--~----~ You received this message from the vim_dev maillist.
    Message 1 of 15 , Apr 1, 2008
      This patch is against 7.1.236 or beyond.

      -Xiaozhou

      --~--~---------~--~----~------------~-------~--~----~
      You received this message from the "vim_dev" maillist.
      For more information, visit http://www.vim.org/maillist.php
      -~----------~----~----~----~------~----~------~--~---
    • Bram Moolenaar
      ... Thank you very much for continuing the work of last summer. It s great to see that you volunteered to continue work on this. And a very good start for a
      Message 2 of 15 , Apr 1, 2008
        Xiaozhou Liu wrote:

        > The attached patch represents the current status of my work on the new
        > NFA based regexp engine, from last summer's GSoC project, mentored by
        > Russ Cox. I have been using this new engine myself for quite some time
        > now, so I feel it's ready for a wider review audience.
        >
        > Currently the implemented patterns are:
        > \|
        > \( \)
        > *
        > \+
        > ^
        > $
        > \<
        > \>
        >
        > The new engine will fallback to the old engine if encounters an
        > unsupported pattern.
        >
        > I have incorporated Russ' code on ordered alternation, so that behavior
        > has been inherited. Multibyte is supported, with the exception of
        > composing characters, which will be high-priority next stage work.
        >
        > The new engine passes all test cases except test44 (composing chars).
        > When run on test cases it will fallback to old engine on 55% of all
        > invocations. In my daily use however the NFA engine accounts for about
        > 80% of all regexp invocations.
        >
        > Please review the code, any comments and feedback is welcomed.
        > The code is in still very early alpha stage, of course, but if you are
        > brave enough, I encourage you to give it a try ;-). You will find that
        > pathological cases
        > (e.g. matching "\v(a+)+b" against "aaaaaaaaaaaaaaaaaaaaaaaaaaa")
        > are now gone. There is also a real-life javascript file you can find at
        > http://extjs.com/deploy/dev/adapter/ext/ext-base.js which can be used to
        > compare the performance differences on syntax highlighting between the
        > old and new engines.

        Thank you very much for continuing the work of last summer. It's great
        to see that you volunteered to continue work on this. And a very good
        start for a student to continue with it this year. It's now very likely
        we will really get to the point where this can be included in Vim.

        --
        panic("Foooooooood fight!");
        -- In the kernel source aha1542.c, after detecting a bad segment list

        /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
        /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
        \\\ download, build and distribute -- http://www.A-A-P.org ///
        \\\ help me help AIDS victims -- http://ICCF-Holland.org ///

        --~--~---------~--~----~------------~-------~--~----~
        You received this message from the "vim_dev" maillist.
        For more information, visit http://www.vim.org/maillist.php
        -~----------~----~----~----~------~----~------~--~---
      • char101
        ... Hi, The patch does not work with Visual C++ 9.0 2008. The main problem is this macros +#define PUSH(s) { + if (stackp = stack_end) +
        Message 3 of 15 , Apr 2, 2008
          On Apr 1, 3:53 pm, "Xiaozhou Liu" <sbb...@...> wrote:
          > The attached patch represents the current status of my work on the new
          > NFA based regexp engine, from last summer's GSoC project, mentored by
          > Russ Cox. I have been using this new engine myself for quite some time
          > now, so I feel it's ready for a wider review audience.

          Hi,

          The patch does not work with Visual C++ 9.0 2008. The main problem is
          this macros

          +#define PUSH(s) { \
          + if (stackp >= stack_end)\
          + return NULL; \
          + *stackp++ = s; \
          + }
          +
          +#define POP() ({ \
          + if (stackp <= stack) \
          + return NULL; \
          + *--stackp; \
          + })

          But even though I have changed those macros to

          #define PUSH(s) if (stackp >= stack_end) return NULL; *stackp =
          s; ++stackp;
          #define POP(var) if (stackp <= stack) return NULL; --stackp; var
          = *stackp;

          the resulting gvim executable seems to be losing its regex capability,
          for example typing abc and then searching for a does not work.

          Gvim source used : 7.1.291
          Features : both with mbyte and without mbyte

          ---
          Charles
          --~--~---------~--~----~------------~-------~--~----~
          You received this message from the "vim_dev" maillist.
          For more information, visit http://www.vim.org/maillist.php
          -~----------~----~----~----~------~----~------~--~---
        • Jürgen Krämer
          Hi, ... I don t see a difference in meaning between *stackp++ = s; and *stackp = s; ++stackp; and I don t know anything about the context or usage of the
          Message 4 of 15 , Apr 3, 2008
            Hi,

            char101 wrote:
            > On Apr 1, 3:53 pm, "Xiaozhou Liu" <sbb...@...> wrote:
            >> The attached patch represents the current status of my work on the new
            >> NFA based regexp engine, from last summer's GSoC project, mentored by
            >> Russ Cox. I have been using this new engine myself for quite some time
            >> now, so I feel it's ready for a wider review audience.
            >
            > The patch does not work with Visual C++ 9.0 2008. The main problem is
            > this macros
            >
            > +#define PUSH(s) { \
            > + if (stackp >= stack_end)\
            > + return NULL; \
            > + *stackp++ = s; \
            > + }
            > +
            > +#define POP() ({ \
            > + if (stackp <= stack) \
            > + return NULL; \
            > + *--stackp; \
            > + })
            >
            > But even though I have changed those macros to
            >
            > #define PUSH(s) if (stackp >= stack_end) return NULL; *stackp =
            > s; ++stackp;
            > #define POP(var) if (stackp <= stack) return NULL; --stackp; var
            > = *stackp;
            >
            > the resulting gvim executable seems to be losing its regex capability,
            > for example typing abc and then searching for a does not work.

            I don't see a difference in meaning between "*stackp++ = s;" and
            "*stackp = s; ++stackp;" and I don't know anything about the context or
            usage of the original patch, but your change might break existing code,
            e.g.,

            if ( test )
            PUSH(anything)

            will expand differently because of the missing braces.

            Actually, I would write macros that expand to multiple statements as

            #define PUSH(s) do { \
            if (stackp >= stack_end)\
            return NULL; \
            *stackp++ = s; \
            } while (0)

            #define POP() do { \
            if (stackp <= stack) \
            return NULL; \
            *--stackp; \
            } while (0)

            so they can always be used like in

            if ( test )
            PUSH(anything);

            (note the additional semicolon), which is more in accordance with the
            normal C syntax.

            Regards,
            Jürgen

            --
            Sometimes I think the surest sign that intelligent life exists elsewhere
            in the universe is that none of it has tried to contact us. (Calvin)

            --~--~---------~--~----~------------~-------~--~----~
            You received this message from the "vim_dev" maillist.
            For more information, visit http://www.vim.org/maillist.php
            -~----------~----~----~----~------~----~------~--~---
          • char101
            ... You re right, I just split them out for clarity since I m not a C programmer. ... No, I have checked them out, the macroes are defined in a function block
            Message 5 of 15 , Apr 3, 2008
              On Apr 3, 2:04 pm, Jürgen Krämer <jottka...@...> wrote:
              > I don't see a difference in meaning between "*stackp++ = s;" and
              > "*stackp = s; ++stackp;"

              You're right, I just split them out for clarity since I'm not a C
              programmer.

              > and I don't know anything about the context or
              > usage of the original patch, but your change might break existing code,
              > e.g.,

              No, I have checked them out, the macroes are defined in a function
              block and then undefined immediately before the functions ends. And
              these macros are not used after any if() for() while(), etc.

              >   #define PUSH(s)       do {                        \
              >                     if (stackp >= stack_end)\
              >                         return NULL;        \
              >                      *stackp++ = s;         \
              >                 } while (0)
              >
              >   #define POP()         do {                        \
              >                     if (stackp <= stack)    \
              >                         return NULL;        \
              >                     *--stackp;              \
              >                  } while (0)

              For PUSH it will be fine, but for POP I don't think that it'd work
              since it is used as an expression:

              e = POP();

              ---
              Charles
              --~--~---------~--~----~------------~-------~--~----~
              You received this message from the "vim_dev" maillist.
              For more information, visit http://www.vim.org/maillist.php
              -~----------~----~----~----~------~----~------~--~---
            • Nico Weber
              Hi, ... ({ ... }) is a gcc extension that converts statements to expressions. The expression has the value of the last statement in the block (
              Message 6 of 15 , Apr 3, 2008
                Hi,

                > The patch does not work with Visual C++ 9.0 2008. The main problem is
                > this macros
                >
                > +#define PUSH(s) { \
                > + if (stackp >= stack_end)\
                > + return NULL; \
                > + *stackp++ = s; \
                > + }
                > +
                > +#define POP() ({ \
                > + if (stackp <= stack) \
                > + return NULL; \
                > + *--stackp; \
                > + })
                >
                > But even though I have changed those macros to
                >
                > #define PUSH(s) if (stackp >= stack_end) return NULL; *stackp =
                > s; ++stackp;
                > #define POP(var) if (stackp <= stack) return NULL; --stackp; var
                > = *stackp;
                >
                > the resulting gvim executable seems to be losing its regex capability,
                > for example typing abc and then searching for a does not work.

                ({ ... }) is a gcc extension that converts statements to expressions.
                The expression has the value of the last statement in the block ( http://gcc.gnu.org/onlinedocs/gcc-2.95.3/gcc_4.html#SEC62
                ). That is, the macro can be used as

                a = PUSH(s);

                or in other places where an expression is expected. If you convert the
                macro to a function, it should work on all compilers (and I guess it
                won't even be slower since the compiler will inline the function call
                automatically).

                Nico

                --~--~---------~--~----~------------~-------~--~----~
                You received this message from the "vim_dev" maillist.
                For more information, visit http://www.vim.org/maillist.php
                -~----------~----~----~----~------~----~------~--~---
              • Antony Scriven
                ... Looks like a gcc extension to me. Compiling with with -Wall -Wextra -ansi -pedantic picks up this sort of thing. Why not use ?: for the macros? --Antony
                Message 7 of 15 , Apr 3, 2008
                  On 03/04/2008, char101 <peacech@...> wrote:

                  > On Apr 1, 3:53 pm, "Xiaozhou Liu" <sbb...@...> wrote:
                  >
                  > [...]
                  >
                  > The patch does not work with Visual C++ 9.0 2008. The
                  > main problem is this macros
                  >
                  > +#define PUSH(s) { \
                  > + if (stackp>= stack_end)\
                  > + return NULL; \
                  > + *stackp++ = s; \
                  > + }
                  > +
                  > +#define POP() ({ \
                  > + if (stackp <= stack) \
                  > + return NULL; \
                  > + *--stackp; \
                  > + })

                  Looks like a gcc extension to me. Compiling with with -Wall
                  -Wextra -ansi -pedantic picks up this sort of thing. Why not
                  use ?: for the macros? --Antony

                  --~--~---------~--~----~------------~-------~--~----~
                  You received this message from the "vim_dev" maillist.
                  For more information, visit http://www.vim.org/maillist.php
                  -~----------~----~----~----~------~----~------~--~---
                • char101
                  ... After debugging the executable, it seems that the fault was 100% mine :D There is a compile error on regexp.c on this function which initial content was +
                  Message 8 of 15 , Apr 3, 2008
                    On Apr 3, 1:41 pm, char101 <peac...@...> wrote:
                    > the resulting gvim executable seems to be losing its regex capability,
                    > for example typing abc and then searching for a does not work.

                    After debugging the executable, it seems that the fault was 100%
                    mine :D

                    There is a compile error on regexp.c on this function which initial
                    content was

                    + regprog_T *
                    +vim_regcomp(expr, re_flags)
                    + char_u *expr;
                    + int re_flags;
                    +{
                    + regprog_T *prog = nfa_regengine.regcomp(expr, re_flags);
                    +
                    + return prog ? : bt_regengine.regcomp(expr, re_flags);
                    +}

                    and without much thought I just change it into

                    return prog ? NULL : bt_regengine.regcomp(expr, re_flags);

                    which should actually be

                    return prog ? bt_regengine.regcomp(expr, re_flags) : NULL;

                    So now the regexp works, but I got this error with tSkeleton plugin

                    -------------------------------------------------------- Start of
                    error
                    Error detected while processing function
                    tskeleton#PrepareBits..<SNR>23_Prepar
                    eBuffer..<SNR>23_CollectFunctions..tskeleton#Initialize:

                    line 3:

                    E714: List required

                    Error detected while processing function
                    tskeleton#PrepareBits..<SNR>23_Prepar
                    eBuffer..<SNR>23_CollectFunctions:

                    line 4:

                    E714: List required

                    Error detected while processing function
                    tskeleton#PrepareBits..<SNR>23_Prepar
                    eBuffer..<SNR>23_CollectFunctions..tskeleton#Initialize:

                    line 3:

                    E714: List required

                    Error detected while processing function
                    tskeleton#PrepareBits..<SNR>23_Prepar
                    eBuffer..<SNR>23_CollectFunctions:

                    line 4:

                    E714: List required
                    -------------------------------------- End of error

                    But I cannot look into tSkeleton file since I am having difficulty
                    reading those error lines. Can anyone help translating those error
                    messages, i.e. in which function does the error happens?

                    ---
                    Charles
                    --~--~---------~--~----~------------~-------~--~----~
                    You received this message from the "vim_dev" maillist.
                    For more information, visit http://www.vim.org/maillist.php
                    -~----------~----~----~----~------~----~------~--~---
                  • James Vega
                    ... Based on previous discussions on the list, I don t think that s the intended behavior either. I m pretty sure the logic is If NFA compiler worked, return
                    Message 9 of 15 , Apr 3, 2008
                      On Thu, Apr 03, 2008 at 06:16:32AM -0700, char101 wrote:
                      >
                      > On Apr 3, 1:41 pm, char101 <peac...@...> wrote:
                      > > the resulting gvim executable seems to be losing its regex capability,
                      > > for example typing abc and then searching for a does not work.
                      >
                      > After debugging the executable, it seems that the fault was 100%
                      > mine :D
                      >
                      > There is a compile error on regexp.c on this function which initial
                      > content was
                      >
                      > + regprog_T *
                      > +vim_regcomp(expr, re_flags)
                      > + char_u *expr;
                      > + int re_flags;
                      > +{
                      > + regprog_T *prog = nfa_regengine.regcomp(expr, re_flags);
                      > +
                      > + return prog ? : bt_regengine.regcomp(expr, re_flags);
                      > +}
                      >
                      > and without much thought I just change it into
                      >
                      > return prog ? NULL : bt_regengine.regcomp(expr, re_flags);
                      >
                      > which should actually be
                      >
                      > return prog ? bt_regengine.regcomp(expr, re_flags) : NULL;

                      Based on previous discussions on the list, I don't think that's the
                      intended behavior either. I'm pretty sure the logic is "If NFA compiler
                      worked, return that otherwise return the BT compiled regex." This would
                      translate to:

                      return prog ? prog : bt_regengine.regcomp(expr, re_flags);

                      --
                      James
                      GPG Key: 1024D/61326D40 2003-09-02 James Vega <jamessan@...>
                    • char101
                      ... Thanks James, I changed it, works fine now Benchmark result using ext-js.vim (507 849 Bytes javascript file) Try1: Gvim.exe: 11.1533138743 Gvim-re.exe:
                      Message 10 of 15 , Apr 3, 2008
                        On Apr 3, 8:48 pm, James Vega <james...@...> wrote:
                        > Based on previous discussions on the list, I don't think that's the
                        > intended behavior either.  I'm pretty sure the logic is "If NFA compiler
                        > worked, return that otherwise return the BT compiled regex."  This would
                        > translate to:
                        >
                        >   return prog ? prog : bt_regengine.regcomp(expr, re_flags);

                        Thanks James, I changed it, works fine now

                        Benchmark result using ext-js.vim (507 849 Bytes javascript file)

                        Try1:
                        Gvim.exe: 11.1533138743
                        Gvim-re.exe: 39.4426857432

                        Try2:
                        Gvim.exe: 10.8661471241
                        Gvim-re.exe: 39.3700461863

                        Gvim-re is the one using the regexp patch.

                        The benchmark script (in Python)

                        import os
                        from time import clock, sleep

                        clock()
                        os.system('gvim-re.exe -c "runtime plugin/tohtml.vim" -c "wq!" ext-
                        all.js')
                        t1 = clock()
                        os.system('gvim.exe -c "runtime plugin/tohtml.vim" -c "wq!" ext-
                        all.js')
                        t2 = clock()

                        # This is try2, for try1 I just swap those commands above
                        print "Gvim.exe: ", t2-t1
                        print "Gvim-re.exe: ", t1

                        Try3:
                        Gvim.exe: 10.9457255287
                        Gvim-re.exe: 39.4854167767

                        Try3 is just running gvim.exe -c "wq!" ext-all.js

                        ---
                        Charles
                        --~--~---------~--~----~------------~-------~--~----~
                        You received this message from the "vim_dev" maillist.
                        For more information, visit http://www.vim.org/maillist.php
                        -~----------~----~----~----~------~----~------~--~---
                      • Nico Weber
                        ... Have compiled both versions yourself or is gvim.exe the official build? ... How are the results if you just do `gvim[-re].exe -u NONE -U NONE -c wq!`? Nico
                        Message 11 of 15 , Apr 4, 2008
                          > Benchmark result using ext-js.vim (507 849 Bytes javascript file)
                          >
                          > Try1:
                          > Gvim.exe: 11.1533138743
                          > Gvim-re.exe: 39.4426857432

                          Have compiled both versions yourself or is gvim.exe the official build?

                          > Try3 is just running gvim.exe -c "wq!" ext-all.js

                          How are the results if you just do `gvim[-re].exe -u NONE -U NONE -c
                          wq!`?

                          Nico

                          --~--~---------~--~----~------------~-------~--~----~
                          You received this message from the "vim_dev" maillist.
                          For more information, visit http://www.vim.org/maillist.php
                          -~----------~----~----~----~------~----~------~--~---
                        • Charles
                          ... I compiled both myself. Same settings, same compiler (VC++ 9). Optimization=MAXSPEED (with LTCG), processor=pentium4. Features included: OLE, Python, Mbyte
                          Message 12 of 15 , Apr 4, 2008
                            On Fri, Apr 4, 2008 at 3:43 PM, Nico Weber <nicolasweber@...> wrote:
                            > Have compiled both versions yourself or is gvim.exe the official build?

                            I compiled both myself. Same settings, same compiler (VC++ 9).
                            Optimization=MAXSPEED (with LTCG), processor=pentium4. Features
                            included: OLE, Python, Mbyte

                            > How are the results if you just do `gvim[-re].exe -u NONE -U NONE -c
                            > wq!`?

                            gvim.exe 0.279026812268
                            gvim-re.exe 0.263482935745

                            test.py
                            ---
                            import os
                            from time import clock

                            clock()
                            for i in range(0, 10):
                            os.system('gvim.exe -u NONE -U NONE -c q')
                            t1 = clock()
                            for i in range(0, 10):
                            os.system('gvim-re.exe -u NONE -U NONE -c q')
                            t2 = clock()

                            print "gvim.exe", t1/10
                            print "gvim-re.exe", (t2-t1)/10

                            --~--~---------~--~----~------------~-------~--~----~
                            You received this message from the "vim_dev" maillist.
                            For more information, visit http://www.vim.org/maillist.php
                            -~----------~----~----~----~------~----~------~--~---
                          • Charles
                            ... Another interpretation of your command sample: Gvim.exe: 0.395710994776 Gvim-re.exe: 0.495461553009 This is just loading the ext-all.js without filetype
                            Message 13 of 15 , Apr 4, 2008
                              On Fri, Apr 4, 2008 at 3:43 PM, Nico Weber <nicolasweber@...> wrote:
                              > How are the results if you just do `gvim[-re].exe -u NONE -U NONE -c
                              > wq!`?

                              Another interpretation of your command sample:

                              Gvim.exe: 0.395710994776
                              Gvim-re.exe: 0.495461553009

                              This is just loading the ext-all.js without filetype detection of
                              syntax highlighting

                              ---
                              os.system('gvim.exe -u NONE -U NONE -c "wq!" ext-all.js')
                              os.system('gvim-re.exe -u NONE -U NONE -c "wq!" ext-all.js')
                              ---

                              Gvim.exe: 5.40134745013
                              Gvim-re.exe: 19.7571365646

                              Loading with syntax highlighting independent of any (g)vimrc settings.

                              ---
                              os.system('gvim.exe -u NONE -U NONE --cmd "set guioptions+=M" --cmd
                              "filetype on" --cmd "syntax on" -c "wq!" ext-all.js')
                              os.system('gvim-re.exe -u NONE -U NONE --cmd "set guioptions+=M" --cmd
                              "filetype on" --cmd "syntax on" -c "wq!" ext-all.js')
                              ---

                              --~--~---------~--~----~------------~-------~--~----~
                              You received this message from the "vim_dev" maillist.
                              For more information, visit http://www.vim.org/maillist.php
                              -~----------~----~----~----~------~----~------~--~---
                            • Xiaozhou Liu
                              ... Thanks for your testing! I will remove gcc extension in the code and check for what makes the new regexp so slow. - Xiaozhou
                              Message 14 of 15 , Apr 5, 2008
                                On Fri, Apr 4, 2008 at 5:23 PM, Charles <peacech@...> wrote:
                                >
                                > On Fri, Apr 4, 2008 at 3:43 PM, Nico Weber <nicolasweber@...> wrote:
                                >
                                > > How are the results if you just do `gvim[-re].exe -u NONE -U NONE -c
                                > > wq!`?
                                >
                                > Another interpretation of your command sample:
                                >
                                > Gvim.exe: 0.395710994776
                                > Gvim-re.exe: 0.495461553009
                                >
                                > This is just loading the ext-all.js without filetype detection of
                                > syntax highlighting
                                >
                                > ---
                                > os.system('gvim.exe -u NONE -U NONE -c "wq!" ext-all.js')
                                > os.system('gvim-re.exe -u NONE -U NONE -c "wq!" ext-all.js')
                                > ---
                                >
                                > Gvim.exe: 5.40134745013
                                > Gvim-re.exe: 19.7571365646
                                >
                                > Loading with syntax highlighting independent of any (g)vimrc settings.
                                >
                                > ---
                                > os.system('gvim.exe -u NONE -U NONE --cmd "set guioptions+=M" --cmd
                                > "filetype on" --cmd "syntax on" -c "wq!" ext-all.js')
                                > os.system('gvim-re.exe -u NONE -U NONE --cmd "set guioptions+=M" --cmd
                                > "filetype on" --cmd "syntax on" -c "wq!" ext-all.js')
                                > ---


                                Thanks for your testing!
                                I will remove gcc extension in the code and check for what
                                makes the new regexp so slow.

                                - Xiaozhou

                                --~--~---------~--~----~------------~-------~--~----~
                                You received this message from the "vim_dev" maillist.
                                For more information, visit http://www.vim.org/maillist.php
                                -~----------~----~----~----~------~----~------~--~---
                              Your message has been successfully submitted and would be delivered to recipients shortly.