Loading ...
Sorry, an error occurred while loading the content.
 

[NFA] Slow Regexp with many branches

Expand Messages
  • Grüner Gimpel
    Hi list, I noticed that using Vim 7.4a the Vimwiki plugin works measurably slower with re=0 than with re=1. I figured out the reason is this regexp in the
    Message 1 of 6 , Jul 9, 2013
      Hi list,

      I noticed that using Vim 7.4a the Vimwiki plugin works measurably slower with re=0 than with re=1. I figured out the reason is this regexp in the syntax definition:

      [[:alnum:]]\@<!\%(\%(\%(http\|https\|file\|ftp\|gopher\|telnet\|nntp\|ldap\|rsync\|imap\|pop\|irc\|ircs\|cvs\|svn\|svn+ssh\|git\|ssh\|fish\|sftp\):\%(\/\/\)\)\|\%(mailto\|news\|xmpp\|sip\|sips\|doi\|urn\|tel\):\)\S\{-1,}\%(([^ \t()]*)\)\=\S*

      You may know what it is supposed to match.
      Removing some of the branches makes it faster and so does removing [[:alnum:]]\@<!

      I don't know if there is a standard method for measuring speed of the syntax highlight, but here is what I have done:
      - open a relatively large file
      - :set syn=off
      - :syntax match Number /[[:alnum:]]\@<!\%(\%(\%(http\|https\|file\|ftp\|gopher\|telnet\|nntp\|ldap\|rsync\|imap\|pop\|irc\|ircs\|cvs\|svn\|svn+ssh\|git\|ssh\|fish\|sftp\):\%(\/\/\)\)\|\%(mailto\|news\|xmpp\|sip\|sips\|doi\|urn\|tel\):\)\S\{-1,}\%(([^ \t()]*)\)\=\S*/
      - scroll with <c-e> and <c-y>
      - extremely slow with re=0 but normal slow (*wink*) with re=1

      So, can this behavior be considered as a bug?

      --
      --
      You received this message from the "vim_dev" maillist.
      Do not top-post! Type your reply below the text you are replying to.
      For more information, visit http://www.vim.org/maillist.php

      ---
      You received this message because you are subscribed to the Google Groups "vim_dev" group.
      To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
      For more options, visit https://groups.google.com/groups/opt_out.
    • Bram Moolenaar
      ... The main problem is the [[:alnum:]] @
      Message 2 of 6 , Jul 9, 2013
        Grüner_Gimpel wrote:

        > I noticed that using Vim 7.4a the Vimwiki plugin works measurably
        > slower with re=0 than with re=1. I figured out the reason is this
        > regexp in the syntax definition:
        >
        > [[:alnum:]]\@<!\%(\%(\%(http\|https\|file\|ftp\|gopher\|telnet\|nntp\|ldap\|rsync\|imap\|pop\|irc\|ircs\|cvs\|svn\|svn+ssh\|git\|ssh\|fish\|sftp\):\%(\/\/\)\)\|\%(mailto\|news\|xmpp\|sip\|sips\|doi\|urn\|tel\):\)\S\{-1,}\%(([^ \t()]*)\)\=\S*
        >
        > You may know what it is supposed to match.
        > Removing some of the branches makes it faster and so does removing
        > [[:alnum:]]\@<!
        >
        > I don't know if there is a standard method for measuring speed of the
        > syntax highlight, but here is what I have done:
        > - open a relatively large file
        > - :set syn=off
        > - :syntax match Number /[[:alnum:]]\@<!\%(\%(\%(http\|https\|file\|ftp\|gopher\|telnet\|nntp\|ldap\|rsync\|imap\|pop\|irc\|ircs\|cvs\|svn\|svn+ssh\|git\|ssh\|fish\|sftp\):\%(\/\/\)\)\|\%(mailto\|news\|xmpp\|sip\|sips\|doi\|urn\|tel\):\)\S\{-1,}\%(([^ \t()]*)\)\=\S*/
        > - scroll with <c-e> and <c-y>
        > - extremely slow with re=0 but normal slow (*wink*) with re=1
        >
        > So, can this behavior be considered as a bug?

        The main problem is the [[:alnum:]]\@<! item. Have you tried adding a
        limit? Should be [[:alnum:]]\@1<!

        Or perhaps using \< instead will work.

        For measuring times: use :syntime.

        --
        No children may attend school with their breath smelling of "wild onions."
        [real standing law in West Virginia, United States of America]

        /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
        /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
        \\\ an exciting new programming language -- http://www.Zimbu.org ///
        \\\ help me help AIDS victims -- http://ICCF-Holland.org ///

        --
        --
        You received this message from the "vim_dev" maillist.
        Do not top-post! Type your reply below the text you are replying to.
        For more information, visit http://www.vim.org/maillist.php

        ---
        You received this message because you are subscribed to the Google Groups "vim_dev" group.
        To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
        For more options, visit https://groups.google.com/groups/opt_out.
      • Nikolay Pavlov
        ...
        Message 3 of 6 , Jul 9, 2013


          On Jul 9, 2013 4:58 PM, "Bram Moolenaar" <Bram@...> wrote:
          >
          >
          > Grüner_Gimpel wrote:
          >
          > > I noticed that using Vim 7.4a the Vimwiki plugin works measurably
          > > slower with re=0 than with re=1. I figured out the reason is this
          > > regexp in the syntax definition:
          > >
          > >     [[:alnum:]]\@<!\%(\%(\%(http\|https\|file\|ftp\|gopher\|telnet\|nntp\|ldap\|rsync\|imap\|pop\|irc\|ircs\|cvs\|svn\|svn+ssh\|git\|ssh\|fish\|sftp\):\%(\/\/\)\)\|\%(mailto\|news\|xmpp\|sip\|sips\|doi\|urn\|tel\):\)\S\{-1,}\%(([^ \t()]*)\)\=\S*
          > >
          > > You may know what it is supposed to match.
          > > Removing some of the branches makes it faster and so does removing
          > > [[:alnum:]]\@<!
          > >
          > > I don't know if there is a standard method for measuring speed of the
          > > syntax highlight, but here is what I have done:
          > > - open a relatively large file
          > > - :set syn=off
          > > - :syntax match Number /[[:alnum:]]\@<!\%(\%(\%(http\|https\|file\|ftp\|gopher\|telnet\|nntp\|ldap\|rsync\|imap\|pop\|irc\|ircs\|cvs\|svn\|svn+ssh\|git\|ssh\|fish\|sftp\):\%(\/\/\)\)\|\%(mailto\|news\|xmpp\|sip\|sips\|doi\|urn\|tel\):\)\S\{-1,}\%(([^ \t()]*)\)\=\S*/
          > > - scroll with <c-e> and <c-y>
          > > - extremely slow with re=0 but normal slow (*wink*) with re=1
          > >
          > > So, can this behavior be considered as a bug?
          >
          > The main problem is the  [[:alnum:]]\@<!  item.  Have you tried adding a
          > limit?  Should be   [[:alnum:]]\@1<!
          >
          > Or perhaps using \<  instead will work.
          >
          > For measuring times: use :syntime.

          Could not it deduce a limit by itself in such simple cases?

          > --
          > No children may attend school with their breath smelling of "wild onions."
          >                 [real standing law in West Virginia, United States of America]
          >
          >  /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net   \\\
          > ///        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
          > \\\  an exciting new programming language -- http://www.Zimbu.org        ///
          >  \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///
          >
          > --
          > --
          > You received this message from the "vim_dev" maillist.
          > Do not top-post! Type your reply below the text you are replying to.
          > For more information, visit http://www.vim.org/maillist.php
          >
          > ---
          > You received this message because you are subscribed to the Google Groups "vim_dev" group.
          > To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
          > For more options, visit https://groups.google.com/groups/opt_out.
          >
          >

          --
          --
          You received this message from the "vim_dev" maillist.
          Do not top-post! Type your reply below the text you are replying to.
          For more information, visit http://www.vim.org/maillist.php
           
          ---
          You received this message because you are subscribed to the Google Groups "vim_dev" group.
          To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
          For more options, visit https://groups.google.com/groups/opt_out.
           
           
        • Bram Moolenaar
          ... It does, but I m not sure what patterns it works for: nfa_max_width() -- Another bucket of what can only be described as human ordure hits ARTHUR. ARTHUR:
          Message 4 of 6 , Jul 9, 2013
            Nikolay Pavlov wrote:

            > On Jul 9, 2013 4:58 PM, "Bram Moolenaar" <Bram@...> wrote:
            > >
            > >
            > > Grüner_Gimpel wrote:
            > >
            > > > I noticed that using Vim 7.4a the Vimwiki plugin works measurably
            > > > slower with re=0 than with re=1. I figured out the reason is this
            > > > regexp in the syntax definition:
            > > >
            > > >
            > [[:alnum:]]\@<!\%(\%(\%(http\|https\|file\|ftp\|gopher\|telnet\|nntp\|ldap\|rsync\|imap\|pop\|irc\|ircs\|cvs\|svn\|svn+ssh\|git\|ssh\|fish\|sftp\):\%(\/\/\)\)\|\%(mailto\|news\|xmpp\|sip\|sips\|doi\|urn\|tel\):\)\S\{-1,}\%(([^
            > \t()]*)\)\=\S*
            > > >
            > > > You may know what it is supposed to match.
            > > > Removing some of the branches makes it faster and so does removing
            > > > [[:alnum:]]\@<!
            > > >
            > > > I don't know if there is a standard method for measuring speed of the
            > > > syntax highlight, but here is what I have done:
            > > > - open a relatively large file
            > > > - :set syn=off
            > > > - :syntax match Number
            > /[[:alnum:]]\@<!\%(\%(\%(http\|https\|file\|ftp\|gopher\|telnet\|nntp\|ldap\|rsync\|imap\|pop\|irc\|ircs\|cvs\|svn\|svn+ssh\|git\|ssh\|fish\|sftp\):\%(\/\/\)\)\|\%(mailto\|news\|xmpp\|sip\|sips\|doi\|urn\|tel\):\)\S\{-1,}\%(([^
            > \t()]*)\)\=\S*/
            > > > - scroll with <c-e> and <c-y>
            > > > - extremely slow with re=0 but normal slow (*wink*) with re=1
            > > >
            > > > So, can this behavior be considered as a bug?
            > >
            > > The main problem is the [[:alnum:]]\@<! item. Have you tried adding a
            > > limit? Should be [[:alnum:]]\@1<!
            > >
            > > Or perhaps using \< instead will work.
            > >
            > > For measuring times: use :syntime.
            >
            > Could not it deduce a limit by itself in such simple cases?

            It does, but I'm not sure what patterns it works for:
            nfa_max_width()

            --
            Another bucket of what can only be described as human ordure hits ARTHUR.
            ARTHUR: ... Right! (to the KNIGHTS) That settles it!
            "Monty Python and the Holy Grail" PYTHON (MONTY) PICTURES LTD

            /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
            /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
            \\\ an exciting new programming language -- http://www.Zimbu.org ///
            \\\ help me help AIDS victims -- http://ICCF-Holland.org ///

            --
            --
            You received this message from the "vim_dev" maillist.
            Do not top-post! Type your reply below the text you are replying to.
            For more information, visit http://www.vim.org/maillist.php

            ---
            You received this message because you are subscribed to the Google Groups "vim_dev" group.
            To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
            For more options, visit https://groups.google.com/groups/opt_out.
          • Dominique Pellé
            ... In this case the [[:alnum:]] @
            Message 5 of 6 , Jul 9, 2013
              Nikolay Pavlov wrote:

              >
              > On Jul 9, 2013 4:58 PM, "Bram Moolenaar" <Bram@...> wrote:
              >>
              >>
              >> Grüner_Gimpel wrote:
              >>
              >> > I noticed that using Vim 7.4a the Vimwiki plugin works measurably
              >> > slower with re=0 than with re=1. I figured out the reason is this
              >> > regexp in the syntax definition:
              >> >
              >> >
              >> > [[:alnum:]]\@<!\%(\%(\%(http\|https\|file\|ftp\|gopher\|telnet\|nntp\|ldap\|rsync\|imap\|pop\|irc\|ircs\|cvs\|svn\|svn+ssh\|git\|ssh\|fish\|sftp\):\%(\/\/\)\)\|\%(mailto\|news\|xmpp\|sip\|sips\|doi\|urn\|tel\):\)\S\{-1,}\%(([^
              >> > \t()]*)\)\=\S*
              >> >
              >> > You may know what it is supposed to match.
              >> > Removing some of the branches makes it faster and so does removing
              >> > [[:alnum:]]\@<!
              >> >
              >> > I don't know if there is a standard method for measuring speed of the
              >> > syntax highlight, but here is what I have done:
              >> > - open a relatively large file
              >> > - :set syn=off
              >> > - :syntax match Number
              >> > /[[:alnum:]]\@<!\%(\%(\%(http\|https\|file\|ftp\|gopher\|telnet\|nntp\|ldap\|rsync\|imap\|pop\|irc\|ircs\|cvs\|svn\|svn+ssh\|git\|ssh\|fish\|sftp\):\%(\/\/\)\)\|\%(mailto\|news\|xmpp\|sip\|sips\|doi\|urn\|tel\):\)\S\{-1,}\%(([^
              >> > \t()]*)\)\=\S*/
              >> > - scroll with <c-e> and <c-y>
              >> > - extremely slow with re=0 but normal slow (*wink*) with re=1
              >> >
              >> > So, can this behavior be considered as a bug?
              >>
              >> The main problem is the [[:alnum:]]\@<! item. Have you tried adding a
              >> limit? Should be [[:alnum:]]\@1<!
              >>
              >> Or perhaps using \< instead will work.
              >>
              >> For measuring times: use :syntime.
              >
              > Could not it deduce a limit by itself in such simple cases?


              In this case the [[:alnum:]]\@<! is likely to be the root cause
              as was already said.

              However, alternative with \| are also costly because as far
              as I know, the regexp engine has to either backtrack (with
              old engine, re=1) or look for all alternatives at the same
              time (with new engine, re=0).

              I wonder: does the regexp engine attempts to simplify regexps
              with alternatives? I'm thinking about this kind of optimizations:

              https\|http --> https\=
              svn+ssh\|svn --> svn\%(+ssh\)\=

              So at least the common prefix is checked only once in the new
              engine. And with the old engine, there is no backtracking when
              matching fails already in the common prefix.

              Regards
              Dominique

              --
              --
              You received this message from the "vim_dev" maillist.
              Do not top-post! Type your reply below the text you are replying to.
              For more information, visit http://www.vim.org/maillist.php

              ---
              You received this message because you are subscribed to the Google Groups "vim_dev" group.
              To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
              For more options, visit https://groups.google.com/groups/opt_out.
            • Bram Moolenaar
              ... No, the new engine does not find common substrings. It would help to implement this, pushing alternatives on the state list is consuming a lot of the
              Message 6 of 6 , Jul 9, 2013
                Dominique pelle wrote:

                > Nikolay Pavlov wrote:
                >
                > >
                > > On Jul 9, 2013 4:58 PM, "Bram Moolenaar" <Bram@...> wrote:
                > >>
                > >>
                > >> Grüner_Gimpel wrote:
                > >>
                > >> > I noticed that using Vim 7.4a the Vimwiki plugin works measurably
                > >> > slower with re=0 than with re=1. I figured out the reason is this
                > >> > regexp in the syntax definition:
                > >> >
                > >> >
                > >> > [[:alnum:]]\@<!\%(\%(\%(http\|https\|file\|ftp\|gopher\|telnet\|nntp\|ldap\|rsync\|imap\|pop\|irc\|ircs\|cvs\|svn\|svn+ssh\|git\|ssh\|fish\|sftp\):\%(\/\/\)\)\|\%(mailto\|news\|xmpp\|sip\|sips\|doi\|urn\|tel\):\)\S\{-1,}\%(([^
                > >> > \t()]*)\)\=\S*
                > >> >
                > >> > You may know what it is supposed to match.
                > >> > Removing some of the branches makes it faster and so does removing
                > >> > [[:alnum:]]\@<!
                > >> >
                > >> > I don't know if there is a standard method for measuring speed of the
                > >> > syntax highlight, but here is what I have done:
                > >> > - open a relatively large file
                > >> > - :set syn=off
                > >> > - :syntax match Number
                > >> > /[[:alnum:]]\@<!\%(\%(\%(http\|https\|file\|ftp\|gopher\|telnet\|nntp\|ldap\|rsync\|imap\|pop\|irc\|ircs\|cvs\|svn\|svn+ssh\|git\|ssh\|fish\|sftp\):\%(\/\/\)\)\|\%(mailto\|news\|xmpp\|sip\|sips\|doi\|urn\|tel\):\)\S\{-1,}\%(([^
                > >> > \t()]*)\)\=\S*/
                > >> > - scroll with <c-e> and <c-y>
                > >> > - extremely slow with re=0 but normal slow (*wink*) with re=1
                > >> >
                > >> > So, can this behavior be considered as a bug?
                > >>
                > >> The main problem is the [[:alnum:]]\@<! item. Have you tried adding a
                > >> limit? Should be [[:alnum:]]\@1<!
                > >>
                > >> Or perhaps using \< instead will work.
                > >>
                > >> For measuring times: use :syntime.
                > >
                > > Could not it deduce a limit by itself in such simple cases?
                >
                >
                > In this case the [[:alnum:]]\@<! is likely to be the root cause
                > as was already said.
                >
                > However, alternative with \| are also costly because as far
                > as I know, the regexp engine has to either backtrack (with
                > old engine, re=1) or look for all alternatives at the same
                > time (with new engine, re=0).
                >
                > I wonder: does the regexp engine attempts to simplify regexps
                > with alternatives? I'm thinking about this kind of optimizations:
                >
                > https\|http --> https\=
                > svn+ssh\|svn --> svn\%(+ssh\)\=
                >
                > So at least the common prefix is checked only once in the new
                > engine. And with the old engine, there is no backtracking when
                > matching fails already in the common prefix.

                No, the new engine does not find common substrings. It would help to
                implement this, pushing alternatives on the state list is consuming a
                lot of the time. Anyone feel like looking into this?

                --
                Did Adam and Eve have navels?

                /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
                /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
                \\\ an exciting new programming language -- http://www.Zimbu.org ///
                \\\ help me help AIDS victims -- http://ICCF-Holland.org ///

                --
                --
                You received this message from the "vim_dev" maillist.
                Do not top-post! Type your reply below the text you are replying to.
                For more information, visit http://www.vim.org/maillist.php

                ---
                You received this message because you are subscribed to the Google Groups "vim_dev" group.
                To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
                For more options, visit https://groups.google.com/groups/opt_out.
              Your message has been successfully submitted and would be delivered to recipients shortly.