
Possible bug(s) in new regex engine involving \@> and \?

  • Brett Stahlman
    Message 1 of 10 , Dec 21, 2013
      Possible bugs in new regex engine involving \@> and \?

      Using the following line of text...
      0123456789

      ...run the following two :substitute commands with both old and new regex engine, and notice the differences...

      s/\(01\)\(23\)\@>\(.*\)/--\1--\2--\3/
      Old (\%#=1)
      --01--23--456789
      New (\%#=2)
      ----23--456789

      s/\(01\)\(23\d\@=\)\?\(.*\)/--\1--\2--\3/
      Old (\%#=1)
      --01--23--456789
      New (\%#=2)
      --01----23456789

      Note: The \d\@= in the second example could be replaced with other matching zero-width assertions (e.g., \%v) without changing the results.

      Brett S.

      --
      --
      You received this message from the "vim_use" maillist.
      Do not top-post! Type your reply below the text you are replying to.
      For more information, visit http://www.vim.org/maillist.php

      ---
      You received this message because you are subscribed to the Google Groups "vim_use" group.
      To unsubscribe from this group and stop receiving emails from it, send an email to vim_use+unsubscribe@....
      For more options, visit https://groups.google.com/groups/opt_out.
    • Marcin Szamotulski
      Message 2 of 10 , Dec 22, 2013

        Hi,

        Which version of Vim are you using? I cannot reproduce the first one
        here (Vim 7.4.126, GNU/Linux), but I can reproduce the second one.

        Best regards,
        Marcin

      • Marcin Szamotulski
        Message 3 of 10 , Dec 22, 2013


          It seems that there is a problem with grouping, since even these
          patterns fail with 'set re=2':
          :s/\(01\)\(23\%(\d\@=\)\)\?\(.*\)/--\1--\2--\3
          :s/\(01\)\(23\%(\d\)\@=\)\?\(.*\)/--\1--\2--\3


          Best,
          Marcin

        • Brett Stahlman
          Message 4 of 10 , Dec 22, 2013

            VIM - Vi IMproved 7.4 (2013 Aug 10, compiled Aug 10 2013 14:38:33)
            MS-Windows 32-bit GUI version with OLE support
            Compiled by mool@tororo
            Big version with GUI. Features included (+) or not (-):

            Brett S.
          • Brett Stahlman
            Message 5 of 10 , Dec 22, 2013
              On Sunday, December 22, 2013 3:05:06 PM UTC-6, coot_. wrote:
              > It seems that there is a problem with grouping, since even these
              > patterns fail with 'set re=2':

              Yes. But without the \?, re=2 behaves as expected.
              Brett S.

              > :s/\(01\)\(23\%(\d\@=\)\)\?\(.*\)/--\1--\2--\3
              > :s/\(01\)\(23\%(\d\)\@=\)\?\(.*\)/--\1--\2--\3
            • Bram Moolenaar
              Message 6 of 10 , Jan 5, 2014

                I'll add a remark in the todo list. Thanks for the examples.
                Can you simplify them further? Can you also see the effect with only a
                search?

                --
                Laughing helps. It's like jogging on the inside.

                /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
                /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
                \\\ an exciting new programming language -- http://www.Zimbu.org ///
                \\\ help me help AIDS victims -- http://ICCF-Holland.org ///

              • Brett Stahlman
                Message 7 of 10 , Jan 6, 2014

                  No problem. How about this?
                  s/\(1\d\@=\)\?\(.*\)/\1:\2/

                  Executed on the following line...
                  123
                  ...the new regexp engine produces...
                  :123

                  Note that if I remove the * from the second capturing group, the substitute works as expected. It's as though the \? is not behaving greedily when it's followed by something capable of eating what it leaves behind.

                  I do see the same behavior in a search: in particular, incremental searching highlights "123" for the following pattern:
                  \(1\d\@=\)\?\zs\(.*\)
                  With the old regexp engine, only "23" is highlighted, as expected.
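A minimal sketch for reproducing the comparison from script rather than by switching 'regexpengine' by hand. The function name CompareEngines is hypothetical (it is not part of Vim or of this thread); it only relies on substitute() and the \%#=1 / \%#=2 pattern prefixes. On the reduced example, the old engine is expected to give '1:23', while the new engine was observed above to give ':123' on the affected version.

```vim
" Hypothetical helper, not from the thread: apply one substitution under
" each regexp engine by prefixing the pattern with \%#=1 or \%#=2.
function! CompareEngines(text, pat, sub) abort
  let l:results = {}
  for l:engine in [1, 2]
    " \%#=N must be the very first item in the pattern.
    let l:results[l:engine] = substitute(a:text, '\%#=' . l:engine . a:pat, a:sub, '')
  endfor
  return l:results
endfunction

" Key 1 holds the old engine's result, key 2 the new engine's.
echo CompareEngines('123', '\(1\d\@=\)\?\(.*\)', '\1:\2')
```

If the two values in the returned dictionary differ, the engines disagree on that pattern.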

                  Thanks,
                  Brett Stahlman


                • Brett Stahlman
                  Message 8 of 10 , Jan 7, 2014

                    The behavior isn't limited to \?. Other quantifiers that allow zero matches (e.g., *) appear to behave the same way. Moreover, any type of assertion within the group appears to trigger it (though the assertion must come after the piece that matches, not before). Here are some other examples to illustrate:
                    Note: These patterns should match at the "2" in "123".

                    \(11\@<=\)*\zs\d*
                    \(11\@!\)*\zs\d*
                    Note: This one works as expected!
                    \(2\@!1\)*\zs\d*

                    Note: If you remove the * at the end of the pattern, all examples work correctly! It's as though the presence of the zero-width assertion within the first group causes the group's quantifier to behave non-greedily, if and only if there's a subsequent greedy quantifier capable of consuming whatever the first quantifier leaves.
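The three patterns above can be checked in one pass with a small sketch like the following. ZsMatches is a hypothetical helper name; it assumes only matchstr(), which returns the text matched after \zs, and the \%#=1 / \%#=2 engine prefixes. A row is consistent with the description above when both engines return '23' for the line "123".

```vim
" Hypothetical helper: for each pattern, collect
" [pattern, old-engine match, new-engine match] on the given text.
function! ZsMatches(text, patterns) abort
  let l:rows = []
  for l:pat in a:patterns
    let l:old = matchstr(a:text, '\%#=1' . l:pat)
    let l:new = matchstr(a:text, '\%#=2' . l:pat)
    call add(l:rows, [l:pat, l:old, l:new])
  endfor
  return l:rows
endfunction

for row in ZsMatches('123', ['\(11\@<=\)*\zs\d*', '\(11\@!\)*\zs\d*', '\(2\@!1\)*\zs\d*'])
  echo row
endfor
```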

                    Thanks,
                    Brett Stahlman

                  • Brett Stahlman
                    Message 9 of 10 , Nov 5, 2014
                      Not sure whether this is the same issue or not, but it seems to be
                      similar, at least, so I'm including it on this thread. If nothing
                      else, it can serve as a simple test for any eventual code fix...

                      --Old Engine--
                      echo matchlist('ababa', '\%#=1^\(a\%(ba\)*\(b\|$\)\)\?\(.*\)', '')
                      ['ababa', 'ababa', '', '', '', '', '', '', '', '']

                      --New Engine--
                      echo matchlist('ababa', '\%#=2^\(a\%(ba\)*\(b\|$\)\)\?\(.*\)', '')
                      ['ababa', 'abab', 'b', 'a', '', '', '', '', '', '']

                      Note how the new engine fails to match the second "ba" in the first
                      capture, apparently because the later...
                      \(b\|$\)
                      ...takes the b and won't give it back.

                      Thanks,
                      Brett S.

                    • Bram Moolenaar
                      Message 10 of 10 , Nov 5, 2014

                        It's probably a different problem, since this pattern doesn't use \@>.
