Issue 99 in vim: Feature: Extended regular expressions

  • vim@googlecode.com
    Status: New Owner: ---- Labels: Type-Defect Priority-Medium New issue 99 by m...@morearty.com: Feature: Extended regular expressions
    Message 1 of 4 , Dec 11, 2012
      Status: New
      Owner: ----
      Labels: Type-Defect Priority-Medium

      New issue 99 by m...@...: Feature: Extended regular expressions
      http://code.google.com/p/vim/issues/detail?id=99

      (See https://code.google.com/r/mike-vim-extended-regex/ for the source code
      of this feature. Diffs are here:
      https://code.google.com/r/mike-vim-extended-regex/source/list )

      I've implemented support for extended regular expressions in Vim, somewhat
      similar to Perl's extended regex feature, which allows you to make
      complicated regexes (especially in Vimscript files) easier to read, by
      including whitespace and comments in them. (Vim already allows multiline
      regexes.) I'm hoping that, after any changes suggested on this forum, this
      will be a useful addition to Vim.

      One of the trickiest, and perhaps most contentious, parts is choosing the
      syntax to use -- how to turn on extended mode, and what comments should
      look like. I am very open to feedback and changes on this. Below, I
      present the reasoning behind my initial choices.

      This is what I have implemented:

      - To turn on extended mode, put \# at the beginning of your regex.
      - A comment is enclosed in double-braces, like {{ this }}.
      - To match a space rather than having it be ignored, use "\ ".

      Here is a simple example. syntax/c.vim includes this, for syntax
      highlighting of backslash-escaped sequences inside strings in C:

      " String and Character constants
      " Highlight special characters (those which have a backslash) differently
      syn match cSpecial display contained "\\\(x\x\+\|\o\{1,3}\|.\|$\)"

      With extended regular expressions, the above could be written with
      whitespace and comments:

      " String and Character constants
      " Highlight special characters (those which have a backslash) differently
      syn match cSpecial display contained
            \ "\#
            \ \\              {{ literal backslash, followed by one of... }}
            \ \(
            \   x \x\+        {{ hex, e.g. '\x2c' }}
            \ \|
            \   \o\{1,3}      {{ octal, e.g. '\755' }}
            \ \|
            \   .             {{ e.g. '\n' or '\t' etc. }}
            \ \|
            \   $             {{ end of line }}
            \ \)"
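
The three rules listed earlier can be modelled outside Vim. The following is a minimal Python sketch, not the patch's C implementation. The function name `strip_extended` is hypothetical. The sketch assumes the continuation lines have already been joined into one string, and it ignores the restriction on whitespace inside collections and multi-character sequences. Comment nesting, which the message describes further down, is tracked with a depth counter.

```python
def strip_extended(pattern: str) -> str:
    """Reduce a proposed extended-mode pattern to the equivalent plain pattern.

    Hypothetical model of the rules in this message:
      * a leading \\# switches extended mode on,
      * {{ ... }} is a comment (nesting allowed),
      * unescaped whitespace is ignored, and "\\ " matches a literal space.
    """
    if not pattern.startswith("\\#"):
        return pattern  # not in extended mode: leave untouched

    out = []
    depth = 0          # current comment nesting depth
    i = 2              # skip the leading \#
    n = len(pattern)
    while i < n:
        two = pattern[i:i + 2]
        if two == "{{":
            depth += 1
            i += 2
        elif two == "}}" and depth > 0:
            depth -= 1
            i += 2
        elif depth > 0:
            i += 1     # inside a comment: drop the character
        elif pattern[i] == "\\" and i + 1 < n:
            # A backslash always travels with the next character, so "\\\\"
            # followed by a space is not mistaken for an escaped space.
            out.append(" " if pattern[i + 1] == " " else two)
            i += 2
        elif pattern[i].isspace():
            i += 1     # ignorable whitespace
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)


# The cSpecial pattern above, after Vim has joined the continuation lines.
extended = (r"\# \\ {{ literal backslash, followed by one of... }} "
            r"\( x \x\+ {{ hex, e.g. '\x2c' }} \| \o\{1,3} "
            r"{{ octal, e.g. '\755' }} \| . {{ e.g. '\n' or '\t' etc. }} "
            r"\| $ {{ end of line }} \)")
print(strip_extended(extended))  # prints \\\(x\x\+\|\o\{1,3}\|.\|$\)
```

Under these assumptions, the annotated pattern reduces to exactly the one-line pattern currently in syntax/c.vim.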

      I have not yet written tests or docs. If you want, I would be happy to do
      so.

      As for the syntax: Obviously it is best not to invent a brand new syntax
      unless there is a good reason to do so. I would have preferred to use
      Perl's syntax, which is:

      - To turn on extended mode, use "x" in the flags area after the regex, e.g.
      /foo/x
      - A comment begins with (?# and ends with )

      Unfortunately, neither one of those worked out especially well in Vim. For
      turning on extended mode, Vim makes only very light use of "flags" after
      regular expressions. In fact, although it allows a few flags after the :s
      (substitute) command, in general it doesn't use flags after regular
      expressions. In Vim, usually the same effect is achieved by putting
      special codes at the beginning of a regex, such as \c to ignore case.

      And for comments, using (?# ... ) would work, but would be somewhat
      awkward. In Perl, both the () operator and the ? operator are "magic" by
      default (do not need to be escaped with a backslash to give them special
      meaning). But in Vim, the opposite is true: By default, () just matches
      parentheses, and ? just matches a question mark. So in a Vim regex, a
      comment would look like \(\?# this \), which is just too ugly and too
      tricky for people to remember.

      So I played around with a number of alternative syntax options.

      -----

      1. Syntax for turning on extended mode:

      Consistent with other regex syntax in Vim, it seemed to me that the best
      way to let the user turn on extended mode would be the presence of some
      special sequence at the beginning of the regex, similar to Vim's current
      use of \c or \C for case sensitivity, \m \M \v \V to choose a "magic" mode,
      and so on. Here is a list of all available one-character backslash
      sequences:

      \! \" \# \$ \' \, \- \: \; \g \j \q \y \^ \`

      I would have liked to use \x or \e to indicate extended mode, but both of
      those are already used. (\x means any hex digit; \e means the escape key.)

      Given those choices, my favorite was:

      \#

      ... mainly because "#" is used in many programming languages to begin a
      comment.

      Other possibilities: Vim already uses \% and \z as prefixes for a number of
      other commands, so two options that seem pretty good to me are:

      \%e
      or
      \zx

      I sort of like \%e. It has the advantage of being somewhat mnemonic (e for
      extended), and also it avoids using up a punctuation character (#) that
      might be better saved for other future enhancements.

      -----

      2. Syntax for comments:

      One issue is: Should turning comments on/off require "magic" characters or
      not? At first I thought, of course it would have to include magic
      characters; but then it occurred to me that we could just use a character
      sequence that is somewhat unlikely to appear in regexes, and that is easy
      to represent as regular characters (rather than comment delimiters) in a
      regex if necessary.

      I like {{ double braces }} because:

      - They look nice and are easy to type.
      - They don't conflict with any other regex syntax patterns. Yes, braces
      are used to indicate a count, e.g. x\{1,3} for one to three x's, but that
      uses single braces.
      - It is easy to represent a match for the actual characters "{{" in an
      extended regex: Just put a space between them, "{ {".

      Other options:

      If we use \# to turn on extended mode, I thought it might be nice to use
      some sort of comment delimiter that includes the "#" character, but I
      couldn't come up with anything that good. The best I could come up with is
      ## to begin a comment and ## again to end a comment, but that could lead to
      trouble if the user tries to mark off a comment with "#############".
      Other possibilities:
      #( )
      {# #}

      We can't use "#" by itself for comments, with end-of-line indicating the
      end of the comment, because of the way Vim multiline strings work. In Vim,
      when you write

      let x = "this is
      \ a string"

      What you get is, "this is a string". There is no embedded newline in the
      result.
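
The joining behaviour described above can be modelled in a few lines. This is a rough Python sketch, assuming Vim's default line-continuation handling, in which a line whose first non-blank character is a backslash is appended to the previous line. The helper name `join_continuation_lines` is made up for illustration. It shows why a "#"-to-end-of-line comment would have nothing to terminate it.

```python
def join_continuation_lines(lines):
    """Model Vim line continuation: a line whose first non-blank character
    is a backslash is appended to the previous line, minus that backslash
    and the whitespace before it."""
    joined = []
    for line in lines:
        stripped = line.lstrip()
        if stripped.startswith("\\") and joined:
            joined[-1] += stripped[1:]
        else:
            joined.append(line)
    return joined


script = [
    'let x = "this is',
    '      \\ a string"',
]
result = join_continuation_lines(script)
print(result[0])             # prints let x = "this is a string"
print("\n" in result[0])     # prints False
```

Because the joined string contains no newline, a comment that runs "to end of line" would swallow the rest of the regex, which is why a closing delimiter is needed.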

      I also thought it might be nice to somehow use the " double-quote character
      to indicate comments, since that is Vimscript's comment character; but the
      double-quote character would be a bad choice because often, in Vimscript,
      the regex itself is double-quoted, so you would have to backslash-escape
      all the embedded double-quote characters, which would get a bit messy.

      -----

      A few more details about the syntax:

      - Comments support nesting. This is mainly useful while debugging your
      regex, to "comment out" part of it.

      - Comments and extra whitespace are not allowed inside collections such
      as [a-z], inside repetition indicators such as {1,3}, in the middle of
      special sequences such as "\%$", and so on.

      --
      You received this message from the "vim_dev" maillist.
      Do not top-post! Type your reply below the text you are replying to.
      For more information, visit http://www.vim.org/maillist.php
    • Mike Morearty
      Message 2 of 4 , Dec 17, 2012
        On Tuesday, December 11, 2012 4:55:50 PM UTC-8, v...@... wrote:
        > Status: New
        > Owner: ----
        > Labels: Type-Defect Priority-Medium
        >
        > New issue 99 by m...@...: Feature: Extended regular expressions
        > http://code.google.com/p/vim/issues/detail?id=99

        Hi, I just wanted to follow up on this feature patch. There has been no discussion or response of any kind, and I just wanted to make sure I did it right. If it's simply a matter of people having more important things they need to get to first, then that's fine.

        This patch adds extended regular expression support to Vim. I added it to the bugbase on code.google.com, and that got auto-forwarded to this mailing list. Is that the right way for me to suggest a feature and offer a patch?

        Thanks! - Mike Morearty <mike@...>

      • Ben Fritz
        Message 3 of 4 , Dec 17, 2012
          On Monday, December 17, 2012 11:08:45 AM UTC-6, Mike Morearty wrote:
          > On Tuesday, December 11, 2012 4:55:50 PM UTC-8, v...@... wrote:
          > > Status: New
          > > Owner: ----
          > > Labels: Type-Defect Priority-Medium
          > >
          > > New issue 99 by m...@...: Feature: Extended regular expressions
          > > http://code.google.com/p/vim/issues/detail?id=99
          >
          > Hi, I just wanted to follow up on this feature patch. There has been no discussion or response of any kind, and I just wanted to make sure I did it right. If it's simply a matter of people having more important things they need to get to first, then that's fine.
          >
          > This patch adds extended regular expression support to Vim. I added it to the bugbase on code.google.com, and that got auto-forwarded to this mailing list. Is that the right way for me to suggest a feature and offer a patch?

          Yes, this is a correct way to do it. Bypassing the tracker and going right to the list is generally preferred, but all issues entered come here anyway.

          Eventually Bram may add the patch to his todo list, available at http://code.google.com/p/vim/source/browse/runtime/doc/todo.txt

          For what it's worth, I think your idea is a good one in general. I don't have much of an opinion on the syntax at this point.

          This is mostly only useful in a script, so you will need to make sure that it does not interfere with interactive mode. Since you trigger the feature using a new \# flag it shouldn't interfere.

          I'm not sure if there's any existing convention on the introduction of new regex items. The kind of modifier you are adding, which affects the entire pattern, is normally 2 characters now, e.g. \v \C \V etc.

        • vim@...
          Updates: Labels: patch Comment #1 on issue 99 by chrisbr...@googlemail.com: Feature: Extended regular expressions
          Message 4 of 4 , Jan 9
            Updates:
            Labels: patch

            Comment #1 on issue 99 by chrisbr...@...: Feature: Extended
            regular expressions
            https://code.google.com/p/vim/issues/detail?id=99

            (No comment was entered for this change.)

            --
            You received this message because this project is configured to send all
            issue notifications to this address.
            You may adjust your notification preferences at:
            https://code.google.com/hosting/settings

            ---
            You received this message because you are subscribed to the Google Groups "vim_dev" group.
            To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
            For more options, visit https://groups.google.com/d/optout.