
67783 Re: [patch] improved equivalent classes in regular expressions

  • Dominique Pellé
    Jan 21, 2013
      Christian Brabandt wrote:

      > Hi Dominique!
      >
      > On Mi, 16 Jan 2013, Dominique Pellé wrote:
      >
      >> When using the equivalence class [[=x=]], I realized that what I
      >> generally want is to use it on full strings rather than on
      >> single characters. Searching for "foobar" with...
      >>
      >> /[[=f=]][[=o=]][[=o=]][[=b=]][[=a=]][[=r=]]
      >>
      >> ... works but is rather unpleasant. I wish there was a flag
      >> such as \q to switch on equivalence classes, which would
      >> work like \c for case insensitivity. So instead of the above
      >> regexp, I could search for:
      >>
      >> /\qfoobar
      >>
      >> As far as I know \q is unused in Vim regexp, so
      >> that should not break compatibility.
      >>
      >> Maybe there could also be a function normalize({expr})
      >> (any better name?) that, given a string with diacritics
      >> "fòóbâr", returns "foobar" in a similar way to tolower({expr}),
      >> which returns a lowercase version of the string.
      >>
      >> Before I spend time trying to do that, would it be useful
      >> and accepted?
      >
      > Indeed, that looks like a useful addition.

      I have no time now for that unfortunately, but maybe in a few weeks.
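
The two ideas quoted above can be sketched outside of Vim. This is a minimal illustration in Python, not the proposed Vim implementation: `strip_diacritics` is a hypothetical stand-in for the suggested normalize({expr}), and `equivalence_pattern` is a hypothetical helper that writes out the verbose pattern a \q flag would make unnecessary. It assumes that Unicode canonical decomposition (NFD) is an acceptable approximation of "equivalent"; Vim's own equivalence-class tables may differ.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Hypothetical analogue of the proposed normalize({expr}).

    Decomposes each character (NFD) and drops the combining marks,
    so that 'ò' becomes 'o'. Assumption: NFD is close enough to the
    notion of equivalence discussed in the thread.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def equivalence_pattern(word: str) -> str:
    """Spell out one [[=x=]] equivalence class per character of word."""
    return "".join(f"[[={ch}=]]" for ch in word)


if __name__ == "__main__":
    print(strip_diacritics("fòóbâr"))
    print("/" + equivalence_pattern("foobar"))
```

Letters with a stroke, such as U+0141, have no canonical decomposition, so this sketch leaves them unchanged; a real implementation would need an explicit table for them.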

      > I have another idea with regards to equivalence classes:
      > When searching for /[[=ß=]] this should translate into /sz. But that is
      > more complicated, since a search for /[s][z] wouldn't match ß (Eszett)
      > anymore.

      You obviously speak better German than me, but isn't the German
      ess-zett equivalent to ss rather than sz? I'm curious why /sz.
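
For comparison only, Unicode full case folding expands ß to "ss", which Python exposes as str.casefold(). The sketch below shows that expansion and the length change that makes a one-character equivalence class awkward here. The helper name `fold` is hypothetical, and none of this describes how Vim's equivalence classes treat ß.

```python
def fold(text: str) -> str:
    """Apply Unicode full case folding (hypothetical helper name)."""
    return text.casefold()


if __name__ == "__main__":
    word = "Straße"
    folded = fold(word)
    # One character (ß) becomes two (ss), so the folded string is longer.
    print(word, len(word))
    print(folded, len(folded))
```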

      >> Regarding the few characters that are no longer equivalent,
      >> I find it odd from a user's point of view. For example, U+01E4
      >> (LATIN CAPITAL LETTER G WITH STROKE) was equivalent
      >> to uppercase G but it is no longer equivalent to G.
      >> Yet some other letters with stroke are still equivalent.
      >> For example, U+0141 (LATIN CAPITAL LETTER L WITH STROKE)
      >> is still equivalent to L. It seems inconsistent, even if that's
      >> what the ISO standard says. Previous behavior made more
      >> sense to me for U+01E4 at least.
      >
      > Fixed with the latest patch.

      Yes, I saw that. Thanks!

      --
      You received this message from the "vim_dev" maillist.
      Do not top-post! Type your reply below the text you are replying to.
      For more information, visit http://www.vim.org/maillist.php