RE: GSoC Regexp engine

  • Asiri Rathnayake
    Jun 3, 2007
      Nice to hear from you!

      On 5/31/07, Ian Young <ian.greenleaf@...> wrote:
      > Hi all,
      > I'm Ian, one of the two students working on improving the regexp
      > engine in Vim for this year's Google Summer of Code. I haven't had a
      > whole lot to contribute as of yet, but now that work is underway, I'll
      > probably pop up here asking lots of questions some days.
      > Right now we're working on getting things set up and building a
      > testing suite, but I thought I would spark some discussion on a design
      > decision that will be coming up after we finish this phase, which is
      > whether to implement the new model ourselves, or use an alternative
      > engine, like TRE: <http://laurikari.net/tre/>. I'm tempted to
      > implement one ourselves, as it's an intellectually stimulating

      Yes, and share the fun with us too...

      > prospect, but that doesn't mean I won't listen to reason if TRE or
      > another option is far better. I don't know much about the internals of
      > TRE, but according to previous posts to this list, it utilizes three
      > engines: a slow one for handling backreferences (presumably similar to
      > Vim's current engine), a fast one for most cases (what we are looking
      > to implement), and one for their 'fuzzy matching' feature.
      > I have a couple questions to start things off. First: I couldn't see
      > much need for 'fuzzy matching' in Vim, but some of you are probably
      > much better acquainted with regexp use cases than I am. Would this be
      > a useful feature to have available?

      From my previous experience with the list, I think it can be left out
      for now. (And Nikolai believes that even without the fuzzy stuff it would
      take a hell of a lot of effort.)

      > Second: We might have to do some
      > gymnastics to work with multibyte characters, as discussed here:
      > <http://tech.groups.yahoo.com/group/vimdev/message/46408>. I haven't
      > worked with multibyte characters before, so I'm not clear on the
      > subtleties.

      Me neither.

      > Would this translation to wide characters before passing
      > to the engine cause much of a performance hit and/or be excessively
      > complicated to implement? On a side note, TRE's main page says it has
      > both wide character and multibyte character support. I couldn't find a
      > version history, so I'm not sure if this is a new feature that Nikolai
      > isn't aware of, or if we need something more.
      > I'm interested to hear what you all have to say. We don't need to make
      > this decision until middle of next week at the earliest, but I thought
      > I would get the discussion going now.
      > Ian
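
      On the wide-character question, here is a rough sketch of what
      "translating to wide characters before passing to the engine" could
      look like for UTF-8 text. It is only an illustration: the function
      name utf8_to_wide is made up, it is not taken from Vim or TRE, and it
      assumes the buffer is UTF-8 (other encodings would need their own
      decoder). It makes a single pass over the bytes, so the cost should
      grow linearly with the length of the text.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of Vim or TRE.
 * Decodes len bytes of UTF-8 from src into out, one code point per element.
 * Returns the number of code points written, or (size_t)-1 if the input is
 * truncated, contains a bad lead or continuation byte, or does not fit in
 * cap elements. Overlong forms and surrogates are NOT rejected here. */
size_t utf8_to_wide(const unsigned char *src, size_t len,
                    uint32_t *out, size_t cap)
{
    size_t i = 0, n = 0;

    while (i < len) {
        unsigned char b = src[i];
        uint32_t cp;
        size_t extra;   /* number of continuation bytes expected */

        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return (size_t)-1;      /* stray continuation or invalid lead */

        if (extra >= len - i)        /* sequence runs past end of input */
            return (size_t)-1;
        if (n >= cap)                /* output buffer too small */
            return (size_t)-1;

        for (size_t k = 1; k <= extra; k++) {
            unsigned char c = src[i + k];
            if ((c & 0xC0) != 0x80)  /* must look like 10xxxxxx */
                return (size_t)-1;
            cp = (cp << 6) | (c & 0x3F);
        }

        out[n++] = cp;
        i += extra + 1;
    }
    return n;
}
```

      The catch is that every match offset the engine reports would then be
      in characters, not bytes, so the caller would have to map it back to a
      byte position in the original line.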

      Best of luck ...

      - Asiri
