Re: GSoC Regexp engine
Nice to hear from you!
On 5/31/07, Ian Young <ian.greenleaf@...> wrote:
> Hi all,
> I'm Ian, one of the two students working on improving the regexp
> engine in Vim for this year's Google Summer of Code. I haven't had a
> whole lot to contribute as of yet, but now that work is underway, I'll
> probably pop up here asking lots of questions some days.
> Right now we're working on getting things set up and building a
> testing suite, but I thought I would spark some discussion on a design
> decision that will be coming up after we finish this phase, which is
> whether to implement the new model ourselves, or use an alternative
> engine, like TRE: <http://laurikari.net/tre/>. I'm tempted to
> implement one ourselves, as it's an intellectually stimulating
Yes, and share the fun with us too...
> prospect, but that doesn't mean I won't listen to reason if TRE or
> another option is far better. I don't know much about the internals of
> TRE, but according to previous posts to this list, it utilizes three
> engines: a slow one for handling backreferences (presumably similar to
> Vim's current engine), a fast one for most cases (what we are looking
> to implement), and one for their 'fuzzy matching' feature.
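To illustrate that kind of split, here is a minimal C sketch of how a front end might send each pattern to one of two engines. It is not TRE's or Vim's code. It assumes the Vim-style backreferences \1 to \9 are the only feature that forces the slow path, and it ignores \z1 and similar. The names pattern_has_backref, choose_engine and the enum values are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical engine choices: a backtracking matcher that can handle
 * backreferences, and a fast automaton-based matcher that cannot. */
enum engine { ENGINE_BACKTRACK, ENGINE_NFA };

/* Returns true if a Vim-style pattern contains a backreference \1..\9.
 * Assumption: other escapes never require backtracking. */
static bool pattern_has_backref(const char *pat)
{
    for (const char *p = pat; *p != '\0'; p++) {
        if (*p != '\\')
            continue;
        p++;                    /* look at the escaped character */
        if (*p == '\0')
            break;              /* trailing lone backslash */
        if (*p >= '1' && *p <= '9')
            return true;        /* \1 .. \9 is a backreference */
        /* Any other escape, including "\\", is skipped as a unit, so
         * "\\1" (literal backslash, then '1') is not a backreference. */
    }
    return false;
}

/* Route the pattern: slow engine only when it is actually needed. */
static enum engine choose_engine(const char *pat)
{
    return pattern_has_backref(pat) ? ENGINE_BACKTRACK : ENGINE_NFA;
}
```

With this design, the common case never pays for backtracking. Only patterns that really use backreferences fall back to the slower matcher.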
> I have a couple of questions to start things off. First: I couldn't see
> much need for 'fuzzy matching' in Vim, but some of you are probably
> much better acquainted with regexp use cases than I am. Would this be
> a useful feature to have available?
From my previous experiences with the list, I think it can be left out
for now. (And Nikolai believes that even without the fuzzy stuff it would
take a hell of a lot of effort.)
> Second: We might have to do some
> gymnastics to work with multibyte characters, as discussed here:
> <http://tech.groups.yahoo.com/group/vimdev/message/46408>. I haven't
> worked with multibyte characters before, so I'm not clear on the
> Would this translation to wide characters before passing
> to the engine cause much of a performance hit and/or be excessively
> complicated to implement? On a side note, TRE's main page says it has
> both wide character and multibyte character support. I couldn't find a
> version history, so I'm not sure if this is a new feature that Nikolai
> isn't aware of, or if we need something more.
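On the performance question, the translation itself is one linear pass over the bytes. Its cost is roughly that of copying the text into a buffer with 4 bytes per character. Below is a minimal C sketch. It assumes the input is UTF-8 and that "wide" means 32-bit code points. Vim also supports other encodings, which this does not cover. The function name utf8_to_wide is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Smallest code point that is legal for 1-, 2-, 3- and 4-byte sequences;
 * anything below it is an overlong encoding. */
static const uint32_t min_cp[4] = {0x0, 0x80, 0x800, 0x10000};

/* Decode len UTF-8 bytes into 32-bit code points in one pass.
 * out must hold at least len elements (never more code points than bytes).
 * Malformed input becomes U+FFFD. Returns the number of code points. */
static size_t utf8_to_wide(const unsigned char *in, size_t len, uint32_t *out)
{
    size_t i = 0, n = 0;

    while (i < len) {
        unsigned char c = in[i];
        uint32_t cp = 0;
        size_t extra = 0;           /* continuation bytes expected */
        int ok = 1;

        if (c < 0x80)                { cp = c;        extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
        else                         ok = 0;        /* invalid lead byte */

        if (ok && len - i <= extra)
            ok = 0;                                  /* truncated sequence */
        for (size_t k = 1; ok && k <= extra; k++) {
            unsigned char cc = in[i + k];
            if ((cc & 0xC0) != 0x80)
                ok = 0;                              /* not a continuation byte */
            else
                cp = (cp << 6) | (cc & 0x3F);
        }
        if (ok && (cp < min_cp[extra] || cp > 0x10FFFF ||
                   (cp >= 0xD800 && cp <= 0xDFFF)))
            ok = 0;                 /* overlong, out of range, or surrogate */

        if (ok) {
            out[n++] = cp;
            i += extra + 1;
        } else {
            out[n++] = 0xFFFD;      /* replacement character */
            i += 1;                 /* resynchronise on the next byte */
        }
    }
    return n;
}
```

Each bad byte is replaced and decoding moves on, so malformed text cannot stall the loop. A real integration would probably also need to map match positions back to byte offsets, because Vim works with byte columns.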
> I'm interested to hear what you all have to say. We don't need to make
> this decision until the middle of next week at the earliest, but I thought
> I would get the discussion going now.
Best of luck ...