
301 Re: [jasspa] Greediness of regexp '+', '*' operators

  • Jon
    Sep 13, 2000
      Steve and I had a lot of discussion about this when
      the new RE engine was developed. The old behaviour was the
      minimal match, as you point out below, which did appear to
      be a little more logical (I had actually modified the search
      to this behaviour years ago). However, it is very confusing when
      you move to other packages where the search has the same syntax
      and you get different results. For this reason it was more prudent
      to conform with other packages, which basically means that
      your RE must be unambiguous. Hence, for the search below, one
      would use:

      "<FONT[^>]*>"
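
To make the difference concrete, here is a minimal sketch that uses Python's `re` module as a stand-in for a greedy RE engine. That choice is an assumption made for illustration: it is not the ME engine, and ME's own syntax may differ in details. It runs both patterns from this thread against the first line of Thomas's example HTML.

```python
import re

# First line of the example HTML quoted in the original message.
line = ('<TD><NOBR><FONT FACE="Verdana, MS Sans Serif, Geneva" SIZE="-1">'
        '<B>Mixed Drinks/Liquor</B></FONT></NOBR></TD>')

# Greedy '.+' runs on to the LAST '>' on the line, so the replacement
# swallows everything after the opening FONT tag as well.
greedy = re.sub(r'<FONT.+>', '', line)

# '[^>]*' cannot cross a '>', so the match ends at the tag's own '>'.
specific = re.sub(r'<FONT[^>]*>', '', line)

print(greedy)    # <TD><NOBR>
print(specific)  # <TD><NOBR><B>Mixed Drinks/Liquor</B></FONT></NOBR></TD>
```

Note that the closing `</FONT>` is untouched by either pattern, since it starts with `</FONT` and not `<FONT`.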

      In fact the old shortened search actually used to fail more
      because it used to bail out earlier. One could specify a
      RE that was quite clearly within a line and would
      not find it because it never looked far enough (OK - I admit
      the old search engine was flawed).

      I would also point out that when you specify the shortened
      RE you also sometimes do not get what you want. In the
      same way that you are getting "too much" matching
      below, with the shortened RE you sometimes do not
      "get enough". So to be honest I think adding it would just
      make the RE syntax a little bigger and leave you with 2 problems
      instead of one !! (One also has to bear in mind that the
      search engine is a real hairy piece of code and is not
      to be messed with lightly).
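
A small sketch of the "not enough" case, again using Python's `re` module as an assumed stand-in (its `+?` is the minimal-match operator from Perl, not anything ME provides). When a minimal quantifier is the last thing in the pattern, it stops as soon as it possibly can:

```python
import re

# The opening FONT tag from the example HTML in this thread.
tag = '<FONT FACE="Verdana, MS Sans Serif, Geneva" SIZE="-1">'

# Minimal match: '.+?' at the end of the RE is satisfied by one character.
minimal = re.search(r'FACE=".+?', tag).group(0)

# Greedy match: '.+' takes the whole rest of the line.
greedy = re.search(r'FACE=".+', tag).group(0)

# Unambiguous RE: say exactly where the value ends.
specific = re.search(r'FACE="[^"]*"', tag).group(0)

print(minimal)   # FACE="V
print(greedy)    # FACE="Verdana, MS Sans Serif, Geneva" SIZE="-1">
print(specific)  # FACE="Verdana, MS Sans Serif, Geneva"
```

Neither quantifier on its own gives the attribute value; only the specific RE does.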

      So, I've kind of made up my mind that the greedy RE is better -
      you just have to be a little bit more specific as to what
      you want. Steve's new RE engine is now real fast and works
      a treat for incremental searches with '*'s and '+'s
      present (the old one used to be dead slow).

      Well that's the end of my ramblings !!

      Jon.

      Thomas Hundt wrote:
      >
      >
      > When used in isearch-forward or query-replace-string regular expressions, the '+' and '*' quantifiers will match as many characters as possible, apparently stopping at a newline.
      >
      > For example, I wanted to remove the FONT tags in the html below, by doing a query-replace-string of "<FONT.+>" with "". But ME went and matched not what I wanted ("<FONT FACE="Verdana, MS Sans Serif, Geneva" SIZE="-1">") but the whole rest of the line, too: "<FONT FACE="Verdana, MS Sans Serif, Geneva" SIZE="-1"><B>Mixed Drinks/Liquor</B></FONT></NOBR></TD>". The "+" matched as many characters as possible. Some people call this "greediness".
      >
      > This is a problem not just in ME, but crops up in various places. One way of dealing with it (seen in TCL and Perl) is a "?" qualifier used after the "*" or "+" to tell it to act in non-greedy fashion, i.e., to match as few characters as possible. I think it would be nice if ME had something like this.
      >
      > [example html code]
      > <TD><NOBR><FONT FACE="Verdana, MS Sans Serif, Geneva" SIZE="-1"><B>Mixed Drinks/Liquor</B></FONT></NOBR></TD>
      > <TD><NOBR><FONT FACE="Verdana, MS Sans Serif, Geneva" SIZE="-1"><B>Wine</B></FONT></NOBR></TD>
      > <TD><NOBR><FONT FACE="Verdana, MS Sans Serif, Geneva" SIZE="-1"><B>Beer</B></FONT></NOBR></TD>
      > </TR>
      >
      > -Th
      >
      > __________________________________________________________________________
      >
      > This is an unmoderated list. JASSPA is not responsible for the content of
      > any material posted to this list.