Greediness of regexp '+', '*' operators

  • Thomas Hundt
    Message 1 of 1 , Sep 13, 2000
      When used in isearch-forward or query-replace-string regular expressions, the '+' and '*' quantifiers match as many characters as possible, apparently stopping only at a newline.

      For example, I wanted to remove the FONT tags in the HTML below by doing a query-replace-string of "<FONT.+>" with "". But ME matched not just what I wanted ("<FONT FACE="Verdana, MS Sans Serif, Geneva" SIZE="-1">") but the whole rest of the line as well: "<FONT FACE="Verdana, MS Sans Serif, Geneva" SIZE="-1"><B>Mixed Drinks/Liquor</B></FONT></NOBR></TD>". The "+" matched as many characters as possible. Some people call this "greediness".

      This is not just a problem in ME; it crops up in various places. One way of dealing with it (seen in Tcl and Perl) is a "?" qualifier placed after the "*" or "+" to make it act in non-greedy fashion, i.e., to match as few characters as possible. I think it would be nice if ME had something like this.

      [example html code]
      <TD><NOBR><FONT FACE="Verdana, MS Sans Serif, Geneva" SIZE="-1"><B>Mixed Drinks/Liquor</B></FONT></NOBR></TD>
      <TD><NOBR><FONT FACE="Verdana, MS Sans Serif, Geneva" SIZE="-1"><B>Wine</B></FONT></NOBR></TD>
      <TD><NOBR><FONT FACE="Verdana, MS Sans Serif, Geneva" SIZE="-1"><B>Beer</B></FONT></NOBR></TD>
      </TR>
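
      To make the difference concrete, here is a minimal sketch of the greedy and non-greedy behaviour on the first example line. It uses Python's re module, not ME's regex engine, so it only illustrates the idea; the variable names are made up for the example. The third pattern is a common workaround for engines that lack a non-greedy "?", provided they support negated character classes (whether ME does is an assumption I have not checked).

```python
import re

# First line of the example HTML from the message.
line = ('<TD><NOBR><FONT FACE="Verdana, MS Sans Serif, Geneva" SIZE="-1">'
        '<B>Mixed Drinks/Liquor</B></FONT></NOBR></TD>')

# Greedy: ".+" runs to the LAST ">" on the line, so the match
# swallows everything from "<FONT" to the end of the line.
greedy = re.sub(r'<FONT.+>', '', line)

# Non-greedy: ".+?" stops at the FIRST ">" after "<FONT",
# so only the opening FONT tag is removed.
lazy = re.sub(r'<FONT.+?>', '', line)

# Workaround without "?": match any run of characters that are not ">".
negated = re.sub(r'<FONT[^>]+>', '', line)

print(greedy)
print(lazy)
print(negated)
```

      Note that none of these patterns touches the closing "</FONT>" tag, since it starts with "</" rather than "<FONT"; that would need a second replacement.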


      -Th