RE: [jasspa] Possible feature request: regexp nongreedy matching

  • Phillips, Steven
    Message 1 of 2 , Jul 21, 2005
      Tom,

      The '*' operator is exactly the same as the '+' operator in regex, except that it will also match zero occurrences, i.e. it's equivalent to a '+' and a '?' combined. It is just as greedy as the '+', so ME's regex behaviour is as I would expect.
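
      A minimal sketch of that point, using Python's re module as a stand-in (an assumption: ME's own regex engine is not shown here, and the sample text is made up for illustration). Both operators grab the longest possible match; '*' only adds the ability to match nothing.

```python
import re

# Made-up sample text with two tags plus an empty "<>" at the end.
text = "<b>bold</b> and <>"

# Both quantifiers are greedy: each runs from the first '<' to the last '>'.
plus_match = re.search(r"<.+>", text).group()
star_match = re.search(r"<.*>", text).group()

# The only difference: '+' needs at least one character, '*' accepts none.
plus_on_empty = re.search(r"<.+>", "<>")
star_on_empty = re.search(r"<.*>", "<>")

# 'a*' accepts the same strings as an optional 'a+'.
same_language = all(
    bool(re.fullmatch(r"a*", s)) == bool(re.fullmatch(r"(?:a+)?", s))
    for s in ["", "a", "aaa", "b"]
)

print(plus_match)
print(star_match)
print(plus_on_empty, star_on_empty.group())
```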

      This is a common problem which sometimes has no easy solution, but usually it does; you just need to get a little more perverse! For example, your problem below is easily solved by using "<[^>]*>", which works in most situations.
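
      As a rough illustration of the negated-class trick, here is one of Tom's sample lines run through Python's re module (an assumption: Python stands in for ME, and the wrapped line has been joined into one string). The greedy pattern swallows the whole line, while "<[^>]*>" cannot run past the first '>' and so removes each tag separately.

```python
import re

# One of the sample lines from the quoted message, joined onto a single line.
line = ('<div style="position:absolute;top:14520;left:108">'
        '<nobr>was gathered. </nobr></div>')

# Greedy: a single match from the first '<' to the last '>' on the line.
greedy = re.sub(r"<.*>", "", line)

# Negated class: the match has to stop at the first '>' it meets.
negated = re.sub(r"<[^>]*>", "", line)

# Perl-style non-greedy matching, shown only for comparison.
lazy = re.sub(r"<.*?>", "", line)

print(repr(greedy))
print(repr(negated))
```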

      It starts to have a few problems when you want to match the following (in C):

      { "string with } oh dear" }

      Not a problem for HTML (as a string would contain "&gt;" instead) but it is a common problem in other languages. In this case, get the regex to match the string within the {...} tag so that it skips over the '}' inside the string. To do this, use "{\([^}"]\|"[^"]*"\)*}"
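
      A small sketch of the same idea in Python syntax (an assumption: the pattern is translated from the Emacs-style "\(", "\|", "\)" to Python's "(?:", "|", ")", and the trailing text is invented for the test). The naive pattern stops at the '}' inside the string; the string-aware one consumes the quoted string as a unit.

```python
import re

# The C-like example from above, followed by some invented trailing text.
text = '{ "string with } oh dear" } trailing'

# Naive: anything that is not '}' up to the first '}'.
naive = re.search(r'\{[^}]*\}', text).group()

# String-aware: each step consumes either one character that is neither
# '}' nor '"', or a complete double-quoted string.
string_aware = re.search(r'\{(?:[^}"]|"[^"]*")*\}', text).group()

print(naive)
print(string_aware)
```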

      Now let's make it a little harder: the string can contain escaped quotes, which would break the above, e.g.:

      { "string with } \" oh dear" }

      So now we use "{\([^}"]\|"\([^"\\]\|\\.\)*"\)*}", and so it goes on. Where you do tend to get really stuck is when the end 'tag' is a string. For example, to match '<div>...</div>' you have to use something like "<div>\([^<]\|<[^/]\|</[^d]\|</d[^i]\|</di[^v]\|</div[^>]\)*</div>". This is where ".*?" would be most useful, as the above could be done as just "<div>.*?</div>".
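
      Both patterns can be tried out in Python as a rough check (assumptions: the Emacs-style grouping is translated to Python syntax, and the sample strings are invented). The escape-aware pattern copes with \" inside the string, and the long-hand div pattern finds the same matches here as the non-greedy "<div>.*?</div>".

```python
import re

# Escaped quote inside the string; the trailing text is invented.
braces = r'{ "string with } \" oh dear" } tail'

# Earlier pattern: treats the escaped quote as the end of the string.
simple = re.search(r'\{(?:[^}"]|"[^"]*")*\}', braces)

# Escape-aware: inside quotes, take a backslash plus any character as a pair.
escape_aware = re.search(r'\{(?:[^}"]|"(?:[^"\\]|\\.)*")*\}', braces)

# Two div blocks on one line (invented sample).
html = 'a <div>one <b>x</b></div> b <div>two</div> c'

# Long-hand: consume text while making sure "</div>" is never swallowed.
longhand = re.findall(
    r'<div>(?:[^<]|<[^/]|</[^d]|</d[^i]|</di[^v]|</div[^>])*</div>', html)

# Perl-style non-greedy and plain greedy versions, for comparison.
lazy = re.findall(r'<div>.*?</div>', html)
greedy = re.findall(r'<div>.*</div>', html)

print(simple, escape_aware.group())
print(longhand)
print(greedy)
```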

      Steve


      > -----Original Message-----
      > From: jasspa@yahoogroups.com [mailto:jasspa@yahoogroups.com] On Behalf Of
      > Thomas Hundt
      > Sent: Thursday, July 21, 2005 4:12 AM
      > To: JASSPA MicroEmacs Mailing List (W)
      > Subject: [jasspa] Possible feature request: regexp nongreedy matching
      >
      > I love the availability of regular expressions in search and replace
      > operations. And I'm glad that they are becoming more "standard" (e.g.,
      > use of \d and \D instead of that stuff I could never remember before).
      >
      > I'd love it even more if it had the ability to specify nongreedy
      > matching. (In Perl this is the '?' after a '+' or '*'.)
      >
      > Example of where it would be useful: I wanted to remove the HTML from
      > some text, like this (note these lines are wrapped, each begins with
      > "<div" and ends with "</div>"):
      >
      > <div style="position:absolute;top:14500;left:108"><nobr>these barriers,
      > some interesting information </nobr></div>
      > <div style="position:absolute;top:14520;left:108"><nobr>was gathered.
      > </nobr></div>
      > <div style="position:absolute;top:14562;left:108"><nobr>CVPD officers
      > administered the surveys. </nobr></div>
      > <div style="position:absolute;top:14582;left:108"><nobr>They found that
      > many of the offenders liked </nobr></div>
      > <div style="position:absolute;top:14603;left:108"><nobr>to target
      > parking lots since they offered so </nobr></div>
      > <div style="position:absolute;top:14624;left:108"><nobr>many vehicle
      > choices in unguarded settings. </nobr></div>
      > <div style="position:absolute;top:14644;left:108"><nobr>Many said they
      > took orders from "higher-</nobr></div>
      > <div style="position:absolute;top:14665;left:108"><nobr>ups" for
      > specific vehicles, makes, and </nobr></div>
      > <div style="position:absolute;top:14686;left:108"><nobr>models. Many
      > worked with a second person </nobr></div>
      >
      > The quick and dirty way is to do a search and replace of <.+> with an
      > empty string. However, this doesn't work: It insists on matching as
      > greedily as possible, i.e., the longest string possible. (This is
      > documented and not unexpected.) So, I changed it to <.*> which
      > supposedly matches the shortest string possible. This still didn't do
      > what I wanted -- it matched more than one tag at a time, and included
      > some non-tag text. I'm not even sure why. It seemed to work okay on
      > the first line I ran it against, and then wanted to match the entire
      > next line. Very strange. (Perhaps it's getting confused with the "\>"
      > construct which matches an empty string?)
      >
      > I ended up writing the buffer to a file and running Perl against it:
      > perl -nwe "s/\<.+?>//g; print " <foo.txt
      > which did exactly what I wanted, namely, spitting out this text ('>'
      > added by me):
      > these barriers, some interesting information
      > was gathered.
      > CVPD officers administered the surveys.
      > They found that many of the offenders liked
      > to target parking lots since they offered so
      > many vehicle choices in unguarded settings.
      > Many said they took orders from "higher-
      > ups" for specific vehicles, makes, and
      > models. Many worked with a second person
      >
      > Side note: There used to be a way to pipe a buffer through a command but
      > I couldn't get it to work in this case (ipipe-shell-command). Probably
      > it's too much to ask for a poor Windoze system to do stuff like this
      > properly. ;-) Or it's a paths/environment problem. I'm not as
      > concerned about this, I'm just mentioning it as an fyi in case there are
      > known issues around this.
      >
      > Thanks for your consideration
      >
      > -Tom Hundt
      >
      > __________________________________________________________________________
      >
      > This is an unmoderated list. JASSPA is not responsible for the content of
      > any material posted to this list.
      >
      > To unsubscribe, send a mail message to
      >
      > mailto:jasspa-unsubscribe@yahoogroups.com
      >
      > or visit http://groups.yahoo.com/group/jasspa and
      > modify your account settings manually.
      >