regular expression replacements
A while ago I was planning to make a binding for the PCRS regular
expression library (see below), but I haven't gotten around to working
out how to do bindings. Instead, I've written the following snippet:
// Regular expression substitutions, with wild cards.
// usage example: "wombats are cuddly" rSubstitute
// ==> "bat-woms are cuddly"
regexCache := Map clone
Buffer rSubstitute := method (findRegex, substituteRegex,
    // cache compiled regular expressions, for speed
    r := regexCache atIfAbsentPut (findRegex, Regex clone setPattern (findRegex))
    r setString (self asString)
    regexMap := Map clone
    // a safe upper bound on the number of substitution terms, each of which
    // takes 3 chars to specify ("\\1" etc.)
    maxSubstitutionTerms := (substituteRegex length / 3 + 1 roundUp)
    r allMatches foreach (i, v,
        if (v type != "List") then (v := list (v))
        for (n, maxSubstitutionTerms, 0,
            groupN := v at (n)
            regexMap atPut ("\\" .. n asString, groupN or "")
        )
        self := self replace (v at (0), substituteRegex replaceMap (regexMap))
    )
)
Worth putting in the distribution?
From: Jason Grossman <Jason.Grossman@...>
Date: 16 September 2004 8:19:15 EDT
Subject: [Io] yet more on regexp libraries
Aha. This looks about perfect: http://www.oesterhelt.org/pcrs . It's
modestly numbered version 0.03, but the web site says it's stable ...
and since it's based on PCRE, that claim doesn't seem too unreasonable.
Another thought I had along the way was that it might not be too
ridiculous to bind in the whole of sed! There's a stable version of
sed in less than 100kb at http://www.catb.org/~esr/sed . But PCRS
would probably make for a much neater API.
This might really help to capture part of the Perl market - especially
people who are currently tossing up between Perl, Ruby and Python.
From the PCRS docs:
PCRS is a small library, written as a supplement to the PCRE library,
that implements regex based substitution with the syntax and semantics
of Perl's s/// operator. ...
The function pcrs_compile() is called to compile a
pcrs_job from a pattern, substitute and options string.
The resulting pcrs_job structure is dynamically allocated
and it is the caller's responsibility to call
pcrs_free_job() when it's no longer needed.
pcrs_compile_command() is a convenience wrapper function
that parses a Perl command of the form
s/pattern/substitute/[options] into its components and then
calls pcrs_compile(). As in Perl, you are not bound to the '/'
character: Whatever follows the 's' will be used as the
delimiter. Patterns or substitutes that contain the delimiter
need to quote it: s/th\/is/th\/at/ will replace th/is
by th/at and can be written more simply as s|th/is|th/at|.
pattern, substitute, options and command must be
zero-terminated C strings. substitute and options may be NULL, in
which case they are treated like the empty string.
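The delimiter rule described above can be sketched in a few lines of Python. This is only an illustration of the parsing idea, not PCRS code: parse_s_command is a hypothetical helper, and its handling of edge cases (it insists on the final delimiter, and it simply strips the backslash from an escaped delimiter) is an assumption rather than documented PCRS behaviour.

```python
import re


def parse_s_command(command):
    """Split 's<d>pattern<d>substitute<d>options' into its three parts.

    <d> is whatever character follows the leading 's'. A delimiter
    preceded by a backslash is kept as a literal character of the
    current part (simplified, assumed behaviour).
    """
    if len(command) < 2 or command[0] != "s":
        raise ValueError("command must start with 's' and a delimiter")
    delimiter = command[1]
    parts = [""]
    i = 2
    while i < len(command):
        ch = command[i]
        if ch == "\\" and i + 1 < len(command):
            nxt = command[i + 1]
            # Unquote an escaped delimiter; pass other escapes through
            # untouched so the regex engine still sees them.
            parts[-1] += nxt if nxt == delimiter else ch + nxt
            i += 2
            continue
        if ch == delimiter and len(parts) < 3:
            parts.append("")  # start the next field
        else:
            parts[-1] += ch
        i += 1
    if len(parts) < 3:
        raise ValueError("expected pattern, substitute and options fields")
    return tuple(parts)


# Both spellings from the quoted text yield the same components.
print(parse_s_command(r"s/th\/is/th\/at/") == parse_s_command("s|th/is|th/at|"))
# prints "True"

pattern, substitute, options = parse_s_command("s|th/is|th/at|")
print(re.sub(pattern, substitute, "replace th/is here"))
# prints "replace th/at here"
```

The options field is returned but not interpreted here; mapping flags such as g or i onto regex engine settings is left out of the sketch.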
Return value and diagnostics
On success, both functions return a pointer to the
compiled job. On failure, NULL is returned. In that case,
the pcrs error code is written to *err.