Re: [NH] find/replace in tags
- Interviewed by CNN on 6/10/2008 21:39, Don Strack told the world:
>> OK, let's take one detail at a time.
>> Reg. Exp.: \<td class\="[^"]*"\>([^<]*)\</td\>
>> Replace with: <td>$1</td>
>> [ ] Case Sensitive [x] Regular Exp.
>> Does this help?
>> I suggest you dive into Regular Expressions. The NTP help file has
>> some hints, and you might find some links to tutorials here:
> I guess I asked the wrong question. What I meant to ask was:
> How do I strip everything from inside the <td> tags. I tried the above
> string, I'm not able to figure out what to leave in and what to take out.
> I see what the \ before the various characters does, to maintain the
> <td> and </td>, and the " part of the class=, but what do the other
> parts do?
Square brackets define a "class" of characters. Meaning that they match
any character that belongs to that class.
The "^" as the first character inside the square brackets INVERTS the
definition of the class. Thus, [^"] means "any character EXCEPT double
quote".
The asterisk means "0 or more occurrences of the thing that comes
before". So, [^"]* will match any string that does NOT contain a double
quote -- that is, it will match everything from the point it begins
until it finds a double quote (the double quote will "stay outside" of
the match, so to speak).
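
To see the negated class and the asterisk in action outside NoteTab, here is a minimal sketch in Python. It assumes Python's re module, which handles [^"]* the same way for this purpose; the sample text and the variable names are made up for illustration.

```python
import re

# [^"]* means: zero or more characters, none of which is a double quote.
pattern = re.compile(r'[^"]*')

# Hypothetical sample text, not taken from the original post.
text = 'alpha beta" gamma'

# The match starts at the beginning and stops just before the first
# double quote, which stays outside the match.
match = pattern.match(text)
print(match.group(0))

# "0 or more" also allows an empty match when the text starts with a quote.
empty = pattern.match('"quoted"')
print(repr(empty.group(0)))
```

Note that the second call still succeeds: it simply matches nothing, because zero occurrences are allowed.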
The parentheses assign whatever matched the string inside to a variable.
So, the second example will find a string that ends just before the next
HTML tag, and assign it to a variable. Then, in the "replace" field, you
can use a token (like $1) to insert back that string.
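
The capture-and-reinsert idea can be sketched in Python as well. This is only an illustration under a few assumptions: Python writes the replacement token as \1 where NoteTab uses $1, the angle brackets are left unescaped because Python does not need the backslashes, and the sample HTML is invented.

```python
import re

# Hypothetical sample row, not taken from the original post.
html = '<tr><td class="odd">first</td><td class="even">second</td></tr>'

# The parentheses capture the cell text: everything up to the next "<".
pattern = re.compile(r'<td class="[^"]*">([^<]*)</td>')

# \1 in the replacement inserts whatever the parentheses captured.
result = pattern.sub(r'<td>\1</td>', html)
print(result)
```

Each cell keeps its text but loses the class attribute, because only the captured part is put back.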
This is explained in the "Help on Regular Expressions" in the NoteTab
Help menu. There are also lots of places around the Web where you can
find more about regular expressions.
Jury: Twelve people who decide who has the better lawyer
* TagZilla 0.066 on Seamonkey 1.1.12
- Don Strack (donstrack@...) wrote (in part) (on 2008-10-06 at
Don: In case you are still struggling after Rudolf's and Marcello's posts?
> Whatever I try, it either finds nothing, or selects everything between
> the first <td> and the last </td>
Sounds like you might be getting bitten by regex "greedy" mode.
Though I'm not sure how it would get from first to last unless the html
code is all in one line. You can control whether a . (dot) can span more
than one line by putting either (?s) -span multiple lines or (?-s)
-span ONLY one line at the beginning of your regular expression. If
neither is specified, (?-s) is implied. So:
o will find from the first <td> to the LAST </td> in the
entire file (greedy mode)
o will find from the first <td> to the NEXT </td> even if it's
on the next or following lines. (non-greedy mode)
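
For anyone who wants to watch greedy and non-greedy matching side by side, here is a rough sketch in Python. The patterns below are my own illustration and not necessarily the exact ones meant above; Python also accepts (?s) at the start of a pattern, and without it the dot does not cross line breaks. The sample HTML is invented.

```python
import re

# Hypothetical three-cell sample spread over several lines.
html = "<td>one</td>\n<td>two\nlines</td>\n<td>three</td>"

# Greedy with (?s): .* runs as far as it can, so the match ends at the LAST </td>.
greedy = re.search(r"(?s)<td>.*</td>", html).group(0)

# Non-greedy with (?s): .*? stops at the NEXT </td>.
lazy = re.search(r"(?s)<td>.*?</td>", html).group(0)

# Without (?s) the dot cannot span lines, so the two-line cell is skipped.
single_line = re.findall(r"<td>.*?</td>", html)

print(greedy == html)
print(lazy)
print(single_line)
```

The only difference between the first two patterns is the question mark after the asterisk, which is usually the fix when a match swallows far more than intended.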
Your original question was:
> How do I strip the "class" contents of any and all tags, as below:
> change this:
> <td class="[whatever]">[whatever]<
> to this:
i.e. class can appear in other tags than just <td> and (I assume) any
other attributes are to be left unchanged as is - CORRECT?
I *think* this should work:
^!replace "\bclass=".*?" >> "" rwsai
In English: match from a word boundary (\b) followed by literal class="
to the very next occurrence of quote (.*?") and replace everything
matched by nothing.
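
The same pattern can be tried out in Python before running it on real files. This is just a sketch with invented sample HTML; the second pattern, with \s* in front, is my own hypothetical variation that also removes the space left behind, not part of the clip above.

```python
import re

# Hypothetical sample with class on two different tags plus another attribute.
html = '<table class="grid"><tr><td class="odd" width="50">x</td></tr></table>'

# Same idea as the clip: word boundary, literal class=", then lazily up to
# the very next double quote, all replaced by nothing.
stripped = re.sub(r'\bclass=".*?"', '', html)

# Assumed variation: also swallow the whitespace in front of the attribute.
tidy = re.sub(r'\s*\bclass=".*?"', '', html)

print(stripped)
print(tidy)
```

The first result leaves a stray space where the attribute used to be, which browsers ignore; the variation is only worth it if the leftover space bothers you.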
If this doesn't work and/or you can't make it work with a tweak you
can guess at .... Could you take one old file and get it to the state
before you wanted to apply this fix and manually change it to what you
want the result to be and zip the two files together as before.html and
I created a folder in our yahoo-group files:
you can post the zip.
Regards ... Alec (buralex@gmail & WinLiveMess - alec.m.burgess@skype)