22066 Re: [NTB] Problems with search/replace regexp chunks
- Feb 25, 2011
On 2/25/2011 4:31 PM, filwi586 wrote:
> Hi folks,
> I'm looking for a light editor and NoteTab Pro seemed like a good one.
> But trying it out I've encountered a strange problem: it seems like
> there's a very small limit on the size of the chunks you can find
> using a regular expression.
> I'm trying to replace parts of an html document. The regexp works fine
> (on the form <tag>(.|\s)*?<tag>) when i try to capture tags that are
> close to one another (in terms of the amount of code between them).
> But once the tags get further away, like trying to capture an entire
> table or the entire document (<html>(.|\s)*?</html>) NoteTab gives me
> back a "not found" message.
> I've checked that I'm at the beginning of the document and searching
> in the right direction and I've tried doing it using clips (with a
> ^!Find or ^!Replace) to no avail.
> Is there some sort of size limit on what you can search for or a
> maximum size on what a find can return? Or is it only the trial
> version that's limited?
There are no fixed limits specific to the trial or even the light
version with respect to regex. However, the editor uses PCRE as its
regex engine and you need to learn your way around PCRE. There is a
regex help file which is available in the Help menu.
I've not tried your pattern, but it appears you are trying to include
line breaks in what the dot matches. The best way to do that is to
enable the dotall option by putting (?s) at the start of your pattern.
Use of nested unlimited repeats, e.g., by putting an asterisk outside of
parentheses as in (.|\s)*?, is not the best way to go and consumes
substantial unnecessary resources. Another observation: usually there is
only one set of <html> tags in a document, so it would be better to be
greedy than non-greedy. Also, there is bound to be at least one
character between the tags, so it would be better to use "+" rather
than "*". While you can capture the material between the tags, it's not
necessary. You could avoid a capture by using a look-behind assertion
for, or \K after, "<html>" and a look-ahead assertion for the closing
tag, e.g.: (?s)<html>\K.+(?=</html>)
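If it helps to see the effect of dotall outside NoteTab, here is a rough sketch in Python. It rests on a few assumptions: it uses Python's built-in re module rather than NoteTab's PCRE engine, the sample document is made up, and because Python's re does not support \K, a look-behind assertion stands in for it.

```python
import re

# Made-up sample document with line breaks between the tags.
doc = "<html>\n<body>\n<p>Hello</p>\n</body>\n</html>\n"

# Without (?s), dot does not match line breaks, so nothing is found.
no_dotall = re.search(r"(?<=<html>).+(?=</html>)", doc)

# With (?s), dot matches line breaks too. The look-behind replaces \K
# here only because Python's re module lacks \K.
with_dotall = re.search(r"(?s)(?<=<html>).+(?=</html>)", doc)

print(no_dotall)                   # None
print(repr(with_dotall.group(0)))  # '\n<body>\n<p>Hello</p>\n</body>\n'
```

The whole match contains only the text between the tags, with no capture group involved.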
For tag sets that might occur more than once in the document, or if
the match is likely closer to the start than the end of the document,
you could try (?s)<tagname>\K.+?(?=</tagname>)
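The difference between the greedy and non-greedy forms shows up as soon as a tag occurs twice. The following is only an illustrative sketch under the same assumptions as any Python test of these patterns: Python's re module instead of PCRE, a look-behind in place of \K, and an invented two-cell sample.

```python
import re

# Invented sample with two <td> elements, the second spanning two lines.
doc = "<td>one</td>\n<td>two\nlines</td>\n"

# Non-greedy: each match stops at the nearest closing tag.
lazy = re.findall(r"(?s)(?<=<td>).+?(?=</td>)", doc)

# Greedy: a single match runs on to the last closing tag.
greedy = re.findall(r"(?s)(?<=<td>).+(?=</td>)", doc)

print(lazy)    # ['one', 'two\nlines']
print(greedy)  # ['one</td>\n<td>two\nlines']
```

So the greedy form is fine for a tag that occurs once, such as <html>, while the non-greedy form is the safer choice for repeated tags.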
In replace operations, you can reference the whole match as $0.
Using the suggested patterns, the whole match should include only what's
between the desired tags.
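As a rough illustration of reusing the whole match in a replacement, here is a Python sketch. Note the assumptions: Python writes the whole-match reference as \g<0> where NoteTab uses $0, a look-behind again stands in for \K, and the sample text and the square brackets are made up for the demonstration.

```python
import re

# Made-up sample; the title text spans two lines.
doc = "<title>old\ntitle</title>"

# \g<0> is Python's spelling of the whole match ($0 in NoteTab).
# Only the text between the tags is matched, so only it is replaced.
result = re.sub(r"(?s)(?<=<title>).+?(?=</title>)", r"[\g<0>]", doc)

print(repr(result))  # '<title>[old\ntitle]</title>'
```

The tags themselves are left untouched because the assertions keep them out of the match.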
Regex is an advanced topic; please join the clips list and ask there if
you need more help with this.