16651 Re: [Clip] Re: Creation of clip
- Jun 21, 2007
Flo wrote:
> Sheri wrote...Hi Flo,
>> How many keywords? If not more than a few hundred could
>> possibly use something like this (uses regular expression
>> ^!Setlistdelimiter ^P
>> ;next is one long line
>> ^!Set i)^.*(comprehensive|switch|system).*^%dollar%";0)$
>> ;end long line
>> ^!Toolbar New Document
>> ^!InsertText ^%linesout%
> In fact, the alternation to be used with ^$GetDocMatchAll$ seems to
> be limited. When testing this with a file of 250 keywords, and a text
> of 16,000 lines, it works fine. It fails when the roles are swapped:
> those 250 keywords as text, and the 16,000 words as keywords. NT5
> reacts with the
> "Regex error: internal error: overran compiling workspace".
> (You may test it with the files at
> http://flogehrke.homepage.t-online.de/491/ntf-wordlist.zip
> that we used for testing another clip some months ago.)
> Is this limitation definable in any way?
I don't think it is definable per se. You could test generated patterns
in clips with ^!IfRegexOK. You can retrieve the error message (if not
ok) with ^$GetRegexErrorMsg$. A clip could possibly take corrective
action for some errors (like reducing the number of alternatives to be
processed at one time).
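
To illustrate the corrective action described above, here is a minimal sketch in Python rather than clip code. It builds the alternation from a batch of keywords, tries to compile it, and halves the batch size whenever the compile fails. Python's re module is a different engine from the PCRE used by NoteTab and may not hit the same limits, so the compile_fn parameter is a hypothetical hook added only so the fallback can be exercised; the function names and the default batch size of 250 are assumptions too.

```python
import re


def compile_in_chunks(keywords, chunk_size=250, compile_fn=re.compile):
    """Compile keywords into several alternation patterns.

    If a pattern fails to compile, the batch size is halved and the same
    position is retried. compile_fn is a hypothetical hook for testing.
    """
    patterns = []
    start = 0
    size = max(1, chunk_size)
    while start < len(keywords):
        chunk = keywords[start:start + size]
        # Case-insensitive, whole line containing any keyword of this batch.
        source = "(?i)^.*(?:" + "|".join(re.escape(k) for k in chunk) + ").*$"
        try:
            patterns.append(compile_fn(source))
        except re.error:
            if len(chunk) == 1:
                raise  # a single keyword fails: nothing left to reduce
            size = max(1, len(chunk) // 2)
            continue  # retry the same position with a smaller batch
        start += len(chunk)
    return patterns


def matching_lines(text, patterns):
    """Return the lines of text matched by at least one compiled pattern."""
    return [line for line in text.splitlines()
            if any(p.search(line) for p in patterns)]


if __name__ == "__main__":
    pats = compile_in_chunks(["comprehensive", "switch", "system"], chunk_size=2)
    sample = "A comprehensive test\nnothing here\nSYSTEM check\n"
    for line in matching_lines(sample, pats):
        print(line)
```

The reduced batch size is kept for the rest of the run, on the assumption that later batches of similar keywords would fail at the larger size as well.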
PCRE 7.2 was just released, and it says it corrected this:
"A pattern with a very large number of alternatives (more than several
hundred) was running out of internal workspace during the pre-compile
phase, where pcre_compile() figures out how much memory will be needed.
A bit of new cunning has reduced the workspace needed for groups with
alternatives. The 1000-alternative test pattern now uses 12 bytes of
workspace instead of running out of the 4096 that are available."
I don't think it will be too long before NoteTab incorporates the
update. However, there are other factors besides "internal workspace"
that affect how many alternatives will work. When working on the stop
list clip, I remember an error message that the regular expression was
"too long". In one of the stop list clips, I applied the keywords in
approximately 10K chunks and that worked at that time (think it was pcre