
16651 Re: [Clip] Re: Creation of clip

  • Sheri
    Jun 21, 2007
      Flo wrote:
      > Sheri wrote...
      >
      >
      >> How many keywords? If not more than a few hundred could
      >> possibly use something like this (uses regular expression
      >> matching).
      >>
      >> ^!Setlistdelimiter ^P
      >> ;next is one long line
      >> ^!Set %linesout%=^$GetDocMatchAll("(?-i)^.*(comprehensive|switch|system).*^%dollar%";0)$
      >> ;end long line
      >> ^!Toolbar New Document
      >> ^!InsertText ^%linesout%
      >>
      >
      > In fact, the alternation to be used with ^$GetDocMatchAll$ seems to
      > be limited. When testing this with a file of 250 keywords, and a text
      > of 16,000 lines, it works fine. It fails when taking those 250
      > keywords as text, and 16,000 words as keywords. NT5 reacts with the
      > message...
      >
      > "Regex error: internal error: overran compiling workspace".
      >
      > (You may test it with those files at
      > http://flogehrke.homepage.t-online.de/491/ntf-wordlist.zip
      > which we used for testing another clip some months ago.)
      >
      > Is this limitation definable in any way?
      >
      > Flo
      >
      >
      >
      Hi Flo,

      I don't think it is definable per se. You could test generated patterns
      in clips with ^!IfRegexOK. You can retrieve the error message (if not
      ok) with ^$GetRegexErrorMsg$. A clip could possibly take corrective
      action for some errors (like reducing the number of alternatives to
      be processed at one time).
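
      To illustrate that kind of corrective action outside NoteTab, here is a
      rough Python sketch (not clip code). It tries to compile one big
      alternation and, if the regex engine rejects it, halves the number of
      alternatives per pattern and tries again. The function names and the
      compile_fn parameter are made up for this example; in a clip the same
      check would be done with ^!IfRegexOK.

```python
import re


def compile_in_batches(keywords, compile_fn=re.compile):
    """Compile keywords as alternations, halving the batch on a compile error.

    compile_fn is a hypothetical hook so a stricter engine can be simulated.
    """
    batch_size = max(1, len(keywords))
    patterns = []
    start = 0
    while start < len(keywords):
        batch = keywords[start:start + batch_size]
        source = "(?:" + "|".join(re.escape(k) for k in batch) + ")"
        try:
            patterns.append(compile_fn(source))
        except re.error:
            if batch_size == 1:
                raise  # a single keyword fails: nothing left to reduce
            batch_size = max(1, batch_size // 2)
            continue  # retry the same position with a smaller batch
        start += len(batch)
    return patterns


def matching_lines(text, patterns):
    """Return the lines of text that contain any keyword (case-sensitive)."""
    return [line for line in text.splitlines()
            if any(p.search(line) for p in patterns)]
```

      Each line is checked against every compiled batch, so the result is the
      same as with one large pattern, only slower.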

      PCRE 7.2 was just released, and it says it corrected this:

      "A pattern with a very large number of alternatives (more than several
      hundred) was running out of internal workspace during the pre-compile
      phase, where pcre_compile() figures out how much memory will be needed.
      A bit of new cunning has reduced the workspace needed for groups with
      alternatives. The 1000-alternative test pattern now uses 12 bytes of
      workspace instead of running out of the 4096 that are available."

      I don't think it will be too long before NoteTab incorporates the
      update. However, there are other factors besides "internal workspace"
      that affect how many alternatives will work. When working on the stop
      list clip, I remember an error message that the regular expression was
      "too long". In one of the stop list clips, I applied the keywords in
      chunks of approximately 10K, and that worked at the time (I think it
      was PCRE 6.7 then).
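
      The chunking idea can be sketched like this in Python (again, not clip
      code, and not the actual stop list clip). It assumes the limit is on the
      length of the pattern text, groups the escaped keywords so that each
      alternation stays under max_chars, and applies the chunks one after
      another. The names chunk_by_length and remove_words are invented for
      the example, and removing whole words is only a guess at what a stop
      list would do.

```python
import re


def chunk_by_length(parts, max_chars=10_000):
    """Group strings so each group joined with "|" is at most max_chars long.

    A single string longer than max_chars still gets a chunk of its own.
    """
    chunks, current, size = [], [], 0
    for part in parts:
        added = len(part) + (1 if current else 0)  # +1 for the "|" separator
        if current and size + added > max_chars:
            chunks.append(current)
            current, size = [], 0
            added = len(part)
        current.append(part)
        size += added
    if current:
        chunks.append(current)
    return chunks


def remove_words(text, keywords, max_chars=10_000):
    """Delete whole-word occurrences of keywords, one chunk at a time."""
    escaped = [re.escape(word) for word in keywords]
    for chunk in chunk_by_length(escaped, max_chars):
        pattern = r"\b(?:" + "|".join(chunk) + r")\b"
        text = re.sub(pattern, "", text)
    return text
```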

      Regards,
      Sheri