
Message 22066: Re: [NTB] Problems with search/replace regexp chunks

  • Sheri
    Feb 25, 2011
      On 2/25/2011 4:31 PM, filwi586 wrote:
      > Hi folks,
      > I'm looking for a light editor and NoteTab Pro seemed like a good one.
      > But trying it out I've encountered a strange problem: it seems like
      > there's a very small limit on the size of the chunks you can find
      > using a regular expression.
      > I'm trying to replace parts of an html document. The regexp works fine
      > (in the form <tag>(.|\s)*?<tag>) when I try to capture tags that are
      > close to one another (in terms of the amount of code between them).
      > But once the tags get further away, like trying to capture an entire
      > table or the entire document (<html>(.|\s)*?</html>) NoteTab gives me
      > back a "not found" message.
      > I've checked that I'm at the beginning of the document and searching
      > in the right direction and I've tried doing it using clips (with a
      > ^!Find or ^!Replace) to no avail.
      > Is there some sort of size limit on what you can search for or a
      > maximum size on what a find can return? Or is it only the trial
      > version that's limited?

      There are no fixed limits specific to the trial or even the light
      version with respect to regex. However, the editor uses PCRE as its
      regex engine, and you need to learn your way around PCRE. There is a
      regex help file available from the Help menu.

      I've not tried your pattern, but it appears you are trying to include
      line breaks in what matches dot. The best way to do that is to enable
      the dotall option by putting (?s) at the start of your pattern. Nested
      unlimited repeats, e.g., an asterisk outside of parentheses as in
      (.|\s)*, are not the best way to go and consume substantial unnecessary
      resources. Another observation: usually there is only one set of
      <html> tags in a document, so it would be better to be greedy with "+"
      than non-greedy with "*?". Also, there is bound to be at least one
      character between the tags, so it would be better to just use a plus.
      While you can capture the material between the tags, it's not
      necessary. You could avoid a capture by using a look-behind assertion
      for "<html>" (or \K after it) and a look-ahead assertion for the
      closing tag, e.g.: (?s)<html>\K.+(?=</html>)
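      A minimal sketch of the suggested pattern, using Python's re module
      rather than NoteTab's PCRE engine. Python's re does not support \K, so
      this assumes the fixed-width look-behind form (?<=<html>) instead; the
      sample document and variable names are made up for illustration.

```python
import re

# Hypothetical sample document spanning several lines.
html = """<html>
<head><title>Demo</title></head>
<body>
<p>Line one</p>
<p>Line two</p>
</body>
</html>"""

# (?s) enables DOTALL, so "." also matches line breaks and the (.|\s)
# alternation is unnecessary.
# (?<=<html>) is a look-behind: the opening tag must precede the match but
# is not part of it (Python's re has no \K, unlike PCRE).
# .+ is greedy, suitable when the document has one <html>...</html> pair.
# (?=</html>) is a look-ahead: the closing tag must follow but is not consumed.
pattern = re.compile(r"(?s)(?<=<html>).+(?=</html>)")

m = pattern.search(html)
inner = m.group(0) if m else None
print(inner)
```

      Because both tags sit inside assertions, the whole match (group 0)
      is only the content between them, with no capturing group needed.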

      For tag sets that might occur more than once in the document, or if
      the match is likely closer to the start than the end of the document,
      you could try (?s)<tagname>\K.+?(?=</tagname>)
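      To illustrate why the non-greedy form matters when a tag repeats, here
      is a small comparison, again in Python's re with a look-behind standing
      in for PCRE's \K. The <td> tag and sample row are assumptions chosen
      for the example.

```python
import re

# Hypothetical table row with three cells; one cell spans a line break.
table = "<tr><td>alpha</td>\n<td>beta\ngamma</td><td>delta</td></tr>"

# Greedy .+ runs to the LAST </td>, swallowing every cell into one match.
greedy = re.compile(r"(?s)(?<=<td>).+(?=</td>)")

# Non-greedy .+? stops at the FIRST following </td>, so each cell is its
# own match.
lazy = re.compile(r"(?s)(?<=<td>).+?(?=</td>)")

# With no capturing groups, findall returns the whole matches.
greedy_hits = greedy.findall(table)
lazy_hits = lazy.findall(table)

print(greedy_hits)
print(lazy_hits)
```

      The greedy pattern returns a single match that runs from the first
      cell into the last one, while the non-greedy pattern returns each
      cell's content separately.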

      In replace operations, you can reference the whole match as $0.
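      As a rough analogue outside NoteTab: Python's re.sub writes the whole
      match as \g<0> in the replacement template, where NoteTab uses $0. The
      <title> tag and the bracket wrapping below are only an illustration.

```python
import re

doc = "<title>Old title</title>"

# The match is only the text between the tags (look-behind/look-ahead),
# so \g<0> (Python's spelling of the whole match, like NoteTab's $0)
# re-inserts that text inside the new brackets.
result = re.sub(r"(?s)(?<=<title>).+?(?=</title>)", r"[\g<0>]", doc)

print(result)
```

      Since the tags are not part of the match, they are left untouched and
      only the content between them is rewritten.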

      Using the suggested patterns, the whole match should include only what's
      between the desired tags.

      Regex is an advanced topic; please join the clips list and ask there
      if you need more help with this.
