Problems with search/replace regexp chunks

  • filwi586
    Message 1 of 3, Feb 25, 2011
      Hi folks,

      I'm looking for a light editor, and NoteTab Pro seemed like a good one. But while trying it out, I've run into a strange problem: there seems to be a very small limit on the size of the chunks you can find with a regular expression.

      I'm trying to replace parts of an HTML document. The regexp (of the form <tag>(.|\s)*?</tag>) works fine when I try to capture tags that are close to one another (in terms of the amount of code between them). But once the tags are further apart, for example when capturing an entire table or the entire document (<html>(.|\s)*?</html>), NoteTab gives me back a "not found" message.

      I've checked that I'm at the beginning of the document and searching in the right direction, and I've also tried doing it in clips (with ^!Find or ^!Replace), to no avail.

      Is there some sort of size limit on what you can search for, or a maximum size on what a find can return? Or is it only the trial version that's limited?
    • Sheri
      Message 2 of 3, Feb 25, 2011
        On 2/25/2011 4:31 PM, filwi586 wrote:

        There are no fixed limits specific to the trial or even the light
        version with respect to regex. However, the editor uses PCRE as its
        regex engine, and you need to learn your way around PCRE. There is a
        regex help file available from the Help menu.

        I've not tried your pattern, but it appears you are trying to include
        line breaks in what matches dot. The best way to do that is to enable
        the dotall option by including (?s) at the start of your pattern.
        Using nested unlimited repeats, e.g., putting an asterisk outside of
        parentheses as in (.|\s)*?, is not the best way to go: every iteration
        of the group leaves a backtracking point, which consumes substantial
        unnecessary resources and, on long spans, can run into PCRE's match or
        recursion limits. That may be why the longer searches report "not
        found".

        Another observation: usually there is only one set of <html> tags in
        a document, so it would be better to be greedy with "+" than
        nongreedy with "*?". Also, there is bound to be at least one character
        between the tags, which is another reason to use a plus rather than an
        asterisk. While you can capture the material between the tags, it's
        not necessary. You could avoid a capture by using a lookbehind
        assertion for, or \K after, "<html>" and a lookahead assertion for the
        closing tag, e.g.: (?s)<html>\K.+(?=</html>)
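
        To make the difference concrete, here is a minimal sketch using
        Python's re module rather than NoteTab's PCRE engine, run on a small
        made-up document, so it does not reproduce the "not found" failure
        itself. It shows that a plain dot stops at line breaks, that (?s) lets
        it cross them, and that the alternation pattern and the dotall pattern
        cover the same text. Python's re has no \K, so the fixed-width
        lookbehind (?<=<html>) stands in for it here (an assumption of this
        sketch, not NoteTab syntax).

```python
import re

# Made-up stand-in for an HTML document; a real file would be much larger.
doc = "<html>\n<head><title>t</title></head>\n<body>\n<p>hi</p>\n</body>\n</html>\n"

# Without dotall, "." cannot match the line break right after <html>,
# so this pattern finds nothing.
no_dotall = re.search(r"<html>.+</html>", doc)

# The original approach: an alternation group repeated by an outer "*?".
alternation = re.search(r"<html>(.|\s)*?</html>", doc)

# The suggested approach with (?s). Python's re does not support \K, so a
# fixed-width lookbehind replaces "<html>\K"; the lookahead keeps </html>
# out of the match.
dotall = re.search(r"(?s)(?<=<html>).+(?=</html>)", doc)

print(no_dotall)  # None
# Both patterns span the same text; the dotall match just omits the tags.
print(alternation.group(0) == "<html>" + dotall.group(0) + "</html>")  # True
```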

        For tag sets that may occur more than once in the document, or if
        the match is likely closer to the start than the end of the document,
        you could try (?s)<tagname>\K.+?(?=</tagname>)
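
        A rough illustration of why the non-greedy form matters when a tag
        repeats, again in Python's re as a stand-in for PCRE, with a
        hypothetical table row and <td> standing in for <tagname>. The
        lookbehind replaces \K, which Python's re lacks.

```python
import re

# Hypothetical table row with several <td> cells, one spanning two lines.
row = "<tr>\n<td>a</td>\n<td>b\nc</td>\n<td>d</td>\n</tr>"

# Non-greedy: each match stops at the nearest closing tag.
lazy = re.findall(r"(?s)(?<=<td>).+?(?=</td>)", row)

# Greedy: a single match runs from the first <td> to the last </td>.
greedy = re.findall(r"(?s)(?<=<td>).+(?=</td>)", row)

print(lazy)    # ['a', 'b\nc', 'd']
print(greedy)  # ['a</td>\n<td>b\nc</td>\n<td>d']
```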

        In replace operations, you can reference the whole match as $0.
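
        A small sketch of a replace that reuses the whole match. It assumes
        Python's re.sub, where the whole match is written \g<0> rather than
        NoteTab's $0; the document and the <div> wrapper are made up for
        illustration, and a lookbehind again stands in for \K.

```python
import re

# Made-up document for illustration.
doc = "<html>\n<body>old</body>\n</html>"

# Match only the text between the tags; the tags sit in lookarounds.
pattern = r"(?s)(?<=<body>).+?(?=</body>)"

# \g<0> in Python's replacement string plays the role of NoteTab's $0:
# here it wraps the matched text in a hypothetical <div>.
wrapped = re.sub(pattern, r"<div>\g<0></div>", doc)

# Since the tags are not part of the match, a literal replacement
# swaps only the content and leaves <body> and </body> intact.
replaced = re.sub(pattern, "new", doc)

print(wrapped)
print(replaced)
```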

        Using the suggested patterns, the whole match should include only what's
        between the desired tags.

        Regex is an advanced topic; please join the clips list and ask there
        if you need more help with this.

        Regards,
        Sheri
      • filwi586
        Message 3 of 3, Feb 26, 2011
          Thanks! Worked like a charm :)