24122 RE: RE: RE: RE: RE: [Clip] REGEX Search Backward
- Nov 2, 2013
^!Find "(?s)\A.+\K(https?://|www\.)[^\x20\r\n<>]+" AIORSW
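There may be a problem with this: when a URL contains both parts (http://www...), the match may begin at the www and leave off the http:// prefix, because the greedy .+ gives back as little text as it can.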
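To show what I mean, here is a small Python sketch of that behavior. It is only a model, not NoteTab itself: Python's re module has no \K, so a capture group stands in for it, and the sample text is made up. The second pattern adds a lookbehind so a URL may only begin right after a delimiter character. That fix is my own assumption and I have not tested it in NoteTab.

```python
import re

# Made-up sample text; the last URL has both an http:// and a www. part.
text = "first http://a.example/one then http://www.example.com/two end"

# Capture group (...) stands in for \K, which Python's re does not support.
loose = re.compile(r"(?s)\A.+((?:https?://|www\.)[^\x20\r\n<>]+)", re.I)

# Assumed fix: the URL may only start right after a delimiter character.
strict = re.compile(
    r"(?s)\A.+(?<![^\x20\r\n<>])((?:https?://|www\.)[^\x20\r\n<>]+)", re.I
)

loose_hit = loose.search(text).group(1)
strict_hit = strict.search(text).group(1)

print(loose_hit)   # www.example.com/two
print(strict_hit)  # http://www.example.com/two
```

The greedy .+ backs up from the end of the text one character at a time and stops at the first spot where the alternation fits, which is the www. inside the last URL.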
I don't know why it doesn't work for you. Try it in a non-outline document, where it works for me every time.
I investigated the problem and discovered the following:
- The use of A and/or W options had no impact on the Find command in this application.
- If I placed my cursor in the middle of any URL in the text, then the search would not work. No change in the document or error message resulted.
- If I placed my cursor at the start of the last URL in the text, then the search would not work. No change in the document or error message resulted.
- If I placed my cursor anywhere after the last URL in the text, then the search would not work. No change in the document or error message resulted.
- If I placed my cursor anywhere else in the text (not inside a URL, not at the start of the last URL, and not after the last URL), then the search worked.
- The above is true for both document and outline text.
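Some of these observations can be modeled in Python, assuming the search only sees the text from the cursor onward. That is an assumption about NoteTab, not something I have confirmed. The helper name find_last_url and the sample text are made up, and a capture group stands in for \K. Note that this model does not reproduce the failure with the cursor in the middle of an earlier URL.

```python
import re

# Capture group stands in for \K, which Python's re does not support.
LAST_URL = re.compile(r"(?s)\A.+(https?://[^\x20\r\n<>]+)", re.I)

def find_last_url(text, cursor):
    """Search only the text from the cursor onward (assumed model of NoteTab)."""
    hit = LAST_URL.search(text[cursor:])
    return hit.group(1) if hit else None

text = "intro http://a.example/1 middle http://b.example/2 tail"
last_start = text.rindex("http")   # start of the last URL
after_last = text.index(" tail")   # just past the last URL

from_top = find_last_url(text, 0)
from_last_start = find_last_url(text, last_start)
from_after = find_last_url(text, after_last)

print(from_top)         # http://b.example/2
print(from_last_start)  # None
print(from_after)       # None
```

With the cursor at the start of the last URL, .+ has no character left to consume before the http, so the pattern cannot match at all.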
Your note that excludes a " from the search is good. Some URLs begin simply www... How should I modify the regex criteria to account for them also?
I tested on a regular document, and the following worked to capture the last URL.
^!Find "(?s)\A.+\Khttps?://[^\x20\r\n<>]+" AIORSW
Note the change in options. It repeatedly captures the last URL regardless of where in the document the cursor is positioned. Note that if you have any HTML in your document, you should add a double quote to your negated character class to prevent capturing trailing double quotes that are not part of the URL.
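A quick Python check of the double-quote point, using a made-up HTML fragment. The character classes are the same ones used in the clips above, so the outcome should carry over, but I have only run this in Python.

```python
import re

# Made-up HTML fragment with a quoted href attribute.
html = '<a href="http://example.com/page">link</a>'

# Original negated class: stops at space, CR, LF, < and >.
without_quote = re.search(r'https?://[^\x20\r\n<>]+', html).group(0)

# Same class with a double quote added to the excluded characters.
with_quote = re.search(r'https?://[^\x20\r\n<>"]+', html).group(0)

print(without_quote)  # http://example.com/page"
print(with_quote)     # http://example.com/page
```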
Don't beat yourself up. You are the alpha smart guy here.
I tried your latest regex, and it worked, but only if I began the search from the start of the text. Is that what you intended?
I suppose it is reasonable that I cannot start the search from the end of the text, but I find it odd that I cannot start the search from anywhere before the last URL.
Greed and \K are not related; \K only defines where the selected match begins. Oh, wait! I didn't specify the \A, and that's why it didn't work. Duh.
^!Find "(?s)\A.+\Khttps?://[^\x20\r\n<>]+" IORS
A lesson in regex - it can be pretty tricky, but it works. It is SO easy to overlook little things that make things not work as expected. I have stared at lines for a long time and not seen what is wrong - I call them dumb loops, and it happens to me far too often.
That worked, but only if I started my search anywhere above the last instance of a URL in the text.
A search for (?s)\A did not find the top of the text like I thought you had indicated. Are these regex options that must be specified before search criteria? Oh, wait - I wrote it without the \A! Dumb.
The search ^!Find "https?://[^\x20\r\n<>]+" IORS finds the next instance of a URL.
How does \K 'reset start of match' (defined in NTP regex help file) equate to greed?
Because regex quantifiers are naturally 'greedy', .+ will always run to the last instance that meets the criteria unless something prevents it. Also, you don't need commas between the options. I have never used that 'B' option, but it is not part of the regular expression; it is part of NoteTab's scripting language. The regular expression engine comes from PCRE and is regularly updated, while NoteTab's scripting language is independent of PCRE except that it lets you USE regex in its scripts.
^!Find "(?s).+\Khttps?://[^\x20\r\n<>]+" IORS
should work. The first .+ tells it to gather everything up to the term of interest, even if that means passing over the term multiple times, so it finds the last one. If you use an A option, it will only find the first one (or you could use .+? to achieve the same goal). The \K says to ignore everything up to that last http. This assumes you only want to highlight/select that last URL.
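Here is the greedy versus lazy difference as a small Python sketch. The sample text is made up, and a capture group stands in for \K because Python's re module does not support it. Greedy and lazy quantifiers follow the same rules in PCRE, so I would expect NoteTab to behave the same way.

```python
import re

# Made-up sample text containing three URLs.
text = "one http://a.example/1 two http://b.example/2 three http://c.example/3"

# Capture group stands in for \K, which Python's re does not support.
url = r"(https?://[^\x20\r\n<>]+)"

# Greedy .+ swallows as much as possible, so the LAST URL is captured.
greedy = re.search(r"(?s).+" + url, text).group(1)

# Lazy .+? swallows as little as possible, so the FIRST URL is captured.
lazy = re.search(r"(?s).+?" + url, text).group(1)

print(greedy)  # http://c.example/3
print(lazy)    # http://a.example/1
```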
I have a habit of always placing my options in alphabetical order and upper case, which enables me to always find a specific set of options, knowing that they will be in only one order, regardless of how many there are.
I do not understand how to use your suggestion. Perhaps you can provide an example. Here is what I attempted to do from the bottom of my outline topic text, last character:
^!Find "https?://[^\x20\r\n<>]+" R,I,O,S,B
; R: Specifies that the search criteria represents a regular expression.
; I: Ignores character case.
; O: Only searches in current outline topic.
; S: Silent search. NoteTab will not display any message box.
; B: Searches backwards.
How would you rewrite the Find command?
My objective is to find the last URL in the topic.