24130 RE: RE: [Clip] REGEX Search Backward
- Nov 3, 2013
Your latest Find/regex combination hit the sweet spot. The list of URLs is a simple list without quotes or brackets. Each URL is on a separate line, preceded only by new-line characters. So I created the following command line based on your ideas.
^!Find "(?s).+\r\n\K(https?://|www\.)[^\x20"\r\n<>]+" IORSW
I remain confused about how this works.
I understand all but the portion (?s). (A search for ?s in the NTP regex help file finds a thousand 'is' and 'as'. Aaaarrrggghhhh!!!) What is the purpose of each character in (?s), and taken as a whole?
Do I need both \r and \n to get the job done? Testing suggests that only the \n is required.
If I use both of the new-line characters, then is one or both included in the results of the search by some greedy process?
Is there a single character (^%NL%) that includes both? Is ^%NL% recognized/legal in a regex search?
Apparently, the order of the characters before the \K matters. \n must follow \r. If both are required to form a new line, then why does their order matter?
Finally, is there a good searchable regex reference (web, book, help file) where I can get useful information? For example, I cannot even search for .+ in the regex help file included with NTP. Regex tools?
Thank you both for your expert help in sorting through all this.
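The (?s) and \r\n questions above can be poked at outside NoteTab. What follows is only a rough sketch in Python, whose re module is a stand-in and not NoteTab's engine. Python's re has no \K, so a capture group plays that role here, and the three-line sample text is made up. (?s) is the inline "dot matches line breaks" option; with it, the greedy .+ runs to the end of the text and backs up to the last line break that is followed by a URL.

```python
import re

# Hypothetical sample: one URL per line, CRLF line endings.
text = "http://one.example/\r\nwww.two.example/page\r\nhttps://three.example/x\r\n"

# The URL part of the clip's pattern, wrapped in a group because re lacks \K.
URL = r'((?:https?://|www\.)[^\x20"\r\n<>]+)'

# With (?s) the dot crosses line breaks, so the greedy .+ reaches the last URL.
with_s = re.search(r'(?s).+\r\n' + URL, text)

# Without (?s) the dot stops at \n, so .+ cannot leave the first line.
without_s = re.search(r'.+\r\n' + URL, text)

# With (?s), \n alone also works: the dot is free to swallow the \r before it.
lf_only = re.search(r'(?s).+\n' + URL, text)

print(with_s.group(1))     # https://three.example/x
print(without_s.group(1))  # www.two.example/page
print(lf_only.group(1))    # https://three.example/x
```

In this sketch the line break sits before the group, so neither \r nor \n ends up in the captured text, which is the job \K does in the clip.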
---In email@example.com, <firstname.lastname@example.org> wrote:
Additionally, many URLs are enclosed in angle brackets. In order to start the capture at the beginning of the URL in every case, and assuming you don't want to capture the angle brackets if present, another negative class should be added to the .+ term so that none of these things can be caught up in the greediness.
^!Find "(?s).+[^\r\n</\"][</"]*\K(https?://|www\.)[^\x20"\r\n<>]+" IORSW
So, now the .+ can't end with <, " or /. If < or " are present, they are passed but not captured. Now, if the http comes first, it will be captured, and if the www comes first, that will be captured.
> The regex finds and highlights only www.logicalchess.com/ instead of
> the full http://www.logicalchess.com/.
Yes, John already mentioned that problem himself. If the start can be
either http or www and the term before is greedy, then you'll capture as
little as possible. To solve this you have to look at what always comes
directly before your string. It may be an equals (=) or a quote, if the
URL is always placed in quotes. Assuming the latter I get:
^!Find "(?s).+"\K(https?://|www\.)[^\x20"\r\n<>]+" IORSW
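A rough way to watch the greedy term eat the http:// prefix, and the leading quote fix it, is the Python sketch below. Python's re module is only a stand-in for NoteTab's engine; it has no \K, so a capture group is used in its place, and the sample HTML line is invented for illustration.

```python
import re

# Hypothetical sample text with the URL placed in quotes.
text = 'intro\r\n<a href="http://www.logicalchess.com/">site</a>\r\n'

# The URL part of the pattern, wrapped in a group because re lacks \K.
URL = r'((?:https?://|www\.)[^\x20"\r\n<>]+)'

# Nothing fixed before the URL: the greedy .+ also swallows "http://",
# because the www. alternative still matches further to the right.
loose = re.search(r'(?s).+' + URL, text)

# A literal quote must come directly before the URL, so .+ has to stop there.
anchored = re.search(r'(?s).+"' + URL, text)

print(loose.group(1))     # www.logicalchess.com/
print(anchored.group(1))  # http://www.logicalchess.com/
```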
As you never specified what comes outside your search string, I had to