
24130 RE: RE: [Clip] REGEX Search Backward

  • nullclip
    Nov 3, 2013

      Axel/John,


      Your latest Find/regex combination hit the sweet spot.  The list of URLs is a simple list without quotes or brackets.  Each URL is on a separate line, preceded only by new-line characters.  So I created the following command line based on your ideas.


      ^!Find "(?s).+\r\n\K(https?://|www\.)[^\x20"\r\n<>]+" IORSW


      I remain confused about how this works.

      I understand all but the portion (?s).  (A search for ?s in the NTP regex help file finds a thousand 'is' and 'as'.  Aaaarrrggghhhh!!!)

      What is the purpose of each character in (?s), and taken as a whole?
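
On the (?s) question: in PCRE-style engines, "(?" opens an inline option setting, "s" is the single-line (dot-all) flag, and ")" closes it. With the flag set, "." also matches line-break characters, so a greedy .+ can run across lines. Here is a minimal Python sketch of that effect, assuming NoteTab's engine behaves like PCRE on this point. The sample text is made up, and Python's "." without the flag excludes only \n, which may differ from NoteTab's default.

```python
import re

# Made-up sample with Windows (CRLF) line endings.
text = "first line\r\nhttp://example.com/a\r\nwww.example.org/b\r\n"

# Without (?s), "." does not cross the \n, so ".+" stays on the first line.
# (In Python the match still includes the \r, because only \n is excluded.)
without_s = re.search(r".+", text).group(0)

# With (?s), "." matches \r and \n too, so a greedy ".+" swallows the whole text.
with_s = re.search(r"(?s).+", text).group(0)

print(repr(without_s))
print(with_s == text)
```

This is why the Find pattern can start with .+ and still reach the last URL in the document: the greedy term first runs to the end of the text and then backtracks.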

      Do I need both \r and \n to get the job done?  Testing suggests that only the \n is required.

      If I use both of the new-line characters, then is one or both included in the results of the search by some greedy process?

      Is there a single character (^%NL%) that includes both?  Is ^%NL% recognized/legal in a regex search?

      Apparently, the order of the characters before the \K matters.  \n must follow \r.  If both are required to form a new line, then why does their order matter?
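
The line-break questions above can be explored with a small Python sketch. It rests on assumptions: Python's re module has no \K, so a capture group stands in for "keep only what follows", and the helper last_url and the sample text are hypothetical. A Windows line break is the two-character sequence \r followed by \n, in that order, which is why \n\r finds nothing. A lone \n also works because the \r in front of it is absorbed by the greedy .+ term.

```python
import re

# Made-up sample with Windows (CRLF) line endings.
text = "first line\r\nhttp://example.com/a\r\nwww.example.org/b\r\n"

def last_url(line_break):
    """Return the last URL that directly follows the given line-break pattern.

    Hypothetical helper: the capture group replaces \\K, which Python's re lacks.
    """
    pattern = r"(?s).+" + line_break + r'((?:https?://|www\.)[^\x20"\r\n<>]+)'
    m = re.search(pattern, text)
    return m.group(1) if m else None

print(last_url(r"\r\n"))   # both characters, in file order
print(last_url(r"\n"))     # \n alone; the \r is eaten by .+
print(last_url(r"\n\r"))   # reversed order never occurs in the text
print(last_url(r"\r?\n"))  # optional \r covers CRLF and bare LF files
```

In every matching case the line-break characters sit before the kept part, so they are not included in the result; the URL class itself excludes \r and \n, so the match also stops before the next line break.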

      Finally, is there a good searchable regex reference (web, book, help file) where I can get useful information?  For example, I cannot even search for .+ in the regex help file included with NTP.

      Regex tools?


      Thank you both for your expert help in sorting through all this.



      ---In ntb-clips@yahoogroups.com, <ntb-clips@yahoogroups.com> wrote:

      Additionally, many URLs are enclosed in angle brackets. To start the capture at the beginning of the URL in every case, and assuming you don't want to capture the angle brackets if present, another negated class should be added after the .+ term so that none of these characters can be caught up in the greediness.

      ^!Find "(?s).+[^\r\n</\"][</"]*\K(https?://|www\.)[^\x20"\r\n<>]+" IORSW

      So, now the .+ can't end with <, " or /. If < or " are present, they are passed but not captured. Now the capture starts at http if the URL begins with http, and at www if it begins with www.
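
A rough Python translation of the pattern above, for experimenting outside NoteTab. The assumptions: a capture group replaces \K (Python's re does not support it), the helper find_url is hypothetical, and the two sample strings are made up. Only a bracketed http URL and a quoted www URL are tried here; other inputs may behave differently.

```python
import re

# Negated class, then optional <, / or " that are skipped, then the kept URL.
PATTERN = re.compile(
    r'(?s).+[^\r\n</"][</"]*((?:https?://|www\.)[^\x20"\r\n<>]+)'
)

def find_url(text):
    """Return the URL kept by the pattern, or None (hypothetical helper)."""
    m = PATTERN.search(text)
    return m.group(1) if m else None

bracketed = find_url("See <http://example.com/a> now")
quoted = find_url('link "www.example.org/b" here')

print(bracketed)
print(quoted)
```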

       

      Regards,
      John
      RecipeTools Web Site: http://recipetools.gotdns.com/
      John's Mags Yahoo Group:  http://groups.yahoo.com/group/johnsmags/

       

      From: ntb-clips@yahoogroups.com [mailto:ntb-clips@yahoogroups.com] On Behalf Of Axel Berger
      Sent: Saturday, November 02, 2013 23:58
      To: ntb-clips@yahoogroups.com
      Subject: Re: [Clip] REGEX Search Backward

       

       

      nullclip@... wrote:

      > The regex finds and highlights only www.logicalchess.com/ instead of
      > the full http://www.logicalchess.com/.

      Yes, John already mentioned that problem himself. If the start can be
      either http or www and the term before is greedy, then you'll capture as
      little as possible. To solve this you have to look at what always comes
      directly before your string. It may be an equals (=) or a quote, if the
      URL is always placed in quotes. Assuming the latter I get:

      ^!Find "(?s).+"\K(https?://|www\.)[^\x20"\r\n<>]+" IORSW

      As you never specified what comes outside your search string, I had to
      guess here.
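
The effect described above can be reproduced in a short Python sketch. The assumptions: a capture group stands in for \K, the "unanchored" pattern is a guess at the kind of expression that showed the problem (it is not quoted in this message), and the sample string is made up around the URL from the thread.

```python
import re

text = 'a "http://www.logicalchess.com/" b'
URL = r'((?:https?://|www\.)[^\x20"\r\n<>]+)'

# Greedy .+ directly before the alternation: it backs off only as far as
# needed, so the later start (www.) wins over the earlier one (http://).
unanchored = re.search(r"(?s).+" + URL, text).group(1)

# Requiring the quote that always precedes the URL pins the start.
anchored = re.search(r'(?s).+"' + URL, text).group(1)

print(unanchored)
print(anchored)
```

The same idea applies to any fixed character that always comes directly before the URL, such as an equals sign or a line break.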

      Axel
