RE: [NTS] Finding Pairwise Matches

  • Art Kocsis
    Message 1 of 6 , Sep 27, 2012
      >From: ntb-scripts@yahoogroups.com [mailto:ntb-scripts@yahoogroups.com] On
      >Behalf Of Art Kocsis
      >Sent: Sunday, September 23, 2012 02:39
      >To: NoteTab-Scripts
      >Subject: [NTS] Finding Pairwise Matches
      >I am past pulling hair and am now down to scalp and it's getting bloody so
      >maybe someone here can help.
      >Art,
      >
      At 9/23/2012 06:05 AM, John wrote:
      >;Replace first double quote with opening bracket
      >^!Replace "^[^\r\n\"]*\K\"" >> "[" ARSW
      >;Replace next double quote with closing bracket
      >^!Replace "^[^\r\n\"]*\K\"" >> "]" ARSW
      >;Repeat as long as double quotes exist
      >^!IfError Next Else Skip_-2
      >;Replace commas between opening and closing brackets
      >^!Replace "\[[^\r\n,\]]*\K,(?=.*\])" >> "" ARSW
      >^!IfError Next Else Skip_-1
      >;Change brackets back to double quotes
      >^!Replace "[\[\]]" >> "\"" ARSW
      >^!IfError Next Else Skip_-1
      >Should be pretty fast.
      >=============================
      >nnnn,"xx,xx",,ss,ss,ssss
      >nnnn,"xx,xx",,"xx,,xx",ssss
      >nnnn,"yyyy",,,"yyyy",ssss
      >nnnn,"yyyy",,"yyyy",,,"xx,xx,",sss
      >nnnn,"yyyy",,"x,xxx","yyyy",,,"xx,xx,",sss
      >nnnn,sssss,,"xx,xx,",sss,"yyyy",
      >nnnn,sssss,,,"yyyy","xx,xx,",sss

      Thank you John for your response. It does seem to work fine.
      I made some slight changes to your suggestion:
      Used left & right chevrons as the [] chars could appear in the text
      Used a RegEx pattern to change only matched pair of quotes to chevrons
      as your separate commands would also change orphaned quotes

      So the essence of the clip is elegant, compact and entirely RegEx - just
      what I wanted (earlier tests for uniqueness of temp char & chevrons not shown):
      ;====================================
      ^!Set %tc%=§
      ;First replace all matching double quote pairs with left & right chevron pairs (« and »)
      ^!Replace "^.*?\K\"(.*?)\"" >> "«$1»" AIRSTW
      ^!IfError Next Else Skip_-1

      ;Next, replace all embedded commas between chevron pairs with the unique temp char
      ^!Replace "«[^\r\n,»]*\K,(?=.*»)" >> "^%tc%" AIRSTW
      ^!IfError Next Else Skip_-1

      ;Next, delete all left & right chevrons
      ^!Replace "«" >> "" AIRSTW
      ^!Replace "»" >> "" AIRSTW

      ;Finally, check if there were any unmatched double quote chars remaining
      ^!Find "\"" AIRSTW
      ^!IfError Skip_1
      ^!Continue ###### File Error! File contains unmatched double quote char, Continue or exit?
      ;====================================

      Your pattern, "«[^\r\n,»]*\K,(?=.*»)" is simple, straightforward and pretty
      obvious once seen. However, I don't think I would have gotten there. I was
      so totally focused on jumping over the possible matching pairs without
      commas that I didn't stop to analyze the pattern with commas. Duh! So,
      thank you again. My scalp can heal now.
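
For readers who want to try the three steps outside NoteTab, here is a minimal Python sketch of the same idea: mark matched quote pairs, swap the embedded commas for a temp char, then drop the markers. It is an illustration only, not the clip itself. Python's `re` module has no `\K`, so a replacement callback stands in for the repeated `^!Replace` loop, and the function name `protect_embedded_commas` is made up for this example.

```python
import re

TEMP = "§"  # stand-in for the clip's %tc% temp character (assumed not to occur in the data)


def protect_embedded_commas(line: str, temp: str = TEMP) -> str:
    """Replace commas inside matched double-quote pairs with `temp`."""
    # Step 1: turn each matched pair of double quotes into « ... ».
    marked = re.sub(r'"([^"\r\n]*)"', r"«\1»", line)
    # Step 2: inside each chevron pair, replace every comma with the temp char.
    marked = re.sub(
        r"«([^«»]*)»",
        lambda m: "«" + m.group(1).replace(",", temp) + "»",
        marked,
    )
    # Step 3: delete the chevrons again.
    return marked.replace("«", "").replace("»", "")


if __name__ == "__main__":
    sample = 'nnnn,"yyyy",,"x,xxx","yyyy",,,"xx,xx,",sss'
    result = protect_embedded_commas(sample)
    print(result)
    # Any double quote still present was unmatched (orphaned).
    print("orphan quote found" if '"' in result else "all quotes paired")
```

As in the clip, a double quote that survives all three steps signals an unmatched quote in the input.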


      Although I fully understand your pattern, I do not understand why the
      greedy/non-greedy specifications in mine do NOT work. It was my
      understanding that greedy meant "consume as much as possible that matches"
      and non-greedy meant "stop after the first matching pattern". In both cases
      I expected at least one match if allowed by subsequent criteria.

      However, for example,

      given: nnn,«yyyy»,,«x,xxx»,«yyyy»,,
      the pattern: («[^\,]*?»)*.*?\K«(.*?)\,(.*?)»
      matches: «yyyy»,,«x,xxx»

      Why does the "(«[^\,]*?»)*" NOT consume the "«yyyy»" and reset the match
      point past it?

      Further investigation revealed something not right with NTB/RegEx and why I
      was losing so much hair.

      It deserves its own subject line and exposition so see next post.

      Art
    • flo.gehrke
      Message 2 of 6 , Sep 30, 2012
        --- In ntb-scripts@yahoogroups.com, Art Kocsis <artkns@...> wrote:

        > given: nnn,«yyyy»,,«x,xxx»,«yyyy»,,
        > the pattern: («[^\,]*?»)*.*?\K«(.*?)\,(.*?)»
        > matches: «yyyy»,,«x,xxx»
        >
        > Why does the "(«[^\,]*?»)*" NOT consume the "«yyyy»" and reset the
        > match point past it?

        A single '«[^\,]*?»' or even '«[^,]+»' (no need to escape comma in character class) would match that '«yyyy»' section but your RegEx is demanding more than that.

        In short, the engine starts testing '(«[^\,]*?»)*.*?\K«' at the beginning of the subject string. Your subject string, however, starts with 'nnn...', so the optional group matches zero times and the engine doesn't achieve any submatch until it's testing '.*?\K«'. The lazy dot then advances one character at a time and matches 'nnn,' up to the first '«', because each character is matched with the dot. So 'nnn,' is not skipped by the group but consumed by the dot, and the match goes on till 'nnn,«yyyy»,,«x,xxx»' is matched in the end.
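
The behaviour can be reproduced outside NoteTab with a small Python sketch. This is an approximation: Python's `re` does not support `\K`, so an extra capture group (group 2 here) stands in for "everything after `\K`". The second pattern, which forbids chevrons inside the pair, is one possible way to hit only the pair that contains a comma. It is offered as an assumption, not as the pattern Art ended up using.

```python
import re

SUBJECT = "nnn,«yyyy»,,«x,xxx»,«yyyy»,,"

# Art's pattern, with group 2 emulating the part that follows \K.
ART_PATTERN = re.compile(r"(«[^,]*?»)*.*?(«(.*?),(.*?)»)")

# Alternative sketch: no chevrons allowed inside the pair, one comma required.
PAIR_WITH_COMMA = re.compile(r"«[^«»,]*,[^«»]*»")


def art_match(subject: str):
    """Return (optional group, text reported after the emulated \\K)."""
    m = ART_PATTERN.search(subject)
    return m.group(1), m.group(2)


def first_pair_with_comma(subject: str) -> str:
    """Return the first chevron pair that contains a comma."""
    return PAIR_WITH_COMMA.search(subject).group(0)


if __name__ == "__main__":
    skipped, reported = art_match(SUBJECT)
    # The optional group matched zero times, so it captured nothing.
    print("optional group:", skipped)
    print("reported match:", reported)
    print("pair with comma:", first_pair_with_comma(SUBJECT))
```

The optional group is tried only at the position where the overall match attempt begins. Since that attempt succeeds from the start of the line via the lazy dot, the engine never gets to a point where the group could swallow '«yyyy»'.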

        BTW, for me, a simple clip like...

        ^!Jump Doc_Start
        :Loop
        ^!Find ""[^"]+"" RS
        ^!IfError End
        ^!IfMatch "[^,]+" "^$GetSelection$" Skip
        ^!InsertText ""^$StrReplace(,;;^$GetSelection$;A)$""
        ^!Goto Loop

        (designed for Ntb 7.0) would perfectly do the job (removing commas between opening and closing brackets) when run against your sample string...

        nnnnnnnnn,"xxxx",,,"ss,ss,",xxx
        nnnnnnnnn,"xx,xx",,"ss,ss",xxx
        nnnnnnnnn,xxxx,,"ss,ss,"xxx

        Even...

        ^!Jump Doc_Start
        ^!Find ""\w+,(\w|,)*"" RS
        ^!IfError End
        ^!InsertText ""^$StrReplace(,;;^$GetSelection$;A)$""
        ^!Goto Skip_-3

        would do it if there were no more variations (?) in the string.

        Members who are at war with RegEx will be happy to see that they can even find a solution without any RegEx at all:

        ^!Jump Doc_Start
        :Loop
        ^!Find """ RS
        ^!IfError End
        ^!MoveCursor +1
        ^!Keyboard CTRL+M &50
        ^!IfFalse ^$StrPos(",";"^$GetSelection$";0)$ Skip
        ^!InsertSelect "^$StrReplace(",";"";"^$GetSelection$";A)$"
        ^!Jump Select_End
        ^!Goto Loop
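
The same no-RegEx idea, finding an opening quote, finding its partner, and stripping the commas in between, can be sketched in Python with plain string searches. This is only an illustration of the logic, not a translation of the clip commands. The function name `strip_commas_in_quotes` is hypothetical, and leaving an unmatched trailing quote untouched is an assumption of this sketch.

```python
def strip_commas_in_quotes(line: str) -> str:
    """Remove commas that sit between matched pairs of double quotes."""
    out = []
    pos = 0
    while True:
        start = line.find('"', pos)
        if start == -1:
            break  # no more quotes on this line
        end = line.find('"', start + 1)
        if end == -1:
            break  # orphaned quote: leave the rest of the line as it is
        out.append(line[pos:start])                       # text before the pair
        out.append(line[start:end + 1].replace(",", ""))  # pair without commas
        pos = end + 1
    out.append(line[pos:])  # whatever follows the last pair
    return "".join(out)


if __name__ == "__main__":
    for sample in (
        'nnnnnnnnn,"xxxx",,,"ss,ss,",xxx',
        'nnnnnnnnn,"xx,xx",,"ss,ss",xxx',
        'nnnnnnnnn,xxxx,,"ss,ss,"xxx',
    ):
        print(strip_commas_in_quotes(sample))
```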

        Regards,
        Flo