
Re: [NTS] Finding Pairwise Matches

  • Art Kocsis
    Message 1 of 6 , Sep 27 2:23 PM
      At 9/23/2012 03:19 AM, Axel wrote:
      >Art Kocsis wrote:
      > > I am trying to replace all commas between matched pairs of double quotes.
      >In other words all those inside quotes. I have not tried anything, but I
      >believe ^!SetArray deals with this problem correctly, so you can then
      >delete inside the array elements. Two nested loops, won't be
      >blindingly fast.

      Thank you, Axel, for responding.

      Although I am not sure exactly what you had in mind using arrays I know
      there are many ways using clip commands to parse the lines for the matching
      double quotes. However, my goal and desire was to do the parsing,
      substitution and removals just using RegEx. Speed is not an issue but
      elegance, compactness and maintaining/expanding my skills in RegEx is.

      Art
    • Art Kocsis
      Message 2 of 6 , Sep 27 10:48 PM
        >From: ntb-scripts@yahoogroups.com [mailto:ntb-scripts@yahoogroups.com] On
        >Behalf Of Art Kocsis
        >Sent: Sunday, September 23, 2012 02:39
        >To: NoteTab-Scripts
        >Subject: [NTS] Finding Pairwise Matches
        >I am past pulling hair and am now down to scalp and it's getting bloody so
        >maybe someone here can help.
        >Art,
        >
        At 9/23/2012 06:05 AM, John wrote:
        >;Replace first double quote with opening bracket
        >^!Replace "^[^\r\n\"]*\K\"" >> "[" ARSW
        >;Replace next double quote with closing bracket
        >^!Replace "^[^\r\n\"]*\K\"" >> "]" ARSW
        >;Repeat as long as double quotes exist
        >^!IfError Next Else Skip_-2
        >;Replace commas between opening and closing brackets
        >^!Replace "\[[^\r\n,\]]*\K,(?=.*\])" >> "" ARSW
        >^!IfError Next Else Skip_-1
        >;Change brackets back to double quotes
        >^!Replace "[\[\]]" >> "\"" ARSW
        >^!IfError Next Else Skip_-1
        >Should be pretty fast.
        >=============================
        >nnnn,"xx,xx",,ss,ss,ssss
        >nnnn,"xx,xx",,"xx,,xx",ssss
        >nnnn,"yyyy",,,"yyyy",ssss
        >nnnn,"yyyy",,"yyyy",,,"xx,xx,",sss
        >nnnn,"yyyy",,"x,xxx","yyyy",,,"xx,xx,",sss
        >nnnn,sssss,,"xx,xx,",sss,"yyyy",
        >nnnn,sssss,,,"yyyy","xx,xx,",sss

        Thank you John for your response. It does seem to work fine.
        I made some slight changes to your suggestion:
        Used left & right chevrons, as the [] chars could appear in the text.
        Used a single RegEx pattern to change only matched pairs of quotes to
        chevrons, as your separate commands would also change orphaned quotes.

        So the essence of the clip is elegant, compact and entirely RegEx - just
        what I wanted (earlier tests for uniqueness of temp char & chevrons not shown):
        ;====================================
        ^!Set %tc%=§
        ;First replace all matching double quote pairs with left & right chevron pairs (« and »)
        ^!Replace "^.*?\K\"(.*?)\"" >> "«$1»" AIRSTW
        ^!IfError Next Else Skip_-1

        ;Next, replace all embedded commas between chevron pairs with the unique temp char
        ^!Replace "«[^\r\n,»]*\K,(?=.*»)" >> "^%tc%" AIRSTW
        ^!IfError Next Else Skip_-1

        ;Next, delete all left & right chevrons
        ^!Replace "«" >> "" AIRSTW
        ^!Replace "»" >> "" AIRSTW

        ;Finally, check if there were any unmatched double quote chars remaining
        ^!Find "\"" AIRSTW
        ^!IfError Skip_1
        ^!Continue ###### File Error! File contains unmatched double quote char, Continue or exit?
        ;====================================
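
For anyone following along outside NoteTab, the same idea can be sketched in Python. This is only an illustration of the steps in the clip above, not a translation of it. The function names are made up for the example, and it assumes the temp char § does not occur in the data (the clip tests for that separately).

```python
import re

TEMP_CHAR = "§"  # assumption: this char never appears in the input

def protect_commas(line: str, temp: str = TEMP_CHAR) -> str:
    """Drop each matched pair of double quotes and replace the commas
    that were inside the pair with the temp char."""
    # "([^"\r\n]*)" matches one quote pair on a single line; the lambda
    # rewrites only the text captured between the two quotes.
    return re.sub(r'"([^"\r\n]*)"',
                  lambda m: m.group(1).replace(",", temp),
                  line)

def has_orphan_quote(line: str) -> bool:
    """After protect_commas, any remaining quote had no partner."""
    return '"' in line

if __name__ == "__main__":
    sample = 'nnnn,"yyyy",,"x,xxx","yyyy",,,"xx,xx,",sss'
    cleaned = protect_commas(sample)
    print(cleaned)
    print("orphan quote:", has_orphan_quote(cleaned))
```

Doing the substitution inside a callback avoids the temporary chevrons, because the callback only ever sees text that sat between a matched pair of quotes.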

        Your pattern, "«[^\r\n,»]*\K,(?=.*»)" is simple, straightforward and pretty
        obvious once seen. However, I don't think I would have gotten there. I was
        so totally focused on jumping over the possible matching pairs without
        commas that I didn't stop to analyze the pattern with commas. Duh! So,
        thank you again. My scalp can heal now.


        Although I fully understand your pattern, I do not understand why the
        greedy/non-greedy specifications in mine do NOT work. It was my
        understanding that greedy meant "consume as much as possible that matches"
        and non-greedy meant "stop after the first matching pattern". In both cases
        I expected at least one match if allowed by subsequent criteria.

        However, for example,

        given: nnn,«yyyy»,,«x,xxx»,«yyyy»,,
        the pattern: («[^\,]*?»)*.*?\K«(.*?)\,(.*?)»
        matches: «yyyy»,,«x,xxx»

        Why does the "(«[^\,]*?»)*" NOT consume the "«yyyy»" and reset the match
        point past it?

        Further investigation revealed something not right with NTB/RegEx and why I
        was losing so much hair.

        It deserves its own subject line and exposition so see next post.

        Art
      • flo.gehrke
        Message 3 of 6 , Sep 30 8:31 AM
          --- In ntb-scripts@yahoogroups.com, Art Kocsis <artkns@...> wrote:

          > given: nnn,«yyyy»,,«x,xxx»,«yyyy»,,
          > the pattern: («[^\,]*?»)*.*?\K«(.*?)\,(.*?)»
          > matches: «yyyy»,,«x,xxx»
          >
          > Why does the "(«[^\,]*?»)*" NOT consume the "«yyyy»" and reset the
          > match point past it?

          A single '«[^\,]*?»' or even '«[^,]+»' (no need to escape comma in character class) would match that '«yyyy»' section but your RegEx is demanding more than that.

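          In short, the engine starts testing '(«[^\,]*?»)*.*?\K«'. Your subject string, however, starts with 'nnn...', so the optional group '(«[^\,]*?»)*' matches zero times at the start and the engine doesn't achieve any submatch until it's testing '.*?\K«'. The lazy dot then advances one character at a time over 'nnn,' until '«' matches the first chevron. So 'nnn,' is not skipped by the group but consumed by the dot, '\K' resets the reported start to that first '«', and since '(.*?)' is free to cross a '»', the engine goes on till 'nnn,«yyyy»,,«x,xxx»' is matched in the end.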
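
The behaviour described above can be reproduced outside NoteTab. Here is a small Python sketch. Python's re module has no \K, so as a stand-in the part of the pattern after \K is wrapped in capture group 1; that substitution is my assumption and not part of the original pattern. The second pattern is just one possible stricter variant for comparison.

```python
import re

subject = "nnn,«yyyy»,,«x,xxx»,«yyyy»,,"

# Art's pattern, with everything after \K placed in group 1
# (stand-in for \K, which Python's re does not support).
arts = re.compile(r"(?:«[^,]*?»)*.*?(«(.*?),(.*?)»)")
m = arts.search(subject)

reported = m.group(1)            # what \K would leave as the match
skipped = subject[:m.start(1)]   # text consumed by .*? before the first «
before_comma = m.group(2)        # the lazy (.*?) crossed a » to reach a comma

# Stricter variant: nothing between the chevrons may be a »,
# so a pair without a comma can never be part of the match.
strict = re.compile(r"«[^,»]*,[^»]*»")
strict_match = strict.search(subject).group(0)

print(repr(skipped))       # 'nnn,'
print(reported)            # «yyyy»,,«x,xxx»
print(before_comma)        # yyyy»
print(strict_match)        # «x,xxx»
```

The optional group never gets a chance to consume «yyyy» because the attempt starts at position 0, where there is no «. By the time the engine reaches the first chevron, it is already inside the lazy dot.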

          BTW, for me, a simple clip like...

          ^!Jump Doc_Start
          :Loop
          ^!Find ""[^"]+"" RS
          ^!IfError End
          ^!IfMatch "[^,]+" "^$GetSelection$" Skip
          ^!InsertText ""^$StrReplace(,;;^$GetSelection$;A)$""
          ^!Goto Loop

          (designed for Ntb 7.0) would perfectly do the job (removing commas between opening and closing double quotes) when run against your sample string...

          nnnnnnnnn,"xxxx",,,"ss,ss,",xxx
          nnnnnnnnn,"xx,xx",,"ss,ss",xxx
          nnnnnnnnn,xxxx,,"ss,ss,"xxx

          Even...

          ^!Jump Doc_Start
          ^!Find ""\w+,(\w|,)*"" RS
          ^!IfError End
          ^!InsertText ""^$StrReplace(,;;^$GetSelection$;A)$""
          ^!Goto Skip_-3

          would do it if there were no more variations (?) in the string.

          Members at war with RegEx will be happy to see that they could even find a solution without any RegEx at all:

          ^!Jump Doc_Start
          :Loop
          ^!Find """ RS
          ^!IfError End
          ^!MoveCursor +1
          ^!Keyboard CTRL+M &50
          ^!IfFalse ^$StrPos(",";"^$GetSelection$";0)$ Skip
          ^!InsertSelect "^$StrReplace(",";"";"^$GetSelection$";A)$"
          ^!Jump Select_End
          ^!Goto Loop
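
For comparison, the same RegEx-free idea can be sketched in Python with plain string searches. This is an illustration only and not a port of the clip; the function name is invented for the example, and it assumes that quotes pair up left to right within a single line.

```python
def strip_commas_between_quotes(line: str) -> str:
    """Remove commas that sit between a double quote and the next
    double quote on the same line, using only str.find/replace."""
    out = []
    pos = 0
    while True:
        start = line.find('"', pos)        # opening quote
        if start == -1:
            break
        end = line.find('"', start + 1)    # its closing partner
        if end == -1:
            break                          # orphan quote: leave the rest alone
        out.append(line[pos:start])                          # text outside quotes
        out.append(line[start:end + 1].replace(",", ""))     # quoted part, commas removed
        pos = end + 1
    out.append(line[pos:])                 # remainder after the last pair
    return "".join(out)

if __name__ == "__main__":
    for sample in ('nnnnnnnnn,"xxxx",,,"ss,ss,",xxx',
                   'nnnnnnnnn,"xx,xx",,"ss,ss",xxx',
                   'nnnnnnnnn,xxxx,,"ss,ss,"xxx'):
        print(strip_commas_between_quotes(sample))
```

An unmatched trailing quote is left untouched, together with everything after it.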

          Regards,
          Flo