RE: [NTS] Finding Pairwise Matches

  • John Shotsky
    Message 1 of 6 , Sep 23, 2012
      Art,
      ;Replace first double quote with opening bracket
      ^!Replace "^[^\r\n\"]*\K\"" >> "[" ARSW
      ;Replace next double quote with closing bracket
      ^!Replace "^[^\r\n\"]*\K\"" >> "]" ARSW
      ;Repeat as long as double quotes exist
      ^!IfError Next Else Skip_-2
      ;Replace commas between opening and closing brackets
      ^!Replace "\[[^\r\n,\]]*\K,(?=.*\])" >> "" ARSW
      ^!IfError Next Else Skip_-1
      ;Change brackets back to double quotes
      ^!Replace "[\[\]]" >> "\"" ARSW
      ^!IfError Next Else Skip_-1
      Should be pretty fast.
      =============================
      nnnnnnnnn,"xxxx",,,"ssss",xxx
      nnnnnnnnn,"xxxx",,"ssss",xxx
      nnnnnnnnn,xxxx,,"ssss"xxx
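The three steps of the clip above (mark the quotes as brackets, strip commas between brackets, restore the quotes) can be sketched outside NoteTab. This is a rough Python illustration, not the clip itself: Python's `re` module has no `\K`, so a capture group is used instead, the lookahead is slightly stricter than the original `(?=.*\])`, and the function name `remove_quoted_commas` is made up for the example. Like the clip, it assumes the text contains no literal `[` or `]`.

```python
import re

def remove_quoted_commas(line):
    """Remove commas inside quote pairs 1&2, 3&4, ... of a single line."""
    # Step 1: turn quotes into brackets alternately (1st -> [, 2nd -> ], ...).
    count = 0

    def mark(_match):
        nonlocal count
        count += 1
        return "[" if count % 2 else "]"

    marked = re.sub(r'"', mark, line)

    # Step 2: drop one comma per bracket pair per pass, repeat until none is left.
    # Group 1 keeps the text before the comma (stand-in for PCRE's \K).
    comma = re.compile(r'(\[[^\],]*),(?=[^\[\]]*\])')
    while True:
        marked, replaced = comma.subn(r'\1', marked)
        if replaced == 0:
            break

    # Step 3: change the brackets back to double quotes.
    return re.sub(r'[\[\]]', '"', marked)


samples = [
    'nnnnnnnnn,"xxxx",,,"ss,ss,",xxx',
    'nnnnnnnnn,"xx,xx",,"ss,ss",xxx',
    'nnnnnnnnn,xxxx,,"ss,ss,"xxx',
]
results = [remove_quoted_commas(s) for s in samples]
for r in results:
    print(r)
```

Run against the three sample lines from the original question, it prints the same three result lines shown above.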

      Regards,
      John
      RecipeTools Web Site: http://recipetools.gotdns.com/

      From: ntb-scripts@yahoogroups.com [mailto:ntb-scripts@yahoogroups.com] On Behalf Of Art Kocsis
      Sent: Sunday, September 23, 2012 02:39
      To: NoteTab-Scripts
      Subject: [NTS] Finding Pairwise Matches


      I am past pulling hair and am now down to scalp and it's getting bloody so
      maybe someone here can help.

      I am trying to replace all commas between matched pairs of double quotes.
      My first cut worked like a champ. However, it also removed the commas from
      between the unmatched pairs of quotes as well. By matched pairs I mean for
      any given line, the quotes are taken in pairs (1&2, 3&4, etc). Text between
      quotes 2&3, 4&5, etc should be ignored. There can be any number of matched
      pairs in a line and any number of commas between any matched or unmatched
      pair of quotes.

      Sample text:
      nnnnnnnnn,"xxxx",,,"ss,ss,",xxx
      nnnnnnnnn,"xx,xx",,"ss,ss",xxx
      nnnnnnnnn,xxxx,,"ss,ss,"xxx

      First clip:
      :Loop1
      ^!Replace "\"(.*?)\,(.*?)\"" >> "\"$1�$2\"" AIRSTW
      ^!IfError Next Else Loop1
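Why the first clip also eats commas between unmatched quotes can be seen by running the same pattern on the first sample line. The sketch below uses Python's `re` module as a stand-in for NoteTab's engine (an assumption; the lazy-quantifier behaviour shown here is the same in both).

```python
import re

# The pattern from the first clip, written as a Python raw string.
first_cut = re.compile(r'"(.*?),(.*?)"')

line = 'nnnnnnnnn,"xxxx",,,"ss,ss,",xxx'
first_match = first_cut.search(line)

# Quote 1 opens the match, but the first pair contains no comma, so the
# lazy (.*?) keeps growing past quote 2 until it reaches one.
print(first_match.group(0))   # "xxxx",,,"
print(first_match.groups())   # ('xxxx"', ',,')
```

Nothing in `.*?` stops it from crossing a quote, so the match ends up spanning quote 1 to quote 3 and the comma it replaces lies between quotes 2 and 3.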

      I have tried just about everything to force pair wise matching but to no avail.

      This pattern, (\"[^\"\,]*\") correctly finds all matched quote pairs
      without embedded commas but attempting to use it as a look behind assertion
      has mixed results.

      (\"[^\"\,]*\")+\K. Works fine but lines may not always contain a matching
      pattern
      (\"[^\"\,]*\")*\K. Switching to a "*" quantifier destroys the entire
      assertion pattern.

      Only the "." matches. Why is the pattern/quantifier not greedy? The default
      is supposed to be greedy.
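A small experiment may help with the greediness question. It is a sketch in Python's `re`, which lacks `\K`, so the character after the optional prefix is captured in group 1 instead; the sample line is made up. Greedy only means "take as many repetitions as possible from the current start position". The engine still tries the leftmost start position first, and with `*` zero repetitions is a valid result there.

```python
import re

line = 'nnn,"yyyy",x'

# '*' may match zero times, so the overall match succeeds at position 0
# and the dot simply takes the first character of the line.
star = re.search(r'(?:"[^",]*")*(.)', line)

# '+' needs at least one quoted pair, so position 0 fails and the engine
# has to move forward to the first quote.
plus = re.search(r'(?:"[^",]*")+(.)', line)

print(star.start(), repr(star.group(1)))   # 0 'n'
print(plus.start(), repr(plus.group(1)))   # 4 ','
```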

      Using the two look behind assertions:

      (\"[^\"\,]*\")+?.*?\K(\"(.*?)\,(.*?)\")

      Correctly finds all look behind assertions but skips lines like line#2

      (\"[^\"\,]*\")*?.*?\K(\"(.*?)\,(.*?)\")

      Again switching quantifiers allows the RegEx engine to take the zero
      instance option and just use the pattern after the \K. It incorrectly matches
      the first three quotes. Again, why is the assertion not greedy?

      My eyes and head are going in circles. Any help would be appreciated. How
      can I force RegEx to use the assertion pattern when it exists?

      Namaste', Art



    • Art Kocsis
      Message 2 of 6 , Sep 27, 2012
        At 9/23/2012 03:19 AM, Axel wrote:
        >Art Kocsis wrote:
        > > I am trying to replace all commas between matched pairs of double quotes.
        >In other words all those inside quotes. I have not tried anything, but I
        >believe ^!SetArray deals with this problem correctly, so you can
        >then delete inside the array elements. Two nested loops, won't be
        >blindingly fast.

        Thank you, Axel, for responding.

        Although I am not sure exactly what you had in mind using arrays I know
        there are many ways using clip commands to parse the lines for the matching
        double quotes. However, my goal and desire was to do the parsing,
        substitution and removals just using RegEx. Speed is not an issue but
        elegance, compactness and maintaining/expanding my skills in RegEx is.

        Art
      • Art Kocsis
        Message 3 of 6 , Sep 27, 2012
          >From: ntb-scripts@yahoogroups.com [mailto:ntb-scripts@yahoogroups.com] On
          >Behalf Of Art Kocsis
          >Sent: Sunday, September 23, 2012 02:39
          >To: NoteTab-Scripts
          >Subject: [NTS] Finding Pairwise Matches
          >I am past pulling hair and am now down to scalp and it's getting bloody so
          >maybe someone here can help.
          >Art,
          >
          At 9/23/2012 06:05 AM, John wrote:
          >;Replace first double quote with opening bracket
          >^!Replace "^[^\r\n\"]*\K\"" >> "[" ARSW
          >;Replace next double quote with closing bracket
          >^!Replace "^[^\r\n\"]*\K\"" >> "]" ARSW
          >;Repeat as long as double quotes exist
          >^!IfError Next Else Skip_-2
          >;Replace commas between opening and closing brackets
          >^!Replace "\[[^\r\n,\]]*\K,(?=.*\])" >> "" ARSW
          >^!IfError Next Else Skip_-1
          >;Change brackets back to double quotes
          >^!Replace "[\[\]]" >> "\"" ARSW
          >^!IfError Next Else Skip_-1
          >Should be pretty fast.
          >=============================
          >nnnn,"xx,xx",,ss,ss,ssss
          >nnnn,"xx,xx",,"xx,,xx",ssss
          >nnnn,"yyyy",,,"yyyy",ssss
          >nnnn,"yyyy",,"yyyy",,,"xx,xx,",sss
          >nnnn,"yyyy",,"x,xxx","yyyy",,,"xx,xx,",sss
          >nnnn,sssss,,"xx,xx,",sss,"yyyy",
          >nnnn,sssss,,,"yyyy","xx,xx,",sss

          Thank you John for your response. It does seem to work fine.
          I made some slight changes to your suggestion:
          Used left & right chevrons as the [] chars could appear in the text
          Used a RegEx pattern to change only matched pair of quotes to chevrons
          as your separate commands would also change orphaned quotes

          So the essence of the clip is elegant, compact and entirely RegEx - just
          what I wanted (earlier tests for uniqueness of temp char & chevrons not shown):
          ;====================================
          ^!Set %tc%=§
          ;First replace all matching double quote pairs with left & right chevron pairs (« and »)
          ^!Replace "^.*?\K\"(.*?)\"" >> "«$1»" AIRSTW
          ^!IfError Next Else Skip_-1

          ;Next, replace all embedded commas between chevron pairs with the unique temp char
          ^!Replace "«[^\r\n,»]*\K,(?=.*»)" >> "^%tc%" AIRSTW
          ^!IfError Next Else Skip_-1

          ;Next, delete all left & right chevrons
          ^!Replace "«" >> "" AIRSTW
          ^!Replace "»" >> "" AIRSTW

          ;Finally, check if there were any unmatched double quote chars remaining
          ^!Find "\"" AIRSTW
          ^!IfError Skip_1
          ^!Continue ###### File Error! File contains unmatched double quote char, Continue or exit?
          ;====================================
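For comparison, the net effect of the clip above (embedded commas become the temp char, the paired quotes disappear, leftover quotes are reported) can be sketched in one pass. This is a Python approximation rather than a translation of the clip: the name `protect_commas` is hypothetical, and it pairs quotes simply by scanning left to right instead of going through chevrons.

```python
import re

# One matched pair: a quote, anything except quotes or line breaks, a quote.
PAIR = re.compile(r'"([^"\r\n]*)"')

def protect_commas(line, temp="§"):
    """Return (new_line, has_orphan_quote) for a single line."""
    # Matches are found left to right without overlap, so quotes are taken
    # as pairs 1&2, 3&4, ... and text between quotes 2&3 is never touched.
    new_line = PAIR.sub(lambda m: m.group(1).replace(",", temp), line)
    # Any quote still present had no partner on this line.
    return new_line, '"' in new_line


print(protect_commas('nnnn,"yyyy",,"x,xxx","yyyy",,,"xx,xx,",sss'))
print(protect_commas('a,"b,c",d"e'))
```

The second call shows the unmatched-quote case: the orphan quote stays in the line and the flag comes back `True`, which corresponds to the `^!Find` check at the end of the clip.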

          Your pattern, "«[^\r\n,»]*\K,(?=.*»)" is simple, straightforward and pretty
          obvious once seen. However, I don't think I would have gotten there. I was
          so totally focused on jumping over the possible matching pairs without
          commas that I didn't stop to analyze the pattern with commas. Duh! So,
          thank you again. My scalp can heal now.


          Although I fully understand your pattern I do not understand why the
          greedy/not greedy specifications in mine do NOT work. It was my
          understanding that greedy, meant "consume as much as possible that match"
          and non-greedy meant "stop after the first matching pattern". In both cases
          I expected at least one match if allowed by subsequent criteria.
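The two definitions can be checked on a short made-up line. In this Python sketch (same quantifier semantics as PCRE for this case), greedy and non-greedy only decide how far a quantifier stretches once the match has started; neither changes where the match starts.

```python
import re

line = 'a,"b",c,"d",e'

greedy = re.search(r'".*"', line)    # stretches to the last quote on the line
lazy = re.search(r'".*?"', line)     # stops at the first closing quote

print(greedy.group(0))   # "b",c,"d"
print(lazy.group(0))     # "b"
print(greedy.start(), lazy.start())   # 2 2
```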

          However, for example,

          given: nnn,«yyyy»,,«x,xxx»,«yyyy»,,
          the pattern: («[^\,]*?»)*.*?\K«(.*?)\,(.*?)»
          matches: «yyyy»,,«x,xxx»

          Why does the "(«[^\,]*?»)*" NOT consume the "«yyyy»" and reset the match
          point past it?

          Further investigation revealed something not right with NTB/RegEx and why I
          was losing so much hair.

          It deserves its own subject line and exposition so see next post.

          Art
        • flo.gehrke
          Message 4 of 6 , Sep 30, 2012
            --- In ntb-scripts@yahoogroups.com, Art Kocsis <artkns@...> wrote:

            > given: nnn,«yyyy»,,«x,xxx»,«yyyy»,,
            > the pattern: («[^\,]*?»)*.*?\K«(.*?)\,(.*?)»
            > matches: «yyyy»,,«x,xxx»
            >
            > Why does the "(«[^\,]*?»)*" NOT consume the "«yyyy»" and reset the
            > match point past it?

            A single '«[^\,]*?»' or even '«[^,]+»' (no need to escape comma in character class) would match that '«yyyy»' section but your RegEx is demanding more than that.

            In short, the engine starts testing '(«[^\,]*?»)*.*?\K«'. Your subject string, however, starts with 'nnn...'. So the engine doesn't achieve any submatch until it's testing '.*?\K«'. Now backtracking to '.' it matches 'nnn,«' because each character is matched with the dot. So 'nnn,' is not skipped, and it goes on till 'nnn,«yyyy»,,«x,xxx»' is matched in the end.
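The walk-through above can be reproduced with a short script. It is a sketch in Python's `re`, so the part after `\K` is captured in group 1 instead. The second pattern, `tight`, is not from the thread; it is one possible way to keep the match inside a single chevron pair by excluding chevrons from the character classes.

```python
import re

line = 'nnn,«yyyy»,,«x,xxx»,«yyyy»,,'

# Art's pattern, with group 1 standing in for everything after \K.
loose = re.search(r'(?:«[^,]*?»)*.*?(«(.*?),(.*?)»)', line)
# At position 0 the optional group matches zero times, the lazy dot walks
# over 'nnn,', and (.*?) then runs across » and « to reach a comma.
print(loose.group(1))   # «yyyy»,,«x,xxx»

# Assumed alternative: nothing inside the pair may be a chevron.
tight = re.search(r'«[^«»,]*,[^«»]*»', line)
print(tight.group(0))   # «x,xxx»
```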

            BTW, for me, a simple clip like...

            ^!Jump Doc_Start
            :Loop
            ^!Find ""[^"]+"" RS
            ^!IfError End
            ^!IfMatch "[^,]+" "^$GetSelection$" Skip
            ^!InsertText ""^$StrReplace(,;;^$GetSelection$;A)$""
            ^!Goto Loop

            (designed for Ntb 7.0) would perfectly do the job (removing commas between opening and closing brackets) when run against your sample string...

            nnnnnnnnn,"xxxx",,,"ss,ss,",xxx
            nnnnnnnnn,"xx,xx",,"ss,ss",xxx
            nnnnnnnnn,xxxx,,"ss,ss,"xxx

            Even...

            ^!Jump Doc_Start
            ^!Find ""\w+,(\w|,)*"" RS
            ^!IfError End
            ^!InsertText ""^$StrReplace(,;;^$GetSelection$;A)$""
            ^!Goto Skip_-3

            would do it if there were no more variations (?) in the string.

            Members being at war with RegEx will be happy to see that they could even find a solution without any RegEx at all:

            ^!Jump Doc_Start
            :Loop
            ^!Find """ RS
            ^!IfError End
            ^!MoveCursor +1
            ^!Keyboard CTRL+M &50
            ^!IfFalse ^$StrPos(",";"^$GetSelection$";0)$ Skip
            ^!InsertSelect "^$StrReplace(",";"";"^$GetSelection$";A)$"
            ^!Jump Select_End
            ^!Goto Loop
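The same no-RegEx idea (find a quote, find its partner, strip commas from the selection, carry on after it) can be sketched with plain string searches. This is a Python illustration only, and the function name `strip_commas_in_quotes` is made up for it.

```python
def strip_commas_in_quotes(line):
    """Remove commas between quote pairs 1&2, 3&4, ... without any regex."""
    pieces = []
    pos = 0
    while True:
        start = line.find('"', pos)
        end = line.find('"', start + 1) if start != -1 else -1
        if start == -1 or end == -1:
            # No further complete pair: keep the rest of the line unchanged.
            pieces.append(line[pos:])
            break
        pieces.append(line[pos:start])                          # text before the pair
        pieces.append(line[start:end + 1].replace(",", ""))     # the quoted section
        pos = end + 1
    return "".join(pieces)


for sample in ('nnnnnnnnn,"xxxx",,,"ss,ss,",xxx',
               'nnnnnnnnn,"xx,xx",,"ss,ss",xxx',
               'nnnnnnnnn,xxxx,,"ss,ss,"xxx'):
    print(strip_commas_in_quotes(sample))
```

An orphaned quote at the end of a line is left alone, together with any commas after it.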

            Regards,
            Flo