Finding Pairwise Matches

  • Art Kocsis
    Message 1 of 6, Sep 23, 2012
      I am past pulling hair and am now down to scalp and it's getting bloody so
      maybe someone here can help.

      I am trying to replace all commas between matched pairs of double quotes.
      My first cut worked like a champ. However, it also removed the commas from
      between the unmatched pairs of quotes as well. By matched pairs I mean for
      any given line, the quotes are taken in pairs (1&2, 3&4, etc). Text between
      quotes 2&3, 4&5, etc should be ignored. There can be any number of matched
      pairs in a line and any number of commas between any matched or unmatched
      pair of quotes.

      Sample text:
      nnnnnnnnn,"xxxx",,,"ss,ss,",xxx
      nnnnnnnnn,"xx,xx",,"ss,ss",xxx
      nnnnnnnnn,xxxx,,"ss,ss,"xxx
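To pin down the pairing rule described above, here is a minimal Python sketch (not NoteTab clip code). The function name `mark_paired_commas` is hypothetical, and the marker character is taken from the first clip in this post. It is only meant as a reference for the expected result, not as the method used in any of the clips in this thread.

```python
def mark_paired_commas(line: str, marker: str = "§") -> str:
    """Replace commas inside quote pairs 1&2, 3&4, ... with `marker`."""
    out = []
    pending = None  # index in `out` of the opening quote that is still unclosed
    for ch in line:
        if ch == '"':
            if pending is None:
                pending = len(out)  # this quote opens a new pair
            else:
                # This quote closes the pair: mark the commas seen since it opened.
                for i in range(pending + 1, len(out)):
                    if out[i] == ",":
                        out[i] = marker
                pending = None
        out.append(ch)
    # An orphaned last quote never closes, so the commas after it stay as they are.
    return "".join(out)


print(mark_paired_commas('nnnnnnnnn,"xx,xx",,"ss,ss",xxx'))
```

Text between quotes 2&3 is never touched because no pair is open at that point.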

      First clip:
      :Loop1
      ^!Replace "\"(.*?)\,(.*?)\"" >> "\"$1§$2\"" AIRSTW
      ^!IfError Next Else Loop1

      I have tried just about everything to force pairwise matching but to no avail.

      This pattern, (\"[^\"\,]*\") correctly finds all matched quote pairs
      without embedded commas but attempting to use it as a look behind assertion
      has mixed results.

      (\"[^\"\,]*\")+\K. Works fine but lines may not always contain a matching
      pattern
      (\"[^\"\,]*\")*\K. Switching to a "*" quantifier destroys the entire
      assertion pattern.

      Only the "." matches. Why is the pattern/quantifier not greedy? The default
      is supposed to be greedy.
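A small Python experiment may help with this question. It assumes that Python's backtracking `re` engine behaves like NoteTab's engine on this point. Python has no `\K`, so a second capture group stands in for "the text after `\K`". The variable names are made up for illustration.

```python
import re

line = 'nnnnnnnnn,"xxxx",,,"ss,ss,",xxx'

# Group 2 plays the role of the "." that follows \K in the clip patterns.
star = re.compile(r'("[^",]*")*(.)')
plus = re.compile(r'("[^",]*")+(.)')

m_star = star.search(line)
m_plus = plus.search(line)

# With "*", zero repetitions are allowed, so the pattern already succeeds
# at the very first character of the line.
print(m_star.start(), repr(m_star.group(0)))

# With "+", the engine has to move along the line until a quoted field starts.
print(m_plus.start(), repr(m_plus.group(0)))
```

Greediness only decides how many repetitions are tried first at a given starting position. The engine still reports the leftmost position where the whole pattern can succeed, and with `*` that is position 0 with zero repetitions.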

      Using the two look behind assertions:

      (\"[^\"\,]*\")+?.*?\K(\"(.*?)\,(.*?)\")

      Correctly finds all look behind assertions but skips lines like line#2

      (\"[^\"\,]*\")*?.*?\K(\"(.*?)\,(.*?)\")

      Again, switching quantifiers lets the RegEx engine take the zero-instance
      option and just use the pattern after the \K. It incorrectly matches
      the first three quotes. Again, why is the assertion not greedy?

      My eyes and head are going in circles. Any help would be appreciated. How
      can I force RegEx to use the assertion pattern when it exists?

      Namaste', Art
    • Axel Berger
      Message 2 of 6, Sep 23, 2012
        Art Kocsis wrote:
        > I am trying to replace all commas between matched pairs of double quotes.

        In other words, all those inside quotes. I have not tried anything, but I
        believe ^!SetArray deals with this problem correctly, so you can then
        delete inside the array elements. Two nested loops, so it won't be
        blindingly fast.
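The array idea can be sketched in Python as a split on the double-quote character. This is an assumption about what the suggestion amounts to; how ^!SetArray itself treats quotes has not been checked here, and the function name `strip_quoted_commas` is hypothetical.

```python
def strip_quoted_commas(line: str) -> str:
    parts = line.split('"')
    # A line with k quotes yields k + 1 parts. Odd-indexed parts lie between
    # an opening quote and the next quote. The last part is excluded from the
    # loop because nothing closes it (it follows the final quote on the line).
    for i in range(1, len(parts) - 1, 2):
        parts[i] = parts[i].replace(",", "")
    return '"'.join(parts)


print(strip_quoted_commas('nnnnnnnnn,"xxxx",,,"ss,ss,",xxx'))
```

In this form a single loop over the parts is enough, since the deletion inside each part is one string replace.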

        Axel
      • John Shotsky
        Message 3 of 6, Sep 23, 2012
          Art,
          ;Replace first double quote with opening bracket
          ^!Replace "^[^\r\n\"]*\K\"" >> "[" ARSW
          ;Replace next double quote with closing bracket
          ^!Replace "^[^\r\n\"]*\K\"" >> "]" ARSW
          ;Repeat as long as double quotes exist
          ^!IfError Next Else Skip_-2
          ;Replace commas between opening and closing brackets
          ^!Replace "\[[^\r\n,\]]*\K,(?=.*\])" >> "" ARSW
          ^!IfError Next Else Skip_-1
          ;Change brackets back to double quotes
          ^!Replace "[\[\]]" >> "\"" ARSW
          ^!IfError Next Else Skip_-1
          Should be pretty fast.
          =============================
          nnnnnnnnn,"xxxx",,,"ssss",xxx
          nnnnnnnnn,"xxxx",,"ssss",xxx
          nnnnnnnnn,xxxx,,"ssss"xxx
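For readers who want to trace the passes outside NoteTab, here is a rough Python rendering of the same steps. It is a sketch under two assumptions: the text contains no literal [ or ] of its own, and a capture group can stand in for `\K`, which Python's `re` does not support. The function name is hypothetical.

```python
import re


def remove_commas_bracket_passes(text: str) -> str:
    # Group 1 re-inserts the text before the quote (the part \K would discard).
    first_quote = re.compile(r'^([^\r\n"]*)"', re.MULTILINE)
    while True:
        # First remaining quote on each line becomes "[", the next one "]".
        text, n_open = first_quote.subn(r'\1[', text)
        text, _ = first_quote.subn(r'\1]', text)
        if n_open == 0:
            break

    # Remove one comma per bracket pair per pass until none is left.
    comma = re.compile(r'(\[[^\r\n,\]]*),(?=.*\])')
    while True:
        text, n = comma.subn(r'\1', text)
        if n == 0:
            break

    # Change brackets back to double quotes.
    return re.sub(r'[\[\]]', '"', text)


sample = "\n".join([
    'nnnnnnnnn,"xxxx",,,"ss,ss,",xxx',
    'nnnnnnnnn,"xx,xx",,"ss,ss",xxx',
    'nnnnnnnnn,xxxx,,"ss,ss,"xxx',
])
print(remove_commas_bracket_passes(sample))
```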

          Regards,
          John
          RecipeTools Web Site: http://recipetools.gotdns.com/

        • Art Kocsis
          Message 4 of 6, Sep 27, 2012
            At 9/23/2012 03:19 AM, Axel wrote:
            >Art Kocsis wrote:
            > > I am trying to replace all commas between matched pairs of double quotes.
            >In other words, all those inside quotes. I have not tried anything, but I
            >believe ^!SetArray deals with this problem correctly, so you can then
            >delete inside the array elements. Two nested loops, won't be
            >blindingly fast.

            Thank you, Axel, for responding.

            Although I am not sure exactly what you had in mind using arrays, I know
            there are many ways using clip commands to parse the lines for the matching
            double quotes. However, my goal and desire was to do the parsing,
            substitution and removals using just RegEx. Speed is not an issue, but
            elegance, compactness and maintaining/expanding my skills in RegEx are.

            Art
          • Art Kocsis
            Message 5 of 6, Sep 27, 2012
              >From: ntb-scripts@yahoogroups.com [mailto:ntb-scripts@yahoogroups.com] On
              >Behalf Of Art Kocsis
              >Sent: Sunday, September 23, 2012 02:39
              >To: NoteTab-Scripts
              >Subject: [NTS] Finding Pairwise Matches
              >I am past pulling hair and am now down to scalp and it's getting bloody so
              >maybe someone here can help.
              >Art,
              >
              At 9/23/2012 06:05 AM, John wrote:
              >;Replace first double quote with opening bracket
              >^!Replace "^[^\r\n\"]*\K\"" >> "[" ARSW
              >;Replace next double quote with closing bracket
              >^!Replace "^[^\r\n\"]*\K\"" >> "]" ARSW
              >;Repeat as long as double quotes exist
              >^!IfError Next Else Skip_-2
              >;Replace commas between opening and closing brackets
              >^!Replace "\[[^\r\n,\]]*\K,(?=.*\])" >> "" ARSW
              >^!IfError Next Else Skip_-1
              >;Change brackets back to double quotes
              >^!Replace "[\[\]]" >> "\"" ARSW
              >^!IfError Next Else Skip_-1
              >Should be pretty fast.
              >=============================
              >nnnn,"xx,xx",,ss,ss,ssss
              >nnnn,"xx,xx",,"xx,,xx",ssss
              >nnnn,"yyyy",,,"yyyy",ssss
              >nnnn,"yyyy",,"yyyy",,,"xx,xx,",sss
              >nnnn,"yyyy",,"x,xxx","yyyy",,,"xx,xx,",sss
              >nnnn,sssss,,"xx,xx,",sss,"yyyy",
              >nnnn,sssss,,,"yyyy","xx,xx,",sss

              Thank you John for your response. It does seem to work fine.
              I made some slight changes to your suggestion:
              Used left & right chevrons as the [] chars could appear in the text
              Used a RegEx pattern to change only matched pairs of quotes to chevrons,
              as your separate commands would also change orphaned quotes

              So the essence of the clip is elegant, compact and entirely RegEx - just
              what I wanted (earlier tests for uniqueness of temp char & chevrons not shown):
              ;====================================
              ^!Set %tc%=§
              ;First replace all matching double quote pairs with left & right chevron pairs (« and »)
              ^!Replace "^.*?\K\"(.*?)\"" >> "«$1»" AIRSTW
              ^!IfError Next Else Skip_-1

              ;Next, replace all embedded commas between chevron pairs with the unique temp char
              ^!Replace "«[^\r\n,»]*\K,(?=.*»)" >> "^%tc%" AIRSTW
              ^!IfError Next Else Skip_-1

              ;Next, delete all left & right chevrons
              ^!Replace "«" >> "" AIRSTW
              ^!Replace "»" >> "" AIRSTW

              ;Finally, check if there were any unmatched double quote chars remaining
              ^!Find "\"" AIRSTW
              ^!IfError Skip_1
              ^!Continue ###### File Error! File contains unmatched double quote char, Continue or exit?
              ;====================================
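The flow of this clip can be followed in Python as well. This is a sketch, not a translation: the pairing step uses a single `re.sub` with `[^"\r\n]*` instead of the looped `^.*?\K\"(.*?)\"`, and the names `TEMP` and `chevron_passes` are made up for the example. The temp char is assumed to be absent from the input, as the earlier uniqueness tests are meant to guarantee.

```python
import re

TEMP = "§"  # temp char, assumed not to occur in the input


def chevron_passes(text: str):
    # Pair quotes left to right on each line (1&2, 3&4, ...) and turn them
    # into chevrons. An orphaned quote has no partner and stays a quote.
    text = re.sub(r'"([^"\r\n]*)"', r'«\1»', text)

    # Replace one embedded comma per chevron pair per pass with TEMP.
    comma = re.compile(r'(«[^\r\n,»]*),(?=.*»)')
    while True:
        text, n = comma.subn(r'\1' + TEMP, text)
        if n == 0:
            break

    # Delete the chevrons, then report whether any unmatched quote remains.
    text = text.replace("«", "").replace("»", "")
    return text, '"' in text


print(chevron_passes('nnnn,"yyyy",,"x,xxx","yyyy",,,"xx,xx,",sss'))
```

The second value of the returned pair corresponds to the final ^!Find check for a leftover double quote.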

              Your pattern, "«[^\r\n,»]*\K,(?=.*»)" is simple, straightforward and pretty
              obvious once seen. However, I don't think I would have gotten there. I was
              so totally focused on jumping over the possible matching pairs without
              commas that I didn't stop to analyze the pattern with commas. Duh! So,
              thank you again. My scalp can heal now.


              Although I fully understand your pattern, I do not understand why the
              greedy/non-greedy specifications in mine do NOT work. It was my
              understanding that greedy meant "consume as much as possible that matches"
              and non-greedy meant "stop after the first matching pattern". In both cases
              I expected at least one match if allowed by subsequent criteria.

              However, for example,

              given: nnn,«yyyy»,,«x,xxx»,«yyyy»,,
              the pattern: («[^\,]*?»)*.*?\K«(.*?)\,(.*?)»
              matches: «yyyy»,,«x,xxx»

              Why does the "(«[^\,]*?»)*" NOT consume the "«yyyy»" and reset the match
              point past it?

              Further investigation revealed something not right with NTB/RegEx and why I
              was losing so much hair.

              It deserves its own subject line and exposition so see next post.

              Art
            • flo.gehrke
              Message 6 of 6, Sep 30, 2012
                --- In ntb-scripts@yahoogroups.com, Art Kocsis <artkns@...> wrote:

                > given: nnn,«yyyy»,,«x,xxx»,«yyyy»,,
                > the pattern: («[^\,]*?»)*.*?\K«(.*?)\,(.*?)»
                > matches: «yyyy»,,«x,xxx»
                >
                > Why does the "(«[^\,]*?»)*" NOT consume the "«yyyy»" and reset the
                > match point past it?

                A single '«[^\,]*?»' or even '«[^,]+»' (no need to escape comma in character class) would match that '«yyyy»' section but your RegEx is demanding more than that.

                In short, the engine starts by testing '(«[^\,]*?»)*.*?\K«' at the beginning of the line. Your subject string, however, starts with 'nnn...', so the optional group matches zero times and the engine doesn't achieve any submatch until it's testing '.*?\K«'. The lazy dot then consumes 'nnn,' one character at a time until '«' follows. So 'nnn,' is not skipped, and it goes on till 'nnn,«yyyy»,,«x,xxx»' is matched in the end.
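The same behaviour can be reproduced in Python, assuming its backtracking engine acts like NoteTab's here. Since Python's `re` has no `\K`, group 2 below marks the part that would follow `\K`. The pattern called `tight` is not from this thread; it is only one possible way to keep a match from crossing chevron boundaries.

```python
import re

subject = 'nnn,«yyyy»,,«x,xxx»,«yyyy»,,'

# Group 1 is the optional chevron group, group 2 the text after \K.
pattern = re.compile(r'(«[^,]*?»)*.*?(«(.*?),(.*?)»)')
m = pattern.search(subject)

print(repr(m.group(1)))  # the optional group took part zero times
print(m.group(0))        # whole match, starting at the first "n"
print(m.group(2))        # the part reported after \K

# Hypothetical alternative: forbid « and » inside the pair, so the match
# cannot run from one pair into the next.
tight = re.compile(r'«([^«»,]*),([^«»]*)»')
print(tight.search(subject).group(0))
```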

                BTW, for me, a simple clip like...

                ^!Jump Doc_Start
                :Loop
                ^!Find ""[^"]+"" RS
                ^!IfError End
                ^!IfMatch "[^,]+" "^$GetSelection$" Skip
                ^!InsertText ""^$StrReplace(,;;^$GetSelection$;A)$""
                ^!Goto Loop

                (designed for Ntb 7.0) would perfectly do the job (removing commas between opening and closing quotes) when run against your sample string...

                nnnnnnnnn,"xxxx",,,"ss,ss,",xxx
                nnnnnnnnn,"xx,xx",,"ss,ss",xxx
                nnnnnnnnn,xxxx,,"ss,ss,"xxx
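The find-and-replace idea of this clip corresponds to a substitution with a callback in Python. A minimal sketch, with hypothetical names; unlike the clip, the character class also excludes line breaks so that a pair cannot span two lines. `ALLOW_EMPTY` is a variation that is not in the clip.

```python
import re

CLIP_STYLE = re.compile(r'"[^"\r\n]+"')   # at least one char between the quotes
ALLOW_EMPTY = re.compile(r'"[^"\r\n]*"')  # variation: also pairs an empty ""


def drop_commas_in_quotes(text: str, pattern=CLIP_STYLE) -> str:
    # Each non-overlapping match consumes its closing quote, so the quotes
    # are paired 1&2, 3&4, ... without any extra bookkeeping.
    return pattern.sub(lambda m: m.group(0).replace(",", ""), text)


sample = "\n".join([
    'nnnnnnnnn,"xxxx",,,"ss,ss,",xxx',
    'nnnnnnnnn,"xx,xx",,"ss,ss",xxx',
    'nnnnnnnnn,xxxx,,"ss,ss,"xxx',
])
print(drop_commas_in_quotes(sample))
```

One caveat in this Python sketch: with the `+` quantifier an empty field `""` cannot match, so the pairing shifts by one quote on such a line. Passing `ALLOW_EMPTY` avoids that.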

                Even...

                ^!Jump Doc_Start
                ^!Find ""\w+,(\w|,)*"" RS
                ^!IfError End
                ^!InsertText ""^$StrReplace(,;;^$GetSelection$;A)$""
                ^!Goto Skip_-3

                would do it if there were no more variations (?) in the string.

                Members being at war with RegEx will be happy to see that they could even find a solution without any RegEx at all:

                ^!Jump Doc_Start
                :Loop
                ^!Find """ RS
                ^!IfError End
                ^!MoveCursor +1
                ^!Keyboard CTRL+M &50
                ^!IfFalse ^$StrPos(",";"^$GetSelection$";0)$ Skip
                ^!InsertSelect "^$StrReplace(",";"";"^$GetSelection$";A)$"
                ^!Jump Select_End
                ^!Goto Loop
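The RegEx-free idea (find a quote, find its partner, replace inside) looks like this in Python. It is a sketch of the idea only, not a translation of the clip: the ^!Keyboard step is not modelled, the function name is hypothetical, and it works on one line at a time.

```python
def drop_commas_find_loop(line: str) -> str:
    pos = 0
    while True:
        start = line.find('"', pos)        # opening quote of the next pair
        if start == -1:
            break
        end = line.find('"', start + 1)    # its closing partner
        if end == -1:
            break                          # orphaned quote: leave the rest alone
        inner = line[start + 1:end].replace(",", "")
        line = line[:start + 1] + inner + line[end:]
        pos = start + len(inner) + 2       # continue just past the closing quote
    return line


print(drop_commas_find_loop('nnnnnnnnn,"xx,xx",,"ss,ss",xxx'))
```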

                Regards,
                Flo