Help with Regex Clip Code to find the following pattern....

  • Paul
    Message 1 of 9 , Oct 9, 2010
      Here are the general conditions. I want to replace "spott%1% pink" with "spotted pink" and "spotty pink".

      Possible formats:
      {brown|spott%1% pink|grey}
      {spott%1% pink|brown|grey}
      {brown|grey|spott%1% pink}
      {spott%1% pink}

      Regex for the replace code is what, can you 'spott' it for me? ;)
      Here's my starting point:

      ;Replace Bracketed Search Token, ST, with Regex
      ^!Set %ST%="\%d+\%"
      ^!Replace "^%ST%" >> "Something Else"


      Building the regex
      want to capture numbered sub-patterns:
      {(.*)|\|(.*) ... delimited by { or |

      want to lookahead for end bracket or pipe
      (.*)(?=}|\|) ... don't include char in find string

      I have this so far and it doesn't work:
      ^!Set %ST%="{(.*)|\|(.*)\%d+\%(.*)(?=}|\|)"

      Suppose I can search for \%d\% ** if and only if ** it is contained within curly brackets, then backtrack to an earlier instance of either a { or |.

      Given the following regex finds the innermost pair of curly brackets:
      ^!Find "{[^{}]++}" WRS
      ..does this make it easier to build the final expression to satisfy the requirement the %1% token is bound by brackets?

      Or perhaps I Find the innermost brackets and then ^!GetMatchAll for the sub-pattern... arghh!! i'm done for the moment.. help much appreciated! :)
    • Paul
      Message 2 of 9 , Oct 10, 2010
        OK, so I figured out some regex and solved my earlier problem. However, why do I get an error when I try to use a POSIX character class like [:alnum:]?

        Also, a) can the regex (below) be streamlined and b) is it worth doing (speed/performance/reliability/readability)?


        > Possible search scenarios:
        > {brown|a spott%1% pink|grey}
        > {a spott%1% pink|brown|grey}
        > {brown|grey|a spott%1% pink}
        > {a spott%1% pink}

        (I added "a " to the start of the search term because I realised it was a potential alternative and included it in the search string by adding \s....)

        The following regex works
        :Find text containing tokenised number
        ;^!Find "(?<={|\|)([a-zA-Z0-9!-\)\s]*)(%\d+%)([a-zA-Z0-9!-\)\s]*)" R

        :Replace function using above regex
        ^!Replace "(?<={|\|)([a-zA-Z0-9!-\)\s]*)(%\d+%)([a-zA-Z0-9!-\)\s]*)" >> *$1 $2 $3* RWA

        Refined the regex using negated character classes:
        ^!Replace "(?<={|\|)([^{|%]*)(%\d+%)([^}|]*)" >> *$1 $2 $3* RWA


        QUESTION: Instead of [a-zA-Z0-9!-\)]*
        why won't ntp let me use [:alnum:]* ?????

        Regards,
        Paul
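For readers who want to try the negated-class pattern outside NoteTab, here is a minimal Python sketch. It is an illustration only: Python's `re` module is not NoteTab's regex engine, the function name `mark_token_terms` is made up for this example, and the lookbehind is written as `[{|]` instead of `{|\|` (same meaning, one character either way).

```python
import re

# Python analogue of the negated-class pattern from the post.
# Group 1: text before the token, group 2: the %n% token, group 3: text after it.
TOKEN_TERM = re.compile(r"(?<=[{|])([^{|%]*)(%\d+%)([^}|]*)")


def mark_token_terms(text: str) -> str:
    """Wrap each alternative that contains a %n% token in asterisks,
    mirroring the replacement string *$1 $2 $3* from the post."""
    return TOKEN_TERM.sub(r"*\1 \2 \3*", text)


if __name__ == "__main__":
    scenarios = [
        "{brown|a spott%1% pink|grey}",
        "{a spott%1% pink|brown|grey}",
        "{brown|grey|a spott%1% pink}",
        "{a spott%1% pink}",
    ]
    for scenario in scenarios:
        print(mark_token_terms(scenario))
```

The negated classes stop at the nearest `{`, `|`, `%` or `}`, so the match stays inside a single alternative without listing every allowed character.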
      • John Shotsky
        Message 3 of 9 , Oct 10, 2010
          Posix: Use double brackets: [[:alnum:]]

          All letters, both cases, including accented ones: \p{L}
          All numbers: \d. Everything else: \D
          All letters and numbers, plus underscore: \w. Everything else: \W
          Your non-alphanumeric characters will need to be explicit, unless you include all of them.

          Negated classes may prove troublesome unless you include \r\n in them, as they blow right past
          paragraphs otherwise. That could make for some very interesting alternatives.

          Regards,
          John
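The remark about negated classes running past paragraph ends can be checked with a small Python sketch. This is an assumption-laden illustration rather than NoteTab behaviour: it uses Python's `re` module, the sample text and the helper `tail_after_token` are invented for the example, and it covers only the line-break point, not the POSIX or `\p{L}` notation.

```python
import re

# Invented sample: the alternative containing the token is broken by a newline.
TEXT = "{brown|spott%1% pink\nnext paragraph|grey}"

# Negated class without line breaks excluded: a newline is "not } and not |",
# so the match continues onto the next line.
LOOSE = re.compile(r"%\d+%([^}|]*)")

# Same class with \r and \n added: the match stops at the end of the line.
STRICT = re.compile(r"%\d+%([^}|\r\n]*)")


def tail_after_token(pattern: re.Pattern, text: str):
    """Return the text captured after the first %n% token, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(repr(tail_after_token(LOOSE, TEXT)))
    print(repr(tail_after_token(STRICT, TEXT)))
```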

        • Paul
          Message 4 of 9 , Oct 10, 2010
            Thanks John,
            The -ve classes seem to be working well for now but thanks for the posix notation.
            Paul

          • diodeom
            Message 5 of 9 , Oct 10, 2010
              "Paul" <xboa721@...> wrote:
              >
              > Here's the three general conditions. I want to replace "spott%1% pink" with spotted pink and spotty pink.
              >
              > Possible formats:
              > {brown|spott%1% pink|grey}
              > {spott%1% pink|brown|grey}
              > {brown|grey|spott%1% pink}
              > {spott%1% pink}
              >
              > Regex for the replace code is what, can you 'spott' it for me? ;)
              > Here's my starting point:
              >

              Are you trying to get from "{brown|spott%1% pink|grey}" to "{brown|spott{y|ed} pink|grey}"?

              To evoke help, not headaches :), it always makes sense to post your exact desired outcome below an accurately represented initial state (and its context, if relevant to the sought-after pattern).
            • Paul
              Message 6 of 9 , Oct 10, 2010
                ok. sorry for the ambiguity, thought i was being explicit.

                I'm getting from:
                original: {brown|spott{y|ed} pink|grey}

                found innermost bracket then
                (y|ed) read into an array using GetDocMatchAll

                and replaced innermost bracket with indexed token.
                "spott%1% pink"

                the next phase expands the nested alternatives into:
                "spotty pink|spotted pink"

                The process repeats recursively until all curly-bracket terms are simple {one level|deep|separated only by single|pipes}

                Then I can numerically address each token to generate the desired combination/permutation.

                That's a helluva mouthful so here's a snippet of code that should do that.. so far.

                Test Sample:
                The quick|fast|slow} {brown|spott %1% sdf|grey} fox {{*jumped %1% *|hopped} over the {lazy|bone idle} dog|swam around the lake}.

                Code:
                ^!SetListDelimiter ";"
                :Refined above regex using negative char class
                ^!Find (?<={|\|)([^{|%]*)(%\d+%)([^}|]*) RW
                ^!SetArray %Part%=^$GetReSubStrings$
                ^!Info ^$GetReSubStrings$
                ^!Info Part0=^%Part0%^pPart1=%Part1%^pPart2=%Part2%^pPart3=%Part3%

                Note: the outputs of the two Info commands don't match. Do I have what I want already? How do I reference the array contents to match the first Info (^$GetReSubStrings$)?

                p.s. don't care about the actual data in the test sample.. just that the array will contain "spott ";"%1%";" sdf";

                hope that clarifies the mud!!
                :) Cheers
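As a cross-check on what the three sub-patterns should contain, here is a rough Python sketch of the same capture. It is only an analogy: `split_token_term` is a hypothetical helper, Python's `re` stands in for NoteTab's engine, and the list it returns has no element count, whereas the clip code later in this thread treats element 0 of a NoteTab array (`^%alt0%`) as the count. One thing that may be worth checking in the clip itself is that `^%Part0%` is written with a leading caret in the second Info line while `%Part1%` to `%Part3%` are not.

```python
import re

# Same pattern as the ^!Find line above, with the lookbehind written as [{|].
PARTS = re.compile(r"(?<=[{|])([^{|%]*)(%\d+%)([^}|]*)")


def split_token_term(text: str):
    """Return [before, token, after] for the first tokenised alternative,
    or None when no %n% token sits inside a bracketed term."""
    match = PARTS.search(text)
    if match is None:
        return None
    return list(match.groups())


if __name__ == "__main__":
    print(split_token_term("{brown|spott %1% sdf|grey}"))
```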
              • Paul
                Message 7 of 9 , Oct 10, 2010
                  Coded Thus Far:
                  To be as explicit as possible, here's the code. Should be easy enough to follow and execute:

                  ;Use %Index% to enumerate bracketed options
                  ^!Set %Index%=0

                  :ProcessCurlyBrackets
                  ;Replace text in an innermost bracket with
                  ;randomly selected synonym.
                  ^!Continue [ProcessCurlyBrackets Start?]

                  ;Increment the nest index counter
                  ^!Inc %Index%
                  ;Locate a pair of solo or innermost brackets
                  ^!Find "{[^{}]++}" WRS
                  ;Leave the loop if no more brackets found
                  ^!IfError Permutate
                  ;Get an array of choices from selection
                  ;Assign to an indexed array
                  ^!SetArray %Nest^%Index%a%=^$GetDocMatchAll([^|}{]++)$
                  ;Replace selection with indexed token
                  ^!InsertText "%^%Index%%"

                  :Process nested index tokens
                  ^!Continue [Process nested index tokens?]
                  ;Reset synonym index variable, %Opt%
                  ^!Set %Opt%=1
                  ;Program Bracketed Search Token, ST, with Regex
                  ^!Set %ST%="(?<={|\|)([^{|%]*)(%\d+%)([^}|]*)"
                  ;Replace first tokenised synonym with expansion
                  ^!Find "^%ST%" WR
                  ;Split tokenised synonym into its component parts
                  ^!SetArray %Part%=^$GetDocMatchAll(**********)$
                  ; Code ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

                  ;code fragment...
                  ;"$1^%Nest^%Index%a^%Opt%%$2"
                  ;Check if no nested tokens exist
                  ^!IfError ProcessCurlyBrackets
                  :IterateExpand
                  ;Move pointer to next synonym
                  ^!Inc %Opt%
                  ;Loop until all array options expanded
                  ^!If ^%Opt%>^%Nest^%Index%a0% ProcessCurlyBrackets
                  ;^!Info Opt=^%Opt%, Index=^%Index%
                  ;^!Continue [Do You Want To Proceed?]
                  ^!InsertText "|$1^%Nest^%Index%a^%Opt%%$2"
                  ^!Goto IterateExpand

                  ;Look for another set from the very top
                  ^!Goto ProcessCurlyBrackets


                  :Permutate
                  :End
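The first half of the loop above (find an innermost bracket, store its alternatives, put an indexed token in its place) can be sketched in Python as follows. This is a simplified illustration under stated assumptions: `tokenise_innermost` is a hypothetical name, the alternatives are returned as a plain list rather than a NoteTab array, and a plain `+` replaces the possessive `++` because only the match result matters here.

```python
import re

# Innermost bracket: an opening brace, no further braces inside, a closing brace.
INNERMOST = re.compile(r"\{([^{}]+)\}")


def tokenise_innermost(text: str, index: int):
    """Replace the first innermost {a|b|...} with the token %index%.

    Returns (new_text, alternatives); alternatives is None when the text
    contains no bracket pair, which corresponds to leaving the loop.
    """
    match = INNERMOST.search(text)
    if match is None:
        return text, None
    alternatives = match.group(1).split("|")
    token = f"%{index}%"
    return text[:match.start()] + token + text[match.end():], alternatives


if __name__ == "__main__":
    print(tokenise_innermost("{brown|spott{ed|y} pink|grey}", 1))
```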



                  And the development code I used to work out the regex. Copy into a 'test clip' to spot highlighting. Any feedback welcome here:

                  :Finds between curly brackets and pipe
                  :no starting character to control behaviour
                  ;following line grabs between brackets or pipe
                  ;also selects after close bracket and next bracket
                  ;^!Find "[^{}|]++" R

                  :Finds Percentage Tokenised numbers
                  ;next refinement
                  ;^!Find %\d+% R

                  : Finds starting delimiter
                  ;({+|\|)

                  :Finds terms between curly brackets and pipe
                  ;^!Replace "({+|\|)([^}{|]++)" >> "*$2*" R

                  :Next refinement works!!
                  ;^!Replace "({+|\|)([a-zA-Z0-9!-\)]*)(%\d+%)([a-zA-Z0-9!-\)]*)" >> "*$2 $3 $4*" R

                  :Find text containing tokenised number
                  ;the following works
                  ;^!Find "(?<={|\|)([a-zA-Z0-9!-\)\s]*)(%\d+%)([a-zA-Z0-9!-\)\s]*)" R

                  :Replace using above regex
                  ;the following works
                  ;^!Replace "(?<={|\|)([a-zA-Z0-9!-\)\s]*)(%\d+%)([a-zA-Z0-9!-\)\s]*)" >> *$1 $2 $3* RWA
                  ;

                  ^!SetListDelimiter ";"
                  :Refined above regex using negative char class
                  ;need to exclude {}| from the character class set
                  ^!Find (?<={|\|)([^{|%]*)(%\d+%)([^}|]*) RW
                  ^!SetArray %Part%=^$GetReSubStrings$

                  :Grab component parts
                  ^!Info ^$GetReSubStrings$
                  ;^!SetArray %Part1%=^$GetDocMatchAll(([^{]*)(?=%\d+%))$
                  ;^!SetArray %Part2%=^$GetDocMatchAll("(?=%\d+%)(.*)";1)$
                  ^!Info Part0=^%Part0%^pPart1=%Part1%^pPart2=%Part2%^pPart3=%Part3%

                  :Following eats the last curly bracket or pipe
                  ;(}|\|)

                  ;^!Set %ST%="({|\|)(.*)\%d+\%(.*)"
                  ;(?=<({|\|))

                  NOTE: This is just a development scratchpad; thought it may help you understand what i'm trying to achieve. :)
                  Paul
                  p.s. Useful test text snippet: The quick|fast|slow} {brown|spott{ed|y} pink|grey} fox {{jumped|hopped} over the {lazy|bone idle} dog|swam around the lake}.

                  NOTE: the first curly bracket that *should* appear before the quick is removed so the test code works on the second nested bracket.

                  The result should be programmatically resolved to:
                  The {quick|fast|slow} {brown|spotted pink|spotty pink|grey} fox {{jumped|hopped} over the {lazy|bone idle} dog|swam around the lake}.

                  Which on inspection can on a first iteration be written as:
                  The quick brown fox jumped over the lazy dog.

                  A later combination can be written as:
                  The slow spotty pink fox swam around the lake.
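Once every bracket is one level deep, listing the combinations is a Cartesian product over the bracketed groups. A minimal Python sketch, assuming a fully flattened template (the nested last group of the sentence above would still need expanding first); `expand_flat_template` is a hypothetical helper, not part of any clip:

```python
import re
from itertools import product


def expand_flat_template(template: str):
    """Yield every combination of a template whose brackets are not nested."""
    # re.split with a capturing group alternates literal text and group contents.
    pieces = re.split(r"\{([^{}]*)\}", template)
    literals = pieces[0::2]
    choices = [group.split("|") for group in pieces[1::2]]
    for picks in product(*choices):
        parts = [literals[0]]
        for pick, literal in zip(picks, literals[1:]):
            parts.append(pick)
            parts.append(literal)
        yield "".join(parts)


if __name__ == "__main__":
    flat = "The {quick|fast|slow} {brown|spotted pink|spotty pink|grey} fox."
    for sentence in expand_flat_template(flat):
        print(sentence)
```

The first sentence produced takes the first option of every group, and later ones step through the remaining options with the last group varying fastest.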

                  Isn't English a beautiful language!!
                  :)
                  Regards,
                  Paul


                • diodeom
                  Message 8 of 9 , Oct 10, 2010
                    "Paul" <xboa721@...> wrote:
                    >
                    > Useful test text snippet:The quick|fast|slow} {brown|spott{ed|y} pink|grey} fox {{jumped|hopped} over the {lazy|bone idle} dog|swam around the lake}.
                    >
                    > NOTE: the first curly bracket that *should* appear before the quick is removed so the test code works on the second nested bracket.
                    >
                    > The result should be programmatically resolved to:
                    > The {quick|fast|slow} {brown|spotted pink|spotty pink|grey} fox {{jumped|hopped} over the {lazy|bone idle} dog|swam around the lake}.
                    >

                    Here's a possible take:

                    ;Locate solo nested brace with its preceding and following stuff
                    :Brace
                    ^!Find "(?<={|\|)([\w\40]*+){([\w\40|]++)}((?1))(?=\||})" WRS
                    ^!IfError End
                    ^!SetArray %brace%=^$GetReSubStrings$
                    ;Get alternatives (at ^%brace2%)
                    ^!SetArray %alt%=^$StrReplace(|;";";^%brace2%;0;0)$
                    ^!Set %i%=0
                    :Alt
                    ^!Inc %i%
                    ^!If ^%i%>^%alt0% Repl
                    ;Edit the replacement string
                    ^!Append %repl%=^%brace1%^%alt^%i%%^%brace3%|
                    ^!Goto Alt
                    :Repl
                    ;Paste fixed alternatives and look for another case
                    ^!Set %repl%=^$StrDeleteRight(^%repl%;1)$
                    ^!InsertText ^%repl%
                    ^!Set %repl%=
                    ^!Goto Brace
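The same idea (capture the text before the nested brace, the alternatives inside it, and the text after it, then rebuild one alternative per option) can be sketched in Python for experimenting outside NoteTab. This is an approximation, not a port: Python's `re` has no `(?1)` subpattern call, so the third group repeats the class explicitly, plain quantifiers replace the possessive ones, a literal space replaces `\40`, and Python's `\w` also matches non-ASCII letters. The names `NESTED` and `expand_nested` are invented for the example.

```python
import re

# before{alt1|alt2}after, preceded by { or | and followed by | or }.
NESTED = re.compile(r"(?<=[{|])([\w ]*)\{([\w |]+)\}([\w ]*)(?=[|}])")


def _expand(match: re.Match) -> str:
    """Turn before{a|b}after into beforeaafter|beforebafter."""
    before, alternatives, after = match.groups()
    return "|".join(before + alt + after for alt in alternatives.split("|"))


def expand_nested(text: str) -> str:
    """Repeat the expansion until no nested brace of this shape is left."""
    while True:
        expanded = NESTED.sub(_expand, text)
        if expanded == text:
            return text
        text = expanded


if __name__ == "__main__":
    print(expand_nested("{brown|spott{ed|y} pink|grey}"))
```

Like the pattern it imitates, this only handles an alternative that contains a single nested brace; an alternative with two nested braces side by side is left untouched.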
                  • Paul
                    Message 9 of 9 , Oct 11, 2010
                      Now that looks more like familiar procedural code. Thank you, will be back on this soon.
                      Paul
