
Help with Regex: named replace expressions

  • Don - HtmlFixIt.com
    Message 1 of 4 , Jul 2, 2009
      Hi all,

      I am working with something that would make it ideal if I could use
      named expressions ... I am looking at notetab help for regular
      expressions (from clip help I clicked on the regular expressions link).

      I find this:
      "Another special case occurs when "$" is followed by a single digit in
      the range of 1 through 65535. In this case the tagged match word found
      by the Find expression is used in the resulting replacement text. If a
      tagged match word for that tag number was not defined, or if the tagged
      match word doesn't match anything, then nothing is output. The tagged
      match words can be used in any order and can be repeated any number of
      times.



      A "$0" appearing in the Replace expression causes all text matched by
      the match expression to be sent to the output. The "$0" can appear in
      the Replace expression as many times as desired."

      I did not know about the zero. But I want to name them. Can someone
      point me to the place in help, or alternatively explain it to me please
      and then give an example. If the latter is the case perhaps we can
      update help to add it the next time we do an upgrade.

      Thanks so much,

      Don
    • Sheri
      Message 2 of 4 , Jul 2, 2009
        See message #18422 in the archive (but be aware that Yahoo seems to make it look like there is a backslash where it split a line in the middle of $<first> where there was no backslash).

        Also see "Named Subpatterns" and "Substring Replacements" in regex.chm.

        Regards,
        Sheri
      • Flo
        Message 3 of 4 , Jul 3, 2009
          --- In ntb-clips@yahoogroups.com, "Don - HtmlFixIt.com" <don@...> wrote:

          > Can someone point me to the place in help, or alternatively explain
          > it to me please and then give an example.

          Don,

          Given a text like...

          The museum opens Mondays, Fridays, and Sundays.

          With the clip...

          ^!Replace "(?P<open>Mon|Fri|Sun)days" >> "$<open>" AWRS

          (where "open" is the name of the subpattern) the days will be shortened to "Mon, Fri, Sun".
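
For comparison outside NoteTab, the same named-subpattern replacement can be sketched in Python. This is only an illustration, assuming Python's standard re module, which writes the reference in the replacement as \g<open> rather than NoteTab's $<open>. The variable names are made up for the example.

```python
import re

text = "The museum opens Mondays, Fridays, and Sundays."

# (?P<open>...) names the subpattern; \g<open> inserts whatever it matched.
shortened = re.sub(r"(?P<open>Mon|Fri|Sun)days", r"\g<open>", text)

print(shortened)
```

Only the matched day names change; the commas and the word "and" in the sentence are left as they were.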

          If the same subpattern name is used more than once in a pattern, you will have to add the Duplicate Names Option (?J). The sample text is:

          Sunday, Monday, Tuesday too.
          Wednesday, Thursday just for you.
          Friday, Saturday that's the end.
          Now let's say those days again!

          In order to shorten all days, you may run (in one long line!)...

          ^!Replace "(?J)(?P<name>Mon|Fri|Sun)day|(?P<name>Tue)sday|(?P<name>Wed)nesday|(?P<name>Thu)rsday|(?P<name>Sat)urday" >> "$<name>" AWRS
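
As a cross-check in another regex flavor: Python's standard re module has, as far as I know, no equivalent of (?J) and rejects a pattern that defines the same group name twice. The sketch below therefore restructures the pattern (this is a swapped-in approach, not the clip above) so that a single named group holds the abbreviation and an optional non-capturing group absorbs the letters before "day".

```python
import re

text = (
    "Sunday, Monday, Tuesday too.\n"
    "Wednesday, Thursday just for you.\n"
    "Friday, Saturday that's the end.\n"
    "Now let's say those days again!"
)

# One named group captures the abbreviation; the optional non-capturing
# group absorbs the letters between the abbreviation and "day".
pattern = re.compile(r"(?P<name>Mon|Tue|Wed|Thu|Fri|Sat|Sun)(?:nes|rs|ur|s)?day")

shortened = pattern.sub(r"\g<name>", text)
print(shortened)
```

The restructured pattern is looser than the original (it would also accept a misspelling such as "Monsday"), which is the price of avoiding duplicate names.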

          Regards,
          Flo
        • Don - HtmlFixIt.com
          Message 4 of 4 , Jul 3, 2009
            This is very very helpful as were Sheri's comments yesterday. I guess
            until yesterday I didn't see "help with regular expressions" as I always
            clicked into "regular expressions" from "help with clip commands" off of
            the menu. Examples are always so helpful, and to be honest the help
            files do not always include as many of them as would help me.

            I think some lines of my current clip may be interesting to others and I
            know I'll want to find them later too ... so if you will indulge me,
            I'll explain this part of my clip.

            Here are the lines of code in my clip from line 74 to line 90. To
            explain it just the tiniest bit, I am taking a line of data:
            99 Trzybinski, Chelsea 10 Union 26:17.64 11
            Every line is fixed width. At the front part of my clip I am telling
            the clip what characters on each line are
            Place/Name/Grade/School/Time/Junk (by omission it is junk)

            I then create an array that tells me the following for each of the above
            except Junk:
            starting cursor position of data#colon#length of data
            element#colon#stopping cursor position#colon#name of data element -
            place/name/grade/school/time#semi-colon#...repeat for each element

            So it looks like this:
            0001:3:4:Grade;0005:26:31:Name;0031:2:33:Place;0034:22:56:School;0056:8:64:Time

            I sort those elements so that they are in order from first to last
            starting position and then I can easily do the math to decide there is
            "junk" between the elements and then I discard those junk elements if
            you will. For example in the above array column 4 is discarded as junk
            and column 33 is discarded. You know there is junk because the finish
            cursor position doesn't equal the following element's start position.
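
The junk check described above can be sketched in Python for readers who want to try the arithmetic outside NoteTab. This is a minimal illustration, not the clip itself: the names spec, fields and junk are made up, and the sample array is the one shown earlier.

```python
spec = "0001:3:4:Grade;0005:26:31:Name;0031:2:33:Place;0034:22:56:School;0056:8:64:Time"

# Each entry is start:length:stop:name.
fields = []
for entry in spec.split(";"):
    start, length, stop, name = entry.split(":")
    fields.append((int(start), int(length), int(stop), name))
fields.sort()  # order by starting position

# Junk exists wherever one field's stop position differs from the next start.
junk = []
for (_, _, stop, _), (next_start, _, _, _) in zip(fields, fields[1:]):
    if next_start > stop:
        junk.append((stop, next_start - stop))  # (first junk column, width)

print(junk)
```

Each tuple gives the first junk column and how many columns to skip, which for this sample are columns 4 and 33.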

            What was frustrating me is that the order of elements varies from data
            set to data set, so naming my elements lets me put them back in the
            right place easily so that output is consistent. I put pipes "|" around
            the name because I then go work on it by breaking first and last into
            tab delimited elements as the next step.

            The other thing is I start with a five element array, where each
            element holds four colon-separated values, and sort it on the start
            position that leads each element. I then convert the colons to semi
            colons to make it a 20 element array, in which the first, fifth,
            ninth, 13th and 17th elements are the start points for each data
            element. In essence it is a two dimensional array at the start and
            then I make it a one dimensional array with a string replace on the
            colons. Note that I do this with the SetArray command so that it
            also updates the zero element from 5 to 20 elements in the array.
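
The sort-then-flatten step can be mimicked in Python as a rough sketch. The unsorted input order here is invented for the illustration, and note that Python lists count from 0 while NoteTab arrays count from 1, so the start positions land at indexes 0, 4, 8, 12 and 16.

```python
spec = "0031:2:33:Place;0001:3:4:Grade;0056:8:64:Time;0005:26:31:Name;0034:22:56:School"

# Step 1: sort the five entries; zero-padded start positions sort correctly as text.
sorted_spec = ";".join(sorted(spec.split(";")))

# Step 2: turn the colons into semicolons so each value becomes its own element.
flat = sorted_spec.replace(":", ";").split(";")

print(len(flat))
print(flat[0], flat[4], flat[8], flat[12], flat[16])
```

Sorting as text only works because the start positions are zero-padded to the same width, which is presumably why the array stores them as 0001, 0005 and so on.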

            So, Line 74 is a placeholder
            75 sorts the array from semicolon delimited to semicolon delimited -- this is a
            wonderful line of code to be saved for future use by me
            76 is info and will be removed
            77 is converting a five element array to a twenty element array
            78 is blank
            79 is a heading
            80 comment shows element numbers of array sample below
            81 comment array sample
            82 shows regex search that would work based on the above sample ... used
            by me to figure out how to build the regex in a few lines that will vary
            depending on the columns we need to use/discard for a specific data set
            83 blank
            84 here I build the regex search that will look much like line 82 and
            assign it to a variable so I can use it later -- it does math on the 20
            element array to decide what data to save as each field in the delimited
            output and IT NAMES THE REPLACE SUBPATTERNS based on the names in the array
            85 blank
            86 info -- will be removed
            87 blank
            88 replace command using the regex built in line 84 -- because we now
            name the subpatterns in 84 we can always put them in the same order here
            89 goes to the next clip where I will fix the name by breaking it into
            first#tab#last -- which is why I pipe delimited it so I can easily
            replace it
            90 end comment



            ;Line 80
            ^!SetArray %fielddynamics%=^$StrReplace("^P";";";"^$StrSort("^$StrReplace(";";"^P";"^%fielddynamics%";0;0)$";No;Yes;No)$";No;No)$
            ^!Info [C]^%fielddynamics%
            ^!SetArray %fielddynamics%=^$StrReplace(":";";";"^%fielddynamics%";No;No)$

            :BuildRegex
            ;1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
            ;0001:3:4:Grade;0005:26:31:Name;0031:2:33:Place;0034:22:56:School;0056:8:64:Time
            ;^.{0}(.{3}).{1}(.{26}).{0}(.{2}).{1}(.{22}).{0}(.{8}).*

            ^!Set %regexstring%="^.{^$Calc(^%fielddynamics1%-1)$}(?<^%fielddynamics4%>.{^%fielddynamics2%}).{^$Calc(^%fielddynamics5%-^%fielddynamics3%)$}(?<^%fielddynamics8%>.{^%fielddynamics6%}).{^$Calc(^%fielddynamics9%-^%fielddynamics7%)$}(?<^%fielddynamics12%>.{^%fielddynamics10%}).{^$Calc(^%fielddynamics13%-^%fielddynamics11%)$}(?<^%fielddynamics16%>.{^%fielddynamics14%}).{^$Calc(^%fielddynamics17%-^%fielddynamics15%)$}(?<^%fielddynamics20%>.{^%fielddynamics18%}).*"

            ^!Info [C]^%regexstring%

            ^!Replace "^%regexstring%" >> "^%D%\t^%Gender%\t$<Place>\t|$<Name>|\t$<Grade>\t$<School>\t$<Time>" RAWS
            ^!Clip "xc: divide names to clean up results v2"
            ;Line 90

            Of course improvements welcomed -- although this appears to be working well.
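
For anyone who wants to test the idea outside NoteTab, here is a rough Python sketch of building the named-subpattern regex from the array. It is an illustration under assumptions, not a port of the clip: the sample line is hypothetical (padded to the widths in the array), the names are made up, the output strips padding that the clip keeps, and Python spells the named group (?P<name>...).

```python
import re

spec = "0001:3:4:Grade;0005:26:31:Name;0031:2:33:Place;0034:22:56:School;0056:8:64:Time"

# Build ".{skip}(?P<name>.{length})" for each field, in start-position order.
parts = []
previous_stop = 1  # columns are 1-based, so the first skip is start - 1
for entry in sorted(spec.split(";")):
    start, length, stop, name = entry.split(":")
    skip = int(start) - previous_stop  # junk columns before this field
    parts.append(".{%d}(?P<%s>.{%d})" % (skip, name, int(length)))
    previous_stop = int(stop)
regex_string = "^" + "".join(parts) + ".*"

# Hypothetical fixed-width line padded to match the widths in spec.
line = ("10  " + "Trzybinski, Chelsea".ljust(26) + "99 "
        + "Union".ljust(22) + "26:17.64" + " 11")

m = re.match(regex_string, line)
# Because the groups are named, the output order does not depend on column order.
row = "\t".join([m["Place"], "|" + m["Name"].strip() + "|",
                 m["Grade"].strip(), m["School"].strip(), m["Time"]])
print(regex_string)
print(row)
```

The generated pattern has the same shape as the sample regex in the comment above :BuildRegex, with a name added to each group.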
