23552 [Clip] Re: Tooltip clip?

  • flo.gehrke
    Dec 20, 2012
      --- In ntb-clips@yahoogroups.com, "m.feichtinger" <mafei@...> wrote:
      > The following clip works on your provided test text. (...)
      > The (?<Name>...) groups are named for convenient reference.
      > ^!Replace "(?xJ)^(?<HEAD>Matt\x20\d++:\d++\.?) (?<TEXT>.+)
      > \K ((?#BooklistStart) (?<SN>Mat)(?:t)? | (?<SN>Mar)(?:k)? |
      > (?<SN>Luu)(?:k)? | (?<SN>Joh)(?:)? | (?<SN>Esr)(?:a)? |
      > (?<SN>Mii)(?:ka)? | (?<SN>Ruu)(?:t)? (?#BooklistEnd)) \.? \x20"
      > >> "$<SN>_" RWAS
      > etc...

      Just a few comments on that pattern...

      1. Named subpatterns are used for referencing. Here, though, there's only one reference to a single named subpattern, <SN>. Nothing refers to <HEAD> or <TEXT>. So why introduce these names?

      2. Since you start with '^!Replace "(?xJ)^(?<HEAD>Matt\x20...', matches can only occur in lines (or paragraphs) that start with 'Matt'. So what's the use of that long alternation? Except for 'Mat', none of these alternatives will ever be matched because, if I'm not mistaken, they don't occur in a line (or paragraph) that starts with 'Matt'.

      3. In general, it's not very efficient to give each item in an alternation the same name. In this case, you are probably better off with a Duplicate Subpattern Number group '(?|...)', in which every alternative shares the same capture number.

      Example: Take an alternation reduced to 'Matt' or 'Mark' only. Instead of writing...

      ^!Replace "(?xJ)^Matt\x20\d+:\d+\.? .+ \K ((?<SN>Mat)(?:t)? | (?<SN>Mar)(?:k)? ) \.?\x20" >> "$<SN>_" WARS
      ^!IfError Next Else Skip_-1

      you'd better leave out that '(?J)' modifier, omit the naming, and write...

      ^!Replace "(?x)^Matt\x20\d+:\d+\.? .+ \K (?|(Mat)(?:t)? | (Mar)(?:k)? ) \.?\x20" >> "$1_" RWAS
      ^!IfError Next Else Skip_-1

      This gets to the same result and saves you a lot of "noise" in your pattern.
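
For anyone who wants to try the idea outside NoteTab, here is a rough Python sketch of what the two-book replacement does. It is only an approximation under a few assumptions: the sample line is made up (the real data was never posted in full), the function names are hypothetical, and Python's standard `re` module supports neither `\K` nor `(?|...)`. So the head is captured and written back instead of being dropped with `\K`, and each alternative gets its own group number. That second point shows the bookkeeping that `(?|...)` saves you in PCRE.

```python
import re

# Python's re has no \K and no (?|...). Group 1 stands in for \K (it is
# written back unchanged); groups 2 and 3 are the per-alternative captures
# that (?|...) would merge into a single group in PCRE.
PATTERN = re.compile(
    r"""^(Matt\x20\d+:\d+\.? .+)   # head plus preceding text, kept as-is
        (?: (Mat)t? | (Mar)k? )    # short name lands in group 2 or group 3
        \.? \x20                   # optional dot, then the space to replace
    """,
    re.VERBOSE,
)


def shorten_once(line):
    """Shorten the last long book name in the line (greedy .+ finds the last)."""
    return PATTERN.sub(
        lambda m: m.group(1) + (m.group(2) or m.group(3)) + "_",
        line,
        count=1,
    )


def shorten_all(line):
    """Repeat until nothing changes, like the '^!IfError Next Else Skip_-1' loop."""
    while True:
        new_line = shorten_once(line)
        if new_line == line:
            return line
        line = new_line


if __name__ == "__main__":
    # Hypothetical sample line, not taken from the original data.
    print(shorten_all("Matt 5:3. See Mark 1:2 and Matt 7:1"))
```

With the made-up sample, this prints "Matt 5:3. See Mar_1:2 and Mat_7:1". The leading 'Matt' stays untouched because it is consumed by the anchored head, and a line that does not start with 'Matt' is returned unchanged.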

      Please note: This is considering some details only -- it's not the complete job!

      In my view, it's impossible to create a serious solution without a complete and exact overview of the data "before & after editing". We also need a complete list showing all book abbreviations at the start of the string and how each one has to be replaced with a shorter notation. Though more than 20 messages have been posted so far, we haven't seen this indispensable precondition. Also, the conditions change with each of puusto's messages.

      Sorry -- I could imagine a more efficient procedure...

