Re: Asterisk in Negative Character Class

  • Sheri
    Message 1 of 3, Oct 1, 2009
      --- In ntb-clips@yahoogroups.com, "flo.gehrke" <flo.gehrke@...> wrote:
      >
      > In a text database, some terms are marked with asterisks for indexing. For example...
      >
      > Alexander von **Humboldt**'s travel diary of his journey on the **Orinoco** in April **1800** reminds us of the complex and difficult stages of his writing process.
      >
      > When running...
      >
      > ^!Info ^$GetDocListAll("[*\d]{2,}";"$0\r\n")$
      >
      > against this string the clip correctly outputs:
      >
      > **
      > **
      > **
      > **
      > **1800**
      >
      > The RegEx used here complies with PCRE rules: inside a character class, an asterisk is not a metacharacter but a literal asterisk.
      >
      > Next, I tried to output those terms without the asterisks running...
      >
      > ^!Info ^$GetDocListAll("(?:\*\*)([^*]+)\*\*";"$1\r\n")$
      >
      > In this case, the output is quite absurd. An error message pops up saying "Regex error: PCRE does not support \L, \l, \N, \U, or \u" -- although none of these metacharacters is used here.
      >
      > After OK, the Infobox appears displaying the same error message again.
      >
      > Obviously, the clip misinterprets the Negative Character Class [^*]. Though PCRE doesn't demand it here, it works if we escape the asterisk with [^\*].
      >
      > Strange -- isn't it?
      >
      > Flo
      >

      Any pair of characters that clipcode treats as a token, e.g. ^*, ^#, ^P, ^T, ^L (maybe also ^% and ^$), would be subject to misinterpretation if you need them in your pattern literally. To see what is getting substituted into your character class, you could try:

      ^!Info ^*

      just before your ^$GetDocListAll.

      Regards,
      Sheri
    • flo.gehrke
      Message 2 of 3, Oct 1, 2009
        --- In ntb-clips@yahoogroups.com, "Sheri" <silvermoonwoman@...> wrote:
        >
        > Any pair of characters clipcode treats as tokens, e.g. ^*, ^#, ^P,
        > ^T, ^L, (maybe also ^% and ^$) would be subject to misinterpretation
        > if you need them in your pattern literally.

        Sheri,

        Thanks for this explanation. For me, it varies a little bit...

        Not working: [^$] [^%] [^#] [^*]
        Working: [^T] [^P] [^L] interpreted as a non-T, non-P, non-L.

        Flo
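
For reference, a minimal sketch in Python (not NoteTab clip code) that runs the same patterns against the sample sentence from the first message. It assumes Python's `re` module treats an asterisk inside a character class the same way PCRE does, which holds for these simple classes. The variable names are illustrative only. If the unescaped and escaped forms agree here, that is consistent with the view that the error comes from clip-token substitution of `^*` before the pattern reaches the regex engine, not from the regex itself.

```python
import re

# Sample sentence from the original post.
text = (
    "Alexander von **Humboldt**'s travel diary of his journey on the "
    "**Orinoco** in April **1800** reminds us of the complex and "
    "difficult stages of his writing process."
)

# First pattern from the post: runs of two or more asterisks/digits.
markers = re.findall(r"[*\d]{2,}", text)

# Second pattern, with the asterisk unescaped inside the negated class.
unescaped = re.findall(r"(?:\*\*)([^*]+)\*\*", text)

# Same pattern with the asterisk escaped, as in the reported workaround.
escaped = re.findall(r"(?:\*\*)([^\*]+)\*\*", text)

# Hex-escape variant; it also avoids the literal "^*" character pair.
hex_form = re.findall(r"(?:\*\*)([^\x2A]+)\*\*", text)

print(markers)    # ['**', '**', '**', '**', '**1800**']
print(unescaped)  # ['Humboldt', 'Orinoco', '1800']
print(escaped == unescaped, hex_form == unescaped)  # True True
```

Whether the hex form `[^\x2A]` is accepted inside a clip has not been tested in this thread; the escaped form `[^\*]` is the one reported to work.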