Asterisk in Negative Character Class

  • flo.gehrke
    Message 1 of 3 , Oct 1, 2009
      In a text database, some terms are marked with asterisks for indexing. For example...

      Alexander von **Humboldt**'s travel diary of his journey on the **Orinoco** in April **1800** reminds us of the complex and difficult stages of his writing process.

      When running...

      ^!Info ^$GetDocListAll("[*\d]{2,}";"$0\r\n")$

      against this string the clip correctly outputs:

      **
      **
      **
      **
      **1800**

      The RegEx used here complies with PCRE rules: inside a character class, an asterisk is not a metacharacter but a literal asterisk.
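The same check can be reproduced outside NoteTab. The following is a minimal sketch in Python, whose `re` module is not PCRE but treats an asterisk inside a character class the same way; the variable names `text` and `matches` are chosen here for illustration only.

```python
import re

# Sample string from the post above.
text = ("Alexander von **Humboldt**'s travel diary of his journey on the "
        "**Orinoco** in April **1800** reminds us of the complex and "
        "difficult stages of his writing process.")

# Inside [...] the asterisk is a literal, so the class means
# "an asterisk or a digit", repeated two or more times.
matches = re.findall(r"[*\d]{2,}", text)
print("\n".join(matches))
```

This should list the same five matches as the clip: four bare asterisk pairs and `**1800**`.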

      Next, I tried to output those terms without the asterisks running...

      ^!Info ^$GetDocListAll("(?:\*\*)([^*]+)\*\*";"$1\r\n")$

      In this case, the output is quite absurd. An error message pops up saying "Regex error: PCRE does not support \L, \l, \N, \U, or \u" -- although none of these metacharacters is used here.

      After OK, the Infobox appears displaying the same error message again.

      Obviously, the clip misinterprets the negated character class [^*]. Although PCRE doesn't require it here, the clip works if we escape the asterisk: [^\*].
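To confirm that the escaped and unescaped forms mean the same thing to a regex engine, here is a small sketch in Python (not PCRE itself, but equivalent for this construct). It suggests the difference seen in the clip comes from something other than the regex syntax; the names `unescaped` and `escaped` are illustrative.

```python
import re

# Sample string from the post above.
text = ("Alexander von **Humboldt**'s travel diary of his journey on the "
        "**Orinoco** in April **1800** reminds us of the complex and "
        "difficult stages of his writing process.")

# [^*] and [^\*] both mean "any character except an asterisk".
unescaped = re.findall(r"(?:\*\*)([^*]+)\*\*", text)
escaped = re.findall(r"(?:\*\*)([^\*]+)\*\*", text)

print(unescaped)
print(escaped)
```

Both calls return the marked terms without their asterisks, so the escaped form is a safe replacement as far as the regex itself is concerned.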

      Strange -- isn't it?

      Flo
    • Sheri
      Message 2 of 3 , Oct 1, 2009
        --- In ntb-clips@yahoogroups.com, "flo.gehrke" <flo.gehrke@...> wrote:
        >
        > In a text database, some terms are marked with asterisks for indexing. For example...
        >
        > Alexander von **Humboldt**'s travel diary of his journey on the **Orinoco** in April **1800** reminds us of the complex and difficult stages of his writing process.
        >
        > When running...
        >
        > ^!Info ^$GetDocListAll("[*\d]{2,}";"$0\r\n")$
        >
        > against this string the clip correctly outputs:
        >
        > **
        > **
        > **
        > **
        > **1800**
        >
        > The RegEx used here complies with PCRE rules: Inside a Character Class, an asterisk is no metacharacter but a literal asterisk.
        >
        > Next, I tried to output those terms without the asterisks running...
        >
        > ^!Info ^$GetDocListAll("(?:\*\*)([^*]+)\*\*";"$1\r\n")$
        >
        > In this case, the output is quite absurd. An error message pops up saying "Regex error: PCRE does not support \L, \l, \N, \U, or \u" -- although none of these metacharacters is used here.
        >
        > After OK, the Infobox appears displaying the same error message again.
        >
        > Obviously, the clip misinterprets the Negative Character Class [^*]. Though PCRE doesn't demand it here, it works if we escape the asterisk with [^\*].
        >
        > Strange -- isn't it?
        >
        > Flo
        >

        Any pair of characters that clipcode treats as a token, e.g. ^*, ^#, ^P, ^T, ^L (maybe also ^% and ^$), is subject to misinterpretation if you need it literally in your pattern. To see what is being substituted into your character class, you could try:

        ^!Info ^*

        just before your ^$GetDocListAll.
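The mechanism described here can be modelled roughly as a text substitution that runs before the regex engine ever sees the pattern. The sketch below is a hypothetical model in Python, not NoteTab's actual implementation: the function `expand_tokens`, the table `HYPOTHETICAL_TOKENS`, and above all the path that `^*` expands to are assumptions made up for illustration. The thread does not say what `^*` really expands to.

```python
import re

# ASSUMPTION: a made-up expansion value. The thread does not state
# what NoteTab substitutes for ^*; a Windows path is used here only
# because it contains backslash sequences.
HYPOTHETICAL_TOKENS = {"^*": r"C:\Users\flo\notes.txt"}

def expand_tokens(clip_text: str) -> str:
    """Replace each token with its value, as a pre-processor might."""
    for token, value in HYPOTHETICAL_TOKENS.items():
        clip_text = clip_text.replace(token, value)
    return clip_text

intended = r"(?:\*\*)([^*]+)\*\*"
received = expand_tokens(intended)  # what the regex engine would get
print(received)

# The expanded text is no longer the intended pattern and may not
# even compile, because it now contains escapes such as \U.
try:
    re.compile(received)
    compile_error = None
except re.error as exc:
    compile_error = str(exc)
print(compile_error)
```

Under this assumption the engine receives stray backslash escapes that were never typed in the clip, which would be one way to end up with an error about unsupported escape sequences. Running `^!Info ^*` as suggested above shows the real substituted value.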

        Regards,
        Sheri
      • flo.gehrke
        Message 3 of 3 , Oct 1, 2009
          --- In ntb-clips@yahoogroups.com, "Sheri" <silvermoonwoman@...> wrote:
          >
          > Any pair of characters clipcode treats as tokens, e.g. ^*, ^#, ^P,
          > ^T, ^L, (maybe also ^% and ^$) would be subject to misinterpretation
          > in you need them in your pattern literally.

          Sheri,

          Thanks for this explanation. My results differ a little bit...

          Not working: [^$] [^%] [^#] [^*]
          Working: [^T] [^P] [^L] (interpreted as non-T, non-P, non-L)
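Given that list, one defensive habit is to backslash-escape those four characters whenever they directly follow `[^`. The sketch below shows the rewrite in Python; the helper `protect_negated_class` is hypothetical, and only the `[^\*]` case is confirmed to work in the thread, so the other three are untested assumptions. The helper is deliberately naive and does not handle an escaped bracket such as `\[^*`.

```python
import re

def protect_negated_class(pattern: str) -> str:
    """Insert a backslash after '[^' when the next character is one of
    $ % # *, the characters reported above as not working."""
    return re.sub(r"\[\^([$%#*])", r"[^\\\1", pattern)

original = r"(?:\*\*)([^*]+)\*\*"
protected = protect_negated_class(original)
print(protected)

# Letters after '[^' are left alone, matching the "working" cases.
print(protect_negated_class("[^T]"))
```

For a regex engine the escaped forms mean the same as the unescaped ones, so the rewrite changes only what the clip pre-processor sees.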

          Flo