
22417 Re: [Clip] Trouble with UTF

  • Axel Berger
    Jan 11, 2012
      Art Kocsis wrote:
      > will your clip be available after you get finished tweaking it?

      There was a very silly typing mistake in the first draft that didn't
      come to light until I first came upon non-ANSI characters. This seems to
      work (many ^!Set lines are long):

      :loop
      ^!Find "[\xC0-\xF7][\x80-\xBF]*" RS
      ^!IfError donelatin
      ^!IfMatch "[\xC2-\xC3][\x80-\xBF]" "^$GetSelection$" latin1
      ^!IfMatch "[\xC0-\xDF][\x80-\xBF]" "^$GetSelection$" zwei
      ^!IfMatch "[\xE0-\xEF][\x80-\xBF]{2}" "^$GetSelection$" drei
      ^!IfMatch "[\xF0-\xF7][\x80-\xBF]{3}" "^$GetSelection$" vier
      ^!Continue Illegal sequence, can't be converted.
      ^!Goto loop
      :zwei
      ^!Set %first%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";2)$)$ MOD 64)$
      ^!Set %second%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";1)$)$ MOD 32)$
      ^!Set %third%=0
      ^!Set %fourth%=0
      ^!Goto makeent
      :drei
      ^!Set %first%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";3)$)$ MOD 64)$
      ^!Set %second%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";2)$)$ MOD 64)$
      ^!Set %third%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";1)$)$ MOD 16)$
      ^!Set %fourth%=0
      ^!Goto makeent
      :vier
      ^!Set %first%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";4)$)$ MOD 64)$
      ^!Set %second%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";3)$)$ MOD 64)$
      ^!Set %third%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";2)$)$ MOD 64)$
      ^!Set %fourth%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";1)$)$ MOD 8)$
      :makeent
      ^!Set %first%=^$Calc(262144*^%fourth%+4096*^%third%+64*^%second%+^%first%;0)$
      ^!InsertText &#^%first%;
      ^!Goto loop
      :latin1
      ^!Set %first%=^$StrCopyRight("^$GetSelection$";1)$
      ^!Set %second%=^$StrCopyLeft("^$GetSelection$";1)$
      ^!Set %first%=^$Calc(^$CharToDec(^%first%)$ MOD 64)$
      ^!Set %second%=^$Calc(^$CharToDec(^%second%)$ MOD 4)$
      ^!InsertText ^$DecToChar(^$Calc(64*^%second%+^%first%)$)$
      ^!Goto loop
      :donelatin
      ^!Replace "&#8364;" >> "€" WASTI
      ^!Replace "&#352;" >> "Š" WASTI
      ^!Replace "&#353;" >> "š" WASTI
      ^!Replace "&#381;" >> "Ž" WASTI
      ^!Replace "&#382;" >> "ž" WASTI
      ^!Replace "&#338;" >> "Œ" WASTI
      ^!Replace "&#339;" >> "œ" WASTI
      ^!Replace "&#376;" >> "Ÿ" WASTI
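
For anyone who wants to cross-check the arithmetic in the :zwei, :drei and :vier branches outside NoteTab, here is a minimal Python sketch. It is not part of the clip; the function names `utf8_to_codepoint` and `to_entity` are made up for illustration, and it assumes the input is exactly one multi-byte sequence of 2 to 4 bytes.

```python
def utf8_to_codepoint(seq: bytes) -> int:
    """Decode one 2-, 3- or 4-byte UTF-8 sequence with MOD arithmetic.

    Hypothetical helper: mirrors the clip's MOD 32 / MOD 16 / MOD 8 on the
    lead byte and MOD 64 on every continuation byte.
    """
    # Number of payload values in the lead byte, by sequence length.
    lead_mod = {2: 32, 3: 16, 4: 8}[len(seq)]
    value = seq[0] % lead_mod
    for byte in seq[1:]:
        # Each continuation byte contributes its low 6 bits.
        value = value * 64 + byte % 64
    return value


def to_entity(seq: bytes) -> str:
    """Format the decoded code point as a numeric character reference."""
    return f"&#{utf8_to_codepoint(seq)};"


if __name__ == "__main__":
    for ch in ("ä", "€", "Š"):
        print(ch, to_entity(ch.encode("utf-8")))
```

Multiplying by 64 once per continuation byte gives the same weights as the clip's 64, 4096 and 262144 factors in :makeent.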

      It is advisable to check first that the text is legal UTF-8, i.e. that
      it contains no 8-bit characters outside a valid UTF-8 sequence:

      :loop
      ^!Find "([\x80-\xBF]|[\xC0-\xFF][\x80-\xBF]*)" RS
      ^!IfError usasc
      ^!IfMatch "[\xC2-\xDF][\x80-\xBF]" "^$GetSelection$" loop
      ^!IfMatch "\xE0[\xA0-\xBF][\x80-\xBF]" "^$GetSelection$" loop
      ^!IfMatch "[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}" "^$GetSelection$" loop
      ^!IfMatch "\xED[\x80-\x9F][\x80-\xBF]" "^$GetSelection$" loop
      ^!IfMatch "\xF0[\x90-\xBF][\x80-\xBF]{2}" "^$GetSelection$" loop
      ^!IfMatch "[\xF1-\xF3][\x80-\xBF]{3}" "^$GetSelection$" loop
      ^!IfMatch "\xF4[\x80-\x8F][\x80-\xBF]{2}" "^$GetSelection$" loop
      ^!Continue Illegal sequence, no UTF-8
      ^!Goto loop
      :usasc
      ^!Continue No errors found
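
The same check can be sketched in Python for comparison, reusing the byte patterns from the clip above. This is only an illustration under my own assumptions: the names `CANDIDATE`, `VALID_MULTIBYTE` and `first_illegal_sequence` are hypothetical, and the function reports the byte offset of the first offending sequence instead of stopping with a message.

```python
import re

# Any run that starts with an 8-bit byte, as in the clip's ^!Find pattern.
CANDIDATE = re.compile(rb"[\x80-\xBF]|[\xC0-\xFF][\x80-\xBF]*")

# The seven well-formed multi-byte shapes tested by the ^!IfMatch lines.
VALID_MULTIBYTE = re.compile(
    rb"[\xC2-\xDF][\x80-\xBF]"
    rb"|\xE0[\xA0-\xBF][\x80-\xBF]"
    rb"|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}"
    rb"|\xED[\x80-\x9F][\x80-\xBF]"
    rb"|\xF0[\x90-\xBF][\x80-\xBF]{2}"
    rb"|[\xF1-\xF3][\x80-\xBF]{3}"
    rb"|\xF4[\x80-\x8F][\x80-\xBF]{2}"
)


def first_illegal_sequence(data: bytes):
    """Return the offset of the first non-UTF-8 sequence, or None if clean."""
    for match in CANDIDATE.finditer(data):
        # The whole 8-bit run must be exactly one well-formed sequence.
        if not VALID_MULTIBYTE.fullmatch(match.group()):
            return match.start()
    return None


if __name__ == "__main__":
    print(first_illegal_sequence("Häck €".encode("utf-8")))  # legal UTF-8
    print(first_illegal_sequence(b"H\xe4ck"))                # Latin-1 byte
```

Like the clip, this treats a valid sequence followed by a stray continuation byte as one illegal run, because the candidate pattern is greedy.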

      Neither clip starts with a ^!Jump TEXT_START, so both begin at the
      current cursor position. This is on purpose, but you might want to
      change it.

      Axel

      --
      Dipl.-Ing. F. Axel Berger Tel: +49/ 2174/ 7439 07
      Johann-Häck-Str. 14 Fax: +49/ 2174/ 7439 68
      D-51519 Odenthal-Heide eMail: Axel-Berger@...
      Deutschland (Germany) http://berger-odenthal.de