Thanks, Axel, I may incorporate this into my library. But you know what they say about sleeping dogs... :-)

The problems I encounter are often related to someone's incorrect usage of certain characters. For example, there are
several different characters that people use to denote the one and only 'degrees' symbol. So my clips carefully look for
these incorrect usages and convert them to 'real' degrees symbols. An algorithmic approach would miss those, and then
I'd still need something to make those conversions. There are other such cases as well, like the already
mentioned 'smart quotes' needing to be converted to standard quotes.
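
The table-driven clean-up described above can be pictured with a minimal Python sketch. The characters in the table (a masculine ordinal indicator and a ring above standing in for the degree sign, plus four curly quotes) are assumptions chosen for illustration, not the contents of the actual clip library, and the names FIXES and normalize are hypothetical.

```python
# Hypothetical lookup table: these entries are illustrative assumptions,
# not the real list used in the clips discussed in this thread.
FIXES = {
    "\u00ba": "\u00b0",  # masculine ordinal indicator -> degree sign
    "\u02da": "\u00b0",  # ring above -> degree sign
    "\u2018": "'",       # left single curly quote -> apostrophe
    "\u2019": "'",       # right single curly quote -> apostrophe
    "\u201c": '"',       # left double curly quote -> straight quote
    "\u201d": '"',       # right double curly quote -> straight quote
}

def normalize(text: str) -> str:
    """Replace each look-alike character with its standard form."""
    return text.translate(str.maketrans(FIXES))
```

A purely arithmetic conversion cannot know that two distinct characters were meant to be the same symbol, which is why a table like this would still be needed alongside it.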

Regards,

John

RecipeTools Web Site: <http://recipetools.gotdns.com/>

From: ntb-clips@yahoogroups.com [mailto:ntb-clips@yahoogroups.com] On Behalf Of Axel Berger
Sent: Wednesday, November 23, 2011 05:54
To: ntb-clips@yahoogroups.com
Subject: Re: [Clip] Transform UTF-8 to ANSI

John Shotsky wrote:

> I have something similar,

John sent me a small converter by Sheri that turns single-character
fractions like ¼ into their three-character equivalents like 1/4, and a
quite comprehensive one of his own for his recipes from diverse sources.
Both are much tidier than my quick-and-dirty efforts, with comprehensive
comments, error checking, and clearing of variables. But I've done
something different now and solved it algorithmically.

By the way, it's all a bit superfluous, as NoteTab does do it natively
after all. I just have to save the text and open it in NoteTab. Standard
UTF-8 characters are shown as ANSI characters and I can save as ANSI. It
just does not work with my usual method of pasting text into an open,
empty, new document.

First off, John, your conversions of "quoted-printable" characters can
be generalised as follows:

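; remove soft line breaks ("=" at the end of a line)
^!Replace "=^P" >> "" WASTI
; protect literal "=" signs with a placeholder while the loop runs
^!Replace "=3D" >> "<gleich>" WASTI
^!Jump TEXT_START
:loop
^!Find "=[0-9A-Fa-f]{2}" WRASTI
;^!Continue
^!IfError fini
;long line
^!InsertText ^$DecToChar(^$HexToInt(^$StrCopyRight("^$GetSelection$";2)$)$)$
;end long line
^!Goto loop
:fini
; restore the protected "=" signs
^!Replace "<gleich>" >> "=" WASTI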
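
For comparison, the same quoted-printable logic can be sketched in Python. This is only an illustration, assuming the text is already a string and that each =XX escape stands for one single-byte character, as in the clip; decode_quoted_printable is a hypothetical name. Because re.sub makes a single pass and never rescans what it has inserted, the <gleich> placeholder is not needed here.

```python
import re

def decode_quoted_printable(text: str) -> str:
    """Undo quoted-printable escapes in a string (illustrative sketch)."""
    # Drop soft line breaks: "=" immediately followed by a line ending.
    text = re.sub(r"=\r?\n", "", text)
    # Replace every =XX escape with the character whose code is hex XX.
    return re.sub(
        r"=([0-9A-Fa-f]{2})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )
```

Python's standard library also ships a quopri module that works on bytes and is probably the better choice outside of a demonstration.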

My solution for UTF-8 is the following. It is not fully tested. The
first 256 Unicode code points are those of Latin-1, so the eight characters
where cp-1252 (aka Windows) and Latin-1 differ have to be treated
specially. Quite a few of the ^!Set lines are long:

:loop
^!Find "[\xC0-\xF7][\x80-\xBF]*" RS
^!IfError donelatin
; labels zwei/drei/vier handle two-, three- and four-byte sequences
^!IfMatch "[\xC2-\xC3][\x80-\xBF]" "^$GetSelection$" latin1
^!IfMatch "[\xC0-\xDF][\x80-\xBF]" "^$GetSelection$" zwei
^!IfMatch "[\xE0-\xEF][\x80-\xBF]{2}" "^$GetSelection$" drei
^!IfMatch "[\xF0-\xF7][\x80-\xBF]{3}" "^$GetSelection$" vier
^!Continue Illegal sequence, can't be converted.
^!Goto loop

:zwei
^!Set %first%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";2)$)$ MOD 64)$
^!Set %second%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";1)$)$ MOD 32)$
^!Set %third%=0
^!Set %fourth%=0
^!Goto makeent

:drei
^!Set %first%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";3)$)$ MOD 64)$
^!Set %second%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";2)$)$ MOD 64)$
^!Set %third%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";1)$)$ MOD 16)$
^!Set %fourth%=0
^!Goto makeent

:vier
^!Set %first%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";4)$)$ MOD 64)$
^!Set %second%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";3)$)$ MOD 64)$
^!Set %third%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";2)$)$ MOD 64)$
^!Set %fourth%=^$Calc(^$CharToDec(^$StrIndex("^$GetSelection$";1)$)$ MOD 8)$

:makeent
^!Set %first%=^$Calc(262144*^%fourth%+4096*^%third%+64*^%second%+^%first%;0)$
^!InsertText &#^%first%;
^!Goto loop

:latin1
^!Set %first%=^$StrCopyRight("^$GetSelection$";1)$
^!Set %second%=^$StrCopyLeft("^$GetSelection$";1)$
^!Set %first%=^$Calc(^$CharToDec(^%first%)$ MOD 64)$
^!Set %second%=^$Calc(^$CharToDec(^%second%)$ MOD 4)$
^!InsertText ^$DecToChar(^$Calc(64*^%second%+^%first%)$)$
^!Goto loop

:donelatin
^!Replace "&#8364;" >> "€" WASTI
^!Replace "&#352;" >> "Š" WASTI
^!Replace "&#353;" >> "š" WASTI
^!Replace "&#381;" >> "Ž" WASTI
^!Replace "&#382;" >> "ž" WASTI
^!Replace "&#338;" >> "Œ" WASTI
^!Replace "&#339;" >> "œ" WASTI
^!Replace "&#376;" >> "Ÿ" WASTI
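
The arithmetic in the clip above can be checked with a short Python sketch. It is an illustration under assumptions rather than a port of the clip: the function names are hypothetical, the input is one multi-byte sequence given as bytes, and, like the clip, it does not reject overlong or otherwise invalid encodings. The lead byte contributes its low 5, 4 or 3 bits (MOD 32, 16 or 8) and every continuation byte its low 6 bits (MOD 64).

```python
def utf8_sequence_to_codepoint(seq: bytes) -> int:
    """Decode one 2-, 3- or 4-byte UTF-8 sequence by hand."""
    lead_mod = {2: 32, 3: 16, 4: 8}  # payload size of the lead byte
    if len(seq) not in lead_mod:
        raise ValueError("expected a 2- to 4-byte sequence")
    code = seq[0] % lead_mod[len(seq)]
    for byte in seq[1:]:
        if not 0x80 <= byte <= 0xBF:
            raise ValueError("bad continuation byte")
        # Shift the result left by 6 bits and add the next 6 payload bits.
        code = code * 64 + byte % 64
    return code

def to_ansi_or_entity(seq: bytes) -> str:
    """Return the character itself if it is in Latin-1, else &#N;."""
    code = utf8_sequence_to_codepoint(seq)
    return chr(code) if code < 256 else f"&#{code};"
```

For example, the euro sign is stored as the bytes E2 82 AC, which gives 4096*2 + 64*2 + 44 = 8364, the number used in the first ^!Replace line after :donelatin.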

Axel
