> answers at:
> executive summary:
> utf-16 is ucs-2 plus surrogate pairs (a sneaky way to represent a
> codepoint from 10000 to 10FFFF using two ucs-2 characters). Personally,
> I'd say it's only useful to people who jumped the gun and went with
> ucs-2 instead of utf-8 (such as MS, Apple, Java, etc.). When utf-8 is
> feasible, it should be used instead.
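The surrogate-pair trick the quoted summary describes is just bit arithmetic: subtract 10000, then split the remaining 20 bits into two 10-bit halves. A minimal sketch (function name is mine):

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a codepoint in 10000..10FFFF into a UTF-16 surrogate pair."""
    assert 0x10000 <= cp <= 0x10FFFF
    cp -= 0x10000                  # leaves a 20-bit value
    high = 0xD800 + (cp >> 10)     # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (cp & 0x3FF)    # bottom 10 bits -> low (trail) surrogate
    return high, low

# U+1F600 becomes the two "ucs-2 characters" D83D and DE00:
print([hex(u) for u in to_surrogate_pair(0x1F600)])  # ['0xd83d', '0xde00']
```

Since 2 x 10 bits on top of the 10000 offset reach exactly 10FFFF, this is also why UTF-16 caps Unicode at that codepoint.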
Thanks! I haven't yet looked up that 2nd link, but the 1st one is very detailed.
It even points out the fact that, depending on the context, there are two
kinds of UTF-8 (in one, codepoints in the range 110000-7FFFFFFF are
illegal: they must be rejected on input and never output; in the other they
are legal though not yet defined). So the most important difference between
UCS-[2,4] and UTF-[16,32] is in the range of codepoints that can be
represented. They also point out the (maybe paradoxical) fact that UTF-32 is a
21-bit representation (using, it is true, 32-bit words), while UCS-4 is
31-bit. A mine of information, even (notwithstanding the title) for people
outside the Unix/Linux world. I'm going to write to Yegappan to urge him to
add a link to it in the Unicode section I wrote for the Vim FAQ.
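The two-kinds-of-UTF-8 point is easy to see by writing the permissive (original ISO 10646) encoder by hand: the scheme extends naturally to 5- and 6-byte sequences covering all 31 bits of UCS-4, whereas Unicode-restricted UTF-8 simply forbids anything past 10FFFF. A sketch under that reading (function name is mine):

```python
def utf8_encode_ucs4(cp: int) -> bytes:
    """Original ISO 10646 UTF-8: up to 6 bytes, covering 31-bit UCS-4.
    Unicode-restricted UTF-8 stops at U+10FFFF (4 bytes max)."""
    if cp < 0x80:
        return bytes([cp])        # ASCII passes through unchanged
    for nbytes, limit, lead in ((2, 0x800, 0xC0), (3, 0x10000, 0xE0),
                                (4, 0x200000, 0xF0), (5, 0x4000000, 0xF8),
                                (6, 0x80000000, 0xFC)):
        if cp < limit:
            out = []
            for _ in range(nbytes - 1):
                out.append(0x80 | (cp & 0x3F))  # 6-bit continuation bytes
                cp >>= 6
            out.append(lead | cp)               # leading byte with length marker
            return bytes(reversed(out))
    raise ValueError("codepoint exceeds 31 bits")

# 110000 is representable here (F4 90 80 80), but a Unicode-restricted
# decoder must reject that sequence on input and never emit it:
print(utf8_encode_ucs4(0x110000).hex())
```

Below 110000 both kinds agree, which is why the difference only shows up at the 110000-7FFFFFFF boundary the article describes.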