seer26 <seer26@...> wrote:
> answers at:
Thanks! I haven't yet looked up that 2nd link, but the 1st one is very detailed.
> executive summary:
> utf-16 is ucs-2 plus surrogate pairs (a sneaky way to represent a
> codepoint from 10000 to 10FFFF using two ucs-2 characters) Personally, I'd say it's
> only useful to people who have jumped the gun and went ucs-2 instead of
> utf-8 (such as MS,Apple,Java,etc). When utf-8 is feasible, it should be
> used instead.
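(As an aside on the surrogate pairs mentioned in the summary: the arithmetic is small enough to sketch in Python. The function names below are my own, chosen only for illustration, and the sketch assumes the input is already a valid codepoint in the 10000-10FFFF range.)

```python
def to_surrogates(cp):
    """Split a codepoint in 0x10000..0x10FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("codepoint is outside the surrogate-pair range")
    v = cp - 0x10000             # leaves a 20-bit value
    high = 0xD800 + (v >> 10)    # top 10 bits go into the high surrogate
    low = 0xDC00 + (v & 0x3FF)   # bottom 10 bits go into the low surrogate
    return high, low


def from_surrogates(high, low):
    """Recombine a high/low surrogate pair into the original codepoint."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

Each half fits in one 16-bit unit, which is why a UCS-2 based system can carry these codepoints as two "characters".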
It even points out the fact that, depending on the context, there are two
kinds of UTF-8 (in one case codepoints in the range 110000-7FFFFFFF are
illegal, must be rejected if read and never output, in the other they are
legal though not yet defined). So the most important difference between
UCS-[2,4] and UTF-[16,32] is in the range of codepoints that can be
represented. They also point out the (maybe paradoxical) fact that UTF-32 is a
21-bit representation (using, it is true, 32-bit words), while UCS-4 is
31-bit. A mine of information, even (notwithstanding the title) for people
outside the Unix/Linux world. I'm going to write to Yegappan to urge him to
add a link to it in the Unicode section I wrote for the Vim FAQ.
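
To make the range difference concrete, here is a rough sketch in Python. The constant and function names are hypothetical (mine, not from the linked page), and it deliberately ignores other restrictions such as the surrogate range D800-DFFF; it only compares the two upper limits discussed above.

```python
UNICODE_MAX = 0x10FFFF    # upper limit for UTF-8/UTF-16/UTF-32 in the strict sense
UCS4_MAX = 0x7FFFFFFF     # upper limit for UCS-4 (and the older, wider UTF-8)


def in_range(cp, strict=True):
    """Return True if cp is within the range allowed in the chosen context.

    strict=True  -> Unicode range only (0..10FFFF)
    strict=False -> full UCS-4 range (0..7FFFFFFF)
    """
    limit = UNICODE_MAX if strict else UCS4_MAX
    return 0 <= cp <= limit


# Number of bits needed for the largest value in each range.
utf32_bits = UNICODE_MAX.bit_length()
ucs4_bits = UCS4_MAX.bit_length()
```

Here `utf32_bits` comes out as 21 and `ucs4_bits` as 31, which is the "21-bit versus 31-bit" point, even though both are stored in 32-bit words.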