
Re: utf-16

  • Tony Mechelynck
    Message 1 of 2 , Nov 15, 2002
      seer26 <seer26@...> wrote:
      [...]
      > answers at:
      > http://www.cl.cam.ac.uk/~mgk25/unicode.html
      > http://www.zvon.org/tmRFC/RFC2781/Output/chapter2.html
      >
      > executive summary:
      > utf-16 is ucs-2 plus surrogate pairs (a sneaky way to represent a
      > character from 10000 to 10FFFF using two ucs-2 characters). Personally,
      > I'd say it's only useful to people who jumped the gun and went ucs-2
      > instead of utf-8 (such as MS, Apple, Java, etc.). When utf-8 is feasible,
      > it should be used instead.
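
To make the "surrogate pairs" remark concrete, here is a minimal Python sketch of the arithmetic described in RFC 2781 (the second link above). The function names to_surrogate_pair and from_surrogate_pair are hypothetical, chosen only for illustration; they are not part of Vim or of any library.

```python
def to_surrogate_pair(cp):
    """Split a codepoint in 0x10000..0x10FFFF into (high, low) 16-bit units."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("codepoint is outside the supplementary range")
    v = cp - 0x10000               # leaves a 20-bit value
    high = 0xD800 | (v >> 10)      # top 10 bits go into the high surrogate
    low = 0xDC00 | (v & 0x3FF)     # bottom 10 bits go into the low surrogate
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a (high, low) surrogate pair into one codepoint."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# U+1D11E (musical G clef) lies above FFFF, so it needs two 16-bit units.
high, low = to_surrogate_pair(0x1D11E)
print(hex(high), hex(low))
```

Anything at or below FFFF (outside the surrogate block D800-DFFF) is stored as a single 16-bit unit, which is why utf-16 and ucs-2 agree on that range.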

      Thanks! I haven't yet looked up the 2nd link, but the 1st one is very detailed.
      It even points out that, depending on the context, there are two
      kinds of UTF-8 (in one case codepoints in the range 110000-7FFFFFFF are
      illegal, must be rejected if read and never output; in the other they are
      legal though not yet defined). So the most important difference between
      UCS-[2,4] and UTF-[16,32] is in the range of codepoints that can be
      represented. They also point out the (maybe paradoxical) fact that UTF-32 is a
      21-bit representation (using, it is true, 32-bit words), while UCS-4 is
      31-bit. A mine of information, even (notwithstanding the title) for people
      outside the Unix/Linux world. I'm going to write to Yegappan to urge him to
      add a link to it in the Unicode section I wrote for the Vim FAQ.
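
The 21-bit versus 31-bit point and the two kinds of UTF-8 can be checked with a few lines of Python. This is only a sketch under stated assumptions: utf8_length and its strict flag are hypothetical names, the wider variant is taken to end at the 31-bit UCS-4 limit 0x7FFFFFFF, and the surrogate block D800-DFFF is not treated specially.

```python
UNICODE_MAX = 0x10FFFF   # highest codepoint reachable by UTF-16 and UTF-32
UCS4_MAX = 0x7FFFFFFF    # highest codepoint of 31-bit UCS-4

# Upper bound of the codepoints that fit in 1, 2, ... 6 bytes in the
# original (up to six bytes) UTF-8 scheme.
UTF8_LIMITS = (0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF)


def utf8_length(cp, strict=True):
    """Return how many bytes the codepoint needs in UTF-8.

    strict=True  : codepoints above 10FFFF are rejected.
    strict=False : codepoints up to 7FFFFFFF are accepted.
    """
    limit = UNICODE_MAX if strict else UCS4_MAX
    if not 0 <= cp <= limit:
        raise ValueError("codepoint out of range for this kind of UTF-8")
    for nbytes, top in enumerate(UTF8_LIMITS, start=1):
        if cp <= top:
            return nbytes


print(UNICODE_MAX.bit_length())  # bits needed for the UTF-16/UTF-32 range
print(UCS4_MAX.bit_length())     # bits needed for the UCS-4 range
```

With strict=False the same function accepts 110000 and above, which is the only difference between the two variants in this sketch.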

      Regards,
      Tony.