
utf-16

  • seer26
    Message 1 of 2 , Nov 14, 2002
      > Sorry if I seem to be asking a dumb question, but what is the difference
      > between utf-16 and ucs-2? I understand that these names describe unicode
      > encodings where the codepoints of Unicode are represented by means of 16-bit
      > words. The docs I found somewhere under http://www.unicode.org include
      > descriptions of utf-8, utf-16le, utf-16be, utf-32le, utf-32be, but I hardly
      > saw any name starting with ucs. I had been under the impression that ucs-2
      > and ucs-4 were older names for utf-16 and utf-32 respectively. (If they
      > aren't, I should reread my section of the Vim FAQ about Unicode and maybe
      > send a correction to Yegappan.)

      answers at:
      http://www.cl.cam.ac.uk/~mgk25/unicode.html
      http://www.zvon.org/tmRFC/RFC2781/Output/chapter2.html

      executive summary:
      utf-16 is ucs-2 plus surrogate pairs (a sneaky way to represent a
      character from 10000 to 10FFFF using two 16-bit ucs-2 code units).
      Personally, I'd say it's only useful to people who jumped the gun and
      went with ucs-2 instead of utf-8 (such as MS, Apple, Java, etc.). When
      utf-8 is feasible, it should be used instead.
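
      To make the surrogate-pair trick concrete, here is a minimal sketch of
      the arithmetic as described in RFC 2781. The function names are made up
      for illustration and are not taken from any library.

```python
def to_surrogate_pair(cp):
    """Split a codepoint in 0x10000..0x10FFFF into two 16-bit code units."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only codepoints above the BMP need a surrogate pair")
    v = cp - 0x10000                 # leaves a 20-bit value
    high = 0xD800 | (v >> 10)        # top 10 bits go into the high surrogate
    low = 0xDC00 | (v & 0x3FF)       # bottom 10 bits go into the low surrogate
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a high/low surrogate pair into the original codepoint."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# U+1D11E (musical G clef) becomes the pair D834 DD1E
high, low = to_surrogate_pair(0x1D11E)
print(f"{high:04X} {low:04X}")
```

      A plain ucs-2 decoder would see those two units as two separate
      (reserved) 16-bit values, which is the practical difference between
      the two encodings.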
    • Tony Mechelynck
      Message 2 of 2 , Nov 15, 2002
        seer26 <seer26@...> wrote:
        [...]
        > answers at:
        > http://www.cl.cam.ac.uk/~mgk25/unicode.html
        > http://www.zvon.org/tmRFC/RFC2781/Output/chapter2.html
        >
        > executive summary:
        > utf-16 is ucs-2 plus surrogate pairs (a sneaky way to represent a
        > character from 10000 to 10FFFF using two ucs-2 characters). Personally,
        > I'd say it's only useful to people who jumped the gun and went with
        > ucs-2 instead of utf-8 (such as MS, Apple, Java, etc.). When utf-8 is
        > feasible, it should be used instead.

        Thanks! I haven't yet looked up that 2nd link, but the 1st one is very
        detailed. It even points out that, depending on the context, there are
        two kinds of UTF-8: in one, codepoints in the range 110000-7FFFFFFF are
        illegal (they must be rejected if read and never output); in the other
        they are legal though not yet defined. So the most important difference
        between UCS-[2,4] and UTF-[16,32] is in the range of codepoints that can
        be represented. They also point out the (maybe paradoxical) fact that
        UTF-32 is a 21-bit representation (using, it is true, 32-bit words),
        while UCS-4 is 31-bit. A mine of information, even (notwithstanding the
        title) for people outside the Unix/Linux world. I'm going to write to
        Yegappan to urge him to add a link to it in the Unicode section I wrote
        for the Vim FAQ.
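
        A rough sketch of the range difference, assuming the original UTF-8
        design that covers the full 31-bit UCS-4 range (up to 7FFFFFFF) with
        up to six bytes. The helper name and the "strict" flag are made up
        for illustration, and surrogate checks are left out to keep it short.

```python
UNICODE_MAX = 0x10FFFF    # upper limit for UTF-16, UTF-32 and strict UTF-8
UCS4_MAX = 0x7FFFFFFF     # upper limit for UCS-4 and the original UTF-8

# Exclusive upper bounds for 1-, 2-, ... 6-byte UTF-8 sequences
_UTF8_BOUNDS = (0x80, 0x800, 0x10000, 0x200000, 0x4000000, 0x80000000)


def utf8_length(cp, strict=True):
    """Return how many bytes UTF-8 needs for codepoint cp.

    strict=True rejects anything above 10FFFF; strict=False accepts the
    whole 31-bit range, as the older definition did.
    """
    limit = UNICODE_MAX if strict else UCS4_MAX
    if not 0 <= cp <= limit:
        raise ValueError(f"codepoint {cp:#x} out of range")
    for nbytes, upper in enumerate(_UTF8_BOUNDS, start=1):
        if cp < upper:
            return nbytes


print(UNICODE_MAX.bit_length())              # bits needed for the UTF-32 range
print(UCS4_MAX.bit_length())                 # bits needed for the UCS-4 range
print(utf8_length(0x10FFFF))                 # longest strict sequence
print(utf8_length(0x7FFFFFFF, strict=False)) # longest sequence in the old form
```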

        Regards,
        Tony.