2385 Re: Vim on OS X, (no)macatsui problem

  • Tony Mechelynck
    Oct 14, 2007
      björn wrote:
      > I'm sorry about the confusion with posting this thread separately on
      > vim_multibyte and vim_mac...I'll try to bring the diverging threads
      > together by posting this reply to both groups.
      > Tim Allen replied to the vim_mac thread saying that NSString uses
      > utf-16 internally and this is indeed why it says one deseret char has
      > length 2 (since it needs two 16 bit chars to store one deseret char,
      > as has been pointed out already).

      Yes, obviously (if one thinks about it), a single 16-bit UTF-16 word cannot
      represent anything above U+FFFF. For codepoints U+10000 to U+10FFFF (including
      Deseret, among others), a "surrogate pair" is used: two 16-bit words, the
      first in the range 0xD800-0xDBFF and the second in the range 0xDC00-0xDFFF.
      See
      http://en.wikipedia.org/wiki/UTF-16#Encoding_of_characters_outside_the_BMP for
      details. Unlike UTF-8 (as originally designed) and UTF-32, UTF-16 inherently
      cannot, even with surrogates, represent anything above U+10FFFF, and (I
      suppose) that's one of the reasons why it was decided to bring the "upper
      range" of Unicode down from U+7FFFFFFF to U+10FFFF (or even U+10FFFD, since
      for other reasons the last two codepoints of every plane, U+xxFFFE and
      U+xxFFFF, are "noncharacters").
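
To make the arithmetic concrete, here is a minimal sketch in Python (not MacVim or Vim code; the helper name utf16_surrogates is made up for illustration). It splits a codepoint above U+FFFF into its two 16-bit words and compares the codepoint count with the UTF-16 word count, the latter being the number an NSString-style length reports, as discussed above.

```python
def utf16_surrogates(codepoint):
    """Split a codepoint in U+10000..U+10FFFF into (high, low) surrogates.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= codepoint <= 0x10FFFF:
        raise ValueError("codepoint is not above the BMP")
    offset = codepoint - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits -> 0xD800-0xDBFF
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> 0xDC00-0xDFFF
    return high, low


deseret = "\U00010400"                  # DESERET CAPITAL LETTER LONG I
high, low = utf16_surrogates(ord(deseret))

# Python counts codepoints; UTF-16 counts 16-bit words.
codepoints = len(deseret)
units = len(deseret.encode("utf-16-be")) // 2

print(hex(high), hex(low), codepoints, units)
```

The two 10-bit halves are also why nothing above U+10FFFF fits: 0x10000 plus a 20-bit offset tops out at exactly U+10FFFF.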

      > I was under the mistaken impression that NSString always returned
      > length 1 for one character (not counting composing characters), which
      > is why I thought MacVim would work in all situations except when
      > composing characters were used. Again, this can be fixed by getting
      > rid of the assumption that each line in the text storage has the same
      > length (as returned by NSString), but this is a rather big code
      > change.
      > Thanks to Tony and Tim for educating me on the finer points of Unicode... :-)

      My pleasure. :-)

      > /Björn

      Best regards,
      Court, n.:
      A place where they dispense with justice.
      -- Arthur Train

      You received this message from the "vim_multibyte" maillist.
      For more information, visit http://www.vim.org/maillist.php