2384 Re: Vim on OS X, (no)macatsui problem

  • björn
    Oct 14, 2007
      > > The problem is that one deseret character for some reason takes up
      > > _two_ characters when put in the text storage (I guess this has
      > > something to do with Unicode?). Specifically, calling "length" on an
      > > NSString containing one deseret character returns 2 instead of 1, as I
      > > would expect.
      > >
      > UTF-8 uses:
      > 1 byte for each codepoint in the range U+0000 - U+007F
      > 2 bytes for each codepoint in the range U+0080 - U+07FF
      > 3 bytes for each codepoint in the range U+0800 - U+FFFF
      > 4 bytes for each codepoint in the range U+10000 - U+1FFFFF
      > Actually, current standards mandate that no codepoints higher than U+10FFFF
      > will "ever" be used. (Vim supports up to U+7FFFFFFF, with up to 6 bytes per
      > codepoint, following an earlier draft of the standard.)
      > Unicode also has the notion of "composing characters", which are characters
      > which are "superimposed" on the preceding character, possibly changing its
      > shape. These are usually diacritics: most of the accents of Latin can be
      > either precomposed or spacing-non-accented + composing-accent, but the
      > optional vowel marks of Hebrew and Arabic exist only as composing characters.
      > Since your Deseret characters are outside the BMP, each of them requires 4
      > bytes in UTF-8 (also two 16-bit words in UTF-16 and one 32-bit doubleword in
      > UTF-32); but maybe that's not what your measured "length" means? Does your
      > NSString include a final null (as C strings do) or an initial bytecount (as
      > Pascal strings do)? Or do your Deseret characters include "composing" elements?
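
The sizes quoted above can be checked with a short Python sketch. It is not part of the original thread: the helper name `encoded_sizes` and the sample characters are chosen only for illustration, with U+10400 (DESERET CAPITAL LETTER LONG I) standing in for "a Deseret character".

```python
import unicodedata


def encoded_sizes(ch):
    """Return (UTF-8 bytes, UTF-16 code units, UTF-32 code units) for a string."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")) // 2,  # 2 bytes per 16-bit word
        len(ch.encode("utf-32-le")) // 4,  # 4 bytes per 32-bit doubleword
    )


# One sample from each UTF-8 length class mentioned in the quoted text.
samples = {
    "LATIN CAPITAL A (U+0041)": "\u0041",
    "E WITH ACUTE (U+00E9)": "\u00e9",
    "EURO SIGN (U+20AC)": "\u20ac",
    "DESERET LONG I (U+10400)": "\U00010400",
}

for name, ch in samples.items():
    print(name, encoded_sizes(ch))

# Composing characters: "e" followed by COMBINING ACUTE ACCENT (U+0301)
# is canonically equivalent to the precomposed U+00E9.
decomposed = "e\u0301"
precomposed = unicodedata.normalize("NFC", decomposed)
is_composing = unicodedata.combining("\u0301") != 0
```

The Deseret sample needs 4 bytes in UTF-8, two 16-bit words in UTF-16 and one doubleword in UTF-32, matching the quoted explanation.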

      I'm sorry about the confusion with posting this thread separately on
      vim_multibyte and vim_mac...I'll try to bring the diverging threads
      together by posting this reply to both groups.

      Tim Allen replied to the vim_mac thread saying that NSString uses
      UTF-16 internally, and this is indeed why it says one Deseret char has
      length 2 (since it needs two 16-bit chars to store one Deseret char,
      as has been pointed out already).
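
To make the "length 2" concrete, here is a small Python sketch that counts UTF-16 code units the way a UTF-16-based string length would. The function names `utf16_length` and `surrogate_pair` are hypothetical helpers for this example; they are not NSString or MacVim APIs.

```python
def utf16_length(s):
    """Number of 16-bit code units needed to store s in UTF-16."""
    return len(s.encode("utf-16-le")) // 2


def surrogate_pair(code_point):
    """Split a code point above U+FFFF into its (high, low) UTF-16 surrogates."""
    if code_point <= 0xFFFF:
        raise ValueError("code point is inside the BMP and needs no surrogates")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


deseret = "\U00010400"  # one Deseret character, outside the BMP

print(len(deseret))           # Python counts code points
print(utf16_length(deseret))  # a UTF-16 string counts code units
print([hex(u) for u in surrogate_pair(ord(deseret))])
```

Python's own `len` counts code points, so it reports 1 here, while the UTF-16 count is 2 because the character is stored as a surrogate pair.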

      I was under the mistaken impression that NSString always returned
      length 1 for one character (not counting composing characters), which
      is why I thought MacVim would work in all situations except when
      composing characters were used. Again, this can be fixed by getting
      rid of the assumption that each line in the text storage has the same
      length (as returned by NSString), but this is a rather big code change.
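
The following Python sketch illustrates why the equal-length assumption breaks. It is only an illustration under simplifying assumptions (one display cell per code point, no composing characters) and is not MacVim's actual code. The helper `utf16_offset` is a made-up name.

```python
def utf16_units(ch):
    """UTF-16 code units used by a single code point: 2 outside the BMP, else 1."""
    return 2 if ord(ch) > 0xFFFF else 1


def utf16_offset(line, column):
    """UTF-16 offset of the character at index `column` within `line`."""
    return sum(utf16_units(ch) for ch in line[:column])


# Two lines with the same number of characters (3 each) ...
ascii_line = "abc"
deseret_line = "a\U00010400c"

# ... but different storage lengths when counted in UTF-16 code units.
ascii_len = utf16_offset(ascii_line, len(ascii_line))
deseret_len = utf16_offset(deseret_line, len(deseret_line))
print(ascii_len, deseret_len)

# Column 2 sits at a different UTF-16 offset on each line.
print(utf16_offset(ascii_line, 2), utf16_offset(deseret_line, 2))
```

With a per-line offset computed like this, each line can have its own storage length instead of a fixed one.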

      Thanks to Tony and Tim for educating me on the finer points of Unicode... :-)


      You received this message from the "vim_multibyte" maillist.
      For more information, visit http://www.vim.org/maillist.php