2384 Re: Vim on OS X, (no)macatsui problem
- Oct 14, 2007
> > The problem is that one deseret character for some reason takes up
> > _two_ characters when put in the text storage (I guess this has
> > something to do with Unicode?). Specifically, calling "length" on an
> > NSString containing one deseret character returns 2 instead of 1, as I
> > would expect.
> UTF-8 uses:
> 1 byte for each codepoint in the range U+0000 - U+007F
> 2 bytes for each codepoint in the range U+0080 - U+07FF
> 3 bytes for each codepoint in the range U+0800 - U+FFFF
> 4 bytes for each codepoint in the range U+10000 - U+1FFFFF
> Actually, current standards mandate that no codepoints higher than U+10FFFD
> will "ever" be used. (Vim supports up to U+3FFFFFFF, with up to 6 bytes per
> codepoint, following an earlier draft of the standard.)
> Unicode also has the notion of "composing characters", which are characters
> which are "superimposed" on the preceding character, possibly changing its
> shape. These are usually diacritics: most of the accents of Latin can be
> either precomposed or spacing-non-accented + composing-accent, but the
> optional vowel marks of Hebrew and Arabic exist only as composing characters.
> Since your Deseret characters are outside the BMP, each of them requires 4
> bytes in UTF-8 (also two 16-bit words in UTF-16 and one 32-bit doubleword in
> UTF-32); but maybe that's not what your measured "length" means? Does your
> NSString include a final null (as C strings do) or an initial bytecount (as
> Pascal strings do)? Or do your Deseret characters include "composing" elements?
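As a rough illustration of the byte ranges quoted above, here is a small Python sketch. The helper name utf8_len is made up for this example, and it applies the modern U+10FFFF ceiling rather than the older 6-byte scheme mentioned for Vim.

```python
def utf8_len(codepoint: int) -> int:
    """Bytes needed to encode one code point in UTF-8 (4-byte modern limit)."""
    if codepoint < 0x80:
        return 1
    if codepoint < 0x800:
        return 2
    if codepoint < 0x10000:
        return 3
    if codepoint <= 0x10FFFF:
        return 4
    raise ValueError("code point beyond U+10FFFF")


# U+10400 DESERET CAPITAL LETTER LONG I lies outside the BMP.
deseret = "\U00010400"
print(utf8_len(ord(deseret)))         # 4
print(len(deseret.encode("utf-8")))   # 4, matches the table
```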
I'm sorry about the confusion with posting this thread separately on
vim_multibyte and vim_mac... I'll try to bring the diverging threads
together by posting this reply to both groups.
Tim Allen replied to the vim_mac thread saying that NSString uses
UTF-16 internally, and this is indeed why it says one Deseret char has
length 2 (since it needs two 16-bit code units to store one Deseret
char, as has been pointed out already).
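The same count can be reproduced outside Cocoa. The Python sketch below is only an illustration, not MacVim code: utf16_units is a hypothetical helper that counts 16-bit code units, which is the quantity NSString's length reports.

```python
def utf16_units(text: str) -> int:
    """Count UTF-16 code units (2 bytes each) needed to store text."""
    return len(text.encode("utf-16-le")) // 2


deseret = "\U00010400"     # one Deseret letter, outside the BMP
composed = "e\u0301"       # 'e' followed by COMBINING ACUTE ACCENT

print(len(deseret), utf16_units(deseret))     # 1 code point, 2 code units
print(len(composed), utf16_units(composed))   # 2 code points, 2 code units
```

So a surrogate pair and a composing character both make the stored length differ from the number of characters shown on screen, just for different reasons.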
I was under the mistaken impression that NSString always returned
length 1 for one character (not counting composing characters), which
is why I thought MacVim would work in all situations except when
composing characters were used. Again, this can be fixed by getting
rid of the assumption that each line in the text storage has the same
length (as returned by NSString), but this is a rather big code change.
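To make the idea concrete, here is a minimal Python sketch of looking up a position without assuming equal-length lines. It is not how MacVim is implemented: utf16_units and utf16_offset are hypothetical names, columns are counted in code points, and each line is assumed to end in a single newline unit.

```python
def utf16_units(text: str) -> int:
    """Count UTF-16 code units needed to store text."""
    return len(text.encode("utf-16-le")) // 2


def utf16_offset(lines: list[str], row: int, col: int) -> int:
    """Offset in UTF-16 units of character `col` on line `row`.

    Sums the real stored length of every preceding line (plus one unit
    for its newline) instead of multiplying by a fixed line width.
    """
    before = sum(utf16_units(line) + 1 for line in lines[:row])
    return before + utf16_units(lines[row][:col])


lines = ["ab", "\U00010400c", "d"]
print(utf16_offset(lines, 1, 1))   # 5: "ab\n" is 3 units, the Deseret letter 2
print(utf16_offset(lines, 2, 0))   # 7: 3 units for line 0, 4 for line 1
```

A real implementation would probably cache the per-line lengths rather than recompute them on every lookup.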
Thanks to Tony and Tim for educating me on the finer points of Unicode... :-)
You received this message from the "vim_multibyte" maillist.
For more information, visit http://www.vim.org/maillist.php