Re: Vim on OS X, (no)macatsui problem

On Oct 16, 2007, Kenneth Beesley wrote:
> Hi Björn,
> Many thanks for the message.
> Yeah, the term Character is a technical term in Unicode, and each
> Unicode character has a code point value that ranges from 0x0 to
> 0x10FFFF.
> In the original vision of Unicode, code point values ranged from 0x0
> to 0xFFFF, allowing just 64k distinct characters. This old, limited
> range is now known as the Basic Multilingual Plane (BMP). The current
> vision of Unicode, now 10 years old, allows about a million characters,
> and the characters with code point values beyond 0xFFFF are known
> as supplementary characters.
> Many software applications still haven't caught up with supplementary
> characters. They're still stuck in the BMP.
> In Java, there is a type called "char" that has 16 bits and so can
> represent any code point value in the BMP, 0x0 to 0xFFFF. It is
> important not to confuse "char" with the Unicode notion of Character.
> In Java, to store a supplementary Unicode character, two "chars" are
> used, in a coding system known as UTF-16. It sounds like MacVim has a
> similar storage system, and that the length-in-chars is being confused
> with the length-in-Unicode-characters.
> Best wishes,

Vim doesn't use UTF-16 internally, because the many intervening nulls would
wreak havoc with the C requirement of null-terminated strings. If you set
'encoding' to UCS-4, UTF-16 or UTF-32 (of any endianness), Vim will actually
use UTF-8 internally, because 0x00 in UTF-8 is the NULL character (codepoint
U+0000), nothing else, and Vim already knows how to handle that.
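The "intervening nulls" problem is easy to demonstrate by encoding the same text both ways. A minimal Java sketch (the class and helper names are mine, purely illustrative):

```java
import java.nio.charset.StandardCharsets;

public class NulBytes {
    static int countZeroBytes(byte[] b) {
        int n = 0;
        for (byte x : b) if (x == 0) n++;
        return n;
    }

    public static void main(String[] args) {
        String s = "abc";
        // UTF-8 never produces a 0x00 byte except for U+0000 itself,
        // so C-style null-terminated strings survive intact.
        System.out.println(countZeroBytes(s.getBytes(StandardCharsets.UTF_8)));    // 0
        // UTF-16 encodes each ASCII character as a 16-bit unit whose high
        // byte is zero -- every such byte would terminate a C string early.
        System.out.println(countZeroBytes(s.getBytes(StandardCharsets.UTF_16BE))); // 3
    }
}
```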
When you set 'fileencoding' to UTF-16, the internal UTF-8 representation of
the text will be converted to and from UTF-16 when writing or reading
(respectively), using surrogate pairs for any codepoint above U+FFFF, so that,
_on disk_, they take two UTF-16 words rather than one.
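That two-word encoding is the surrogate-pair mechanism, and it is the same one behind the Deseret length confusion discussed below: Java strings, like NSString, are sequences of UTF-16 code units. A small sketch (class name is mine):

```java
public class Surrogates {
    public static void main(String[] args) {
        // U+10400 (a Deseret capital letter) lies above U+FFFF, so UTF-16
        // represents it as a surrogate pair of two 16-bit code units.
        String deseret = new String(Character.toChars(0x10400));
        System.out.println(deseret.length());                            // 2 code units
        System.out.println(deseret.codePointCount(0, deseret.length())); // 1 character
        System.out.printf("%04X %04X%n",
                (int) deseret.charAt(0), (int) deseret.charAt(1));       // D801 DC00
    }
}
```

This is why an API that reports length in code units says "2" for one supplementary character, while a code-point-aware count says "1".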
I don't know what function you used to count characters, but the Vim
string-length function, strlen(), gives a string's length in _bytes_ in the
current internal representation: for Unicode, "a" (U+0061) is one, "é"
(e-acute, U+00E9) is two, "†" (dagger, U+2020) is three and any Deseret
character is four. (Under ":help strlen()" you can see how to count
"characters" in a string, as opposed to "bytes".)
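Those byte counts can be reproduced outside Vim by measuring the UTF-8 encoding of each character. A sketch in Java (class and helper names are mine):

```java
import java.nio.charset.StandardCharsets;

public class Utf8Lengths {
    // Length in bytes of the UTF-8 encoding, matching what Vim's
    // strlen() reports when 'encoding' is a Unicode variant.
    static int utf8len(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8len("a"));       // 1 byte  (U+0061)
        System.out.println(utf8len("\u00E9"));  // 2 bytes (U+00E9, e-acute)
        System.out.println(utf8len("\u2020"));  // 3 bytes (U+2020, dagger)
        System.out.println(utf8len(new String(Character.toChars(0x10400)))); // 4 bytes
    }
}
```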
> On 13 Oct 2007, at 12:45, björn wrote:
>>>> He also reports that mapping numbers `:map 3 ...` doesn't work. I
>>>> can't reproduce this.
>>> I got this one wrong. See the other thread for Kenneth's
>>> clarification. Sorry.
>> Hi Ken,
>> I have looked into why MacVim fails to render the deseret glyphs and I
>> now have an answer, but unfortunately no solution.
>> The problem is that one deseret character for some reason takes up
>> _two_ characters when put in the text storage (I guess this has
>> something to do with Unicode?). Specifically, calling "length" on an
>> NSString containing one deseret character returns 2 instead of 1, as I
>> would expect.
>> Now, I do know how to fix this problem, but since Jiang is working on
>> moving his drawing code to MacVim I don't really want to spend any
>> time doing this, since the problem will disappear as soon as he is
>> finished. I'm sorry about that.
You received this message from the "vim_multibyte" maillist.
For more information, visit http://www.vim.org/maillist.php