2396 Re: Vim on OS X, (no)macatsui problem

  • Tony Mechelynck
    Oct 16, 2007
      Kenneth Beesley wrote:
      > Hi Björn,
      > Many thanks for the message.
      > Yeah, the term Character is a technical term in Unicode, and each
      > Unicode character has a code point value that ranges from 0x0 to
      > 0x10FFFF.
      > In the original vision of Unicode, code point values ranged from 0x0
      > to 0xFFFF, allowing just 64k distinct characters. This old limited
      > range is now known as the Basic Multilingual Plane (BMP). The current
      > vision of Unicode, now 10 years old, allows about a million characters,
      > and the characters with code point values beyond 0xFFFF are known
      > as supplementary characters.
      > Many software applications still haven't caught up with supplementary
      > characters. They're still stuck in the BMP.
      > In Java, there is a type called "char" that has 16 bits and so can
      > represent any code point value in the BMP, 0x0 to 0xFFFF. It is
      > important not to confuse "char" with the Unicode notion of Character.
      > In Java, to store a supplementary Unicode character, two "chars" are
      > used, in a coding system known as UTF-16. It sounds like MacVim has a
      > similar storage system, and that the length-in-chars is being confused
      > with the length-in-Unicode-characters.
      > Best wishes,
      > Ken

      Vim doesn't use UTF-16 internally, because the many embedded null bytes
      would wreak havoc with C's null-terminated strings. If you set
      'encoding' to UCS-4, UTF-16 or UTF-32 (of any endianness), Vim will actually
      use UTF-8 internally, because in UTF-8 a 0x00 byte can only be the NUL
      character (codepoint U+0000), nothing else, and Vim already knows how to
      handle that.
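
To illustrate the point about null bytes, here is a minimal Python sketch (not Vim's actual code). The helper c_strlen() is a hypothetical stand-in for how a C routine would measure a null-terminated buffer; it shows that plain ASCII text encoded as UTF-16 is cut short at the first 0x00 byte, while the same text in UTF-8 is not.

```python
def c_strlen(buf: bytes) -> int:
    """Hypothetical stand-in for C's strlen(): count bytes up to the first 0x00."""
    nul_at = buf.find(b"\x00")
    return len(buf) if nul_at == -1 else nul_at


text = "Vim"

# UTF-16 (little-endian, no BOM): every ASCII character carries a 0x00 byte.
utf16 = text.encode("utf-16-le")
# UTF-8: ASCII characters are single non-zero bytes.
utf8 = text.encode("utf-8")

utf16_has_nul = b"\x00" in utf16
utf8_has_nul = b"\x00" in utf8

# A C-style scan stops after the first byte of the UTF-16 buffer,
# but sees the whole UTF-8 buffer.
utf16_visible = c_strlen(utf16)
utf8_visible = c_strlen(utf8)

# In UTF-8 the only source of a 0x00 byte is U+0000 itself.
nul_char_utf8 = "\u0000".encode("utf-8")

print(utf16, utf16_visible)
print(utf8, utf8_visible)
```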

      When you set 'fileencoding' to UTF-16, the internal UTF-8 representation of
      the text will be converted to and from UTF-16 when writing or reading
      (respectively), using surrogate pairs for any codepoint above U+FFFF, so that,
      _on disk_, they take two UTF-16 words rather than one.
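
The surrogate-pair arithmetic can be sketched in Python as follows. This is an illustration of the standard UTF-16 rule, not Vim's conversion code; the function name utf16_code_units() is made up for the example, and U+10400 (DESERET CAPITAL LETTER LONG I) is picked as a sample Deseret codepoint. It also shows why a length measured in 16-bit units, as with the NSString "length" call mentioned in the quoted message below, comes out as 2 for a single Deseret character.

```python
def utf16_code_units(codepoint: int) -> tuple:
    """Return the 16-bit UTF-16 code units for one Unicode codepoint."""
    if not 0 <= codepoint <= 0x10FFFF or 0xD800 <= codepoint <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value")
    if codepoint <= 0xFFFF:
        # Inside the BMP: one code unit, numerically equal to the codepoint.
        return (codepoint,)
    # Above the BMP: split the 20-bit offset into a high and a low surrogate.
    offset = codepoint - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return (high, low)


DESERET_LONG_I = 0x10400  # sample supplementary character

units = utf16_code_units(DESERET_LONG_I)

# Cross-check against Python's own encoder (big-endian, no BOM).
encoded = chr(DESERET_LONG_I).encode("utf-16-be")
unit_count = len(encoded) // 2

print([hex(u) for u in units], unit_count)
```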

      I don't know what function you used to count characters, but the Vim
      string-length function, strlen(), gives a string's length in _bytes_ in the
      current internal representation: with UTF-8, "a" (U+0061) is one byte, "é"
      (e-acute, U+00E9) is two, "†" (dagger, U+2020) is three and any Deseret
      character is four. (Under ":help strlen()" you can see how to count
      "characters" in a string, as opposed to "bytes".)
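
The byte counts above can be checked with a short Python sketch. It does not call Vim; it only assumes that the internal representation is UTF-8, as described earlier. The names byte_length() and char_length() are made up for the example, and U+10400 stands in for "any Deseret character".

```python
def byte_length(s: str) -> int:
    """Length in bytes of the UTF-8 form, comparable to what strlen() reports."""
    return len(s.encode("utf-8"))


def char_length(s: str) -> int:
    """Length in Unicode codepoints (Python 3 strings are codepoint sequences)."""
    return len(s)


samples = {
    "a": "\u0061",
    "e-acute": "\u00e9",
    "dagger": "\u2020",
    "Deseret long I": "\U00010400",
}

byte_lengths = {name: byte_length(ch) for name, ch in samples.items()}

# The four characters together: four codepoints, but more bytes.
word = "".join(samples.values())
word_bytes = byte_length(word)
word_chars = char_length(word)

for name, n in byte_lengths.items():
    print(name, n)
print(word_bytes, word_chars)
```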

      > On 13 Oct 2007, at 12:45, björn wrote:
      >>>> He also reports that mapping numbers `:map 3 ...` doesn't work. I
      >>>> can't reproduce this.
      >>> I got this one wrong. See the other thread for Kenneth's
      >>> clarification. Sorry.
      >> Hi Ken,
      >> I have looked into why MacVim fails to render the deseret glyphs and I
      >> now have an answer, but unfortunately no solution.
      >> The problem is that one deseret character for some reason takes up
      >> _two_ characters when put in the text storage (I guess this has
      >> something to do with Unicode?). Specifically, calling "length" on an
      >> NSString containing one deseret character returns 2 instead of 1, as I
      >> would expect.
      >> Now, I do know how to fix this problem, but since Jiang is working on
      >> moving his drawing code to MacVim I don't really want to spend any
      >> time doing this, since the problem will disappear as soon as he is
      >> finished. I'm sorry about that.
      >> /Björn

      Best regards,
      During a grouse hunt in North Carolina two intrepid sportsmen
      were blasting away at a clump of trees near a stone wall. Suddenly a
      red-faced country squire popped his head over the wall and shouted,
      "Hey, you almost hit my wife."
      "Did I?" cried the hunter, aghast. "Terribly sorry. Have a
      shot at mine, over there."

      You received this message from the "vim_multibyte" maillist.
      For more information, visit http://www.vim.org/maillist.php