
2383 Re: Vim on OS X, (no)macatsui problem

  • Tony Mechelynck
    Oct 13, 2007
      björn wrote:
      >>> He also reports that mapping numbers `:map 3 ...` doesn't work. I
      >>> can't reproduce this.
      >> I got this one wrong. See the other thread for Kenneth's
      >> clarification. Sorry.
      >
      > Hi Ken,
      >
      > I have looked into why MacVim fails to render the Deseret glyphs and I
      > now have an answer, but unfortunately no solution.
      >
      > The problem is that one Deseret character for some reason takes up
      > _two_ characters when put in the text storage (I guess this has
      > something to do with Unicode?). Specifically, calling "length" on an
      > NSString containing one Deseret character returns 2 instead of the 1
      > I would expect.
      >
      > Now, I do know how to fix this problem, but since Jiang is working on
      > moving his drawing code to MacVim, I don't really want to spend any
      > time on it: the problem will disappear as soon as he is finished.
      > I'm sorry about that.
      >
      >
      > /Björn

      UTF-8 uses:
      1 byte for each codepoint in the range U+0000 - U+007F
      2 bytes for each codepoint in the range U+0080 - U+07FF
      3 bytes for each codepoint in the range U+0800 - U+FFFF
      4 bytes for each codepoint in the range U+10000 - U+1FFFFF
      Actually, current standards mandate that no codepoints higher than U+10FFFD
      will "ever" be used. (Vim supports up to U+3FFFFFFF, with up to 6 bytes per
      codepoint, following an earlier draft of the standard.)
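      As an illustration only, the four rows of the table above can be written
      as a small Python function (the name `utf8_len` is hypothetical, not part
      of Vim or any library) and checked against Python's own UTF-8 encoder.
      It deliberately stops at the 4-byte form; the 5- and 6-byte forms of the
      earlier draft that Vim follows are not modelled.

```python
def utf8_len(cp: int) -> int:
    """Bytes needed to encode code point `cp` in UTF-8 (1- to 4-byte forms only)."""
    if cp < 0:
        raise ValueError("code points are non-negative")
    if cp <= 0x7F:       # U+0000 - U+007F
        return 1
    if cp <= 0x7FF:      # U+0080 - U+07FF
        return 2
    if cp <= 0xFFFF:     # U+0800 - U+FFFF
        return 3
    if cp <= 0x1FFFFF:   # U+10000 - U+1FFFFF (capacity of the 4-byte pattern)
        return 4
    raise ValueError("needs the 5/6-byte forms of the older draft; not modelled here")


if __name__ == "__main__":
    # Compare with Python's encoder, which (per current standards) rejects
    # surrogates U+D800 - U+DFFF and anything above U+10FFFF.
    for cp in (0x41, 0xE9, 0x20AC, 0x10400):
        print(f"U+{cp:04X}: {utf8_len(cp)} byte(s)",
              len(chr(cp).encode("utf-8")) == utf8_len(cp))
```

      U+10400 in the loop is a Deseret letter, so it lands in the 4-byte row.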

      Unicode also has the notion of "composing characters", which are
      "superimposed" on the preceding character, possibly changing its shape.
      These are usually diacritics: most accented Latin letters can be encoded
      either precomposed or as an unaccented spacing letter followed by a
      composing accent, but the optional vowel marks of Hebrew and Arabic exist
      only as composing characters.
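      A minimal Python sketch using the standard `unicodedata` module shows the
      two ways of encoding an accented Latin letter: NFD splits precomposed
      U+00E9 into "e" plus U+0301 COMBINING ACUTE ACCENT, and NFC recombines
      them. Both U+0301 and the Hebrew vowel point U+05B8 (HEBREW POINT QAMATS)
      report a non-zero combining class, which is how Unicode marks a composing
      character. This does not model how Vim itself stores composing characters.

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE (one code point)
decomposed = unicodedata.normalize("NFD", precomposed)  # "e" + U+0301

print(len(precomposed), len(decomposed))          # code point counts
print([f"U+{ord(c):04X}" for c in decomposed])
print(unicodedata.normalize("NFC", decomposed) == precomposed)

# A non-zero canonical combining class marks a composing character.
for ch in ("e", "\u0301", "\u05b8"):   # base letter, acute accent, Hebrew qamats
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
```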

      Since your Deseret characters are outside the BMP (the Basic Multilingual
      Plane, U+0000 - U+FFFF), each of them requires 4 bytes in UTF-8, two
      16-bit words in UTF-16, and one 32-bit doubleword in UTF-32. But maybe
      that's not what your measured "length" means? Does your NSString include
      a final null (as C strings do) or an initial byte count (as Pascal strings
      do)? Or do your Deseret characters include "composing" elements?


      Best regards,
      Tony.
      --
      hundred-and-one symptoms of being an internet addict:
      55. You ask your doctor to implant a gig in your brain.

      --~--~---------~--~----~------------~-------~--~----~
      You received this message from the "vim_multibyte" maillist.
      For more information, visit http://www.vim.org/maillist.php
      -~----------~----~----~----~------~----~------~--~---