2669 Re: Multibyte bugs

  • Tony Mechelynck
    Apr 10, 2010
      On 10/04/10 23:43, Bram Moolenaar wrote:
      > Tony Mechelynck wrote:
      >> 1. (Minor bug): On this system (gvim 7.2.411, Huge version with
      >> GTK2-GNOME GUI), typing Ctrl-K in Insert mode followed by two spaces
      >> doesn't give the expected result: instead of U+00A0 ("Alt-space", the
      >> non-breaking space) I get U+E000, a CJK character. Ctrl-K NS works
      >> correctly.
      > Why do you expect CTRL-K<space> <space> to produce 0xa0? According to
      > http://www.faqs.org/rfcs/rfc1345.html it's 0xe000.

      Because of the following paragraph at lines 99-100 of digraph.txt:

      > For CTRL-K, there is one general digraph: CTRL-K <Space> {char} will enter
      > {char} with the highest bit set. You can use this to enter meta-characters.

      When {char} is 0x20 i.e. <Space>, the above tells me that CTRL-K <Space>
      <Space> gives 0xA0 i.e. the non-breaking space, which is useful to enter
      the "meta-character" Meta-Space if I don't remember the NS digraph. If
      U+E000 is a "private use" character, I don't see why it needs a digraph
      of its own anyway.
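
The arithmetic behind that reading of digraph.txt can be sketched in a few lines of Python. This is only an illustration of "{char} with the highest bit set" for a 7-bit character; the helper name `set_high_bit` is made up for this example and is not anything Vim provides.

```python
def set_high_bit(ch):
    """Return the code of a single 7-bit character with bit 7 (0x80) set."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("expected a 7-bit (ASCII) character")
    return code | 0x80


# <Space> is 0x20, so setting the high bit gives 0xA0 (NO-BREAK SPACE).
print(hex(set_high_bit(" ")))  # 0xa0
```

Under that rule the result for <Space> is 0xA0 and cannot be U+E000, which is why the behaviour looks inconsistent with the help text.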

      On reading that RFC, which states in its beginning paragraph that it has
      no normative value whatsoever, I see (at the very end of section 3)
      quite a number of digraphs and trigraphs assigned to U+E000 to U+E028,
      in what Unicode calls a "private use area": see for instance the very
      start of http://www.unicode.org/charts/pdf/UE000.pdf:

      Private Use Area
      Range: E000–F8FF
      The Private Use Area does not contain any character assignments,
      consequently no character code charts or namelists are provided for this

      At least some of the characters listed there in the RFC have a different
      Unicode codepoint assigned to them, but maybe Unicode assigned them
      after the RFC (dated June 1992) was published. Personally I have strong
      doubts as to the usefulness of any Vim digraph for a "private use"
      character. U+E000 is listed as "indicates unfinished (Mnemonic)". I'm
      not sure what that means, unless maybe that a blank space in a charset
      chart (further down in the same RFC) indicates that the chart is unfinished?
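
As a rough cross-check of the "private use" status, Python's standard `unicodedata` module reports the general category of a codepoint, and "Co" is the category for private-use characters. The helper names below are hypothetical; the range constants are the ones quoted from the Unicode chart above.

```python
import unicodedata

BMP_PUA_FIRST = 0xE000  # start of the range quoted from the Unicode chart
BMP_PUA_LAST = 0xF8FF   # end of that range


def in_bmp_private_use_area(cp):
    """True if cp lies in the Private Use Area of the Basic Multilingual Plane."""
    return BMP_PUA_FIRST <= cp <= BMP_PUA_LAST


def is_private_use(cp):
    """True if the Unicode general category of cp is 'Co' (private use)."""
    return unicodedata.category(chr(cp)) == "Co"


# The RFC 1345 assignments discussed above, U+E000 to U+E028, all fall in the range.
print(all(in_bmp_private_use_area(cp) for cp in range(0xE000, 0xE029)))
```

Note that "Co" also covers the private-use planes beyond the BMP, so the category test is slightly broader than the range test.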

      >> 2. U+E000 is displayed in gvim as CJK halfwidth. Shouldn't it be fullwidth?
      > Why would it be a double-width character? In
      > http://unicode.org/Public/UNIDATA/EastAsianWidth.txt it's marked as
      > "private use".

      Ah, I see. FWIW my usual 'guifont' has a glyph for it, which AFAICT is a
      fullwidth CJK glyph. OTOH the Unihan database does not mention it.
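
For anyone who wants to look up the width property without reading EastAsianWidth.txt by hand, the same data is exposed by Python's standard `unicodedata.east_asian_width()`. This is only a lookup sketch: the value reported depends on the Unicode data bundled with the Python build, and it says nothing about what glyph a particular font supplies.

```python
import unicodedata


def width_class(cp):
    """Return the East Asian Width property of a codepoint as a short code
    (one of 'A', 'F', 'H', 'N', 'Na', 'W')."""
    return unicodedata.east_asian_width(chr(cp))


# Compare the private-use codepoint with an ordinary CJK ideograph.
for cp in (0xE000, 0x4E00):
    print("U+%04X" % cp, width_class(cp))
```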

      >> 3. "\<Char-nnnn>" gives wrong results for some Unicode codepoints.
      > The form "\<xxx>" is for special keys, not characters. For the character
      > itself use \x or \u or \U. See ":help expr-string".
      > The special keys are escaped for use in a mapping.

      The example given at |expr-string| is "\<C-W>" which is the "<control>"
      character defined by ASCII as 0x17 ("\x17") and by Unicode as U+0017
      ("\u0017"), not a "special" non-ASCII key like <F8>, <Home> or
      <PageDown>. I had always thought that _every_ <> name could be used in a
      double-quoted string with a backslash prefix, and indeed I have verified
      that it works for all the <Char-nnnn> or <Char-0xnnnn> that I tested
      _except_ those whose UTF-8 expansion includes either or both of the
      bytes 0x80 and 0x9B, in which case two spurious bytes are inserted
      immediately after every occurrence of a 0x80 or 0x9B byte.
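
To find test cases for this, one can list the codepoints whose UTF-8 encoding contains either of those two byte values. The sketch below is Python rather than Vim script and only inspects the encoding; it does not reproduce Vim's behaviour. The names `SUSPECT_BYTES` and `suspect_offsets` are invented for this example.

```python
SUSPECT_BYTES = {0x80, 0x9B}  # the two byte values named above


def suspect_offsets(cp):
    """Return the offsets, within the UTF-8 encoding of codepoint cp,
    of every byte equal to 0x80 or 0x9B."""
    encoded = chr(cp).encode("utf-8")
    return [i for i, b in enumerate(encoded) if b in SUSPECT_BYTES]


# U+009B encodes as C2 9B, U+2000 as E2 80 80, U+20AC as E2 82 AC.
for cp in (0x009B, 0x2000, 0x20AC):
    print("U+%04X" % cp, suspect_offsets(cp))
```

Codepoints with a non-empty result (U+009B and U+2000 here) are the ones expected to trigger the problem; U+20AC should not.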

      If this bug is WONTFIX, I suggest mentioning explicitly at the bottom of
      the list under |expr-quote| that the \<xxx> form does not apply if xxx
      is Char-nnnn or Char-0xnnnn.

      Best regards,
      "To whoever finds this note -
      I have been imprisoned by my father who wishes me to marry
      against my will. Please please please please come and rescue me.
      I am in the tall tower of Swamp Castle."
      SIR LAUNCELOT's eyes light up with holy inspiration.
      "Monty Python and the Holy Grail" PYTHON (MONTY)

      You received this message from the "vim_multibyte" maillist.
      For more information, visit http://www.vim.org/maillist.php