
2669 Re: Multibyte bugs

  • Tony Mechelynck
    Apr 10, 2010
      On 10/04/10 23:43, Bram Moolenaar wrote:
      >
      > Tony Mechelynck wrote:
      >
      >> 1. (Minor bug): On this system (gvim 7.2.411, Huge version with
      >> GTK2-GNOME GUI), typing Ctrl-K in Insert mode followed by two spaces
      >> doesn't give the expected result: instead of U+00A0 ("Alt-space", the
      >> non-breaking space) I get U+E000, a CJK character. Ctrl-K NS works
      >> correctly.
      >
      > Why do you expect CTRL-K<space> <space> to produce 0xa0? According to
      > http://www.faqs.org/rfcs/rfc1345.html it's 0xe000.

      Because of the following paragraph at lines 99-100 of digraph.txt:

      ----8<----
      > For CTRL-K, there is one general digraph: CTRL-K <Space> {char} will enter
      > {char} with the highest bit set. You can use this to enter meta-characters.
      ---->8----

      When {char} is 0x20, i.e. <Space>, the above tells me that CTRL-K <Space>
      <Space> gives 0xA0, i.e. the non-breaking space, which is useful for
      entering the "meta-character" Meta-Space if I don't remember the NS
      digraph. If U+E000 is a "private use" character, I don't see why it
      needs a digraph of its own anyway.
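      The rule quoted from digraph.txt boils down to OR-ing the character
      code with 0x80, so 0x20 | 0x80 = 0xA0. Below is a minimal Python sketch
      of that arithmetic only. The helper name meta_char is hypothetical, and
      it models the documented rule, not Vim's actual digraph lookup (which,
      per Bram's reply, yields U+E000 for <Space> <Space>):

```python
def meta_char(ch: str) -> str:
    """Return ch with its highest bit (0x80) set, per the CTRL-K <Space> {char} rule.

    The rule only makes sense for 7-bit characters (code points below 0x80).
    """
    code = ord(ch)
    if code >= 0x80:
        raise ValueError("rule only applies to 7-bit characters")
    return chr(code | 0x80)


# <Space> is 0x20; with the high bit set it becomes 0xA0, the no-break space.
print(hex(ord(meta_char(" "))))    # 0xa0
print(meta_char(" ") == "\u00a0")  # True
```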

      On reading that RFC, which states in its beginning paragraph that it has
      no normative value whatsoever, I see (at the very end of section 3)
      quite a number of digraphs and trigraphs assigned to U+E000 to U+E028,
      in what Unicode calls a "private use area": see for instance the very
      start of http://www.unicode.org/charts/pdf/UE000.pdf:

      ----8<----
      Private Use Area
      Range: E000–F8FF
      The Private Use Area does not contain any character assignments,
      consequently no character code charts or namelists are provided for this
      area.
      ---->8----

      At least some of the characters listed there in the RFC have a different
      Unicode codepoint assigned to them, but maybe Unicode assigned them
      after the RFC (dated June 1992) was published. Personally I have strong
      doubts as to the usefulness of any Vim digraph for a "private use"
      character. U+E000 is listed as "indicates unfinished (Mnemonic)". I'm
      not sure what that means, unless it means that a blank space in a
      charset chart (further down in the same RFC) indicates that the chart
      is unfinished?
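      The "no character assignments" point can be checked from Python's
      unicodedata module: a Private Use Area code point has general category
      "Co" and no character name, unlike U+00A0, which is a named space
      character. A small sketch; is_bmp_private_use is a hypothetical helper
      that just encodes the E000–F8FF range quoted from the chart:

```python
import unicodedata

# BMP Private Use Area, as given at the top of the Unicode UE000 chart.
PUA_START, PUA_END = 0xE000, 0xF8FF


def is_bmp_private_use(cp: int) -> bool:
    """True if cp lies in the BMP Private Use Area (E000..F8FF)."""
    return PUA_START <= cp <= PUA_END


for ch in ("\ue000", "\u00a0"):
    print(
        f"U+{ord(ch):04X}",
        is_bmp_private_use(ord(ch)),
        unicodedata.category(ch),           # "Co" = Other, private use
        unicodedata.name(ch, "<no name>"),  # private use characters have no name
    )
```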

      >
      >> 2. U+E000 is displayed in gvim as CJK halfwidth. Shouldn't it be fullwidth?
      >
      > Why would it be a double-width character? In
      > http://unicode.org/Public/UNIDATA/EastAsianWidth.txt it's marked as
      > "private use".

      Ah, I see. FWIW my usual 'guifont' has a glyph for it, which AFAICT is a
      fullwidth CJK glyph. OTOH the Unihan database does not mention it.
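      For what it's worth, the width class EastAsianWidth.txt gives to the
      private-use range is "A" (Ambiguous), not "W" (Wide), and Python's
      unicodedata reports the same value. In Vim, the 'ambiwidth' option
      decides how class-A characters are displayed, so, assuming Vim follows
      the table for this range, ":set ambiwidth=double" would be the setting
      that makes U+E000 double-width. A short sketch comparing a few
      characters:

```python
import unicodedata

# Sample characters: private use, a CJK ideograph, ASCII 'a', fullwidth 'A'.
SAMPLES = (0xE000, 0x4E00, 0x0061, 0xFF21)

# Map each code point to its East_Asian_Width property value
# ("A" ambiguous, "W" wide, "Na" narrow, "F" fullwidth, ...).
widths = {cp: unicodedata.east_asian_width(chr(cp)) for cp in SAMPLES}

for cp, width in widths.items():
    print(f"U+{cp:04X}", width)
```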

      >
      >> 3. "\<Char-nnnn>" gives wrong results for some Unicode codepoints.
      [...]
      >
      > The form "\<xxx>" is for special keys, not characters. For the character
      > itself use \x or \u or \U. See ":help expr-string".
      > The special keys are escaped for use in a mapping.

      The example given at |expr-string| is "\<C-W>", which is the "<control>"
      character defined by ASCII as 0x17 ("\x17") and by Unicode as U+0017
      ("\u0017"), not a "special" non-ASCII key like <F8>, <Home> or
      <PageDown>. I had always thought that _every_ <> name could be used in a
      double-quoted string with a backslash prefix, and indeed I have verified
      that it works for all the <Char-nnnn> or <Char-0xnnnn> that I tested,
      _except_ those whose UTF-8 expansion includes either or both of the
      bytes 0x80 and 0x9B. For those, two spurious bytes are inserted
      immediately after every occurrence of a 0x80 or 0x9B byte.
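      To tell in advance which \<Char-nnnn> values fall into the failing
      case described above, it is enough to check whether the character's
      UTF-8 encoding contains a 0x80 or 0x9B byte. A minimal Python sketch;
      the helper name is hypothetical, and it does not reproduce Vim's key
      escaping, it only flags the affected code points (for which Bram's
      suggested \u or \U forms avoid the problem):

```python
def utf8_has_escaped_bytes(cp: int) -> bool:
    """True if the UTF-8 encoding of code point cp contains 0x80 or 0x9B.

    cp must be a valid Unicode scalar value (surrogates cannot be encoded).
    """
    encoded = chr(cp).encode("utf-8")
    return 0x80 in encoded or 0x9B in encoded


for cp in (0x00C0, 0x00E9, 0x201B, 0x20AC):
    raw = chr(cp).encode("utf-8")
    # e.g. U+00C0 encodes as c3 80, so it is affected; U+00E9 (c3 a9) is not.
    print(f"U+{cp:04X}", raw.hex(" "), utf8_has_escaped_bytes(cp))
```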

      If this bug is WONTFIX, I suggest explicitly mentioning, at the bottom
      of the list under |expr-quote|, that the \<xxx> form does not apply
      when xxx is Char-nnnn or Char-0xnnnn.


      Best regards,
      Tony.
      --
      "To whoever finds this note -
      I have been imprisoned by my father who wishes me to marry
      against my will. Please please please please come and rescue me.
      I am in the tall tower of Swamp Castle."
      SIR LAUNCELOT's eyes light up with holy inspiration.
      "Monty Python and the Holy Grail" PYTHON (MONTY)
      PICTURES LTD

      --
      You received this message from the "vim_multibyte" maillist.
      For more information, visit http://www.vim.org/maillist.php