
2601 Re: Suggestion: Redefine \Uxxxxx in double-quoted strings

  • Kenneth Reid Beesley
    Apr 6, 2009
      On 6 Apr 2009, at 12:22, Tony Mechelynck wrote:

      > Vim is now capable of displaying any Unicode codepoint for which the
      > installed 'guifont' has a glyph, even outside the BMP (i.e., even
      > above U+FFFF),


      Good news.

      Many may not know that MacVim has been doing this rather well for
      quite a while. I routinely edit texts in the Deseret Alphabet and the
      Shaw (Shavian) Alphabet, which lie in the supplementary area (the
      Supplementary Multilingual Plane, above U+FFFF).

      > but there's no easy way to represent those "high" codepoints by
      > Unicode value in strings: I mean, "\uxxxx" and "\Uxxxx" still accept
      > no more than four hex digits.
      > I propose to keep "\uxxxx" at its present meaning, but extend
      > "\Uxxxxxxxx" to allow additional hex digits (either up to a total of 8
      > hex digits, in line with ^VUxxxxxxxx as opposed to ^Vuxxxx in Insert
      > mode, or at least up to the value \U10FFFF,

      Sounds good.
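      A minimal sketch of the proposed rule, written in Python purely for
      illustration (expand_unicode_escapes and its u_max/U_max parameters
      are hypothetical names, not Vim code): \u consumes at most four hex
      digits, \U at most eight, and values above U+10FFFF are rejected. It
      also shows where the incompatibility lies: with the old four-digit
      limit, the text \U20000 means U+2000 followed by a literal "0".

```python
import string

HEX_DIGITS = set(string.hexdigits)


def expand_unicode_escapes(s, u_max=4, U_max=8):
    # Hypothetical sketch: expand backslash-u / backslash-U escapes in s.
    # Backslash-u takes up to u_max hex digits, backslash-U up to U_max.
    # Any other character (including other backslashes) is copied unchanged.
    out = []
    i = 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s) and s[i + 1] in "uU":
            limit = u_max if s[i + 1] == "u" else U_max
            start = j = i + 2
            # Consume hex digits greedily, but never more than the limit.
            while j < len(s) and j - start < limit and s[j] in HEX_DIGITS:
                j += 1
            if j == start:
                raise ValueError("no hex digits after \\" + s[i + 1])
            code = int(s[start:j], 16)
            if code > 0x10FFFF:
                raise ValueError("U+%X is above U+10FFFF" % code)
            out.append(chr(code))
            i = j
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


# Raw strings keep the backslashes, so the text looks like a Vim string body.
print(expand_unicode_escapes(r"\U20000", U_max=4) == "\u2000" + "0")  # True (old limit)
print(expand_unicode_escapes(r"\U20000") == "\U00020000")             # True (proposed)
```

      With the eight-digit limit the same text yields the single character
      U+20000, which is exactly the case that breaks under the old reading.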

      \Uxxxxxxxx is also the Python convention for representing
      supplementary characters in strings. I think it requires exactly 8
      hex digits, just as \uxxxx requires exactly four, but I'm willing to
      be corrected.
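      Python's own rule is easy to check. The sketch below (python_accepts is
      just an illustrative helper name) compiles a few string literals: \U
      with exactly eight hex digits is accepted, \U with fewer digits is a
      syntax error rather than a shorter escape, and \u stops after exactly
      four digits.

```python
def python_accepts(literal_source):
    # Return True if the given Python source for a string literal compiles.
    try:
        compile(literal_source, "<literal>", "eval")
    except SyntaxError:
        return False
    return True


# The literal sources are raw strings so the escapes reach compile() untouched.
print(python_accepts(r'"\U00010428"'))  # True: exactly 8 hex digits
print(python_accepts(r'"\U10428"'))     # False: too few digits after \U
print(python_accepts(r'"\u1042"'))      # True: exactly 4 hex digits
print(python_accepts(r'"\u104"'))       # False: too few digits after \u

# \u reads exactly four digits; a fifth digit is just an ordinary character.
print("\u20000" == "\u2000" + "0")      # True
```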

      The other reasonable convention is the Perl-like \x{x...} (the prefix
      \x is literally backslash, small x), which, being delimited with curly
      braces, can contain any number of hex digits without confusing the
      tokenization. But your proposal is more in line with what Vim has.
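      For comparison, the brace-delimited form is simple to tokenize because
      the closing brace, not a digit count, ends the escape. A minimal Python
      sketch (expand_braced_hex is a hypothetical helper; it handles only the
      braced form, not Perl's unbraced two-digit \xHH):

```python
import re

# A backslash, a small x, then one or more hex digits inside braces.
BRACED_HEX = re.compile(r"\\x\{([0-9A-Fa-f]+)\}")


def expand_braced_hex(s):
    # Replace every \x{...} in s with the character it names.
    def to_char(match):
        code = int(match.group(1), 16)
        if code > 0x10FFFF:
            raise ValueError("U+%X is above U+10FFFF" % code)
        return chr(code)

    return BRACED_HEX.sub(to_char, s)


# Any number of digits works, and a following hex digit is never swallowed.
print(expand_braced_hex(r"\x{10428}0") == "\U00010428" + "0")  # True
print(expand_braced_hex(r"\x{0000041}") == "A")                # True
```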

      > I'm aware that this is an "incompatible" change, but I believe the
      > risk is low compared with the advantages

      For what it's worth, I agree.

      > The notation "\<Char-0x20000>" or "\<Char-131072>" doesn't work: here
      > (in my GTK2/Gnome2 gvim with 'encoding' set to UTF-8), ":echo"ing such
      > a string displays <f0><a0><80><fe>X<80><fe>X instead of just the one
      > CJK character 𠀀 (and, yes, I've set my mailer to send this post as
      > UTF-8 so if yours is "well-behaved" it should display that character
      > properly).
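      The garbled output is consistent with the UTF-8 bytes of U+20000
      (F0 A0 80 80) where each 0x80 byte was left in an escaped form. The
      Python sketch below reproduces the displayed string under a stated
      assumption: that Vim stores a literal 0x80 byte (its K_SPECIAL marker)
      in key strings as the three bytes 0x80 0xFE 'X'. The helper names are
      hypothetical.

```python
K_SPECIAL = 0x80  # assumed value of Vim's special-key marker byte


def escape_k_special(data):
    # Assumption: each literal 0x80 byte is stored as 0x80 0xFE 'X'.
    out = bytearray()
    for b in data:
        if b == K_SPECIAL:
            out += bytes([K_SPECIAL, 0xFE, ord("X")])
        else:
            out.append(b)
    return bytes(out)


def show_bytes(data):
    # Render bytes >= 0x80 as <xx> and ASCII bytes as themselves.
    return "".join(chr(b) if b < 0x80 else "<%02x>" % b for b in data)


utf8 = chr(0x20000).encode("utf-8")
print(utf8.hex())                          # f0a08080
print(show_bytes(escape_k_special(utf8)))  # <f0><a0><80><fe>X<80><fe>X
```

      Both continuation bytes of the character happen to be 0x80, so both
      get expanded, and what is left is no longer valid UTF-8.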

      In MacVim, at least, supplementary code point values can usefully
      appear in <Char-...> notation in keymap files. Entries like the
      following appear in my deseret-sampa_utf-8.vim keymap file, and it all
      works great.

      "in out comment
      i <Char-0x10428> DESERET SMALL LETTER LONG I (e.g. i in
      e <Char-0x10429> DESERET SMALL LETTER LONG E (e.g. a in make)
      A <Char-0x1042A> DESERET SMALL LETTER LONG A (e.g. a in father)
      O <Char-0x1042B> DESERET SMALL LETTER LONG AH (e.g. a in call, au in caught, British/USEastCoastCity pronunciation)
      o <Char-0x1042C> DESERET SMALL LETTER LONG O (e.g. oa in boat)
      u <Char-0x1042D> DESERET SMALL LETTER LONG OO (e.g. oo in boot)
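      The <Char-...> notation accepts a decimal, 0x-prefixed hex, or
      0-prefixed octal number (see :help <Char->), so "\<Char-0x20000>" and
      "\<Char-131072>" above name the same code point. The Python sketch
      below (all names hypothetical; it is not how Vim itself loads keymaps)
      resolves entries like these and checks that each comment begins with
      the official Unicode character name.

```python
import re
import unicodedata

# <Char-N>: 0x-prefixed hex, 0-prefixed octal, or plain decimal.
CHAR_NOTATION = re.compile(r"<Char-(0[xX][0-9A-Fa-f]+|0[0-7]*|[1-9][0-9]*)>")


def char_notation_value(token):
    # Return the code point named by a <Char-N> token.
    m = CHAR_NOTATION.fullmatch(token)
    if m is None:
        raise ValueError("not a <Char-...> token: " + token)
    digits = m.group(1)
    if digits[:2] in ("0x", "0X"):
        return int(digits, 16)
    if digits.startswith("0"):
        return int(digits, 8)
    return int(digits)


def parse_keymap_entries(text):
    # Yield (lhs, character, comment) for each non-comment keymap line.
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('"'):
            continue
        parts = line.split(None, 2)
        comment = parts[2] if len(parts) > 2 else ""
        yield parts[0], chr(char_notation_value(parts[1])), comment


SAMPLE = '''
"in out comment
i <Char-0x10428> DESERET SMALL LETTER LONG I
e <Char-0x10429> DESERET SMALL LETTER LONG E (e.g. a in make)
A <Char-0x1042A> DESERET SMALL LETTER LONG A (e.g. a in father)
o <Char-0x1042C> DESERET SMALL LETTER LONG O (e.g. oa in boat)
'''

for lhs, ch, comment in parse_keymap_entries(SAMPLE):
    # Each comment should start with the character's official Unicode name.
    assert comment.startswith(unicodedata.name(ch)), (lhs, comment)
    print(lhs, "->", "U+%04X" % ord(ch), unicodedata.name(ch))
```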

      Thanks to all those developers who have toiled to handle Unicode in Vim.


      Kenneth R. Beesley, D.Phil.
      P.O. Box 540475
      North Salt Lake, UT
      84054 USA

      You received this message from the "vim_multibyte" maillist.
      For more information, visit http://www.vim.org/maillist.php