Re: Suggestion: Redefine \Uxxxxx in double-quoted strings

  • Kenneth Reid Beesley
    Message 1 of 6 , Apr 6, 2009
      On 6 Apr 2009, at 12:22, Tony Mechelynck wrote:

      >
      > Vim is now capable of displaying any Unicode codepoint for which the
      > installed 'guifont' has a glyph, even outside the BMP (i.e., even above
      > U+FFFF),

      Tony,

      Good news.

      Many may not know that MacVim has been doing this rather well for
      quite a while.
      I routinely edit texts in Deseret Alphabet and Shaw (Shavian)
      Alphabet, which lie in the
      supplementary area.


      > but there's no easy way to represent those "high" codepoints by
      > Unicode value in strings: I mean, "\uxxxx" and "\Uxxxx" still accept no
      > more than four hex digits.
      >
      > I propose to keep "\uxxxx" at its present meaning, but extend
      > "\Uxxxxxxxx" to allow additional hex digits (either up to a total of 8
      > hex digits, in line with ^VUxxxxxxxx as opposed to ^Vuxxxx in Insert
      > mode, or at least up to the value \U10FFFF,

      Sounds good.

      \Uxxxxxxxx is also the Python convention for representing
      supplementary characters in strings. I think it requires exactly
      8 hex digits, just as \uxxxx requires exactly four, but I'm willing
      to be corrected.
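
As a quick check of the Python convention mentioned above, here is a minimal sketch, assuming Python 3. The helper name `is_valid_literal` is made up for this illustration. It suggests that \U does take exactly 8 hex digits and \u exactly 4, and that a shorter \U escape is rejected when the literal is compiled.

```python
def is_valid_literal(source: str) -> bool:
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<escape-demo>", "eval")
        return True
    except SyntaxError:
        return False


# Eight hex digits after \U: one supplementary character (a Deseret letter).
deseret_long_i = "\U00010428"
assert len(deseret_long_i) == 1
assert ord(deseret_long_i) == 0x10428

# Four hex digits after \u: a BMP character.
latin_a = "\u0041"

# The literals are passed as raw source text so that compile() sees the escape.
short_escape_ok = is_valid_literal(r'"\U10428"')    # only 5 hex digits
full_escape_ok = is_valid_literal(r'"\U00010428"')  # all 8 hex digits
```

The fixed width is what lets the tokenizer know where the escape ends without a delimiter.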

      The other reasonable convention is the Perl-like \x{x...} (the prefix
      \x is literally backslash, small x), which, being delimited with curly
      braces, can contain any number of hex digits without confusing the
      tokenization. But your proposal is more in line with what Vim has
      already.
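
To illustrate why the brace-delimited form tokenizes cleanly, here is a rough Python sketch of an expander for \x{...} escapes. The function name `expand_braced_escapes` and the regular expression are assumptions made for this example; this is not how Perl or Vim implement it.

```python
import re

# One or more hex digits between "\x{" and "}". The closing brace ends the
# escape, so a hex digit that follows it is never swallowed by mistake.
_BRACED_HEX = re.compile(r"\\x\{([0-9A-Fa-f]+)\}")


def expand_braced_escapes(text: str) -> str:
    """Replace each \\x{HEX} in `text` with the character it names."""
    def replace(match: re.Match) -> str:
        # chr() raises ValueError for values above 0x10FFFF.
        return chr(int(match.group(1), 16))

    return _BRACED_HEX.sub(replace, text)


# Two escapes of different lengths, followed by a literal digit 9.
sample = expand_braced_escapes(r"\x{41}\x{10428}9")
```

With a fixed-width or greedy undelimited escape, the trailing 9 in the sample would be ambiguous; with braces it is clearly a separate character.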

      >
      >
      > I'm aware that this is an "incompatible" change, but I believe the
      > risk is low compared with the advantages

      For what it's worth, I agree.

      > The notation "\<Char-0x20000>" or "\<Char-131072>" doesn't work: here
      > (in my GTK2/Gnome2 gvim with 'encoding' set to UTF-8), ":echo"ing such
      > a string displays <f0><a0><80><fe>X<80><fe>X instead of just the one
      > CJK character 𠀀 (and, yes, I've set my mailer to send this post as
      > UTF-8, so if yours is "well-behaved" it should display that character
      > properly).
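
For comparison with the byte dump quoted above, the following Python sketch prints the plain UTF-8 encoding of U+20000 in the same <xx> style. It only shows what the correct four bytes would be; it does not attempt to explain where the extra <fe>X after each <80> comes from, since the quoted message does not say.

```python
# 0x20000 and 131072 are the same code point written in hex and in decimal.
code_point = 0x20000
cjk = chr(code_point)

# Plain UTF-8 encoding of that one character.
utf8_bytes = cjk.encode("utf-8")

# Render each byte the way the quoted ":echo" output does.
shown = "".join(f"<{byte:02x}>" for byte in utf8_bytes)
print(shown)
```

The first two bytes match the start of the quoted output, so the difference lies only in what follows each 0x80 byte.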

      In MacVim, at least, supplementary code point values can appear
      usefully in <Char- > in keymap files.
      Entries like the following appear in my deseret-sampa_utf-8.vim keymap
      file. It all works great.

      "in  out              comment
      i    <Char-0x10428>   DESERET SMALL LETTER LONG I (e.g. i in machine)
      e    <Char-0x10429>   DESERET SMALL LETTER LONG E (e.g. a in make)
      A    <Char-0x1042A>   DESERET SMALL LETTER LONG A (e.g. a in father)
      O    <Char-0x1042B>   DESERET SMALL LETTER LONG AH (e.g. a in call, au in caught, British/USEastCoastCity pronunciation)
      o    <Char-0x1042C>   DESERET SMALL LETTER LONG O (e.g. oa in boat)
      u    <Char-0x1042D>   DESERET SMALL LETTER LONG OO (e.g. oo in boot)

      Thanks to all those developers who have toiled to handle Unicode in Vim.

      Ken

      ******************************
      Kenneth R. Beesley, D.Phil.
      P.O. Box 540475
      North Salt Lake, UT
      84054 USA






      --~--~---------~--~----~------------~-------~--~----~
      You received this message from the "vim_multibyte" maillist.
      For more information, visit http://www.vim.org/maillist.php
      -~----------~----~----~----~------~----~------~--~---
    • Tony Mechelynck
      Message 2 of 6 , Apr 6, 2009
        On 06/04/09 22:18, Kenneth Reid Beesley wrote:
        >
        >
        > On 6 Apr 2009, at 12:22, Tony Mechelynck wrote:
        >
        >>
        >> Vim is now capable of displaying any Unicode codepoint for which the
        >> installed 'guifont' has a glyph, even outside the BMP (i.e., even above
        >> U+FFFF),
        >
        > Tony,
        >
        > Good news.
        >
        > Many may not know that MacVim has been doing this rather well for
        > quite a while.
        > I routinely edit texts in Deseret Alphabet and Shaw (Shavian)
        > Alphabet, which lie in the
        > supplementary area.
        [...]

        It's actually patch 7.1.116 (30-Nov-2007). So no news-breaking scoop
        anymore, but as long as Vim's support of Unicode outside the BMP was
        less than optimal, the problem I'm raising in this thread might have
        made itself felt less acutely.


        Best regards,
        Tony.
        --
        Joe's sister puts spaghetti in her shoes!

      • Tony Mechelynck
        Message 3 of 6 , Apr 6, 2009
          On 06/04/09 22:18, Kenneth Reid Beesley wrote:
          [...]
          > In MacVim, at least, supplementary code point values can appear
          > usefully in <Char- > in keymap files.
          > Entries like the following appear in my deseret-sampa_utf-8.vim keymap
          > file. It all works great.
          [...]

          In keymap files, it seems to work on Linux too (I use it in my owncoded
          "phonetic" keymaps for Arabic and Russian); but I was talking of
          double-quoted strings.

          These Arabic and Russian keymaps aren't above U+FFFF but anywhere above
          0x7F the <Char- > notation gives me problems inside double-quoted
          strings. I believe this is related to the documented fact that "\xnn"
          doesn't give valid UTF-8 values above 0x7F -- use "\u00nn" instead.
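
The distinction between a raw byte and a code point can be sketched in Python. This is only an analogy, not Vim's behaviour: in Python a bytes literal such as b"\xe9" plays the role of the raw byte, while "\u00e9" names the code point, which UTF-8 encodes as two bytes. The helper `is_valid_utf8` is a name invented for this example.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A single raw byte above 0x7F is not a complete UTF-8 sequence.
raw_byte = b"\xe9"

# The code point U+00E9, by contrast, encodes to a valid two-byte sequence.
code_point_bytes = "\u00e9".encode("utf-8")

print(is_valid_utf8(raw_byte), is_valid_utf8(code_point_bytes))
```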


          Best regards,
          Tony.
          --
          If God is perfect, why did He create discontinuous functions?

        • John (Eljay) Love-Jensen
          Message 4 of 6 , Apr 7, 2009
            Hi Tony,

            > I don't see a convenient alternative though.  Anyone?

            \Uxxxx
            \uxxxx
            \U{x}
            \U{xx}
            \U{xxx}
            \U{xxxx}
            \U{xxxxx}
            \U{xxxxxx}
            \U{xxxxxxx}
            \U{xxxxxxxx}

            --Eljay

