Suggestion: Redefine \Uxxxxx in double-quoted strings

  • Tony Mechelynck
    Message 1 of 6 , Apr 6 11:22 AM
      Vim is now capable of displaying any Unicode codepoint for which the
      installed 'guifont' has a glyph, even outside the BMP (i.e., even above
      U+FFFF), but there's no easy way to represent those "high" codepoints by
      Unicode value in strings: I mean, "\uxxxx" and "\Uxxxx" still accept no
      more than four hex digits.

      I propose to keep "\uxxxx" at its present meaning, but extend
      "\Uxxxxxxxx" to allow additional hex digits (either up to a total of 8
      hex digits, in line with ^VUxxxxxxxx as opposed to ^Vuxxxx in Insert
      mode, or at least up to the value \U10FFFF, above which the Unicode
      Consortium has decided that "there never shall be a valid Unicode
      codepoint at any future time").

      I'm aware that this is an "incompatible" change, but I believe the risk
      is low compared with the advantages (as a sidenote, many rare CJK
      characters lie in plane 2, in the "CJK Unified Extension B" range
      U+20000-U+2A6DF).

      The notation "\<Char-0x20000>" or "\<Char-131072>" doesn't work: here
      (in my GTK2/Gnome2 gvim with 'encoding' set to UTF-8), ":echo"ing such a
      string displays <f0><a0><80><fe>X<80><fe>X instead of just the one CJK
      character 𠀀 (and, yes, I've set my mailer to send this post as UTF-8 so
      if yours is "well-behaved" it should display that character properly).
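For reference, the UTF-8 encoding of U+20000 (decimal 131072) is the four-byte sequence f0 a0 80 80. The Python sketch below is only an illustration and is not part of Vim. It prints those bytes in the <xx> style of the output quoted above. It then shows one plausible reading of the garbled result, namely that each 0x80 byte has been expanded to the three bytes 80 fe 'X'. That expansion rule is an assumption made for this example, not something stated in this thread.

```python
def render(data: bytes) -> str:
    """Show printable ASCII as-is and every other byte as <xx>."""
    parts = []
    for b in data:
        if 0x20 <= b < 0x7F:
            parts.append(chr(b))
        else:
            parts.append(f"<{b:02x}>")
    return "".join(parts)


# Correct UTF-8 encoding of U+20000.
utf8_bytes = chr(0x20000).encode("utf-8")
expected = render(utf8_bytes)

# ASSUMPTION (hypothetical rule for illustration): every 0x80 byte is
# escaped as the three bytes 0x80 0xfe 'X' before being displayed.
escaped = utf8_bytes.replace(b"\x80", b"\x80\xfeX")
observed = render(escaped)

print(expected)
print(observed)
```

Under that assumption the second line reproduces the string reported above, which would mean the codepoint itself is encoded correctly and only the trailing 0x80 bytes are altered afterwards.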


      Best regards,
      Tony.
      --
      Although the moon is smaller than the earth, it is farther away.

      --~--~---------~--~----~------------~-------~--~----~
      You received this message from the "vim_multibyte" maillist.
      For more information, visit http://www.vim.org/maillist.php
      -~----------~----~----~----~------~----~------~--~---
    • Bram Moolenaar
      Message 2 of 6 , Apr 6 1:15 PM
        Tony Mechelynck wrote:

        > Vim is now capable of displaying any Unicode codepoint for which the
        > installed 'guifont' has a glyph, even outside the BMP (i.e., even above
        > U+FFFF), but there's no easy way to represent those "high" codepoints by
        > Unicode value in strings: I mean, "\uxxxx" and "\Uxxxx" still accept no
        > more than four hex digits.
        >
        > I propose to keep "\uxxxx" at its present meaning, but extend
        > "\Uxxxxxxxx" to allow additional hex digits (either up to a total of 8
        > hex digits, in line with ^VUxxxxxxxx as opposed to ^Vuxxxx in Insert
        > mode, or at least up to the value \U10FFFF, above which the Unicode
        > Consortium has decided that "there never shall be a valid Unicode
        > codepoint at any future time").
        >
        > I'm aware that this is an "incompatible" change, but I believe the risk
        > is low compared with the advantages (as a sidenote, many rare CJK
        > characters lie in plane 2, in the "CJK Unified Extension B" range
        > U+20000-U+2A6DF).
        >
        > The notation "\<Char-0x20000>" or "\<Char-131072>" doesn't work: here
        > (in my GTK2/Gnome2 gvim with 'encoding' set to UTF-8), ":echo"ing such a
        > string displays <f0><a0><80><fe>X<80><fe>X instead of just the one CJK
        > character 𠀀 (and, yes, I've set my mailer to send this post as UTF-8 so
        > if yours is "well-behaved" it should display that character properly).

        It does cause problems for something like "\U12345", which is currently
        the character 0x1234 followed by the character 5. After the change it
        would become the single character 0x12345.
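The incompatibility can be sketched outside Vim. The Python function below is a hypothetical model, not Vim's parser: `expand_escapes` and its `max_upper_digits` parameter are names invented for this example, and it assumes that both escapes greedily consume up to their maximum number of hex digits.

```python
import re


def expand_escapes(text: str, max_upper_digits: int) -> str:
    """Expand \\uXXXX (1-4 hex digits) and \\UXXXX (1..max_upper_digits).

    Hypothetical model for illustration only. Values above 0x10FFFF are
    not handled here; chr() would raise ValueError for them.
    """
    pattern = re.compile(
        r"\\u([0-9A-Fa-f]{1,4})|\\U([0-9A-Fa-f]{1,%d})" % max_upper_digits
    )

    def repl(match):
        digits = match.group(1) or match.group(2)
        return chr(int(digits, 16))

    return pattern.sub(repl, text)


# Present behaviour: \U stops after four hex digits.
current = expand_escapes(r"\U12345", 4)
# Proposed behaviour: \U may take up to eight hex digits.
proposed = expand_escapes(r"\U12345", 8)

print(len(current), [hex(ord(c)) for c in current])
print(len(proposed), [hex(ord(c)) for c in proposed])
```

With a four-digit limit the input yields two characters (U+1234 and the digit 5). With an eight-digit limit the same input yields the single character U+12345.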

        I don't see a convenient alternative though. Anyone?

        --
        Even got a Datapoint 3600(?) with a DD50 connector instead of the
        usual DB25... what a nightmare trying to figure out the pinout
        for *that* with no spex...

        /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
        /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
        \\\ download, build and distribute -- http://www.A-A-P.org ///
        \\\ help me help AIDS victims -- http://ICCF-Holland.org ///

      • Kenneth Reid Beesley
        Message 3 of 6 , Apr 6 1:18 PM
          On 6 Apr 2009, at 12:22, Tony Mechelynck wrote:

          >
          > Vim is now capable of displaying any Unicode codepoint for which the
          > installed 'guifont' has a glyph, even outside the BMP (i.e., even
          > above
          > U+FFFF),

          Tony,

          Good news.

          Many may not know that MacVim has been doing this rather well for
          quite a while. I routinely edit texts in Deseret Alphabet and Shaw
          (Shavian) Alphabet, which lie in the supplementary area.


          > but there's no easy way to represent those "high" codepoints by
          > Unicode value in strings: I mean, "\uxxxx" and "\Uxxxx" still accept no
          > more than four hex digits.
          >
          > I propose to keep "\uxxxx" at its present meaning, but extend
          > "\Uxxxxxxxx" to allow additional hex digits (either up to a total of 8
          > hex digits, in line with ^VUxxxxxxxx as opposed to ^Vuxxxx in Insert
          > mode, or at least up to the value \U10FFFF,

          Sounds good.

          \Uxxxxxxxx is also the Python convention for representing
          supplementary characters in strings. I think it requires exactly
          8 hex digits, just as \uxxxx requires exactly four, but I'm willing
          to be corrected.
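A short Python 3 check of that convention follows. The helper name `literal_compiles` is invented for this sketch; it reports whether a piece of source text is accepted as a string literal.

```python
# In a Python string literal, \U takes exactly eight hex digits
# and \u takes exactly four.
cjk = "\U00020000"   # U+20000, CJK Unified Ideographs Extension B
bmp = "\u4e00"       # U+4E00, inside the BMP


def literal_compiles(source: str) -> bool:
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# Shorter forms are rejected as truncated escapes.
short_upper_ok = literal_compiles(r'"\U20000"')
short_lower_ok = literal_compiles(r'"\u20"')
full_upper_ok = literal_compiles(r'"\U00020000"')

print(len(cjk), hex(ord(cjk)))
print(short_upper_ok, short_lower_ok, full_upper_ok)
```

Because the digit count is fixed, the Python form never has to guess where the escape ends, at the cost of requiring leading zeros.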

          The other reasonable convention is the Perl-like \x{x...} (the
          prefix \x is literally backslash, small x), which, being delimited
          with curly braces, can contain any number of hex digits without
          confusing the tokenization. But your proposal is more in line with
          what Vim has already.

          >
          >
          > I'm aware that this is an "incompatible" change, but I believe the
          > risk
          > is low compared with the advantages

          For what it's worth, I agree.

          > The notation "\<Char-0x20000>" or "\<Char-131072>" doesn't work: here
          > (in my GTK2/Gnome2 gvim with 'encoding' set to UTF-8), ":echo"ing
          > such a
          > string displays <f0><a0><80><fe>X<80><fe>X instead of just the one CJK
          > character 𠀀 (and, yes, I've set my mailer to send this post as
          > UTF-8 so
          > if yours is "well-behaved" it should display that character properly).

          In MacVim, at least, supplementary code point values can appear
          usefully in <Char- > in keymap files. Entries like the following
          appear in my deseret-sampa_utf-8.vim keymap file. It all works great.

          "in  out             comment
          i    <Char-0x10428>  DESERET SMALL LETTER LONG I (e.g. i in machine)
          e    <Char-0x10429>  DESERET SMALL LETTER LONG E (e.g. a in make)
          A    <Char-0x1042A>  DESERET SMALL LETTER LONG A (e.g. a in father)
          O    <Char-0x1042B>  DESERET SMALL LETTER LONG AH (e.g. a in call, au in caught, British/USEastCoastCity pronunciation)
          o    <Char-0x1042C>  DESERET SMALL LETTER LONG O (e.g. oa in boat)
          u    <Char-0x1042D>  DESERET SMALL LETTER LONG OO (e.g. oo in boot)

          Thanks to all those developers who have toiled to handle Unicode in Vim.

          Ken

          ******************************
          Kenneth R. Beesley, D.Phil.
          P.O. Box 540475
          North Salt Lake, UT
          84054 USA






        • Tony Mechelynck
          Message 4 of 6 , Apr 6 2:19 PM
            On 06/04/09 22:18, Kenneth Reid Beesley wrote:
            >
            >
            > On 6 Apr 2009, at 12:22, Tony Mechelynck wrote:
            >
            >>
            >> Vim is now capable of displaying any Unicode codepoint for which the
            >> installed 'guifont' has a glyph, even outside the BMP (i.e., even
            >> above
            >> U+FFFF),
            >
            > Tony,
            >
            > Good news.
            >
            > Many may not know that MacVim has been doing this rather well for
            > quite a while.
            > I routinely edit texts in Deseret Alphabet and Shaw (Shavian)
            > Alphabet, which lie in the
            > supplementary area.
            [...]

            It's actually patch 7.1.116 (30-Nov-2007). So no news-breaking scoop
            anymore, but as long as Vim's support of Unicode outside the BMP was
            less than optimal, the problem I'm raising in this thread might have
            made itself felt less acutely.


            Best regards,
            Tony.
            --
            Joe's sister puts spaghetti in her shoes!

          • Tony Mechelynck
            Message 5 of 6 , Apr 6 2:38 PM
              On 06/04/09 22:18, Kenneth Reid Beesley wrote:
              [...]
              > In MacVim, at least, supplementary code point values can appear
              > usefully in <Char- > in keymap files.
              > Entries like the following appear in my deseret-sampa_utf-8.vim keymap
              > file. It all works great.
              [...]

              In keymap files, it seems to work on Linux too (I use it in my
              own-coded "phonetic" keymaps for Arabic and Russian); but I was
              talking of double-quoted strings.

              These Arabic and Russian keymaps aren't above U+FFFF, but anywhere
              above 0x7F the <Char- > notation gives me problems inside
              double-quoted strings. I believe this is related to the documented
              fact that "\xnn" doesn't give valid UTF-8 values above 0x7F; use
              "\u00nn" instead.


              Best regards,
              Tony.
              --
              If God is perfect, why did He create discontinuous functions?

            • John (Eljay) Love-Jensen
              Message 6 of 6 , Apr 7 4:30 AM
                Hi Tony,

                > I don't see a convenient alternative though.  Anyone?

                /Uxxxx
                /uxxxx
                /U{x}
                /U{xx}
                /U{xxx}
                /U{xxxx}
                /U{xxxxx}
                /U{xxxxxx}
                /U{xxxxxxx}
                /U{xxxxxxxx}

                --Eljay
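A brace-delimited form like the one listed above can be modelled in a few lines of Python. This is a hypothetical sketch, not Vim behaviour: it assumes the forward slashes in the list stand for backslashes, and `expand_braced` is a name invented for the example.

```python
import re

# One to eight hex digits between braces, as in the list above.
BRACED = re.compile(r"\\U\{([0-9A-Fa-f]{1,8})\}")


def expand_braced(text: str) -> str:
    """Replace each \\U{...} with the codepoint it names (illustration only)."""

    def repl(match):
        value = int(match.group(1), 16)
        if value > 0x10FFFF:
            # Beyond the Unicode range: leave the escape untouched.
            return match.group(0)
        return chr(value)

    return BRACED.sub(repl, text)


one_char = expand_braced(r"\U{12345}")    # a single character, U+12345
two_chars = expand_braced(r"\U{1234}5")   # U+1234 followed by the digit 5

print(len(one_char), len(two_chars))
```

Because the closing brace marks the end of the escape, both readings of the digits 12345 can be written unambiguously, and existing "\Uxxxx" strings keep their present meaning.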
