TODO suggestion: Unicode codepoints above U+FFFF

  • Tony Mechelynck
    Message 1 of 4, Jul 27, 2005
      I suggest adding a "todo" item to make gvim display Unicode codepoints above
      U+FFFF as other than a question mark. Probably not with high priority (I
      guess 3 to 5 would be adequate, unless CJK users prefer something higher).

      Rationale: There are already some printable characters assigned to these
      codepoints (including for instance some rare CJK characters). The way I see
      it, sooner or later someone is going to make fonts for them (if it's not
      already done), and sooner or later someone is going to use gvim to edit
      files containing them. It can already be done, but it's not practical: gvim
      displays only a question mark, and "ga" is required to ascertain which
      character is actually there.

      (I searched todo.txt dated 2005 Jul 25 for 7.00aa using the command
      "/unicode\|utf" with ":set ignorecase smartcase" and got no relevant hits.)


      Best regards,
      Tony
    • Bram Moolenaar
      Message 2 of 4, Jul 28, 2005
        Tony Mechelynck wrote:

        > I suggest adding a "todo" item to make gvim display Unicode codepoints
        > above U+FFFF as other than a question mark. Probably not with high
        > priority (I guess 3 to 5 would be adequate, unless CJK users prefer
        > something higher).
        >
        > Rationale: There are already some printable characters assigned to
        > these codepoints (including for instance some rare CJK characters).
        > The way I see it, sooner or later someone is going to make fonts for
        > them (if it's not already done), and sooner or later someone is going
        > to use gvim to edit files containing them. It can already be done, but
        > it's not practical: gvim displays only a question mark, and "ga" is
        > required to ascertain which character is actually there.
        >
        > (I searched todo.txt dated 2005 Jul 25 for 7.00aa using the command
        > "/unicode\|utf" with ":set ignorecase smartcase" and got no relevant
        > hits.)

        Most of Vim can handle Unicode characters above 0xffff. The code
        already recognizes characters 0x20000 to 0x2fffd as double-width. It's
        the displaying code that has some trouble. Esp. for Win32, since it
        uses UTF-16, which is very clumsy.
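
To make the two points above concrete, here is a minimal Python sketch (not Vim's actual code). It checks the double-width range quoted in the message and counts how many UTF-16 code units and UTF-8 bytes a code point needs. The function names are hypothetical and chosen only for illustration.

```python
# Range quoted in the message: U+20000..U+2FFFD is treated as double-width.
DOUBLE_WIDTH_FIRST = 0x20000
DOUBLE_WIDTH_LAST = 0x2FFFD


def is_supplementary_double_width(cp: int) -> bool:
    """True if cp falls in the supplementary double-width range above."""
    return DOUBLE_WIDTH_FIRST <= cp <= DOUBLE_WIDTH_LAST


def utf16_code_units(cp: int) -> int:
    """Number of 16-bit code units needed to store cp in UTF-16."""
    return len(chr(cp).encode("utf-16-le")) // 2


def utf8_bytes(cp: int) -> int:
    """Number of bytes needed to store cp in UTF-8."""
    return len(chr(cp).encode("utf-8"))


if __name__ == "__main__":
    for cp in (0x4E2D, 0x20000):
        print(hex(cp), is_supplementary_double_width(cp),
              utf16_code_units(cp), utf8_bytes(cp))
```

A character inside the BMP fits in one UTF-16 code unit, while anything above U+FFFF needs two, so code that assumes one unit per character has to special-case it. In UTF-8 the same character is simply one more multibyte sequence.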

        The display code can be adjusted as soon as there is a font to try out
        if it actually works. It should already work for GTK 2 without any
        changes.

        I'd rather see Microsoft support UTF-8, but that probably won't
        happen...

        --
        SOLDIER: Where did you get the coconuts?
        ARTHUR: Through ... We found them.
        SOLDIER: Found them? In Mercea. The coconut's tropical!
        "Monty Python and the Holy Grail" PYTHON (MONTY) PICTURES LTD

        /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
        /// Sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
        \\\ Project leader for A-A-P -- http://www.A-A-P.org ///
        \\\ Buy LOTR 3 and help AIDS victims -- http://ICCF.nl/lotr.html ///
      • Mike Williams
        Message 3 of 4, Jul 28, 2005
          Bram Moolenaar did utter on 28/07/2005 10:53:
          > Tony Mechelynck wrote:
          >
          >
          >>I suggest adding a "todo" item to make gvim display Unicode codepoints
          >>above U+FFFF as other than a question mark. Probably not with high
          >>priority (I guess 3 to 5 would be adequate, unless CJK users prefer
          >>something higher).
          >>
          >>Rationale: There are already some printable characters assigned to
          >>these codepoints (including for instance some rare CJK characters).
          >>The way I see it, sooner or later someone is going to make fonts for
          >>them (if it's not already done), and sooner or later someone is going
          >>to use gvim to edit files containing them. It can already be done, but
          >>it's not practical: gvim displays only a question mark, and "ga" is
          >>required to ascertain which character is actually there.
          >>
          >>(I searched todo.txt dated 2005 Jul 25 for 7.00aa using the command
          >>"/unicode\|utf" with ":set ignorecase smartcase" and got no relevant
          >>hits.)
          >
          >
          > Most of Vim can handle Unicode characters above 0xffff. The code
          > already recognizes characters 0x20000 to 0x2fffd as double-width. It's
          > the displaying code that has some trouble. Esp. for Win32, since it
          > uses UTF-16, which is very clumsy.
          >
          > The display code can be adjusted as soon as there is a font to try out
          > if it actually works. It should already work for GTK 2 without any
          > changes.
          >
          > I'd rather see Microsoft support UTF-8, but that probably won't
          > happen...

          Has anyone tested with a UTF-16 surrogate pair? If so and it failed you
          most likely have to start rootling around the font file to find
          alternate encoding maps, extract the glyph id and render using that
          rather than the encoded character. TT and/or OT fonts can support
          encodings beyond the BMP.
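
For anyone wanting to build such a test case, the following Python sketch shows how a code point above U+FFFF maps to a UTF-16 surrogate pair, using the arithmetic defined by the Unicode standard. The helper name `to_surrogate_pair` is an assumption for illustration, and the result is cross-checked against Python's own UTF-16 encoder.

```python
def to_surrogate_pair(cp: int) -> tuple:
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in the supplementary planes")
    offset = cp - 0x10000              # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


def encoder_pair(cp: int) -> tuple:
    """The same pair, as produced by Python's big-endian UTF-16 encoder."""
    raw = chr(cp).encode("utf-16-be")
    return (int.from_bytes(raw[0:2], "big"), int.from_bytes(raw[2:4], "big"))


if __name__ == "__main__":
    high, low = to_surrogate_pair(0x20000)
    print(hex(high), hex(low))
```

Whether a given Win32 text call then renders the pair correctly depends on the font and the API, which is exactly the open question here.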

          TTFN

          Mike
          --
          No matter how far you've gone down the wrong road, turn back.
        • Tony Mechelynck
          Message 4 of 4, Jul 28, 2005
            ----- Original Message -----
            From: "Bram Moolenaar" <Bram@...>
            To: "Tony Mechelynck" <antoine.mechelynck@...>
            Cc: <vim-dev@...>
            Sent: Thursday, July 28, 2005 11:53 AM
            Subject: Re: TODO suggestion: Unicode codepoints above U+FFFF
            [...]

            > Most of Vim can handle Unicode characters above 0xffff. The code
            > already recognizes characters 0x20000 to 0x2fffd as double-width. It's
            > the displaying code that has some trouble. Esp. for Win32, since it
            > uses UTF-16, which is very clumsy.
            >
            > The display code can be adjusted as soon as there is a font to try out
            > if it actually works. It should already work for GTK 2 without any
            > changes.

            In the absence of a font, how hard would it be to display (without the
            quotes) "<123456>" rather than the present "?" -- or maybe an option similar
            to 'isprint' would be in order to force <hex> display of user-defined
            character ranges (for which the user knows that he hasn't got proper
            glyphs)? 'isnoprint' maybe, which would apply only to multibyte character
            ranges above what 'isprint' handles?
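
As a rough illustration of that fallback (a Python sketch, not a patch against Vim's display code), the idea is to show a character as its hex code point in angle brackets whenever no glyph is known for it. The `has_glyph` callback is hypothetical and stands in for whatever the font query or a user-set option like the proposed 'isnoprint' would provide.

```python
def display_text(text: str, has_glyph) -> str:
    """Render text, replacing glyph-less characters with <hex> placeholders.

    has_glyph is a caller-supplied predicate taking a code point (int);
    it is an assumption of this sketch, not an existing Vim interface.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if has_glyph(cp):
            out.append(ch)
        else:
            out.append("<%x>" % cp)  # e.g. <20000> instead of "?"
    return "".join(out)


if __name__ == "__main__":
    # Pretend the font only covers the BMP.
    def bmp_only(cp):
        return cp <= 0xFFFF

    print(display_text("a\U00020000b", bmp_only))
```

A real implementation would also have to account for the placeholder occupying more screen cells than the one or two cells the character itself would take.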

            Anyway, if _anyone_ on this list knows of a CJK font for Windows which
            includes all CJK characters, also the "extensions" above and below the basic
            range, please speak up. I have a UTF-8 file -- based on the Unicode
            Consortium's Unihan.txt, which means I can use it privately but not
            distribute it :-( -- with which to test it. At the moment I mainly use
            MingLiU (a "Traditional Chinese" font) for Chinese, but it hasn't got the
            CJK extensions.
            >
            > I'd rather see Microsoft support UTF-8, but that probably won't
            > happen...

            Some programs, like WordPad or NT-series Notepad, can read UTF-8 files if
            they have a BOM. Notepad (but not the 9x version) can even write UTF-8 (with
            BOM). (I don't know how they represent the data internally though. I suspect
            UTF-16le.) Microsoft is the tortoise to Vim's <strike>hare</strike> jet
            rocket, to be sure, but I think we shouldn't despair.
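
For reference, the BOM those programs look for is the three bytes EF BB BF at the start of the file. A small Python sketch of detecting and stripping it follows; the helper names are made up for this example.

```python
import codecs


def starts_with_utf8_bom(data: bytes) -> bool:
    """True if data begins with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def decode_utf8_maybe_bom(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading BOM if one is present."""
    return data.decode("utf-8-sig")


if __name__ == "__main__":
    with_bom = "caf\u00e9".encode("utf-8-sig")   # encoder prepends the BOM
    without_bom = "caf\u00e9".encode("utf-8")
    print(starts_with_utf8_bom(with_bom), starts_with_utf8_bom(without_bom))
    print(decode_utf8_maybe_bom(with_bom) == decode_utf8_maybe_bom(without_bom))
```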


            Best regards,
            Tony.