
Re: Handling Unicode codepoints outside the BMP

  • nicolasweber@gmx.de
    Message 1 of 5 , Sep 10, 2007
      Sorry to revive this old thread... there was a newer thread requesting
      correct display of Unicode characters >= 0x10000, but I can't find it, so
      I'm replying to this one. Since this thread is so old, I'm quoting the
      whole thread below. Sorry if this offends you in some way ;-)

      I can confirm that the Mac OS X versions of Vim (Carbon gvim and
      console vim) display characters outside the BMP correctly. (Carbon
      gvim has some drawing issues, but it has the same issues when it
      displays double-width characters inside the BMP. If we get this going,
      I could look into that too.) I can also test on Win32 and Ubuntu (GTK
      version) if there's interest.

      Here's a diff (which is not supposed to become an official
      patch! ;-) ) that enables display of characters outside the BMP:

      Index: screen.c
      ===================================================================
      --- screen.c (revision 475)
      +++ screen.c (working copy)
      @@ -2305,9 +2305,9 @@
      prev_c = u8c;
      #endif
      /* Non-BMP character: display as ? or fullwidth ?. */
      - if (u8c >= 0x10000)
      - ScreenLinesUC[idx] = (cells == 2) ? 0xff1f : (int)'?';
      - else
      + //if (u8c >= 0x10000)
      + // ScreenLinesUC[idx] = (cells == 2) ? 0xff1f : (int)'?';
      + //else
      ScreenLinesUC[idx] = u8c;
      for (i = 0; i < Screen_mco; ++i)
      {
      @@ -3678,25 +3678,25 @@
      if ((mb_l == 1 && c >= 0x80)
      || (mb_l >= 1 && mb_c == 0)
      || (mb_l > 1 && (!vim_isprintc(mb_c)
      - || mb_c >= 0x10000)))
      + /* || mb_c >= 0x10000 */)))
      {
      /*
      * Illegal UTF-8 byte: display as <xx>.
      * Non-BMP character : display as ? or fullwidth ?.
      */
      - if (mb_c < 0x10000)
      - {
      + //if (mb_c < 0x10000)
      + //{
      transchar_hex(extra, mb_c);
      # ifdef FEAT_RIGHTLEFT
      if (wp->w_p_rl) /* reverse */
      rl_mirror(extra);
      # endif
      - }
      - else if (utf_char2cells(mb_c) != 2)
      - STRCPY(extra, "?");
      - else
      - /* 0xff1f in UTF-8: full-width '?' */
      - STRCPY(extra, "\357\274\237");
      + //}
      + //else if (utf_char2cells(mb_c) != 2)
      + // STRCPY(extra, "?");
      + //else
      + // /* 0xff1f in UTF-8: full-width '?' */
      + // STRCPY(extra, "\357\274\237");

      p_extra = extra;
      c = *p_extra;
      @@ -6229,12 +6229,12 @@
      u8c = utfc_ptr2char(ptr, u8cc);
      mbyte_cells = utf_char2cells(u8c);
      /* Non-BMP character: display as ? or fullwidth ?. */
      - if (u8c >= 0x10000)
      - {
      - u8c = (mbyte_cells == 2) ? 0xff1f : (int)'?';
      - if (attr == 0)
      - attr = hl_attr(HLF_8);
      - }
      + //if (u8c >= 0x10000)
      + //{
      + // u8c = (mbyte_cells == 2) ? 0xff1f : (int)'?';
      + // if (attr == 0)
      + // attr = hl_attr(HLF_8);
      + //}
      # ifdef FEAT_ARABIC
      if (p_arshape && !p_tbidi && ARABIC_CHAR(u8c))
      {
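
Every hunk above comments out the same fallback: a codepoint at or above U+10000 is replaced by an ASCII '?' when it takes one screen cell, or by U+FF1F FULLWIDTH QUESTION MARK when it takes two. A minimal standalone sketch of that logic, assuming the cell width is passed in rather than computed with Vim's utf_char2cells(); the helper name nonbmp_fallback() and the array name are hypothetical:

```c
#include <assert.h>

/* Hypothetical helper mirroring the fallback the diff disables:
 * non-BMP codepoints (>= U+10000) become '?' in a one-cell slot or
 * U+FF1F FULLWIDTH QUESTION MARK in a two-cell slot.  BMP codepoints
 * pass through unchanged. */
static int nonbmp_fallback(int c, int cells)
{
    assert(cells == 1 || cells == 2);
    if (c >= 0x10000)
        return (cells == 2) ? 0xff1f : (int)'?';
    return c;
}

/* U+FF1F encoded as UTF-8 is EF BC 9F, written in octal as
 * "\357\274\237", the same bytes the commented-out STRCPY() used. */
static const char fullwidth_qmark_utf8[] = "\357\274\237";
```

Removing the check, as the diff does, simply lets the real codepoint reach ScreenLinesUC[] and the GUI drawing code; whether it then renders depends on each gui_mch_draw_string() implementation.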

      On Jul 29, 4:03 pm, Tony Mechelynck <antoine.mechely...@...>
      wrote:
      > Bram Moolenaar wrote:
      > > Tony Mechelynck wrote:
      >
      > >> Talking of todo lists, is the following item on it?
      >
      > >> - In the GUI, when using a multi-byte 'encoding' use glyphs for any codepoints
      > >> present in the 'guifont', even for Unicode codepoints above U+FFFF.
      >
      > >> Rationale: Many CJK codepoints are in plane 2 (U+20000 to U+2FFFF).
      > >> Representing all of them indiscriminately by double-wide question marks makes
      > >> editing East-Asian text files difficult.
      >
      > >> Hm... There seems to be something similar at todo.txt (2007-Jul-22) line 720:
      >
      > >> 8 When 'encoding' is "utf-8", should use 'guifont' for both normal and wide
      > >> characters to make Asian languages work. Win32 fonts contain both
      > >> type of characters.
      >
      > >> This is already implemented (at least in the GTK2 GUI) for wide and narrow
      > >> characters in the BMP (Basic Multilingual Plane, U+0000 to U+FFFF). It remains
      > >> to be implemented for characters outside the BMP. Many TTF fonts installed on
      > >> my Linux system (and which I got from Novell/SuSE but also from various other
      > >> royalty-free sources) include glyphs both inside and outside the BMP, as can
      > >> be seen by displaying the page in Firefox with an appropriate CSS sheet. The
      > >> latest fonts which I installed are particularly complete if not extremely
      > >> pretty: "HAN NOM A" and "HAN NOM B", both from
      > >> http://sourceforge.net/project/showfiles.php?group_id=153105&package_...
      > >> -- I got the tip from http://en.wikipedia.org/wiki/GB18030#Glyphs
      >
      > > It is very unlikely I will work on this myself. Thus we will have to
      > > wait for someone to work on this.
      >
      > OK. Does anyone want to take up the challenge? I have done some preliminary
      > detective work. The problem is in function gui_mch_draw_string(), which is
      > different for each GUI flavour. Checking each one for a test on
      > (c >= 0x10000) gives the following (line numbers are for the latest
      > 7.1.043 sources):
      >
      > gui_gtk_x11.c:6064 if >= U+10000, replace by question mark
      > gui_mac.c no special handling
      > gui_photon.c no special handling
      > gui_riscos.c no special handling
      > gui_w16.c no special handling
      > gui_w32.c:2343 if >= U+10000, convert to UTF-16 surrogate pair
      > gui_x11.c:2565 if >= U+10000, replace by question mark (1)
      > gui_x11.c:2572 if >= U+10000, replace by question mark (2)
      >
      > (1) with FEAT_XFONTSET, if current_fontset != NULL
      > (2) otherwise
      >
      > It seems that the Win32 GUI converts UTF-8 to UTF-16 with surrogates (thus, I
      > suppose, displaying characters correctly even outside the BMP) and that X11
      > GUIs (GTK, and non-GTK-non-Photon) explicitly replace anything outside the BMP
      > by a wide question mark (which I call the "faulty" handling). For Mac, Photon,
      > RiscOS and W16, either the OS handles it transparently or the problem cannot
      > occur (+multi_byte not possible?), I don't know.
      >
      > I don't feel up to modifying this platform-dependent code, but I hope that
      > with the above, someone will be willing. Any takers?
      >
      > Best regards,
      > Tony.
      > --
      > The temperature of Heaven can be rather accurately computed. Our
      > authority is Isaiah 30:26, "Moreover, the light of the Moon shall be as
      > the light of the Sun and the light of the Sun shall be sevenfold, as
      > the light of seven days." Thus Heaven receives from the Moon as much
      > radiation as we do from the Sun, and in addition 7*7 (49) times as much
      > as the Earth does from the Sun, or 50 times in all. The light we
      > receive from the Moon is one 1/10,000 of the light we receive from the
      > Sun, so we can ignore that ... The radiation falling on Heaven will
      > heat it to the point where the heat lost by radiation is just equal to
      > the heat received by radiation, i.e., Heaven loses 50 times as much
      > heat as the Earth by radiation. Using the Stefan-Boltzmann law for
      > radiation, (H/E)^4 = 50, where E is the absolute temperature of the
      > earth (300K), gives H as 798K (525C). The exact temperature of Hell
      > cannot be computed ... [However] Revelations 21:8 says "But the
      > fearful, and unbelieving ... shall have their part in the lake which
      > burneth with fire and brimstone." A lake of molten brimstone means
      > that its temperature must be at or below the boiling point, 444.6C. We
      > have, then, that Heaven, at 525C is hotter than Hell at 445C.
      > -- From "Applied Optics" vol. 11, A14, 1972
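
Tony's note that gui_w32.c converts codepoints at or above U+10000 to a "UTF-16 surrogate pair" refers to the standard UTF-16 encoding. A minimal sketch of that arithmetic, not the actual gui_w32.c code; to_surrogates() is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: split a supplementary-plane codepoint
 * (U+10000..U+10FFFF) into a UTF-16 high/low surrogate pair. */
static void to_surrogates(uint32_t c, uint16_t *hi, uint16_t *lo)
{
    assert(c >= 0x10000 && c <= 0x10FFFF);
    c -= 0x10000;                            /* leaves a 20-bit value */
    *hi = (uint16_t)(0xD800 + (c >> 10));    /* top 10 bits */
    *lo = (uint16_t)(0xDC00 + (c & 0x3FF));  /* bottom 10 bits */
}
```

For example, U+2000B (a plane-2 CJK ideograph of the kind Tony mentions) becomes the pair D840 DC0B, which a UTF-16 API can draw as one glyph if the font has it.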


      --~--~---------~--~----~------------~-------~--~----~
      You received this message from the "vim_mac" maillist.
      For more information, visit http://www.vim.org/maillist.php
      -~----------~----~----~----~------~----~------~--~---
    • Bram Moolenaar
      Message 2 of 5 , Sep 10, 2007
        Nicolas Weber wrote:

        > Sorry to awake this old thread...there was a newer thread requesting
        > correct display of unicode chars >= 0x10000, but I can't find it, so I
        > reply to this. Since this thread is so old, I'm quoting the whole
        > thread below. Sorry if this offends you in some way ;-)
        >
        > I can confirm that the Mac OS X versions of vim (carbon gvim and
        > console vim) display characters outside of the BMP correctly (carbon
        > vim has some drawing issues, but it has those as well when it displays
        > BMP double-width characters as well. If we get this going, I could
        > look into that as well). I can test on win32 and ubuntu (gtk version)
        > as well if there's interest.
        >
        > Here's a diff (which is not supposed to become an official
        > patch! ;-) ) that enables displaying of characters outside of the BMP:

        Thanks for looking into this. Looks good. Except that a check for
        UNICODE16 is needed: if that is defined, then we really can use only 16
        bits.
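
Why the UNICODE16 check matters, sketched: as far as I recall, vim.h declares u8char_T (the element type of ScreenLinesUC[]) as unsigned short when UNICODE16 is defined, so a non-BMP codepoint stored there silently loses its upper bits. Treat that typedef as an assumption to verify against vim.h; the sketch models the 16-bit storage with uint16_t, and the type and function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: with UNICODE16 defined, each screen cell keeps its
 * character in 16 bits (modelled here with uint16_t). */
typedef uint16_t cell16_T;   /* hypothetical name */

/* Naive store: everything above bit 15 is silently dropped. */
static cell16_T store_naive(uint32_t c)
{
    assert(c <= 0x10FFFF);
    return (cell16_T)c;
}

/* Guarded store, in the spirit of an #ifdef UNICODE16 branch:
 * keep the '?' fallback for anything that cannot fit in 16 bits. */
static cell16_T store_guarded(uint32_t c)
{
    assert(c <= 0x10FFFF);
    return (c >= 0x10000) ? (cell16_T)'?' : (cell16_T)c;
}
```

With the naive store, U+2000B ends up as U+000B, a control character, rather than a question mark.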

        --
        GUARD #2: Wait a minute -- supposing two swallows carried it together?
        GUARD #1: No, they'd have to have it on a line.
        GUARD #2: Well, simple! They'd just use a standard creeper!
        GUARD #1: What, held under the dorsal guiding feathers?
        GUARD #2: Well, why not?
        The Quest for the Holy Grail (Monty Python)

        /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
        /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
        \\\ download, build and distribute -- http://www.A-A-P.org ///
        \\\ help me help AIDS victims -- http://ICCF-Holland.org ///

      • Bram Moolenaar
        Message 3 of 5 , Sep 11, 2007
          Edward L. Fox wrote:

          > On 9/11/07, Bram Moolenaar <Bram@...> wrote:
          > > [...]
          > > Thanks for looking into this. Looks good. Except that a check for
          > > UNICODE16 is needed, if that is defined then we really can use only 16
          > > bits.
          >
          > Do you mean this?

          Yes.

          --
          MORTICIAN: What?
          CUSTOMER: Nothing -- here's your nine pence.
          DEAD PERSON: I'm not dead!
          MORTICIAN: Here -- he says he's not dead!
          CUSTOMER: Yes, he is.
          DEAD PERSON: I'm not!
          The Quest for the Holy Grail (Monty Python)

        • Bram Moolenaar
          Message 4 of 5 , Sep 11, 2007
            Edward L. Fox wrote:

            > On 9/11/07, Edward L. Fox <edyfox@...> wrote:
            > > Hi Bram,
            > >
            > > On 9/11/07, Bram Moolenaar <Bram@...> wrote:
            > > > [...]
            > > > Thanks for looking into this. Looks good. Except that a check for
            > > > UNICODE16 is needed, if that is defined then we really can use only 16
            > > > bits.
            >
            > Hmmm... It seems that more files need to be patched:
            >
            > Index: gui_gtk_x11.c
            > ===================================================================
            > --- gui_gtk_x11.c (revision 513)
            > +++ gui_gtk_x11.c (working copy)
            > @@ -6070,8 +6070,10 @@
            > if (enc_utf8)
            > {
            > c = utf_ptr2char(p);
            > +#ifdef UNICODE16
            > if (c >= 0x10000) /* show chars > 0xffff as ? */
            > c = 0xbf;
            > +#endif
            > buf[textlen].byte1 = c >> 8;
            > buf[textlen].byte2 = c;
            > p += utf_ptr2len(p);

            Only two bytes are put in buf[], thus more needs to be changed here to
            make it work.
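
To see why two bytes are not enough: Xlib's XChar2b holds one 16-bit character as byte1 (high byte) and byte2 (low byte). The sketch uses a local struct with the same two members instead of including <X11/Xlib.h>, so it illustrates the packing only and is not X11 code; Char2b and fill_char2b() are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in with the same members as Xlib's XChar2b. */
typedef struct {
    unsigned char byte1;   /* high 8 bits of a 16-bit character */
    unsigned char byte2;   /* low 8 bits */
} Char2b;

/* Same packing as the patched loops: byte1 = c >> 8, byte2 = c.
 * Bits above bit 15 have nowhere to go and are discarded. */
static Char2b fill_char2b(uint32_t c)
{
    Char2b ch;
    assert(c <= 0x10FFFF);
    ch.byte1 = (unsigned char)(c >> 8);
    ch.byte2 = (unsigned char)c;
    return ch;
}
```

Without the '?' substitution, U+2000B is packed as 0x00 0x0B and drawn as U+000B, so dropping the check only turns a question mark into a wrong character.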

            > Index: gui_x11.c
            > ===================================================================
            > --- gui_x11.c (revision 513)
            > +++ gui_x11.c (working copy)
            > @@ -2562,15 +2562,19 @@
            > # ifdef FEAT_XFONTSET
            > if (current_fontset != NULL)
            > {
            > - if (c >= 0x10000 && sizeof(wchar_t) <= 2)
            > +#ifdef UNICODE16
            > + if (c >= 0x10000)
            > c = 0xbf; /* show chars > 0xffff as ? */
            > +#endif
            > ((wchar_t *)buf)[wlen] = c;
            > }
            > else
            > # endif
            > {
            > +#ifdef UNICODE16
            > if (c >= 0x10000)
            > c = 0xbf; /* show chars > 0xffff as ? */
            > +#endif
            > ((XChar2b *)buf)[wlen].byte1 = (unsigned)c >> 8;
            > ((XChar2b *)buf)[wlen].byte2 = c;
            > }

            Here too.

            --
            Shit makes the flowers grow and that's beautiful

          • Bram Moolenaar
            Message 5 of 5 , Sep 11, 2007
              Edward L. Fox wrote:

              > On 9/11/07, Bram Moolenaar <Bram@...> wrote:
              > > [...]
              > >
              > > Only two bytes are put in buf[], thus more needs to be changed here to
              > > make it work.
              > >
              > > [...]
              > >
              > > Here too.
              > > [...]
              >
              > I don't know how to deal with the characters outside BMP for XChar2b.
              > I just assume that it is UTF-16. Please help me check this patch:

              Ehm, are you just guessing here? You better find a way to test it.
              I don't want to include this without testing.

              I doubt X11 uses UTF-16; that is something MS-Windows uses. GTK
              uses UTF-8 in most places. Other X11 toolkits probably differ, since
              they are older.
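
For contrast with UTF-16: in UTF-8, which Bram says GTK uses in most places, a non-BMP codepoint needs no surrogates and is simply four bytes. A minimal encoder sketch; utf8_encode() is a hypothetical name and this is not Vim's own encoder:

```c
#include <assert.h>
#include <stdint.h>

/* Encode one codepoint (<= U+10FFFF) as UTF-8 into buf, which must
 * hold at least 4 bytes.  Returns the number of bytes written.
 * Surrogate codepoints are not rejected, to keep the sketch short. */
static int utf8_encode(uint32_t c, unsigned char *buf)
{
    assert(c <= 0x10FFFF);
    if (c < 0x80) {
        buf[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {
        buf[0] = (unsigned char)(0xC0 | (c >> 6));
        buf[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    if (c < 0x10000) {
        buf[0] = (unsigned char)(0xE0 | (c >> 12));
        buf[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (c & 0x3F));
        return 3;
    }
    buf[0] = (unsigned char)(0xF0 | (c >> 18));
    buf[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
    buf[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    buf[3] = (unsigned char)(0x80 | (c & 0x3F));
    return 4;
}
```

U+2000B encodes as F0 A0 80 8B, and U+FF1F as EF BC 9F (the "\357\274\237" in screen.c), so a UTF-8 code path can carry non-BMP text without any 16-bit packing.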

              --
              DEAD PERSON: I'm getting better!
              CUSTOMER: No, you're not -- you'll be stone dead in a moment.
              MORTICIAN: Oh, I can't take him like that -- it's against regulations.
              The Quest for the Holy Grail (Monty Python)
