
Re: About Unicode CJK Unified Extension B

  • Bram Moolenaar
    Message 1 of 16, Feb 28, 2006
      Tony Mechelynck wrote:

      > Edward G.J. Lee wrote:
      > [...]
      > > My MingLiU fonts (Ver 3.21 and Ver 5.03) are not MS UCS-4 encoded fonts;
      > > they only have glyphs in the BMP. But I also tested sursong.ttf
      > > (Simsun (Founder Extended)), Sun-ExtA/Sun-ExtB[1] and the Han Nom fonts[2],
      > > and the characters still cannot be displayed.
      > >
      > > So my guess is that this is not a font problem.
      > >
      > > Thanks for the help.
      > >
      > > Edward
      > > [1] http://okuc.net/software/UniFonts.exe
      > > [2] http://vietunicode.sourceforge.net/fonts/fonts_hannom.html
      >
      >
      > Here, all CJK fonts that I have, whether Korean, Japanese, Traditional
      > Chinese or Simplified Chinese, all display (in gvim) double-wide blue
      > question marks for any ideograms outside the BMP. Let's wait and see
      > what Bram has to say about it.

      I don't have anything to say about this. I'm not aware of restrictions
      in the code to 16 bit characters, but the GTK code is complex and full
      of hacks (to be able to use proportionally spaced fonts, to work around
      bugs in pango, etc.). It requires an expert to look into this.

      It might be that other applications use font replacement to display
      characters that aren't actually in the font. I'm not sure what happens
      for Vim.

      --
      A)bort, R)etry, D)o it right this time

      /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
      /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
      \\\ download, build and distribute -- http://www.A-A-P.org ///
      \\\ help me help AIDS victims -- http://www.ICCF.nl ///
    • A. J. Mechelynck
      Message 2 of 16, Feb 28, 2006
        Bram Moolenaar wrote:
        > Tony Mechelynck wrote:
        >
        >> Edward G.J. Lee wrote:
        >> [...]
        >>> My MingLiU fonts (Ver 3.21 and Ver 5.03) are not MS UCS-4 encoded fonts;
        >>> they only have glyphs in the BMP. But I also tested sursong.ttf
        >>> (Simsun (Founder Extended)), Sun-ExtA/Sun-ExtB[1] and the Han Nom fonts[2],
        >>> and the characters still cannot be displayed.
        >>>
        >>> So my guess is that this is not a font problem.
        >>>
        >>> Thanks for the help.
        >>>
        >>> Edward
        >>> [1] http://okuc.net/software/UniFonts.exe
        >>> [2] http://vietunicode.sourceforge.net/fonts/fonts_hannom.html
        >>
        >> Here, all CJK fonts that I have, whether Korean, Japanese, Traditional
        >> Chinese or Simplified Chinese, all display (in gvim) double-wide blue
        >> question marks for any ideograms outside the BMP. Let's wait and see
        >> what Bram has to say about it.
        >
        > I don't have anything to say about this. I'm not aware of restrictions
        > in the code to 16 bit characters, but the GTK code is complex and full
        > of hacks (to be able to use proportionally spaced fonts, to work around
        > bugs in pango, etc.). It requires an expert to look into this.
        >
        > It might be that other applications use font replacement to display
        > characters that aren't actually in the font. I'm not sure what happens
        > for Vim.
        >

        It's not only GTK: I get the same symptoms on Windows: any CJK character
        above U+FFFF is shown in gvim (using the default highlights and 'syntax'
        set to something nonexistent, e.g., ":set syntax=nononono") as a
        double-wide _blue_ question mark in any CJK font. Characters not in the
        font but below U+FFFF are displayed in a font-specific way, e.g. as a
        double-wide space in MingLiU (a Traditional Chinese font) or in NSimSun
        (a Simplified Chinese font), as a bullet in MsGothic (a Japanese font),
        as a kind of small "carpenter's square" in GulimChe (a Korean font), etc.

        In Courier_New, which is not an East-Asian font, I see hollow squares
        occupying the left half of a double-wide character cell, highlighted in
        black below U+FFFF, in blue above it.

        -- The range used outside the base plane for CJK ideograms is at U+20000
        to U+2FA1D. Most of these codepoints are defined but I haven't checked
        them all. So if you want to try and reproduce this, just hit (e.g.)
        ^VU20000 ^VU20001 (etc.) ^VU2FA1D (in Insert mode in a [NoName] buffer).


        Best regards,
        Tony.
      • Bram Moolenaar
        Message 3 of 16, Feb 28, 2006
          Tony Mechelynck wrote:

          > It's not only GTK: I get the same symptoms on Windows: any CJK character
          > above U+FFFF is shown in gvim (using the default highlights and 'syntax'
          > set to something nonexistent, e.g., ":set syntax=nononono") as a
          > double-wide _blue_ question mark in any CJK font. Characters not in the
          > font but below U+FFFF are displayed in a font-specific way, e.g. as a
          > double-wide space in MingLiU (a Traditional Chinese font) or in NSimSun
          > (a Simplified Chinese font), as a bullet in MsGothic (a Japanese font),
          > as a kind of small "carpenter's square" in GulimChe (a Korean font), etc.

          That is to be expected, Vim only supports 16 bit characters for Win32.
          MS-Windows has the lousy UTF-16 solution for the rest, that hasn't been
          implemented yet. I expect this to get very messy...

          --
          hundred-and-one symptoms of being an internet addict:
          3. Your bookmark takes 15 minutes to scroll from top to bottom.

          /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
          /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
          \\\ download, build and distribute -- http://www.A-A-P.org ///
          \\\ help me help AIDS victims -- http://www.ICCF.nl ///
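For reference, the UTF-16 scheme mentioned above represents every code point beyond U+FFFF as a pair of 16-bit "surrogate" code units. The following Python sketch illustrates only that arithmetic. It is not Vim's Win32 code, and the function name to_utf16_units is made up for this example.

```python
def to_utf16_units(code_point):
    """Return the list of UTF-16 code units for one Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    if code_point <= 0xFFFF:
        # Inside the BMP: a single 16-bit unit is enough.
        return [code_point]
    # Outside the BMP: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (leading) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trailing) surrogate
    return [high, low]


# U+20000 is the first code point of CJK Unified Extension B.
print([hex(unit) for unit in to_utf16_units(0x20000)])
# prints ['0xd840', '0xdc00']
```

Any code that assumes one 16-bit unit per character will see such a pair as two separate characters, which is why storing characters in 16 bits is not enough for Extension B.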
        • Edward G.J. Lee
          Message 4 of 16, Feb 28, 2006
            Hello Bram,

            On Tue, Feb 28, 2006, Bram Moolenaar wrote:
            >
            > That is to be expected, Vim only supports 16 bit characters for Win32.
            > MS-Windows has the lousy UTF-16 solution for the rest, that hasn't been
            > implemented yet. I expect this to get very messy...

            At least on a GNU/Linux or *BSD box, console Vim (not the GUI)
            should display characters beyond U+FFFF correctly in a UTF-8
            terminal when an X font with full Unicode coverage is installed.
            Am I right?

            My problem is that it can't.

            Do you have any idea? Thanks.



            Edward
          • Bram Moolenaar
            Message 5 of 16, Feb 28, 2006
              Edward G.J. Lee wrote:

              > On Tue, Feb 28, 2006, Bram Moolenaar wrote:
              > >
              > > That is to be expected, Vim only supports 16 bit characters for Win32.
              > > MS-Windows has the lousy UTF-16 solution for the rest, that hasn't been
              > > implemented yet. I expect this to get very messy...
              >
              > At least on a GNU/Linux or *BSD box, console Vim (not the GUI)
              > should display characters beyond U+FFFF correctly in a UTF-8
              > terminal when an X font with full Unicode coverage is installed.
              > Am I right?
              >
              > My problem is that it can't.
              >
              > Do you have any idea? Thanks.

              Oh, I forgot something. The structures used for the screen are limited
              to 16 bit, because there were no fonts for other characters. If you say
              that you can actually display characters above 0xFFFF I'll have to
              change that.

              Do we need three or four bytes? We'll probably need to use four bytes
              anyway, since there is no data type for three bytes.

              Since using these characters is rare, I'll probably have to make it a
              configuration option to avoid wasting memory. There also still is a
              todo item to support more than 2 combining characters. We may end up
              using 20 bytes per screen position.... The number of combining
              characters could be an option, but doing that for the number of bytes
              per character would be complicated. That probably has to be a feature,
              thus decided at compile time.

              --
              hundred-and-one symptoms of being an internet addict:
              8. You spend half of the plane trip with your laptop on your lap...and your
              child in the overhead compartment.

              /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
              /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
              \\\ download, build and distribute -- http://www.A-A-P.org ///
              \\\ help me help AIDS victims -- http://www.ICCF.nl ///
            • Edward G.J. Lee
              Message 6 of 16, Mar 1 1:06 AM
                Dear Bram,

                On Tue, Feb 28, 2006, Bram Moolenaar wrote:

                > Oh, I forgot something. The structures used for the screen are limited
                > to 16 bit, because there were no fonts for other characters. If you say
                > that you can actually display characters above 0xFFFF I'll have to
                > change that.

                Yes, I can display U+20000..U+2A6DF correctly in my gnome-terminal.
                I have a simple Ruby script that generates all those characters:

                http://edt1023.sayya.org/ruby/u.rb
                http://edt1023.sayya.org/ruby/tmp/cjkextb.png

                > Do we need three or four bytes? We'll probably need to use four bytes
                > anyway, since there is no data type for three bytes.

                We need four bytes, I think. We need to cover the Unicode range
                from 0x10000 to 0x10FFFF.

                > Since using these characters is rare, I'll probably have to make it a
                > configuration option to avoid wasting memory. There also still is a
                > todo item to support more than 2 combining characters. We may end up
                > using 20 bytes per screen position.... The number of combining
                > characters could be an option, but doing that for the number of bytes
                > per character would be complicated. That probably has to be a feature,
                > thus decided at compile time.

                I have to admit that those characters are rarely used in an ordinary
                article. But the problem is people's names in the CJKV area, especially
                Chinese names. They may use characters from Unicode CJKV Unified
                Extension B, and I have to type the names correctly.

                I'm also making an XIM input table for Chinese. As you may know,
                the table needs to include every character in Extension B, so I
                need a familiar editor to type those characters and their keys.

                Another example is LaTeX CJK. The CVS version of LaTeX CJK now
                fully supports the whole Unicode range, and I need to edit the
                example for testing:

                http://edt1023.sayya.org/tex/tmp/nobmp2.tex
                http://edt1023.sayya.org/tex/tmp/nobmp2.pdf

                So it would be great if Vim supported CJKV Unified Extension B,
                even as an option. Thanks in advance.



                Edward
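As an illustration of the kind of generator described above, here is a minimal Python sketch that produces every code point in the range U+20000..U+2A6DF quoted in the message. It is not the Ruby script linked there; the function name extension_b_rows and the 16-per-row layout are assumptions made for this example.

```python
FIRST = 0x20000
LAST = 0x2A6DF  # end of the range as quoted in the message


def extension_b_rows(per_row=16):
    """Yield strings of `per_row` consecutive characters from FIRST to LAST."""
    row = []
    for code_point in range(FIRST, LAST + 1):
        row.append(chr(code_point))
        if len(row) == per_row:
            yield "".join(row)
            row = []
    if row:
        # Emit a final, shorter row if the range is not a multiple of per_row.
        yield "".join(row)


rows = list(extension_b_rows())
# Print an ASCII-only summary so the sketch runs in any locale.
print(len(rows), "rows; first code point U+%X" % ord(rows[0][0]))
```

To look at the characters themselves, write the rows to a file opened with encoding="utf-8" and view it in a UTF-8 terminal or editor with a font that covers Extension B. Each of these characters occupies four bytes in UTF-8.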
              • Nikolai Weibull
                Message 7 of 16, Mar 1 1:28 PM
                  On 3/1/06, Edward G.J. Lee <edt1023@...> wrote:
                  > We need four bytes, I think. We need to cover the Unicode range
                  > from 0x10000 to 0x10FFFF.

                  We need ceil(log2(0x10FFFF)) = 21 bits, or, more realistically, 24
                  bits, or, even more realistically, 32 bits. I don't think we need to
                  worry about memory consumption for the display of characters though,
                  at least not on any modern system. Perhaps the MS-DOS port needs
                  special treatment...

                  nikolai
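The 21 / 24 / 32 bit figures above can be checked with a few lines of Python. The screen size used for the memory estimate (80 columns by 24 rows) is a hypothetical example, not a number from this thread, and the helper native_width is made up for this sketch.

```python
import math

MAX_CODE_POINT = 0x10FFFF  # highest Unicode code point

# Minimum number of bits that can hold every code point.
bits_needed = MAX_CODE_POINT.bit_length()
assert bits_needed == math.ceil(math.log2(MAX_CODE_POINT))

# Rounded up to whole bytes: 3 bytes, i.e. 24 bits.
bytes_needed = (bits_needed + 7) // 8


def native_width(bits):
    """Round a bit count up to a common integer width."""
    for width in (8, 16, 32, 64):
        if bits <= width:
            return width
    raise ValueError("no native width is large enough")


# Hypothetical 80x24 screen: character storage at 2 versus 4 bytes per cell.
COLUMNS, ROWS = 80, 24
cells = COLUMNS * ROWS

print(bits_needed, bytes_needed * 8, native_width(bits_needed))  # prints 21 24 32
print(cells * 2, cells * 4)                                      # prints 3840 7680
```

Under that assumption, doubling the character storage costs a few kilobytes per screen, which is consistent with the point that memory use is not a concern on modern systems.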
                • Bram Moolenaar
                  Message 8 of 16, Mar 6 1:21 AM
                    I have made changes to the code to use 32 bits for storing Unicode
                    characters. It's included in last night's snapshot.

                    I have no way to try it out. It's not unlikely that there are a few
                    problems.

                    For Win32 I changed the UTF-8 to UCS-2 conversion so that it now
                    produces UTF-16. I don't know if that is sufficient for drawing the
                    characters.

                    GTK2 does everything with UTF-8, thus it should work as it is.

                    I also added 'maxcombine' to support up to 6 combining characters.
                    That's enough for everyone, right?

                    --
                    hundred-and-one symptoms of being an internet addict:
                    51. You put a pillow case over your laptop so your lover doesn't see it while
                    you are pretending to catch your breath.

                    /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
                    /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
                    \\\ download, build and distribute -- http://www.A-A-P.org ///
                    \\\ help me help AIDS victims -- http://www.ICCF.nl ///
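To make the idea of a per-cell limit on combining characters concrete, here is a minimal Python sketch that groups a string into screen cells of one base character plus at most max_combine combining characters. It illustrates the concept only: the function split_cells is made up for this example, and using unicodedata.combining() to detect combining characters is an assumption that need not match how Vim classifies them.

```python
import unicodedata


def split_cells(text, max_combine=6):
    """Group text into (base, [combining marks]) cells, keeping at most
    `max_combine` marks per cell. Extra marks are dropped in this sketch."""
    cells = []
    for ch in text:
        if unicodedata.combining(ch) and cells:
            base, marks = cells[-1]
            if len(marks) < max_combine:
                marks.append(ch)
        else:
            # A base character (or a stray leading mark) starts a new cell.
            cells.append((ch, []))
    return cells


# 'e' followed by a combining acute accent and a combining dot below, then 'o'.
sample = "e\u0301\u0323o"
print([(base, len(marks)) for base, marks in split_cells(sample, max_combine=2)])
# prints [('e', 2), ('o', 0)]
```

With 32-bit storage, a character such as U+20000 is a single base character here and takes one cell entry, the same as any BMP character.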