About Unicode CJK Unified Extension B

  • Edward G.J. Lee
    Message 1 of 18 , Feb 27, 2006
      Dear all,

      I use Vim as my favorite editor, but I need to edit an
      article that includes CJK Unified Ideographs Extension
      B (U+20000...) characters in UTF-8 encoding.

      Is there any plan to support that?
      [Note] The text can be copied and pasted into another editor (gedit)
      correctly, but it is not displayed correctly in gvim.

      Thanks.


      Edward
    • A. J. Mechelynck
      Message 2 of 18 , Feb 27, 2006
        Edward G.J. Lee wrote:
        > Dear all,
        >
        > I use Vim as my favorite editor, but I need edit the
        > article that include CJK Unified Ideographs Extension
        > B(U+20000...) in UTF-8 encoding.
        >
        > Is there any plan to support that?
        > [Note] It can be copy&paste to another editor(gedit)
        > correctly, but cannot display correctly in gvim.
        >
        > Thanks.
        >
        >
        > Edward
        >
        >
        >

        It is already supported by gvim (with 'encoding' set to UTF-8), but to
        see the CJK characters you need a 'guifont' which has them. If your font
        hasn't got them, everything will work except that the glyphs won't be
        displayed (or will be displayed as hollow boxes or something similar).
        Those "unrecognizable" characters will still occupy one or two screen
        cells depending on their CJK width (i.e., two cells for "full-width"
        ideograms).
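The one-or-two-cell rule can be illustrated outside Vim. The following is a minimal Python sketch, assuming that display width follows the Unicode East Asian Width property (Wide and Fullwidth characters take two cells). `cell_width` is a hypothetical helper for illustration, not Vim's own width logic.

```python
import unicodedata


def cell_width(ch: str) -> int:
    """Return 2 for East Asian Wide/Fullwidth characters, else 1 (simplified)."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


if __name__ == "__main__":
    # ASCII letter, a BMP ideograph, and the first Extension B ideograph.
    for cp in (0x41, 0x4E00, 0x20000):
        print(f"U+{cp:04X} occupies {cell_width(chr(cp))} cell(s)")
```

This simplification ignores combining and ambiguous-width characters, which a real terminal or editor has to handle separately.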

        To change the 'guifont', see my tip
        http://vim.sourceforge.net/tips/tip.php?tip_id=632 "Setting the font in
        the GUI", including the remarks at the bottom.

        To determine which Unicode character is under the cursor, use ga in
        Normal mode. Use g8 to see by which bytes it is represented in UTF-8
        encoding.
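To cross-check what g8 reports, the UTF-8 bytes of a codepoint can be computed independently. This is a small Python sketch; `utf8_hex` is a name chosen here for illustration.

```python
def utf8_hex(codepoint: int) -> str:
    """Return the UTF-8 encoding of a codepoint as space-separated hex bytes."""
    return " ".join(f"{b:02x}" for b in chr(codepoint).encode("utf-8"))


if __name__ == "__main__":
    print(utf8_hex(0x4E00))   # a BMP ideograph: three bytes
    print(utf8_hex(0x20000))  # first Extension B ideograph: four bytes
```

Characters above U+FFFF always take four bytes in UTF-8, so a four-byte sequence under the cursor is a quick sign that the character lies outside the BMP.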

        To input a Unicode character higher than U+FFFF in Insert/Replace or
        Command-line mode, use Ctrl-V U xxxxxxxx where:

        - Ctrl-V means "hit Ctrl-V, unless you use Ctrl-V to paste, in which
        case you should hit Ctrl-Q"
        - U is an uppercase U as in Uniform
        - xxxxxxxx is the hexadecimal value of the character (00000000 to
        7FFFFFFF). You can input fewer than 8 hex digits provided that the
        character is followed by a character not in [0-9a-fA-F]. If you want to
        enter a "high Unicode" character followed by a hex digit, you can still
        do it by separating them with <Left><Right>.
        - The spaces in Ctrl-V U xxxxxxxx above are only for legibility; you
        should not type them.
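One way to sidestep the "followed by a hex digit" problem is to always type the full number of digits, padded with leading zeros. The Python sketch below builds the text to type after Ctrl-V for a given codepoint; `ctrl_v_keys` is a hypothetical helper, and the 4-digit `u` form for BMP characters is taken from ":help i_CTRL-V_digit".

```python
def ctrl_v_keys(codepoint: int) -> str:
    """Return what to type after Ctrl-V, zero-padded to the full digit count."""
    if not 0 <= codepoint <= 0x7FFFFFFF:
        raise ValueError("codepoint out of the range accepted after Ctrl-V U")
    if codepoint <= 0xFFFF:
        return f"u{codepoint:04x}"   # lowercase u: exactly 4 hex digits
    return f"U{codepoint:08x}"       # uppercase U: exactly 8 hex digits


if __name__ == "__main__":
    for cp in (0x4E00, 0x20000, 0x2FA1D):
        print(f"U+{cp:04X}: Ctrl-V {ctrl_v_keys(cp)}")
```

With the full digit count typed, the next key can be anything, including another hex digit.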

        You can also use an IM (xim, Windows IME, Windows Global IME) if you have
        one installed and your version of gvim supports it (i.e., is compiled
        with +xim or +multi_byte_ime/dyn); or a keymap, but I fear you will have
        to write your keymap yourself if you want support for Unicode codepoints
        outside the basic plane, i.e., higher than U+FFFF.

        See
        http://vim.sourceforge.net/tips/tip.php?tip_id=632
        :help setting-guifont
        :help ga
        :help g8
        :help i_CTRL-V_digit


        Best regards,
        Tony.
      • Edward G.J. Lee
        Message 3 of 18 , Feb 27, 2006
          On Tue, Feb 28, 2006, A. J. Mechelynck wrote:

          > It is already supported by gvim (with 'encoding' set to UTF-8), but to
          > see the CJK characters you need a 'guifont' which has them. If your font
          > hasn't got them, everything will work except that the glyphs won't be
          > displayed (or will be displayed as hollow boxes or something similar).
          > Those "unrecognizable" characters will still occupy one or two screen
          > cells depending on their CJK width (i.e., two cells for "full-width"
          > ideograms).
          >
          > To change the 'guifont', see my tip
          > http://vim.sourceforge.net/tips/tip.php?tip_id=632 "Setting the font in
          > the GUI", including the remarks at the bottom.

          Here is my .gvimrc setting

          if has("gui_running")
          if has("gui_gtk2")
          set guifont=Andale\ Mono\ 13
          set guifontwide=DFSongStd\ 15
          elseif has("gui_kde")
          [...]

          I'm using the GTK+ 2 version of Vim 7, and I'm sure DFSongStd has
          a glyph for U+20000. But gvim can't display it.

          I switched to vim in gnome-terminal (UTF-8 locale), but vim can't
          display it there either.

          > To input a Unicode character higher than U+FFFF in Insert/Replace or
          > Command-line mode, use Ctrl-V U xxxxxxxx where:

          I know that, but it gives only a '0' if I input 20000 after
          Ctrl-V U. It accepts only four [hex] digits. Did I miss something?

          BTW, yes, I can input non-BMP characters using Gcin.
          http://www.csie.nctu.edu.tw/~cp76/gcin/

          Thanks.



          Edward
        • A. J. Mechelynck
          Message 4 of 18 , Feb 27, 2006
            Edward G.J. Lee wrote:
            > On Tue, Feb 28, 2006, A. J. Mechelynck wrote:
            >
            >> It is already supported by gvim (with 'encoding' set to UTF-8), but to
            >> see the CJK characters you need a 'guifont' which has them. If your font
            >> hasn't got them, everything will work except that the glyphs won't be
            >> displayed (or will be displayed as hollow boxes or something similar).
            >> Those "unrecognizable" characters will still occupy one or two screen
            >> cells depending on their CJK width (i.e., two cells for "full-width"
            >> ideograms).
            >>
            >> To change the 'guifont', see my tip
            >> http://vim.sourceforge.net/tips/tip.php?tip_id=632 "Setting the font in
            >> the GUI", including the remarks at the bottom.
            >
            > Here is my .gvimrc setting
            >
            > if has("gui_running")
            > if has("gui_gtk2")
            > set guifont=Andale\ Mono\ 13
            > set guifontwide=DFSongStd\ 15
            > elseif has("gui_kde")
            > [...]

            The 'guifontwide' must be exactly the same height as the 'guifont', and
            twice its width. This is not the case here: you have selected a
            13-point-high 'guifont' but a 15-point-high 'guifontwide'.

            Try
                if has("gui_running")
                    " leave 'guifontwide' empty so that only 'guifont' is used
                    set guifontwide=
                    if has("gui_gtk2")
                        set guifont=DFSongStd\ 15
                    elseif has("gui_kde")
                        set guifont=DFSongStd/15
                    elseif has("x11")
                        " I'm not sure to which value to set it, but
                        " it will be long. Maybe something like
                        " the following (untested). An XLFD has 14
                        " fields, with the point size in tenths of a point.
                        exe "set guifont=-*-dfsongstd-medium-r-normal"
                            \ . "-*-*-150-*-*-m-*-*-*"
                    else
                        set guifont=DFSongStd:h15:cDEFAULT
                    endif
                endif

            and see what happens. Can it display "high" Unicode? Can it display
            "base plane" Chinese? Can it display English?

            >
            > I'm using gtk+-2 version of Vim7, and I'm sure DFSongStd has
            > U+20000 character/glyph. But gvim can't display it.
            >
            > I change to use vim on gnome-terminal(UTF-8 locale), but vim can't
            > display it either.
            >
            >> To input a Unicode character higher than U+FFFF in Insert/Replace or
            >> Command-line mode, use Ctrl-V U xxxxxxxx where:
            >
            > I know that, but it give only a '0' if I input 20000 after
            > Ctrl-V U. It accept four [hex]digit only. Did I miss something?

            You must use an uppercase U (i.e., Shift-u), not a lowercase u. See
            ":help i_CTRL-V_digit" again.
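The lone '0' is consistent with the lowercase form consuming only four digits. Below is a minimal Python model of that behaviour, assuming (per ":help i_CTRL-V_digit") that `u` takes at most 4 hex digits, `U` at most 8, and that any leftover keys are inserted as ordinary text. `ctrl_v_unicode` is a hypothetical stand-in, not Vim's actual input code.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def ctrl_v_unicode(typed: str) -> str:
    """Model the text produced by the keys typed after Ctrl-V (u or U plus digits)."""
    limit = {"u": 4, "U": 8}[typed[0]]
    digits = ""
    for ch in typed[1:]:
        if len(digits) == limit or ch not in HEX_DIGITS:
            break
        digits += ch
    if not digits:
        raise ValueError("at least one hex digit is required")
    rest = typed[1 + len(digits):]          # keys left over after the number
    return chr(int(digits, 16)) + rest      # chr() rejects values above 0x10FFFF


if __name__ == "__main__":
    for keys in ("u20000", "U20000 "):
        result = ctrl_v_unicode(keys)
        print(keys, "->", [f"U+{ord(c):04X}" for c in result])
```

With a lowercase u, the first four digits give U+2000 (EN QUAD, which renders as blank space) and the final 0 is inserted literally, which would look like "only a '0'" on screen.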

            >
            > BTW, yes I can input no BMP charactes useing Gcin.
            > http://www.csie.nctu.edu.tw/~cp76/gcin/
            >
            > Thanks.
            >
            >
            >
            > Edward
            >
            >
            >

            Best regards,
            Tony.
          • Edward G.J. Lee
            Message 5 of 18 , Feb 27, 2006
              Thanks Tony,

              On Tue, Feb 28, 2006, A. J. Mechelynck wrote:
              >
              > The 'guifontwide' must be exactly the same height as the 'guifont', and
              > twice its width. This is not the case here: you have selected a
              > 13-point-high 'guifont' but a 15-point-high 'guifontwide'.
              >
              > Try
              > if has("gui_running")
              > set guifontwide=
              > if has("gui_gtk2")
              > set guifont=DFSongStd\ 15
              > elseif has("gui_kde")
              > set guifont=DFSongStd/15
              > elseif has("x11")
              > " I'm not sure to which value to set it, but
              > " it will be long. Maybe something like
              > " the following (untested)
              > exe "set guifont=-*-dfsongstd-medium-r-normal"
              > \ . "-*-*-250-*-*-m-*-*"
              > else
              > set guifont=DFSongStd:h15:cDEFAULT
              > endif
              > endif
              >
              > and see what happens. Can it display "high" Unicode? Can it display
              > "base plane" Chinese? Can it display English?

              It still cannot display U+20000; it displays a question mark.

              Can it display "high" Unicode? Do you mean non-BMP? No.
              Can it display "base plane" Chinese? Yes.
              Can it display English? Yes.

              It can display BMP Unicode only.

              BTW, I can use gedit/leafpad/mined to edit the same file, and
              they can display U+20000 using DFSongStd.

              > You must use an uppercase U (i.e., Shift-u), not a lowercase u. See
              > ":help i_CTRL-V_digit" again.

              Oops, my fault. But it still gives me a question mark.



              Edward
            • A. J. Mechelynck
              Message 6 of 18 , Feb 27, 2006
                Edward G.J. Lee wrote:
                > Thanks Tony,
                >
                > On Tue, Feb 28, 2006, A. J. Mechelynck wrote:
                >> The 'guifontwide' must be exactly the same height as the 'guifont', and
                >> twice its width. This is not the case here: you have selected a
                >> 13-point-high 'guifont' but a 15-point-high 'guifontwide'.
                >>
                >> Try
                >> if has("gui_running")
                >> set guifontwide=
                >> if has("gui_gtk2")
                >> set guifont=DFSongStd\ 15
                >> elseif has("gui_kde")
                >> set guifont=DFSongStd/15
                >> elseif has("x11")
                >> " I'm not sure to which value to set it, but
                >> " it will be long. Maybe something like
                >> " the following (untested)
                >> exe "set guifont=-*-dfsongstd-medium-r-normal"
                >> \ . "-*-*-250-*-*-m-*-*"
                >> else
                >> set guifont=DFSongStd:h15:cDEFAULT
                >> endif
                >> endif
                >>
                >> and see what happens. Can it display "high" Unicode? Can it display
                >> "base plane" Chinese? Can it display English?
                >
                > Still cannot display U+20000, it display a question mark.
                >
                > Can it display "high" Unicode? Do you mean non BMP? no.
                > Can it display "base plane" Chinese? yes.
                > Can it display English? yes.
                >
                > It can display BMP Unicode only.
                >
                > BTW, I can use gedit/leafpad/mined to edit the same file and
                > can display U+20000 useing DFSongStd.
                [...]

                Hm. IIUC this means that DFSongStd has got BMP Chinese and ASCII. Which
                GUI flavour have you got? If it's GTK+2, ":help guifontwide-gtk2" says
                that if you leave 'guifontwide' empty, Pango/Xft will choose a character
                in another font for any character not available in your 'guifont'. I
                suspect that gedit etc. do something similar. IIRC I have seen Firefox
                displaying HTML pages with characters borrowed from different fonts.

                Do you have other Traditional Chinese fonts? Under W32, I use MingLiU;
                it displays ideograms from U+20000 to U+2FA1D as double-wide question
                marks in blue (not in black like "ordinary" CJK characters); but
                "unknown" base plane ideograms like, for instance those from U+FA30 to
                U+FAD9 or from U+3400 to U+4DB5, are simply displayed as double-wide
                spaces. Hmmm-mm-mm... maybe we have found a bug or a limitation in gvim.
                In fact, I think I vaguely remember something Bram said some time back.

                Bram, is gvim capable of displaying Unicode codepoints higher than
                U+FFFF as something else than a double-wide question mark in SpecialKey
                highlight? (Assuming that 'encoding' is UTF-8, and that the 'guifont'
                has them) If it isn't, how hard would it be to lift this limitation?


                Best regards,
                Tony.
              • Edward G.J. Lee
                Message 7 of 18 , Feb 27, 2006
                  On Tue, Feb 28, 2006, A. J. Mechelynck wrote:
                  > Edward G.J. Lee wrote:
                  > >
                  > > Still cannot display U+20000, it display a question mark.
                  > >
                  > > Can it display "high" Unicode? Do you mean non BMP? no.
                  > > Can it display "base plane" Chinese? yes.
                  > > Can it display English? yes.
                  > >
                  > > It can display BMP Unicode only.
                  > >
                  > > BTW, I can use gedit/leafpad/mined to edit the same file and
                  > > can display U+20000 useing DFSongStd.
                  > [...]
                  >
                  > Hm. IIUC this means that DFSongStd has got BMP Chinese and ASCII. Which
                  > GUI flavour have you got? If it's GTK+2, ":help guifontwide-gtk2" says
                  > that if you leave 'guifontwide' empty, Pango/Xft will choose a character
                  > in another font for any character not available in your 'guifont'. I
                  > suspect that gedit etc. do something similar. IIRC I have seen Firefox
                  > displaying HTML pages with characters borrowed from different fonts.

                  Yes, that's what I think, but gvim doesn't seem to do this correctly.
                  I'm using the GTK2 GUI. My Vim version is:

                  VIM - Vi IMproved 7.0aa ALPHA (2006 Feb 21, compiled Feb 23 2006
                  12:05:43)

                  > Do you have other Traditional Chinese fonts? Under W32, I use MingLiU;
                  > it displays ideograms from U+20000 to U+2FA1D as double-wide question
                  > marks in blue (not in black like "ordinary" CJK characters); but
                  > "unknown" base plane ideograms like, for instance those from U+FA30 to
                  > U+FAD9 or from U+3400 to U+4DB5, are simply displayed as double-wide
                  > spaces. Hmmm-mm-mm... maybe we have found a bug or a limitation in gvim.
                  > In fact, I think I vaguely remember something Bram said some time back.

                  My MingLiU fonts (Ver 3.21 and Ver 5.03) are not MS UCS-4 encoded fonts;
                  they only have glyphs in the BMP. But I also tested sursong.ttf
                  (Simsun (Founder Extended)), Sun-ExtA/Sun-ExtB[1] and the Han Nom font[2],
                  and gvim still cannot display the characters.

                  So my guess is that this is not a font problem.

                  Thanks for the help.



                  Edward
                  [1] http://okuc.net/software/UniFonts.exe
                  [2] http://vietunicode.sourceforge.net/fonts/fonts_hannom.html
                • A. J. Mechelynck
                  Message 8 of 18 , Feb 28, 2006
                    Edward G.J. Lee wrote:
                    [...]
                    > My MingLiU(Ver 3.21 and Ver 5.03) are not MS UCS4 encoding font.
                    > They only have glyphs in BMP. But I try to test sursong.ttf(
                    > Simsun (Founder Extended)), Sun-ExtA/Sun-ExtB[1] and Han Nom font[2]
                    > still cannot display.
                    >
                    > So my guess, this is not font's problem.
                    >
                    > Thanks for the help.
                    >
                    >
                    >
                    > Edward
                    > [1] http://okuc.net/software/UniFonts.exe
                    > [2] http://vietunicode.sourceforge.net/fonts/fonts_hannom.html


                    Here, all CJK fonts that I have, whether Korean, Japanese, Traditional
                    Chinese or Simplified Chinese, all display (in gvim) double-wide blue
                    question marks for any ideograms outside the BMP. Let's wait and see
                    what Bram has to say about it.


                    Best regards,
                    Tony.
                  • Bram Moolenaar
                    Message 9 of 18 , Feb 28, 2006
                      Tony Mechelynck wrote:

                      > Edward G.J. Lee wrote:
                      > [...]
                      > > My MingLiU(Ver 3.21 and Ver 5.03) are not MS UCS4 encoding font.
                      > > They only have glyphs in BMP. But I try to test sursong.ttf(
                      > > Simsun (Founder Extended)), Sun-ExtA/Sun-ExtB[1] and Han Nom font[2]
                      > > still cannot display.
                      > >
                      > > So my guess, this is not font's problem.
                      > >
                      > > Thanks for the help.
                      > >
                      > > Edward
                      > > [1] http://okuc.net/software/UniFonts.exe
                      > > [2] http://vietunicode.sourceforge.net/fonts/fonts_hannom.html
                      >
                      >
                      > Here, all CJK fonts that I have, whether Korean, Japanese, Traditional
                      > Chinese or Simplified Chinese, all display (in gvim) double-wide blue
                      > question marks for any ideograms outside the BMP. Let's wait and see
                      > what Bram has to say about it.

                      I don't have anything to say about this. I'm not aware of restrictions
                      in the code to 16 bit characters, but the GTK code is complex and full
                      of hacks (to be able to use proportionally spaced fonts, to work around
                      bugs in pango, etc.). It requires an expert to look into this.

                      It might be that other applications use font replacement to display
                      characters that aren't actually in the font. I'm not sure what happens
                      for Vim.

                      --
                      A)bort, R)etry, D)o it right this time

                      /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
                      /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
                      \\\ download, build and distribute -- http://www.A-A-P.org ///
                      \\\ help me help AIDS victims -- http://www.ICCF.nl ///
                    • A. J. Mechelynck
                      Message 10 of 18 , Feb 28, 2006
                        Bram Moolenaar wrote:
                        > Tony Mechelynck wrote:
                        >
                        >> Edward G.J. Lee wrote:
                        >> [...]
                        >>> My MingLiU(Ver 3.21 and Ver 5.03) are not MS UCS4 encoding font.
                        >>> They only have glyphs in BMP. But I try to test sursong.ttf(
                        >>> Simsun (Founder Extended)), Sun-ExtA/Sun-ExtB[1] and Han Nom font[2]
                        >>> still cannot display.
                        >>>
                        >>> So my guess, this is not font's problem.
                        >>>
                        >>> Thanks for the help.
                        >>>
                        >>> Edward
                        >>> [1] http://okuc.net/software/UniFonts.exe
                        >>> [2] http://vietunicode.sourceforge.net/fonts/fonts_hannom.html
                        >>
                        >> Here, all CJK fonts that I have, whether Korean, Japanese, Traditional
                        >> Chinese or Simplified Chinese, all display (in gvim) double-wide blue
                        >> question marks for any ideograms outside the BMP. Let's wait and see
                        >> what Bram has to say about it.
                        >
                        > I don't have anything to say about this. I'm not aware of restrictions
                        > in the code to 16 bit characters, but the GTK code is complex and full
                        > of hacks (to be able to use proportinally spaced fonts, to work around
                        > bugs in pango, etc.). It requires an expert to look into this.
                        >
                        > It might be that other applications use font replacement to display
                        > characters that aren't actually in the font. I'm not sure what happens
                        > for Vim.
                        >

                        It's not only GTK: I get the same symptoms on Windows: any CJK character
                        above U+FFFF is shown in gvim (using the default highlights and 'syntax'
                        set to something nonexistent, e.g., ":set syntax=nononono") as a
                        double-wide _blue_ question mark in any CJK font. Characters not in the
                        font but below U+FFFF are displayed in a font-specific way, e.g. as a
                        double-wide space in MingLiU (a Traditional Chinese font) or in NSimSun
                        (a Simplified Chinese font), as a bullet in MsGothic (a Japanese font),
                        as a kind of small "carpenter's square" in GulimChe (a Korean font), etc.

                        In Courier_New, which is not an East-Asian font, I see hollow squares
                        occupying the left half of a double-wide character cell, highlighted in
                        black below U+FFFF, in blue above it.

                        -- The range used outside the base plane for CJK ideograms is at U+20000
                        to U+2FA1D. Most of these codepoints are defined but I haven't checked
                        them all. So if you want to try and reproduce this, just hit (e.g.)
                        ^VU20000 ^VU20001 (etc.) ^VU2FA1D (in Insert mode in a [NoName] buffer).
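To see how much of that range is actually defined without checking by hand, the Unicode database shipped with Python can be queried. This is only a sketch: the result depends on the Unicode version bundled with the interpreter, and `is_assigned` and `count_assigned` are names chosen here for illustration.

```python
import unicodedata

FIRST, LAST = 0x20000, 0x2FA1D  # range quoted in the message above


def is_assigned(cp: int) -> bool:
    """True if this Python's Unicode database assigns the codepoint (category != Cn)."""
    return unicodedata.category(chr(cp)) != "Cn"


def count_assigned(first: int = FIRST, last: int = LAST) -> int:
    """Count assigned codepoints in the inclusive range first..last."""
    return sum(is_assigned(cp) for cp in range(first, last + 1))


if __name__ == "__main__":
    total = LAST - FIRST + 1
    print(f"{count_assigned()} of {total} codepoints are assigned")
    print(f"Unicode database version: {unicodedata.unidata_version}")
```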


                        Best regards,
                        Tony.
                      • Bram Moolenaar
                        Message 11 of 18 , Feb 28, 2006
                          Tony Mechelynck wrote:

                          > It's not only GTK: I get the same symptoms on Windows: any CJK character
                          > above U+FFFF is shown in gvim (using the default highlights and 'syntax'
                          > set to something nonexistent, e.g., ":set syntax=nononono") as a
                          > double-wide _blue_ question mark in any CJK font. Characters not in the
                          > font but below U+FFFF are displayed in a font-specific way, e.g. as a
                          > double-wide space in MingLiU (a Traditional Chinese font) or in NSimSun
                          > (a Simplified Chinese font), as a bullet in MsGothic (a Japanese font),
                          > as a kind of small "carpenter's square" in GulimChe (a Korean font), etc.

                          That is to be expected, Vim only supports 16 bit characters for Win32.
                          MS-Windows has the lousy UTF-16 solution for the rest, that hasn't been
                          implemented yet. I expect this to get very messy...
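For reference, UTF-16 represents each codepoint above U+FFFF as a pair of 16-bit surrogate code units, so a 16-bit-per-character design cannot hold such a character in one unit. The Python sketch below shows the standard arithmetic; `to_surrogates` is a hypothetical helper and says nothing about how Vim would implement this.

```python
from typing import Tuple


def to_surrogates(cp: int) -> Tuple[int, int]:
    """Split a supplementary-plane codepoint into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only codepoints above U+FFFF need a surrogate pair")
    offset = cp - 0x10000                    # 20-bit value
    high = 0xD800 + (offset >> 10)           # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)          # bottom 10 bits
    return high, low


if __name__ == "__main__":
    high, low = to_surrogates(0x20000)
    print(f"U+20000 -> {high:04X} {low:04X}")
```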

                          --
                          hundred-and-one symptoms of being an internet addict:
                          3. Your bookmark takes 15 minutes to scroll from top to bottom.

                        • Edward G.J. Lee
                          Message 12 of 18 , Feb 28, 2006
                            Hello Bram,

                            On Tue, Feb 28, 2006, Bram Moolenaar wrote:
                            >
                            > That is to be expected, Vim only supports 16 bit characters for Win32.
                            > MS-Windows has the lousy UTF-16 solution for the rest, that hasn't been
                            > implemented yet. I expect this to get very messy...

                            At least on a GNU/Linux or *BSD box, console vim (not the GUI)
                            should display characters beyond U+FFFF correctly in a UTF-8
                            terminal when an X font with full Unicode coverage is installed. Am
                            I right?

                            My problem is that it can't.

                            Do you have any idea? Thanks.



                            Edward
                          • Bram Moolenaar
                            Message 13 of 18 , Feb 28, 2006
                              Edward G.J. Lee wrote:

                              > On Tue, Feb 28, 2006, Bram Moolenaar wrote:
                              > >
                              > > That is to be expected, Vim only supports 16 bit characters for Win32.
                              > > MS-Windows has the lousy UTF-16 solution for the rest, that hasn't been
                              > > implemented yet. I expect this to get very messy...
                              >
                              > At least under GNU/Linux or *BSD box, the console vim(not GUI)
                              > should display beyond U+FFFF characters correctly in UTF-8
                              > terminal with full Unicode support installed font of X. Am
                              > I right?
                              >
                              > My problem is it can't.
                              >
                              > Do you have any idea? Thanks.

                              Oh, I forgot something. The structures used for the screen are limited
                              to 16 bit, because there were no fonts for other characters. If you say
                              that you can actually display characters above 0x10000 I'll have to
                              change that.

                              Do we need three or four bytes? We'll probably need to use four bytes
                              anyway, since there is no data type for three bytes.
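The three-versus-four question can be checked with a little arithmetic. This Python sketch computes the minimum whole bytes needed to store a given maximum codepoint; `bytes_needed` is an illustrative name, and the limits tested are the 16-bit maximum, the top of the Unicode range, and the 7FFFFFFF maximum mentioned earlier in the thread for Ctrl-V U.

```python
def bytes_needed(max_codepoint: int) -> int:
    """Smallest number of whole bytes that can hold values up to max_codepoint."""
    return (max_codepoint.bit_length() + 7) // 8


if __name__ == "__main__":
    for limit in (0xFFFF, 0x10FFFF, 0x7FFFFFFF):
        bits = limit.bit_length()
        print(f"0x{limit:X}: {bits} bits, {bytes_needed(limit)} byte(s)")
```

Three bytes would suffice for the values themselves, but since common C integer types come in 2- and 4-byte sizes, four bytes is the practical choice, as suggested above.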

                              Since using these characters is rare, I'll probably have to make it a
                              configuration option to avoid wasting memory. There also still is a
                              todo item to support more than 2 combining characters. We may end up
                              using 20 bytes per screen position.... The number of combining
                              characters could be an option, but doing that for the number of bytes
                              per character would be complicated. That probably has to be a feature,
                              thus decided at compile time.
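To get a feel for the memory trade-off, the per-screen cost can be estimated for different cell sizes. This Python sketch is only rough: the 24x80 screen is an assumption made here, the 20-byte figure is simply the number quoted above, and `screen_buffer_bytes` is a hypothetical helper that ignores any other per-cell data Vim keeps.

```python
def screen_buffer_bytes(rows: int, cols: int, bytes_per_cell: int) -> int:
    """Total bytes for a screen buffer with a fixed per-cell size."""
    return rows * cols * bytes_per_cell


if __name__ == "__main__":
    rows, cols = 24, 80  # assumed terminal size, for illustration only
    for per_cell in (2, 4, 20):
        total = screen_buffer_bytes(rows, cols, per_cell)
        print(f"{per_cell:2d} bytes/cell -> {total} bytes per {rows}x{cols} screen")
```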

                              --
                              hundred-and-one symptoms of being an internet addict:
                              8. You spend half of the plane trip with your laptop on your lap...and your
                              child in the overhead compartment.

                            • Edward G.J. Lee
                              Dear Bram, ... Yes, I can display U+20000..U+2A6DF correctly in my gnome-terminal. I have a simple Ruby script to generate all those characters,
                              Message 14 of 18 , Mar 1, 2006
                                Dear Bram,

                                On Tue, Feb 28, 2006, Bram Moolenaar wrote:

                                > Oh, I forgot something. The structures used for the screen are limited
                                > to 16 bit, because there were no fonts for other characters. If you say
                                > that you can actually display characters above 0x10000 I'll have to
                                > change that.

                                Yes, I can display U+20000..U+2A6DF correctly in my gnome-terminal.
                                I have a simple Ruby script to generate all those characters,

                                http://edt1023.sayya.org/ruby/u.rb
                                http://edt1023.sayya.org/ruby/tmp/cjkextb.png
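For readers who cannot reach the script above, here is a minimal sketch of the same idea. It is written in Python rather than Ruby and is not a port of u.rb; the names `ext_b_lines` and `write_ext_b` and the 16-per-line layout are assumptions made for illustration. It emits every code point in the range U+20000..U+2A6DF quoted in the message, which is handy for checking whether a font and terminal can show them.

```python
# Hypothetical stand-in for the linked Ruby script, not a copy of it.
FIRST = 0x20000  # start of the range quoted in the message
LAST = 0x2A6DF   # end of the range quoted in the message


def ext_b_lines(per_line=16):
    """Yield text lines, each prefixed with the code point of its first character."""
    for start in range(FIRST, LAST + 1, per_line):
        end = min(start + per_line, LAST + 1)
        chars = "".join(chr(cp) for cp in range(start, end))
        yield f"U+{start:05X} {chars}"


def write_ext_b(path):
    """Write the whole range to a UTF-8 file (4 bytes per character)."""
    with open(path, "w", encoding="utf-8") as f:
        for line in ext_b_lines():
            f.write(line + "\n")
```

Calling `write_ext_b("cjkextb.txt")` and opening the result in the editor under test shows quickly whether the glyphs render or come out as boxes.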

                                > Do we need three or four bytes? We'll probably need to use four bytes
                                > anyway, since there is no data type for three bytes.

                                We need four bytes, I think. We need to cover the Unicode range from
                                0x10000 to 0x10FFFF.

                                > Since using these characters is rare, I'll probably have to make it a
                                > configuration option to avoid wasting memory. There also still is a
                                > todo item to support more than 2 combining characters. We may end up
                                > using 20 bytes per screen position.... The number of combining
                                > characters could be an option, but doing that for the number of bytes
                                > per character would be complicated. That probably has to be a feature,
                                > thus decided at compile time.

                                I have to admit that those characters are rarely used in an ordinary
                                article. But the problem is people's names in the CJKV area, especially
                                Chinese names. They may use characters in Unicode CJK Unified
                                Ideographs Extension B, and I have to type the names correctly.

                                I'm also making an XIM input table for Chinese, and as you may know,
                                the table needs to include every character in Extension B.
                                So I need a familiar editor to type those characters and their keys.

                                Another example is LaTeX CJK. The CVS version of LaTeX CJK now has
                                full support for the Unicode range, and I need to edit the example
                                for testing:

                                http://edt1023.sayya.org/tex/tmp/nobmp2.tex
                                http://edt1023.sayya.org/tex/tmp/nobmp2.pdf

                                So it would be great to support CJK Unified Ideographs Extension B
                                as an option in Vim. Thanks in advance.



                                Edward
                              • Nikolai Weibull
                                Message 15 of 18 , Mar 1, 2006
                                  On 3/1/06, Edward G.J. Lee <edt1023@...> wrote:
                                  > We need four bytes, I think. We need to cover the Unicode range from
                                  > 0x10000 to 0x10FFFF.

                                  We need ceil(log2(0x10FFFF)) = 21 bits, or, more realistically, 24
                                  bits, or, even more realistically, 32 bits. I don't think we need to
                                  worry about memory consumption for the display of characters though,
                                  at least on any modern system. Perhaps the MS-DOS port needs special
                                  treatment...
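The arithmetic above can be checked with a short sketch. The helper names `bits_needed` and `storage_width` are made up for illustration, and the list of candidate widths is an assumption about which integer sizes a C compiler typically offers. It has nothing to do with Vim's actual source.

```python
import math

MAX_CODE_POINT = 0x10FFFF  # highest Unicode code point


def bits_needed(max_value):
    """Number of bits required to store any value in 0..max_value."""
    return max_value.bit_length()


def storage_width(bits, widths=(8, 16, 32, 64)):
    """Smallest commonly available integer width (an assumed list) that holds `bits`."""
    for width in widths:
        if bits <= width:
            return width
    raise ValueError("no integer type is wide enough")


if __name__ == "__main__":
    bits = bits_needed(MAX_CODE_POINT)
    # The ceil(log2(...)) form from the message gives the same count here.
    print(bits, math.ceil(math.log2(MAX_CODE_POINT)), storage_width(bits))
```

Three bytes (24 bits) would be enough, but with no native three-byte integer type the next usable size is 32 bits, which is the point both messages make.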

                                  nikolai
                                • Bram Moolenaar
                                  Message 16 of 18 , Mar 6, 2006
                                    I have made changes to the code to use 32 bits for storing Unicode
                                    characters. It's included in last night's snapshot.

                                    I have no way to try it out. It's not unlikely that there are a few
                                    problems.

                                    For Win32 I changed the conversion from UTF-8 to UCS-2 to produce
                                    UTF-16. I don't know if that is sufficient for drawing the characters.
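To illustrate why UCS-2 is not enough here: UCS-2 has a single 16-bit unit per character, so anything at or above U+10000 has to be written in UTF-16 as a surrogate pair. Below is a minimal sketch of that standard calculation. The function name `to_surrogate_pair` is hypothetical, and this is not the code that went into the Vim snapshot.

```python
def to_surrogate_pair(cp):
    """Split a code point in U+10000..U+10FFFF into UTF-16 high and low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in the supplementary planes")
    offset = cp - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


if __name__ == "__main__":
    high, low = to_surrogate_pair(0x20000)  # first Extension B code point
    print(f"U+20000 -> {high:04X} {low:04X}")
```

Whether the Win32 drawing functions then render the pair as one glyph is a separate question, which is the uncertainty the message raises.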

                                    GTK2 does everything with UTF-8, thus it should work as it is.

                                    I also added 'maxcombine' to support up to 6 combining characters.
                                    That's enough for everyone, right?

                                    --
                                    hundred-and-one symptoms of being an internet addict:
                                    51. You put a pillow case over your laptop so your lover doesn't see it while
                                    you are pretending to catch your breath.

                                  • Christian MICHON
                                    Message 17 of 18 , Sep 29, 2006
                                      On 3/6/06, Bram Moolenaar <Bram@...> wrote:
                                      >
                                      > GTK2 does everything with UTF-8, thus it should work as it is.
                                      >

                                      I know this is a bit late, but I tried today to make a custom gtk2
                                      gvim build without multibyte, and I apparently cannot unless I
                                      heavily change the code around the utf8 stuff...

                                      The problem I face is that all my previous builds were gtk-1.2.10
                                      without multibyte, and when I cut and paste from gvim 7
                                      to one of those gtk1 builds, I get nasty \@utf8\@ chains
                                      embedded in whatever I copy.

                                      Is there a fix, or is this the expected gtk2 behaviour?

                                      --
                                      Christian
                                    • A.J.Mechelynck
                                      Message 18 of 18 , Sep 29, 2006
                                        Christian MICHON wrote:
                                        > On 3/6/06, Bram Moolenaar <Bram@...> wrote:
                                        >>
                                        >> GTK2 does everything with UTF-8, thus it should work as it is.
                                        >>
                                        >
                                        > I know this is a bit late, but I tried today to make a custom gtk2
                                        > gvim build without multibyte, and I apparently cannot unless I
                                        > heavily change the code around the utf8 stuff...
                                        >
                                        > The problem I face is that all my previous builds were gtk-1.2.10
                                        > without multibyte, and when I cut and paste from gvim 7
                                        > to one of those gtk1 builds, I get nasty \@utf8\@ chains
                                        > embedded in whatever I copy.
                                        >
                                        > Is there a fix, or is this the expected gtk2 behaviour?
                                        >

                                        GTK2 gvim does all its I/O in UTF-8. Therefore, IIUC, you cannot have both
                                        +gui_gtk2 and -multibyte.

                                        You might try setting 'encoding' to latin1 in the GTK2 build to see if it
                                        makes a difference. Of course, there's no way you can paste codepoints >
                                        U+00FF into an 8-bit build of Vim, except as multi-byte gibberish.
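A minimal sketch of what "multi-byte gibberish" means in practice, assuming the receiving side simply treats each incoming UTF-8 byte as one Latin-1 character. The helper name `misread_as_latin1` is made up for illustration, and this only models the byte-level effect, not what any particular Vim build does with the clipboard.

```python
def misread_as_latin1(text):
    """Encode text as UTF-8, then read the bytes back as if each were a Latin-1 character."""
    return text.encode("utf-8").decode("latin-1")


if __name__ == "__main__":
    for ch in ("\u4e2d", chr(0x20000)):  # a BMP ideograph and an Extension B one
        garbled = misread_as_latin1(ch)
        # One character turns into 3 or 4 unrelated 8-bit characters.
        print(f"U+{ord(ch):04X} -> {len(garbled)} chars: {garbled!r}")
```

Plain ASCII passes through unchanged, which is why the damage only shows up once non-ASCII text is copied.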

                                        The long-term fix, of course, is to stop using those earlier -multibyte
                                        builds. 8-bit _files_ should be compatible between + and - multibyte builds
                                        anyway.


                                        Best regards,
                                        Tony.