Re: About Unicode CJK Unified Extension B
- Dear Bram,
On Tue, Feb 28, 2006, Bram Moolenaar wrote:
> Oh, I forgot something. The structures used for the screen are limited
> to 16 bit, because there were no fonts for other characters. If you say
> that you can actually display characters above 0x10000 I'll have to
> change that.
Yes, I can display U+20000..U+2A6DF correctly in my gnome-terminal.
I have a simple Ruby script to generate all those characters,
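The Ruby script itself is not shown in the thread. As a rough illustration only, a Python sketch of the same idea could look like the following; the function names are hypothetical, and the range is the U+20000..U+2A6DF block mentioned above.

```python
# Sketch: produce every code point in the range U+20000..U+2A6DF
# (the range quoted in the message above).
EXT_B_FIRST = 0x20000
EXT_B_LAST = 0x2A6DF


def ext_b_chars():
    """Yield each character of the range in code point order."""
    for cp in range(EXT_B_FIRST, EXT_B_LAST + 1):
        yield chr(cp)


def ext_b_lines(per_line=32):
    """Group the characters into lines of `per_line` for easier viewing."""
    chars = list(ext_b_chars())
    return ["".join(chars[i:i + per_line])
            for i in range(0, len(chars), per_line)]
```

Writing "\n".join(ext_b_lines()) to a UTF-8 terminal or file gives a quick test pattern; every character in it takes four bytes in UTF-8.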
> Do we need three or four bytes? We'll probably need to use four bytes
> anyway, since there is no data type for three bytes.
We need four bytes, I think. We need to cover the Unicode range from
0x10000 to 0x10FFFF.
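To make the "three or four bytes" question concrete, here is a small Python sketch of the arithmetic. The helper names are made up for illustration, and the list of integer widths (1, 2, 4, 8 bytes) is an assumption about typical C data types, not something taken from the Vim source.

```python
MAX_CODE_POINT = 0x10FFFF  # highest Unicode code point


def bits_needed(cp):
    """Number of bits required to store the code point as a plain integer."""
    return cp.bit_length()


def storage_bytes(bits):
    """Smallest common integer width (in bytes) that holds `bits` bits."""
    for size in (1, 2, 4, 8):  # assumed widths; there is no 3-byte type
        if bits <= size * 8:
            return size
    raise ValueError("too many bits: %d" % bits)


print(bits_needed(0xFFFF))                         # 16
print(bits_needed(MAX_CODE_POINT))                 # 21
print(storage_bytes(bits_needed(MAX_CODE_POINT)))  # 4
```

So a 16-bit cell covers everything up to U+FFFF, while anything in 0x10000..0x10FFFF needs 17 to 21 bits and therefore ends up in a 4-byte integer.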
> Since using these characters is rare, I'll probably have to make it a
> configuration option to avoid wasting memory. There also still is a
> todo item to support more than 2 combining characters. We may end up
> using 20 bytes per screen position.... The number of combining
> characters could be an option, but doing that for the number of bytes
> per character would be complicated. That probably has to be a feature,
> thus decided at compile time.
I have to admit that those characters are rarely used in an ordinary
article. But the problem is people's names in the CJKV area, especially
Chinese names. They may use characters in Unicode CJKV Unified
Extension B, and I have to type the names correctly.
And I'm making an XIM input table for Chinese; as you may know,
the table needs to include every character in Extension B.
So I need a familiar editor to type those characters and their keys.
Another example is LaTeX CJK. The CVS version of LaTeX CJK now has
full support for the Unicode range, and I need to edit the example
So it would be great to support CJKV Unified Extension B as an option
in Vim. Thanks in advance.
- On 3/1/06, Edward G.J. Lee <edt1023@...> wrote:
> We need four bytes, I think? We need cover the Unicode range from
> 0x10000 to 0x10FFFF.
We need ceil(log2(0x10FFFF)) = 21 bits, or, more realistically, 24
bits, or, even more realistically, 32 bits. I don't think we need to
worry about memory consumption for the display of characters though.
At least on any modern system. Perhaps the MS-DOS port needs special
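As a back-of-the-envelope check of the memory argument, the sketch below estimates the size of one screen buffer. The layout (one fixed-width slot for the base character plus a fixed number of equally wide slots for combining characters) and the screen sizes are assumptions for illustration, not Vim's actual data structures.

```python
def screen_buffer_bytes(rows, cols, bytes_per_char, combining=0):
    """Bytes for one screen buffer: each cell holds one base character
    plus `combining` extra slots, all `bytes_per_char` wide."""
    return rows * cols * bytes_per_char * (1 + combining)


# Classic 24x80 terminal, base characters only.
print(screen_buffer_bytes(24, 80, 2))     # 3840 bytes with 16-bit cells
print(screen_buffer_bytes(24, 80, 4))     # 7680 bytes with 32-bit cells

# The same terminal with two combining slots per cell.
print(screen_buffer_bytes(24, 80, 4, 2))  # 23040 bytes
```

Under these assumptions, going from 16 to 32 bits doubles the buffer, but the absolute numbers stay in the kilobyte range for ordinary terminal sizes.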
- I have made changes to the code to use 32 bits for storing Unicode
characters. It's included in last night's snapshot.
I have no way to try it out. It's not unlikely that there are a few
For Win32 I changed the conversion from UTF-8 to UCS-2 to produce
UTF-16. I don't know if that is sufficient for drawing the characters.
GTK2 does everything with UTF-8, thus it should work as it is.
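For readers unfamiliar with the difference between UCS-2 and UTF-16: a character above U+FFFF does not fit in one 16-bit unit, so UTF-16 stores it as a surrogate pair. The following Python sketch shows that standard calculation; it is not the Win32 code from the snapshot, and the function names are hypothetical.

```python
def to_utf16_units(cp):
    """Return the list of 16-bit UTF-16 code units for one code point."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value: %#x" % cp)
    if cp < 0x10000:
        return [cp]                # fits in one unit, same as UCS-2
    v = cp - 0x10000               # 20 bits remain
    high = 0xD800 + (v >> 10)      # upper 10 bits -> high surrogate
    low = 0xDC00 + (v & 0x3FF)     # lower 10 bits -> low surrogate
    return [high, low]


def utf8_to_utf16_units(data):
    """Decode UTF-8 bytes and return the equivalent UTF-16 code units."""
    units = []
    for ch in data.decode("utf-8"):
        units.extend(to_utf16_units(ord(ch)))
    return units


print([hex(u) for u in to_utf16_units(0x20000)])  # ['0xd840', '0xdc00']
```

Whether a given drawing API renders such a pair as a single glyph is a separate question, which is the uncertainty expressed above.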
I also added 'maxcombine' to support up to 6 combining characters.
That's enough for everyone, right?
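To illustrate what a limit on combining characters means, here is a Python sketch that groups text into screen cells of one base character plus at most `max_combine` combining characters. It assumes Python's `unicodedata.combining()` as the test for a combining character and simply drops any excess marks; both choices are assumptions of this sketch and may differ from what Vim does.

```python
import unicodedata


def split_cells(text, max_combine=6):
    """Group `text` into (base, [combining...]) cells.

    A character with a non-zero canonical combining class is attached to
    the preceding cell; marks beyond `max_combine` are dropped here.
    """
    cells = []
    for ch in text:
        if unicodedata.combining(ch) and cells:
            base, marks = cells[-1]
            if len(marks) < max_combine:
                marks.append(ch)
        else:
            cells.append((ch, []))
    return cells


# "e" followed by a combining acute accent, then a plain "x".
print(split_cells("e\u0301x"))
```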
hundred-and-one symptoms of being an internet addict:
51. You put a pillow case over your laptop so your lover doesn't see it while
you are pretending to catch your breath.
/// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
/// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\ download, build and distribute -- http://www.A-A-P.org ///
\\\ help me help AIDS victims -- http://www.ICCF.nl ///
- On 3/6/06, Bram Moolenaar <Bram@...> wrote:
> GTK2 does everything with UTF-8, thus it should work as it is.
I know this is a bit late, but I tried today to make a custom gtk2
gvim build without multibyte, and I apparently cannot unless I
heavily change the code around the utf8 stuff...
The problem I face is that all my previous builds were gtk-1.2.10
without multibyte, and when I cut and paste from gvim 7
to one of those gtk1 builds, I get nasty \@utf8\@ chains
embedded in whatever I copy.
Is there a fix, or is this the expected gtk2 behaviour?
- Christian MICHON wrote:
> On 3/6/06, Bram Moolenaar <Bram@...> wrote:
>> GTK2 does everything with UTF-8, thus it should work as it is.
> I know this is a bit late, but I tried today to make a custom gtk2
> gvim build without multibyte, and I apparently cannot unless I
> change heavily the code around the utf8 stuff...
> The problem I face is that all my previous builds were gtk-1.2.10
> without multibyte, and when I do cut and paste from gvim 7
> to one of those gtk1 build, I get nasty \@utf8\@ chains
> embedded in whatever I copy.
> Is there a fix or this is the expected gtk2 behaviour ?
GTK2 gvim does all its I/O in UTF-8. Therefore, IIUC, you cannot have both
+gui_gtk2 and -multibyte.
You might try setting 'encoding' to latin1 in the GTK2 build to see if it
makes a difference. Of course, there's no way you can paste codepoints >
U+00FF into an 8-bit build of Vim, except as multi-byte gibberish.
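The "multi-byte gibberish" can be reproduced outside Vim. The Python sketch below (hypothetical helper names) reads UTF-8 bytes as if each byte were one latin1 character, which is roughly what a single-byte build sees, and checks whether a string fits in latin1 at all.

```python
def as_seen_by_8bit_build(text):
    """Interpret the UTF-8 bytes of `text` as one latin1 char per byte."""
    return text.encode("utf-8").decode("latin-1")


def fits_in_latin1(text):
    """True if every character of `text` is at or below U+00FF."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


print(fits_in_latin1("\u00ff"))                     # True
print(fits_in_latin1("\u0100"))                     # False
print(len(as_seen_by_8bit_build("\U00020000")))     # 4
```

An Extension B character thus arrives as four unrelated 8-bit characters, and even "é" (U+00E9) turns into the two characters "Ã©" when its UTF-8 bytes are read as latin1.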
The long-term fix, of course, is to stop using those earlier -multibyte
builds. 8-bit _files_ should be compatible between + and - multibyte builds