Re: mbyte.c patch
Jul 17, 2002

On Wed, Jul 17, 2002 at 12:54:07PM +0200, Bram Moolenaar wrote:
> This indeed looks very broken. ImeGetTempComposition() will always
> return NULL. It already was like this in 6.0. I didn't hear specific
> complaints about this. I suppose this means we might as well remove
> this code.

Alright. Patch attached:
Removed HanExtTextOut, bInComposition, ImeGetTempComposition, and the
rest of the now-dead IME composition code.
Removed a now-unneeded pair of braces from gui_mch_draw_string. I
didn't unindent the block of code between them, since that would bloat
the patch with stuff that looks like changes but isn't. (I wish patch
could handle this better.) I'll let you do that.
Remaining problems in the renderer:

SBCS encodings should render with "is_funky_dbcs"; right now we'll get
garbage unless the encoding matches the system.

The Arabic/Hebrew hack isn't always done. If we pass that range of text
to the renderer as a block, we'll get weird results. Even if we don't
have RL support enabled (or even compiled), and we're in Unicode, and the
Hebrew text wouldn't look right to a Hebrew user, we still might be a
regular user editing mixed-language text; we don't want weird things
to happen there.

len isn't set right in is_funky_dbcs; wide characters cause underlining
glitches.

Padding isn't set right in Unicode or is_funky_dbcs, so bold and italic
characters cause spacing glitches.
Also in this patch:

Move padding computation to a function (set_padding).

Move ETO_IGNORELANGUAGE setting out of each individual call, to
make sure it's always used.

Set padding in Unicode.

(I'd have put these in a separate patch, but they're working on the same
block of code.)
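To make the padding change concrete, here's a sketch of the idea (a hypothetical set_padding; the patch's actual function may differ). ExtTextOut takes an optional array of per-character advances (lpDx); filling every entry with the cell width forces bold and italic glyphs, whose natural advances vary, back onto the character grid:

```c
/* Hypothetical sketch of the set_padding idea: build the per-character
 * advance array that ExtTextOut accepts as its lpDx argument.  Forcing
 * every advance to the cell width keeps bold/italic glyphs aligned on
 * the text grid.  "cells" would be 2 for a double-width (DBCS/wide)
 * character, 1 otherwise. */
static void set_padding(int *padding, int len, int char_width, int cells)
{
    int i;

    for (i = 0; i < len; ++i)
        padding[i] = char_width * cells;
}
```

The array would then be passed on every ExtTextOut call, rather than only in some code paths.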
This fixes bold/italics in Unicode, and simplifies the renderer a bit.
I think these problems are ultimately because there are too many code
paths. I'd suggest always converting the string to Unicode at the start
of the function. (At least in NT, it shouldn't be a speed hit, since I
believe the font API will convert to Unicode anyway.) The RL (RevOut)
special case needs to be done in Unicode before this will work. (Which
I have code for, but am holding off on in the interests of keeping this
patch small.)

If this was done, the rest of the above problems would just go away.
> > The only way I can see system conversions causing problems is if round-trip
> > conversion fails. I don't know of any cases of this, but if anyone does I'd
> > definitely like to know.
>
> Perhaps it goes wrong when the codepage is set wrong? This is
> especially for 8-bit codepages where you probably don't notice the
> mistake if you do use the right font. Conversion to Unicode will reveal
> the problem.

It'd cause problems with the clipboard, though.
Hmm. I think the default encodings could be improved a bit. For
example, if a user tries to load a file, and it fails, they're likely
to first change "encoding". Of course, changing that alone isn't
correct, but it may happen to work. Or, they may change just
"fileencodings", which is closer, but that probably won't work, since
you can't convert most other encodings to latin1. You have to change two
values, and it takes a bit of reading to figure out exactly what to do.
(Enough that a lot of users are likely to get it wrong.)
First, I'd change the default "encoding" to UTF-8 in Windows. "latin1"
is only a reasonable default if the system encoding happens to be one
that's like latin1; UTF-8 is almost always a better default (especially
since most users should be able to leave it alone.)
Second, I'd change the default Unicode fencs from "ucs-bom,utf-8,latin1"
to "ucs-bom,utf-8,CP####" in Windows. Again, latin1 is only a
reasonable default for latin1 users; setting it intelligently should
work for a lot more people without changes. (I think this should let
most people edit locally-encoded files without having to touch encoding-
related settings at all.)
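In outline, the default could be assembled something like this (a sketch; the function name is mine, and on Win32 the codepage number would presumably come from GetACP(), passed in here so the sketch stays portable):

```c
#include <stdio.h>

/* Sketch of building the proposed Win32 default for 'fileencodings':
 * "ucs-bom,utf-8,cp<N>", where <N> is the active ANSI codepage
 * (GetACP() on Win32). */
static void default_fencs(char *buf, size_t buflen, unsigned codepage)
{
    snprintf(buf, buflen, "ucs-bom,utf-8,cp%u", codepage);
}
```

For example, a Japanese system (codepage 932) would end up with "ucs-bom,utf-8,cp932".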
> > I'll work on making fileio use MBtoWC for codepage<->Unicode conversion
> > when possible. Even if you leave encodings as is, this will help.

Attached.
This is straightforward, except for one thing: MBtoWC and WCtoMB are bad
at handling streamed data. There's no way to nicely handle a trailing
partial multibyte character.
I've dealt with this by testing both; if converting the whole string
doesn't work, bump the last character into the save buffer and try
again. (It doesn't actually convert, so it shouldn't be too slow, but
if it is I can optimize by doing this test manually.)
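For what it's worth, the retry might look like this in outline (a sketch, not the patch itself; try_convert is a stand-in for a sizing-only MBtoWC call, i.e. converting with no output buffer):

```c
#include <string.h>

/* Sketch of the test-then-retry logic described above.  try_convert()
 * stands in for a sizing-only MBtoWC call, assumed to return zero when
 * the buffer ends in an invalid or partial multibyte sequence; since
 * no output is produced, retrying is cheap.  Returns how many leading
 * bytes are safe to convert; the trailing bytes are copied, in order,
 * into the save buffer for the next read. */
typedef int (*try_convert_fn)(const char *buf, int len);

static int split_for_convert(const char *buf, int len,
                             try_convert_fn try_convert,
                             char *save, int *savelen)
{
    int n = len;

    /* Hold back one more trailing byte each time the test fails. */
    while (n > 0 && !try_convert(buf, n))
        --n;

    *savelen = len - n;
    memcpy(save, buf + n, (size_t)(len - n));
    return n;
}
```

The manual optimization mentioned above would replace the byte-at-a-time loop with a direct scan backwards for the last complete character.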
Once this is tested, along with the clipboard, we might try asking any
resident CJK people to try working in encoding=UTF-8 and tell us if they
notice any problems (or if any problems are fixed.)
That's probably the best way to see if using UTF-8 will cause problems:
fix the known ones (which I'm doing) and then ask people to try it.
Drop the theoreticals. :)