
Re: mbyte.c patch

  • Glenn Maynard
    Jul 17, 2002
      On Wed, Jul 17, 2002 at 12:54:07PM +0200, Bram Moolenaar wrote:
      > This indeed looks very broken. ImeGetTempComposition() will always
      > return NULL. It already was like this in 6.0. I didn't hear specific
      > complaints about this. I suppose this means we might as well remove
      > this code.

      Alright. Patch attached:

      Remove HanExtTextOut, bInComposition and ImeGetTempComposition.

      Removed a now-unneeded pair of braces from gui_mch_draw_string. I
      didn't unindent the block of code between it, since that'd bloat the
      patch with stuff that looks like changes but isn't. (I wish patch
      could handle this better.) I'll let you do that.

      Remaining problems in the renderer:

      SBCS encodings should render with "is_funky_dbcs"; right now we'll get
      garbage unless the encoding matches the system.

      The Arabic/Hebrew hack isn't always applied. If we pass that range of
      text to the renderer as a block, we'll get weird results. Even if we
      don't have RL support enabled (or even compiled in), and we're in
      Unicode, and the Hebrew text wouldn't look right to a Hebrew user
      anyway, we might still be a regular user editing mixed-language text;
      we don't want weird things to happen there.

      len isn't set right in is_funky_dbcs; wide characters cause underlining
      glitches.

      Padding isn't set right in Unicode or is_funky_dbcs, so bold and italic
      characters cause spacing glitches.

      Also in this patch:

      Move padding computation to a function (set_padding).

      Move ETO_IGNORELANGUAGE setting out of each individual call, to
      make sure it's always used.

      Set padding in Unicode.

      (I'd have put these in a separate patch, but they're working on the same
      block of code.)
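For illustration, the two refactorings above might look roughly like this portable sketch. The name set_padding and the ETO_IGNORELANGUAGE value come from the patch description and the Win32 headers; the cell-width logic and the wrapper are assumptions for illustration, not Vim's actual code:

```c
#include <assert.h>
#include <stddef.h>

#define ETO_IGNORELANGUAGE 0x1000   /* value from the Win32 wingdi.h */

/* Fill the per-character advance array with the fixed cell width, so
 * bold or italic glyphs (whose natural advance may differ) cannot
 * disturb the character-grid spacing.  Computed in one place instead
 * of at every ExtTextOut call site. */
static void set_padding(int *padding, size_t len, int cell_width)
{
    for (size_t i = 0; i < len; i++)
        padding[i] = cell_width;
}

/* Route all draw calls through one wrapper, so ETO_IGNORELANGUAGE is
 * always set rather than repeated (or forgotten) at each call. */
static unsigned draw_flags(unsigned caller_flags)
{
    return caller_flags | ETO_IGNORELANGUAGE;
}
```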

      This fixes bold/italics in Unicode, and simplifies the renderer a bit.

      I think these problems are ultimately because there are too many code
      paths. I'd suggest always converting the string to Unicode at the start
      of the function. (At least in NT, it shouldn't be a speed hit, since I
      believe the font API will convert to Unicode anyway.) The RL (RevOut)
      special case needs to be done in Unicode before this will work. (Which
      I have code for, but am holding off on in the interests of keeping the
      patch down.)
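As a toy sketch of that single-code-path structure (the function names are invented stand-ins, and to_unicode is an ASCII-only placeholder for MultiByteToWideChar, just enough to show the shape):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder converter: real code would call MultiByteToWideChar. */
static size_t to_unicode(const char *in, unsigned short *out, size_t outcap)
{
    size_t n = strlen(in);
    if (n > outcap)
        n = outcap;
    for (size_t i = 0; i < n; i++)
        out[i] = (unsigned char)in[i];
    return n;
}

static size_t draw_string(const char *text, unsigned short *glyphs, size_t cap)
{
    size_t n = to_unicode(text, glyphs, cap);   /* one conversion at entry */
    /* ...then padding, the RL (RevOut) reversal, and the final
     * ExtTextOutW all operate on UTF-16 only, so the per-encoding
     * special cases (is_funky_dbcs, DBCS, 8-bit) disappear. */
    return n;
}
```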

      If this was done, the rest of the above problems would just go away.

      > > The only way I can see system conversions causing problems is if round-trip
      > > conversion fails. I don't know of any cases of this, but if anyone does I'd
      > > definitely like to know.
      > Perhaps it goes wrong when the codepage is set wrong? This is
      > especially for 8-bit codepages where you probably don't notice the
      > mistake if you do use the right font. Conversion to Unicode will reveal
      > the problem.

      It'd cause problems with the clipboard, though.

      Hmm. I think the default encodings could be improved a bit. For
      example, if a user tries to load a file, and it fails, they're likely
      to first change "encoding". Of course, changing that alone isn't
      correct, but it may happen to work. Or, they may change just
      "fileencodings", which is closer, but that probably won't work, since
      you can't convert most other encodings to latin1. You have to change two
      values, and it takes a bit of reading to figure out exactly what to do.
      (Enough that a lot of users are likely to get it wrong.)

      First, I'd change the default "encoding" to UTF-8 in Windows. "latin1"
      is only a reasonable default if the system encoding happens to be one
      that's like latin1; UTF-8 is almost always a better default (especially
      since most users should be able to leave it alone.)

      Second, I'd change the default Unicode fencs from "ucs-bom,utf-8,latin1"
      to "ucs-bom,utf-8,CP####" in Windows. Again, latin1 is only a
      reasonable default for latin1 users; setting it intelligently should
      work for a lot more people without changes. (I think this should let
      most people edit locally-encoded files without having to touch encoding-
      related settings at all.)
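Concretely, on a Japanese Windows system (active codepage 932), the suggested defaults would amount to something like the following; cp932 is just an example of what "CP####" would expand to:

```vim
set encoding=utf-8
set fileencodings=ucs-bom,utf-8,cp932
```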

      > > I'll work on making fileio use MBtoWC for codepage<->Unicode conversion
      > > when possible. Even if you leave encodings as is, this will help.


      This is straightforward, except for one thing: MBtoWC and WCtoMB are bad
      at handling streamed data. They have no way to nicely handle a final
      incomplete sequence.

      I've dealt with this by testing both; if converting the whole string
      doesn't work, bump the last character into the save buffer and try
      again. (It doesn't actually convert, so it shouldn't be too slow, but
      if it is I can optimize by doing this test manually.)
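The hold-back-the-tail idea can be sketched portably with UTF-8 (MultiByteToWideChar itself is Win32-only, so this helper is an illustrative analogue, not Vim's code): find the longest prefix that ends on a complete sequence, and save the incomplete tail for the next chunk.

```c
#include <assert.h>
#include <stddef.h>

/* Return the number of leading bytes of buf that form complete UTF-8
 * sequences; the caller saves the remaining 0-3 bytes and prepends
 * them to the next chunk of streamed data. */
static size_t complete_prefix_len(const unsigned char *buf, size_t len)
{
    size_t i = len;
    size_t back = 0;
    /* Walk back over up to 3 trailing continuation bytes (10xxxxxx). */
    while (i > 0 && back < 3 && (buf[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0)
        return len;             /* nothing but continuation bytes */
    unsigned char lead = buf[i - 1];
    size_t need;                /* total bytes the lead byte announces */
    if (lead < 0xC0)       need = 1;  /* ASCII, or stray continuation:
                                       * pass it through as-is */
    else if (lead < 0xE0)  need = 2;
    else if (lead < 0xF0)  need = 3;
    else                   need = 4;
    if (back + 1 >= need)
        return len;             /* final sequence is complete */
    return i - 1;               /* hold the incomplete tail back */
}
```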

      Once this is tested, along with the clipboard, we might try asking any
      resident CJK people to try working in encoding=UTF-8 and tell us if they
      notice any problems (or if any problems are fixed.)

      That's probably the best way to see if using UTF-8 will cause problems:
      fix the known ones (which I'm doing) and then ask people to try it.
      Drop the theoreticals. :)

      Glenn Maynard