
658 Re: mbyte.c patch

  • Bram Moolenaar
    Jul 14, 2002
      Glenn Maynard wrote:

      > On Sun, Jul 14, 2002 at 12:51:38PM +0200, Bram Moolenaar wrote:
      > > Question: You changed UTF-16 to UCS-2 in several places. Are you sure
      > > this is correct? I thought that MS-Windows does use UTF-16.
      > Hmm. You're probably right; thinking about it, I'm really not sure.
      > > For the code it doesn't really matter, I suppose, since it's still
      > > using two-byte words.
      > For this code, it shouldn't, since it's only used for the IME, which
      > probably won't generate anything that'll translate outside of the BMP.
      > (Actually, is there anything at all within the regular Windows codepages
      > that's outside the BMP?) Still, it should be commented correctly to
      > avoid confusion later on.
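Glenn's point that the label doesn't matter for IME text can be made concrete: UCS-2 and UTF-16 only differ once surrogate code units (0xD800-0xDFFF) appear. A minimal sketch, using a hypothetical helper `utf16_is_bmp_only()` that is not part of Vim, which checks whether a buffer of 16-bit units stays inside the BMP:

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if no UTF-16 surrogate unit (0xD800-0xDFFF) occurs in
 * buf[0..len), i.e. every unit is a BMP character on its own.  For such
 * data the UCS-2 and UTF-16 interpretations are identical. */
static int utf16_is_bmp_only(const uint16_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] >= 0xD800 && buf[i] <= 0xDFFF)
            return 0;
    return 1;
}
```

If IME output always passes this check, the "ucs-2" name in the converter setup is harmless, but the comment should still say which one is meant.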

      As far as I know, MS-Windows only supports UCS-2 so far, but since they
      finally discovered that 16 bits is not enough (just like 640 Kbyte wasn't
      enough! :-), they found the UTF-16 hack to work around it. A really
      ugly solution compared to UTF-8. I don't know how much of MS-Windows
      currently actually supports UTF-16.
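The "UTF-16 hack" is the surrogate-pair scheme: a code point above 0xFFFF has 0x10000 subtracted, and the remaining 20 bits are split over a high surrogate (0xD800 + top 10 bits) and a low surrogate (0xDC00 + bottom 10 bits). A sketch of the encoding step; the function name `utf16_encode()` is illustrative only:

```c
#include <stdint.h>

/* Encode one Unicode code point as UTF-16.  Writes one or two 16-bit
 * units to out[] and returns how many were written, or 0 if cp is not a
 * valid scalar value (above 0x10FFFF, or itself a surrogate).  Code points
 * above 0xFFFF need a surrogate pair, which plain UCS-2 cannot express. */
static int utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                  /* BMP: same as UCS-2 */
        return 1;
    }
    cp -= 0x10000;                              /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));   /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* low surrogate */
    return 2;
}
```

For example, U+1D11E (MUSICAL SYMBOL G CLEF) becomes the pair 0xD834 0xDD1E, whereas U+0041 stays a single unit 0x0041.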

      > Also, the code setting up the IME converters:
      > convert_setup(&ime_conv, "ucs-2", p_enc);
      > ime_conv_cp.vc_type = CONV_DBCS_TO_UCS2;
      > ime_conv_cp.vc_dbcs = GetACP();
      > ime_conv_cp.vc_factor = 2; /* we don't really know anything about the codepage */
      > This works if p_enc is Unicode or latin1 (due to the special cases), but
      > I don't think it'll work if it's anything else (e.g., cp932), since it'll
      > fall back on iconv and that'll force "UCS-2" to "UTF-8".
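The `vc_factor = 2` in the quoted setup is a safe upper bound for converting a double-byte codepage to UCS-2: a single-byte character grows from one byte to one two-byte unit, while a double-byte character stays at two bytes. A sketch assuming code page 932, whose lead bytes are taken here to be 0x81-0x9F and 0xE0-0xFC; real Windows code would ask `IsDBCSLeadByteEx()` with the code page from `GetACP()`. Both helper names are hypothetical:

```c
#include <stddef.h>

/* Lead-byte test for code page 932 (assumed ranges 0x81-0x9F and
 * 0xE0-0xFC; Windows code would call IsDBCSLeadByteEx() instead). */
static int cp932_is_lead(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Bytes of UCS-2 output needed for len bytes of cp932 text: every
 * character, single- or double-byte, becomes one 16-bit unit.  A lead
 * byte cut off at the end of the buffer is counted as one character. */
static size_t cp932_ucs2_bytes(const unsigned char *s, size_t len)
{
    size_t units = 0;

    for (size_t i = 0; i < len; i++) {
        if (cp932_is_lead(s[i]) && i + 1 < len)
            i++;                    /* skip the trail byte */
        units++;
    }
    return units * 2;
}
```

Pure ASCII input hits the bound exactly (three bytes need six), so a factor of 2 is both necessary and sufficient for this direction.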

      Hmm, the call to iconv probably has to be fixed for this. Sounds like
      we need an extra flag in vimconv_T that indicates whether Unicode is
      handled as UTF-8 or not. Perhaps this could also be handled when
      filling vimconv_T; then we don't need the flag.
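The first option, an extra flag, could look roughly like the sketch below. The struct `conv_sketch_T`, its field `unicode_is_utf8` and the function `iconv_side_name()` are hypothetical stand-ins, not Vim's actual vimconv_T code, and encoding names are assumed to be canonical lowercase. With the flag cleared for the IME converter, the name handed to iconv would stay "ucs-2" instead of becoming "utf-8":

```c
#include <string.h>

/* Hypothetical, cut-down stand-in for vimconv_T: just the proposed flag. */
typedef struct {
    int unicode_is_utf8;   /* 1: Unicode text is held as UTF-8;
                              0: it is raw UCS-2, as delivered by the IME */
} conv_sketch_T;

/* Pick the encoding name to pass to iconv_open() for one side of a
 * conversion.  Unicode names are collapsed to "utf-8" only when the flag
 * says the bytes really are UTF-8; otherwise the real name is kept. */
static const char *iconv_side_name(const conv_sketch_T *vc, const char *name)
{
    int is_unicode = strncmp(name, "utf-", 4) == 0
                  || strncmp(name, "ucs-", 4) == 0;

    if (is_unicode && vc->unicode_is_utf8)
        return "utf-8";
    return name;
}
```

The second option Bram mentions would make the decision once, while filling the structure, so no flag has to be carried around afterwards.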

      > Now, setting encoding to anything but UTF-8 or latin1 currently doesn't
      > work anyway: it doesn't render correctly. Is that intended to work?
      > If not, encoding should probably reject other settings. It'd simplify
      > a lot of things if the internal encoding was always UTF-8 in Windows.
      > I think you mentioned this idea before.

      Setting 'encoding' to some Asian codepage should certainly work. Korean
      and Japanese users couldn't work without this.

      Unfortunately we can't drop all other encodings and use UTF-8
      everywhere; conversion from/to Unicode is not always possible. There
      is the famous yen vs. backslash problem, for example.
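The yen vs. backslash problem comes from byte 0x5C: Microsoft's cp932 table maps it to U+005C REVERSE SOLIDUS, while a table following JIS X 0201 Roman maps it to U+00A5 YEN SIGN, although Japanese fonts commonly display it as a yen sign either way. A sketch with two hypothetical single-byte mappings (ASCII-range bytes only, and ignoring that JIS X 0201 also changes 0x7E) shows how a path loses its separators depending on the table used:

```c
#include <stddef.h>
#include <stdint.h>

/* Two hypothetical mappings for bytes below 0x80 that disagree on 0x5C.
 * Neither is a complete converter. */
static uint32_t map_ascii(unsigned char b)      /* 0x5C -> U+005C */
{
    return b;
}

static uint32_t map_jis_roman(unsigned char b)  /* 0x5C -> U+00A5 */
{
    return b == 0x5C ? 0x00A5 : b;
}

/* Map s[0..len) to Unicode with the given table and count how many
 * results are U+005C, i.e. how many path separators survive. */
static size_t count_backslashes(const unsigned char *s, size_t len,
                                uint32_t (*map)(unsigned char))
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++)
        if (map(s[i]) == 0x005C)
            n++;
    return n;
}
```

The same bytes `C:\vim\vimrc` yield two backslashes with one table and none with the other, which is why keeping the text in its original encoding can be the only lossless choice.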

      BLACK KNIGHT: I'm invincible!
      ARTHUR: You're a looney.
      "Monty Python and the Holy Grail" PYTHON (MONTY) PICTURES LTD

      /// Bram Moolenaar -- Bram@... -- http://www.moolenaar.net \\\
      /// Creator of Vim -- http://vim.sf.net -- ftp://ftp.vim.org/pub/vim \\\
      \\\ Project leader for A-A-P -- http://www.a-a-p.org ///
      \\\ Help me helping AIDS orphans in Uganda - http://iccf-holland.org ///