675 Re: windows and unicode filenames, etc.

  • Bram Moolenaar
    Aug 5, 2002
      Glenn Maynard wrote:

      > On Mon, Aug 05, 2002 at 09:09:38PM +0200, Bram Moolenaar wrote:
      > > > Editing files with Unicode in the filename that don't fit in the ANSI
      > > > codepage doesn't work. Fixed, except for the browser, and except for
      > > > renaming (since I don't really want to go near win32's mch_rename, but
      > > > it does need fixing.)
      > >
      > > I thought it did work for some DBCS encodings. I did include patches
      > > for this in the past.
      >
      > If encoding is set to the current codepage, it'll work: the paths are
      > being sent directly to the system routines, unedited, and that's what
      > the *A (ANSI) versions expect--the ANSI codepage.
      >
      > If encoding is set to anything else (including Unicode), it'll only work
      > for ASCII, and will probably do something nonsensical for anything else.
      >
      > If encoding is set to the current codepage, it's impossible to represent
      > filenames that don't fit in that codepage, too. (I can't edit files
      > with Japanese in the filename, since my codepage is US.)

      I think so far it only worked for text in the system codepage. When
      'encoding' is set to something else, I would guess we don't convert,
      thus you end up with nonsense. Converting the title to Unicode should
      work (if the wide version of the function is available, which might
      not be true on Win 9x).

      > > This has a big drawback: for DBCS codes finding the start of a character
      > > is complicated and slow. Don't want to use the same code for single
      > > byte encodings. There are quite a few other places where DBCS is
      > > handled much slower.
      > >
      > > Isn't it easier to ignore enc_dbcs where the code needs to be used for
      > > both encodings?
      >
      > Well, I need to be able to know the codepage if encoding is set to one.
      > This is easy if encoding=cp932, for example, but it's less easy if it's
      > "2byte-cp932" or something like that.

      Ah, you are running into the problem that enc_dbcs is used both as a
      flag that a DBCS encoding is in use and as the number of the codepage
      used for 'encoding'. We could separate the two to avoid confusion.
      Introduce enc_codepage perhaps?

      > Perhaps there should be a single function, win_get_penc_codepage(),
      > which does all of that parsing and returns the codepage (or -1 if it's
      > not a codepage)?

      Since 'encoding' doesn't change very often this could be done once and
      stored in a global variable, just like enc_utf8 and enc_dbcs.
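
      A minimal sketch of that idea, in portable C: parse the 'encoding'
      name once and cache the result in a global. Only enc_codepage and the
      "cp932" / "2byte-cp932" forms come from this thread; the function
      names parse_enc_codepage() and set_enc_codepage() are hypothetical,
      and treating "2byte-" as an optional prefix is an assumption, not how
      Vim necessarily does it.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical global, analogous to enc_utf8 and enc_dbcs:
 * the codepage number named by 'encoding', or -1 if it names none. */
static int enc_codepage = -1;

/* Parse an encoding name such as "cp932" or "2byte-cp932".
 * Returns the codepage number, or -1 if the name is not a codepage. */
static int parse_enc_codepage(const char *enc)
{
    const char *p = enc;
    char *end;
    long n;

    /* Assumption: "2byte-" is an optional prefix in front of "cpNNN". */
    if (strncmp(p, "2byte-", 6) == 0)
        p += 6;

    /* Require "cp" followed by at least one digit. */
    if (strncmp(p, "cp", 2) != 0 || !isdigit((unsigned char)p[2]))
        return -1;

    /* Reject trailing junk and out-of-range numbers. */
    n = strtol(p + 2, &end, 10);
    if (*end != '\0' || n <= 0 || n > 65535)
        return -1;
    return (int)n;
}

/* Call whenever 'encoding' changes, so later checks are a variable read. */
static void set_enc_codepage(const char *enc)
{
    enc_codepage = parse_enc_codepage(enc);
}
```

      With this, the DBCS flag and the codepage number no longer share one
      variable, and callers such as the renderer could compare enc_codepage
      against the system codepage without re-parsing the option.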

      > Also, the is_funky_dbcs code in the win32 renderer should use this, too,
      > since it needs to do the same thing. (Render with Unicode conversion if
      > win_get_penc_codepage() != GetACP(); then is_funky_dbcs can probably go
      > away, too, since nothing else uses it.)

      If GetACP() is really fast, then is_funky_dbcs becomes obsolete.
      Otherwise, I thought you were planning to rename it anyway.

      > > > I'll probably revert removing the broken Korean stuff and just comment out
      > > > the call for now; I doubt it's needed, but it's not important.
      > >
      > > Still didn't find someone who can tell when the code is really needed?
      >
      > Can you contact the person named in the code? I can't find him in the
      > archives at all. I still suspect it's no longer needed, due to the
      > newer IME fixes, and the Korean IME does work for me, but I don't know
      > about e.g. older Korean IMs from 9x. All that code does is poll the IME
      > when the cursor blinks, and prints whatever's in there on the cursor;
      > since the IME displays the character automatically, there's no need for
      > this. (But before the new IME code, this may not have worked.)

      I last received a message from Sung-Hoon Baek in 1998...
      Hopefully another Korean can help us here! Namsh?

      > I don't know about the weird fake-backslash code. I can see why it was
      > wanted: MS Korean fonts actually do apparently have a Yen sign on \, which
      > I'd imagine Korean users might not want. If you want, I can try to make
      > this code work for now, and add an option for this. (I think it should
      > be replaced completely at some point, as I've mentioned, but I don't
      > expect to have that ready soon, since I need to figure out how to retrofit
      > that without being overly intrusive. Also, since it's a nontrivial
      > block of code, I'd much rather wait until the current stuff is settled,
      > or the diff is going to get unmanageable and there'll be too much to test
      > properly.)

      Even though your reasons sound sensible, I'm a bit careful about
      throwing out code that nobody complained about.

      --
      "A clear conscience is usually the sign of a bad memory."
      -- Steven Wright

      /// Bram Moolenaar -- Bram@... -- http://www.moolenaar.net \\\
      /// Creator of Vim -- http://vim.sf.net -- ftp://ftp.vim.org/pub/vim \\\
      \\\ Project leader for A-A-P -- http://www.a-a-p.org ///
      \\\ Lord Of The Rings helps Uganda - http://iccf-holland.org/lotr.html ///