
1004 Re: Filename encodings under Win32

  • Bram Moolenaar
    Oct 13, 2003
      Camillo wrote:

      > > Vim should support UTF-8 in 9x, too.
      >
      > Of course, but with the necessary restrictions. Displaying unicode is a
      > problem, as is entering filenames. Those functions are restricted to the
      > ACP on Win9x.

      On Windows NT/XP there are also restrictions, especially when using
      non-NTFS filesystems. There was a discussion about this on the Linux
      UTF-8 mailing list a long time ago. They could not come up with a
      good universal solution for handling filenames.

      Vim could use Unicode functions for accessing files, but this would be
      a huge change that requires lots of testing. The main problem is that
      when 'encoding' is not a Unicode encoding, conversions need to be
      done, and these may fail.

      If you use filenames that cannot be represented in the active codepage,
      you probably have problems with other programs. Thus sticking with the
      active codepage functions isn't too bad. But then Vim needs to convert
      from 'encoding' to the active codepage!
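The conversion step described above, and the way it can fail, can be sketched roughly as follows. This is only an illustration in Python, not Vim's code: the function name `to_active_codepage` is hypothetical, and both the 'encoding' value ("utf-8") and the active codepage ("cp1252") are assumptions chosen for the example.

```python
from typing import Optional


def to_active_codepage(name: bytes, encoding: str = "utf-8",
                       codepage: str = "cp1252") -> Optional[bytes]:
    """Re-encode a filename from `encoding` to `codepage`.

    Returns None when the bytes are not valid in `encoding`, or when a
    character has no representation in `codepage`.
    """
    try:
        text = name.decode(encoding)    # may fail: bytes invalid in 'encoding'
        return text.encode(codepage)    # may fail: char not in the codepage
    except (UnicodeDecodeError, UnicodeEncodeError):
        return None


print(to_active_codepage("å.txt".encode("utf-8")))       # representable
print(to_active_codepage("日本語.txt".encode("utf-8")))   # not representable
```

The second call returns None because cp1252 has no Japanese characters. A caller would then have to decide what to do with such a name, which is the hard part.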

      > It is a bugfix. Currently, when using UTF-8 on WinNT, vim is broken in (at
      > least) the following regards:
      >
      > - Opening non-ascii filenames, regardless of codepage
      > å.txt internally becomes <e5>.txt
      >
      > - Saving filenames
      > å.txt is saved in UTF-8 format (Ã¥.txt) and displayed incorrectly in
      > title bar

      The file names are handled as byte strings. Thus as long as you use
      the right bytes it should work. The problem arises when you are
      typing/editing with an encoding different from the active codepage.
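To make the byte-string point concrete, here is a small Python sketch of the "å.txt" case from the report. It assumes the active codepage is cp1252 (Western European); the thread does not say which codepage is actually in use.

```python
name = "å.txt"

# The same name as bytes in the (assumed) active codepage and in UTF-8.
acp_bytes = name.encode("cp1252")    # b'\xe5.txt'
utf8_bytes = name.encode("utf-8")    # b'\xc3\xa5.txt'

# UTF-8 bytes read as if they were codepage bytes: what ends up on disk
# and in the title bar when the name is saved unconverted.
misread = utf8_bytes.decode("cp1252")    # 'Ã¥.txt'

# Codepage bytes read as UTF-8: 0xe5 is not a valid sequence here, which
# matches the name showing up as <e5>.txt.
try:
    acp_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(acp_bytes, utf8_bytes, misread, valid_utf8)
```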

      > - The default termencoding should be set intelligently, UTF-8 as
      > termencoding breaks input of non-ascii.

      Why would 'termencoding' be "utf-8"? This won't work, unless you are
      using an xterm on MS-Windows. The default 'termencoding' is empty,
      which means 'encoding' is used. There is no better default. When you
      change 'encoding' you might have to change 'termencoding' as well, but
      this depends on your situation.

      > - The default fileencoding breaks when "going UTF-8", most probably a
      > better behavior would be to default to the ACP always.

      'fileencoding' is set when reading a file. Perhaps you mean
      'fileencodings'? This one needs to be tweaked by the user, because it
      depends on what kind of files you edit. The main problem is that an
      ASCII file can be in any encoding and Vim can't detect which one it
      is, so the user has to specify what Vim should do with it.
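Why detection cannot settle this can be shown with a rough sketch of "try each candidate in order and take the first that fits". This is loosely modelled on the idea behind 'fileencodings', not on Vim's actual implementation. The function name `first_fitting_encoding` is hypothetical.

```python
from typing import Optional, Sequence


def first_fitting_encoding(data: bytes,
                           candidates: Sequence[str]) -> Optional[str]:
    """Return the first candidate encoding that decodes `data` without error."""
    for enc in candidates:
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None


ascii_only = b"plain text"
with_high_byte = b"\xe5.txt"

# Pure ASCII fits every candidate, so the answer depends only on the order.
print(first_fitting_encoding(ascii_only, ["utf-8", "latin-1"]))
print(first_fitting_encoding(ascii_only, ["latin-1", "utf-8"]))

# A byte above 0x7f can at least rule out UTF-8 here.
print(first_fitting_encoding(with_high_byte, ["utf-8", "latin-1"]))
```

For the ASCII-only input the result is simply whichever candidate comes first, which is why the order has to come from the user.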

      > - Also, my vim (6.2) defaults to "latin1", not my current codepage. That
      > would indicate that the ACP detection does not work.

      Where does it use "latin1"? Not in 'encoding', I suppose.

      > OK, the list above sounds like whining, but earlier I did suggest that the
      > fixes are fairly straightforward.

      Mostly it's quite a bit more complicated. Different users have
      different situations, so it is hard to think of solutions that work
      for most people.

      > On WinNT, vim should use unicode apis, essentially benefitting
      > automatically from NT native Unicode. This only involves one additional
      > encoding/decoding step before calling the apis.

      The problem is that conversions to/from Unicode only work when you know
      the encoding of the text you are converting. The encoding isn't always
      known. Vim sometimes uses "latin1", so that you at least get 8-bit
      clean editing, even though the actual encoding is unknown.
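What "8-bit clean" means for latin1 can be checked with a short Python sketch: every byte value maps to exactly one character and back, so no data is lost even when the real encoding is unknown. The helper name `is_8bit_clean` is hypothetical.

```python
def is_8bit_clean(encoding: str) -> bool:
    """True if all 256 byte values survive a decode/encode round trip."""
    raw = bytes(range(256))
    try:
        return raw.decode(encoding).encode(encoding) == raw
    except (UnicodeDecodeError, UnicodeEncodeError):
        return False


print(is_8bit_clean("latin-1"))   # every byte is a valid character
print(is_8bit_clean("utf-8"))     # many byte sequences are invalid
```

Treating unknown text as latin1 therefore never corrupts the bytes, but it says nothing about which characters those bytes were meant to be. That is why a later conversion to Unicode cannot be trusted.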

      > On Win9x, vim should use ANSI apis. The only thing missing is again the
      > encoding/decoding, although it's trickier with the ANSI apis. There are
      > many cases where a user would enter UTF-8 stuff that doesn't smoothly
      > convert to the current CP. I think vim's current code should detect that
      > easily.

      You can use a few Unicode functions on Win9x, and we already do. I
      don't see a reason to change this.

      --
      I'm in shape. Round IS a shape.

      /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
      /// Creator of Vim - Vi IMproved -- http://www.Vim.org \\\
      \\\ Project leader for A-A-P -- http://www.A-A-P.org ///
      \\\ Help AIDS victims, buy here: http://ICCF-Holland.org/click1.html ///