Re: Filename encodings under Win32

  • Bram Moolenaar
    Oct 13, 2003
      Camillo wrote:

      > > Main problem is that sometimes we don't know what the encoding is.
      >
      > On Windows? I would disagree here. Any filesystem mounted by Windows
      > should be mounted in a way that adheres to Windows naming conventions.
      > We're not discussing file contents here.

      A file name may appear in a file (e.g., a list of files in a README
      file). And I don't know what happens with file names on removable media
      (e.g., a CD). It probably depends on the file system it contains. And
      networked file systems are another problem.

      > > In that situation you can treat the filename as a sequence of bytes in most
      > > places, but conversion is impossible. This happens more often than you
      > > would expect. Put a floppy disk or CD into your computer...
      >
      > So why convert it? :) The current display/saving problems stem from the
      > fact that the file name is interpreted as UTF-8, a coding which Windows
      > does not recognize for file names or strings.

      We need to locate places where the encoding is different from what a
      system function expects. There are still a few things that need to be
      fixed.
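
A mismatch of this kind can be reproduced outside Vim. The Python sketch below is only an illustration: the file name and the helper `interpret_name` are made up, and Vim itself is written in C. It stores a name as cp1252 bytes and then reads it back under two different assumptions.

```python
def interpret_name(raw: bytes, assumed_encoding: str) -> str:
    """Decode raw file-name bytes under an assumed encoding.

    Hypothetical helper for illustration only. Undecodable bytes are
    replaced with U+FFFD so the mismatch is visible instead of raising.
    """
    return raw.decode(assumed_encoding, errors="replace")


# A made-up file name, stored the way a cp1252 system would store it.
raw_name = "café.txt".encode("cp1252")  # b'caf\xe9.txt'

# Correct assumption: the name comes back intact.
as_cp1252 = interpret_name(raw_name, "cp1252")

# Wrong assumption: the byte 0xE9 is not valid UTF-8 on its own.
as_utf8 = interpret_name(raw_name, "utf-8")

print(repr(as_cp1252))
print(repr(as_utf8))
```

The byte sequence is the same in both calls; only the assumption about its encoding changes, which is why each place that hands a name to a system function has to be checked.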

      > > There is also the situation that Vim uses the active codepage, but the
      > > file is actually in another encoding that could not be detected. Then
      > > doing "gf" on a filename will work if you don't do conversion, but it
      > > will fail if you try converting with the wrong encoding in mind.
      >
      > AFAIK, Windows will internally convert the path into Unicode if you call
      > the ANSI function. Thus if gf succeeds as you describe, it should succeed
      > if you use the Unicode API as well. In both cases an 8-bit binary string
      > undergoes "cp2unicode" conversion.

      If Vim defaults to the active codepage then conversion to Unicode would
      do the same as using the ANSI function. Thus it's only a problem when
      'encoding' is different from the active codepage. And when 'encoding'
      is a Unicode variant we can use the "W" functions. Still, this means
      all fopen() and stat() calls must be adjusted. When 'encoding' is not
      the active codepage we could either leave the file name untranslated (as
      it is now) or convert it to Unicode. I don't know which one would work
      best...
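
The three cases in the paragraph above can be written down as a small decision function. This is a rough Python sketch and not Vim's code: the function name `plan_file_call`, the set of Unicode variants, and the returned labels "A" and "W" (for the ANSI and wide-character Win32 functions) are all assumptions made for illustration.

```python
# Encodings treated as "a Unicode variant" in this sketch (an assumption).
UNICODE_VARIANTS = {"utf-8", "utf-16le"}


def plan_file_call(name: bytes, encoding: str, active_codepage: str):
    """Return (api, argument) for a file name held in 'encoding'.

    Hypothetical helper: "W" means call the wide-character function with a
    Unicode string, "A" means call the ANSI function with the raw bytes.
    """
    enc = encoding.lower()
    if enc in UNICODE_VARIANTS:
        # 'encoding' is a Unicode variant: convert and use the "W" functions.
        return "W", name.decode(enc)
    if enc == active_codepage.lower():
        # Same as the active codepage: the ANSI function does the same
        # conversion the system would do anyway.
        return "A", name
    # 'encoding' differs from the active codepage. This sketch leaves the
    # name untranslated; converting to Unicode is the other option, and
    # which of the two works better is an open question.
    return "A", name


print(plan_file_call("é.txt".encode("utf-8"), "utf-8", "cp1252"))
print(plan_file_call(b"\xe9.txt", "cp1252", "cp1252"))
print(plan_file_call(b"\xe9.txt", "cp1251", "cp1252"))
```

The last branch is where the uncertainty sits: the bytes are passed through unchanged, so the system will interpret them in the active codepage whether or not that is what they mean.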

      > > Your active codepage must be latin1 then. Vim gets the default from the
      > > active codepage.
      >
      > My code page is cp1252. It's not latin1 (iso-8859-1). In practice, both
      > are 8-bit-raw.

      cp1252 and latin1 are not identical, but for practical use they can be
      handled as the same encoding. Vim indeed uses this as the "raw" 8-bit
      encoding that avoids messing up your characters when you don't know what
      encoding it actually is.
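
The difference between the two, and the reason latin1 is safe as a "raw" encoding, can be checked with a few lines of Python. This is a sketch that relies on Python's own codec tables; other implementations may treat the bytes that cp1252 leaves unassigned differently.

```python
all_bytes = bytes(range(256))

# latin1 maps every byte value to the code point with the same number,
# so decoding and re-encoding never loses or alters a byte.
latin1_roundtrip = all_bytes.decode("latin-1").encode("latin-1")

# Collect the byte values that Python's cp1252 codec either decodes to a
# different character than latin1 or refuses to decode at all.
differing = []
for value in range(256):
    b = bytes([value])
    try:
        same = b.decode("cp1252") == b.decode("latin-1")
    except UnicodeDecodeError:
        same = False
    if not same:
        differing.append(value)

print(latin1_roundtrip == all_bytes)
print([hex(v) for v in differing])
```

With these codec tables the two encodings disagree only in the range 0x80 to 0x9F, which is why they can be treated as the same for most Western European text.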

      --
      hundred-and-one symptoms of being an internet addict:
      194. Your business cards contain your e-mail and home page address.

      /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
      /// Creator of Vim - Vi IMproved -- http://www.Vim.org \\\
      \\\ Project leader for A-A-P -- http://www.A-A-P.org ///
      \\\ Help AIDS victims, buy here: http://ICCF-Holland.org/click1.html ///