Re: Filename encodings under Win32

  • Bram Moolenaar
    Message 1 of 29, Oct 13, 2003
      Glenn Maynard wrote:

      > On Sun, Oct 12, 2003 at 10:44:05PM +0200, Tony Mechelynck wrote:
      > > As long as 'fileencoding', 'printencoding' and (most important)
      > > 'termencoding' default (when empty) to whatever is the current value of
      > > 'encoding', the latter must not (IMHO) be set to UTF-8 by default.
      > >
      > > (Let's spell it out) In my humble opinion, Vim should require as little
      > > "tuning" as possible to handle the language interfaces the same way as the
      > > operating system does, and this means that, when the user sets nothing else
      > > in his startup and configuration files, keyboard input, printer output and
      > > file creation should default to whatever is set in the locale.
      >
      > This is a trivial fix, which I already proposed many months ago: the
      > defaults in Windows should be the results of
      >
      > exe "set fileencodings=ucs-bom,utf-8,cp" . getacp() . ",latin1"
      > exe "set fileencoding=cp" . getacp()
      >
      > and now adding:
      >
      > exe "set printencoding=cp" . getacp()

      The default that Vim starts with is 'encoding' set to the active
      codepage and 'fileencodings' set to "ucs-bom". This means it falls back
      to 'encoding' when there is no BOM. That should work almost the same
      way as what you give here, but without the explicit use of the codepage
      name. When the user sets 'encoding' the other ones follow. In your
      example the user has to set all three options.

      Perhaps setting 'termencoding' can be omitted if we can use the Unicode
      functions for keyboard input. Perhaps someone can figure out how to do
      this properly. And make sure the input methods still work!
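
      To sketch the idea (illustrative only, not code that is in Vim): if the
      GUI window class were registered with RegisterClassW(), WM_CHAR would,
      on NT, deliver UTF-16 code units regardless of the active codepage, and
      the input could be converted straight to UTF-8:

      #include <windows.h>

      /* Sketch only: turn the UTF-16 code unit from a WM_CHAR message into
       * UTF-8 bytes.  Characters outside the BMP arrive as two surrogate
       * messages and would still have to be combined first. */
      static int wm_char_to_utf8(WPARAM wParam, char *buf, int buflen)
      {
          WCHAR wc = (WCHAR)wParam;

          /* CP_UTF8 requires the last two arguments to be NULL. */
          return WideCharToMultiByte(CP_UTF8, 0, &wc, 1, buf, buflen, NULL, NULL);
      }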

      > Note that "getacp" is a function in a patch I sent which was lost or
      > forgotten: return the ANSI codepage.

      Can't recall that patch. I generally give OS-specific additions a low
      priority.

      > Switching "encoding" to "utf-8" should be transparent, once proper
      > conversions for win32 calls are in place. Regular users don't care
      > about what encoding their editor uses internally, any more than they
      > care about what type of data structures they use.

      The problem still is that conversion from and to UTF-8 is not
      transparent. Especially when editing files with an unknown encoding.

      > On the other hand, if utf-8 internally is fully supported, then utf-8
      > can be the *only* internal encoding--which would make the rendering
      > code much simpler and more robust. I remember finding lots of little
      > errors in the renderer (eg. underlining glitches for double-width
      > characters) that went away with utf-8, and I don't think Vim renders
      > correctly at all if eg. "encoding" is set to "cp1242" and the ACP
      > is CP932 (needs a double conversion).

      UTF-8 is already fully supported in Vim. There may be a few glitches
      in the conversions, though. The clipboard also still doesn't work 100%.

      --
      hundred-and-one symptoms of being an internet addict:
      182. You may not know what is happening in the world, but you know
      every bit of net-gossip there is.

      /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
      /// Creator of Vim - Vi IMproved -- http://www.Vim.org \\\
      \\\ Project leader for A-A-P -- http://www.A-A-P.org ///
      \\\ Help AIDS victims, buy here: http://ICCF-Holland.org/click1.html ///
    • Camillo Särs
      Message 2 of 29, Oct 13, 2003
        Glenn Maynard wrote:
        > They don't break down, they're just imperfect.

        Well, if I can't write a filename the way I need to write it, I have a
        problem. Fortunately this is mostly theoretical for me, but for some users
        resorting to plain us-ascii is not a possibility. These mails are more an
        attempt at getting vim to work better than to improve my life. After all,
        I believe in contributing when I can, if only by highlighting problems and
        proposing solutions.

        > Vim should support UTF-8 in 9x, too.

        Of course, but with the necessary restrictions. Displaying unicode is a
        problem, as is entering filenames. Those functions are restricted to the
        ACP on Win9x.

        >>- Vim on NT does not work well with unicode/utf-8.
        >
        > It works well for many uses; I use enc=utf-8 exclusively, to edit files
        > in both UTF-8 (with characters well beyond CP1242 and CP932) and other
        > encodings.

        Yes, editing is not the problem. It's the system calls that cause the
        trouble, as we have established.

        >>- The fixes are fairly straightforward (use Unicode API, UTF-8 internally)
        >>- Win9x need to work in cp mode, but that's already supported
        >
        > No, convert between UTF-8 and the ACP and use the ANSI API calls. This
        > will make enc=utf-8 work in both 9x and NT.

        No it will not. You would then restrict NT users to their local code page
        only, and that's almost "reverting to DOS". On Win9x we need to stick to
        ACP, but on NT I don't see any reason not to go Unicode. Also, the UTF-8
        to UCS-2 mapping is quick and straightforward, with few hidden catches.
        Mapping utf-8 to ACP is tricky and lossy.

        Also, the code you had implemented already used the "W" APIs correctly. I
        don't understand why you would now advocate dropping widechar and unicode
        support.

        > Using Unicode calls when available is useful (eg. to display non-ACP
        > text in the titlebar), but that's "new feature" territory, not "bugfix".

        It is a bugfix. Currently, when using UTF-8 on WinNT, vim is broken in (at
        least) the following regards:

        - Opening non-ascii filenames, regardless of codepage
        å.txt internally becomes <e5>.txt

        - Saving filenames
        å.txt is saved in UTF-8 format (Ã¥.txt) and displayed incorrectly in
        title bar

        - The default termencoding should be set intelligently, UTF-8 as
        termencoding breaks input of non-ascii.

        - The default fileencoding breaks when "going UTF-8", most probably a
        better behavior would be to default to the ACP always.

        - Also, my vim (6.2) defaults to "latin1", not my current codepage. That
        would indicate that the ACP detection does not work.

        OK, the list above sounds like whining, but earlier I did suggest that the
        fixes are fairly straightforward.

        On WinNT, vim should use unicode apis, essentially benefitting
        automatically from NT native Unicode. This only involves one additional
        encoding/decoding step before calling the apis.

        On Win9x, vim should use ANSI apis. The only thing missing is again the
        encoding/decoding, although it's trickier with the ANSI apis. There are
        many cases where a user would enter UTF-8 stuff that doesn't smoothly
        convert to the current CP. I think vim's current code should detect that
        easily.
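
        To illustrate (a sketch under these assumptions, not actual Vim code),
        the lossy cases can be flagged by going through UTF-16 and letting the
        system report when a default character had to be substituted:

        #include <windows.h>

        /* Sketch only: convert a UTF-8 string to the active codepage and
         * report whether the conversion lost information.  Buffer sizes are
         * simplified for the example. */
        static int utf8_to_acp(const char *utf8, char *out, int outlen, BOOL *lossy)
        {
            WCHAR wide[MAX_PATH];
            int   wlen;

            wlen = MultiByteToWideChar(CP_UTF8, 0, utf8, -1, wide, MAX_PATH);
            if (wlen == 0)
                return 0;
            *lossy = FALSE;
            return WideCharToMultiByte(CP_ACP, 0, wide, wlen, out, outlen,
                                       NULL, lossy);
        }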

        Camillo
        --
        Camillo Särs <+ged+@...> ** Aim for the impossible and you
        <http://www.iki.fi/+ged> ** will achieve the improbable.
        PGP public key available **
      • Bram Moolenaar
        Message 3 of 29, Oct 13, 2003
          Camillo wrote:

          > > Vim should support UTF-8 in 9x, too.
          >
          > Of course, but with the necessary restrictions. Displaying unicode is a
          > problem, as is entering filenames. Those functions are restricted to the
          > ACP on Win9x.

          On Windows NT/XP there are also restrictions, especially when using
          non-NTFS filesystems. There was a discussion about this in the Linux
          UTF-8 maillist a long time ago. They could not come up with a good
          universal solution for handling filenames.

          Vim could use Unicode functions for accessing files, but this will be a
          huge change. Requires lots of testing. Main problem is when 'encoding'
          is not a Unicode encoding, then conversions need to be done, which may
          fail.

          If you use filenames that cannot be represented in the active codepage,
          you probably have problems with other programs. Thus sticking with the
          active codepage functions isn't too bad. But then Vim needs to convert
          from 'encoding' to the active codepage!

          > It is a bugfix. Currently, when using UTF-8 on WinNT, vim is broken in (at
          > least) the following regards:
          >
          > - Opening non-ascii filenames, regardless of codepage
          > å.txt internally becomes <e5>.txt
          >
          > - Saving filenames
          > å.txt is saved in UTF-8 format (Ã¥.txt) and displayed incorrectly in
          > title bar

          The file names are handled as byte strings. Thus so long as you use the
          right bytes it should work. Problem is when you are typing/editing with
          a different encoding from the active codepage.

          > - The default termencoding should be set intelligently, UTF-8 as
          > termencoding breaks input of non-ascii.

          Why would 'termencoding' be "utf-8"? This won't work, unless you are
          using an xterm on MS-Windows. The default 'termencoding' is empty,
          which means 'encoding' is used. There is no better default. When you
          change 'encoding' you might have to change 'termencoding' as well, but
          this depends on your situation.

          > - The default fileencoding breaks when "going UTF-8", most probably a
          > better behavior would be to default to the ACP always.

          'fileencoding' is set when reading a file. Perhaps you mean
          'fileencodings'? This one needs to be tweaked by the user, because it
          depends on what kind of files you edit. Main problem is that an ASCII
          file can be any encoding, Vim can't detect what it is, thus the user has
          to specify what he wants Vim to do with it.

          > - Also, my vim (6.2) defaults to "latin1", not my current codepage. That
          > would indicate that the ACP detection does not work.

          Where does it use "latin1"? Not in 'encoding', I suppose.

          > OK, the list above sounds like whining, but earlier I did suggest that the
          > fixes are fairly straightforward.

          Mostly it's quite a bit more complicated. Different users have different
          situations, it is hard to think of solutions that work for most people.

          > On WinNT, vim should use unicode apis, essentially benefitting
          > automatically from NT native Unicode. This only involves one additional
          > encoding/decoding step before calling the apis.

          The problem is that conversions to/from Unicode only work when you know
          the encoding of the text you are converting. The encoding isn't always
          known. Vim sometimes uses "latin1", so that you at least get 8-bit
          clean editing, even though the actual encoding is unknown.

          > On Win9x, vim should use ANSI apis. The only thing missing is again the
          > encoding/decoding, although it's trickier with the ANSI apis. There are
          > many cases where an user would enter UTF-8 stuff that doesn't smootly
          > convert to the current CP. I think vim's current code should detect that
          > easily.

          You can use a few Unicode functions on Win9x, we already do. I don't
          see a reason to change this.

          --
          I'm in shape. Round IS a shape.

          /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
          /// Creator of Vim - Vi IMproved -- http://www.Vim.org \\\
          \\\ Project leader for A-A-P -- http://www.A-A-P.org ///
          \\\ Help AIDS victims, buy here: http://ICCF-Holland.org/click1.html ///
        • Camillo Särs
          Message 4 of 29, Oct 13, 2003
            Bram Moolenaar wrote:
            > On Windows NT/XP there are also restrictions, especially when using
            > non-NTFS filesystems.

            Right, I forgot about those. AFAIK, the functions do not fail silently in
            those cases, so it's just (yet) more work. Essentially, file names then
            come from a restricted charset (code page limits).

            > There was a discussion about this in the Linux UTF-8 maillist a
            > long time ago. There was no good universal solution
            > for handling filenames that they could come up with.

            I bet. For many systems, the current behavior is adequate even if
            technically speaking wrong. I'm not trying to propose a universal
            solution, I'm just advocating the view that on win32, vim should do the
            "windows thing" with unicode/utf-8.

            > Vim could use Unicode functions for accessing files, but this will be a
            > huge change.

            Why so? The code earlier in this thread probably did much of what is
            needed. It also involved numerous other changes, which I ignored. I'm not
            being nosy, I'm just curious why this would be a "huge change". It's not
            the file contents we are getting at, it's the filenames (and the GUI).

            Also note that when using the native code page as the encoding (read:
            latin1), the ANSI functions do work as expected. So the fixes would
            only need to concern the UTF-8 encoding, if you get picky. :)

            > Requires lots of testing.

            That's unicode for you. However, deriving a decent test set using
            available unicode test files should be a fairly straight-forward thing.

            > Main problem is when 'encoding' is not a Unicode encoding, then conversions
            > need to be done, which may fail.

            But what I assume you are doing now is even worse, isn't it? Essentially
            you are feeding some user-selected encoding to functions that require
            ANSI characters. How's that for "a lot of testing"?

            Conversions from almost any encoding to unicode should work. I would not
            expect major trouble there. And note that if the conversion from the
            encoding to unicode fails, I expect that the current usage would fail even
            more severely. And there haven't been reports of that, have there?

            There certainly are tricky encodings that could cause problems. However,
            I'm mostly concerned with the basic use case of utf-8 and
            "fileencodings=ucs-bom,utf-8,latin1". This under a code page of cp1252.

            > If you use filenames that cannot be represented in the active codepage,
            > you probably have problems with other programs.

            But I have filenames that can be represented in the active code page
            (å.txt), but which get encoded into incompatible UTF-8 characters!

            > Thus sticking with the active codepage functions isn't too bad.

            If it worked that way, but it doesn't. Setting "encoding=utf-8" changes
            that behavior - only us-ascii is usable in filenames.

            > But then Vim needs to convert from 'encoding' to the active codepage!

            That would help most users. Including me. But it would not be the
            "ultimate" solution to unicode on win32, as it would still cause trouble
            with characters outside the codepage. As I see it, the easiest fix is
            actually using the unicode-api, as there are less (or no) conversion
            failures that way.

            > The file names are handled as byte strings. Thus so long as you use the
            > right bytes it should work. Problem is when you are typing/editing with
            > a different encoding from the active codepage.

            My point exactly! :)

            > Why would 'termencoding' be "utf-8"? This won't work, unless you are
            > using an xterm on MS-Windows.

            Yeah, but that's what you get if you just blindly do "set encoding=utf-8".
            Took me a while to figure that one out. I need to do "set
            termencoding=cp1252" first, or the "let &termencoding = &encoding". Not
            exactly transparent to non-experts.

            > The default 'termencoding' is empty, which means 'encoding' is used.
            > There is no better default.

            On Windows, I'd say "detect active code page" is the right choice.

            > When you change 'encoding' you might have to change 'termencoding' as
            > well, but this depends on your situation.

            As noted above, that's the unintuitive behavior I was getting at. A
            windows user, knowing that unicode is the native charset, does a "set
            encoding=utf-8" and expects things to work. They don't, but depending on
            the language, it may take a while before a non-ascii character is entered.

            >>- The default fileencoding breaks when "going UTF-8", most probably a
            >>better behavior would be to default to the ACP always.
            >
            > 'fileencoding' is set when reading a file. Perhaps you mean
            > 'fileencodings'? This one needs to be tweaked by the user, because it
            > depends on what kind of files you edit. Main problem is that an ASCII
            > file can be any encoding, Vim can't detect what it is, thus the user has
            > to specify what he wants Vim to do with it.

            Yes, I was unclear. Let me elaborate, although this point is rather
            exotic, and you can safely ignore me. :)

            When setting "encoding=utf-8", any new files will suddenly be utf-8 as
            well. For "ordinary" windows users, this may not be the desired result.
            What I was getting at was that *perhaps* the default fileencoding should be
            "cp####" in this case, unless the user explicitly sets it to something else
            (presumably utf-8). Before you object, yes, that's silly.

            Why use "encoding=utf-8" if you still want to create new files as ANSI?
            Well, quite a few windows applications don't do UTF-8. But using UTF-8
            internally still allows users to *transparently* edit existing
            unicode/utf-8 files without conversions.

            Anyway, I digress. This thought of mine was not that bright. Just forget it.

            >>- Also, my vim (6.2) defaults to "latin1", not my current codepage. That
            >>would indicate that the ACP detection does not work.
            >
            > Where does it use "latin1"? Not in 'encoding', I suppose.

            Yes. Without a _vimrc, I get:
            encoding=latin1
            fileencodings=ucs-bom
            termencoding=

            Thus changing the encoding only has funny effects.

            > Mostly it's quite more complicated. Different users have different
            > situations, it is hard to think of solutions that work for most people.

            Well, if you decide to make the unicode implementation work as it should,
            most people should be able to get what they want. It might involve a bit
            of tweaking, but nothing more.

            > The problem is that conversions to/from Unicode only work when you know
            > the encoding of the text you are converting. The encoding isn't always
            > known. Vim sometimes uses "latin1", so that you at least get 8-bit
            > clean editing, even though the actual encoding is unknown.

            I claim that on Windows, you should always have a good idea of the
            encoding. It's either explicitly set by the user, "cp####", or unicode.
            Windows has good support for converting ANSI to unicode, so this should be
            a non-issue. And again, as this is about non-UTF-8 data, you already have
            this problem anyway, because you are calling the ANSI functions with the
            "unknown" data. That it works should prove my point. ;-)

            But in the universal case, I agree with you.

            >>On Win9x, vim should use ANSI apis. The only thing missing is again the
            >>encoding/decoding, although it's trickier with the ANSI apis. There are
            >>many cases where an user would enter UTF-8 stuff that doesn't smootly
            >>convert to the current CP. I think vim's current code should detect that
            >>easily.
            >
            > You can use a few Unicode functions on Win9x, we already do. I don't
            > see a reason to change this.

            Sorry, I didn't want to imply that. I agree that we should stick to the
            unicode functions that are supported on Win9x, and only revert to ANSI
            "when forced".

            Camillo
            --
            Camillo Särs <+ged+@...> ** Aim for the impossible and you
            <http://www.iki.fi/+ged> ** will achieve the improbable.
            PGP public key available **
          • Bram Moolenaar
            Message 5 of 29, Oct 13, 2003
              Camillo wrote:

              > > Vim could use Unicode functions for accessing files, but this will be a
              > > huge change.
              >
              > Why so? The code earlier in this thread probably did much of what is
              > needed. It also involved numerous other changes, which I ignored. I'm not
              > being nosy, I'm just curious why this would be a "huge change". It's not
              > the file contents we are getting at, it's the filenames (and the GUI).

              Because every fopen(), stat() etc. will have to be changed.

              > Also note that when using the native code page as the encoding (read:
              > latin1), using the ANSI functions do work as expected. So the fixes would
              > only need to concern the UTF-8 encoding, if you get picky. :)

              This only means extra work, since an "if (encoding == ...)" has to be
              added to select between the traditional file access method and the
              Unicode method.

              > > Requires lots of testing.
              >
              > That's unicode for you. However, deriving a decent test set using
              > available unicode test files should be a fairly straight-forward thing.

              No, it's actually impossible to test this automatically. It involves
              creating various Win32 environments with code page settings, network
              filesystems and installed libraries. Only end-user tests can discover
              the real problems.

              > > Main problem is when 'encoding' is not a Unicode encoding, then conversions
              > > need to be done, which may fail.
              >
              > But what I assume you are doing now is even worse, isn't it? Essentially
              > you are be feeding some user-selected encoding to functions that require
              > ANSI characters. How's that for "a lot of testing"?

              The currently used functions work fine for accessing existing files.
              It's only when typing a new name or when displaying the name that
              problems may occur.

              > Conversions from almost any encoding to unicode should work. I would not
              > expect major trouble there. And note that if the conversion from the
              > encoding to unicode fails, I expect that the current usage would fail even
              > more severely. And there haven't been reports of that, has there?

              Main problem is that sometimes we don't know what the encoding is. In
              that situation you can treat the filename as a sequence of bytes in most
              places, but conversion is impossible. This happens more often than you
              would expect. Put a floppy disk or CD into your computer...

              There is also the situation that Vim uses the active codepage, but the
              file is actually in another encoding that could not be detected. Then
              doing "gf" on a filename will work if you don't do conversion, but it
              will fail if you try converting with the wrong encoding in mind.

              > > Thus sticking with the active codepage functions isn't too bad.
              >
              > If it worked that way, but it doesn't. Setting "encoding=utf-8" changes
              > that behavior - only us-ascii is usable in filenames.

              I don't see why. You can use a file selector to open any file and write
              it back under the same name. Vim doesn't need to know the encoding of
              the filename that way.

              If you type a file name in utf-8 it won't work properly, thus you have
              to use another method to obtain the file name. It's clumsy, I know.

              > > But then Vim needs to convert from 'encoding' to the active codepage!
              >
              > That would help most users. Including me. But it would not be the
              > "ultimate" solution to unicode on win32, as it would still cause trouble
              > with characters outside the codepage. As I see it, the easiest fix is
              > actually using the unicode-api, as there are less (or no) conversion
              > failures that way.

              As said above, this only works if we are 100% sure of what encoding the
              text (filename) is in, and we don't always know that.

              > > Why would 'termencoding' be "utf-8"? This won't work, unless you are
              > > using an xterm on MS-Windows.
              >
              > Yeah, but that's what you get if you just blindly do "set encoding=utf-8".
              > Took me a while to figure that one out. I need to do "set
              > termencoding=cp1252" first, or the "let &termencoding = &encoding". Not
              > exactly transparent to non-experts.

              Setting 'encoding' is full of side effects. There is a clear warning in
              the docs about this.

              > > The default 'termencoding' is empty, which means 'encoding' is used.
              > > There is no better default.
              >
              > On Windows, I'd say "detect active code page" is the right choice.

              I remember this was proposed before, I can't remember why we didn't do
              it this way. Windows is different here, since we can find out what the
              active codepage is. On Unix it's not that clear (e.g., depends on what
              options the xterm was started with). Consistency between systems is
              preferred.
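
              The detection itself would be trivial; something along these lines
              (a sketch, not what Vim does today) could produce a default such as
              "cp1252" for the GUI:

              #include <windows.h>
              #include <stdio.h>

              /* Sketch only: build a codepage name like "cp1252" from the active
               * ANSI codepage.  A console version might rather want
               * GetConsoleCP()/GetConsoleOutputCP(). */
              static void default_termencoding(char *buf, size_t buflen)
              {
                  _snprintf(buf, buflen, "cp%u", GetACP());
              }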

              > >>- Also, my vim (6.2) defaults to "latin1", not my current codepage. That
              > >>would indicate that the ACP detection does not work.
              > >
              > > Where does it use "latin1"? Not in 'encoding', I suppose.
              >
              > Yes. Without a _vimrc, I get:
              > encoding=latin1
              > fileencodings=ucs-bom
              > termencoding=
              >
              > Thus changing the encoding only has funny effects.

              Your active codepage must be latin1 then. Vim gets the default from the
              active codepage.

              --
              hundred-and-one symptoms of being an internet addict:
              192. Your boss asks you to "go fer" coffee and you come up with 235 FTP sites.

              /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
              /// Creator of Vim - Vi IMproved -- http://www.Vim.org \\\
              \\\ Project leader for A-A-P -- http://www.A-A-P.org ///
              \\\ Help AIDS victims, buy here: http://ICCF-Holland.org/click1.html ///
            • Camillo Särs
              Message 6 of 29, Oct 13, 2003
                Bram Moolenaar wrote:
                > Because every fopen(), stat() etc. will have to be changed.

                Right. You're not using Windows apis, of course. But to do things
                correctly, you would have to make sure that the fopen() etc.
                implementations [in Windows] either convert the strings they receive or
                are only called with valid Windows file names. Converting internally may
                be risky, because you'd need a way to convey the encoding into the functions.

                > Main problem is that sometimes we don't know what the encoding is.

                On Windows? I would disagree here. Any filesystem mounted by Windows
                should be mounted in a way that adheres to Windows naming conventions.
                We're not discussing file contents here.

                > In that situation you can treat the filename as a sequence of bytes in most
                > places, but conversion is impossible. This happens more often than you
                > would expect. Put a floppy disk or CD into your computer...

                So why convert it? :) The current display/saving problems stem from the
                fact that the file name is interpreted as UTF-8, an encoding which Windows
                does not recognize for file names or strings.

                > There is also the situation that Vim uses the active codepage, but the
                > file is actually in another encoding that could not be detected. Then
                > doing "gf" on a filename will work if you don't do conversion, but it
                > will fail if you try converting with the wrong encoding in mind.

                AFAIK, Windows will internally convert the path into Unicode if you call
                the ANSI function. Thus if gf succeeds as you describe, it should succeed
                if you use the unicode api as well. In both cases an 8-bit binary string
                undergoes "cp2unicode" conversion.

                > I don't see why. You can use a file selector to open any file and write
                > it back under the same name.

                Uhm. Thanks. I'm so used to using :edit and :view that this alternative
                hadn't even crossed my mind.

                > If you type a file name in utf-8 it won't work properly, thus you have
                > to use another method to obtain the file name. It's clumsy, I know.

                But it's a workaround. My title bar is still a mess, though.

                > As said above, this only works if we are 100% sure of what encoding the
                > text (filename) is in, and we don't always know that.

                We should be sure. And *if* we get it wrong, the user should be able to
                correct it.

                > I remember this was proposed before, I can't remember why we didn't do
                > it this way. Windows is different here, since we can find out what the
                > active codepage is. On Unix it's not that clear (e.g., depends on what
                > options the xterm was started with). Consistency between systems is
                > preferred.

                I would disagree on consistency here. On Windows, the encoding is either
                ANSI or unicode, or else it has been explicitly set to something known.
                And as long as we know the encoding, let's use it.

                > Your active codepage must be latin1 then. Vim gets the default from the
                > active codepage.

                My code page is cp1252. It's not latin1 (iso-8859-1). In practice, both
                are 8-bit-raw.

                Camillo
                --
                Camillo Särs <+ged+@...> ** Aim for the impossible and you
                <http://www.iki.fi/+ged> ** will achieve the improbable.
                PGP public key available **
              • Tony Mechelynck
                Message 7 of 29, Oct 13, 2003
                  Bram Moolenaar <Bram@...> wrote:
                  > Camillo wrote:
                  [...]
                  > > - The default termencoding should be set intelligently, UTF-8 as
                  > > termencoding breaks input of non-ascii.
                  >
                  > Why would 'termencoding' be "utf-8"? This won't work, unless you are
                  > using an xterm on MS-Windows. The default 'termencoding' is empty,
                  > which means 'encoding' is used. There is no better default. When you
                  > change 'encoding' you might have to change 'termencoding' as well, but
                  > this depends on your situation.
                  [...]

                  Glenn Maynard wants 'encoding' to default to "utf-8" regardless of the
                  active codepage. IMHO this would require 'termencoding' to default, not to
                  the empty string, but to what is currently the default 'encoding', namely
                  the active codepage. Such a change in the 'termencoding' default would (again,
                  IMHO) be a GoodThing anyway, since it would allow the keyboard to go on
                  working whether or not the user alters 'encoding'. Of course it is already
                  possible to do

                  if &termencoding == ""
                  let &termencoding = &encoding
                  endif

                  but wouldn't it make it easier for the user (more user-friendly) to have
                  'termencoding' default to the ACP not implicitly (&termencoding == "" and
                  'encoding' set to the ACP) but explicitly (by defaulting 'termencoding' to a
                  nonempty value representing the active codepage)? -- And it would make the
                  above "if" statement unnecessary but not harmful, so existing scripts should
                  not be broken.

                  Regards,
                  Tony.
                • Tony Mechelynck
                    Message 8 of 29, Oct 13, 2003
                    Camillo Särs <ged@...> wrote:
                    > Bram Moolenaar wrote:
                    [...]
                    > > Why would 'termencoding' be "utf-8"? This won't work, unless you are
                    > > using an xterm on MS-Windows.
                    >
                    > Yeah, but that's what you get if you just blindly do "set
                    > encoding=utf-8". Took me a while to figure that one out. I need to
                    > do "set termencoding=cp1252" first, or the "let &termencoding =
                    > &encoding". Not exactly transparent to non-experts.

                    Took me some figuring too. A few hours ago I uploaded my solution to
                    vim-online (set_utf8.vim,
                    http://vim.sourceforge.net/scripts/script.php?script_id=789 ). I hope it
                    will make it transparent to non-experts. Yet I still believe that defaulting
                    'termencoding' to the locale's charset would be better than leaving it
                    empty -- and such a change wouldn't break the above-mentioned script, you're
                    welcome to look at its source.
                    >
                    > > The default 'termencoding' is empty, which means 'encoding' is used.
                    > > There is no better default.
                    >
                    > On Windows, I'd say "detect active code page" is the right choice.
                    >
                    > > When you change 'encoding' you might have to change 'termencoding' as
                    > > well, but this depends on your situation.
                    >
                    > As noted above, that's the unintuitive behavior I was getting at. A
                    > windows user, knowing that unicode is the native charset, does a "set
                    > encoding=utf-8" and expects things to work. They don't, but
                    > depending on
                    > the language, it may take a while before a non-ascii character is
                    > entered.
                    [...]

                    Regards,
                    Tony.
                  • Bram Moolenaar
                      Message 9 of 29, Oct 13, 2003
                      Camillo wrote:

                      > > Main problem is that sometimes we don't know what the encoding is.
                      >
                      > On Windows? I would disagree here. Any filesystem mounted by Windows
                      > should be mounted in a way that adheres to Windows naming conventions.
                      > We're not discussing file contents here.

                      A file name may appear in a file (e.g., a list of files in a README
                      file). And I don't know what happens with file names on removable media
                      (e.g., a CD). Probably depends on the file system it contains. And
                      networked file systems are another problem.

                      > > In that situation you can treat the filename as a sequence of bytes in most
                      > > places, but conversion is impossible. This happens more often than you
                      > > would expect. Put a floppy disk or CD into your computer...
                      >
                      > So why convert it? :) The current display/saving problems stem from the
                      > fact that the file name is interpreted as UTF-8, a coding which Windows
                      > does not recognize for file names or strings.

                      We need to locate places where the encoding is different from what a
                      system function expects. There are still a few things that need to be
                      fixed.

                      > > There is also the situation that Vim uses the active codepage, but the
                      > > file is actually in another encoding that could not be detected. Then
                      > > doing "gf" on a filename will work if you don't do conversion, but it
                      > > will fail if you try converting with the wrong encoding in mind.
                      >
                      > AFAIK, Windows will internally convert the path into Unicode if you call
                      > the ANSI function. Thus if gf succeeds as you describe, it should succeed
                      > if you use the unicode api as well. In both cases a 8-bit binary string
                      > undergoes "cp2unicode" conversion.

                      If Vim defaults to the active codepage then conversion to Unicode would
                      do the same as using the ANSI function. Thus it's only a problem when
                      'encoding' is different from the active codepage. And when 'encoding'
                      is a Unicode variant we can use the "W" functions. Still, this means
                      all fopen() and stat() calls must be adjusted. When 'encoding' is not
                      the active codepage we could either leave the file name untranslated (as
                      it's now) or convert it to Unicode. Don't know which one would work
                      best...
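
                      One possible shape for that (a sketch under the assumptions above,
                      not actual Vim code): convert the name from 'encoding' to UTF-16 and
                      use the wide CRT call when 'encoding' is UTF-8, otherwise keep the
                      untranslated byte-string call:

                      #include <stdio.h>
                      #include <windows.h>

                      /* Sketch only: open a file whose name is in 'encoding'.  The
                       * enc_utf8 flag stands in for Vim's real "is 'encoding' UTF-8"
                       * test; buffer sizes are simplified. */
                      static FILE *enc_fopen(const char *name, const char *mode, int enc_utf8)
                      {
                          WCHAR wname[MAX_PATH], wmode[8];

                          if (enc_utf8
                              && MultiByteToWideChar(CP_UTF8, 0, name, -1, wname, MAX_PATH) > 0
                              && MultiByteToWideChar(CP_UTF8, 0, mode, -1, wmode, 8) > 0)
                              return _wfopen(wname, wmode);
                          return fopen(name, mode);   /* untranslated byte string */
                      }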

                      > > Your active codepage must be latin1 then. Vim gets the default from the
                      > > active codepage.
                      >
                      > My code page is cp1252. It's not latin1 (iso-8859-1). In practice, both
                      > are 8-bit-raw.

                      cp1252 and latin1 are not identical, but for practical use they can be
                      handled as the same encoding. Vim indeed uses this as the "raw" 8-bit
                      encoding that avoids messing up your characters when you don't know what
                      encoding it actually is.

                      --
                      hundred-and-one symptoms of being an internet addict:
                      194. Your business cards contain your e-mail and home page address.

                      /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
                      /// Creator of Vim - Vi IMproved -- http://www.Vim.org \\\
                      \\\ Project leader for A-A-P -- http://www.A-A-P.org ///
                      \\\ Help AIDS victims, buy here: http://ICCF-Holland.org/click1.html ///
                    • Glenn Maynard
                        Message 10 of 29, Oct 13, 2003
                        Note that I've upgraded, and I'm not having problems with files saving
                        incorrectly in enc=utf-8. The remaining problems are mostly cosmetic,
                        except for not being able to ":w 漢字.txt" with the ACP being Japanese.

                        On Mon, Oct 13, 2003 at 02:25:04PM +0200, Bram Moolenaar wrote:
                        > Because every fopen(), stat() etc. will have to be changed.

                        I don't think handling Unicode in filenames is worth it in Windows. It
                        takes so much work that the only applications I know of that support it
                        are ones that are compiled as native Unicode apps. The only exception
                        I've seen is FB2k.

                        It's certainly useful to be able to have multilingual filenames, but
                        Windows makes it so hard that people really wanting to do that probably
                        need a new OS.

                        > I don't see why. You can use a file selector to open any file and write
                        > it back under the same name. Vim doesn't need to know the encoding of
                        > the filename that way.

                        Consider the case where a filename in NT contains illegal data, eg. an
                        invalid two-byte SJIS sequence. When you call NT ANSI system calls, it
                        converts the buffers you pass it to WCHAR. That conversion would fail.

                        Are you worried about not being able to open files off eg. a slightly
                        corrupt/malformed floppy disc containing filenames that won't convert
                        cleanly? That seems no worse than not being able to use non-ACP
                        filenames. If that works, it seems a poor trade for not being able to
                        enter non-ASCII filenames in utf-8. ":w 漢字.txt" responding with
                        '"漢字.txt" [New]' and writing the filename correctly seems pretty
                        fundamental, for Japanese users on Japanese systems, and that doesn't
                        work with enc=utf-8.

                        > I remember this was proposed before, I can't remember why we didn't do
                        > it this way. Windows is different here, since we can find out what the
                        > active codepage is. On Unix it's not that clear (e.g., depends on what
                        > options the xterm was started with). Consistency between systems is
                        > preferred.

                        Windows and Unix handle encodings fundamentally differently, so complete
                        consistency means one or the other system not working as well. It seems
                        like "consistency to a fault". :)

                        Here's what I see, though: Windows APIs are always giving ACP or Unicode
                        data. Vim honors that for some code paths: input methods, copying to
                        and from the system clipboard. It ignores it and uses Unix paradigms
                        for others: filenames, most other ANSI calls.

                        The former, in my experience, work consistently; I can enter text with
                        the IME in both UTF-8 and CP932, and copy and paste reliably. The
                        latter do not: entered filenames don't work, non-ASCII text in the
                        titlebar shows <ab> hex values.

                        --
                        Glenn Maynard
                      • Camillo Särs
                          Message 11 of 29, Oct 13, 2003
                          Bram Moolenaar wrote:
                          > A file name may appear in a file (e.g., a list of files in a README
                          > file). And I don't know what happens with file names on removable media
                          > (e.g., a CD). Probably depends on the file system it contains. And
                          > networked file systems is another problem.

                          Both floppies, CDs and network file systems are mounted by windows, and
                          "some" translation of file names happens. AFAIK, you should be able to
                          access all files on such file systems using WindowsNT naming conventions.
                          The file names may not be exactly what you anticipated, but they are
                          guaranteed to stay constant.

                          > We need to locate places where the encoding is different from what a
                          > system function expects. There are still a few things that need to be
                          > fixed.

                          Yup. As I'm not familiar with the vim sources, I don't know how much work
                          this would mean in reality. However, the set of functions is or should be
                          known, and fairly limited.

                          > When 'encoding' is not the active codepage we could either leave
                          > the file name untranslated (as it's now) or convert it to Unicode.
                          > Don't know which one would work best...

                          Me neither. But I think that a conversion to unicode should be "fairly"
                          straight-forward, as it is what NT does natively anyway. This leads me to
                          think that Vim should do the conversion, as it knows the encoding. Or
                          let's say, it thinks it knows it. :)

                          Cheers,
                          Camillo
                          --
                          Camillo Särs <+ged+@...> ** Aim for the impossible and you
                          <http://www.iki.fi/+ged> ** will achieve the improbable.
                          PGP public key available **
                        • Bram Moolenaar
                            Message 12 of 29, Oct 14, 2003
                            Glenn Maynard wrote:

                            > On Mon, Oct 13, 2003 at 02:25:04PM +0200, Bram Moolenaar wrote:
                            > > Because every fopen(), stat() etc. will have to be changed.
                            >
                            > I don't think handling Unicode in filenames is worth it in Windows. It
                            > takes so much work that the only applications I know of that support it
                            > are ones that are compiled as native Unicode apps. The only exception
                            > I've seen is FB2k.
                            >
                            > It's certainly useful to be able to have multilingual filenames, but
                            > Windows makes it so hard that people really wanting to do that probably
                            > need a new OS.

                            So, what you suggest is to keep using the ordinary file system
                            functions. But we must make sure that the file name is then in the
                            active codepage encoding. When obtaining the file name with a system
                            function (e.g., a directory listing or file browser) it will already be
                            in that encoding. But when the user types a file name it's in the
                            encoding specified with 'encoding'. This means we would need to convert
                            the file name from 'encoding' to the active codepage at some point.
                            And the reverse conversion is needed when using a filename as a text
                            string, e.g., for "%p and in the window title.
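
                            The reverse direction could look roughly like this (a sketch only,
                            assuming 'encoding' is UTF-8): active codepage bytes to UTF-16,
                            then UTF-16 to UTF-8:

                            #include <windows.h>

                            /* Sketch only: convert an active-codepage file name to UTF-8 so
                             * it can be used as text (window title, "%p).  Buffer sizes are
                             * simplified for the example. */
                            static int acp_to_utf8(const char *acp, char *out, int outlen)
                            {
                                WCHAR wide[MAX_PATH];
                                int   wlen;

                                wlen = MultiByteToWideChar(CP_ACP, 0, acp, -1, wide, MAX_PATH);
                                if (wlen == 0)
                                    return 0;
                                return WideCharToMultiByte(CP_UTF8, 0, wide, wlen, out, outlen,
                                                           NULL, NULL);
                            }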

                            This is still complicated, but probably requires fewer changes than using
                            Unicode functions for all file access. I only foresee trouble when
                            'encoding' is set to a non-Unicode codepage different from the active
                            codepage and using a filename that contains non-ASCII characters.
                            Perhaps this situation is too weird to take into account?

                            > > I don't see why. You can use a file selector to open any file and write
                            > > it back under the same name. Vim doesn't need to know the encoding of
                            > > the filename that way.
                            >
                            > Consider the case where a filename in NT contains illegal data, eg. an
                            > invalid two-byte SJIS sequence. When you call NT ANSI system calls, it
                            > converts the buffers you pass it to WCHAR. That conversion would fail.
                            >
                            > Are you worried about not being able to open files off eg. a slightly
                            > corrupt/malformed floppy disc containing filenames that won't convert
                            > cleanly? That seems no worse than not being able to use non-ACP
                            > filenames. If that works, it seems a poor trade for not being able to
                            > enter non-ASCII filenames in utf-8. ":w 漢字.txt"
                            > responding with '"漢字.txt" [New]' and writing the
                            > filename correctly seems pretty fundamental, for Japanese users on
                            > Japanese systems, and that doesn't work with enc=utf-8.

                            Yep, using conversions means failure is possible. And failure mostly
                            means the text is in a different encoding than expected. It would take
                            some time to figure out how to do this in a way that the user isn't
                            confused.

                            --
                            hundred-and-one symptoms of being an internet addict:
                            210. When you get a divorce, you don't care about who gets the children,
                            but discuss endlessly who can use the email address.

                            /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
                            /// Creator of Vim - Vi IMproved -- http://www.Vim.org \\\
                            \\\ Project leader for A-A-P -- http://www.A-A-P.org ///
                            \\\ Help AIDS victims, buy here: http://ICCF-Holland.org/click1.html ///
                          • Camillo Särs
                              Message 13 of 29, Oct 14, 2003
                              Bram Moolenaar wrote:
                              > Glenn Maynard wrote:
                              >>It's certainly useful to be able to have multilingual filenames, but
                              >>Windows makes it so hard that people really wanting to do that probably
                              >>need a new OS.
                              >
                              > So, what you suggest is to keep using the ordinary file system
                              > functions. But we must make sure that the file name is then in the
                              > active codepage encoding.

                              While that may sound attractive at first, I would strongly dissuade from
                              that solution. I consider it to be a myth that using multilingual
                              filenames on Windows is hard. Under NT, it should be a breeze for any
                              application that is even slightly Unicode-aware. When you decide to make
                              changes in Vim, it makes sense to look to the future and try to go the
                              "Unicode" way. XP Home Edition is gaining ground - fast.

                              Win9x is a mess, because it's just a version of DOS on hormones, and thus
                              is solidly entrenched in the single code page per application world. Using
                              the current code page should suffice there, though.

                              > This is still complicated, but probably requires less changes than using
                              > Unicode functions for all file access.

                              Why? I don't get it. You don't need to use Unicode functions for anything
                              except stuff that accepts strings. The current implementation is wrong,
                              because it feeds "encoding" text to ANSI functions. If you change it, I
                              don't see why doing a conversion to Unicode would be any different than a
                              conversion to ANSI, other than the fact that converting to ANSI is riskier.

                              <http://www.microsoft.com/globaldev/> contains a lot of useful info. Quote:

                              "All Win32 APIs that take a text argument either as an input or output
                              variable have been provided with a generic function prototype and two
                              definitions: a version that is based on code pages or ANSI (called "A") to
                              handle code page-based text argument and a wide version (called "W ") to
                              handle Unicode."
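A minimal illustration of that A/W pairing (not from the original mail; the file
name is made up):

    #include <windows.h>

    /* Explicit ANSI entry point: the file name is interpreted in the
     * active codepage of the process. */
    HANDLE open_ansi(void)
    {
        return CreateFileA("readme.txt", GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    }

    /* Explicit wide entry point: the file name is UTF-16 and does not depend
     * on any codepage.  On NT this reaches the file system directly; on
     * plain Win9x most of the W entry points are unimplemented stubs. */
    HANDLE open_wide(void)
    {
        return CreateFileW(L"readme.txt", GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    }

    /* The generic name CreateFile is a macro that expands to CreateFileA or
     * CreateFileW depending on whether UNICODE is defined at compile time. */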

                              For 9x, you might be interested in the "Microsoft Layer for Unicode"

                              > I only foresee trouble when 'encoding' is set to a non-Unicode
                              > codepage different from the active codepage and using
                              > a filename that contains non-ASCII characters.
                              > Perhaps this situation is too weird to take into account?

                              As long as you know the correct code page, you can use Windows APIs to
                              convert correctly. They take the code page as an argument.
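A small sketch of such a codepage-aware conversion; the helper name is made up
and error handling is kept to a minimum:

    #include <stdlib.h>
    #include <windows.h>

    /* Convert a file name from a known codepage (for instance the codepage
     * corresponding to 'encoding') to UTF-16 so it can be handed to the
     * wide Win32 functions.  MB_ERR_INVALID_CHARS makes the call fail on
     * bytes that are not valid in that codepage; we return NULL then. */
    static WCHAR *fname_to_utf16(const char *fname, UINT codepage)
    {
        WCHAR *wname;
        int len = MultiByteToWideChar(codepage, MB_ERR_INVALID_CHARS,
                                      fname, -1, NULL, 0);

        if (len <= 0)
            return NULL;
        wname = (WCHAR *)malloc(len * sizeof(WCHAR));
        if (wname != NULL)
            MultiByteToWideChar(codepage, MB_ERR_INVALID_CHARS,
                                fname, -1, wname, len);
        return wname;
    }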

                              Camillo
                              --
                              Camillo Särs <+ged+@...> ** Aim for the impossible and you
                              <http://www.iki.fi/+ged> ** will achieve the improbable.
                              PGP public key available **
                            • Bram Moolenaar
                              Message 14 of 29 , Oct 14, 2003
                                Camillo wrote:

                                 > While that may sound attractive at first, I would strongly advise against
                                > that solution. I consider it to be a myth that using multilingual
                                 > filenames on Windows is hard. Under NT, it should be a breeze for any
                                > application that is even slightly Unicode-aware. When you decide to make
                                > changes in Vim, it makes sense to look to the future and try to go the
                                > "Unicode" way. XP Home Edition is gaining ground - fast.

                                 Vim not only supports Unicode but also many other encodings. If Vim
                                 only used Unicode it would be simple, but that's not the situation.
                                 On top of that, Vim is also used on many other systems, and we try to
                                make it work the same way everywhere.

                                 > > This is still complicated, but probably requires fewer changes than using
                                > > Unicode functions for all file access.
                                >
                                > Why? I don't get it. You don't need to use Unicode functions for anything
                                > except stuff that accepts strings. The current implementation is wrong,
                                > because it feeds "encoding" text to ANSI functions. If you change it, I
                                > don't see why doing a conversion to Unicode would be any different than a
                                 > conversion to ANSI, other than the fact that converting to ANSI is riskier.
                                >
                                > <http://www.microsoft.com/globaldev/> contains a lot of useful info. Quote:
                                >
                                > "All Win32 APIs that take a text argument either as an input or output
                                > variable have been provided with a generic function prototype and two
                                > definitions: a version that is based on code pages or ANSI (called "A") to
                                > handle code page-based text argument and a wide version (called "W ") to
                                > handle Unicode."

                                Eh, what happens when I use fopen() or stat()? There is no ANSI or wide
                                version of these functions. And certainly not one that also works on
                                 non-Win32 systems. And when using the wide versions, conversion needs to
                                be done from 'encoding' to Unicode, thus the conversion has to be there
                                as well. That's going to be a lot of work (many #ifdefs) and will
                                probably introduce new bugs.

                                > For 9x, you might be interested in the "Microsoft Layer for Unicode"
                                >
                                > > I only foresee trouble when 'encoding' is set to a non-Unicode
                                > > codepage different from the active codepage and using
                                > > a filename that contains non-ASCII characters.
                                > > Perhaps this situation is too weird to take into account?
                                >
                                > As long as you know the correct code page, you can use Windows APIs to
                                > convert correctly. They take the code page as an argument.

                                As mentioned before, we are not always sure what encoding the text has.
                                Conversion is then likely to fail. This especially happens for 8-bit
                                 encodings; there is no way to automatically detect which encoding these
                                 files are in.

                                I think we need a smart solution that doesn't attempt to handle all
                                situations but works predictably.

                                --
                                hundred-and-one symptoms of being an internet addict:
                                218. Your spouse hands you a gift wrapped magnet with your PC's name
                                on it and you accuse him or her of genocide.

                                /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
                                /// Creator of Vim - Vi IMproved -- http://www.Vim.org \\\
                                \\\ Project leader for A-A-P -- http://www.A-A-P.org ///
                                \\\ Help AIDS victims, buy here: http://ICCF-Holland.org/click1.html ///
                              • Glenn Maynard
                                Message 15 of 29 , Oct 14, 2003
                                   > While that may sound attractive at first, I would strongly advise against
                                  > that solution. I consider it to be a myth that using multilingual
                                   > filenames on Windows is hard. Under NT, it should be a breeze for any

                                  It's not at all a myth if you want code that is 1: portable and 2: works
                                  on 9x, too. (If you can deal with nonportable code, you can use Windows's
                                  TCHAR mechanism, and if you don't care about anything but NT, you can write
                                   a UTF-16-only app. Neither of these is the case here, though.)

                                  It's not "hard", it's just "incredibly annoying".

                                  On Tue, Oct 14, 2003 at 02:20:27PM +0200, Bram Moolenaar wrote:
                                   > This is still complicated, but probably requires fewer changes than using
                                  > Unicode functions for all file access. I only foresee trouble when
                                  > 'encoding' is set to a non-Unicode codepage different from the active
                                  > codepage and using a filename that contains non-ASCII characters.
                                  > Perhaps this situation is too weird to take into account?

                                  If "encoding" is not the ACP codepage, then the main problem is that the
                                  user can enter characters that Vim simply can't put into a filename
                                  (and in 9x, that the system can't, either).

                                  I'd just do a conversion, and if the conversion fails, warn appropriately.
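One possible way to notice such a failure, sketched here with a made-up helper
name and assuming the target is an ANSI codepage: WideCharToMultiByte() can
report that it had to substitute a default character.

    #include <stdlib.h>
    #include <windows.h>

    /* Convert a UTF-16 file name to a narrow codepage and report whether any
     * character could not be represented; Windows would otherwise silently
     * substitute a default character (the "foo ?.txt" effect below). */
    static char *fname_to_codepage(const WCHAR *wname, UINT codepage, int *lossy)
    {
        BOOL used_default = FALSE;
        char *name;
        int len = WideCharToMultiByte(codepage, 0, wname, -1, NULL, 0, NULL, NULL);

        *lossy = 0;
        if (len <= 0)
            return NULL;
        name = (char *)malloc(len);
        if (name == NULL)
            return NULL;
        WideCharToMultiByte(codepage, 0, wname, -1, name, len, NULL, &used_default);
        if (used_default)
            *lossy = 1;   /* caller can warn instead of silently mangling the name */
        return name;
    }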

                                  > Eh, what happens when I use fopen() or stat()? There is no ANSI or wide
                                  > version of these functions. And certainly not one that also works on
                                   > non-Win32 systems. And when using the wide versions, conversion needs to
                                  > be done from 'encoding' to Unicode, thus the conversion has to be there
                                  > as well. That's going to be a lot of work (many #ifdefs) and will
                                  > probably introduce new bugs.

                                  It's not that much work. Windows has _wfopen and _wstat. Vim already
                                  has those abstracted (mch_fopen, mch_stat), so conversions would only
                                  happen in one place (and in a place that's intended to be platform-
                                  specific, mch_*). I believe the code I linked earlier did exactly this.

                                  The only thing needed is sane error recovery.
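A rough sketch of what one of those wrappers could look like when 'encoding' is
UTF-8; the function name is made up and Vim's real mch_fopen() may be organised
quite differently:

    #include <stdio.h>
    #include <wchar.h>
    #include <windows.h>

    /* Open a file whose name is UTF-8 by converting name and mode to UTF-16
     * and calling the wide CRT function.  If the name is not valid UTF-8,
     * fall back to the plain ANSI call. */
    static FILE *utf8_fopen(const char *name, const char *mode)
    {
        WCHAR wname[MAX_PATH];
        WCHAR wmode[16];

        if (MultiByteToWideChar(CP_UTF8, 0, name, -1, wname, MAX_PATH) == 0
                || MultiByteToWideChar(CP_UTF8, 0, mode, -1, wmode, 16) == 0)
            return fopen(name, mode);   /* conversion failed: ANSI fallback */
        return _wfopen(wname, wmode);
    }

A _wstat() wrapper would follow the same pattern.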

                                  > Yep, using conversions means failure is possible. And failure mostly
                                  > means the text is in a different encoding than expected. It would take
                                  > some time to figure out how to do this in a way that the user isn't
                                  > confused.

                                  Well, bear in mind the non-ACP case that already exists. If I create
                                  "foo ♡.txt", and try to edit it with Vim, it edits "foo ?.txt" (which
                                  it can't write, either, since "?" is an invalid character in Windows
                                  filenames). I'd suggest that editing a file with an invalid character
                                  (eg. invalid SJIS sequence) behave identically to editing a file with
                                  a valid character that can't be referenced (eg. "foo ♡.txt").

                                  --
                                  Glenn Maynard
                                • Camillo Särs
                                  Message 16 of 29 , Oct 14, 2003
                                    Glenn Maynard wrote:
                                    > If "encoding" is not the ACP codepage, then the main problem is that the
                                    > user can enter characters that Vim simply can't put into a filename
                                    > (and in 9x, that the system can't, either).
                                    >
                                    > I'd just do a conversion, and if the conversion fails, warn appropriately.

                                    Agreed. There's no way around that.

                                    > It's not that much work. Windows has _wfopen and _wstat. Vim already
                                    > has those abstracted (mch_fopen, mch_stat), so conversions would only
                                    > happen in one place (and in a place that's intended to be platform-
                                    > specific, mch_*). I believe the code I linked earlier did exactly this.
                                    >
                                    > The only thing needed is sane error recovery.

                                    Sounds very promising. It would be really great if it turns out that the
                                    changes are fairly minor. That way there's a chance they would get
                                    implemented. :)

                                    If you decide to try the proposed changes out, I'm prepared to do some
                                    testing on a Win32 binary build. Sorry, can't build myself. :(

                                    Camillo
                                    --
                                    Camillo Särs <+ged+@...> ** Aim for the impossible and you
                                    <http://www.iki.fi/+ged> ** will achieve the improbable.
                                    PGP public key available **
                                  • Bram Moolenaar
                                    Message 17 of 29 , Oct 15, 2003
                                      Glenn Maynard wrote:

                                      > On Tue, Oct 14, 2003 at 02:20:27PM +0200, Bram Moolenaar wrote:
                                       > > This is still complicated, but probably requires fewer changes than using
                                      > > Unicode functions for all file access. I only foresee trouble when
                                      > > 'encoding' is set to a non-Unicode codepage different from the active
                                      > > codepage and using a filename that contains non-ASCII characters.
                                      > > Perhaps this situation is too weird to take into account?
                                      >
                                      > If "encoding" is not the ACP codepage, then the main problem is that the
                                      > user can enter characters that Vim simply can't put into a filename
                                      > (and in 9x, that the system can't, either).
                                      >
                                      > I'd just do a conversion, and if the conversion fails, warn appropriately.

                                       It's more complicated than that. You can have filenames in the ACP,
                                       'encoding' and Unicode. Filenames are stored in various places inside
                                       Vim; which encoding should be used for each of them? Obviously, a filename
                                      stored in buffer text and registers has to use 'encoding'.

                                      It's less obvious what to use for internal structures, such as
                                      curbuf->b_ffname. When 'encoding' is a Unicode encoding we can use
                                       UTF-8, which can be converted to anything else. That also works when the
                                       active codepage is not Unicode, since we can use the wide functions then.

                                       When 'encoding' is the active codepage (this is the default, so it should
                                      happen a lot), we can use the active codepage. That avoids conversions
                                      (which may fail). No need to use wide functions then.

                                      The real problem is when 'encoding' is not the active codepage and it's
                                      also not a Unicode encoding. We could simply skip the conversion then.
                                      That doesn't work properly for non-ASCII characters, but it's how it
                                      already works right now. The right way would be to convert the file
                                      name to Unicode and use the wide functions.
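Put as code, the three cases sketched above could look roughly like this; the
identifiers are placeholders, not existing Vim names:

    #include <windows.h>

    enum fname_strategy
    {
        USE_WIDE_FUNCS,   /* convert 'encoding' -> UTF-16 and call the W functions */
        USE_ANSI_FUNCS,   /* name is already in the active codepage, no conversion */
        PASS_THROUGH      /* mismatched 8-bit codepage: leave the bytes alone      */
    };

    static enum fname_strategy pick_strategy(int enc_is_utf8, UINT enc_codepage)
    {
        if (enc_is_utf8)
            return USE_WIDE_FUNCS;    /* Unicode 'encoding': lossless via UTF-16 */
        if (enc_codepage == GetACP())
            return USE_ANSI_FUNCS;    /* the default setup, and the common case  */
        return PASS_THROUGH;          /* the "weird" case described above        */
    }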

                                      I guess this means all filenames inside Vim are in 'encoding'. Where
                                      needed, conversion needs to be done from/to Unicode and the wide
                                      functions are to be used then.

                                      The main thing to implement now is using the wide functions when
                                      'encoding' is UTF-8. This only requires a simple conversion between
                                       UTF-8 and UTF-16. I'll be waiting for a patch...
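That conversion really is small; a sketch of both directions using the Win32
conversion routines (helper names made up):

    #include <stdlib.h>
    #include <windows.h>

    /* UTF-8 -> UTF-16, for passing file names to the wide ("W") functions. */
    static WCHAR *utf8_to_utf16(const char *s)
    {
        WCHAR *w;
        int len = MultiByteToWideChar(CP_UTF8, 0, s, -1, NULL, 0);

        if (len <= 0)
            return NULL;
        w = (WCHAR *)malloc(len * sizeof(WCHAR));
        if (w != NULL)
            MultiByteToWideChar(CP_UTF8, 0, s, -1, w, len);
        return w;
    }

    /* UTF-16 -> UTF-8, for names coming back from the wide functions,
     * e.g. directory entries from FindFirstFileW(). */
    static char *utf16_to_utf8(const WCHAR *w)
    {
        char *s;
        int len = WideCharToMultiByte(CP_UTF8, 0, w, -1, NULL, 0, NULL, NULL);

        if (len <= 0)
            return NULL;
        s = (char *)malloc(len);
        if (s != NULL)
            WideCharToMultiByte(CP_UTF8, 0, w, -1, s, len, NULL, NULL);
        return s;
    }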

                                      --
                                      hundred-and-one symptoms of being an internet addict:
                                      231. You sprinkle Carpet Fresh on the rugs and put your vacuum cleaner
                                      in the front doorway permanently so it always looks like you are
                                      actually attempting to do something about that mess that has amassed
                                      since you discovered the Internet.

                                      /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
                                      /// Creator of Vim - Vi IMproved -- http://www.Vim.org \\\
                                      \\\ Project leader for A-A-P -- http://www.A-A-P.org ///
                                      \\\ Help AIDS victims, buy here: http://ICCF-Holland.org/click1.html ///