994 Re: Filename encodings under Win32

  • Tony Mechelynck
    Oct 12, 2003
      Glenn Maynard <glenn@...> wrote:
      > On Sun, Oct 12, 2003 at 10:44:05PM +0200, Tony Mechelynck wrote:
      > > As long as 'fileencoding', 'printencoding' and (most important)
      > > 'termencoding' default (when empty) to whatever is the current
      > > value of 'encoding', the latter must not (IMHO) be set to UTF-8 by
      > > default.
      > >
      > > (Let's spell it out) In my humble opinion, Vim should require as
      > > little "tuning" as possible to handle the language interfaces the
      > > same way as the operating system does, and this means that, when
      > > the user sets nothing else in his startup and configuration files,
      > > keyboard input, printer output and file creation should default to
      > > whatever is set in the locale.
      >
      > This is a trivial fix, which I already proposed many months ago: the
      > defaults in Windows should be the results of
      >
      > exe "set fileencodings=ucs-bom,utf-8,cp" . getacp() . ",latin1"
      > exe "set fileencoding=cp" . getacp()
      >
      > and now adding:
      >
      > exe "set printencoding=cp" . getacp()
      >
      > Note that "getacp" is a function in a patch I sent which was lost or
      > forgotten: it returns the ANSI codepage.
      >
      > (A slightly safer default would be to remove "utf-8" from the search,
      > to prevent false matches.) I haven't found any problems with this;
      > it's been my default for a long time and I actively edit UTF-8 and
      > CP932 files.

      Trivial or not, my opinion is that handling files and keypresses as per the
      locale shouldn't be a "fix"; it should be the (program) default. The "minor
      fix" consists of making Unicode the (user's) default by means of a config
      setting; but see below about that.
      >
      > > If the user wants to handle Unicode files, it is quite possible to
      > > set gvim to do it, even in Win98 systems like mine; but this
      > > requires, among other things, storing the previous value of
      > > 'encoding' into 'termencoding' because the user cannot, by a mere
      > > snap of the fingers, change his keyboard input from some national
      > > encoding to Unicode.
      >
      > The input in a Windows window is well-defined; "termencoding" should
      > not even be needed in Windows. Depending on which messages are
      > trapped, the input is always in the ANSI codepage or Unicode.

      Sorry, but it is needed. AFAIK, leaving 'termencoding' empty when switching
      'encoding' over from something else to Unicode breaks keyboard input for
      all users whose actual keyboard encoding is other than 7-bit ASCII: roughly
      speaking, for all users with a keyboard for a language other than English.
      Even Dutchmen like Bram need, as a minimum, the "lowercase e with
      diaeresis", whose code is above 127 and which therefore receives a
      different representation in UTF-8 and in other encodings (the codepoint
      number may be the same, but it is not represented identically). That's why
      the lines

      if &termencoding == ""
      let &termencoding = &encoding
      endif

      have been put in my script set_utf8.vim (newly uploaded to vim.online),
      before the actual switch of 'encoding' to utf-8. Thanks to this, any
      accented keys (and my own keyboard has a lot of them) go on working
      identically (i.e., transparently) after the switchover as they did before.
      Of course, making utf-8 the Vim default for 'encoding' would break the above
      code, with (AFAIK) no possibility of repair in mainline Vim, which hasn't
      got the getacp() function. And don't talk to me about a patch: I don't
      want to use other than standard binaries; for one thing, I don't have a
      compiler and I don't want to get one (messing about with nonstandard
      compilations is definitely not my cup o'tea). It would break it, I mean,
      unless the Vim default for 'termencoding' changed from the empty string
      (i.e. use whatever is the current global Vim 'encoding' at the time a key is
      pressed) to the user's locale (as found in $LANG at startup). But let's keep
      things simple, not break existing scripts, reduce Bram's and other people's
      workloads, and keep Vim's handling of encodings as it is (the only change
      I'd like to see is to add a functioning 'printencoding' option to Windows
      versions of gvim, even though they don't print through PostScript).
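
      To make the "e with diaeresis" point concrete, here is a small sketch (an
      illustration only, not taken from set_utf8.vim). It assumes it is sourced
      in a multi-byte Vim whose 'encoding' is already utf-8, and it relies only
      on the utf-8/latin1 conversion that iconv() offers even without the +iconv
      feature. In Latin-1 the character is the single byte 0xEB; in UTF-8 it is
      the two bytes 0xC3 0xAB.

      ```vim
      " Assumption: 'encoding' is utf-8 when this file is sourced.
      " Code point 235 (0xEB) is "lowercase e with diaeresis".
      let s:e_utf8 = nr2char(235)
      " The same character converted to Latin-1.
      let s:e_latin1 = iconv(s:e_utf8, "utf-8", "latin1")
      " strlen() counts bytes, not characters, so the two lengths differ.
      echo "utf-8 bytes:  " . strlen(s:e_utf8)
      echo "latin1 bytes: " . strlen(s:e_latin1)
      ```

      The codepoint is the same in both cases; only the byte sequence that the
      keyboard delivers and that Vim stores differs, which is what
      'termencoding' has to bridge.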
      >
      > However, if it's being used anyway for some reason, then the solution
      > is the same:
      >
      > exe "set termencoding=cp" . getacp()
      >
      > The only reason I know of not to set "encoding" to "utf-8" is that Vim
      > doesn't do proper conversions for Win32 calls.

      Users who only edit files in a single 8-bit encoding don't need to bother
      about Unicode. For others, it is a useful choice, but I maintain that it
      should remain a choice, and, if the locale set in the operating system is
      not a Unicode one, it should IMHO remain a conscious choice (or at least a
      voluntary one, that need not stay conscious once it has been written into
      the vimrc).
      >
      > > used by (g)vim (namely, 'encoding', 'fileencoding', 'termencoding'
      > > and 'printencoding', as well as a possible 8-bit encoding at the
      > > end of 'fileencodings') should, as I believe they already do,
      > > default directly or indirectly to whatever is set in the locale,
      > > and that a possible switchover to Unicode should be left to the
      > > voluntary and reasoned choice of the user.
      >
      > Switching "encoding" to "utf-8" should be transparent, once proper
      > conversions for win32 calls are in place. Regular users don't care
      > about what encoding their editor uses internally, any more than they
      > care about what type of data structures they use.
      >
      > On the other hand, if utf-8 internally is fully supported, then utf-8
      > can be the *only* internal encoding--which would make the rendering
      > code much simpler and more robust. I remember finding lots of little
      > errors in the renderer (eg. underlining glitches for double-width
      > characters) that went away with utf-8, and I don't think Vim renders
      > correctly at all if eg. "encoding" is set to "cp1252" and the ACP
      > is CP932 (needs a double conversion).
      >
      > --
      > Glenn Maynard

      UTF-8 is fully supported (well, almost fully: characterwise
      bidirectionality, a Unicode property, isn't supported) internally by
      multi-byte versions of gvim, but switching over "transparently" from
      "locale-oriented" to "Unicode-oriented" working requires careful attention
      to several options, foremost of which are 'termencoding' and
      'fileencodings'. To help the ordinary Vim user make that switchover
      "transparently" without (as we say in French) "getting his feet caught in
      the carpet", I uploaded a few minutes ago a new script called set_utf8.vim:
      go see it at http://vim.sourceforge.net/scripts/script.php?script_id=789 .
      With it and a Unicode-enabled version of Vim (with no need for any special
      patches), switching over from one's national locale to Unicode becomes a
      one-liner (you may call it a "trivial fix"). The idea of that script is to
      work as "transparently" as possible, e.g., to avoid messing up the existing
      keyboard's or (if possible) printer's interpretation of accented characters.
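
      A minimal sketch of the kind of switchover described above, for a vimrc.
      This is not the actual contents of set_utf8.vim; it only combines the
      options named in this message. The trailing "latin1" in 'fileencodings'
      is an assumption standing in for whatever 8-bit encoding your locale
      uses.

      ```vim
      " Sketch only: requires a Vim built with multi-byte support.
      if has("multi_byte")
        " Remember the locale encoding for keyboard input BEFORE changing
        " 'encoding', so accented keys keep working after the switch.
        if &termencoding == ""
          let &termencoding = &encoding
        endif
        " Now use Unicode internally.
        set encoding=utf-8
        " Detection order when reading files: BOM first, then UTF-8,
        " then an 8-bit fallback (assumed here to be latin1).
        set fileencodings=ucs-bom,utf-8,latin1
      endif
      ```

      The order matters: once 'encoding' has been changed, the previous
      locale-derived value is no longer available to copy into 'termencoding'.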

      Regards,
      Tony.