An Internationalization Problem of Vim 6.3 (and earlier) Windows version

  • Wu Yongwei
    Message 1 of 5, Jan 31, 2005
      I have been using gVim since 6.1, and one bug seems to have always been
      there. Maybe that is because there are few Chinese users :-). I'll
      describe it in detail.

      GVim can display Chinese menus and Chinese messages, which is good. It can
      also handle UTF-8 files under Windows code page 936 (Simplified Chinese)
      if the file has a BOM, which is good too. What is not good is when the
      file has no BOM and I have to "set encoding=UTF-8" manually. Then, if I
      press "I", the Chinese text for "INSERT" does not display; something
      corrupt appears instead. All Chinese messages (in the title, status line,
      etc.) are corrupt. It seems Vim interprets the GB2312 byte sequences as
      UTF-8 sequences. The new console version suffers from the same problem.

      I am a newbie here. I hope this is not an old issue.

      Thanks and best regards,

      Yongwei
    • Antoine J. Mechelynck
      Message 2 of 5, Jan 31, 2005
        'encoding' defines how Vim represents the data internally. It has
        nothing to do with the BOM. When a BOM is present in a file, and
        'fileencodings' (plural) includes "ucs-bom" (preferably at the start)
        then Vim automatically sets the proper 'fileencoding' (singular) as
        defined by the BOM.

        If all the Unicode codepoints in your file have equivalents in cp936,
        then you should be able to edit that file with 'encoding' set to cp936;
        if the file is in UTF-8 but has no BOM, you might have to read the file with

        :e ++enc=utf-8 filename.txt

        which reads the file as UTF-8 and leaves 'fileencoding' set to utf-8 for
        that file's buffer, much as if ":setlocal fileencoding=utf-8" had been in
        effect before the file was read.
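
        As a rough illustration of the ++enc approach (the output file name is a
        placeholder, not something from this thread): ":e ++enc=..." without a
        file name re-reads the current file, and ++enc can also be given to ":w".
        This assumes the Vim build can convert between the two encodings.

        ```vim
        " Re-read the current file, this time decoding it as UTF-8.
        " (This fails if the buffer has unsaved changes; ":e!" would discard them.)
        :e ++enc=utf-8

        " Check which on-disk encoding Vim now associates with the buffer.
        :setlocal fileencoding?

        " Write a converted copy; "copy-cp936.txt" is a hypothetical file name.
        :w ++enc=cp936 copy-cp936.txt
        ```

        Re-reading matters because setting 'fileencoding' on a buffer that is
        already loaded only affects how it is written, not how it was decoded.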

        OTOH, if you want to edit, say, a file in Simplified Chinese in parallel
        with a file in Traditional Chinese, then you may need to set 'encoding'
        to UTF-8 anyway, in order to access both families of glyphs (and you may
        need to either find a font which has both, or change the 'guifont' at
        the WinEnter autocommand event in accordance with the current file's
        charset).
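
        A minimal sketch of the font-switching idea, for a _vimrc. The font
        names and sizes are assumptions and must be replaced by fonts actually
        installed; BufEnter is added alongside WinEnter as a guess, so that the
        check also runs when a different buffer is loaded into the same window.

        ```vim
        " Sketch only: NSimSun and MingLiU are assumed font names.
        augroup CjkFont
          autocmd!
          " Simplified Chinese files (cp936) get a Simplified Chinese font ...
          autocmd WinEnter,BufEnter * if &fileencoding ==? 'cp936' | set guifont=NSimSun:h12 | endif
          " ... and Traditional Chinese files (cp950) get a Traditional Chinese one.
          autocmd WinEnter,BufEnter * if &fileencoding ==? 'cp950' | set guifont=MingLiU:h12 | endif
        augroup END
        ```

        Files whose 'fileencoding' is neither value leave 'guifont' unchanged.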

        Summary:
        'encoding'      = how Vim represents the data internally
        'fileencoding'  = how the data is encoded on disc
        'fileencodings' = the heuristic (an ordered list of candidate encodings)
                          Vim uses to determine the 'fileencoding' of a file
                          being read
        'printencoding' = how the data is encoded for printing
        'termencoding'  = how the data is represented on keyboard input, and
                          (not in the GUI) on console output
        ++enc=          = specifies a particular 'fileencoding' for reading or
                          writing (see ":help ++opt")
        'bomb'          = (boolean) specifies whether a B.O.M. (codepoint U+FEFF
                          ZERO WIDTH NO-BREAK SPACE) is present at the start of
                          the file
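
        To see how these options are currently set, something like the
        following can be typed at the command line. It is only a sketch;
        'printencoding' may be unavailable in builds without printing support.

        ```vim
        " Global options: internal representation, terminal/keyboard, printing.
        :set encoding? termencoding? printencoding?

        " The ordered list of candidate encodings tried when a file is read.
        :set fileencodings?

        " Buffer-local results for the current file: detected encoding and BOM flag.
        :setlocal fileencoding? bomb?
        ```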


        When changing 'encoding', Vim ought to be able to find the new
        representations of the translated messages; apparently it doesn't always
        do so perfectly. But (since I use ":language messages en") I don't know
        the details.
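
        For reference, the English-messages setting mentioned above could be
        combined with the 'encoding' change like this. It sidesteps the garbled
        translations rather than fixing them, and whether it helps on a given
        Vim 6.x build is an assumption, not something verified in this thread.

        ```vim
        " Ask for untranslated (English) messages, as in ":language messages en".
        :language messages en

        " Then switch the internal representation to UTF-8.
        :set encoding=utf-8
        ```

        This concerns messages only; translated menus are handled separately.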


        Best regards,
        Tony.
      • Wu Yongwei
        Message 3 of 5, Jan 31, 2005
          Thanks for the useful info!

          I used "set encoding" because "set fileencoding" cannot change how the
          file's encoding is interpreted (neither can "setlocal fileencoding"),
          while "set encoding" always works, though with a bad side effect. Now,
          with your help, I have put "set fileencodings=ucs-bom,utf-8,chinese" in
          my _vimrc, and most of my problems are solved. So the incorrect-message
          problem can be worked around now.
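
          Spelled out as a _vimrc fragment, with comments on why the order
          matters. The comments are an interpretation based on Vim's
          documentation, where "chinese" is an alias that means cp936 on
          MS-Windows.

          ```vim
          " Candidates are tried in order when a file is read:
          "   ucs-bom  - use the encoding announced by a byte order mark, if any
          "   utf-8    - accept the file as UTF-8 if it has no invalid sequences
          "   chinese  - otherwise fall back to the Simplified Chinese alias
          set fileencodings=ucs-bom,utf-8,chinese
          ```

          Because 'encoding' itself is left alone, the translated messages keep
          displaying as before.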

          One problem remains: is it possible to open a file with a specified
          encoding from the command line? I know the following works:

          gvim -c "e ++enc=... ..."

          Anything shorter?

          Best regards,

          Yongwei
        • panshizhu@routon.com
          Message 4 of 5, Jan 31, 2005
            :help auto-setting

            See *modeline* there; it may help.

            --
            Sincerely
            Pan, Shizhu. ext: 2221
          • Antoine J. Mechelynck
            Message 5 of 5, Feb 1, 2005
              Wu Yongwei wrote:
              > Although my practical problem could be considered solved, I am still
              > wondering about it. Are the localized messages not encoded in
              > Unicode? After all, Vim's behaviour is bizarre if "set encoding"
              > changes the message encoding along with the interpretation of the
              > buffer content. Is that by design?
              >
              > And how about VIM 7? Will it have the problem, too?
              >
              > Best regards,
              >
              > Yongwei

              I don't know. I'm just a plain user like you, not an official, after all.
              I'm forwarding your message to the list so that other users may answer.

              Best regards,
              Tony.