Re: Keeping the original encoding.

  • Tony Mechelynck
    Message 1 of 2 , Feb 20, 2010
      On 20/02/10 09:16, kroiz wrote:
      > Hi
      > Some text files that I open are not readable and have weird signs,
      > The bom for this files is ff fe (UTF-16 BE according to wikipedia)
      > vim has fencs set to ucs-bom
      > it is vim version 7.2 on windows XP
      > I can load the files fine if before I load them I do
      > set encoding = utf-8
      > but I don't want to change the encoding of the file when saving.
      > Is there a way to that?
      >
      > thanks
      > Guy Kroizman
      >

      Normally, Vim should detect the encoding and remember how to translate
      to disk what it has in memory.

      'encoding' is the representation of the characters in Vim's internal
      memory. UTF-16 cannot be used there because it contains too many null
      bytes, which would terminate the C strings used by Vim. UTF-8, on the
      other hand, is capable of representing the characters of all charsets
      used on any computer, so if you set that, you're safe. (And BTW, FF FE
      is the BOM for UTF-16le, not -be.)

      When loading an already existing file, Vim uses a heuristic defined by
      the option 'fileencodings' (with s at the end): this is a
      comma-separated list of possible charsets, as follows:

      - ucs-bom, if used (and it is recommended that it _be_ used) should come
      first
      - There should be no more than one 8-bit charset, and it should come last
      - Charsets are tried from left to right, and the first one which doesn't
      give an error signal is used to read the file. (That's why any 8-bit
      charset used should be last: such charsets cannot give an error signal).

      A typical value is: :set fencs=ucs-bom,utf-8,latin1
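
      As a minimal sketch, the two settings could go into your vimrc like
      this. The has('multi_byte') guard and the choice of latin1 as the
      final 8-bit fallback are assumptions on my part; substitute whatever
      8-bit charset your own files use. Note that :set takes no spaces
      around the = sign.

      ```vim
      " Sketch of a vimrc fragment (assumes a Vim built with +multi_byte).
      if has('multi_byte')
        " Internal representation: UTF-8 can hold any character.
        set encoding=utf-8
        " Detection order: BOM first, then UTF-8, then one 8-bit charset last.
        set fileencodings=ucs-bom,utf-8,latin1
      endif
      ```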

      This will correctly detect any Unicode file which has a BOM, or failing
      that Vim will try UTF-8, and if the file is not valid UTF-8 the file
      will then be shown to you under the assumption that it is Latin1. Vim
      stores the disk charset of the file in the local string option
      'fileencoding' for that file, and the presence or absence of a BOM in
      the local Boolean option 'bomb'. IOW, if the file you mentioned has been
      read correctly,

      :setlocal fileencoding? bomb?
      or, if (like me) you're lazy,
      :setl fenc? bomb?

      should reply

      fileencoding=utf-16le
      bomb

      That means everything is OK, and that you don't need to do anything to
      record the file in the correct encoding -- :w or :wq will know how to do
      the required translation from the UTF-8 representation in memory.

      If you see something else, you can save the file in UTF-16le with BOM
      (which, then, will *not* be the original charset of the file) by doing

      :setlocal fenc=utf-16le bomb

      before you save the file. (Use :setlocal, not just :set, because the
      latter alters what will happen to _other_ files, especially the new ones
      you create thereafter.)
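
      If the file was misdetected on load, changing 'fileencoding' only
      affects how the buffer is written, not how it was read. A possible
      alternative, sketched here on the assumption that the file really is
      UTF-16le, is to re-read it with the encoding forced through the
      ++enc argument (see :help ++enc) before checking and saving:

      ```vim
      " Re-read the current file, bypassing the 'fileencodings' heuristic.
      :edit ++enc=utf-16le
      " Check what Vim now records for this buffer.
      :setlocal fileencoding? bomb?
      " Write it back; Vim converts from 'encoding' to 'fileencoding' on write.
      :write
      ```

      Do this before making any changes, since :edit refuses to reload a
      modified buffer unless you add a !.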

      For more info, and pointers to the relevant information in the help, see
      http://vim.wikia.com/wiki/Working_with_Unicode


      Best regards,
      Tony.
      --
      What good is having someone who can walk on water if you don't follow
      in his footsteps?

      --
      You received this message from the "vim_multibyte" maillist.
      For more information, visit http://www.vim.org/maillist.php