Re: UTF-8 puzzle

  • Tony Mechelynck
    Message 1 of 2 , Dec 20, 2011
      On Dec 19, 3:31 pm, Gabriel <snoopy.6...@...> wrote:
      > I am using MacVim 7.3 under Lion.
      >
      > Now here is something that puzzles me with UTF8 files.
      >
      > I wrote a small text file in MacVim, saved it with utf-8 file
      > encoding, opened it in TextEdit, and I get all these funny square-root
      > characters instead of umlauts.
      >
      > And vice versa, I wrote a simple text file in TextEdit, set the
      > preferences of TextEdit such that it saves files always as UTF-8,
      > saved the file, opened it in MacVim -- and I get all these funny <9c>
      > characters instead of umlauts.
      >
      > I also experimented a little with the extended attribute
      > com.apple.TextEncoding, to no avail.
      >
      > Can anybody tell me, which is the correct way to write/save UTF-8
      > files?
      >
      > Best regards,
      > Gabriel.

      If the program which will read the text can handle a BOM, use it, even
      with UTF-8. In Vim, this is done with ":setlocal bomb". It is usable
      for HTML, CSS, and some others. It is not usable for anything where
      the first line must start with #! (Unix shell scripts, Perl, ...). I
      think it doesn't work with C/C++ either but I could be wrong.
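
      The BOM advice above can be sketched outside Vim as well. This is a
      minimal Python illustration (Python is an assumption for the example,
      not something used in the thread). It relies only on the fact that the
      UTF-8 BOM is the three bytes EF BB BF, available as codecs.BOM_UTF8;
      the helper names add_bom and has_bom are hypothetical.

```python
import codecs


def add_bom(data: bytes) -> bytes:
    """Prepend the UTF-8 BOM unless the data already starts with one."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


def has_bom(data: bytes) -> bool:
    """Report whether the data starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


text = "Käse\n"
with_bom = add_bom(text.encode("utf-8"))
print(with_bom.hex(" "))  # the first three bytes are the BOM

# The "utf-8-sig" codec strips a leading BOM while decoding.
print(with_bom.decode("utf-8-sig") == text)

# Why a BOM is unsuitable for scripts: the file no longer starts with "#!".
script = add_bom(b"#!/bin/sh\necho hi\n")
print(script.startswith(b"#!"))
```

      The last check shows the shebang problem mentioned above: once the
      BOM is prepended, the first two bytes of the file are no longer "#!".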

      For XML and HTML files the charset can also be declared in the file,
      as follows:

      <?xml version="1.0" encoding="UTF-8"?>
      or
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

      but (unlike the BOM) these are not recognised by Vim's fileencoding
      detection heuristics.
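
      For comparison, a tool that does want to honour such declarations
      could look for them in the first bytes of the file. The following is
      only a rough Python sketch under simplifying assumptions: the two
      regular expressions cover just the declaration forms quoted above,
      and the function name declared_charset is hypothetical. A real XML or
      HTML parser handles many more variants.

```python
import re
from typing import Optional

# Patterns for the two declarations quoted above; real parsers accept more forms.
_XML_DECL = re.compile(rb'<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')
_META = re.compile(rb'<meta[^>]*\bcharset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE)


def declared_charset(head: bytes) -> Optional[str]:
    """Return the charset named in an XML declaration or HTML meta tag, if any."""
    for pattern in (_XML_DECL, _META):
        match = pattern.search(head)
        if match:
            return match.group(1).decode("ascii").lower()
    return None


print(declared_charset(b'<?xml version="1.0" encoding="UTF-8"?>'))
print(declared_charset(
    b'<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'
))
print(declared_charset(b"plain text without a declaration"))
```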

      Now the umlauts:
      Character   Latin1 hex   UTF-8 hex   UTF-8 mistranslated as Latin1
      ä           E4           C3 A4       Ã¤
      Ä           C4           C3 84       Ã<84>
      ë           EB           C3 AB       Ã«
      Ë           CB           C3 8B       Ã<8b>
      etc. (The Latin1 hex value is also the Unicode code point, i.e. ä <-> U+00E4, etc.)
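
      The same table can be reproduced programmatically. Here is a small
      Python sketch (Python is an assumption for illustration; in the
      thread the values were found with Vim's ga and g8 commands). It
      encodes each character as UTF-8 and then decodes those bytes as
      Latin1, which is the mistranslation shown in the last column. The
      function name mojibake_row is hypothetical.

```python
def mojibake_row(ch: str) -> tuple:
    """Return (char, Latin-1 hex, UTF-8 hex, UTF-8 bytes misread as Latin-1)."""
    latin1_hex = ch.encode("latin-1").hex(" ").upper()
    utf8_bytes = ch.encode("utf-8")
    # Decoding UTF-8 bytes as Latin-1 never fails: every byte maps to a character.
    misread = utf8_bytes.decode("latin-1")
    return ch, latin1_hex, utf8_bytes.hex(" ").upper(), misread


for ch in "äÄëË":
    char, latin1_hex, utf8_hex, misread = mojibake_row(ch)
    # ascii() makes invisible control characters such as U+0084 visible.
    print(char, latin1_hex, utf8_hex, ascii(misread))
```

      For Ä and Ë the second byte (84, 8B) decodes to an invisible control
      character in Latin1, which is why only the Ã appears to be printed.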

      See also
      :help ga
      :help g8
      about how I found the above, and
      :help 'isprint'
      :help 'isfname'
      about how to tell vim which characters are "printable".

      --
      You received this message from the "vim_mac" maillist.
      Do not top-post! Type your reply below the text you are replying to.
      For more information, visit http://www.vim.org/maillist.php