Re: Trouble getting started with vim and utf-8 file

  • Tony Mechelynck
    Apr 8, 2011
      On 08/04/11 07:33, DanKegel wrote:
      > The file http://winetricks.org/winetricks is, I hope, a utf-8 file,
      > but is not recognized as such in the vim that comes
      > with ubuntu 11.04 (with german locale, even).
      > It's mostly ascii, with just a few non-ascii lines, e.g.
      > # If you do not see an o with two dots over it here [ö], stop!
      > ...
      > mymenu="$HOME/.local/share/applications/wine/Programs/
      > Electronic Arts/The Sims Medieval/The Sims™ Medieval.desktop"
      > That first line contains an o umlaut, and the second line contains the
      > trademark symbol.
      > Opening the file with vi winetricks shows
      > # If you do not see an o with two dots over it here [ö], stop!
      > ...
      > mymenu="$HOME/.local/share/applications/wine/Programs/
      > Electronic Arts/The Sims Medieval/The Simsâ<84>¢ Medieval.desktop"
      > which isn't right. Just opening up vi with no arguments, and doing
      > !!cat winetricks
      > brings the file in great, and the utf-8 chars look good, but then
      > saving it complains
      > "winetricks" CONVERSION ERROR in line 12328; 14640 lines, 496509
      > characters written
      > and yields a very corrupt file.
      > So what's going on? It seems that vim has decided the file Is Not
      > UTF-8. :se shows
      > fileencoding=latin1
      > fileencodings=ucs-bom,utf-8,default,latin1
      > even if I put
      > set encoding=utf8 fileencoding=utf8
      > in ~/.vimrc.
      > Help...
      > Thanks,
      > Dan

      I've downloaded that file in my browser, then tried to open it in Vim,
      which does not see it as UTF-8 even though I have 'enc' set to utf-8 and
      'fencs' set to ucs-bom,utf-8,latin1

      Intrigued, I hit 8g8, which brings me to line 7388, column 11, where the
      character µ ("micro" prefix, similar to Greek mu, byte 0xB5) cannot be UTF-8
      (bytes in the range 0x80 to 0xBF can only exist in UTF-8 as "trailing
      bytes" in a multibyte sequence whose first byte is in the range 0xC2 to 0xF4).
      Moving the cursor one position right and repeating gives me only a beep,
      so this is AFAICT the only illegal character in the file -- but one
      illegal byte in the whole file is enough to reject UTF-8 as the file's
      encoding.
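
      For anyone who wants to run the same check outside Vim, here is a minimal
      Python sketch of the idea behind 8g8: try to decode the bytes as UTF-8 and
      report where the first failure occurs. The function name
      first_invalid_utf8 and the sample data are made up for illustration; the
      column is counted in bytes, which may or may not match what your Vim
      shows.

```python
def first_invalid_utf8(data: bytes):
    """Return (line, column, byte) for the first byte that breaks UTF-8
    decoding, or None if the data is valid UTF-8.
    Line and column are 1-based; the column counts bytes, not characters."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        offset = err.start  # index of the first offending byte
        line = data.count(b"\n", 0, offset) + 1
        line_start = data.rfind(b"\n", 0, offset) + 1
        return line, offset - line_start + 1, data[offset]
    return None


# Hypothetical sample: two valid UTF-8 lines, then a Latin1 micro sign (0xB5).
sample = "ok line\n# [ö]\n".encode("utf-8") + b"size 5 \xb5m\n"
print(first_invalid_utf8(sample))  # (3, 8, 181), and 181 == 0xB5
```

      To check a real file, read it in binary mode (open(path, "rb").read())
      and pass the bytes to the function.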

      Rereading the file with

      :view ++enc=utf-8

      reads it as UTF-8 at the cost of an error message about line 7388, where
      the µ is now replaced by a question mark (but the o-umlaut at line 71
      appears as ö).

      It seems that your file is in UTF-8 at line 71 but in Latin1 at line
      7388, which means that it is the file's fault, not Vim's fault, that
      such a file cannot be displayed correctly.
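
      If you want to repair such a mixed file rather than just view it, one
      possible approach (a sketch, not something Vim does for you) is to decode
      each line as UTF-8 and fall back to Latin1 only for lines that fail. This
      assumes every line is entirely in one encoding or the other; a Latin1
      line that happens to be valid UTF-8 would be misread, so check the result
      by eye. The name decode_mixed is hypothetical.

```python
def decode_mixed(data: bytes) -> str:
    """Decode line by line: UTF-8 where it is valid, Latin1 otherwise.
    Assumption: no single line mixes the two encodings."""
    out = []
    for raw in data.splitlines(keepends=True):
        try:
            out.append(raw.decode("utf-8"))
        except UnicodeDecodeError:
            # Latin1 maps every byte to a character, so this cannot fail.
            out.append(raw.decode("latin-1"))
    return "".join(out)


# Hypothetical sample: a UTF-8 line followed by a Latin1 line.
mixed = "# [ö]\n".encode("utf-8") + "size 5 µm\n".encode("latin-1")
fixed = decode_mixed(mixed)
clean_bytes = fixed.encode("utf-8")  # now consistently UTF-8
```

      Writing clean_bytes back out (in binary mode) should give a file that
      Vim detects as UTF-8 with the usual 'fencs' setting.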

      :help 8g8
      :help ++opt

      Best regards,
      Never hit a man with glasses. Hit him with a baseball bat.

