
Re: Trouble getting started with vim and utf-8 file

  • Aleksey
    Message 1 of 6 , Apr 8, 2011
      Here's what I've found.

      Opening this file in gVim doesn't display it correctly. The encoding
      detected is cp1251 (with my configuration).

      Issuing this command

      :e ++enc=utf-8

      worked and displayed the ™ symbol, but it also gave a warning about an
      illegal byte at line 7388, which looked like this:

      title="?Torrent 3.0" \

      The previous section had µ, so I just replaced the illegal character
      with it.

      After saving, opening the file from the command line works fine, with
      the encoding detected correctly.

      This doesn't answer your question; it's just a workaround.
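
The manual repair described here (reading the one stray byte as µ and saving the file again as UTF-8) could also be scripted. The following is only a minimal Python sketch, assuming that every byte that fails UTF-8 decoding was meant as Latin-1; the function name `decode_mostly_utf8` is made up for illustration and is not part of Vim or winetricks.

```python
def decode_mostly_utf8(data: bytes) -> str:
    """Decode as UTF-8, reading each undecodable byte as Latin-1 instead.

    Assumption: stray bytes are Latin-1 (0xB5 -> U+00B5, the micro sign).
    """
    pieces = []
    pos = 0
    while pos < len(data):
        try:
            # Everything from pos onward is valid UTF-8: done.
            pieces.append(data[pos:].decode("utf-8"))
            break
        except UnicodeDecodeError as err:
            # err.start is relative to the slice that was being decoded.
            bad = pos + err.start
            pieces.append(data[pos:bad].decode("utf-8"))       # valid prefix
            pieces.append(data[bad:bad + 1].decode("latin-1"))  # one stray byte
            pos = bad + 1
    return "".join(pieces)


if __name__ == "__main__":
    sample = b'title="\xb5Torrent 3.0" \\\n'
    print(decode_mostly_utf8(sample).encode("utf-8"))
```

Writing the returned string back out with `.encode("utf-8")` yields a file that is UTF-8 throughout. Whether Latin-1 is the right guess for a given stray byte still has to be checked by eye.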

      On Apr 8, 9:33 am, DanKegel <daniel.r.ke...@...> wrote:
      > The file http://winetricks.org/winetricks is, I hope, a utf-8 file,
      > but is not recognized as such in the vim that comes
      > with ubuntu 11.04 (with german locale, even).
      > It's mostly ascii, with just a few non-ascii lines, e.g.
      >
      > #   If you do not see an o with two dots over it here [ö], stop!
      > ...
      >         mymenu="$HOME/.local/share/applications/wine/Programs/Electronic Arts/The Sims Medieval/The Sims™ Medieval.desktop"
      >
      > That first line contains an o umlaut, and the second line contains the
      > trademark symbol.
      >
      > Opening the file with vi winetricks shows
      >
      > #   If you do not see an o with two dots over it here [ö], stop!
      > ...
      >          mymenu="$HOME/.local/share/applications/wine/Programs/
      > Electronic Arts/The Sims Medieval/The Simsâ<84>¢ Medieval.desktop"
      >
      > which isn't right.  Just opening up vi with no arguments, and doing
      >   !!cat winetricks
      > brings the file in great, and the utf-8 chars look good, but then
      > saving it complains
      > "winetricks"  CONVERSION ERROR in line 12328; 14640 lines, 496509
      > characters written
      > and yields a very corrupt file.
      >
      > So what's going on?  It seems that vim has decided the file Is Not
      > UTF-8.  :se shows
      >   fileencoding=latin1
      >   fileencodings=ucs-bom,utf-8,default,latin1
      > even if I put
      >   set encoding=utf8 fileencoding=utf8
      > in ~/.vimrc.
      >
      > Help...
      >
      > Thanks,
      > Dan

      --
      You received this message from the "vim_multibyte" maillist.
      For more information, visit http://www.vim.org/maillist.php
    • Tony Mechelynck
      Message 2 of 6 , Apr 8, 2011
        On 08/04/11 07:33, DanKegel wrote:
        > The file http://winetricks.org/winetricks is, I hope, a utf-8 file,
        > but is not recognized as such in the vim that comes
        > with ubuntu 11.04 (with german locale, even).
        > It's mostly ascii, with just a few non-ascii lines, e.g.
        >
        > # If you do not see an o with two dots over it here [ö], stop!
        > ...
        > mymenu="$HOME/.local/share/applications/wine/Programs/Electronic Arts/The Sims Medieval/The Sims™ Medieval.desktop"
        >
        > That first line contains an o umlaut, and the second line contains the
        > trademark symbol.
        >
        > Opening the file with vi winetricks shows
        >
        > # If you do not see an o with two dots over it here [ö], stop!
        > ...
        > mymenu="$HOME/.local/share/applications/wine/Programs/
        > Electronic Arts/The Sims Medieval/The Simsâ<84>¢ Medieval.desktop"
        >
        > which isn't right. Just opening up vi with no arguments, and doing
        > !!cat winetricks
        > brings the file in great, and the utf-8 chars look good, but then
        > saving it complains
        > "winetricks" CONVERSION ERROR in line 12328; 14640 lines, 496509
        > characters written
        > and yields a very corrupt file.
        >
        > So what's going on? It seems that vim has decided the file Is Not
        > UTF-8. :se shows
        > fileencoding=latin1
        > fileencodings=ucs-bom,utf-8,default,latin1
        > even if I put
        > set encoding=utf8 fileencoding=utf8
        > in ~/.vimrc.
        >
        > Help...
        >
        > Thanks,
        > Dan
        >

        I've downloaded that file in my browser, then tried to open it in Vim,
        which does not see it as UTF-8 even though I have 'enc' set to utf-8 and
        'fencs' set to ucs-bom,utf-8,latin1

        Intrigued, I hit 8g8, which brings me to line 7388 column 11, where the
        character µ ("micro" prefix, similar to Greek mu, 0xB5) cannot be UTF-8
        (bytes in the range 0x80 to 0xBF can only exist in UTF-8 as "trailing
        bytes" in a multibyte sequence whose first byte is in the range 0xC2
        to 0xF4).
        Moving the cursor one position right and repeating gives me only a beep,
        so this is AFAICT the only illegal byte in the file -- but one
        illegal byte in the whole file is enough to reject UTF-8 as the file's
        'fileencoding'.
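
The all-or-nothing behaviour described here can be approximated outside Vim. This is a rough Python sketch of the idea behind 'fileencodings', not Vim's actual implementation: try each candidate in order and accept the first one that decodes the whole buffer without error. The function name `detect_encoding` is hypothetical, and the BOM check that Vim does for ucs-bom is left out.

```python
def detect_encoding(data: bytes, candidates=("utf-8", "latin-1")) -> str:
    """Return the first candidate encoding that decodes *all* of data.

    Simplification: no BOM detection, unlike Vim's "ucs-bom" entry.
    """
    for name in candidates:
        try:
            data.decode(name)  # strict: a single bad byte raises
        except UnicodeDecodeError:
            continue           # reject this candidate for the whole file
        return name
    raise ValueError("no candidate encoding fits the data")


if __name__ == "__main__":
    good = "If you do not see [ö], stop!\n".encode("utf-8")
    bad = good + b'title="\xb5Torrent 3.0"\n'  # one stray Latin-1 byte
    print(detect_encoding(good))  # utf-8
    print(detect_encoding(bad))   # latin-1
```

Since Latin-1 assigns a character to every byte value, it never fails, which is why it works as the last entry in the list and why the valid UTF-8 sequences elsewhere in the file then show up as several garbage characters each.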

        Rereading the file with

        :view ++enc=utf-8

        reads it as UTF-8 at the cost of an error message about line 7388, where
        the µ is now replaced by a question mark (but the o-umlaut at line 71
        appears as ö).

        It seems that your file is in UTF-8 at line 71 but in Latin1 at line
        7388, which means that it is the file's fault, not Vim's fault, that
        such a file cannot be displayed correctly.

        See
        :help 8g8
        :help ++opt


        Best regards,
        Tony.
        --
        Never hit a man with glasses. Hit him with a baseball bat.

      • Dan Kegel
        Message 3 of 6 , Apr 8, 2011
          Thanks very much, guys!

        • John Beckett
          Message 4 of 6 , Apr 8, 2011
            DanKegel wrote:
            > The file http://winetricks.org/winetricks is, I hope, a utf-8
            > file, but is not recognized as such in the vim that comes
            > with ubuntu 11.04 (with german locale, even).

            It looks like you created that file, so you need to fix it
            because it is not UTF-8.

            Downloading the file with wget and dumping the bytes shows that
            the character which I have shown as "?" in the following is not
            valid UTF-8:
            title="?Torrent 3.0" \

            That single byte is hex B5 or binary 10110101. That starts with
            "10" which is never valid as the first byte of a character in
            UTF-8.
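
The bit-pattern argument can be checked mechanically. A small Python sketch, with a hypothetical helper `utf8_byte_role` that classifies a single byte by its leading bits according to the UTF-8 layout (RFC 3629):

```python
def utf8_byte_role(byte: int) -> str:
    """Classify one byte (0..255) by the role it can play in UTF-8."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not a byte value")
    if byte < 0x80:
        return "ascii"         # 0xxxxxxx
    if byte < 0xC0:
        return "continuation"  # 10xxxxxx, never starts a character
    if byte in (0xC0, 0xC1) or byte > 0xF4:
        return "invalid"       # never appears in well-formed UTF-8
    if byte < 0xE0:
        return "lead-2"        # 110xxxxx, starts a 2-byte sequence
    if byte < 0xF0:
        return "lead-3"        # 1110xxxx, starts a 3-byte sequence
    return "lead-4"            # 11110xxx, starts a 4-byte sequence


if __name__ == "__main__":
    print(format(0xB5, "08b"), utf8_byte_role(0xB5))
```

Here 0xB5 comes out as a continuation byte, so it is only legal directly after a lead byte; following the ASCII quote character in `title="`, it is not.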

            BTW you can find that in Vim by opening the file and typing 8g8
            which jumps to the next illegal byte sequence, then typing ga to
            show the value.
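
For checking a file from a script rather than inside Vim, something similar to the 8g8 and ga steps can be sketched in Python. The helper name `first_illegal_utf8` is invented for this example; it reports the 1-based line, the 1-based byte column, and the value of the first byte that breaks UTF-8 decoding. Vim's own column numbering may not match in every case.

```python
def first_illegal_utf8(data: bytes):
    """Return (line, byte_column, byte_value) of the first invalid UTF-8
    byte in data, or None if data is valid UTF-8 throughout."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # Count newlines before the bad byte to get a 1-based line number.
        line = data.count(b"\n", 0, err.start) + 1
        # rfind returns -1 when there is no earlier newline, giving start 0.
        line_start = data.rfind(b"\n", 0, err.start) + 1
        column = err.start - line_start + 1
        return line, column, data[err.start]
    return None


if __name__ == "__main__":
    sample = b'a\nbc\n  title="\xb5Torrent 3.0" \\\n'
    hit = first_illegal_utf8(sample)
    if hit is not None:
        line, column, value = hit
        print(f"line {line}, byte column {column}, byte 0x{value:02X}")
```

For the real file one would read it in binary mode, for example `open("winetricks", "rb").read()`, and pass the bytes to the function.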

            John

          • Dan Kegel
            Message 5 of 6 , Apr 8, 2011
              On Sat, Apr 9, 2011 at 12:13 AM, John Beckett <johnb.beckett@...> wrote:
              > It looks like you created that file, so you need to fix it
              > because it is not UTF-8.
              >
              > Downloading the file with wget and dumping the bytes shows that
              > the character which I have shown as "?" in the following is not
              > valid UTF-8:
              >   title="?Torrent 3.0" \
              >
              > That single byte is hex B5 or binary 10110101. That starts with
              > "10" which is never valid as the first byte of a character in
              > UTF-8.
              >
              > BTW you can find that in Vim by opening the file and typing 8g8
              > which jumps to the next illegal byte sequence, then typing ga to
              > show the value.

              Yeah, that's what I gathered from the other replies.
              Thanks!
              - Dan
