Re: Unexpected behavior loading cp1252 file as latin1

  • Rhialto
    Message 1 of 13 , Feb 1, 2011
      On Tue 01 Feb 2011 at 09:30:48 -0800, Ben Fritz wrote:
      > Converting from cp1252 to latin1 should fail depending on the
      > characters in the file, but latin1 to cp1252 should always work,
      > shouldn't it? I understand cp1252 to be a superset of latin1. Is it
      > because the system mis-represents its encoding to Vim as latin1 when
      > really it is cp1252 or something?

      If this means that I get cp1252 characters in my file which I tried to
      keep pure Latin 1, this is very wrong... my system doesn't display those
      obnoxious microsoft "extensions".

      -Olaf.
      --
      ___ Olaf 'Rhialto' Seibert -- There's no point being grown-up if you
      \X/ rhialto/at/xs4all.nl -- can't be childish sometimes. -The 4th Doctor
      X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*

      --
      You received this message from the "vim_dev" maillist.
      Do not top-post! Type your reply below the text you are replying to.
      For more information, visit http://www.vim.org/maillist.php
    • Benjamin Fritz
      Message 2 of 13 , Feb 2, 2011
        On Tue, Feb 1, 2011 at 7:11 PM, Rhialto <rhialto@...> wrote:
        > On Tue 01 Feb 2011 at 09:30:48 -0800, Ben Fritz wrote:
        >> Converting from cp1252 to latin1 should fail depending on the
        >> characters in the file, but latin1 to cp1252 should always work,
        >> shouldn't it? I understand cp1252 to be a superset of latin1. Is it
        >> because the system mis-represents its encoding to Vim as latin1 when
        >> really it is cp1252 or something?
        >
        > If this means that I get cp1252 characters in my file which I tried to
        > keep pure Latin 1, this is very wrong... my system doesn't display those
        > obnoxious microsoft "extensions".
        >

        For now, if this bothers you, you can set your encoding to something
        other than latin1 (like utf-8) and do a setglobal fenc=latin1. Also
        update your fileencodings option so that latin1 actually gets
        detected.

        Now you will get a warning if you try to save a file and there are
        non-latin1 characters in it.
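        As a rough illustration outside Vim, the check behind such a warning
        can be sketched in Python. This is not how Vim implements it; the
        helper name non_latin1_chars and the sample string are made up for
        this example.

```python
def non_latin1_chars(text):
    """Return the characters in text that cannot be encoded as Latin-1."""
    offenders = []
    for ch in text:
        try:
            ch.encode("latin-1")
        except UnicodeEncodeError:
            offenders.append(ch)
    return offenders


# Hypothetical sample: "cafe" with an acute e is fine, the euro sign is not.
sample = "caf\u00e9 costs 3 \u20ac"
print([hex(ord(ch)) for ch in non_latin1_chars(sample)])  # ['0x20ac']
```

        An empty result means the text could be written as latin1 without
        losing anything.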

        I think it is a problem that with encoding=latin1 Vim acts
        differently and gives no warning for non-latin1 characters, but it
        is apparently a very common (and probably not very serious) one.

        cp1252 is basically the same as latin1, with a few extras thrown in
        where latin1 doesn't have anything useful. So as long as you don't
        intentionally include any non-latin1 characters, your file will be
        identical to one saved as a strict latin1 file.
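        A quick way to check this claim, sketched in Python rather than Vim
        script (the sample text and variable names are only for
        illustration):

```python
# Text that uses only characters present in Latin-1.
text = "na\u00efve caf\u00e9, \u00a9 2011"

as_latin1 = text.encode("latin-1")
as_cp1252 = text.encode("cp1252")
same = as_latin1 == as_cp1252
print(same)  # True

# Outside 0x80-0x9F, every byte decodes to the same character in both.
shared = [b for b in range(0x100) if not 0x80 <= b <= 0x9F]
agree = all(
    bytes([b]).decode("latin-1") == bytes([b]).decode("cp1252") for b in shared
)
print(agree)  # True
```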

      • Benjamin Fritz
        Message 3 of 13 , Feb 3, 2011
          On Wed, Feb 2, 2011 at 9:59 AM, Benjamin Fritz <fritzophrenic@...> wrote:
          > On Tue, Feb 1, 2011 at 7:11 PM, Rhialto <rhialto@...> wrote:
          >> On Tue 01 Feb 2011 at 09:30:48 -0800, Ben Fritz wrote:
          >>> Converting from cp1252 to latin1 should fail depending on the
          >>> characters in the file, but latin1 to cp1252 should always work,
          >>> shouldn't it? I understand cp1252 to be a superset of latin1. Is it
          >>> because the system mis-represents its encoding to Vim as latin1 when
          >>> really it is cp1252 or something?
          >>
          >> If this means that I get cp1252 characters in my file which I tried to
          >> keep pure Latin 1, this is very wrong... my system doesn't display those
          >> obnoxious microsoft "extensions".
          >>
          >
          > For now, if this bothers you, you can set your encoding to something
          > other than latin1 (like utf-8) and do a setglobal fenc=latin1. Also
          > update your fileencodings option so that latin1 actually gets
          > detected.
          >
          > Now you will get a warning if you try to save a file and there are
          > non-latin1 characters in it.
          >

          I see this in :help version7.txt (line 2470):

          Win32: Set the default for 'isprint' back to the wrong default "@,~-255",
          because many people use Windows-1252 while 'encoding' is "latin1".

          Maybe this is related?

        • Vlad Irnov
          Message 4 of 13 , Feb 3, 2011
            On Feb 3, 5:03 pm, Benjamin Fritz <fritzophre...@...> wrote:
            > On Wed, Feb 2, 2011 at 9:59 AM, Benjamin Fritz <fritzophre...@...> wrote:
            > > On Tue, Feb 1, 2011 at 7:11 PM, Rhialto <rhia...@...> wrote:
            > >> On Tue 01 Feb 2011 at 09:30:48 -0800, Ben Fritz wrote:
            > >>> Converting from cp1252 to latin1 should fail depending on the
            > >>> characters in the file, but latin1 to cp1252 should always work,
            > >>> shouldn't it? I understand cp1252 to be a superset of latin1. Is it
            > >>> because the system mis-represents its encoding to Vim as latin1 when
            > >>> really it is cp1252 or something?

            > I see this in :help version7.txt (line 2470):
            >
            > Win32: Set the default for 'isprint' back to the wrong default "@,~-255",
            > because many people use Windows-1252 while 'encoding' is "latin1".
            >
            > Maybe this is related?

            After
            :set isprint=@,161-255
            cp1252-specific characters are no longer displayed when encoding is
            cp1252, so this is not a solution.


            This is what I think happens: when encoding is set to latin1, Vim
            **displays** characters in the range 128 to 159 (hex 80 to 9F) as if
            encoding is set to cp1252.

            How to reproduce: (Windows 2000, gvim 7.3, Normal version)

            Start GUI Vim with a new empty buffer. Any decent font like DejaVu or
            Lucida Console should do. Execute the following code (copy into
            clipboard and execute with :@+ or :@*).

            :set enc=latin1
            :set fenc=utf-8
            :set isprint&
            :for i in range(128,159)
            : call setline(".", getline(".").nr2char(i))
            :endfor

            You should end up with 5 "no character" blocks plus 27 printable chars
            (don't know if they survive posting, the first one is Euro sign):

            € ‚ƒ„…†‡ˆ‰Š‹Œ Ž ‘’“”•–—˜™š›œ žŸ

            This is wrong. Latin1 character set has no printable chars in this
            range, so all chars should be displayed as "no character" blocks.
            From http://en.wikipedia.org/wiki/ISO/IEC_8859-1 :
            "The Windows-1252 codepage [cp1252 in Vim] coincides with ISO-8859-1
            [latin1 in Vim] for all codes except the range 128 to 159 (hex 80 to
            9F), where the little-used C1 controls are replaced with additional
            characters."
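            The "5 blocks plus 27 printable chars" split can be cross-checked
            outside Vim. The following is a small Python sketch that relies
            on Python's own cp1252 and latin-1 codecs (not on Vim's
            conversion code); the variable names are arbitrary.

```python
import unicodedata

printable = []   # characters cp1252 defines in 0x80-0x9F
undefined = []   # byte values cp1252 leaves unassigned
for byte in range(0x80, 0xA0):
    try:
        printable.append(bytes([byte]).decode("cp1252"))
    except UnicodeDecodeError:
        undefined.append(byte)

print(len(printable), [hex(b) for b in undefined])
# 27 ['0x81', '0x8d', '0x8f', '0x90', '0x9d']

# Under Latin-1 the same 32 bytes are all C1 control characters (category Cc).
all_controls = all(
    unicodedata.category(bytes([b]).decode("latin-1")) == "Cc"
    for b in range(0x80, 0xA0)
)
print(all_controls)  # True
```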

            This is not standard behavior: other text editors do not display
            these chars when the encoding is Latin1.

            When the buffer is saved, Vim converts from latin1 to Unicode. These
            chars become Unicode code points 0x0080 to 0x009F (decimal 128-159,
            each encoded in 2 bytes in utf-8). They are non-printable characters.
            This behavior is correct, but probably not what the user expects. To
            preserve the cp1252-specific characters as they are displayed by Vim,
            the encoding must be set to cp1252. The bullet character in Unicode is
            decimal 8226, en dash is 8211, em dash is 8212, each encoded in 3
            bytes in utf-8.
            Conversion tables:
            http://unicode.org/Public/MAPPINGS/ISO8859/8859-1.TXT
            http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
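            The byte counts above can be reproduced with a short Python
            sketch. It uses Python's codecs as a stand-in for the two
            conversion tables; utf8_result is a hypothetical helper name,
            not anything from Vim.

```python
def utf8_result(byte, source_encoding):
    """Decode one byte with source_encoding, then re-encode it as UTF-8.

    Returns (Unicode code point, number of UTF-8 bytes).
    """
    char = bytes([byte]).decode(source_encoding)
    return ord(char), len(char.encode("utf-8"))


for name, byte in [("bullet", 0x95), ("en dash", 0x96), ("em dash", 0x97)]:
    as_latin1 = utf8_result(byte, "latin-1")
    as_cp1252 = utf8_result(byte, "cp1252")
    print(name, hex(byte), "latin1:", as_latin1, "cp1252:", as_cp1252)
```

            Read as latin1, byte 0x95 stays code point 149 and takes 2 bytes
            in utf-8; read as cp1252, it becomes the bullet (8226) and takes
            3 bytes.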
