
Re: Unicode conversion bug?

  • T.P.S.Nakagawa
    Message 1 of 25 , May 5, 2008
      Hello


      At 2008-05-05 8:25 (JST), I sent the following message:
      > Today I successfully compiled iconv-1.12 for Windows.
      <>
      > ...but I can't correctly open a file edited and saved as UTF-8 by Notepad :(

      I see!

      On Windows XP, whether or not file encoding detection succeeds, gvim can't
      display UTF-8 + BOM if &fileencodings is set.


      The workaround is this:

      :set binary
      2x
      :set fenc=
      :w
      :set nobinary
      :e!
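
The same cleanup can also be done outside Vim. Below is a minimal sketch in Python, assuming the file really begins with the three-byte UTF-8 BOM (EF BB BF). The function name strip_utf8_bom and the sample bytes are illustrative only; they are not part of Vim or Notepad. Note that "2x" above deletes two characters as displayed, which need not equal two bytes.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical sample: the bytes an editor might write for "abc" as UTF-8 + BOM.
sample = codecs.BOM_UTF8 + "abc".encode("utf-8")
cleaned = strip_utf8_bom(sample)
print(cleaned)
```

Checking for the BOM before cutting keeps the function safe to run on files that have no BOM at all.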


      Cursed Notepad!
      But isn't it a bug in gvim?


      Regards
      --

      NAKAGAWA Tsuneo (a.k.a. yaemon ) mailto:yaemon@...
      Web site ( Japanese only ) http://www.kikansha.jp/~yaemon/

      --~--~---------~--~----~------------~-------~--~----~
      You received this message from the "vim_multibyte" maillist.
      For more information, visit http://www.vim.org/maillist.php
      -~----------~----~----~----~------~----~------~--~---
    • Tony Mechelynck
      Message 2 of 25 , May 5, 2008
        On 05/05/08 09:17, T.P.S.Nakagawa wrote:
        > Hello
        >
        >
        > 2008-05-05 8:25 (JST) , I sent follow message:
        > > Today, I success compile iconv-1.12 for Windows.
        > <>
        > > ...but, I can't correctry open file edited and saved UTF-8 by notepad :(
        >
        > I see!
        >
        > On Windows XP, file encoding detect success or not, gvim can't display
        > UTF8 + BOM, if &fileencodings setted.
        >
        >
        > lush-up way is that.
        >
        > :set binary
        > 2x
        > :set fenc=
        > :w
        > :set nobinary
        > :e!
        >
        >
        > cursed notepad!
        > but isn't it bug of gvim?
        >
        >
        > regard

        If what you said above is exact, it's a Notepad bug: a UTF-8 BOM is
        three bytes, a UTF-16 BOM (also used for UCS-2) is two bytes, a UTF-32
        BOM is four bytes. If there was a two-byte BOM on a UTF-8 file, it's a
        bug in the program which produced that file.

        When a Windows program (such as WordPad) saves a file as "Unicode text",
        it's usually UTF-16le with BOM, which means that the first two bytes are
        FF FE and that after that, even bytes are often null bytes.
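
As a small illustration of that byte layout, here is a sketch in Python. The sample text "Hi" is a made-up example, and the snippet only shows what UTF-16le with a BOM looks like for plain ASCII text; it does not reproduce what WordPad itself writes.

```python
import codecs

text = "Hi"  # hypothetical sample text, ASCII only

# UTF-16le with BOM: FF FE first, then two bytes per character, low byte first.
data = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Every second byte after the BOM is the high byte of a character;
# for ASCII text it is always zero.
high_bytes = data[3::2]

print(data)
print(high_bytes)
```

For text outside the Latin-1 range (Japanese, for example) the high bytes are non-zero, which is why the null bytes appear "often" rather than always.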

        Best regards,
        Tony.
        --
        Don't take life too seriously -- you'll never get out of it alive.

      • T.P.S.Nakagawa
        Message 3 of 25 , May 5, 2008
          Sorry, Tony.

          But I am pleased to report the next finding on this problem.

          At 2008-05-05 23:48 (JST), Tony Mechelynck sent the following message:
          > If what you said above is exact, it's a Notepad bug: a UTF-8 BOM is
          > three bytes, a UTF-16 BOM (also used for UCS-2) is two bytes, a UTF-32

          Oh yes. I deleted the 2 bytes that were displayed in a Unix UTF-8 console.
          But as the "od -xc" command shows, Notepad attaches a 3-byte BOM. Sorry.


          So here is a deeper report on this problem.
          When Vim reads a UTF-8 + BOM file and 'fileencodings' is set, it always
          displays it as UTF-8. This happens both
          on the Japanese version of Windows (which must display cp932)
          and on a Unix console set to ja_JP.eucJP.

          That is the whole reason for the bad display.

          I read the sources for an hour, around where *p_fencs is set, but I fell asleep.
          It's hard to read one part of a big source tree.


          Best regards, yaemon.


          P.S. The download page for libiconv is now
          http://www.kikansha.jp/~yaemon/misc/libiconv
          --
          NAKAGAWA Tsuneo (a.k.a. yaemon ) mailto:yaemon@...
          Web site ( Japanese only ) http://www.kikansha.jp/~yaemon/

        • Tony Mechelynck
          Message 4 of 25 , May 5, 2008
            On 06/05/08 04:58, T.P.S.Nakagawa wrote:
            > Sorry, Tony.
            >
            > But I pleasure of report next thing of this problem.
            >
            > 2008-05-05 23:48 (JST) , Tony Mechelynck sent follow message:
            > > If what you said above is exact, it's a Notepad bug: a UTF-8 BOM is
            > > three bytes, a UTF-16 BOM (also used for UCS-2) is two bytes, a UTF-32
            >
            > Oh yes. I delete 2 bytes , that displayed in unix UTF-8 console.
            > But by shown "od -xc" command, notepad attach 3 bytes of BOM. sorry.
            >
            >
            > Then, I report more deep for this problem.
            > Vim read UTF-8 + BOM , if fileencodings setted, allways display by UTF-8.
            > so Windows Japanese version ( must display cp932 )
            > so unix console setted ja_JP.eucJP.

            If your 'fileencodings' starts with "ucs-bom", Vim ought to detect
            correctly any Unicode encoding when there is a BOM without interfering
            with the detection of other encodings, unless they may start with one or
            more of the following codes and contain not a single invalid byte (or
            invalid sequence of bytes) for the corresponding Unicode encoding (I
            know that many combinations of bytes higher than 0x7F are invalid in
            UTF-8; I'm less sure about the others):

            EF BB BF       UTF-8
            FE FF          UTF-16be
            FF FE          UTF-16le
            00 00 FE FF    UTF-32be
            FF FE 00 00    UTF-32le

            Notice that Vim (and any other program with BOM detection) may "guess
            wrong" if a file in UTF-16le with BOM starts with a NULL; but I suppose
            that such a case is so rare it may be safely ignored.
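
The table and the ambiguity just mentioned can be sketched in Python. This is only an illustration of BOM sniffing in general, not Vim's actual implementation; the names BOMS and detect_bom are made up for this example, and trying the longer signatures first is an assumption of the sketch.

```python
import codecs

# Longer signatures come first, so FF FE 00 00 is tried as UTF-32le
# before FF FE is tried as UTF-16le.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # FF FE 00 00
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def detect_bom(data: bytes):
    """Return (encoding name, BOM length), or (None, 0) if no BOM matches."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


# A UTF-16le file with BOM whose first character is NUL: FF FE 00 00 41 00.
ambiguous = codecs.BOM_UTF16_LE + "\x00A".encode("utf-16-le")

print(detect_bom(codecs.BOM_UTF8 + b"abc"))
print(detect_bom(ambiguous))
```

The second call reports UTF-32le even though the bytes were built as UTF-16le, which is exactly the "guess wrong" case described above. A detector that checked FF FE first would have the opposite problem with real UTF-32le files.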

            Notes:
            - Even if editing cp932 files, you may set 'encoding' to utf-8
            - In GUI mode, anything that 'encoding' can represent, can be displayed
            if your 'guifont' has a glyph for it. Characters for which your
            'guifont' has no glyph may be represented by a "placeholder" question
            mark or hollow box etc.; but if you use the GTK2 GUI (X11 only, thus not
            on Windows) it may, in some cases, be clever enough to find an
            appropriate glyph in a different font.
            - Even if your terminal display is set to accept cp932 output, you may
            still set 'encoding' to utf-8 in Console mode if 'termencoding' is set
            to cp932, but of course in that case if you edit Unicode (or other
            non-cp932) files containing characters which cannot be represented in
            cp932, you will get a "placeholder" display (possibly a question mark or
            a hollow box) at that position even though the actual contents of the
            file are correct.
            - The above applies also, of course, with "cp932" replaced everywhere by
            "euc-jp".

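The "placeholder" effect in the notes above can be imitated in Python. This is a rough sketch under the assumption that the display side can only handle cp932; it shows the lossy conversion in general and does not reproduce Vim's own 'encoding' / 'termencoding' handling. The sample string is made up.

```python
# Hypothetical sample: "cafe" with an accented e (U+00E9), a space,
# and three kanji (U+65E5 U+672C U+8A9E).
text = "caf\u00e9 \u65e5\u672c\u8a9e"

# The internal representation (here UTF-8) keeps every character intact.
internal = text.encode("utf-8")

# Converting for a cp932-only display: characters that cp932 cannot
# represent are replaced by a "?" placeholder.
shown = text.encode("cp932", errors="replace").decode("cp932")

print(shown)
print(internal.decode("utf-8") == text)
```

The kanji survive the conversion and the accented e does not, while the UTF-8 copy still holds the original text. That matches the point that the file contents stay correct even when the display shows a placeholder.
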
            >
            > That's all of reason , bad display.
            >
            > I read 1 hour sources, around *p_fencs setting, but I sleeped.
            > It's hard of read part of big source.

            Yes, especially when you're lacking sleep. ;-)

            >
            >
            > Best regard, by yaemon.
            >
            >
            > P.S. now, download page of libiconv is
            > http://www.kikansha.jp/~yaemon/misc/libiconv
            > --
            > NAKAGAWA Tsuneo (a.k.a. yaemon ) mailto:yaemon@...
            > Web site ( Japanese ony ) http://www.kikansha.jp/~yaemon/

            Best regards,
            Tony.
            --
            "The Good Ship Enterprise" (to the tune of "The Good Ship Lollipop")

            On the good ship Enterprise
            Every week there's a new surprise
            Where the Romulans lurk
            And the Klingons often go berserk.

            Yes, the good ship Enterprise
            There's excitement anywhere it flies
            Where Tribbles play
            And Nurse Chapel never gets her way.

            See Captain Kirk standing on the bridge,
            Mr. Spock is at his side.
            The weekly menace, ooh-ooh
            It gets fried, scattered far and wide.

            It's the good ship Enterprise
            Heading out where danger lies
            And you live in dread
            If you're wearing a shirt that's red.
            -- Doris Robin and Karen Trimble of The L.A. Filkharmonics

          • T.P.S.Nakagawa
            Message 5 of 25 , May 7, 2008
              Thanks all.


              At 2008-05-06 13:30 (JST), Tony Mechelynck sent the following message:
              > If your 'fileencodings' starts with "ucs-bom", Vim ought to detect
              > correctly any Unicode encoding when there is a BOM without interfering
              > with the detection of other encodings, unless they may start with one or
              > more of the following codes and contain not a single invalid byte (or
              > invalid sequence of bytes) for the corresponding Unicode encoding (I
              > know that many combinations of bytes higher than 0x7F are invalid in
              > UTF-8; I'm less sure about the other):

              Oh!? Did I forget to write this?
              After upgrading libiconv to 1.11, Vim detects the correct charset every time
              (if I can trust "set fileencoding?").

              And if I write "set fileencodings=" (an empty string) in _vimrc,
              or start it with ucs-bom,
              or override it with "set fileencodings=ucs-bom" (that's the Win32 default),
              in every case gvim tries to display as UTF-8.

              Is this only on the Japanese version of Windows?
              With the latin-1 charset, if you write (c) (the copyright sign, as one
              character) or an accented e in Notepad and save as UTF-8, does gvim
              display it as you want?


              Best regards, yaemon

              --
              NAKAGAWA Tsuneo (a.k.a. yaemon ) mailto:yaemon@...
              Web site ( Japanese only ) http://www.kikansha.jp/~yaemon/
