Re: Unicode conversion bug?

  • Tony Mechelynck
    Message 1 of 25 , May 3, 2008
      On 03/05/08 20:30, T.P.S.Nakagawa wrote:
      > Thank you, Tony, and sorry for the late reply.

      No problem.

      [...]
      > > Libiconv 1.9.2 is available precompiled from GnuWin32 from the page
      >
      > OK, I installed all of GnuWin32 and added its bin directory to the PATH.
      > I deleted the old iconv.dll, but the result of every test pattern is the same as with the old iconv.dll.

      Make sure to have iconv.dll (and possibly all GnuWin32 executables) in
      the PATH (which, on Windows, is a semicolon-separated list). How to set
      that depends on your Windows version. IIRC, in XP it is at "Control
      Panel -> System -> Advanced -> Environment Variables" or something similar.

      >
      > # And thank you to the webmaster of http://www.vim.org/ (is it Mr. Bram?) for changing the link.
      [...]

      I think it is Bram, yes.

      Best regards,
      Tony.
      --
      "I am ready to meet my Maker. Whether my Maker is prepared for the
      great ordeal of meeting me is another matter."
      -- Winston Churchill

      --~--~---------~--~----~------------~-------~--~----~
      You received this message from the "vim_multibyte" maillist.
      For more information, visit http://www.vim.org/maillist.php
      -~----------~----~----~----~------~----~------~--~---
    • T.P.S.Nakagawa
      Message 2 of 25 , May 3, 2008
        Thank you Tony.


        On 2008-05-04 8:08 (JST), Tony Mechelynck wrote:
        >> > Libiconv 1.9.2 is available precompiled from GnuWin32 from the page
        >>
        >> OK, I installed all of GnuWin32 and added its bin directory to the PATH.
        >> I deleted the old iconv.dll, but the result of every test pattern is the same as with the old iconv.dll.
        >
        > Make sure to have iconv.dll (and possibly all GnuWin32 executables) in
        > the PATH (which, on Windows, is a semicolon-separated list). How to set
        > that depends on your Windows version. IIRC, in XP it is at "Control
        > Panel -> System -> Advanced -> Environment Variables" or something similar.

        Oh, yes. (My English is poor, but I am not a PC beginner.)

        I looked at :fileencodings in gvim, and my _vimrc sets

        > if has('iconv')
        >   set fileencodings=ascii,iso-2022-jp,utf-8,euc-jp,utf-16,cp932,java,ucs-2-internal,euc-jis0213,utf-16,ISO-8859-1
        > endif

        If the PATH were not set correctly, `:fileencodings?` would return a different value, wouldn't it?


        best regards,
        Nakagawa, a.k.a yaemon


        --
        NAKAGAWA Tsuneo mailto:yaemon@...
        Web site (Japanese only) http://www.kikansha.jp/~yaemon/

      • Tony Mechelynck
        Message 3 of 25 , May 3, 2008
          On 04/05/08 03:06, T.P.S.Nakagawa wrote:
          > Thank you Tony.
          >
          >
          > On 2008-05-04 8:08 (JST), Tony Mechelynck wrote:
          > >> > Libiconv 1.9.2 is available precompiled from GnuWin32 from the page
          > >>
          > >> OK, I installed all of GnuWin32 and added its bin directory to the PATH.
          > >> I deleted the old iconv.dll, but the result of every test pattern is the same as with the old iconv.dll.
          > >
          > > Make sure to have iconv.dll (and possibly all GnuWin32 executables) in
          > > the PATH (which, on Windows, is a semicolon-separated list). How to set
          > > that depends on your Windows version. IIRC, in XP it is at "Control
          > > Panel -> System -> Advanced -> Environment Variables" or something similar.
          >
          > Oh, yes. (My English is poor, but I am not a PC beginner.)
          >
          > I looked at :fileencodings in gvim, and my _vimrc sets
          >
          > > if has('iconv')
          > >   set fileencodings=ascii,iso-2022-jp,utf-8,euc-jp,utf-16,cp932,java,ucs-2-internal,euc-jis0213,utf-16,ISO-8859-1
          > > endif
          >
          > If the PATH were not set correctly, `:fileencodings?` would return a different value, wouldn't it?
          >
          >
          > best regards,
          > Nakagawa, a.k.a yaemon
          >
          >

          Yes, a Vim build compiled with +iconv/dyn will act as -iconv
          (has("iconv") == 0) if it cannot establish contact with _any_ iconv.dll
          so the other ":if" branch will be followed, and ":set fileencodings?"
          (not ":fileencodings" which returns an error) will display a different
          value.
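
A quick way to try the check described above from inside Vim is sketched here. It uses only the built-in has() function and the 'fileencodings' option; what the second command prints depends entirely on your own vimrc.

```vim
" 1 if this Vim can use iconv. With a +iconv/dyn build this is 0
" when no iconv.dll could be loaded.
echo has('iconv')

" Show the current detection list. Note the leading ":set" and the
" trailing "?"; a bare ":fileencodings" is not a command.
set fileencodings?
```

The output of ":version" also shows whether the build has +iconv, +iconv/dyn or -iconv.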

          That 'fileencodings' setting makes me wonder:
          - Shouldn't it start with "ucs-bom" to detect those Unicode files which
          have a BOM?
          - Can "ascii" give a fail signal (doesn't Vim treat it as an alias for
          "latin1")? -- If it can, then it's OK there, but otherwise not.
          - What is "ucs-2-internal"? Won't it be detected as "utf-16" (which is a
          superset of UCS-2) by the "utf-16" entry three steps earlier?
          - Why is "utf-16" mentioned twice? (The second entry does no harm, but
          will never be used.)
          - Won't you ever receive UCS-2/UTF-16 files in little-endian ordering
          (which is standard on Intel ix86 and therefore on Windows)?
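
One possible ordering that takes the questions above into account is sketched below. The choice of Japanese encodings is an assumption based on the setting quoted in this thread, not a tested recommendation. The general rules it follows ("ucs-bom" first, each name once, an 8-bit encoding such as "latin1" last) are the ones given under ":help 'fileencodings'".

```vim
if has('iconv')
  " ucs-bom first, so that any file with a BOM is recognized before the
  " other entries are tried. latin1 goes last: an 8-bit encoding never
  " fails, so nothing listed after it would ever be used.
  set fileencodings=ucs-bom,iso-2022-jp,utf-8,euc-jp,cp932,latin1
else
  " Without iconv, Vim can still convert between latin1 and the
  " Unicode encodings by itself.
  set fileencodings=ucs-bom,utf-8,latin1
endif
```

Because iso-2022-jp is a 7-bit encoding, it has to come before utf-8 to be detected at all; a side effect may be that plain ASCII files are also reported as iso-2022-jp. A UTF-16 file without a BOM can still be opened explicitly, for example with ":e ++enc=utf-16le filename".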


          Best regards,
          Tony.
          --
          If everybody minded their own business, the world would go
          around a deal faster.
          -- The Duchess, "Through the Looking Glass"

        • T.P.S.Nakagawa
          Message 4 of 25 , May 3, 2008
            Sorry for all the typos, and for sending my messy settings, which are that way for historical reasons.


            On 2008-05-04 11:24 (JST), Tony Mechelynck wrote:
            > That 'fileencodings' setting makes me wonder:
            > - Shouldn't it start with "ucs-bom" to detect those Unicode files which
            > have a BOM?
            > - Can "ascii" give a fail signal (doesn't Vim treat it as an alias for
            > "latin1")? -- If it can, then it's OK there, but otherwise not.
            > - What is "ucs-2-internal"? Won't it be detected as "utf-16" (which is a
            > superset of UCS-2) by the "utf-16" entry three steps earlier?
            > - Why is "utf-16" mentioned twice? (The second entry does no harm, but
            > will never be used.)
            > - Won't you ever receive UCS-2/UTF-16 files in little-endian ordering
            > (which is standard on Intel ix86 and therefore on Windows)?

            Thank you. I have corrected it like this:

            > set fileencodings=ascii,iso-2022-jp,utf-8,euc-jp,java,utf-16,ucs-bom,cp932,utf-16LE,euc-jis0213,ISO-8859-1


            Putting "ascii" first is intentional.
            (Is it really an alias of latin1? latin1 has codes that ASCII does not.)

            If it is not first, every pure-ASCII file (e.g. program source) is detected
            as iso-2022-jp, and I can't add multibyte comments in UTF-8.

            I have a trick for this.


            ----- end of vimrc -------
            if has( 'autocmd' )
              source $HOME/.vim/mine/filetype.vim " about filetype
              source $HOME/.vim/mine/encode.vim " about encoding to save
            endif

            ------ cat encode.vim -----
            " $Id: encode.vim,v 1.6 2007/11/08 05:03:18 yaemon Exp $
            " set fileencode to save
            "

            :autocmd BufNewFile,BufRead *.jis set fileencoding=iso-2022-jp
            :autocmd BufNewFile,BufRead *.sjis set fileencoding=shift-jis
            :autocmd BufNewFile,BufRead *.euc-jp set fileencoding=euc-jp
            :autocmd BufNewFile,BufRead *.mozex set fileencoding=utf-8
            :autocmd BufNewFile,BufRead *.elm set fileencoding=utf-8
            :autocmd BufNewFile,BufRead cddbread.* set fileencoding=utf-8

            :autocmd BufNewFile,BufRead * call s:DefaultSaveCode()

            function s:DefaultSaveCode()
              if ( ( &fileencoding == "" ) || ( &fileencoding == "ascii" ) )
                let &fileencoding = "utf-8"
              endif
            endfunction
            ----------------------------------


            Isn't it an elegant way to add UTF-8 text to an ASCII file?



            Best, best regards,
            yaemon

            --
            NAKAGAWA Tsuneo (a.k.a. yaemon ) mailto:yaemon@...
            Web site (Japanese only) http://www.kikansha.jp/~yaemon/

          • T.P.S.Nakagawa
            Message 5 of 25 , May 4, 2008
              Good morning (it's 8:25 in Japan)


              On 2008-05-04 3:30 (JST), I wrote:
              > I think this brings us back to needing to install a new version of iconv.dll
              > for Windows.
              >
              > My Windows box is too underpowered.
              > If I try, I need to cross-compile on a FreeBSD box. But I have never tried cross-compiling either.

              Today I succeeded in compiling iconv-1.12 for Windows,
              by cross-compiling with mingw32 on a Unix box.

              If you want to try it, please get it here:
              http://www.kikansha.jp/~yaemon/mingw/libiconv-1.12-mingw.zip


              ...but I can't correctly open a file edited and saved as UTF-8 by Notepad :(


              Best regards

              --
              NAKAGAWA Tsuneo (a.k.a. yaemon ) mailto:yaemon@...
              Web site (Japanese only) http://www.kikansha.jp/~yaemon/

            • T.P.S.Nakagawa
              Message 6 of 25 , May 5, 2008
                Hello


                On 2008-05-05 8:25 (JST), I wrote:
                > Today I succeeded in compiling iconv-1.12 for Windows.
                <>
                > ...but I can't correctly open a file edited and saved as UTF-8 by Notepad :(

                I see!

                On Windows XP, whether or not file-encoding detection succeeds, gvim can't
                display UTF-8 with a BOM if 'fileencodings' is set.


                A rough workaround is this:

                :set binary
                2x
                :set fenc=
                :w
                :set nobinary
                :e!


                Cursed Notepad!
                But isn't this a bug in gvim?


                Regards
                --

                NAKAGAWA Tsuneo (a.k.a. yaemon ) mailto:yaemon@...
                Web site (Japanese only) http://www.kikansha.jp/~yaemon/

              • Tony Mechelynck
                Message 7 of 25 , May 5, 2008
                  On 05/05/08 09:17, T.P.S.Nakagawa wrote:
                  > Hello
                  >
                  >
                  > On 2008-05-05 8:25 (JST), I wrote:
                  > > Today I succeeded in compiling iconv-1.12 for Windows.
                  > <>
                  > > ...but I can't correctly open a file edited and saved as UTF-8 by Notepad :(
                  >
                  > I see!
                  >
                  > On Windows XP, whether or not file-encoding detection succeeds, gvim can't
                  > display UTF-8 with a BOM if 'fileencodings' is set.
                  >
                  >
                  > A rough workaround is this:
                  >
                  > :set binary
                  > 2x
                  > :set fenc=
                  > :w
                  > :set nobinary
                  > :e!
                  >
                  >
                  > Cursed Notepad!
                  > But isn't this a bug in gvim?
                  >
                  >
                  > Regards

                  If what you said above is exact, it's a Notepad bug: a UTF-8 BOM is
                  three bytes, a UTF-16 BOM (also used for UCS-2) is two bytes, a UTF-32
                  BOM is four bytes. If there was a two-byte BOM on a UTF-8 file, it's a
                  bug in the program which produced that file.

                  When a Windows program (such as WordPad) saves a file as "Unicode text",
                  it's usually UTF-16le with BOM, which means that the first two bytes are
                  FF FE and that after that, even bytes are often null bytes.

                  Best regards,
                  Tony.
                  --
                  Don't take life too seriously -- you'll never get out of it alive.

                • T.P.S.Nakagawa
                  Message 8 of 25 , May 5, 2008
                    Sorry, Tony.

                    But I am pleased to report a further finding on this problem.

                    On 2008-05-05 23:48 (JST), Tony Mechelynck wrote:
                    > If what you said above is exact, it's a Notepad bug: a UTF-8 BOM is
                    > three bytes, a UTF-16 BOM (also used for UCS-2) is two bytes, a UTF-32

                    Oh, yes. I deleted the 2 bytes that were displayed in a Unix UTF-8 console.
                    But as the "od -xc" command shows, Notepad attaches a 3-byte BOM. Sorry.


                    Now a more detailed report on this problem.
                    When Vim reads a UTF-8 file with a BOM and 'fileencodings' is set, it always displays it as UTF-8.
                    This is so on the Japanese version of Windows (which must display cp932)
                    and also in a Unix console set to ja_JP.eucJP.

                    That is the whole reason for the bad display.

                    I read the sources for an hour, around where *p_fencs is set, but I fell asleep.
                    It is hard to read one part of a big source tree.


                    Best regards, yaemon.


                    P.S. The download page for libiconv is now
                    http://www.kikansha.jp/~yaemon/misc/libiconv
                    --
                    NAKAGAWA Tsuneo (a.k.a. yaemon ) mailto:yaemon@...
                    Web site (Japanese only) http://www.kikansha.jp/~yaemon/

                  • Tony Mechelynck
                    Message 9 of 25 , May 5, 2008
                      On 06/05/08 04:58, T.P.S.Nakagawa wrote:
                      > Sorry, Tony.
                      >
                      > But I am pleased to report a further finding on this problem.
                      >
                      > On 2008-05-05 23:48 (JST), Tony Mechelynck wrote:
                      > > If what you said above is exact, it's a Notepad bug: a UTF-8 BOM is
                      > > three bytes, a UTF-16 BOM (also used for UCS-2) is two bytes, a UTF-32
                      >
                      > Oh, yes. I deleted the 2 bytes that were displayed in a Unix UTF-8 console.
                      > But as the "od -xc" command shows, Notepad attaches a 3-byte BOM. Sorry.
                      >
                      >
                      > Now a more detailed report on this problem.
                      > When Vim reads a UTF-8 file with a BOM and 'fileencodings' is set, it always displays it as UTF-8.
                      > This is so on the Japanese version of Windows (which must display cp932)
                      > and also in a Unix console set to ja_JP.eucJP.

                      If your 'fileencodings' starts with "ucs-bom", Vim ought to detect
                      correctly any Unicode encoding when there is a BOM without interfering
                      with the detection of other encodings, unless they may start with one or
                      more of the following codes and contain not a single invalid byte (or
                      invalid sequence of bytes) for the corresponding Unicode encoding (I
                      know that many combinations of bytes higher than 0x7F are invalid in
                      UTF-8; I'm less sure about the other):

                      EF BB BF      UTF-8
                      FE FF         UTF-16be
                      FF FE         UTF-16le
                      00 00 FE FF   UTF-32be
                      FF FE 00 00   UTF-32le

                      Notice that Vim (and any other program with BOM detection) may "guess
                      wrong" if a file in UTF-16le with BOM starts with a NULL; but I suppose
                      that such a case is so rare it may be safely ignored.
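
To see what Vim settled on for the current buffer, commands along these lines can be typed after opening the file. This is only a sketch; it assumes 'fileencodings' starts with "ucs-bom" and that the file was saved with a BOM, for instance by Notepad as UTF-8.

```vim
" Which encoding did detection choose, and was a BOM found?
set fileencoding? bomb?

" To write the file back without the BOM, switch the option off
" for this buffer and save.
setlocal nobomb
write
```

When the BOM has been recognized, 'bomb' is set and the BOM itself is not shown in the text, so nothing has to be deleted by hand.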

                      Notes:
                      - Even if editing cp932 files, you may set 'encoding' to utf-8
                      - In GUI mode, anything that 'encoding' can represent, can be displayed
                      if your 'guifont' has a glyph for it. Characters for which your
                      'guifont' has no glyph may be represented by a "placeholder" question
                      mark or hollow box etc.; but if you use the GTK2 GUI (X11 only, thus not
                      on Windows) it may, in some cases, be clever enough to find an
                      appropriate glyph in a different font.
                      - Even if your terminal display is set to accept cp932 output, you may
                      still set 'encoding' to utf-8 in Console mode if 'termencoding' is set
                      to cp932, but of course in that case if you edit Unicode (or other
                      non-cp932) files containing characters which cannot be represented in
                      cp932, you will get a "placeholder" display (possibly a question mark or
                      a hollow box) at that position even though the actual contents of the
                      file are correct.
                      - The above applies also, of course, with "cp932" replaced everywhere by
                      "euc-jp".

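The notes above translate into a few vimrc lines. This is a sketch, assuming a Vim built with +multi_byte. The order of the statements matters, because an empty 'termencoding' means "same as 'encoding'".

```vim
if has('multi_byte')
  " Remember how the terminal and keyboard are encoded (for example
  " cp932 or euc-jp) before 'encoding' is changed.
  if &termencoding == ''
    let &termencoding = &encoding
  endif
  " Use UTF-8 internally, so that any Unicode file can be represented.
  set encoding=utf-8
endif
```

In Console mode, characters that the terminal encoding cannot represent will still show as placeholders, as described in the notes.
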
                      >
                      > That is the whole reason for the bad display.
                      >
                      > I read the sources for an hour, around where *p_fencs is set, but I fell asleep.
                      > It is hard to read one part of a big source tree.

                      Yes, especially when you're lacking sleep. ;-)

                      >
                      >
                      > Best regards, yaemon.
                      >
                      >
                      > P.S. The download page for libiconv is now
                      > http://www.kikansha.jp/~yaemon/misc/libiconv
                      > --
                      > NAKAGAWA Tsuneo (a.k.a. yaemon ) mailto:yaemon@...
                      > Web site (Japanese only) http://www.kikansha.jp/~yaemon/

                      Best regards,
                      Tony.
                      --
                      "The Good Ship Enterprise" (to the tune of "The Good Ship Lollipop")

                      On the good ship Enterprise
                      Every week there's a new surprise
                      Where the Romulans lurk
                      And the Klingons often go berserk.

                      Yes, the good ship Enterprise
                      There's excitement anywhere it flies
                      Where Tribbles play
                      And Nurse Chapel never gets her way.

                      See Captain Kirk standing on the bridge,
                      Mr. Spock is at his side.
                      The weekly menace, ooh-ooh
                      It gets fried, scattered far and wide.

                      It's the good ship Enterprise
                      Heading out where danger lies
                      And you live in dread
                      If you're wearing a shirt that's red.
                      -- Doris Robin and Karen Trimble of The L.A. Filkharmonics

                    • T.P.S.Nakagawa
                      Message 10 of 25 , May 7, 2008
                        Thanks all.


                        On 2008-05-06 13:30 (JST), Tony Mechelynck wrote:
                        > If your 'fileencodings' starts with "ucs-bom", Vim ought to detect
                        > correctly any Unicode encoding when there is a BOM without interfering
                        > with the detection of other encodings, unless they may start with one or
                        > more of the following codes and contain not a single invalid byte (or
                        > invalid sequence of bytes) for the corresponding Unicode encoding (I
                        > know that many combinations of bytes higher than 0x7F are invalid in
                        > UTF-8; I'm less sure about the other):

                        Oh!? Did I forget to write this?
                        After upgrading libiconv to 1.11, Vim always detects the correct charset
                        (if I can trust "set fileencoding?").

                        And whether I write "set fileencodings=" (the empty string) in _vimrc,
                        or start the list with ucs-bom,
                        or override it with "set fileencodings=ucs-bom" (the default on Win32),
                        in every case gvim tries to display the file as UTF-8.

                        Does this happen only on the Japanese version of Windows?
                        With a Latin-1 charset, if you write (c) (the copyright sign as a single character) or an accented e
                        in Notepad and save as UTF-8, does gvim display it as you expect?


                        Best regards, yaemon

                        --
                        NAKAGAWA Tsuneo (a.k.a. yaemon ) mailto:yaemon@...
                        Web site (Japanese only) http://www.kikansha.jp/~yaemon/
