  • Tony Mechelynck
    May 5, 2008
      On 06/05/08 04:58, T.P.S.Nakagawa wrote:
      > Sorry, Tony.
      > But I pleasure of report next thing of this problem.
      > 2008-05-05 23:48 (JST) , Tony Mechelynck sent follow message:
      > > If what you said above is exact, it's a Notepad bug: a UTF-8 BOM is
      > > three bytes, a UTF-16 BOM (also used for UCS-2) is two bytes, a UTF-32
      > Oh yes. I deleted the 2 bytes that were displayed in the Unix UTF-8 console.
      > But as shown by the "od -xc" command, Notepad attaches a 3-byte BOM. Sorry.
      > So here is a more detailed report on this problem.
      > When Vim reads UTF-8 + BOM with 'fileencodings' set, it always displays it as UTF-8,
      > both on the Japanese version of Windows (which must display cp932)
      > and on a Unix console set to ja_JP.eucJP.

      If your 'fileencodings' starts with "ucs-bom", Vim ought to correctly
      detect any Unicode encoding that has a BOM, without interfering with the
      detection of other encodings, unless a file in one of those other
      encodings happens to start with one of the following byte sequences and
      contains not a single invalid byte (or invalid sequence of bytes) for the
      corresponding Unicode encoding (I know that many combinations of bytes
      higher than 0x7F are invalid in UTF-8; I'm less sure about the others):

      EF BB BF       UTF-8
      FE FF          UTF-16be
      FF FE          UTF-16le
      00 00 FE FF    UTF-32be
      FF FE 00 00    UTF-32le
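
      The detection order described above can be sketched roughly as follows.
      This is a minimal Python illustration under stated assumptions, not Vim's
      actual C implementation: the function name detect_encoding and the
      fallback list ("ucs-bom", "utf-8", "cp932") are hypothetical, and Python
      codec names stand in for Vim's encoding names. The BOMs are tested
      longest first, so that FF FE 00 00 is tried as UTF-32le before its prefix
      FF FE is taken as UTF-16le, and an entry only wins if the rest of the
      file decodes without a single error.

```python
# Hypothetical sketch of BOM-first detection, loosely modelled on a
# 'fileencodings' value like "ucs-bom,utf-8,cp932". NOT Vim's real code;
# the function name and fallback list are assumptions for illustration.

# BOMs from the table, longest first so that FF FE 00 00 (UTF-32le)
# is tested before its prefix FF FE (UTF-16le).
BOMS = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]


def detect_encoding(data, fencs=("ucs-bom", "utf-8", "cp932")):
    """Return (encoding, text) for the first entry in fencs that fits."""
    for enc in fencs:
        if enc == "ucs-bom":
            for bom, bom_enc in BOMS:
                if data.startswith(bom):
                    try:
                        # Strip the BOM and require the rest to decode cleanly.
                        return bom_enc, data[len(bom):].decode(bom_enc)
                    except UnicodeDecodeError:
                        pass  # body invalid for this BOM; try a shorter one
            continue  # no usable BOM: move on to the next entry
        try:
            return enc, data.decode(enc)
        except UnicodeDecodeError:
            continue  # invalid in this encoding: fall through to the next
    raise ValueError("no entry in fencs matched")
```

      For example, a Notepad-style file starting with EF BB BF is reported as
      UTF-8 with the BOM removed, while cp932 bytes without a BOM are rejected
      as invalid UTF-8 and fall through to the cp932 entry.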

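      Notice that Vim (and any other program with BOM detection) may "guess
      wrong" if a UTF-16le file with a BOM has a NUL (U+0000) as its first
      character, because its first four bytes FF FE 00 00 are then identical
      to the UTF-32le BOM; but I suppose such a case is so rare that it may be
      safely ignored.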
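
      To make that ambiguity concrete, here is a small Python check using
      made-up sample text (an assumption for illustration only): a UTF-16le
      string whose first character after the BOM is U+0000 produces bytes that
      also parse cleanly as a UTF-32le BOM followed by valid UTF-32le data.

```python
# Assumed sample text: a UTF-16le file whose first character after the
# BOM is U+0000 (NUL), followed by "a" and another NUL.
data = "\ufeff\x00a\x00".encode("utf-16-le")
print(data.hex(" "))  # ff fe 00 00 61 00 00 00

# Read as UTF-32le: skip the 4-byte BOM FF FE 00 00.
as_utf32le = data[4:].decode("utf-32-le")
# Read as UTF-16le: skip the 2-byte BOM FF FE.
as_utf16le = data[2:].decode("utf-16-le")

# Both decodes succeed without a single invalid byte, so BOM
# detection alone cannot tell which one was meant.
print(repr(as_utf32le))  # 'a'
print(repr(as_utf16le))  # '\x00a\x00'
```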

      - Even when editing cp932 files, you may set 'encoding' to utf-8; Vim
      converts between the file's encoding and 'encoding' when reading and
      writing.
      - In GUI mode, anything that 'encoding' can represent can be displayed
      if your 'guifont' has a glyph for it. Characters for which your
      'guifont' has no glyph may be represented by a "placeholder" question
      mark, hollow box, etc.; but if you use the GTK2 GUI (X11 only, thus not
      on Windows), it may in some cases be clever enough to find an
      appropriate glyph in a different font.
      - Even if your terminal is set to accept cp932 output, you may still set
      'encoding' to utf-8 in console mode, provided 'termencoding' is set to
      cp932. Of course, in that case, if you edit Unicode (or other non-cp932)
      files containing characters that cannot be represented in cp932, you
      will get a "placeholder" (possibly a question mark or a hollow box) at
      those positions on screen, even though the actual contents of the file
      are correct.
      - The above applies also, of course, with "cp932" replaced everywhere by

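      The console-mode case in the bullet about 'termencoding' can be modelled
      roughly like this. It is a hypothetical Python sketch, not Vim code: the
      function to_terminal is an invented name, and Python's "replace" error
      handler (which substitutes "?") stands in for whatever placeholder Vim
      or the terminal actually shows. The point it illustrates is that only the
      screen output goes through cp932; the text held in the buffer keeps
      every character.

```python
# Hypothetical model of 'encoding'=utf-8 with 'termencoding'=cp932:
# the buffer holds Unicode text, and only what is sent to the terminal
# is converted to cp932. Not Vim's real code; "?" is Python's stand-in
# for whatever placeholder Vim or the terminal would actually show.


def to_terminal(buffer_text, termencoding="cp932"):
    """Encode text for display; unencodable characters become b'?'."""
    return buffer_text.encode(termencoding, errors="replace")


# Japanese followed by Korean: cp932 has no Hangul.
buffer_text = "日本語 한국어"
screen = to_terminal(buffer_text)

print(screen.endswith(b" ???"))  # True: each Hangul character became "?"
print(len(buffer_text))          # 7: the buffer itself lost nothing
```
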
      > That's the whole reason for the bad display.
      > I read the sources for an hour, around where *p_fencs is set, but I fell asleep.
      > It's hard to read one part of a big source tree.

      Yes, especially when you're lacking sleep. ;-)

      > Best regards, yaemon.
      > P.S. The libiconv download page is now at
      > http://www.kikansha.jp/~yaemon/misc/libiconv
      > --
      > NAKAGAWA Tsuneo (a.k.a. yaemon ) mailto:yaemon@...
      > Web site ( Japanese only ) http://www.kikansha.jp/~yaemon/

      Best regards,
