  • Tony Mechelynck
    Message 1 of 4 , Apr 23, 2013
      On 23/04/13 17:19, Taro MURAOKA wrote:
      > Hi list.
      > When 'enc' is "utf-8" and 'fencs' includes "ucs-2",
      > and open a file which is not "ucs-2" encoding,
      > then fencs trial is terminated at "ucs-2" unexpectedly.
      > For example:
      > :set enc=utf-8
      > :set fencs=ucs-2
      > :e abc.txt
      > It is failed when opening attached "abc.txt".
      > I wrote an attached patch to fix this.
      > Please check it.
      > Best.

      - Especially when 'encoding' is utf-8, it is recommended to start
      'fileencodings' with ucs-bom.
      - It is always recommended to end 'fileencodings' with some 8-bit
      encoding, which will serve as the default.
      - It is useless to put more than one 8-bit encoding in 'fileencodings':
      an 8-bit encoding never fails the check, so nothing after the first one
      will ever be tried.
      - ucs-2 is obsolete; utf-16 should be used instead. (UTF-16 can
      represent codepoints up to U+10FFFF, using surrogate pairs for anything
      above U+FFFF. UCS-2 cannot go beyond U+FFFF, and surrogates are
      invalid in it.)
      - For ucs-something and utf-something other than utf-8 (and utf-7, which
      is also obsolete), big-endian is assumed unless you explicitly specify
      little-endian, even when running on a little-endian machine. So, for
      Vim, utf-16 is the same as utf-16be, not utf-16le, even on Intel x86.
      - It is very hard to detect utf-16 (and the obsolete ucs-2) correctly
      unless there is a BOM (in which case ucs-bom will handle it).
      - In recent versions of Vim (including all patchlevels of 7.3),
      ++enc=something completely bypasses the 'fileencodings' heuristics,
      forcing the encoding you specify. You may get � or hollow-box
      placeholder characters if the file contents are invalid for that encoding.
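
      As an illustration of the last three points, here is a minimal sketch
      of opening a BOM-less little-endian UTF-16 file by forcing the encoding
      instead of relying on detection. The file name "notes-utf16le.txt" is
      hypothetical, and the sketch assumes the file really is UTF-16LE.

```vim
" Hypothetical file name; assumes the file is BOM-less UTF-16, little-endian.
" ++enc bypasses 'fileencodings' and reads the file as the named encoding.
:e ++enc=utf-16le notes-utf16le.txt

" Show the encoding used for the current buffer and whether a BOM was found.
:set fileencoding? bomb?

" Without a file name, :e ++enc=... re-reads the current file in that
" encoding, which helps when detection picked the wrong one.
:e ++enc=utf-16le
```

      Note the explicit "le" suffix: plain utf-16 would be read as big-endian.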

      For "Western" locales, I recommend:

      :set fencs=ucs-bom,utf-8,latin1
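
      In a vimrc, that setting could look like the sketch below. The
      has('multi_byte') guard and the choice to set 'encoding' in the same
      place are assumptions for illustration, not something the original
      report requires.

```vim
" vimrc sketch; assumes a Vim built with the +multi_byte feature.
if has('multi_byte')
  " Internal encoding, as in the reported setup.
  set encoding=utf-8
  " Tried in order: BOM check first, then UTF-8, then the 8-bit
  " fallback, which always succeeds and so must come last.
  set fileencodings=ucs-bom,utf-8,latin1
endif
```

      After startup, :verbose set fileencodings? shows the value in effect
      and where it was last set.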

      For East-Asian locales there is a script somewhere that improves on the
      'fileencodings' heuristic (trying to discriminate as best as possible
      between the common encodings used for the various CJK languages) but I
      don't know the details.

      Best regards,
