88869 Re: utf8 BOM

  • Tony Mechelynck
    Apr 1, 2008
      bill lam wrote:
      > Gene Kwiecinski wrote:
      >> turned on. Might be wrong, though, and it might assume/require
      >> little-endian multibyte chars per the utf8 spec (ie, a BOM would be
      >> redundant), and iirr the BOM is only really required with utf16/ucs2
      >> encoding.
      >
      > Gene,
      > Thanks. Just for the record, utf8 and hence its BOM is endian neutral.
      >
      > regards,

      With UTF-8, a BOM does not specify endianness but it does specify that
      the file is in UTF-8 rather than UTF-16 or UTF-32. For instance, Windows
      XP WordPad, which cannot _write_ Unicode files except in UTF-16le (with
      BOM), can _read_ UTF-8 files if they have a BOM.
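To illustrate the point that the UTF-8 marker is just three extra bytes identifying the encoding, here is a minimal sketch in Python (my choice of language, not something from this thread). It relies on the standard "utf-8-sig" codec, which writes the marker when encoding and strips it, if present, when decoding; the sample string is arbitrary.

```python
import codecs

text = "héllo"

# "utf-8-sig" prepends the three-byte signature EF BB BF; plain "utf-8" does not.
with_bom = text.encode("utf-8-sig")
without_bom = text.encode("utf-8")

# Apart from the signature, the two byte strings are identical:
# there is no byte order to choose in UTF-8.
has_signature = with_bom.startswith(codecs.BOM_UTF8)
same_payload = with_bom[len(codecs.BOM_UTF8):] == without_bom

# Decoding with "utf-8-sig" accepts files with or without the signature.
decoded_with = with_bom.decode("utf-8-sig")
decoded_without = without_bom.decode("utf-8-sig")

# Decoding with plain "utf-8" keeps the signature as a leading U+FEFF character.
decoded_plain = with_bom.decode("utf-8")
```

A reader that only recognises UTF-8 by its signature, as described above for WordPad, would presumably be looking for exactly those three leading bytes.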

      Therefore, IMHO the appellation "BOM" (Byte Order Mark) is a misnomer; I
      would have preferred "Unicode encoding marker" or some such. It can take
      the following hex values:

      EF BB BF      UTF-8
      FE FF         UTF-16be
      FF FE         UTF-16le
      00 00 FE FF   UTF-32be
      FF FE 00 00   UTF-32le

      As can be seen, uniqueness assumes that the text of a UTF-16le file
      doesn't begin with a NUL character (U+0000): FF FE followed by 00 00
      would be read as the UTF-32le marker.
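The table of marker values can be turned into a small detection routine. What follows is only a sketch in Python, not how Vim or any particular editor does it; the names `BOMS` and `sniff_bom` are made up for the example. The four-byte UTF-32 markers are tested before the two-byte UTF-16 ones, so FF FE 00 00 is reported as UTF-32le, which is exactly the assumption about a leading NUL mentioned above.

```python
import codecs

# Hypothetical lookup table: UTF-32 entries come first so that the
# 4-byte markers are tried before the 2-byte UTF-16 markers they begin with.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32be"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32le"),  # FF FE 00 00
    (codecs.BOM_UTF8, "utf-8"),         # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16le"),  # FF FE
]


def sniff_bom(data: bytes):
    """Return (encoding name, marker length), or (None, 0) if no marker is found."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

For instance, `sniff_bom(b"\xff\xfeh\x00")` gives `("utf-16le", 2)`, while `sniff_bom(b"\xff\xfe\x00\x00")` gives `("utf-32le", 4)` even if the bytes were meant as UTF-16le text starting with a NUL.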

      Best regards,
      Tony.
      --
      I've given up reading books; I find it takes my mind off myself.

      --~--~---------~--~----~------------~-------~--~----~
      You received this message from the "vim_use" maillist.
      For more information, visit http://www.vim.org/maillist.php
      -~----------~----~----~----~------~----~------~--~---