57479 Re: "flexwiki" ftplugin causing problems ('bomb')

  • Tony Mechelynck
    Jun 27, 2010
      On 03/05/10 23:45, Lech Lorens wrote:
      [...]
      > I might be totally wrong basing my understanding of BOM and character
      > sets mainly on Wikipedia, but I thought that setting 'bomb' for utf-8
      > encoded files (which does not pose a risk of misinterpreting the
      > contents due to endianness difference) didn't make much sense. For
      > utf-16 that would be another thing.
      >
      > http://en.wikipedia.org/wiki/Byte-order_mark
      >

      Notwithstanding its name, the BOM provides more than just endianness
      detection. Actually, it is an "encoding signal" which allows detecting
      all five of the following encodings, assuming a UTF-16le file won't
      start with a NUL character (otherwise its first four bytes, FF FE 00 00,
      would be identical to the UTF-32le BOM):

      utf-16be    FE FF
      utf-16le    FF FE
      utf-8       EF BB BF
      utf-32be    00 00 FE FF
      utf-32le    FF FE 00 00
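
As a rough illustration of the table above, here is a minimal Python sketch of BOM sniffing. The function name `detect_bom` and the table name `BOMS` are made up for this example; this is not how Vim itself implements the check. The only library pieces used are the BOM constants from Python's standard `codecs` module. The 4-byte signatures are tested first because the UTF-32le BOM begins with the same two bytes as the UTF-16le BOM.

```python
import codecs

# Hypothetical lookup table for this sketch. Longest signatures come first:
# the UTF-32le BOM (FF FE 00 00) starts with the UTF-16le BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32be"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32le"),  # FF FE 00 00
    (codecs.BOM_UTF8, "utf-8"),         # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16le"),  # FF FE
]


def detect_bom(data: bytes):
    """Return (encoding name, BOM length), or (None, 0) if no BOM is found."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

Passing the first few bytes of a file is enough, for example `detect_bom(open(path, "rb").read(4))`. A UTF-16le file whose first character is NUL would be reported as utf-32le, which is exactly the assumption stated above.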

      For instance, when I was still on XP, I noticed that WordPad could read
      UTF-8 files but only if they started with a BOM. When writing what it
      called "Unicode", what it produced was UTF-16le with BOM.

      Any file starting 0xEF 0xBB 0xBF can be assumed to be in UTF-8.
      Distinguishing UTF-8 from Latin1 or Windows-1252 would otherwise require
      scanning the whole file, checking for invalid UTF-8 byte sequences.
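
To make the difference concrete, here is a small Python sketch of both paths. The name `looks_like_utf8` is invented for this example and it is only an approximation of the idea, not a description of what Vim or WordPad actually do. With a BOM the answer comes from the first three bytes; without one, the whole buffer has to be decoded to see whether any byte sequence is invalid.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def looks_like_utf8(data: bytes) -> bool:
    """Guess whether data is UTF-8 (hypothetical helper for illustration)."""
    # Fast path: the signature alone is taken as sufficient evidence.
    if data.startswith(UTF8_BOM):
        return True
    # Slow path: every byte must be examined for invalid sequences.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Note that pure ASCII text passes the slow path too, since it is valid UTF-8 as well as valid Latin1; the check can only rule UTF-8 out, never prove it.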


      Best regards,
      Tony.
      --
      Life is a gift, living is an art. (Bram Moolenaar)

      --
      You received this message from the "vim_dev" maillist.
      Do not top-post! Type your reply below the text you are replying to.
      For more information, visit http://www.vim.org/maillist.php