57479 Re: "flexwiki" ftplugin causing problems ('bomb')
- Jun 27, 2010
On 03/05/10 23:45, Lech Lorens wrote:
> I might be totally wrong basing my understanding of BOM and character
> sets mainly on Wikipedia, but I thought that setting 'bomb' for utf-8
> encoded files (which does not pose a risk of misinterpreting the
> contents due to endianness difference) didn't make much sense. For
> utf-16 that would be another thing.
Notwithstanding its name, the BOM provides more than just endianness
detection. It is really an "encoding signature" which allows detecting
all five of the following encodings, assuming a UTF-16le file won't
start with a NUL character (otherwise FF FE 00 00 could be either the
UTF-32le BOM or the UTF-16le BOM followed by U+0000):
    utf-16be    FE FF
    utf-16le    FF FE
    utf-8       EF BB BF
    utf-32be    00 00 FE FF
    utf-32le    FF FE 00 00
For instance, when I was still on XP, I noticed that WordPad could read
UTF-8 files but only if they started with a BOM. When writing what it
called "Unicode", what it produced was UTF-16le with BOM.
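The signature table above can be checked mechanically. Here is a minimal
sketch in Python, not anything Vim itself does; the function name
detect_bom is made up for illustration. It tests the longer signatures
first, because the UTF-32le BOM begins with the same two bytes as the
UTF-16le BOM, and it carries the same assumption as above: a UTF-16le
file does not start with a NUL character.

```python
import codecs

# Hypothetical helper for illustration only.
# Longer signatures are listed first: FF FE 00 00 (UTF-32le) starts with
# FF FE (UTF-16le), so the 4-byte BOMs must be tried before the 2-byte ones.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32be"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32le"),  # FF FE 00 00
    (codecs.BOM_UTF8, "utf-8"),         # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16le"),  # FF FE
]


def detect_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None
```

Only the first four bytes of the file are needed, e.g.
detect_bom(open(path, "rb").read(4)), where path is whatever file you
want to inspect.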
Any file starting with 0xEF 0xBB 0xBF can be assumed to be in UTF-8.
Without a BOM, distinguishing UTF-8 from Latin1 or Windows-1252 requires
scanning the whole file and checking for invalid UTF-8 byte sequences.
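To show what that whole-file scan amounts to, here is a rough Python
sketch; the name looks_like_utf8 is made up for illustration. It simply
attempts a strict UTF-8 decode of all the bytes. Note that a pure ASCII
file passes the check too, since ASCII is valid in UTF-8, Latin1 and
Windows-1252 alike, so a positive answer is only a plausibility test.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data contains no invalid UTF-8 sequence."""
    try:
        # The default error handler is "strict", so any invalid byte
        # sequence raises UnicodeDecodeError.
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

A Latin1 "é" is the lone byte 0xE9, which is an incomplete sequence in
UTF-8, so looks_like_utf8(b"caf\xe9") is false, while the UTF-8 form
b"caf\xc3\xa9" passes.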
Life is a gift, living is an art. (Bram Moolenaar)