"flexwiki" ftplugin causing problems ('bomb')

  • ron
    Message 1 of 17, Apr 28, 2010
      I have recently started editing files with a '.wiki' extension, and
      rather than getting the 'wikipedia' filetype, they pick up the
      'flexwiki' type. That's not the problem.

      The problem is that the 'flexwiki' filetype handler sets "bomb",
      resulting in extra characters at the front of my utf8 files -- this
      has caused problems with other software which reads those files (I
      never have 'bomb' set).
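
      A quick way to check and undo this for a single buffer, sketched with
      standard Vim options only (nothing here is specific to the flexwiki
      plugin):

      ```vim
      " Show whether the current buffer will be written with a BOM.
      setlocal bomb?
      " Turn the BOM off for this buffer only.
      setlocal nobomb
      " Rewrite the file; it is now saved without the leading BOM bytes.
      write
      ```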

      Scanning the ftplugins, it seems 'flexwiki' is the only one which sets
      'bomb'. Is it an ok thing for it to do? Also, why is "flexwiki" the
      handler, when wikipedia is probably the most widely used wiki? But
      that's a quibble.

      --
      You received this message from the "vim_dev" maillist.
      Do not top-post! Type your reply below the text you are replying to.
      For more information, visit http://www.vim.org/maillist.php
    • Maxim Kim
      Message 2 of 17, Apr 29, 2010
        On 29 Apr, 10:48, ron <r...@...> wrote:
        > I have recently started editing files with a '.wiki' extension, and
        > rather than getting the 'wikipedia' filetype, they pick up  the
        > 'flexwiki' type.  That's not the problem.
        >
        > The problem is that the 'flexwiki' filetype handler sets "bomb",
        > resulting in extra characters at the front of my utf8 files -- this
        > has caused problems with other software which reads those files (I
        > never have 'bomb' set).
        >
        > Scanning the ftplugins, it seems 'flexwiki' is the only one which sets
        > 'bomb'.  Is it an ok thing for it to do?  Also, why is "flexwiki" the
        > handler, when wikipedia is probably the most widely used wiki?  But
        > that's a quibble.

        I had the very same problem with it.

        Check http://groups.google.com/group/vim_use/browse_thread/thread/23f33912b2c0d292/90ba2a0b3a80a1fc?lnk=gst&q=flexwiki#90ba2a0b3a80a1fc

        In my plugin I just do

        augroup filetypedetect
        " clear FlexWiki's stuff
        au! * *.wiki
        augroup end

      • Tony Mechelynck
        Message 3 of 17, Apr 29, 2010
          On 29/04/10 16:55, Maxim Kim wrote:
          > On 29 Apr, 10:48, ron<r...@...> wrote:
          >> I have recently started editing files with a '.wiki' extension, and
          >> rather than getting the 'wikipedia' filetype, they pick up the
          >> 'flexwiki' type. That's not the problem.
          >>
          >> The problem is that the 'flexwiki' filetype handler sets "bomb",
          >> resulting in extra characters at the front of my utf8 files -- this
          >> has caused problems with other software which reads those files (I
          >> never have 'bomb' set).
          >>
          >> Scanning the ftplugins, it seems 'flexwiki' is the only one which sets
          >> 'bomb'. Is it an ok thing for it to do? Also, why is "flexwiki" the
          >> handler, when wikipedia is probably the most widely used wiki? But
          >> that's a quibble.
          >
          > I had the very same problem with it.
          >
          > Check http://groups.google.com/group/vim_use/browse_thread/thread/23f33912b2c0d292/90ba2a0b3a80a1fc?lnk=gst&q=flexwiki#90ba2a0b3a80a1fc
          >
          > In my plugin I just do
          >
          > augroup filetypedetect
          > " clear FlexWiki's stuff
          > au! * *.wiki
          > augroup end
          >

          A simpler solution would be to create a script
          <something>/ftplugin/flexwiki.vim with <something> being an entry which
          comes after $VIMRUNTIME in the 'runtimepath' option (i.e.
          $VIM/vimfiles/after on any platform, or also $HOME/vimfiles/after on
          Windows, $HOME/.vim/after on Unix), with the following contents (create
          the file and/or directories if they don't yet exist; append to the file
          if it does exist):

          setlocal nobomb

          You can also override other "obnoxious" settings there in the same way
          if you want.

          BTW, I usually have 'bomb' set myself; I clear it for files (such as
          anything starting with #! on its first line) which will be handled by
          software that doesn't know about the BOM. (At least, for HTML and, I
          think, CSS, the BOM is an official part of how encodings get recognized.)
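
          One possible way to automate the "#!" case is an autocommand in the
          vimrc. This is only a sketch: the group name NoBomForScripts is made
          up, and running the check just before each write is an assumption
          about when the test should happen.

          ```vim
          augroup NoBomForScripts
            autocmd!
            " Before writing any buffer whose first line is a shebang, make
            " sure no BOM gets written in front of the "#!".
            autocmd BufWritePre * if getline(1) =~# '^#!' | setlocal nobomb | endif
          augroup END
          ```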


          Best regards,
          Tony.
          --
          If you think last Tuesday was a drag, wait till you see what happens
          tomorrow!

        • Maxim Kim
          Message 4 of 17, Apr 29, 2010
            On 30 Apr, 06:35, Tony Mechelynck <antoine.mechely...@...>
            wrote:
            > On 29/04/10 16:55, Maxim Kim wrote:
            > > I had the very same problem with it.
            > > In my plugin I just do
            >
            > > augroup filetypedetect
            > >    " clear FlexWiki's stuff
            > >    au! * *.wiki
            > > augroup end
            >
            > A simpler solution would be to create a script
            > <something>/ftplugin/flexwiki.vim with <something> being an entry which
            > comes after $VIMRUNTIME in the 'runtimepath' option (i.e.
            > $VIM/vimfiles/after on any platform, or also $HOME/vimfiles/after on
            > Windows, $HOME/.vim/after on Unix), with the following contents (create
            > the file and/or directories if they don't yet exist; append to the file
            > if it does exist):
            >
            >         setlocal nobomb

            In my case the problem with this solution was that the buffer ended
            up marked as changed after switching between bomb and nobomb.

          • Patrick Texier
            Message 5 of 17, Apr 30, 2010
              On Fri, 30 Apr 2010 04:35:30 +0200, Tony Mechelynck wrote in
              message <4BDA41F2.4030409@...>:

              > BTW, I usually have 'bomb' set myself; I clear it for files (such as
              > anything starting with #! on its first line) which will be handled by
              > software that doesn't know about the BOM. (At least, for HTML and, I
              > think, CSS, the BOM is an official part of how encodings get recognized.)

              It's your choice. I hate BOMs: files are no longer ASCII-compatible.

              A filetype plugin should *never* force UTF-8 encoding or write a
              (Windows) BOM. I don't know what the result is if Vim is not
              compiled with multibyte support.
              --
              Patrick Texier

            • Bram Moolenaar
              Message 6 of 17, May 2, 2010
                Ron Aaron wrote:

                > I have recently started editing files with a '.wiki' extension, and
                > rather than getting the 'wikipedia' filetype, they pick up the
                > 'flexwiki' type. That's not the problem.
                >
                > The problem is that the 'flexwiki' filetype handler sets "bomb",
                > resulting in extra characters at the front of my utf8 files -- this
                > has caused problems with other software which reads those files (I
                > never have 'bomb' set).
                >
                > Scanning the ftplugins, it seems 'flexwiki' is the only one which sets
                > 'bomb'. Is it an ok thing for it to do? Also, why is "flexwiki" the
                > handler, when wikipedia is probably the most widely used wiki? But
                > that's a quibble.

                Setting 'bomb' is weird. Unless the filetype requires the file to be
                written in utf-8 for the file to be working properly.

                George, is setting 'bomb' really required? If so, how can we avoid that
                this happens when the flexwiki filetype is detected when it's actually
                another kind of file?

                As a guard, 'bomb' should only be set when 'encoding' is utf-8.
                This applies to 'fileencoding' as well.
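
                A guard of this kind could look roughly as follows inside a
                filetype plugin. This is a sketch, not the actual flexwiki
                ftplugin code: it reads the remark above as "only touch
                'fileencoding' and 'bomb' when 'encoding' is utf-8", which is
                an interpretation.

                ```vim
                " Only force UTF-8 with a BOM when Vim itself runs in UTF-8.
                " Vim stores the 'encoding' value in lowercase, so a
                " case-sensitive compare against 'utf-8' is enough.
                if &encoding ==# 'utf-8'
                  setlocal fileencoding=utf-8
                  setlocal bomb
                endif
                ```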

                At the time of writing the www.flexwiki.com site was not available, thus
                I could not check any specification there.

                --
                A)bort, R)etry, P)lease don't bother me again

                /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
                /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
                \\\ download, build and distribute -- http://www.A-A-P.org ///
                \\\ help me help AIDS victims -- http://ICCF-Holland.org ///

              • George V. Reilly
                Message 7 of 17, May 2, 2010
                  On Sun, May 2, 2010 at 10:37 AM, Bram Moolenaar <Bram@...> wrote:
                  > > Scanning the ftplugins, it seems 'flexwiki' is the only one which sets
                  > > 'bomb'. Is it an ok thing for it to do? Also, why is "flexwiki" the
                  > > handler, when wikipedia is probably the most widely used wiki? But
                  > > that's a quibble.
                  >
                  > Setting 'bomb' is weird. Unless the filetype requires the file to be
                  > written in utf-8 for the file to be working properly.
                  >
                  > George, is setting 'bomb' really required? If so, how can we avoid that
                  > this happens when the flexwiki filetype is detected when it's actually
                  > another kind of file?
                  >
                  > As a guard, 'bomb' should only be set when 'encoding' is utf-8.
                  > This applies to 'fileencoding' as well.
                  >
                  > At the time of writing the www.flexwiki.com site was not available, thus
                  > I could not check any specification there.

                  I haven't used Flexwiki in 3 years, so I forget the details of why 'bomb' was needed.

                  I think Flexwiki is dead or nearly so. It seems like mapping .wiki files to wikipedia (MediaWiki?) rather than Flexwiki is far more useful. Feel free to remove the flexwiki mapping from ftplugin.vim.
                  -- 
                  /George V. Reilly  george@...  Twitter: @georgevreilly
                  http://www.georgevreilly.com/blog  http://blogs.cozi.com/tech

                • Bram Moolenaar
                  Message 8 of 17, May 3, 2010
                    George V. Reilly wrote:

                    > On Sun, May 2, 2010 at 10:37 AM, Bram Moolenaar <Bram@...> wrote:
                    >
                    > > > Scanning the ftplugins, it seems 'flexwiki' is the only one which sets
                    > > > 'bomb'. Is it an ok thing for it to do? Also, why is "flexwiki" the
                    > > > handler, when wikipedia is probably the most widely used wiki? But
                    > > > that's a quibble.
                    > >
                    > > Setting 'bomb' is weird. Unless the filetype requires the file to be
                    > > written in utf-8 for the file to be working properly.
                    > >
                    > > George, is setting 'bomb' really required? If so, how can we avoid that
                    > > this happens when the flexwiki filetype is detected when it's actually
                    > > another kind of file?
                    > >
                    > > As a guard, 'bomb' should only be set when 'encoding' is utf-8.
                    > > This applies to 'fileencoding' as well.
                    > >
                    > > At the time of writing the www.flexwiki.com site was not available, thus
                    > > I could not check any specification there.
                    >
                    >
                    > I haven't used Flexwiki in 3 years, so I forget the details of why 'bomb'
                    > was needed.
                    >
                    > I think Flexwiki is dead or nearly so. It seems like mapping .wiki files to
                    > wikipedia (MediaWiki?) rather than Flexwiki is far more useful. Feel free to
                    > remove the flexwiki mapping from ftplugin.vim.

                    What I did now is to disable recognizing .wiki files as flexwiki.
                    Someone still using these files can re-enable it when needed.

                    I can't find another file format that uses the .wiki extension.
                    Mediawiki uses .mw.
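
                    Someone who still edits FlexWiki files could re-enable the
                    detection with a one-line file in their own runtime
                    directory. A minimal sketch, assuming the file is saved as
                    ftdetect/flexwiki.vim under ~/.vim (vimfiles on Windows);
                    that location is a standard one, but the file itself is
                    hypothetical:

                    ```vim
                    " ftdetect/flexwiki.vim
                    " Files in ftdetect/ are sourced as part of filetype
                    " detection, so no augroup wrapper is needed here.
                    " ':setfiletype' only sets the type if none was set yet.
                    autocmd BufNewFile,BufRead *.wiki setfiletype flexwiki
                    ```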

                    --
                    The goal of science is to build better mousetraps.
                    The goal of nature is to build better mice.

                  • Lech Lorens
                    Message 9 of 17, May 3, 2010
                      On 02-May-2010 Bram Moolenaar <Bram@...> wrote:
                      >
                      > Ron Aaron wrote:
                      >
                      > > I have recently started editing files with a '.wiki' extension, and
                      > > rather than getting the 'wikipedia' filetype, they pick up the
                      > > 'flexwiki' type. That's not the problem.
                      > >
                      > > The problem is that the 'flexwiki' filetype handler sets "bomb",
                      > > resulting in extra characters at the front of my utf8 files -- this
                      > > has caused problems with other software which reads those files (I
                      > > never have 'bomb' set).
                      > >
                      > > Scanning the ftplugins, it seems 'flexwiki' is the only one which sets
                      > > 'bomb'. Is it an ok thing for it to do? Also, why is "flexwiki" the
                      > > handler, when wikipedia is probably the most widely used wiki? But
                      > > that's a quibble.
                      >
                      > Setting 'bomb' is weird. Unless the filetype requires the file to be
                      > written in utf-8 for the file to be working properly.
                      [...]
                      > As a guard, 'bomb' should only be set when 'encoding' is utf-8.
                      > This applies to 'fileencoding' as well.

                      I might be totally wrong basing my understanding of BOM and character
                      sets mainly on Wikipedia, but I thought that setting 'bomb' for utf-8
                      encoded files (which does not pose a risk of misinterpreting the
                      contents due to endianness difference) didn't make much sense. For
                      utf-16 that would be another thing.

                      http://en.wikipedia.org/wiki/Byte-order_mark

                      --
                      Cheers,
                      Lech

                    • Ron Aaron
                      Message 10 of 17, May 3, 2010
                        On Monday 03 May 2010 23:12:42 Bram Moolenaar wrote:

                        > What I did now is to disable recognizing .wiki files as flexwiki.
                        > Someone still using these files can re-enable it when needed.
                        >
                        > I can't find another file format that uses the .wiki extension.
                        > Mediawiki uses .mw.

                        It's common to use the '.wiki' for any wiki text file; so making both it and
                        '.mw' load MediaWiki syntax makes sense.
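
                        As a user-side sketch of that idea, one autocommand can
                        cover both extensions. The filetype name "Wikipedia" is
                        an assumption: it only works if a syntax file is
                        installed under that exact name (syntax/Wikipedia.vim),
                        and the ftdetect file name below is hypothetical.

                        ```vim
                        " ftdetect/wikipedia.vim
                        " Treat both extensions as the same wiki filetype.
                        autocmd BufNewFile,BufRead *.wiki,*.mw setfiletype Wikipedia
                        ```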

                        --
                        For privacy, my GPG key signature is: AD29415D
                      • Bram Moolenaar
                        Message 11 of 17, May 4, 2010
                          Ron Aaron wrote:

                          > On Monday 03 May 2010 23:12:42 Bram Moolenaar wrote:
                          >
                          > > What I did now is to disable recognizing .wiki files as flexwiki.
                          > > Someone still using these files can re-enable it when needed.
                          > >
                          > > I can't find another file format that uses the .wiki extension.
                          > > Mediawiki uses .mw.
                          >
                          > It's common to use the '.wiki' for any wiki text file; so making both
                          > it and '.mw' load MediaWiki syntax makes sense.

                          There is no MediaWiki syntax file.

                          --
                          hundred-and-one symptoms of being an internet addict:
                          9. All your daydreaming is preoccupied with getting a faster connection to the
                          net: 28.8...ISDN...cable modem...T1...T3.

                        • Ron Aaron
                          Message 12 of 17, May 4, 2010
                            On Tuesday 04 May 2010 21:52:54 Bram Moolenaar wrote:
                            >

                            > There is no MediaWiki syntax file.

                            Sorry, it's called Wikipedia.

                            --
                            Sending me something private?
                            Use my GPG public key: AD29415D
                          • Charles Campbell
                            Message 13 of 17, May 4, 2010
                              Ron Aaron wrote:
                              > On Tuesday 04 May 2010 21:52:54 Bram Moolenaar wrote:
                              >
                              >
                              >
                              >> There is no MediaWiki syntax file.
                              >>
                              >
                              > Sorry, it's called Wikipedia.
                              >
                              >
                              Hello!

                              Ron, Bram was wanting a Wikipedia syntax file. I can't vouch for it,
                              but perhaps you mean the one in:

                              http://www.vim.org/scripts/script.php?script_id=1787

                              Regards,
                              Chip Campbell

                            • ron
                              Message 14 of 17, May 4, 2010
                                On May 4, 11:57 pm, Charles Campbell <Charles.E.Campb...@...>
                                wrote:

                                > Ron, Bram was wanting a Wikipedia syntax file.  I can't vouch for it,
                                > but perhaps you mean the one in:
                                >
                                > http://www.vim.org/scripts/script.php?script_id=1787

                                I think that's the one I use, yes.

                              • Tony Mechelynck
                                Message 15 of 17 , Jun 27 8:28 AM
                                  On 03/05/10 23:45, Lech Lorens wrote:
                                  [...]
                                  > I might be totally wrong basing my understanding of BOM and character
                                  > sets mainly on Wikipedia, but I thought that setting 'bomb' for utf-8
                                  > encoded files (which does not pose a risk of misinterpreting the
                                  > contents due to endianness difference) didn't make much sense. For
                                  > utf-16 that would be another thing.
                                  >
                                  > http://en.wikipedia.org/wiki/Byte-order_mark
                                  >

                                  Notwithstanding its name, the BOM provides more than just endianness
                                  detection. Actually, it is an "encoding signal" which allows detecting
                                  all five of the following encodings, assuming a UTF-16le file won't
                                  start with a NULL:

                                  utf-16be FE FF
                                  utf-16le FF FE
                                  utf-8 EF BB BF
                                  utf-32be 00 00 FE FF
                                  utf-32le FF FE 00 00

                                  For instance, when I was still on XP, I noticed that WordPad could read
                                  UTF-8 files but only if they started with a BOM. When writing what it
                                  called "Unicode", what it produced was UTF-16le with BOM.

                                  Any file starting 0xEF 0xBB 0xBF can be assumed to be in UTF-8.
                                  Distinguishing UTF-8 from Latin1 or Windows-1252 would otherwise require
                                  scanning the whole file, checking for invalid UTF-8 byte sequences.
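
                                  Vim can apply this same detection order when
                                  reading a file, through its 'fileencodings'
                                  option. A minimal vimrc sketch, assuming
                                  'encoding' is utf-8; falling back to latin1
                                  is just one possible choice:

                                  ```vim
                                  " Encodings are tried in this order on read:
                                  "   ucs-bom  use the BOM, if present, to pick the encoding
                                  "   utf-8    accept as UTF-8 unless an invalid sequence is found
                                  "   latin1   last resort; any byte sequence is valid Latin1
                                  set encoding=utf-8
                                  set fileencodings=ucs-bom,utf-8,latin1
                                  ```

                                  After a file is loaded, ":set fileencoding? bomb?"
                                  shows which encoding was chosen and whether a
                                  BOM was found.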


                                  Best regards,
                                  Tony.
                                  --
                                  Life is a gift, living is an art. (Bram Moolenaar)

                                • Benjamin R. Haskell
                                  Message 16 of 17 , Jun 27 12:21 PM
                                    On Sun, 27 Jun 2010, Tony Mechelynck wrote:

                                    > On 03/05/10 23:45, Lech Lorens wrote:
                                    > [...]
                                    > > I might be totally wrong basing my understanding of BOM and
                                    > > character sets mainly on Wikipedia, but I thought that setting
                                    > > 'bomb' for utf-8 encoded files (which does not pose a risk of
                                    > > misinterpreting the contents due to endianness difference) didn't
                                    > > make much sense. For utf-16 that would be another thing.
                                    > >
                                    > > http://en.wikipedia.org/wiki/Byte-order_mark
                                    > >
                                    >
                                    > Notwithstanding its name, the BOM provides more than just endianness
                                    > detection. Actually, it is an "encoding signal" which allows detecting
                                    > all five of the following encodings, assuming a UTF-16le file won't
                                    > start with a NULL:
                                    >
                                    > utf-16be FE FF
                                    > utf-16le FF FE
                                    > utf-8 EF BB BF
                                    > utf-32be 00 00 FE FF
                                    > utf-32le FF FE 00 00
                                    >
                                    > For instance, when I was still on XP, I noticed that WordPad could
                                    > read UTF-8 files but only if they started with a BOM. When writing
                                    > what it called "Unicode", what it produced was UTF-16le with BOM.
                                    >
                                    > Any file starting 0xEF 0xBB 0xBF can be assumed to be in UTF-8.
                                    > Distinguishing UTF-8 from Latin1 or Windows-1252 would otherwise
                                    > require scanning the whole file, checking for invalid UTF-8 byte
                                    > sequences.

                                    Quoting the same Wikipedia article Lech mentioned:

                                    "While [the] Unicode standard allows BOM in UTF-8, it does not require
                                    or recommend it."

                                    and paraphrasing the rest of that paragraph:

                                    Using a BOM as the first character of a UTF-8-encoded file can cause
                                    problems with the shebang line[1] in Unix-like systems. And
                                    UTF-8-capable software is often written to assume UTF-8 unless otherwise
                                    directed, so the U+FEFF character at the start of the stream is often
                                    interpreted incorrectly.

                                    The Unicode UTF-{8,16,32} & BOM FAQ probably worded it better than
                                    Wikipedia or I[2].

                                    --
                                    Best,
                                    Ben

                                    [1] http://en.wikipedia.org/wiki/Shebang_(Unix)
                                    [2] http://unicode.org/faq/utf_bom.html#bom5

                                  • Tony Mechelynck
                                    Message 17 of 17 , Jun 27 6:02 PM
                                      On 27/06/10 21:21, Benjamin R. Haskell wrote:
                                      > On Sun, 27 Jun 2010, Tony Mechelynck wrote:
                                      >
                                      >> On 03/05/10 23:45, Lech Lorens wrote:
                                      >> [...]
                                      >>> I might be totally wrong basing my understanding of BOM and
                                      >>> character sets mainly on Wikipedia, but I thought that setting
                                      >>> 'bomb' for utf-8 encoded files (which does not pose a risk of
                                      >>> misinterpreting the contents due to endianness difference) didn't
                                      >>> make much sense. For utf-16 that would be another thing.
                                      >>>
                                      >>> http://en.wikipedia.org/wiki/Byte-order_mark
                                      >>>
                                      >>
                                      >> Notwithstanding its name, the BOM provides more than just endianness
                                      >> detection. Actually, it is an "encoding signal" which allows detecting
                                      >> all five of the following encodings, assuming a UTF-16le file won't
                                      >> start with a NULL:
                                      >>
                                      >> utf-16be FE FF
                                      >> utf-16le FF FE
                                      >> utf-8 EF BB BF
                                      >> utf-32be 00 00 FE FF
                                      >> utf-32le FF FE 00 00
                                      >>
                                      >> For instance, when I was still on XP, I noticed that WordPad could
                                      >> read UTF-8 files but only if they started with a BOM. When writing
                                      >> what it called "Unicode", what it produced was UTF-16le with BOM.
                                      >>
                                      >> Any file starting 0xEF 0xBB 0xBF can be assumed to be in UTF-8.
                                      >> Distinguishing UTF-8 from Latin1 or Windows-1252 would otherwise
                                      >> require scanning the whole file, checking for invalid UTF-8 byte
                                      >> sequences.
                                      >
                                      > Quoting the same Wikipedia article Lech mentioned:
                                      >
                                      > "While [the] Unicode standard allows BOM in UTF-8, it does not require
                                      > or recommend it."
                                      >
                                      > and paraphrasing the rest of that paragraph:
                                      >
                                      > Using a BOM as the first character of a UTF-8-encoded file can cause
                                      > problems with the shebang line[1] in Unix-like systems. And
                                      > UTF-8-capable software is often written to assume UTF-8 unless otherwise
                                      > directed, so the U+FEFF character at the start of the stream is often
                                      > interpreted incorrectly.
                                      >
                                      > The Unicode UTF-{8,16,32} & BOM FAQ probably worded it better than
                                      > Wikipedia or I[2].
                                      >

                                      Yes, a UTF-8 BOM will interfere with any software that has no knowledge
                                      of Unicode and expects some particular "magic bytes" at the start, or
                                      simply won't accept 0xEF 0xBB 0xBF at the start of a document. The #!
                                      shebang is just one example.
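                                      A minimal Python sketch of the shebang case, for illustration only. The
                                      helper names looks_like_script and strip_utf8_bom are hypothetical, and
                                      the check only imitates a naive "magic bytes" test; it is not how any
                                      particular kernel or shell implements it.

```python
import codecs


def looks_like_script(data: bytes) -> bool:
    """Imitate a naive magic-byte test: a script must start with '#!'."""
    return data.startswith(b"#!")


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


script = b"#!/bin/sh\necho hello\n"
with_bom = codecs.BOM_UTF8 + script

# The BOM pushes '#!' away from offset 0, so the naive test fails
# until the three BOM bytes are removed again.
print(looks_like_script(script))
print(looks_like_script(with_bom))
print(looks_like_script(strip_utf8_bom(with_bom)))
```

                                      The same three bytes would trip up any reader that compares the start of
                                      the file against a fixed signature.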

                                      OTOH, in filetypes where UTF-8 is but one possibility among many, the
                                      BOM is useful to specify the encoding or to confirm what was set
                                      otherwise. Examples:

                                      - HTML charset can be set by the HTTP "Content-Type" header (in an HTTP
                                      or HTTPS transaction external to the file), in a <meta
                                      http-equiv="Content-Type" content="text/html; charset=something"> tag
                                      (replacing "something" by the charset) within the <head> section, or by
                                      a BOM. There are even official priority rules that tell browsers what to
                                      do when two or three of the above are present (and they are necessary,
                                      because, I'm told, some braindead hosts will send "Content-Type:
                                      text/html; charset=iso-8859-1" for any *.htm or *.html file regardless
                                      of BOM or <meta> tags).

                                      - CSS charset can be set by a BOM.

                                      - XML charset can be set (IIRC) by a <?xml ... ?> header line or by a BOM.

                                      - XHTML is both HTML and XML so the methods of both apply to it.
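
                                      As a rough sketch of how a reader of such files might use the BOM as an
                                      encoding signal, the following Python function checks the five
                                      signatures quoted earlier in this message. The name sniff_bom is made up
                                      for illustration. A real browser or XML parser applies further rules
                                      (HTTP headers, <meta> tags, the XML declaration) that are not modelled
                                      here.

```python
import codecs

# Longer signatures are tested first: the UTF-32le BOM (FF FE 00 00)
# begins with the UTF-16le BOM (FF FE), so order matters.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32be"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32le"),  # FF FE 00 00
    (codecs.BOM_UTF8, "utf-8"),         # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16le"),  # FF FE
]


def sniff_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


print(sniff_bom(b"\xef\xbb\xbf<html>"))
print(sniff_bom(b"\xff\xfeh\x00i\x00"))
print(sniff_bom(b"<html>"))
```

                                      Like the table it is based on, this assumes that a UTF-16le file does
                                      not start with a NUL character; otherwise it would be reported as
                                      UTF-32le.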

                                      Personally I use the following rules of thumb:

                                      - Add a BOM to Unicode files meant for use by a browser.
                                      - Don't add it to UTF-8 files mostly in US-ASCII (possibly with
                                      codepoints above 0x7F in literals and comments) if they're meant for use
                                      by a shell, the 'make' utility, or a compiler.
                                      - Some Windows programs won't read UTF-8 correctly unless a BOM is present.
                                      - On Windows, when a system file is said to be in 'Unicode' that usually
                                      means UTF-16le with BOM.
                                      - Vim helpfiles in a single directory must either all have a BOM, or
                                      (recommended) all lack a BOM. If some have one and others not, the
                                      ":helptags" command will abort with an error.

                                      This does not explicitly cover all cases; when it doesn't (or in the
                                      cases where some of the above rules conflict), I proceed by analogy and
                                      by trial and error.
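
                                      For the helpfile rule above, a small Python sketch that reports which
                                      files in a directory start with a UTF-8 BOM and which do not, so that a
                                      mixed directory can be spotted before running ":helptags". The function
                                      names and the "*.txt" pattern are assumptions made for this example.

```python
import codecs
from pathlib import Path


def has_utf8_bom(path: Path) -> bool:
    """True if the file starts with the UTF-8 BOM bytes EF BB BF."""
    with path.open("rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


def bom_status(directory, pattern="*.txt"):
    """Return (with_bom, without_bom): two sorted lists of file names."""
    with_bom, without_bom = [], []
    for path in sorted(Path(directory).glob(pattern)):
        if has_utf8_bom(path):
            with_bom.append(path.name)
        else:
            without_bom.append(path.name)
    return with_bom, without_bom


def is_consistent(directory, pattern="*.txt") -> bool:
    """True if all matching files have a BOM or all of them lack one."""
    with_bom, without_bom = bom_status(directory, pattern)
    return not (with_bom and without_bom)
```

                                      Calling is_consistent("doc") on a help directory would return False only
                                      when some files have a BOM and others do not.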


                                      Best regards,
                                      Tony.
                                      --
                                      One man's brain plus one other will produce one half as many ideas as
                                      one man would have produced alone. These two plus two more will
                                      produce half again as many ideas. These four plus four more begin to
                                      represent a creative meeting, and the ratio changes to one quarter as
                                      many ...
                                      -- Anthony Chevins
