Re: "flexwiki" ftplugin causing problems ('bomb')

  • Lech Lorens
    Message 1 of 17 , May 3 2:45 PM
      On 02-May-2010 Bram Moolenaar <Bram@...> wrote:
      >
      > Ron Aaron wrote:
      >
      > > I have recently started editing files with a '.wiki' extension, and
      > > rather than getting the 'wikipedia' filetype, they pick up the
      > > 'flexwiki' type. That's not the problem.
      > >
      > > The problem is that the 'flexwiki' filetype handler sets "bomb",
      > > resulting in extra characters at the front of my utf8 files -- this
      > > has caused problems with other software which reads those files (I
      > > never have 'bomb' set).
      > >
      > > Scanning the ftplugins, it seems 'flexwiki' is the only one which sets
      > > 'bomb'. Is it an ok thing for it to do? Also, why is "flexwiki" the
      > > handler, when wikipedia is probably the most widely used wiki? But
      > > that's a quibble.
      >
      > Setting 'bomb' is weird. Unless the filetype requires the file to be
      > written in utf-8 for the file to be working properly.
      [...]
      > As a guard, 'bomb' should only be set when 'encoding' is utf-8.
      > This applies to 'fileencoding' as well.

      I might be totally wrong basing my understanding of BOM and character
      sets mainly on Wikipedia, but I thought that setting 'bomb' for utf-8
      encoded files (which does not pose a risk of misinterpreting the
      contents due to endianness difference) didn't make much sense. For
      utf-16 that would be another thing.
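
A minimal Python sketch of the endianness point, not taken from the thread (the sample character U+20AC is an arbitrary choice): the UTF-16 bytes depend on the byte order, whereas UTF-8 yields one fixed byte sequence.

```python
# Illustration only: encode one arbitrarily chosen character, U+20AC.
sample = "\u20ac"

utf16_be = sample.encode("utf-16-be")  # big-endian 16-bit code unit
utf16_le = sample.encode("utf-16-le")  # little-endian 16-bit code unit
utf8 = sample.encode("utf-8")          # byte-oriented, no byte order to choose

# The two UTF-16 forms of this character are byte-swapped copies of each other.
assert utf16_be == utf16_le[::-1]

print(utf16_be.hex(" "), "|", utf16_le.hex(" "), "|", utf8.hex(" "))
```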

      http://en.wikipedia.org/wiki/Byte-order_mark

      --
      Cheers,
      Lech

      --
      You received this message from the "vim_dev" maillist.
      Do not top-post! Type your reply below the text you are replying to.
      For more information, visit http://www.vim.org/maillist.php
    • Ron Aaron
      Message 2 of 17 , May 3 11:49 PM
        On Monday 03 May 2010 23:12:42 Bram Moolenaar wrote:

        > What I did now is to disable recognizing .wiki files as flexwiki.
        > Someone still using these files can re-enable it when needed.
        >
        > I can't find another file format that uses the .wiki extension.
        > Mediawiki uses .mw.

        It's common to use the '.wiki' for any wiki text file; so making both it and
        '.mw' load MediaWiki syntax makes sense.

        --
        For privacy, my GPG key signature is: AD29415D
      • Bram Moolenaar
        Message 3 of 17 , May 4 11:52 AM
          Ron Aaron wrote:

          > On Monday 03 May 2010 23:12:42 Bram Moolenaar wrote:
          >
          > > What I did now is to disable recognizing .wiki files as flexwiki.
          > > Someone still using these files can re-enable it when needed.
          > >
          > > I can't find another file format that uses the .wiki extension.
          > > Mediawiki uses .mw.
          >
          > It's common to use the '.wiki' for any wiki text file; so making both
          > it and '.mw' load MediaWiki syntax makes sense.

          There is no MediaWiki syntax file.

          --
          hundred-and-one symptoms of being an internet addict:
          9. All your daydreaming is preoccupied with getting a faster connection to the
          net: 28.8...ISDN...cable modem...T1...T3.

          /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
          /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
          \\\ download, build and distribute -- http://www.A-A-P.org ///
          \\\ help me help AIDS victims -- http://ICCF-Holland.org ///

        • Ron Aaron
          Message 4 of 17 , May 4 12:15 PM
            On Tuesday 04 May 2010 21:52:54 Bram Moolenaar wrote:
            >

            > There is no MediaWiki syntax file.

            Sorry, it's called Wikipedia.

            --
            Sending me something private?
            Use my GPG public key: AD29415D
          • Charles Campbell
            Message 5 of 17 , May 4 1:57 PM
              Ron Aaron wrote:
              > On Tuesday 04 May 2010 21:52:54 Bram Moolenaar wrote:
              >
              >> There is no MediaWiki syntax file.
              >
              > Sorry, it's called Wikipedia.

              Hello!

              Ron, Bram was wanting a Wikipedia syntax file. I can't vouch for it,
              but perhaps you mean the one in:

              http://www.vim.org/scripts/script.php?script_id=1787

              Regards,
              Chip Campbell

            • ron
              Message 6 of 17 , May 4 8:33 PM
                On May 4, 11:57 pm, Charles Campbell <Charles.E.Campb...@...>
                wrote:

                > Ron, Bram was wanting a Wikipedia syntax file.  I can't vouch for it,
                > but perhaps you mean the one in:
                >
                > http://www.vim.org/scripts/script.php?script_id=1787

                I think that's the one I use, yes.

              • Tony Mechelynck
                Message 7 of 17 , Jun 27, 2010
                  On 03/05/10 23:45, Lech Lorens wrote:
                  [...]
                  > I might be totally wrong basing my understanding of BOM and character
                  > sets mainly on Wikipedia, but I thought that setting 'bomb' for utf-8
                  > encoded files (which does not pose a risk of misinterpreting the
                  > contents due to endianness difference) didn't make much sense. For
                  > utf-16 that would be another thing.
                  >
                  > http://en.wikipedia.org/wiki/Byte-order_mark
                  >

                  Notwithstanding its name, the BOM provides more than just endianness
                  detection. Actually, it is an "encoding signal" which allows detecting
                  all five of the following encodings, assuming a UTF-16le file won't
                  start with a NULL:

                  utf-16be FE FF
                  utf-16le FF FE
                  utf-8 EF BB BF
                  utf-32be 00 00 FE FF
                  utf-32le FF FE 00 00

                  For instance, when I was still on XP, I noticed that WordPad could read
                  UTF-8 files but only if they started with a BOM. When writing what it
                  called "Unicode", what it produced was UTF-16le with BOM.

                  Any file starting 0xEF 0xBB 0xBF can be assumed to be in UTF-8.
                  Distinguishing UTF-8 from Latin1 or Windows-1252 would otherwise require
                  scanning the whole file, checking for invalid UTF-8 byte sequences.
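
A minimal Python sketch of both approaches, not part of the original message. The function names `sniff_bom` and `looks_like_utf8` are hypothetical; the signatures are those in the table above, taken from the standard `codecs` constants. The longer UTF-32 signatures are tested first, so FF FE 00 00 is reported as UTF-32le, which is the same assumption that a UTF-16le file does not start with a NUL.

```python
import codecs

# Signatures from the table above. Longer signatures come first so that
# FF FE 00 00 is reported as UTF-32le rather than UTF-16le.
BOM_SIGNATURES = [
    (codecs.BOM_UTF32_BE, "utf-32be"),
    (codecs.BOM_UTF32_LE, "utf-32le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
]


def sniff_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for signature, name in BOM_SIGNATURES:
        if data.startswith(signature):
            return name
    return None


def looks_like_utf8(data: bytes) -> bool:
    """Without a BOM, validate every byte sequence in the data."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Note that `looks_like_utf8` has to read all of the data, while `sniff_bom` only looks at the first four bytes.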


                  Best regards,
                  Tony.
                  --
                  Life is a gift, living is an art. (Bram Moolenaar)

                • Benjamin R. Haskell
                  Message 8 of 17 , Jun 27, 2010
                    On Sun, 27 Jun 2010, Tony Mechelynck wrote:

                    > On 03/05/10 23:45, Lech Lorens wrote:
                    > [...]
                    > > I might be totally wrong basing my understanding of BOM and
                    > > character sets mainly on Wikipedia, but I thought that setting
                    > > 'bomb' for utf-8 encoded files (which does not pose a risk of
                    > > misinterpreting the contents due to endianness difference) didn't
                    > > make much sense. For utf-16 that would be another thing.
                    > >
                    > > http://en.wikipedia.org/wiki/Byte-order_mark
                    > >
                    >
                    > Notwithstanding its name, the BOM provides more than just endianness
                    > detection. Actually, it is an "encoding signal" which allows detecting
                    > all five of the following encodings, assuming a UTF-16le file won't
                    > start with a NULL:
                    >
                    > utf-16be FE FF
                    > utf-16le FF FE
                    > utf-8 EF BB BF
                    > utf-32be 00 00 FE FF
                    > utf-32le FF FE 00 00
                    >
                    > For instance, when I was still on XP, I noticed that WordPad could
                    > read UTF-8 files but only if they started with a BOM. When writing
                    > what it called "Unicode", what it produced was UTF-16le with BOM.
                    >
                    > Any file starting 0xEF 0xBB 0xBF can be assumed to be in UTF-8.
                    > Distinguishing UTF-8 from Latin1 or Windows-1252 would otherwise
                    > require scanning the whole file, checking for invalid UTF-8 byte
                    > sequences.

                    Quoting the same Wikipedia article Lech mentioned:

                    "While [the] Unicode standard allows BOM in UTF-8, it does not require
                    or recommend it."

                    and paraphrasing the rest of that paragraph:

                    Using a BOM as the first character of a UTF-8-encoded file can cause
                    problems with the shebang line[1] in Unix-like systems. And
                    UTF-8-capable software is often written to assume UTF-8 unless otherwise
                    directed, so the U+FEFF character at the start of the stream is often
                    interpreted incorrectly.

                    The Unicode UTF-{8,16,32} & BOM FAQ probably worded it better than
                    Wikipedia or I[2].

                    --
                    Best,
                    Ben

                    [1] http://en.wikipedia.org/wiki/Shebang_(Unix)
                    [2] http://unicode.org/faq/utf_bom.html#bom5

                  • Tony Mechelynck
                    Message 9 of 17 , Jun 27, 2010
                      On 27/06/10 21:21, Benjamin R. Haskell wrote:
                      > On Sun, 27 Jun 2010, Tony Mechelynck wrote:
                      >
                      >> On 03/05/10 23:45, Lech Lorens wrote:
                      >> [...]
                      >>> I might be totally wrong basing my understanding of BOM and
                      >>> character sets mainly on Wikipedia, but I thought that setting
                      >>> 'bomb' for utf-8 encoded files (which does not pose a risk of
                      >>> misinterpreting the contents due to endianness difference) didn't
                      >>> make much sense. For utf-16 that would be another thing.
                      >>>
                      >>> http://en.wikipedia.org/wiki/Byte-order_mark
                      >>>
                      >>
                      >> Notwithstanding its name, the BOM provides more than just endianness
                      >> detection. Actually, it is an "encoding signal" which allows detecting
                      >> all five of the following encodings, assuming a UTF-16le file won't
                      >> start with a NULL:
                      >>
                      >> utf-16be FE FF
                      >> utf-16le FF FE
                      >> utf-8 EF BB BF
                      >> utf-32be 00 00 FE FF
                      >> utf-32le FF FE 00 00
                      >>
                      >> For instance, when I was still on XP, I noticed that WordPad could
                      >> read UTF-8 files but only if they started with a BOM. When writing
                      >> what it called "Unicode", what it produced was UTF-16le with BOM.
                      >>
                      >> Any file starting 0xEF 0xBB 0xBF can be assumed to be in UTF-8.
                      >> Distinguishing UTF-8 from Latin1 or Windows-1252 would otherwise
                      >> require scanning the whole file, checking for invalid UTF-8 byte
                      >> sequences.
                      >
                      > Quoting the same Wikipedia article Lech mentioned:
                      >
                      > "While [the] Unicode standard allows BOM in UTF-8, it does not require
                      > or recommend it."
                      >
                      > and paraphrasing the rest of that paragraph:
                      >
                      > Using a BOM as the first character of a UTF-8-encoded file can cause
                      > problems with the shebang line[1] in Unix-like systems. And
                      > UTF-8-capable software is often written to assume UTF-8 unless otherwise
                      > directed, so the U+FEFF character at the start of the stream is often
                      > interpreted incorrectly.
                      >
                      > The Unicode UTF-{8,16,32} & BOM FAQ probably worded it better than
                      > Wikipedia or I[2].
                      >

                      Yes, a UTF-8 BOM will interfere with any software that has no knowledge
                      of Unicode and expects some particular "magic bytes" at the start, or
                      simply won't accept 0xEF 0xBB 0xBF at the start of a document. The #!
                      shebang is just one example.
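
A small Python sketch of the shebang case, not part of the original message (the script contents are made up for illustration). With a BOM in front, the two magic bytes `#!` are no longer at offset 0. Python's own `utf-8-sig` codec is shown as one example of BOM-aware decoding.

```python
import codecs

# Hypothetical two-line shell script, with and without a UTF-8 BOM.
script = b"#!/bin/sh\necho hello\n"
with_bom = codecs.BOM_UTF8 + script

# A loader that expects the magic bytes "#!" at offset 0 no longer
# finds them once the BOM is in front.
assert script.startswith(b"#!")
assert not with_bom.startswith(b"#!")

# A plain UTF-8 decode keeps U+FEFF as the first character of the text...
assert with_bom.decode("utf-8").startswith("\ufeff")

# ...while the "utf-8-sig" codec drops a leading BOM if one is present.
assert with_bom.decode("utf-8-sig") == script.decode("utf-8")
```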

                      OTOH, in filetypes where UTF-8 is but one possibility among many, the
                      BOM is useful to specify the encoding or to confirm what was set
                      otherwise. Examples:

                      - HTML charset can be set by the HTTP "Content-Type" header (in an HTTP
                      or HTTPS transaction external to the file), in a <meta
                      http-equiv="Content-Type" content="text/html; charset=something"> tag
                      (replacing "something" by the charset) within the <head> section, or by
                      a BOM. There are even official priority rules that tell browsers what to
                      do when two or three of the above are present (and they are necessary,
                      because, I'm told, some braindead hosts will send "Content-Type:
                      text/html; charset=iso-8859-1" for any *.htm or *.html file regardless
                      of BOM or <meta> tags).

                      - CSS charset can be set by a BOM.

                      - XML charset can be set (IIRC) by a <? header line or by a BOM.

                      - XHTML is both HTML and XML so the methods of both apply to it.

                      Personally I use the following rules of thumb:

                      - Add a BOM to Unicode files meant for use by a browser.
                      - Don't add it to UTF-8 files mostly in US-ASCII (possibly with
                      codepoints above 0x7F in literals and comments) if they're meant for use
                      by a shell, the 'make' utility, or a compiler.
                      - Some Windows programs won't read UTF-8 correctly unless a BOM is present.
                      - On Windows, when a system file is said to be in 'Unicode' that usually
                      means UTF-16le with BOM.
                      - Vim helpfiles in a single directory must either all have a BOM, or
                      (recommended) all lack a BOM. If some have one and others not, the
                      ":helptags" command will abort with an error.

                      This does not explicitly cover all cases; when it doesn't (or in the
                      cases where some of the above rules conflict), I proceed by analogy and
                      by trial and error.
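
For the helpfile rule above, a minimal Python sketch, not part of the original message, that reports which files in a directory start with a UTF-8 BOM and which do not. The names `has_utf8_bom` and `mixed_bom_files` are hypothetical, and the `*.txt` pattern is an assumption about how the help files are named.

```python
import codecs
from pathlib import Path


def has_utf8_bom(path: Path) -> bool:
    """True if the file starts with the bytes EF BB BF."""
    with path.open("rb") as handle:
        return handle.read(3) == codecs.BOM_UTF8


def mixed_bom_files(directory: Path, pattern: str = "*.txt"):
    """Split matching files into (with_bom, without_bom) lists of names."""
    with_bom, without_bom = [], []
    for path in sorted(directory.glob(pattern)):
        (with_bom if has_utf8_bom(path) else without_bom).append(path.name)
    return with_bom, without_bom
```

If both returned lists are non-empty, the directory mixes the two styles, which is the situation the rule says to avoid.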


                      Best regards,
                      Tony.
                      --
                      One man's brain plus one other will produce one half as many ideas as
                      one man would have produced alone. These two plus two more will
                      produce half again as many ideas. These four plus four more begin to
                      represent a creative meeting, and the ratio changes to one quarter as
                      many ...
                      -- Anthony Chevins
