Re: "flexwiki" ftplugin causing problems ('bomb')

  • Bram Moolenaar
    Message 1 of 17, May 2, 2010
      Ron Aaron wrote:

      > I have recently started editing files with a '.wiki' extension, and
      > rather than getting the 'wikipedia' filetype, they pick up the
      > 'flexwiki' type. That's not the problem.
      >
      > The problem is that the 'flexwiki' filetype handler sets "bomb",
      > resulting in extra characters at the front of my utf8 files -- this
      > has caused problems with other software which reads those files (I
      > never have 'bomb' set).
      >
      > Scanning the ftplugins, it seems 'flexwiki' is the only one which sets
      > 'bomb'. Is it an ok thing for it to do? Also, why is "flexwiki" the
      > handler, when wikipedia is probably the most widely used wiki? But
      > that's a quibble.

      Setting 'bomb' is weird. Unless the filetype requires the file to be
      written in utf-8 for the file to be working properly.

      George, is setting 'bomb' really required? If so, how can we avoid that
      this happens when the flexwiki filetype is detected when it's actually
      another kind of file?

      As a guard, 'bomb' should only be set when 'encoding' is utf-8.
      This applies to 'fileencoding' as well.
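To make Ron's complaint and the proposed guard concrete, here is a minimal Python sketch (not Vim script). The names `bomb_guard_ok` and `utf8_bytes` are hypothetical. It assumes the guard means that both 'encoding' and the effective 'fileencoding' must be UTF-8, where an empty 'fileencoding' falls back to 'encoding' as it does in Vim.

```python
import codecs


def _is_utf8(name: str) -> bool:
    # Treat "utf-8" and "utf8" alike, case-insensitively (assumption: both
    # spellings may appear in option values).
    return name.lower().replace("-", "") == "utf8"


def bomb_guard_ok(encoding: str, fileencoding: str) -> bool:
    """Hypothetical check: may a filetype plugin set 'bomb' for this buffer?"""
    # An empty 'fileencoding' means the file is written in 'encoding'.
    effective = fileencoding or encoding
    return _is_utf8(encoding) and _is_utf8(effective)


def utf8_bytes(text: str, bomb: bool) -> bytes:
    """Bytes of a UTF-8 file as written with or without a BOM."""
    data = text.encode("utf-8")
    return codecs.BOM_UTF8 + data if bomb else data


print(utf8_bytes("= Title =\n", bomb=False))  # b'= Title =\n'
print(utf8_bytes("= Title =\n", bomb=True))   # b'\xef\xbb\xbf= Title =\n'
print(bomb_guard_ok("utf-8", ""))             # True
print(bomb_guard_ok("latin1", "utf-8"))       # False
```

The three extra bytes EF BB BF are the "extra characters at the front" that other software then trips over. In Vim itself the check would sit in the ftplugin, around the line that sets 'bomb'; the sketch only models the decision and the resulting bytes.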

      At the time of writing the www.flexwiki.com site was not available, thus
      I could not check any specification there.

      --
      A)bort, R)etry, P)lease don't bother me again

      /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
      /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
      \\\ download, build and distribute -- http://www.A-A-P.org ///
      \\\ help me help AIDS victims -- http://ICCF-Holland.org ///

      --
      You received this message from the "vim_dev" maillist.
      Do not top-post! Type your reply below the text you are replying to.
      For more information, visit http://www.vim.org/maillist.php
    • George V. Reilly
      Message 2 of 17, May 2, 2010
        On Sun, May 2, 2010 at 10:37 AM, Bram Moolenaar <Bram@...> wrote:
        > > Scanning the ftplugins, it seems 'flexwiki' is the only one which sets
        > > 'bomb'.  Is it an ok thing for it to do?  Also, why is "flexwiki" the
        > > handler, when wikipedia is probably the most widely used wiki?  But
        > > that's a quibble.
        >
        > Setting 'bomb' is weird.  Unless the filetype requires the file to be
        > written in utf-8 for the file to be working properly.
        >
        > George, is setting 'bomb' really required?  If so, how can we avoid that
        > this happens when the flexwiki filetype is detected when it's actually
        > another kind of file?
        >
        > As a guard, 'bomb' should only be set when 'encoding' is utf-8.
        > This applies to 'fileencoding' as well.
        >
        > At the time of writing the www.flexwiki.com site was not available, thus
        > I could not check any specification there.

        I haven't used Flexwiki in 3 years, so I forget the details of why 'bomb' was needed.

        I think Flexwiki is dead or nearly so. It seems like mapping .wiki files to wikipedia (MediaWiki?) rather than Flexwiki is far more useful. Feel free to remove the flexwiki mapping from ftplugin.vim.
        -- 
        /George V. Reilly  george@...  Twitter: @georgevreilly
        http://www.georgevreilly.com/blog  http://blogs.cozi.com/tech

      • Bram Moolenaar
        Message 3 of 17, May 3, 2010
          George V. Reilly wrote:

          > On Sun, May 2, 2010 at 10:37 AM, Bram Moolenaar <Bram@...> wrote:
          >
          > > > Scanning the ftplugins, it seems 'flexwiki' is the only one which sets
          > > > 'bomb'. Is it an ok thing for it to do? Also, why is "flexwiki" the
          > > > handler, when wikipedia is probably the most widely used wiki? But
          > > > that's a quibble.
          > >
          > > Setting 'bomb' is weird. Unless the filetype requires the file to be
          > > written in utf-8 for the file to be working properly.
          > >
          > > George, is setting 'bomb' really required? If so, how can we avoid that
          > > this happens when the flexwiki filetype is detected when it's actually
          > > another kind of file?
          > >
          > > As a guard, 'bomb' should only be set when 'encoding' is utf-8.
          > > This applies to 'fileencoding' as well.
          > >
          > > At the time of writing the www.flexwiki.com site was not available, thus
          > > I could not check any specification there.
          >
          >
          > I haven't used Flexwiki in 3 years, so I forget the details of why 'bomb'
          > was needed.
          >
          > I think Flexwiki is dead or nearly so. It seems like mapping .wiki files to
          > wikipedia (MediaWiki?) rather than Flexwiki is far more useful. Feel free to
          > remove the flexwiki mapping from ftplugin.vim.

          What I did now is to disable recognizing .wiki files as flexwiki.
          Someone still using these files can re-enable it when needed.

          I can't find another file format that uses the .wiki extension.
          Mediawiki uses .mw.

          --
          The goal of science is to build better mousetraps.
          The goal of nature is to build better mice.

        • Lech Lorens
          Message 4 of 17, May 3, 2010
            On 02-May-2010 Bram Moolenaar <Bram@...> wrote:
            >
            > Ron Aaron wrote:
            >
            > > I have recently started editing files with a '.wiki' extension, and
            > > rather than getting the 'wikipedia' filetype, they pick up the
            > > 'flexwiki' type. That's not the problem.
            > >
            > > The problem is that the 'flexwiki' filetype handler sets "bomb",
            > > resulting in extra characters at the front of my utf8 files -- this
            > > has caused problems with other software which reads those files (I
            > > never have 'bomb' set).
            > >
            > > Scanning the ftplugins, it seems 'flexwiki' is the only one which sets
            > > 'bomb'. Is it an ok thing for it to do? Also, why is "flexwiki" the
            > > handler, when wikipedia is probably the most widely used wiki? But
            > > that's a quibble.
            >
            > Setting 'bomb' is weird. Unless the filetype requires the file to be
            > written in utf-8 for the file to be working properly.
            [...]
            > As a guard, 'bomb' should only be set when 'encoding' is utf-8.
            > This applies to 'fileencoding' as well.

            I might be totally wrong basing my understanding of BOM and character
            sets mainly on Wikipedia, but I thought that setting 'bomb' for utf-8
            encoded files (which does not pose a risk of misinterpreting the
            contents due to endianness difference) didn't make much sense. For
            utf-16 that would be another thing.

            http://en.wikipedia.org/wiki/Byte-order_mark
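Lech's point can be checked quickly in Python: UTF-16 output depends on byte order, while UTF-8 has a single byte sequence for a given text. A minimal illustration (the sample string is arbitrary):

```python
text = "wiki é"

# UTF-16 stores each code unit in two bytes, so the byte order matters.
be = text.encode("utf-16-be")
le = text.encode("utf-16-le")
print(be[:2], le[:2])                   # b'\x00w' b'w\x00'

# Reading big-endian bytes as little-endian silently yields other characters.
print(be.decode("utf-16-le") == text)   # False

# UTF-8 is byte-oriented: one text, one byte sequence, nothing to swap.
print(text.encode("utf-8"))             # b'wiki \xc3\xa9'
```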

            --
            Cheers,
            Lech

          • Ron Aaron
            Message 5 of 17, May 3, 2010
              On Monday 03 May 2010 23:12:42 Bram Moolenaar wrote:

              > What I did now is to disable recognizing .wiki files as flexwiki.
              > Someone still using these files can re-enable it when needed.
              >
              > I can't find another file format that uses the .wiki extension.
              > Mediawiki uses .mw.

              It's common to use the '.wiki' extension for any wiki text file, so making
              both it and '.mw' load MediaWiki syntax makes sense.

              --
              For privacy, my GPG key signature is: AD29415D
            • Bram Moolenaar
              Message 6 of 17, May 4, 2010
                Ron Aaron wrote:

                > On Monday 03 May 2010 23:12:42 Bram Moolenaar wrote:
                >
                > > What I did now is to disable recognizing .wiki files as flexwiki.
                > > Someone still using these files can re-enable it when needed.
                > >
                > > I can't find another file format that uses the .wiki extension.
                > > Mediawiki uses .mw.
                >
                > It's common to use the '.wiki' extension for any wiki text file, so
                > making both it and '.mw' load MediaWiki syntax makes sense.

                There is no MediaWiki syntax file.

                --
                hundred-and-one symptoms of being an internet addict:
                9. All your daydreaming is preoccupied with getting a faster connection to the
                net: 28.8...ISDN...cable modem...T1...T3.

              • Ron Aaron
                Message 7 of 17, May 4, 2010
                  On Tuesday 04 May 2010 21:52:54 Bram Moolenaar wrote:
                  >

                  > There is no MediaWiki syntax file.

                  Sorry, it's called Wikipedia.

                  --
                  Sending me something private?
                  Use my GPG public key: AD29415D
                • Charles Campbell
                  Message 8 of 17, May 4, 2010
                    Ron Aaron wrote:
                    > On Tuesday 04 May 2010 21:52:54 Bram Moolenaar wrote:
                    >
                    >> There is no MediaWiki syntax file.
                    >
                    > Sorry, it's called Wikipedia.

                    Hello!

                    Ron, Bram was wanting a Wikipedia syntax file. I can't vouch for it,
                    but perhaps you mean the one in:

                    http://www.vim.org/scripts/script.php?script_id=1787

                    Regards,
                    Chip Campbell

                  • ron
                    Message 9 of 17, May 4, 2010
                      On May 4, 11:57 pm, Charles Campbell <Charles.E.Campb...@...>
                      wrote:

                      > Ron, Bram was wanting a Wikipedia syntax file.  I can't vouch for it,
                      > but perhaps you mean the one in:
                      >
                      > http://www.vim.org/scripts/script.php?script_id=1787

                      I think that's the one I use, yes.

                    • Tony Mechelynck
                      Message 10 of 17, Jun 27, 2010
                        On 03/05/10 23:45, Lech Lorens wrote:
                        [...]
                        > I might be totally wrong basing my understanding of BOM and character
                        > sets mainly on Wikipedia, but I thought that setting 'bomb' for utf-8
                        > encoded files (which does not pose a risk of misinterpreting the
                        > contents due to endianness difference) didn't make much sense. For
                        > utf-16 that would be another thing.
                        >
                        > http://en.wikipedia.org/wiki/Byte-order_mark
                        >

                        Notwithstanding its name, the BOM provides more than just endianness
                        detection. Actually, it is an "encoding signal" which allows detecting
                        all five of the following encodings, assuming a UTF-16le file won't
                        start with a NULL:

                        utf-16be FE FF
                        utf-16le FF FE
                        utf-8 EF BB BF
                        utf-32be 00 00 FE FF
                        utf-32le FF FE 00 00
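A BOM sniffer following the table above might look like this minimal Python sketch. `sniff_bom` and `SIGNATURES` are hypothetical names; the byte values come from Python's `codecs.BOM_*` constants, which match the table, and the encoding labels are Python's codec names. Longer signatures are tested first, since FF FE 00 00 (UTF-32LE) begins with FF FE (UTF-16LE).

```python
import codecs
from typing import Optional, Tuple

# Signatures from the table, longest first: the UTF-32LE signature
# FF FE 00 00 begins with the UTF-16LE signature FF FE.
SIGNATURES = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # FF FE 00 00
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def sniff_bom(data: bytes) -> Optional[Tuple[str, int]]:
    """Return (encoding, BOM length) if data starts with a known BOM, else None."""
    for bom, name in SIGNATURES:
        if data.startswith(bom):
            return name, len(bom)
    return None


# Example: what WordPad called "Unicode" (UTF-16LE with a BOM).
data = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
name, size = sniff_bom(data)
print(name, data[size:].decode(name))  # utf-16-le hi
```

The longest-first ordering is exactly where the "won't start with a NULL" caveat comes in: a UTF-16LE file whose first character after the BOM is U+0000 would be reported as UTF-32LE.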

                        For instance, when I was still on XP, I noticed that WordPad could read
                        UTF-8 files but only if they started with a BOM. When writing what it
                        called "Unicode", what it produced was UTF-16le with BOM.

                        Any file starting with 0xEF 0xBB 0xBF can be assumed to be in UTF-8.
                        Distinguishing UTF-8 from Latin1 or Windows-1252 would otherwise require
                        scanning the whole file, checking for invalid UTF-8 byte sequences.
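Without a BOM, the scan described above can be sketched as a strict UTF-8 decode of the whole buffer, falling back to an 8-bit encoding on the first invalid sequence. `guess_utf8_or_latin1` is a hypothetical helper; Latin-1 is chosen as the fallback here (rather than Windows-1252) because every byte value decodes in it.

```python
def guess_utf8_or_latin1(data: bytes) -> str:
    """Guess the encoding of BOM-less data: 'utf-8' if it decodes, else 'latin-1'."""
    try:
        # A strict decode walks the whole buffer and raises on the first
        # invalid UTF-8 byte sequence.
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Every byte value is valid Latin-1, so this fallback cannot fail.
        return "latin-1"
    return "utf-8"


print(guess_utf8_or_latin1("café".encode("utf-8")))    # utf-8
print(guess_utf8_or_latin1("café".encode("latin-1")))  # latin-1
print(guess_utf8_or_latin1(b"plain ASCII"))            # utf-8
```

Pure ASCII is valid in both encodings, so it is reported as UTF-8 here. Vim's default 'fileencodings' of "ucs-bom,utf-8,default,latin1" (used when 'encoding' is a Unicode encoding) follows a similar order: BOM first, then a UTF-8 attempt, then an 8-bit fallback.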


                        Best regards,
                        Tony.
                        --
                        Life is a gift, living is an art. (Bram Moolenaar)

                      • Benjamin R. Haskell
                        Message 11 of 17, Jun 27, 2010
                          On Sun, 27 Jun 2010, Tony Mechelynck wrote:

                          > On 03/05/10 23:45, Lech Lorens wrote:
                          > [...]
                          > > I might be totally wrong basing my understanding of BOM and
                          > > character sets mainly on Wikipedia, but I thought that setting
                          > > 'bomb' for utf-8 encoded files (which does not pose a risk of
                          > > misinterpreting the contents due to endianness difference) didn't
                          > > make much sense. For utf-16 that would be another thing.
                          > >
                          > > http://en.wikipedia.org/wiki/Byte-order_mark
                          > >
                          >
                          > Notwithstanding its name, the BOM provides more than just endianness
                          > detection. Actually, it is an "encoding signal" which allows detecting
                          > all five of the following encodings, assuming a UTF-16le file won't
                          > start with a NULL:
                          >
                          > utf-16be FE FF
                          > utf-16le FF FE
                          > utf-8 EF BB BF
                          > utf-32be 00 00 FE FF
                          > utf-32le FF FE 00 00
                          >
                          > For instance, when I was still on XP, I noticed that WordPad could
                          > read UTF-8 files but only if they started with a BOM. When writing
                          > what it called "Unicode", what it produced was UTF-16le with BOM.
                          >
                          > Any file starting with 0xEF 0xBB 0xBF can be assumed to be in UTF-8.
                          > Distinguishing UTF-8 from Latin1 or Windows-1252 would otherwise
                          > require scanning the whole file, checking for invalid UTF-8 byte
                          > sequences.

                          Quoting the same Wikipedia article Lech mentioned:

                          "While [the] Unicode standard allows BOM in UTF-8, it does not require
                          or recommend it."

                          and paraphrasing the rest of that paragraph:

                          Using a BOM as the first character of a UTF-8-encoded file can cause
                          problems with the shebang line[1] in Unix-like systems. And
                          UTF-8-capable software is often written to assume UTF-8 unless otherwise
                          directed, so the U+FEFF character at the start of the stream is often
                          interpreted incorrectly.
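The shebang problem is easy to reproduce in Python: the `utf-8-sig` codec writes the same EF BB BF signature that 'bomb' adds. `strip_utf8_bom` is a hypothetical helper, and the two-byte check mirrors how Unix-like kernels recognise an interpreter line.

```python
import codecs

script = "#!/bin/sh\necho hello\n"
plain = script.encode("utf-8")
with_bom = script.encode("utf-8-sig")  # Python codec that prepends EF BB BF


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (unchanged if there is none)."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# An interpreter line is only recognised when the first two bytes are "#!".
print(plain[:2] == b"#!")                     # True
print(with_bom[:2] == b"#!")                  # False (starts EF BB)
print(strip_utf8_bom(with_bom)[:2] == b"#!")  # True
```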

                          The Unicode UTF-{8,16,32} & BOM FAQ probably worded it better than
                          Wikipedia or I[2].

                          --
                          Best,
                          Ben

                          [1] http://en.wikipedia.org/wiki/Shebang_(Unix)
                          [2] http://unicode.org/faq/utf_bom.html#bom5

                        • Tony Mechelynck
                          Message 12 of 17, Jun 27, 2010
                            On 27/06/10 21:21, Benjamin R. Haskell wrote:
                            > On Sun, 27 Jun 2010, Tony Mechelynck wrote:
                            >
                            >> On 03/05/10 23:45, Lech Lorens wrote:
                            >> [...]
                            >>> I might be totally wrong basing my understanding of BOM and
                            >>> character sets mainly on Wikipedia, but I thought that setting
                            >>> 'bomb' for utf-8 encoded files (which does not pose a risk of
                            >>> misinterpreting the contents due to endianness difference) didn't
                            >>> make much sense. For utf-16 that would be another thing.
                            >>>
                            >>> http://en.wikipedia.org/wiki/Byte-order_mark
                            >>>
                            >>
                            >> Notwithstanding its name, the BOM provides more than just endianness
                            >> detection. Actually, it is an "encoding signal" which allows detecting
                            >> all five of the following encodings, assuming a UTF-16le file won't
                            >> start with a NULL:
                            >>
                            >> utf-16be FE FF
                            >> utf-16le FF FE
                            >> utf-8 EF BB BF
                            >> utf-32be 00 00 FE FF
                            >> utf-32le FF FE 00 00
                            >>
                            >> For instance, when I was still on XP, I noticed that WordPad could
                            >> read UTF-8 files but only if they started with a BOM. When writing
                            >> what it called "Unicode", what it produced was UTF-16le with BOM.
                            >>
                            >> Any file starting with 0xEF 0xBB 0xBF can be assumed to be in UTF-8.
                            >> Distinguishing UTF-8 from Latin1 or Windows-1252 would otherwise
                            >> require scanning the whole file, checking for invalid UTF-8 byte
                            >> sequences.
                            >
                            > Quoting the same Wikipedia article Lech mentioned:
                            >
                            > "While [the] Unicode standard allows BOM in UTF-8, it does not require
                            > or recommend it."
                            >
                            > and paraphrasing the rest of that paragraph:
                            >
                            > Using a BOM as the first character of a UTF-8-encoded file can cause
                            > problems with the shebang line[1] in Unix-like systems. And
                            > UTF-8-capable software is often written to assume UTF-8 unless otherwise
                            > directed, so the U+FEFF character at the start of the stream is often
                            > interpreted incorrectly.
                            >
                            > The Unicode UTF-{8,16,32} & BOM FAQ probably worded it better than
                            > Wikipedia or I[2].
                            >

                            Yes, a UTF-8 BOM will interfere with any software that has no knowledge
                            of Unicode and expects some particular "magic bytes" at the start, or
                            simply won't accept 0xEF 0xBB 0xBF at the start of a document. The #!
                            shebang is just one example.

                            OTOH, in filetypes where UTF-8 is but one possibility among many, the
                            BOM is useful to specify the encoding or to confirm what was set
                            otherwise. Examples:

                            - HTML charset can be set by the HTTP "Content-Type" header (in an HTTP
                            or HTTPS transaction external to the file), in a <meta
                            http-equiv="Content-Type" content="text/html; charset=something"> tag
                            (replacing "something" by the charset) within the <head> section, or by
                            a BOM. There are even official priority rules that tell browsers what to
                            do when two or three of the above are present (and they are necessary,
                            because, I'm told, some braindead hosts will send "Content-Type:
                            text/html; charset=iso-8859-1" for any *.htm or *.html file regardless
                            of BOM or <meta> tags).

                            - CSS charset can be set by a BOM.

                            - XML charset can be set (IIRC) by the <?xml version="1.0"
                            encoding="something"?> declaration at the top of the file, or by a BOM.

                            - XHTML is both HTML and XML so the methods of both apply to it.
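The priority rules are not spelled out above. The sketch below assumes the order used by the current WHATWG HTML encoding-sniffing algorithm (BOM first, then the HTTP Content-Type charset, then a <meta> declaration near the start of the document), which may differ from what browsers did in 2010. `html_charset`, the regular expressions, the 1024-byte window and the windows-1252 fallback are simplifying assumptions, not a conforming implementation.

```python
import codecs
import re
from typing import Optional

# Simplified patterns; a real implementation parses attributes properly.
_HEADER_CHARSET = re.compile(r"""charset\s*=\s*["']?([A-Za-z0-9._:-]+)""", re.I)
_META_CHARSET = re.compile(rb"""<meta[^>]*charset\s*=\s*["']?([A-Za-z0-9._:-]+)""", re.I)


def html_charset(body: bytes, content_type: Optional[str] = None,
                 default: str = "windows-1252") -> str:
    """Pick a charset for an HTML document: BOM, then HTTP header, then <meta>."""
    # 1. A byte order mark wins over everything else.
    for bom, name in ((codecs.BOM_UTF8, "utf-8"),
                      (codecs.BOM_UTF16_BE, "utf-16be"),
                      (codecs.BOM_UTF16_LE, "utf-16le")):
        if body.startswith(bom):
            return name
    # 2. The charset parameter of the HTTP Content-Type header.
    if content_type:
        m = _HEADER_CHARSET.search(content_type)
        if m:
            return m.group(1).lower()
    # 3. A <meta> declaration near the top (assumed 1024-byte window).
    m = _META_CHARSET.search(body[:1024])
    if m:
        return m.group(1).decode("ascii").lower()
    # 4. Otherwise fall back to an assumed default.
    return default


page = codecs.BOM_UTF8 + b"<html><head></head><body>caf\xc3\xa9</body></html>"
print(html_charset(page, "text/html; charset=iso-8859-1"))  # utf-8
```

With that ordering, the BOM protects a UTF-8 page from a host that wrongly sends "charset=iso-8859-1" for every *.html file.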

                            Personally I use the following rules of thumb:

                            - Add a BOM to Unicode files meant for use by a browser.
                            - Don't add it to UTF-8 files mostly in US-ASCII (possibly with
                            codepoints above 0x7F in literals and comments) if they're meant for use
                            by a shell, the 'make' utility, or a compiler.
                            - Some Windows programs won't read UTF-8 correctly unless a BOM is present.
                            - On Windows, when a system file is said to be in 'Unicode' that usually
                            means UTF-16le with BOM.
                            - Vim helpfiles in a single directory must either all have a BOM, or
                            (recommended) all lack a BOM. If some have one and others not, the
                            ":helptags" command will abort with an error.

                            This does not explicitly cover all cases; when it doesn't (or in the
                            cases where some of the above rules conflict), I proceed by analogy and
                            by trial and error.


                            Best regards,
                            Tony.
                            --
                            One man's brain plus one other will produce one half as many ideas as
                            one man would have produced alone. These two plus two more will
                            produce half again as many ideas. These four plus four more begin to
                            represent a creative meeting, and the ratio changes to one quarter as
                            many ...
                            -- Anthony Chevins
