RE: utf8 BOM

  • Gene Kwiecinski
    Message 1 of 9, Apr 1, 2008
      >I use utf8, and I would like to know how to save a file with BOM
      >1. always added
      >2. always removed
      >3. unchanged
      >and what is the default behaviour of vim?

      Rarely if ever poke utf8 files (usually find the weirdo chars and turn
      'em into named entities instead), but iirr, forcing a BOM is done with

      :set bomb

      forcing it off with

      :set nobomb

      and the default behavior is whatever the file has when you start to edit
      it, with or without.

      Iirr, any nonascii utf8 char (>1byte) will normally force the BOM to be
      turned on. Might be wrong, though, and it might assume/require
      little-endian multibyte chars per the utf8 spec (ie, a BOM would be
      redundant), and iirr the BOM is only really required with utf16/ucs2
      encoding.
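      The endianness question does not arise for UTF-8, and a minimal Python
      sketch shows why (the helper name bom_bytes is hypothetical). A BOM is
      just the code point U+FEFF encoded in the file's encoding. In UTF-8 that
      is always the three bytes EF BB BF, and a non-ASCII character such as
      U+00E9 is a fixed multibyte sequence that decodes the same way on every
      platform. UTF-16 produces one of two byte orders, and the mark records
      which one was used.

```python
def bom_bytes(encoding: str) -> bytes:
    """Hypothetical helper: U+FEFF encoded in a byte-order-explicit encoding is its BOM."""
    return "\ufeff".encode(encoding)


for enc in ("utf-8", "utf-16-be", "utf-16-le"):
    print(enc, bom_bytes(enc).hex(" "))
# utf-8 ef bb bf
# utf-16-be fe ff
# utf-16-le ff fe

# A non-ASCII character in UTF-8 is the same byte sequence on every platform.
print("U+00E9", "\u00e9".encode("utf-8").hex(" "))  # U+00E9 c3 a9
```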

      Tony M's the one to ask, though. :D


      Then again, there's always

      :help bomb

      etc. Lis, haven't poked around with this kind of thing but on rare
      occasion.

      --~--~---------~--~----~------------~-------~--~----~
      You received this message from the "vim_use" maillist.
      For more information, visit http://www.vim.org/maillist.php
      -~----------~----~----~----~------~----~------~--~---
    • Tony Mechelynck
      Message 2 of 9, Apr 1, 2008
        bill lam wrote:
        > hello,
        > I use utf8, and I would like to know how to save a file with BOM
        > 1. always added
        > 2. always removed
        > 3. unchanged
        >
        > and what is the default behaviour of vim?
        >
        > thank in advance.

        The default behaviour for existing files is to leave it unchanged. The
        default for new files depends on the global setting of 'bomb' (which is
        off by default, but I use

        setglobal bomb

        in my vimrc to make it on by default).

        To always add it, even on existing files:

        au BufReadPost * setlocal bomb

        To always remove it, even on existing files:

        au BufReadPost * setlocal nobomb

        The above autocommands allow you to change the setting manually for one
        file and it will be saved that way -- unless you read that file again,
        of course.
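        Outside Vim, the same "always add" and "always remove" policies can be
        sketched at the byte level for UTF-8 data. The helper names add_bom and
        remove_bom are hypothetical. Python's standard "utf-8-sig" codec writes
        a BOM when encoding and drops one, if present, when decoding. It is
        used here as a cross-check.

```python
import codecs


def add_bom(data: bytes) -> bytes:
    """Hypothetical helper: ensure UTF-8 bytes start with exactly one BOM (cf. 'setlocal bomb')."""
    return data if data.startswith(codecs.BOM_UTF8) else codecs.BOM_UTF8 + data


def remove_bom(data: bytes) -> bytes:
    """Hypothetical helper: strip a leading UTF-8 BOM if there is one (cf. 'setlocal nobomb')."""
    return data[len(codecs.BOM_UTF8):] if data.startswith(codecs.BOM_UTF8) else data


plain = "caf\u00e9\n".encode("utf-8")
with_bom = add_bom(plain)

# Both helpers are idempotent and agree with Python's utf-8-sig codec.
assert add_bom(with_bom) == with_bom
assert remove_bom(remove_bom(with_bom)) == plain
assert with_bom == "caf\u00e9\n".encode("utf-8-sig")
assert with_bom.decode("utf-8-sig") == plain.decode("utf-8-sig") == "caf\u00e9\n"
```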

        Note that setting 'bomb' or 'nobomb' also sets 'modified'.

        To leave it unchanged by default, you don't need to do anything; you
        can then use ":setlocal bomb" or ":setlocal nobomb" to change it for
        the current file. (Use ":setlocal" in that case, not ":set", to avoid
        changing the default for new files.)

        The 'bomb' setting has no effect when a file is saved in any
        'fileencoding' other than utf-8, ucs-2, ucs-4, utf-16, utf-32 (and the
        le and be variants of the latter four).
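        The following is a rough model of the rule that 'bomb' is ignored for
        non-Unicode encodings. It is a sketch, not Vim's code. The function
        encode_for_write and the set BOM_ENCODINGS are hypothetical, and they
        use Python codec names rather than Vim's 'fileencoding' names.

```python
# Hypothetical set of (Python) codec names for which a BOM is meaningful.
BOM_ENCODINGS = {"utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"}


def encode_for_write(text: str, encoding: str, bomb: bool) -> bytes:
    """Encode text for saving; prepend a BOM only if bomb is set AND the encoding is Unicode."""
    body = text.encode(encoding)
    if bomb and encoding in BOM_ENCODINGS:
        # U+FEFF encoded in the target encoding is that encoding's BOM.
        return "\ufeff".encode(encoding) + body
    # For anything else (e.g. latin-1) the flag is silently ignored.
    return body


print(encode_for_write("\u00e9", "utf-8", True).hex(" "))    # ef bb bf c3 a9
print(encode_for_write("\u00e9", "latin-1", True).hex(" "))  # e9
```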

        see
        :help 'bomb'
        :help :setglobal
        :help :setlocal
        :help Unicode
        http://vim.wikia.org/wiki/Working_with_Unicode


        Best regards,
        Tony.
        --
        Expense Accounts, n.:
        Corporate food stamps.

      • bill lam
        Message 3 of 9, Apr 1, 2008
          Gene Kwiecinski wrote:
          > turned on. Might be wrong, though, and it might assume/require
          > little-endian multibyte chars per the utf8 spec (ie, a BOM would be
          > redundant), and iirr the BOM is only really required with utf16/ucs2
          > encoding.

          Gene,
          Thanks. Just for the record, utf8 and hence its BOM is endian neutral.

          regards,


        • bill lam
          Message 4 of 9, Apr 1, 2008
            Tony Mechelynck wrote:
            > The default behaviour for existing files is to leave it unchanged. The
            > default for new files depends on the global setting of 'bomb' (which is
            > off by default, but I use
            >
            > setglobal bomb

            Tony,
            Thank you for the detailed answer. I only need to write a BOM in some
            particular cases, so I'll just leave it as setglobal nobomb.

            regards,



          • Tony Mechelynck
            Message 5 of 9, Apr 1, 2008
              bill lam wrote:
              > Gene Kwiecinski wrote:
              >> turned on. Might be wrong, though, and it might assume/require
              >> little-endian multibyte chars per the utf8 spec (ie, a BOM would be
              >> redundant), and iirr the BOM is only really required with utf16/ucs2
              >> encoding.
              >
              > Gene,
              > Thanks. Just for the record, utf8 and hence its BOM is endian neutral.
              >
              > regards,

              With UTF-8, a BOM does not specify endianness but it does specify that
              the file is in UTF-8 rather than UTF-16 or UTF-32. For instance, Windows
              XP WordPad, which cannot _write_ Unicode files except in UTF-16le (with
              BOM), can _read_ UTF-8 files if they have a BOM.

              Therefore, IMHO the appellation "BOM" (Byte Order Mark) is a misnomer; I
              would have preferred "Unicode encoding marker" or some such. It can take
              the following hex values:

              EF BB BF      UTF-8
              FE FF         UTF-16be
              FF FE         UTF-16le
              00 00 FE FF   UTF-32be
              FF FE 00 00   UTF-32le

              As can be seen, uniqueness assumes that UTF-16le files don't begin with
              a NUL character: a UTF-16le BOM followed by U+0000 is FF FE 00 00,
              the same bytes as the UTF-32le BOM.
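              The table can be turned into a BOM sniffer. The only subtlety is
              the ambiguity just mentioned. The UTF-32le mark FF FE 00 00 begins
              with the UTF-16le mark FF FE, so the longer marks must be tried
              first. The following is a minimal Python sketch that uses the BOM
              constants from the standard codecs module. The function name
              sniff_bom is hypothetical.

```python
import codecs

# Longest marks first: FF FE 00 00 (UTF-32le) starts with FF FE (UTF-16le).
BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32be"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "UTF-32le"),  # FF FE 00 00
    (codecs.BOM_UTF8, "UTF-8"),         # EF BB BF
    (codecs.BOM_UTF16_BE, "UTF-16be"),  # FE FF
    (codecs.BOM_UTF16_LE, "UTF-16le"),  # FF FE
]


def sniff_bom(data: bytes):
    """Return (encoding name, BOM length) for a leading BOM, or (None, 0) if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


print(sniff_bom(b"\xef\xbb\xbfhello"))      # ('UTF-8', 3)
print(sniff_bom(b"\xff\xfeh\x00i\x00"))     # ('UTF-16le', 2)
# A UTF-16le file whose first character is U+0000 is misread as UTF-32le:
print(sniff_bom(b"\xff\xfe\x00\x00h\x00"))  # ('UTF-32le', 4)
```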

              Best regards,
              Tony.
              --
              I've given up reading books; I find it takes my mind off myself.

            • Tony Mechelynck
              Message 6 of 9, Apr 1, 2008
                bill lam wrote:
                > Tony Mechelynck wrote:
                >> The default behaviour for existing files is to leave it unchanged. The
                >> default for new files depends on the global setting of 'bomb' (which is
                >> off by default, but I use
                >>
                >> setglobal bomb
                >
                > Tony,
                > Thank you for detail answer. I need to write BOM only in some particular cases
                > so that I'll just leaved it as setglobal nobomb.
                >
                > regards,

                Since I don't need to avoid the BOM except in special cases (such as
                executable scripts, where the #! shebang isn't recognised if it is
                preceded by a BOM, and most of my scripts are in Latin1 anyway) I leave
                it on for UTF-8 files. My custom statusline (which displays, in addition
                to the usual stuff, the file's encoding and BOM status) thus
                discriminates between:

                - files in 7-bit US-ASCII, seen as UTF-8 without BOM
                - files in Latin1 with characters above 0x7F, seen as Latin1
                - "true" UTF-8 files, seen as UTF-8 with BOM.
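                The following is a rough Python model of both points. It rests on
                two assumptions. First, the classifier assumes a detection order
                of BOM, then valid UTF-8, then Latin1. That is roughly what a
                'fileencodings' of ucs-bom,utf-8,latin1 would give, but the thread
                does not show Tony's actual setting. Second, the shebang check
                only looks for a UTF-8 BOM placed directly in front of "#!". Both
                function names are hypothetical.

```python
import codecs


def classify(data: bytes) -> str:
    """Approximate the encoding label such a status line would show (hypothetical)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8,BOM"  # "true" UTF-8 file written with 'bomb' set
    try:
        data.decode("utf-8")
        return "utf-8"      # includes pure 7-bit US-ASCII files
    except UnicodeDecodeError:
        # Latin1 accepts any byte, so it is the fallback. Latin1 text that
        # happens to be valid UTF-8 would be misclassified by this sketch.
        return "latin1"


def shebang_hidden_by_bom(data: bytes) -> bool:
    """True if a UTF-8 BOM precedes '#!', so the file no longer starts with the shebang bytes."""
    return data.startswith(codecs.BOM_UTF8 + b"#!")


print(classify(b"plain ASCII\n"))                               # utf-8
print(classify("caf\u00e9\n".encode("latin-1")))                # latin1
print(classify("caf\u00e9\n".encode("utf-8-sig")))              # utf-8,BOM
print(shebang_hidden_by_bom(codecs.BOM_UTF8 + b"#!/bin/sh\n"))  # True
```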

                Best regards,
                Tony.
                --
                The sum of the Universe is zero.

              • Don M
                Message 7 of 9, Apr 2, 2008
                  Tony,

                  Can you show how you set your custom status line to include that
                  additional info? Thanks for all of the information.

                  Don

                  On Apr 1, 11:25 pm, Tony Mechelynck <antoine.mechely...@...>
                  wrote:
                  > bill lam wrote:
                  > > Tony Mechelynck wrote:
                  > >> The default behaviour for existing files is to leave it unchanged. The
                  > >> default for new files depends on the global setting of 'bomb' (which is
                  > >> off by default, but I use
                  >
                  > >> setglobal bomb
                  >
                  > > Tony,
                  > > Thank you for detail answer. I need to write BOM only in some particular cases
                  > > so that I'll just leaved it as setglobal nobomb.
                  >
                  > > regards,
                  >
                  > Since I don't need to avoid the BOM except in special cases (such as
                  > executable scripts, where the #! shebang isn't recognised if it is
                  > preceded by a BOM, and most of my scripts are in Latin1 anyway) I leave
                  > it on for UTF-8 files. My custom statusline (which displays, in addition
                  > to the usual stuff, the file's encoding and BOM status) thus
                  > discriminates between:
                  >
                  > - files in 7-bit US-ASCII, seen as UTF-8 without BOM
                  > - files in Latin1 with characters above 0x7F, seen as Latin1
                  > - "true" UTF-8 files, seen as UTF-8 with BOM.
                  >
                  > Best regards,
                  > Tony.
                  > --
                  > The sum of the Universe is zero.
                • Tony Mechelynck
                  Message 8 of 9, Apr 2, 2008
                    Don M wrote:
                    > Tony,
                    >
                    > Can you show how you set your custom status line to include that
                    > additional info? Thanks for all of the information.
                    >
                    > Don

                    set laststatus=2
                    if has("statusline")
                    set statusline=%<%f\ %h%m%r%=%k[%{(&fenc\ ==\ \"\"?&enc:&fenc).(&bomb?\",BOM\":\"\")}]\ %-12.(%l,%c%V%)\ %P
                    endif

                    These are four lines (set, if, set, endif). If your mailer or mine added
                    a spurious linebreak, it may or may not be at a space character. To see
                    the difference, remember that spaces (and quotes) in a ":set" option
                    value must be backslash-escaped.

                    The file's encoding (and BOM, if set) appear at the right-hand side of
                    the status line, left of the line and column numbers.
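                    The %{...} item in that status line is a Vim expression. It
                    shows 'fileencoding', or 'encoding' when 'fileencoding' is
                    empty, and appends ",BOM" when 'bomb' is set. The surrounding
                    [ ] are literal text. The following sketch renders the same
                    logic in Python for readers less used to Vim's ?: and "."
                    (string concatenation) operators. The function name
                    encoding_item is hypothetical.

```python
# Python rendering of: [%{(&fenc == "" ? &enc : &fenc) . (&bomb ? ",BOM" : "")}]
def encoding_item(fenc: str, enc: str, bomb: bool) -> str:
    label = enc if fenc == "" else fenc  # empty 'fileencoding' falls back to 'encoding'
    return "[" + label + (",BOM" if bomb else "") + "]"


print(encoding_item("", "utf-8", False))        # [utf-8]
print(encoding_item("latin1", "utf-8", False))  # [latin1]
print(encoding_item("utf-8", "utf-8", True))    # [utf-8,BOM]
```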

                    Of course, it requires the +statusline feature, which normally means a
                    "Normal" version of Vim or larger. If you put the above snippet in your
                    vimrc and then run a "tiny" or "small" Vim, there will be no error, but
                    you'll just get the same status line as you always do with that version.

                    See ":help 'statusline'"


                    Best regards,
                    Tony.
                    --
                    Ingrate, n.:
                    A man who bites the hand that feeds him, and then complains of
                    indigestion.
