
Re: utf8 BOM

  • bill lam
    Message 1 of 9, Apr 1 7:25 PM
      Gene Kwiecinski wrote:
      > turned on. Might be wrong, though, and it might assume/require
      > little-endian multibyte chars per the utf8 spec (ie, a BOM would be
      > redundant), and IIRC the BOM is only really required with utf16/ucs2
      > encoding.

      Gene,
      Thanks. Just for the record, UTF-8, and hence its BOM, is endian-neutral.

      regards,


      --~--~---------~--~----~------------~-------~--~----~
      You received this message from the "vim_use" maillist.
      For more information, visit http://www.vim.org/maillist.php
      -~----------~----~----~----~------~----~------~--~---
    • bill lam
      Message 2 of 9, Apr 1 7:34 PM
        Tony Mechelynck wrote:
        > The default behaviour for existing files is to leave it unchanged. The
        > default for new files depends on the global setting of 'bomb' (which is
        > off by default, but I use
        >
        > setglobal bomb

        Tony,
        Thank you for the detailed answer. I only need to write a BOM in some
        particular cases, so I'll just leave it as setglobal nobomb.
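        Since 'bomb' is local to each buffer, a BOM can still be written for
        particular files while the global default stays off. A minimal sketch;
        the *.txt pattern is a hypothetical stand-in for "particular cases"
        (for instance, files meant for a reader that needs the marker):

        ```vim
        " New files start without a BOM; existing files keep what they had.
        setglobal nobomb

        " Hypothetical rule: new *.txt buffers are written as UTF-8 with a BOM.
        augroup BomForParticularFiles
          autocmd!
          autocmd BufNewFile *.txt setlocal fileencoding=utf-8 bomb
        augroup END

        " One-off alternative for the current buffer, typed at the command line:
        "   :setlocal bomb | write
        ```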

        regards,



      • Tony Mechelynck
        Message 3 of 9, Apr 1 11:11 PM
          bill lam wrote:
          > Gene Kwiecinski wrote:
          >> turned on. Might be wrong, though, and it might assume/require
          >> little-endian multibyte chars per the utf8 spec (ie, a BOM would be
          >> redundant), and IIRC the BOM is only really required with utf16/ucs2
          >> encoding.
          >
          > Gene,
          > Thanks. Just for the record, UTF-8, and hence its BOM, is endian-neutral.
          >
          > regards,

          With UTF-8, a BOM does not specify endianness but it does specify that
          the file is in UTF-8 rather than UTF-16 or UTF-32. For instance, Windows
          XP WordPad, which cannot _write_ Unicode files except in UTF-16le (with
          BOM), can _read_ UTF-8 files if they have a BOM.

          Therefore, IMHO the appellation "BOM" (Byte Order Mark) is a misnomer; I
          would have preferred "Unicode encoding marker" or some such. It can take
          the following hex values:

          EF BB BF       UTF-8
          FE FF          UTF-16be
          FF FE          UTF-16le
          00 00 FE FF    UTF-32be
          FF FE 00 00    UTF-32le

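          As can be seen, the markers are only unambiguous if UTF-16le files never
          begin with a NUL character (U+0000) right after the BOM; otherwise
          FF FE 00 00 could be either a UTF-32le BOM or a UTF-16le BOM followed by NUL.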
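          In Vim, these byte sequences are what the special value ucs-bom in
          'fileencodings' checks for: when it comes first, a file starting with
          one of them is read in the matching encoding and 'bomb' is set for that
          buffer. A minimal vimrc sketch; the fallback order after ucs-bom is an
          assumption (Vim's usual default when 'encoding' is utf-8 is
          ucs-bom,utf-8,default,latin1):

          ```vim
          " Use a Unicode internal encoding so any of these files can be represented.
          set encoding=utf-8
          " 1) look for a BOM, 2) try strict UTF-8, 3) fall back to Latin1.
          set fileencodings=ucs-bom,utf-8,latin1
          ```

          After opening a file, ":set fileencoding? bomb?" reports what was detected.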

          Best regards,
          Tony.
          --
          I've given up reading books; I find it takes my mind off myself.

        • Tony Mechelynck
          Message 4 of 9, Apr 1 11:25 PM
            bill lam wrote:
            > Tony Mechelynck wrote:
            >> The default behaviour for existing files is to leave it unchanged. The
            >> default for new files depends on the global setting of 'bomb' (which is
            >> off by default, but I use
            >>
            >> setglobal bomb
            >
            > Tony,
            > Thank you for the detailed answer. I only need to write a BOM in some
            > particular cases, so I'll just leave it as setglobal nobomb.
            >
            > regards,

            Since I don't need to avoid the BOM except in special cases (such as
            executable scripts, where the #! shebang isn't recognised if it is
            preceded by a BOM, and most of my scripts are in Latin1 anyway) I leave
            it on for UTF-8 files. My custom statusline (which displays, in addition
            to the usual stuff, the file's encoding and BOM status) thus
            discriminates between:

            - files in 7-bit US-ASCII, seen as UTF-8 without BOM
            - files in Latin1 with characters above 0x7F, seen as Latin1
            - "true" UTF-8 files, seen as UTF-8 with BOM.
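            The exact vimrc lines behind this are not shown in the thread. A
            minimal sketch that would behave this way, assuming 'fileencodings'
            already starts with ucs-bom,utf-8,latin1; the autocommand for the #!
            exception is hypothetical. Because Vim strips the BOM from the buffer
            and records it in 'bomb', line 1 of a script still starts with "#!":

            ```vim
            " New UTF-8 files get a BOM unless something below turns it off.
            setglobal bomb

            " Hypothetical exception: a BOM before "#!" breaks the shebang,
            " so drop it just before writing any buffer whose first line is #!...
            augroup NoBombBeforeShebang
              autocmd!
              autocmd BufWritePre * if getline(1) =~# '^#!' | setlocal nobomb | endif
            augroup END
            ```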

            Best regards,
            Tony.
            --
            The sum of the Universe is zero.

          • Don M
            Message 5 of 9, Apr 2 10:17 AM
              Tony,

              Can you show how you set your custom status line to include that
              additional info? Thanks for all of the information.

              Don

            • Tony Mechelynck
              Message 6 of 9, Apr 2 12:50 PM
                Don M wrote:
                > Tony,
                >
                > Can you show how you set your custom status line to include that
                > additional info? Thanks for all of the information.
                >
                > Don

                set laststatus=2
                if has("statusline")
                set statusline=%<%f\ %h%m%r%=%k[%{(&fenc\ ==\ \"\"?&enc:&fenc).(&bomb?\",BOM\":\"\")}]\ %-12.(%l,%c%V%)\ %P
                endif

                These are four lines (set, if, set, endif). If your mailer or mine has
                added a spurious line break, it may or may not fall at a space
                character. To find where it belongs, remember that spaces (and double
                quotes) in a ":set" option value must be backslash-escaped.
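                One way to avoid most of that escaping is to move the expression into
                a function and call it from %{}. A minimal sketch; the function name
                FencAndBomb() is hypothetical, and the rest of the status line is the
                same as above:

                ```vim
                " Returns the file's encoding (or 'encoding' when 'fileencoding' is
                " empty), plus ",BOM" when 'bomb' is set, e.g. "utf-8,BOM" or "latin1".
                function! FencAndBomb() abort
                  let l:fenc = &fileencoding ==# '' ? &encoding : &fileencoding
                  return l:fenc . (&bomb ? ',BOM' : '')
                endfunction

                set laststatus=2
                if has("statusline")
                  set statusline=%<%f\ %h%m%r%=%k[%{FencAndBomb()}]\ %-12.(%l,%c%V%)\ %P
                endif
                ```

                Only the spaces still need escaping, since the quotes now live
                inside the function body.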

                The file's encoding (and BOM, if set) appear at the right-hand side of
                the status line, left of the line and column numbers.

                Of course, it requires the +statusline feature, which normally means a
                "Normal" version of Vim or larger. If you put the above snippet in your
                vimrc and then run a "tiny" or "small" Vim, there will be no error, but
                you'll just get the same status line as you always do with that version.

                See ":help 'statusline'"


                Best regards,
                Tony.
                --
                Ingrate, n.:
                A man who bites the hand that feeds him, and then complains of
                indigestion.
