RE: opening a Unicode file

  • John (Eljay) Love-Jensen
    Message 1 of 6, Sep 14, 2007
      Hi Tony,

      > :e ++enc=utf-16 filename

      Thanks Tony! I've been wondering how to do that!

      Note: if the UTF-16 file contains a BOM (which it often should, and usually will), it should not be necessary to specify utf-16le or utf-16be explicitly (and, indeed, doing so would be incorrect according to the Unicode standard -- Vim probably does the friendly thing anyway).
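
To make the BOM point concrete outside of Vim, here is a minimal Python sketch. Python is only my choice for illustration and the sample bytes are made up; nothing here describes how Vim itself behaves. It decodes the same BOM-prefixed bytes once with the generic UTF-16 codec and once with an explicitly little-endian one:

```python
import codecs

# Hypothetical sample: "hi" in little-endian UTF-16, preceded by a BOM.
data = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")

# The generic "utf-16" codec reads the BOM to pick the byte order,
# then drops it from the decoded text.
auto = data.decode("utf-16")

# The explicit "utf-16-le" codec treats the BOM as ordinary text, so
# U+FEFF survives as the first character of the result.
explicit = data.decode("utf-16-le")

print(repr(auto))
print(repr(explicit))
```

In this codec, at least, naming the byte order explicitly leaves a stray U+FEFF at the start of the text, which is the kind of mismatch the note above is warning about.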

      I say this not for Tony's edification, because I'm sure that he already knows this, but for everyone else who may be in msorens's situation.

      Also, if you need to make sure the file is written with a BOM, you can use:

      :set bomb

      Or without the BOM:

      :set nobomb

      For some light reading on Unicode 5.0:

      http://www.amazon.com/dp/0321480910/

      HTH,
      --Eljay

      --~--~---------~--~----~------------~-------~--~----~
      You received this message from the "vim_multibyte" maillist.
      For more information, visit http://www.vim.org/maillist.php
      -~----------~----~----~----~------~----~------~--~---
    • Tony Mechelynck
      Message 2 of 6, Sep 14, 2007
        John (Eljay) Love-Jensen wrote:
        > Hi Tony,
        >
        >> :e ++enc=utf-16 filename
        >
        > Thanks Tony! I've been wondering how to do that!
        >
        > Note: if the utf-16 file contains a BOM (which, often, it should/will), then it should not be necessary to specify utf-16le or utf-16be explicitly (and, indeed, would be incorrect according to Unicode standards to do so -- Vim probably does the friendly thing anyway).

        If any Unicode file (here I mean UTF-8, UTF-16le, UTF-16be, UTF-32le or
        UTF-32be -- I'll leave out GB18030 for the moment) starts with a BOM, Vim will
        recognise it _provided_ that your 'fileencodings' (plural) starts with
        "ucs-bom". In order for it to work properly, though, 'encoding' should already
        be UTF-8 (or UTF-16 or UTF-32, which Vim handles internally as UTF-8 to avoid
        problems with null bytes terminating C strings).
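
For readers who want to see what "recognising a BOM" amounts to, here is a rough Python sketch of the general idea. It is not Vim's implementation; the function name `sniff_bom` and the table `BOMS` are hypothetical, and only the five encodings listed above are covered:

```python
import codecs

# BOM signatures, 4-byte marks first: the UTF-32le BOM begins with the
# same two bytes as the UTF-16le BOM, so it has to be tested earlier.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32le"),
    (codecs.BOM_UTF32_BE, "utf-32be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
]


def sniff_bom(head: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None


print(sniff_bom(b"\xef\xbb\xbfhello"))
print(sniff_bom(b"plain ASCII"))
```

When no BOM is found, the sketch returns None and the caller has to fall back to some other guess, which is roughly the role the remaining entries of 'fileencodings' play after "ucs-bom".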

        Specifying explicitly that a file is, for instance, UTF-16le is IMHO not
        "wrong" (unless the file is actually in some other encoding, of course); it is
        just "unnecessary" if the file starts with a BOM.

        >
        > I say this not for Tony's edification, because I'm sure that he already knows this, but for everyone else who may be in msorens's situation.

        :-)

        >
        > Also if you need to make sure the file is written with BOM you can use:
        >
        > :set bomb
        >
        > Or without the BOM:
        >
        > :set nobomb

        ...and if you want to make sure that "newly created" Unicode files will (or
        won't) have a BOM by default you can write

        setglobal bomb
        or
        setglobal nobomb

        in your vimrc. (I use ":setglobal bomb" but YMMV.) This setting has no
        influence on non-Unicode files such as those in Latin1.
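
The "no influence on non-Unicode files" behaviour can be pictured with a small Python sketch. The helper `encode_for_write`, its `bomb` parameter and the `BOM_FOR` table are hypothetical names of mine, meant only to mirror the idea of the option, not Vim's code:

```python
import codecs

# Hypothetical table: the BOM for each Unicode encoding the helper knows.
BOM_FOR = {
    "utf-8": codecs.BOM_UTF8,
    "utf-16le": codecs.BOM_UTF16_LE,
    "utf-16be": codecs.BOM_UTF16_BE,
}


def encode_for_write(text: str, encoding: str, bomb: bool) -> bytes:
    """Encode text, prefixing a BOM only for Unicode encodings when asked."""
    body = text.encode(encoding)
    if bomb and encoding in BOM_FOR:
        return BOM_FOR[encoding] + body
    # Non-Unicode encodings such as Latin1 have no BOM, so the flag is ignored.
    return body


print(encode_for_write("a", "utf-8", bomb=True))
print(encode_for_write("a", "latin-1", bomb=True))
```

With the flag set, the UTF-8 output gains three leading bytes while the Latin1 output is unchanged.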

        >
        > For some light reading on Unicode 5.0:
        >
        > http://www.amazon.com/dp/0321480910/

        For serious reading, see also http://www.unicode.org/ -- and others.

        >
        > HTH,
        > --Eljay

        Best regards,
        Tony.
        --
        99 blocks of crud on the disk,
        99 blocks of crud!
        You patch a bug, and dump it again:
        100 blocks of crud on the disk!

        100 blocks of crud on the disk,
        100 blocks of crud!
        You patch a bug, and dump it again:
        101 blocks of crud on the disk! ...

      • mbbill
        Message 3 of 6, Sep 14, 2007
        • Camillo Särs
          Message 4 of 6, Sep 15, 2007
            Tony Mechelynck wrote:
            > ...and if you want to make sure that "newly created" Unicode files will (or
            > won't) have a BOM by default you can write
            >
            > setglobal bomb
            > or
            > setglobal nobomb
            >
            > in your vimrc. (I use ":setglobal bomb" but YMMV.) This setting has no
            > influence on non-Unicode files such as those in Latin1.

            Beware, though, that if your environment defaults to utf-8 file
            encoding, then setting "bomb" will cause the BOM to be written to all
            new files. This can become a problem when dealing with some legacy
            applications that don't expect to see those extra bytes at the
            beginning. Examples range from *nix shells and hashbang (#!) processing
            to Windows .ini file headings [...].

            So this setting may indeed cause some legacy apps to "bomb" on you.
            Pardon the pun, but I thought it was hilarious once I got over the "duh"
            factor after debugging.
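
As a small illustration of the hashbang case, here is a Python sketch. The helper names and the sample script are hypothetical; the only fact relied on is that the "#!" has to be the very first two bytes of the file for the mechanism to apply:

```python
import codecs


def starts_with_shebang(raw: bytes) -> bool:
    """True only if the very first two bytes are '#!'."""
    return raw.startswith(b"#!")


def strip_utf8_bom(raw: bytes) -> bytes:
    """Drop a leading UTF-8 BOM if present; otherwise return raw unchanged."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


# Hypothetical script contents, then the same bytes as written with a BOM.
script = b"#!/bin/sh\necho hello\n"
with_bom = codecs.BOM_UTF8 + script

print(starts_with_shebang(script))
print(starts_with_shebang(with_bom))
print(starts_with_shebang(strip_utf8_bom(with_bom)))
```

The three BOM bytes push "#!" to offset 3, so the check fails until the BOM is stripped again.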

            Regards,
            Camillo
            --
            Camillo Särs <ged@...>
            http://www.ged.fi
            Aim for the impossible and you will achieve the improbable
