
Re: [RFC] Default 'encoding' to UTF-8

  • James Vega
    Message 1 of 21 , Mar 2, 2009
      On Tue, Mar 03, 2009 at 03:32:45AM +0100, Tony Mechelynck wrote:
      >
      > On 03/03/09 01:40, James Vega wrote:
      > > ...
      > > 3) File encoding detection ('fencs') defaults to a value that is
      > > unlikely to correctly work with most interesting (non-ascii) files.
      > >
      > > Defaulting 'enc' to UTF-8 helps address these problems.
      > >
      > > ...
      > > 3) File encoding detection now has a sane default value which means new
      > > users are less likely to encounter problems when editing files of
      > > various encodings.
      > > ...
      >
      > 1) When using gvim with GTK2 GUI, setting 'encoding' to UTF-8 is the
      > preferred option, though not enforced. However in that case,
      > 'termencoding' is fixed as UTF-8 (unchangeable) in the GUI. I wonder
      > whether it is possible to configure a GTK2 build with --disable-multibyte.

      According to the help, "utf-8" hasn't been made the default for
      'encoding' in GTK2 builds to prevent different behavior of the terminal
      and GUI versions. Since supporting multibyte is pretty much standard on
      any relatively recent OS, trending towards UTF-8 instead of the other
      way around seems more logical.

      > 2) Vim compiled with the --disable-multibyte configure option cannot use
      > UTF-8, or any other multibyte encoding; in fact it doesn't even accept
      > the 'encoding' option as valid.

      Is there a reason to allow building Vim without multibyte support?
      Always having multibyte support would make the code simpler/smaller.

      > 3) 'termencoding' (the encoding used for the keyboard and, in Console
      > mode, for the display) defaults to empty (which means, fall back to
      > 'encoding') except when running in GUI mode with GTK2. This means that,
      > by default, communication between Vim and the user is done in the system
      > locale.

      Unless 'encoding' is set in the user's ~/.vimrc, which in my experience is
      pretty common. I'm not sure how closely that aligns with the overall usage
      patterns, though.

      > 4) It _is_ possible to set 'encoding' to UTF-8 in the vimrc, with
      > appropriate safeguards, if used at the right spot in the "chronology" of
      > successive actions (and in particular, before defining mappings or
      > setting string option values including characters above 0x7F).

      As per my response to your previous point, 'termencoding' is less likely to
      be based on the user's locale even though it should always be based on
      that locale.

      > On this Linux box, my locale encoding is UTF-8, but that was not the
      > case when I acquired a serious interest in Vim: the latest version at
      > the time was some patchlevel of Vim 6.1 and I was using Win98. A
      > compelling reason for doing so would be a desire to create or edit
      > files using characters not supported by your system locale, for
      > instance multi-charset files in UTF-8 when the Windows locale is
      > Windows-1252, as it was (IIRC) on that W98 system mentioned above.

      Right, point 3 from my initial mail.
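
The detection that point 3 refers to can be sketched roughly as follows, assuming (as the help for 'fileencodings' describes) that candidates are tried in order and the first one that converts without error is used. This is a Python illustration, not Vim's code: the function name `detect_encoding` is made up, Python codec names stand in for Vim's encoding names, and BOM handling ("ucs-bom") is left out.

```python
def detect_encoding(data, candidates, fallback):
    """Return the first candidate codec that decodes `data` cleanly.

    `fallback` plays the role of 'encoding': it is used when no
    candidate in the list matches.
    """
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # conversion error: try the next candidate
        return name
    return fallback


utf8_file = "na\u00efve caf\u00e9".encode("utf-8")
latin1_file = "na\u00efve caf\u00e9".encode("latin-1")

# A list that tries UTF-8 first and an 8-bit encoding last.
candidates = ["utf-8", "latin-1"]
print(detect_encoding(utf8_file, candidates, "latin-1"))    # utf-8
print(detect_encoding(latin1_file, candidates, "latin-1"))  # latin-1

# With no usable candidates, everything falls back to the default,
# so the UTF-8 file would be read as Latin-1.
print(detect_encoding(utf8_file, [], "latin-1"))            # latin-1
```

An 8-bit encoding such as Latin-1 accepts any byte sequence, which is why it only makes sense at the end of the list.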

      > OTOH, changing the 'encoding' _after_ the end of startup, when you
      > already have one or more buffers loaded, is not something I would
      > recommend; it may lead to data loss or file data corruption, depending on
      > how you do it.

      Exactly.
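
One way such corruption can arise, shown outside Vim: bytes written under one encoding are reinterpreted under another and then saved again. This is only a Python illustration of the general failure mode, not a description of what Vim does internally when 'encoding' changes; the helper name `reinterpret_and_save` is hypothetical.

```python
def reinterpret_and_save(data, assumed, target):
    """Decode `data` with the assumed encoding, then re-encode it as `target`."""
    return data.decode(assumed).encode(target)


original = "\u00e9".encode("utf-8")   # e-acute, two bytes in UTF-8
damaged = reinterpret_and_save(original, "latin-1", "utf-8")

print(original)  # b'\xc3\xa9'
print(damaged)   # b'\xc3\x83\xc2\xa9'
```

The text grew from two bytes to four and no longer decodes to the original character, although every step succeeded without an error.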

      > However, I believe that forbidding it by means of something in the C
      > code would probably be too harsh, and how would you do it? It _is_
      > useful to test the value of 'encoding' at any time, or to use the
      > value to set something else (IOW, to use &encoding in an expression),
      > so the option should still exist after startup.

      I'm not suggesting removing read access to the option. I'm purely
      suggesting that write access is disabled after the startup scripts are
      sourced. Making this change to the source would be fairly trivial,
      especially if support for using :lockvar on options were implemented.

      > I don't think there is a precedent (is there?) for an option that can
      > be changed, but only until the last VimEnter autocommand (if any)
      > terminates.

      No, there isn't yet but 'encoding' seems like a good one to set the
      precedent.

      --
      James
      GPG Key: 1024D/61326D40 2003-09-02 James Vega <jamessan@...>
    • Tony Mechelynck
      Message 2 of 21 , Mar 2, 2009
        On 03/03/09 06:40, James Vega wrote:
        > On Tue, Mar 03, 2009 at 03:32:45AM +0100, Tony Mechelynck wrote:
        >> On 03/03/09 01:40, James Vega wrote:
        >>> ...
        >>> 3) File encoding detection ('fencs') defaults to a value that is
        >>> unlikely to correctly work with most interesting (non-ascii) files.
        >>>
        >>> Defaulting 'enc' to UTF-8 helps address these problems.
        >>>
        >>> ...
        >>> 3) File encoding detection now has a sane default value which means new
        >>> users are less likely to encounter problems when editing files of
        >>> various encodings.
        >>> ...
        >> 1) When using gvim with GTK2 GUI, setting 'encoding' to UTF-8 is the
        >> preferred option, though not enforced. However in that case,
        >> 'termencoding' is fixed as UTF-8 (unchangeable) in the GUI. I wonder
        >> whether it is possible to configure a GTK2 build with --disable-multibyte.
        >
        > According to the help, "utf-8" hasn't been made the default for
        > 'encoding' in GTK2 builds to prevent different behavior of the terminal
        > and GUI versions. Since supporting multibyte is pretty much standard on
        > any relatively recent OS, trending towards UTF-8 instead of the other
        > way around seems more logical.

        UTF-8 support is pretty much standard on any recent Unix-like OS, though
        its use by default is not necessarily universal. I don't know about
        Vista, but on XP the default was _not_ to have UTF-8 as the system
        default encoding.

        >
        >> 2) Vim compiled with the --disable-multibyte configure option cannot use
        >> UTF-8, or any other multibyte encoding; in fact it doesn't even accept
        >> the 'encoding' option as valid.
        >
        > Is there a reason to allow building Vim without multibyte support?
        > Always having multibyte support would make the code simpler/smaller.

        A +multi_byte build is always bigger than a -multi_byte one; one reason could be
        making the Vim binary really "lean and mean". Personally I keep two Vim
        builds on this computer: a Huge build named vim, with GTK2/Gnome2 GUI
        (and +multi_byte), used via softlinks for most possible executable
        names, and a Tiny build named vi (with no GUI and -multi_byte).

        >
        >> 3) 'termencoding' (the encoding used for the keyboard and, in Console
        >> mode, for the display) defaults to empty (which means, fall back to
        >> 'encoding') except when running in GUI mode with GTK2. This means that,
        >> by default, communication between Vim and the user is done in the system
        >> locale.
        >
        > Unless 'encoding' is set in the user's ~/.vimrc, which in my experience is
        > pretty common. I'm not sure how closely that aligns with the overall usage
        > patterns, though.

        I recommend it for users who need or want to use various encodings, and
        possibly plurilingual files mixing them. Users with simpler needs may
        quite validly leave 'encoding' at whatever their OS locale sets, and
        never stray away from it.

        >
        >> 4) It _is_ possible to set 'encoding' to UTF-8 in the vimrc, with
        >> appropriate safeguards, if used at the right spot in the "chronology" of
        >> successive actions (and in particular, before defining mappings or
        >> setting string option values including characters above 0x7F).
        >
        > As per my response to your previous point, 'termencoding' is less likely to
        > be based on their locale even though it should always be based on their
        > locale.
        >
        >> On this Linux box, my locale encoding is UTF-8, but that was not the
        >> case when I acquired a serious interest in Vim: the latest version at
        >> the time was some patchlevel of Vim 6.1 and I was using Win98. A
        >> compelling reason for doing so would be a desire to create or edit
        >> files using characters not supported by your system locale, for
        >> instance multi-charset files in UTF-8 when the Windows locale is
        >> Windows-1252, as it was (IIRC) on that W98 system mentioned above.
        >
        > Right, point 3 from my initial mail.
        >
        >> OTOH, changing the 'encoding' _after_ the end of startup, when you
        >> already have one or more buffers loaded, is not something I would
        >> recommend; it may lead to data loss or file data corruption, depending on
        >> how you do it.
        >
        > Exactly.
        >
        >> However, I believe that forbidding it by means of something in the C
        >> code would probably be too harsh, and how would you do it? It _is_
        >> useful to test the value of 'encoding' at any time, or to use the
        >> value to set something else (IOW, to use &encoding in an expression),
        >> so the option should still exist after startup.
        >
        > I'm not suggesting removing read access to the option. I'm purely
        > suggesting that write access is disabled after the startup scripts are
        > sourced. Making this change to the source would be fairly trivial,
        > especially if support for using :lockvar on options were implemented.
        >
        >> I don't think there is a precedent (is there?) for an option that can
        >> be changed, but only until the last VimEnter autocommand (if any)
        >> terminates.
        >
        > No, there isn't yet but 'encoding' seems like a good one to set the
        > precedent.
        >

        Hm, to use one of your earlier arguments, it might make the code more
        complex, and thus add some bloat and possibly some bugs, where the
        present code cannot really be said to be malfunctioning. "If it ain't
        broke, don't fix it."


        Best regards,
        Tony.
        --
        Why is "abbreviation" such a long word?

        --~--~---------~--~----~------------~-------~--~----~
        You received this message from the "vim_dev" maillist.
        For more information, visit http://www.vim.org/maillist.php
        -~----------~----~----~----~------~----~------~--~---
      • Dennis Benzinger
        Message 3 of 21 , Mar 2, 2009
          Hi!

          Am 03.03.2009 06:40, James Vega schrieb:
          > [...]
          >> 2) Vim compiled with the --disable-multibyte configure option cannot use
          >> UTF-8, or any other multibyte encoding; in fact it doesn't even accept
          >> the 'encoding' option as valid.
          >
          > Is there a reason to allow building Vim without multibyte support?
          > Always having multibyte support would make the code simpler/smaller.

          It would make the code smaller but compiling without multibyte support
          probably makes the resulting binary smaller. That can make a big
          difference for users on resource constrained systems.

          >> 3) 'termencoding' (the encoding used for the keyboard and, in Console
          >> mode, for the display) defaults to empty (which means, fall back to
          >> 'encoding') except when running in GUI mode with GTK2. This means that,
          >> by default, communication between Vim and the user is done in the system
          >> locale.
          >
          > Unless 'encoding' is set in the user's ~/.vimrc, which in my experience is
          > pretty common. I'm not sure how closely that aligns with the overall usage
          > patterns, though.
          > [...]

          FWIW, I don't explicitly set it in my .vimrc. My Ubuntu (8.10) system
          uses a UTF-8 locale and Vim detects it. Because this just works I
          suppose it's not that common to set it explicitly.


          Dennis Benzinger

        • Markus Heidelberg
          Message 4 of 21 , Mar 3, 2009
            Dennis Benzinger, 03.03.2009:
            >
            > Hi!
            >
            > Am 03.03.2009 06:40, James Vega schrieb:
            > > [...]
            > >> 2) Vim compiled with the --disable-multibyte configure option cannot use
            > >> UTF-8, or any other multibyte encoding; in fact it doesn't even accept
            > >> the 'encoding' option as valid.
            > >
            > > Is there a reason to allow building Vim without multibyte support?
            > > Always having multibyte support would make the code simpler/smaller.
            >
            > It would make the code smaller but compiling without multibyte support
            > probably makes the resulting binary smaller. That can make a big
            > difference for users on resource constrained systems.

            What exactly do you mean by "resource constrained systems"?
            On an old PC, Vim with multibyte should still run fast.
            On embedded devices people normally use vi from the busybox package.
            Development is not done on these devices, mostly just editing config
            files. No need for a featureful editor like Vim.

            But since multibyte support is currently optional and people are using
            builds without it, that option should of course not be thrown out
            unnecessarily.

            Markus


          • Markus Heidelberg
            Message 5 of 21 , Mar 3, 2009
              Tony Mechelynck, 03.03.2009:
              >
              > On 03/03/09 06:40, James Vega wrote:
              > > On Tue, Mar 03, 2009 at 03:32:45AM +0100, Tony Mechelynck wrote:
              > >> 2) Vim compiled with the --disable-multibyte configure option cannot use
              > >> UTF-8, or any other multibyte encoding; in fact it doesn't even accept
              > >> the 'encoding' option as valid.
              > >
              > > Is there a reason to allow building Vim without multibyte support?
              > > Always having multibyte support would make the code simpler/smaller.
              >
              > A +multi_byte build is always bigger than a -multi_byte one; one reason could be
              > making the Vim binary really "lean and mean". Personally I keep two Vim
              > builds on this computer: a Huge build named vim, with GTK2/Gnome2 GUI
              > (and +multi_byte), used via softlinks for most possible executable
              > names, and a Tiny build named vi (with no GUI and -multi_byte).

              Why the tiny build without multibyte? Is this only a fallback in case of
              system problems, when root has to edit config files that you know
              don't contain multibyte characters?

              Markus


            • Dennis Benzinger
              Message 6 of 21 , Mar 3, 2009
                Hi Markus!

                Am 03.03.2009 11:14, Markus Heidelberg schrieb:
                > Dennis Benzinger, 03.03.2009:
                >>
                >> Hi!
                >>
                >> Am 03.03.2009 06:40, James Vega schrieb:
                >> > [...]
                >> >> 2) Vim compiled with the --disable-multibyte configure option cannot use
                >> >> UTF-8, or any other multibyte encoding; in fact it doesn't even accept
                >> >> the 'encoding' option as valid.
                >> >
                >> > Is there a reason to allow building Vim without multibyte support?
                >> > Always having multibyte support would make the code simpler/smaller.
                >>
                >> It would make the code smaller but compiling without multibyte support
                >> probably makes the resulting binary smaller. That can make a big
                >> difference for users on resource constrained systems.
                >
                > What exactly do you mean by "resource constrained systems"?
                > On an old PC, Vim with multibyte should still run fast.
                > [...]

                I meant systems which have or can use only a small amount of memory. For
                example (16bit) MS-DOS where you can only use 640KB. These systems may
                be rare nowadays but if you encounter one you'd probably be happy to
                be able to minimize the size of the binary. But I didn't test how
                much the size differs between a multibyte and a non-multibyte build.
                Therefore I wrote "_probably_ makes the resulting binary smaller" ;-)

                So if ripping out non-multibyte support does not make the code much
                simpler or smaller I'd simply keep it. Do you have any idea how much simpler
                or smaller the code would be?


                Dennis Benzinger

              • Markus Heidelberg
                Message 7 of 21 , Mar 3, 2009
                  Dennis Benzinger, 03.03.2009:
                  >
                  > Hi Markus!
                  >
                  > Am 03.03.2009 11:14, Markus Heidelberg schrieb:
                  > > Dennis Benzinger, 03.03.2009:
                  > >>
                  > >> Hi!
                  > >>
                  > >> Am 03.03.2009 06:40, James Vega schrieb:
                  > >> > [...]
                  > >> >> 2) Vim compiled with the --disable-multibyte configure option cannot use
                  > >> >> UTF-8, or any other multibyte encoding; in fact it doesn't even accept
                  > >> >> the 'encoding' option as valid.
                  > >> >
                  > >> > Is there a reason to allow building Vim without multibyte support?
                  > >> > Always having multibyte support would make the code simpler/smaller.
                  > >>
                  > >> It would make the code smaller but compiling without multibyte support
                  > >> probably makes the resulting binary smaller. That can make a big
                  > >> difference for users on resource constrained systems.
                  > >
                  > > What exactly do you mean by "resource constrained systems"?
                  > > On an old PC, Vim with multibyte should still run fast.
                  > > [...]
                  >
                  > I meant systems which have or can use only a small amount of memory. For
                  > example (16bit) MS-DOS where you can only use 640KB. These systems may
                  > be rare nowadays but if you encounter one you'd probably be happy to
                  > be able to minimize the size of the binary. But I didn't test how
                  > much the size differs between a multibyte and a non-multibyte build.
                  > Therefore I wrote "_probably_ makes the resulting binary smaller" ;-)

                  No, that's for sure :)

                  > So if ripping out non-multibyte support does not make the code much
                  > simpler or smaller I'd simply keep it. Do you have any idea how much simpler
                  > or smaller the code would be?

                  Not sure, a lot of #ifdef would vanish.

                  Markus


                • Tony Mechelynck
                  Message 8 of 21 , Mar 3, 2009
                    On 03/03/09 11:20, Markus Heidelberg wrote:
                    > Tony Mechelynck, 03.03.2009:
                    >> On 03/03/09 06:40, James Vega wrote:
                    >>> On Tue, Mar 03, 2009 at 03:32:45AM +0100, Tony Mechelynck wrote:
                    >>>> 2) Vim compiled with the --disable-multibyte configure option cannot use
                    >>>> UTF-8, or any other multibyte encoding; in fact it doesn't even accept
                    >>>> the 'encoding' option as valid.
                    >>> Is there a reason to allow building Vim without multibyte support?
                    >>> Always having multibyte support would make the code simpler/smaller.
                    >> A +multi_byte build is always bigger than a -multi_byte one; one reason could be
                    >> making the Vim binary really "lean and mean". Personally I keep two Vim
                    >> builds on this computer: a Huge build named vim, with GTK2/Gnome2 GUI
                    >> (and +multi_byte), used via softlinks for most possible executable
                    >> names, and a Tiny build named vi (with no GUI and -multi_byte).
                    >
                    > Why the tiny build without multibyte? Is this only a fallback in case of
                    > system problems, when root has to edit config files that you know
                    > don't contain multibyte characters?
                    >
                    > Markus

                    That, and also a "sanity check" that the latest patches also work with a
                    minimal config, so if they don't I can warn Bram immediately. Once I was
                    very happy to have it, in order to be able to intervene halfway through a
                    system install run, when my Huge GTK2/Gnome2 build wouldn't load because of
                    missing libraries.


                    Best regards,
                    Tony.
                    --
                    "Even nowadays a man can't step up and kill a woman without feeling
                    just a bit unchivalrous ..."
                    -- Robert Benchley

                  • Tony Mechelynck
                    Message 9 of 21 , Mar 3, 2009
                      On 03/03/09 13:12, Dennis Benzinger wrote:
                      > Hi Markus!
                      >
                      > Am 03.03.2009 11:14, Markus Heidelberg schrieb:
                      >> Dennis Benzinger, 03.03.2009:
                      >>> Hi!
                      >>>
                      >>> Am 03.03.2009 06:40, James Vega schrieb:
                      >>>> [...]
                      >>>>> 2) Vim compiled with the --disable-multibyte configure option cannot use
                      >>>>> UTF-8, or any other multibyte encoding; in fact it doesn't even accept
                      >>>>> the 'encoding' option as valid.
                      >>>> Is there a reason to allow building Vim without multibyte support?
                      >>>> Always having multibyte support would make the code simpler/smaller.
                      >>> It would make the code smaller but compiling without multibyte support
                      >>> probably makes the resulting binary smaller. That can make a big
                      >>> difference for users on resource constrained systems.
                      >> What exactly do you mean by "resource constrained systems"?
                      >> On an old PC, Vim with multibyte should still run fast.
                      >> [...]
                      >
                      > I meant systems which have or can use only a small amount of memory. For
                      > example (16bit) MS-DOS where you can only use 640KB. These systems may
                      > be rare nowadays but if you encounter one you'd probably be happy to
                      > be able to minimize the size of the binary. But I didn't test how
                      > much the size differs between a multibyte and a non-multibyte build.
                      > Therefore I wrote "_probably_ makes the resulting binary smaller" ;-)
                      >
                      > So if ripping out non-multibyte support does not make the code much
                      > simpler or smaller I'd simply keep it. Do you have any idea how much simpler
                      > or smaller the code would be?
                      >
                      >
                      > Dennis Benzinger

                      I did try:
                      - vim (gvim with all bells and whistles except +mzscheme) 3370388 bytes.
                      - vi (vim with minimal features) 508048 bytes
                      6.63 times smaller

                      Both compiled on the same Linux-i686 system with the same 7.2.130
                      sources (but different config options), and both binaries "stripped" of
                      their debug info. The difference consists not only of +multi_byte but of
                      everything which I knew how to enable/disable at compile-time. These are
                      32-bit binaries; I suspect 16-bit builds would be smaller -- hopefully
                      they would, because 508k is still big for a DOS machine without Extended
                      Memory.
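
The arithmetic behind those figures, as a quick check in Python. The two sizes are the ones quoted above; the 640 KB figure is the conventional-memory limit mentioned earlier in the thread, and treating all of it as available to a single program is an optimistic simplification.

```python
huge_size = 3370388   # bytes, Huge build with GUI
tiny_size = 508048    # bytes, Tiny build without GUI or multibyte

ratio = huge_size / tiny_size
saved = huge_size - tiny_size
conventional = 640 * 1024          # bytes of DOS conventional memory
headroom = conventional - tiny_size

print(round(ratio, 2))  # 6.63
print(saved)            # 2862340
print(headroom)         # 147312
```

Even under that generous assumption the Tiny binary would leave well under 150 KB for everything else, which fits the remark that 508k is still big for such a machine.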


                      Best regards,
                      Tony.
                      --
                      Remember, UNIX spelled backwards is XINU.

                    • Tony Mechelynck
                      Message 10 of 21 , Mar 3, 2009
                        On 04/03/09 02:57, Markus Heidelberg wrote:
                        > Dennis Benzinger, 03.03.2009:
                        [...]
                        >> So if ripping out non-multibyte support does not make the code much
                        >> simpler or smaller I'd simply keep it. Do you have any idea how much simpler
                        >> or smaller the code would be?
                        >
                        > Not sure, a lot of #ifdef would vanish.
                        >
                        > Markus

                        That would make the source smaller and simpler, but not the object code,
                        since "false ifdef" sections are removed before parsing the resulting C code.

                        Best regards,
                        Tony.
                        --
                        The shortest distance between two points is under construction.
                        -- Noelie Alito

                      • James Vega
                        Message 11 of 21 , Mar 3, 2009
                          On Tue, Mar 03, 2009 at 01:12:36PM +0100, Dennis Benzinger wrote:
                          >
                          > Hi Markus!
                          >
                          > Am 03.03.2009 11:14, Markus Heidelberg schrieb:
                          > > Dennis Benzinger, 03.03.2009:
                          > >>
                          > >> Hi!
                          > >>
                          > >> Am 03.03.2009 06:40, James Vega schrieb:
                          > >> > [...]
                          > >> >> 2) Vim compiled with the --disable-multibyte configure option cannot use
                          > >> >> UTF-8, or any other multibyte encoding; in fact it doesn't even accept
                          > >> >> the 'encoding' option as valid.
                          > >> >
                          > >> > Is there a reason to allow building Vim without multibyte support?
                          > >> > Always having multibyte support would make the code simpler/smaller.
                          > >>
                          > >> It would make the code smaller but compiling without multibyte support
                          > >> probably makes the resulting binary smaller. That can make a big
                          > >> difference for users on resource constrained systems.
                          > >
                          > > What do you mean exactly with "resource constrained systems"?
                          > > On an old PC, Vim with multibyte should still run fast.
                          > > [...]
                          >
                          > I meant systems which have or can use only a small amount of memory. For
                          > example (16bit) MS-DOS where you can only use 640KB. These systems may
                          > be rare nowadays but if you'll encounter one you'd probably be happy to
                          > be able to minimize the size of the binary.

                          Indeed, but there are currently checks that prevent Vim from building
                          with multibyte support on such systems (ints that are smaller than 32
                          bit). I guess supporting such OSes would be a reason not to disallow
                          building without multibyte entirely.

                          That does raise the question of where the trade-off between keeping
                          legacy, mostly unused code versus dropping support occurs.

                          > But I didn't try it out how
                          > much the size differs between a multibyte and a non-multibyte build.
                          > Therefore I wrote "_probably_ makes the resulting binary smaller" ;-)
                          >
                          > So if ripping out non-multibyte support does not make the code much
                          > simpler or smaller I'd simply keep it. Do you have any idea much simpler
                          > or smaller the code would be?

                          Well, since supporting 16bit systems is still desirable, there'd be no
                          change in code size.

                          Just for the sake of argument, though, it would remove 933
                          '#ifdef FEAT_MBYTE' (or equivalent) conditional parts of the code and 4
                          '#ifndef FEAT_MBYTE' (or equivalent). How many of the #ifdef scenarios
                          have a paired #else would require more investigation than I'm willing to
                          do for the sake of argument. :)
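For anyone who wants to reproduce a count like this, here is a minimal sketch in Python. It is not the method used for the numbers above: it only matches the plain `#ifdef NAME` / `#ifndef NAME` spellings, so the "or equivalent" forms such as `#if defined(...)` are deliberately left out. The function name `count_conditionals` and the sample text are made up for illustration.

```python
import re


def count_conditionals(source: str, macro: str = "FEAT_MBYTE") -> dict:
    """Count plain '#ifdef MACRO' and '#ifndef MACRO' lines in C source text.

    Only the simple spellings are matched; '#if defined(MACRO)' and
    compound conditions are not counted by this sketch.
    """
    name = re.escape(macro)
    ifdef = re.compile(rf"^\s*#\s*ifdef\s+{name}\b", re.MULTILINE)
    ifndef = re.compile(rf"^\s*#\s*ifndef\s+{name}\b", re.MULTILINE)
    return {
        "ifdef": len(ifdef.findall(source)),
        "ifndef": len(ifndef.findall(source)),
    }


# Tiny made-up sample; real use would read the .c and .h files from disk.
sample = """\
#ifdef FEAT_MBYTE
    int has_mb = 1;
#else
    int has_mb = 0;
#endif
# ifndef FEAT_MBYTE
    /* single-byte only */
# endif
#ifdef FEAT_MBYTE_IME
    /* a different macro, must not be counted */
#endif
"""

print(count_conditionals(sample))
```

Counting how many of the `#ifdef` blocks have a paired `#else` would need real nesting-aware parsing, which this sketch does not attempt.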

                          As for the resulting binary sizes:

                          features=tiny, with multibyte: 560.9k
                          features=tiny, w/out multibyte: 493.4k
                          67k or 12% saving

                          features=small, with multibyte: 618.7k
                          features=small, w/out multibyte: 551.1k
                          67k or 11% saving

                          features=normal, with multibyte: 1390.3k
                          features=normal, w/out multibyte: 1279.0k
                          111k or 8% saving
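The savings can be re-derived from the sizes listed above. A small Python sketch, assuming the percentage is taken relative to the multibyte build (the helper name `saving` is just for illustration):

```python
# Sizes in kilobytes, copied from the figures listed above:
# (with multibyte, without multibyte)
builds = {
    "tiny": (560.9, 493.4),
    "small": (618.7, 551.1),
    "normal": (1390.3, 1279.0),
}


def saving(with_mb: float, without_mb: float) -> tuple:
    """Return (absolute saving in k, saving as a percentage of the multibyte build)."""
    diff = with_mb - without_mb
    return diff, 100.0 * diff / with_mb


for name, (with_mb, without_mb) in builds.items():
    diff, pct = saving(with_mb, without_mb)
    print(f"features={name}: {diff:.1f}k or {pct:.1f}% saving")
```

The relative saving shrinks as the feature set grows, because the multibyte code becomes a smaller share of the whole binary.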

                          --
                          James
                          GPG Key: 1024D/61326D40 2003-09-02 James Vega <jamessan@...>
                        • James Vega
                          Message 12 of 21 , Mar 3, 2009
                            On Wed, Mar 04, 2009 at 01:27:29AM -0500, James Vega wrote:
                            > On Tue, Mar 03, 2009 at 01:12:36PM +0100, Dennis Benzinger wrote:
                            > > I meant systems which have or can use only a small amount of memory. For
                            > > example (16bit) MS-DOS where you can only use 640KB. These systems may
                            > > be rare nowadays but if you'll encounter one you'd probably be happy to
                            > > be able to minimize the size of the binary.
                            >
                            > Indeed, but there are currently checks that prevent Vim from building
                            > with multibyte support on such systems (ints that are smaller than 32
                            > bit). I guess supporting such OSes would be a reason not to disallow
                            > building without multibyte entirely.
                            >
                            > That does raise the question of where the trade-off between keeping
                            > legacy, mostly unused code versus dropping support occurs.

                            Actually, according to <http://www.vim.org/download.php>, the 16-bit DOS
                            executable stopped being provided as of Vim 7.2 because 7.2 was too
                            large for DOS' memory model.

                            > > But I didn't try it out how
                            > > much the size differs between a multibyte and a non-multibyte build.
                            > > Therefore I wrote "_probably_ makes the resulting binary smaller" ;-)
                            > >
                            > > So if ripping out non-multibyte support does not make the code much
                            > > simpler or smaller I'd simply keep it. Do you have any idea much simpler
                            > > or smaller the code would be?
                            >
                            > Well, since supporting 16bit systems is still desirable, there'd be no
                            > change in code size.

                            Since 16-bit DOS is out of the picture, are there any other supported
                            OSes which *don't* have 32-bit integers? If so, that changes the weight
                            behind supporting the ability to build Vim without multibyte support.

                            Of course, this whole tangent is just about speculative advantages to
                            only supporting multibyte-capable Vim builds.

                            The primary point of my original post is still to determine whether
                            there are any impediments preventing Vim from using UTF-8 for the
                            default 'encoding' and determining 'termencoding' from the user's
                            locale. Anything else that would happen because of that is just icing
                            on the cake.

                            --
                            James
                            GPG Key: 1024D/61326D40 2003-09-02 James Vega <jamessan@...>
                          • Tony Mechelynck
                            Message 13 of 21 , Mar 4, 2009
                              On 04/03/09 08:24, James Vega wrote:
                              > On Wed, Mar 04, 2009 at 01:27:29AM -0500, James Vega wrote:
                              >> On Tue, Mar 03, 2009 at 01:12:36PM +0100, Dennis Benzinger wrote:
                              >>> I meant systems which have or can use only a small amount of memory. For
                              >>> example (16bit) MS-DOS where you can only use 640KB. These systems may
                              >>> be rare nowadays but if you'll encounter one you'd probably be happy to
                              >>> be able to minimize the size of the binary.
                              >> Indeed, but there are currently checks that prevent Vim from building
                              >> with multibyte support on such systems (ints that are smaller than 32
                              >> bit). I guess supporting such OSes would be a reason not to disallow
                              >> building without multibyte entirely.
                              >>
                              >> That does raise the question of where the trade-off between keeping
                              >> legacy, mostly unused code versus dropping support occurs.
                              >
                              > Actually, according to<http://www.vim.org/download.php>, the 16-bit DOS
                              > executable stopped being provided as of Vim 7.2 because 7.2 was too
                              > large for DOS' memory model.
                              >
                              >>> But I didn't try it out how
                              >>> much the size differs between a multibyte and a non-multibyte build.
                              >>> Therefore I wrote "_probably_ makes the resulting binary smaller" ;-)
                              >>>
                              >>> So if ripping out non-multibyte support does not make the code much
                              >>> simpler or smaller I'd simply keep it. Do you have any idea much simpler
                              >>> or smaller the code would be?
                              >> Well, since supporting 16bit systems is still desirable, there'd be no
                              >> change in code size.
                              >
                              > Since 16-bit DOS is out of the picture, are there any other supported
                              > OSes which *don't* have 32-bit integers? If so, that changes the weight
                              > behind supporting the ability to build Vim without multibyte support.
                              >
                              > Of course, this whole tangent is just about speculative advantages to
                              > only supporting multibyte-capable Vim builds.
                              >
                              > The primary point of my original post is still to determine whether
                              > there are any impediments preventing Vim from using UTF-8 for the
                              > default 'encoding' and determining 'termencoding' from the user's
                              > locale. Anything else that would happen because of that is just icing
                              > on the cake.
                              >

                              I don't know how large integers are on zOS (with EBCDIC). I guess
                              they are large enough, since this is a Unix-like OS (but not Linux)
                              for IBM mainframes. However, according to the latest os_390.txt
                              (under |zOS-weaknesses|), that port of Vim has no multibyte support.
                              The zOS port of Vim was apparently made by IBM software engineers in
                              their spare time, "just for fun because they liked Vim", and I don't
                              know how active it might still be. Bram might know, but don't ask IBM.

                              Best regards,
                              Tony.
                              --
                              Famous, adj.:
                              Conspicuously miserable.
                              -- Ambrose Bierce

                            • Antonio Colombo
                              Message 14 of 21 , Mar 5, 2009
                                Hi everybody,

                                > I don't know how large integers are in zOS (with EBCDIC), I
                                > guess large
                                > enough, since this is a Unix-like OS (but not Linux) for IBM
                                > mainframes,

                                zOS has 32-bit and 64-bit integers. It never really had
                                16-bit integers (going back to 1964 or 1965). You could use
                                them, but the hardware registers have always been 32 bits
                                long, so the related 16-bit hardware instructions just
                                blanked out the leftmost part of the said registers.

                                zOS itself is NOT Unix-like, but the underlying architecture
                                can support Linux as well. I think we are speaking here of the
                                mainframe part of zOS, which can support a kind of Unix, more
                                or less in the same way Cygwin is supported under Windows.

                                Cheers, Antonio
                                --
                                /||\ | Antonio Colombo
                                / || \ | antonio@...
                                / () \ | azc100@...
                                (___||___) | azc10@...


                              • Matt Wozniski
                                Message 15 of 21 , Mar 13, 2009
                                  On Mon, Mar 2, 2009 at 8:40 PM, James Vega wrote:
                                  > With Vim's current behavior, 'encoding' is derived from the environment
                                  > and 'fileencoding'/'termencoding' derive from 'encoding' (modulo
                                  > 'fileencodings' affect on 'fenc').  This seems sub-optimal for various
                                  > reasons.
                                  >
                                  > 1) Vim is using an internal encoding derived from the environment which
                                  >   may or may not be able to represent the different file encodings
                                  >   encountered when editing various files.
                                  > 2) The encoding Vim uses for interpreting input from the user and
                                  >   determining how to display to the user is not directly derived from
                                  >   the user's environment.
                                  > 3) File encoding detection ('fencs') defaults to a value that is
                                  >   unlikely to correctly work with most interesting (non-ascii) files.
                                  >
                                  > Defaulting 'enc' to UTF-8 helps address these problems.
                                  >
                                  > 1) This is now a non-issue as Vim can internally represent all
                                  >   characters by converting them to their unicode counterpart.
                                  > 2) This can be addressed by making 'tenc' derive its value from the
                                  >   environment instead of from 'enc', which is more in line with the
                                  >   behavior implied by the name.
                                  > 3) File encoding detection now has a sane default value which means new
                                  >   users are less likely to encounter problems when editing files of
                                  >   various encodings.
                                  >
                                  > This change would also allow eliminating 'encoding' as an option or,
                                  > less drastic, disallowing changing 'enc' once the startup files have
                                  > been sourced.
                                  >
                                  > Changing 'enc' in a running Vim session is a very common mistake to new
                                  > Vim users that are trying to get their file written out in a specific
                                  > encoding or editing a file that's not in their environment's encoding.

                                  Yeah. We regularly see people in #vim who don't realize that they
                                  should be changing 'fenc' instead of 'enc', and I've seen it come up
                                  on vim-use a few times as well...
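As a rough analogy only (this is Python, not Vim, and the function name `convert` is made up): the internal representation stays Unicode the whole time, and only the encoding applied when reading or writing changes. That is the role 'fenc' plays, while 'enc' corresponds to the internal form that the user normally has no reason to touch.

```python
def convert(raw: bytes, read_as: str, write_as: str) -> bytes:
    """Decode bytes into the internal (Unicode) form, then encode for output.

    'read_as' and 'write_as' play the part of a per-file encoding; the
    internal str representation is never changed by the caller.
    """
    text = raw.decode(read_as)      # file bytes -> internal representation
    return text.encode(write_as)    # internal representation -> file bytes


latin1_bytes = b"caf\xe9"           # "café" stored as Latin-1
utf8_bytes = convert(latin1_bytes, "latin-1", "utf-8")
round_trip = convert(utf8_bytes, "utf-8", "latin-1")

print(utf8_bytes, round_trip)
```

The point of the analogy is that writing a file in a specific encoding never requires changing how the text is held in memory.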

                                  > The help already states that changing 'enc' in a running session is a
                                  > bad idea, and I know from experience that it can cause Vim to crash[0].
                                  > Taking the next logical step and preventing users from doing that
                                  > (unless someone can provide a compelling reason to continue allowing it)
                                  > makes sense and helps prevent potential data loss.

                                  This sounds like a very good idea to me. I don't know of any other
                                  programs that allow you to change the encoding used internally, and
                                  we would be in good company if we chose to always use a Unicode
                                  encoding internally: Java uses UTF-16 internally, and I believe
                                  Python does as well. Is there any time when it would be desirable to
                                  use a non-Unicode 'encoding' (assuming, of course, that +multi_byte
                                  is available)? I can't think of any.

                                  ~Matt

                                • Mike Williams
                                  Message 16 of 21 , Mar 13, 2009
                                    Matt Wozniski wrote:
                                    > On Mon, Mar 2, 2009 at 8:40 PM, James Vega wrote:
                                    >> With Vim's current behavior, 'encoding' is derived from the environment
                                    >> and 'fileencoding'/'termencoding' derive from 'encoding' (modulo
                                    >> 'fileencodings' affect on 'fenc'). This seems sub-optimal for various
                                    >> reasons.
                                    >>
                                    >> 1) Vim is using an internal encoding derived from the environment which
                                    >> may or may not be able to represent the different file encodings
                                    >> encountered when editing various files.
                                    >> 2) The encoding Vim uses for interpreting input from the user and
                                    >> determining how to display to the user is not directly derived from
                                    >> the user's environment.
                                    >> 3) File encoding detection ('fencs') defaults to a value that is
                                    >> unlikely to correctly work with most interesting (non-ascii) files.
                                    >>
                                    >> Defaulting 'enc' to UTF-8 helps address these problems.
                                    >>
                                    >> 1) This is now a non-issue as Vim can internally represent all
                                    >> characters by converting them to their unicode counterpart.
                                    >> 2) This can be addressed by making 'tenc' derive its value from the
                                    >> environment instead of from 'enc', which is more in line with the
                                    >> behavior implied by the name.
                                    >> 3) File encoding detection now has a sane default value which means new
                                    >> users are less likely to encounter problems when editing files of
                                    >> various encodings.
                                    >>
                                    >> This change would also allow eliminating 'encoding' as an option or,
                                    >> less drastic, disallowing changing 'enc' once the startup files have
                                    >> been sourced.
                                    >>
                                    >> Changing 'enc' in a running Vim session is a very common mistake to new
                                    >> Vim users that are trying to get their file written out in a specific
                                    >> encoding or editing a file that's not in their environment's encoding.
                                    >
                                    > Yeah. We regularly see people in #vim who don't realize that they
                                    > should be changing 'fenc' instead of 'enc', and I've seen it come up
                                    > on vim-use a few times as well...
                                    >
                                    >> The help already states that changing 'enc' in a running session is a
                                    >> bad idea, and I know from experience that it can cause Vim to crash[0].
                                    >> Taking the next logical step and preventing users from doing that
                                    >> (unless someone can provide a compelling reason to continue allowing it)
                                    >> makes sense and helps prevent potential data loss.
                                    >
                                    > This sounds like a very good idea to me. I don't know of any other
                                    > programs that allow you to change encoding used internally, and we
                                    > would be in good company if we chose to always use a unicode encoding
                                    > internally: Java uses UTF-16 internally, and I believe python does as
                                    > well. Is there any time when it would be desirable to use a
                                    > non-unicode 'encoding' (assuming, of course, that +multi_byte is
                                    > available)? I can't think of any.

                                    Yes, editing very large (say a few 100MB) data files that are in a
                                    single-byte encoding. For my day job I regularly enjoy having to
                                    spelunk my way around large files containing a mix of readable ASCII
                                    and binary data. Using a Unicode encoding could make this
                                    prohibitive. Yes, this is essentially a raw file edit mode; perhaps
                                    that should be an option, or would it be part of setting binary mode?

                                    TTFN

                                    Mike
                                    --
                                    I am not young enough to know everything.

                                  • Matt Wozniski
                                    Message 17 of 21 , Mar 13, 2009
                                      On Fri, Mar 13, 2009 at 12:01 PM, Mike Williams wrote:
                                      >
                                      > Matt Wozniski wrote:
                                      >> This sounds like a very good idea to me.  I don't know of any other
                                      >> programs that allow you to change encoding used internally, and we
                                      >> would be in good company if we chose to always use a unicode encoding
                                      >> internally: Java uses UTF-16 internally, and I believe python does as
                                      >> well.  Is there any time when it would be desirable to use a
                                      >> non-unicode 'encoding' (assuming, of course, that +multi_byte is
                                      >> available)?  I can't think of any.
                                      >
                                      > Yes, editing very large (say a few 100MB) data files that in a single
                                      > byte encoding.  For my day job I regularly enjoy having to spelunk my
                                      > way around large files containing a mix of readable ASCII and binary
                                      > data.  Using a Unicode encoding could make this prohibitive.  Yes, this
                                      > is essentially a raw file edit mode, perhaps that should be an option -
                                      > or would it be part of setting binary mode?

                                      How would using Unicode for 'enc' in any way affect this? Sure, you'd
                                      want to use a single-byte 'fenc', but no one is suggesting that the
                                      'fenc' option should be removed. If there is a reason why editing
                                      binary files should be affected at all by what encoding the editor
                                      uses for storing the buffer text internally, I don't see it and you'll
                                      need to elaborate.

                                      ~Matt

                                    • Mike Williams
                                      Message 18 of 21 , Mar 13, 2009
                                        Matt Wozniski wrote:
                                        > On Fri, Mar 13, 2009 at 12:01 PM, Mike Williams wrote:
                                        >> Matt Wozniski wrote:
                                        >>> This sounds like a very good idea to me. I don't know of any other
                                        >>> programs that allow you to change encoding used internally, and we
                                        >>> would be in good company if we chose to always use a unicode encoding
                                        >>> internally: Java uses UTF-16 internally, and I believe python does as
                                        >>> well. Is there any time when it would be desirable to use a
                                        >>> non-unicode 'encoding' (assuming, of course, that +multi_byte is
                                        >>> available)? I can't think of any.
                                        >> Yes, editing very large (say a few 100MB) data files that in a single
                                        >> byte encoding. For my day job I regularly enjoy having to spelunk my
                                        >> way around large files containing a mix of readable ASCII and binary
                                        >> data. Using a Unicode encoding could make this prohibitive. Yes, this
                                        >> is essentially a raw file edit mode, perhaps that should be an option -
                                        >> or would it be part of setting binary mode?
                                        >
                                        > How would using Unicode for 'enc' in any way affect this? Sure, you'd
                                        > want to use a single-byte 'fenc', but no one is suggesting that the
                                        > 'fenc' option should be removed. If there is a reason why editing
                                        > binary files should be affected at all by what encoding the editor
                                        > uses for storing the buffer text internally, I don't see it and you'll
                                        > need to elaborate.

                                        With a UTF-16 internal encoding a 250MB data file blossoms into a nice
                                        round 500MB. For all the cheap memory these days this will still have
                                        an effect on system performance - time to allocate, paging out of idle
                                        apps to disk, etc.
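How much a single-byte file grows depends on which Unicode encoding holds it. A small Python sketch, assuming every input byte is treated as one Latin-1 character (the helper name `encoded_sizes` and the sample data are made up; this says nothing about what Vim itself does internally):

```python
def encoded_sizes(raw: bytes) -> dict:
    """Treat each input byte as one Latin-1 character and report the size
    of the resulting text in several encodings."""
    text = raw.decode("latin-1")  # every byte value 0..255 maps to one character
    return {
        "latin-1": len(text.encode("latin-1")),
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),  # -le variant: no byte-order mark
    }


ascii_sample = b"plain ASCII header\n" * 1000   # bytes below 0x80 only
binary_sample = bytes(range(256)) * 100         # every byte value equally often

print(encoded_sizes(ascii_sample))
print(encoded_sizes(binary_sample))
```

UTF-16 doubles the size in both cases. UTF-8 leaves pure ASCII unchanged and grows only the bytes at 0x80 and above (two bytes each for this range), so a mostly-ASCII file grows by much less than a factor of two.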

                                        And will Vim internally use a canonical Unicode form? What happens
                                        if I want to insert some 8-bit data whose Unicode character has
                                        multiple forms? Which one is used? How will I know that the 8-bit
                                        value I intend does not appear as a composed sequence? I haven't
                                        used Vim for editing Unicode with composing characters (damn my
                                        native English country). I see there is some discussion on
                                        composing, but at first glance it is not clear whether it is
                                        automatic or not. In my case I would not want deletion of a data
                                        byte to result in other bytes being deleted as well.
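The "multiple forms" concern can be made concrete with Python's standard `unicodedata` module. This is only a sketch of the Unicode side of the question, not of Vim's behaviour; the helper `to_latin1_bytes` is made up for illustration.

```python
import unicodedata

composed = "\u00e9"     # precomposed: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT


def to_latin1_bytes(text: str) -> bytes:
    """Normalize to NFC first, so a base letter plus combining mark collapses
    to the single code point that Latin-1 can represent, then encode."""
    return unicodedata.normalize("NFC", text).encode("latin-1")


# Decoding one Latin-1 byte gives exactly one code point (the precomposed form).
from_byte = b"\xe9".decode("latin-1")

print(len(from_byte), len(composed), len(decomposed))
print(from_byte == composed, composed == decomposed)
print(to_latin1_bytes(composed) == to_latin1_bytes(decomposed))
```

Plain decoding and encoding here is a one-byte-to-one-code-point mapping; the two spellings only become interchangeable when normalization is requested explicitly.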

                                        At the moment I cannot see how supporting Unicode semantics maps to
                                        editing binary data files. Not saying it is impossible, I'd just like
                                        to see the possible way out of the woods if we did go this way.

                                        TTFN

                                        Mike
                                        --
                                        Imagination is more important than knowledge.

                                        --~--~---------~--~----~------------~-------~--~----~
                                        You received this message from the "vim_dev" maillist.
                                        For more information, visit http://www.vim.org/maillist.php
                                        -~----------~----~----~----~------~----~------~--~---
                                      • Tony Mechelynck
                                        Message 19 of 21 , Mar 14, 2009
                                          On 13/03/09 17:22, Mike Williams wrote:
                                          >
                                          > Matt Wozniski wrote:
                                          >> On Fri, Mar 13, 2009 at 12:01 PM, Mike Williams wrote:
                                          >>> Matt Wozniski wrote:
                                          >>>> This sounds like a very good idea to me. I don't know of any other
                                          >>>> programs that allow you to change encoding used internally, and we
                                          >>>> would be in good company if we chose to always use a unicode encoding
                                          >>>> internally: Java uses UTF-16 internally, and I believe python does as
                                          >>>> well. Is there any time when it would be desirable to use a
                                          >>>> non-unicode 'encoding' (assuming, of course, that +multi_byte is
                                          >>>> available)? I can't think of any.
                                          >>> Yes, editing very large (say a few 100MB) data files that are in a
                                          >>> single-byte encoding. For my day job I regularly enjoy having to spelunk my
                                          >>> way around large files containing a mix of readable ASCII and binary
                                          >>> data. Using a Unicode encoding could make this prohibitive. Yes, this
                                          >>> is essentially a raw file edit mode, perhaps that should be an option -
                                          >>> or would it be part of setting binary mode?
                                          >>
                                          >> How would using Unicode for 'enc' in any way affect this? Sure, you'd
                                          >> want to use a single-byte 'fenc', but no one is suggesting that the
                                          >> 'fenc' option should be removed. If there is a reason why editing
                                          >> binary files should be affected at all by what encoding the editor
                                          >> uses for storing the buffer text internally, I don't see it and you'll
                                          >> need to elaborate.
                                          >
                                          > With a UTF-16 internal encoding a 250MB data file blossoms into a nice
                                          > round 500MB. For all the cheap memory these days this will still have
                                          > an effect on system performance - time to allocate, paging out of idle
                                          > apps to disk, etc.

                                          Vim doesn't use UTF-16 internally but UTF-8 -- even if you set
                                          'encoding' to, let's say, utf-16le, because Vim cannot tolerate actual
                                          nulls in the middle of lines. This also means there is no space loss for
                                          7-bit ASCII, which is represented identically in ASCII, Latin1, UTF-8,
                                          and indeed also in most iso-8859 encodings.
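A minimal sketch of the size argument, assuming Python 3 as a neutral way to count bytes (this is not Vim code, and the sample text is made up): pure 7-bit ASCII occupies the same number of bytes in ASCII, Latin1 and UTF-8, while UTF-16 doubles it and introduces NUL bytes.

```python
# Compare the storage size of the same 7-bit ASCII text in several encodings.
text = "plain ASCII line\n" * 4  # hypothetical sample data

sizes = {
    name: len(text.encode(name))
    for name in ("ascii", "latin-1", "utf-8", "utf-16-le")
}

# UTF-16 stores every ASCII character with an extra zero byte.
has_nul_in_utf16 = b"\x00" in "A".encode("utf-16-le")
has_nul_in_utf8 = b"\x00" in "A".encode("utf-8")

for name, size in sizes.items():
    print(f"{name:10} {size} bytes")
```

The byte counts only differ once the text contains characters above 0x7F, which is the case the rest of this message is about.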

                                          >
                                          > And will VIM internally use a canonical Unicode form? What happens if I
                                          > want to insert some 8-bit data whose Unicode character has multiple
                                          > forms? Which one is used? How will I know that the 8-bit value I
                                          > intend does not appear as a composed sequence? I haven't used VIM for
                                          > editing Unicode with composing characters (damn my native English
                                          > country) - I see there is some discussion on composing but at first
                                          > glance it is not clear whether it is automatic or not. In my case I
                                          > would not want deletion of a data byte to result in other bytes being
                                          > deleted as well.
                                          >
                                          > At the moment I cannot see how supporting Unicode semantics maps to
                                          > editing binary data files. Not saying it is impossible, I'd just like
                                          > to see the possible way out of the woods if we did go this way.
                                          >
                                          > TTFN
                                          >
                                          > Mike

                                          IMHO, binary data should be read "as if" 8-bit because in an 8-bit
                                          'fileencoding' there are no "invalid" byte sequences -- and probably
                                          Latin1 because the conversion Latin1 <=> UTF-8 is trivial and requires
                                          no iconv library. An alternate possibility (but to be used only at
                                          user's explicit request IMHO) is to convert binary to hex and vice-versa
                                          via xxd.
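To illustrate why Latin1 <=> UTF-8 needs no iconv library, here is a rough sketch in Python 3. The function names are hypothetical and this is not how Vim implements it; it only shows that each Latin1 byte maps to either one or two UTF-8 bytes by simple bit shuffling, and that every possible byte value survives the round trip.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Convert Latin1 bytes to UTF-8 by hand (no codec library needed)."""
    out = bytearray()
    for b in data:
        if b < 0x80:
            out.append(b)                  # 7-bit ASCII is unchanged
        else:
            out.append(0xC0 | (b >> 6))    # lead byte: 0xC2 or 0xC3
            out.append(0x80 | (b & 0x3F))  # continuation byte
    return bytes(out)


def utf8_to_latin1(data: bytes) -> bytes:
    """Inverse of latin1_to_utf8; assumes input holds only U+0000..U+00FF."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(b)
            i += 1
        else:
            # Two-byte sequence: recombine the low bits of both bytes.
            out.append(((b & 0x03) << 6) | (data[i + 1] & 0x3F))
            i += 2
    return bytes(out)


# Every byte value 0x00..0xFF is a valid Latin1 character.
all_bytes = bytes(range(256))
round_trip = utf8_to_latin1(latin1_to_utf8(all_bytes))
```

Because no byte is ever "invalid" in Latin1, arbitrary binary data can go through this mapping and come back bit-for-bit identical.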

                                          However, this is not what Vim does if you read a file with ++bin: what
                                          it does is "no conversion", which means that if 'encoding' is set to
                                          UTF-8 you'll probably get invalid UTF-8 sequences at many places in your
                                          code. For instance an a-acute in Spanish Latin1 text will appear as <e1>
                                          instead of á and an e-circumflex in French Latin1 text will appear as
                                          <ea> instead of ê. Not very convenient if they happen to be within text
                                          strings -- messages, maybe, to be typed out on the screen. So even if
                                          you know that the code is binary you might prefer to use

                                          :e ++enc=latin1 ++ff=unix foobar.bin

                                          and omit the 'binary' setting. The result, if you make changes and save
                                          them, could be an extra 0x0A at the very end if there wasn't one
                                          already, but I don't expect trouble even if it happens. (Overlong lines
                                          might be split if you were on a 16-bit machine, but on 32-bit machines
                                          the maximum line size and the maximum file size are both 2GB, and even
                                          on a 64-bit machine I don't expect you'll often have to edit a binary
                                          file containing a 2GB stretch of code without a single 0x0A in it.)
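The difference between "no conversion" and reading as Latin1 can be sketched in Python 3 as follows. This only imitates the effect described above; the `<xx>` rendering is done by a hypothetical error handler registered here, not by anything taken from Vim, and the sample bytes are made up.

```python
import codecs


def _angle_hex(err):
    """Render each undecodable byte as <xx>, similar to the display above."""
    bad = err.object[err.start:err.end]
    return "".join(f"<{b:02x}>" for b in bad), err.end


# "anglehex" is a name invented for this sketch.
codecs.register_error("anglehex", _angle_hex)

# Latin1 bytes containing e-acute (0xE9) and a-acute (0xE1).
raw = b"caf\xe9 ol\xe1"

# Treating the bytes as UTF-8 without conversion: the high bytes are invalid.
as_utf8 = raw.decode("utf-8", errors="anglehex")

# Treating them as Latin1: every byte is a valid character.
as_latin1 = raw.decode("latin-1")

# Without 'binary', a missing final line ending is added on write.
saved = raw if raw.endswith(b"\n") else raw + b"\n"

print(as_utf8)
print(as_latin1)
```

The last step models the extra 0x0A mentioned above: the saved data differs from the original by at most that one trailing byte.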

                                          Of course, the utmost care should be used when editing binary files
                                          because, if it is e.g. program code,
                                          - the code can contain displacements in binary, which will become
                                            invalid if the length of the intervening text is modified
                                          - executable code should in general not be touched
                                          - compressed binaries are probably not editable in any way
                                          - and what if the program includes a binary hash of its ASCII text
                                            somewhere?

                                          As for canonical forms: I don't think Vim will spontaneously convert
                                          either way between a spacing character + combining character(s) combo
                                          and a precomposed character. If you type a then Ctrl-V u 0301 you'll get
                                          a spacing a and a combining acute. If your keyboard allows "keying" an
                                          a-acute character, or if you type Ctrl-V x e1, you'll get a precomposed
                                          a-acute. The two results will be indistinguishable if you have a "good"
                                          font but Vim doesn't know that, and searching for the precomposed
                                          character will not match the ASCII + accent two-codepoint combo.
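The two forms can be compared outside Vim with a short Python 3 sketch using the standard `unicodedata` module. It shows that the sequences differ codepoint for codepoint, so a plain search fails, and that explicit normalization is what makes them equal. Whether and when an editor should normalize is a separate question that this sketch does not answer.

```python
import unicodedata

decomposed = "a\u0301"   # spacing a + combining acute (two codepoints)
precomposed = "\u00e1"   # precomposed a-acute (one codepoint)

# A plain substring search for the precomposed form misses the combo.
haystack = "x" + decomposed + "y"
found_without_normalizing = precomposed in haystack

# Normalizing both sides to NFC makes the search succeed.
found_after_nfc = precomposed in unicodedata.normalize("NFC", haystack)

# The two canonical forms convert into each other.
to_nfc = unicodedata.normalize("NFC", decomposed)
to_nfd = unicodedata.normalize("NFD", precomposed)

print(len(decomposed), len(precomposed))
print(found_without_normalizing, found_after_nfc)
```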


                                          Best regards,
                                          Tony.
                                          --
                                          "Can you hammer a 6-inch spike into a wooden plank with your penis?"

                                          "Uh, not right now."

                                          "Tsk. A girl has to have some standards."
                                          -- "Real Genius"
