Re: [RFC] Default 'encoding' to UTF-8

  • Markus Heidelberg
    Message 1 of 21 , Mar 3, 2009
      Dennis Benzinger, 03.03.2009:
      >
      > Hi!
      >
      > Am 03.03.2009 06:40, James Vega schrieb:
      > > [...]
      > >> 2) Vim compiled with the --disable-multibyte configure option cannot use
      > >> UTF-8, or any other multibyte encoding; in fact it doesn't even accept
      > >> the 'encoding' option as valid.
      > >
      > > Is there a reason to allow building Vim without multibyte support?
      > > Always having multibyte support would make the code simpler/smaller.
      >
      > It would make the code smaller but compiling without multibyte support
      > probably makes the resulting binary smaller. That can make a big
      > difference for users on resource constrained systems.

      What do you mean exactly with "resource constrained systems"?
      On an old PC, Vim with multibyte should still run fast.
      On embedded devices people normally use vi from the busybox package.
      Development is not done on these devices, mostly just editing config
      files. No need for a featureful editor like Vim.

      But now that multibyte support is optional and people are using versions
      without it, it should of course not be thrown out unnecessarily.

      Markus


      --~--~---------~--~----~------------~-------~--~----~
      You received this message from the "vim_dev" maillist.
      For more information, visit http://www.vim.org/maillist.php
      -~----------~----~----~----~------~----~------~--~---
    • Markus Heidelberg
      Message 2 of 21 , Mar 3, 2009
        Tony Mechelynck, 03.03.2009:
        >
        > On 03/03/09 06:40, James Vega wrote:
        > > On Tue, Mar 03, 2009 at 03:32:45AM +0100, Tony Mechelynck wrote:
        > >> 2) Vim compiled with the --disable-multibyte configure option cannot use
        > >> UTF-8, or any other multibyte encoding; in fact it doesn't even accept
        > >> the 'encoding' option as valid.
        > >
        > > Is there a reason to allow building Vim without multibyte support?
        > > Always having multibyte support would make the code simpler/smaller.
        >
        > With +multi_byte is always bigger than -multi_byte: one reason could be
        > making the Vim binary really "lean and mean". Personally I keep two Vim
        > builds on this computer: a Huge build named vim, with GTK2/Gnome2 GUI
        > (and +multi_byte), used via softlinks for most possible executable
        > names, and a Tiny build named vi (with no GUI and -multi_byte).

        Why the tiny build without multibyte? Is this only a fallback in case of
        system problems, when root has to edit config files, where you know,
        they don't contain multibyte characters?

        Markus


      • Dennis Benzinger
        Message 3 of 21 , Mar 3, 2009
          Hi Markus!

          Am 03.03.2009 11:14, Markus Heidelberg schrieb:
          > Dennis Benzinger, 03.03.2009:
          >>
          >> Hi!
          >>
          >> Am 03.03.2009 06:40, James Vega schrieb:
          >> > [...]
          >> >> 2) Vim compiled with the --disable-multibyte configure option cannot use
          >> >> UTF-8, or any other multibyte encoding; in fact it doesn't even accept
          >> >> the 'encoding' option as valid.
          >> >
          >> > Is there a reason to allow building Vim without multibyte support?
          >> > Always having multibyte support would make the code simpler/smaller.
          >>
          >> It would make the code smaller but compiling without multibyte support
          >> probably makes the resulting binary smaller. That can make a big
          >> difference for users on resource constrained systems.
          >
          > What do you mean exactly with "resource constrained systems"?
          > On an old PC, Vim with multibyte should still run fast.
          > [...]

          I meant systems which have or can use only a small amount of memory. For
          example (16bit) MS-DOS where you can only use 640KB. These systems may
          be rare nowadays but if you encounter one you'd probably be happy to
          be able to minimize the size of the binary. But I didn't try out how
          much the size differs between a multibyte and a non-multibyte build.
          Therefore I wrote "_probably_ makes the resulting binary smaller" ;-)

          So if ripping out non-multibyte support does not make the code much
          simpler or smaller I'd simply keep it. Do you have any idea how much simpler
          or smaller the code would be?


          Dennis Benzinger

        • Markus Heidelberg
          Message 4 of 21 , Mar 3, 2009
            Dennis Benzinger, 03.03.2009:
            >
            > Hi Markus!
            >
            > Am 03.03.2009 11:14, Markus Heidelberg schrieb:
            > > Dennis Benzinger, 03.03.2009:
            > >>
            > >> Hi!
            > >>
            > >> Am 03.03.2009 06:40, James Vega schrieb:
            > >> > [...]
            > >> >> 2) Vim compiled with the --disable-multibyte configure option cannot use
            > >> >> UTF-8, or any other multibyte encoding; in fact it doesn't even accept
            > >> >> the 'encoding' option as valid.
            > >> >
            > >> > Is there a reason to allow building Vim without multibyte support?
            > >> > Always having multibyte support would make the code simpler/smaller.
            > >>
            > >> It would make the code smaller but compiling without multibyte support
            > >> probably makes the resulting binary smaller. That can make a big
            > >> difference for users on resource constrained systems.
            > >
            > > What do you mean exactly with "resource constrained systems"?
            > > On an old PC, Vim with multibyte should still run fast.
            > > [...]
            >
            > I meant systems which have or can use only a small amount of memory. For
            > example (16bit) MS-DOS where you can only use 640KB. These systems may
            > be rare nowadays but if you'll encounter one you'd probably be happy to
            > be able to minimize the size of the binary. But I didn't try it out how
            > much the size differs between a multibyte and a non-multibyte build.
            > Therefore I wrote "_probably_ makes the resulting binary smaller" ;-)

            No, that's for sure :)

            > So if ripping out non-multibyte support does not make the code much
            > simpler or smaller I'd simply keep it. Do you have any idea much simpler
            > or smaller the code would be?

            Not sure, a lot of #ifdef would vanish.

            Markus


          • Tony Mechelynck
            Message 5 of 21 , Mar 3, 2009
              On 03/03/09 11:20, Markus Heidelberg wrote:
              > Tony Mechelynck, 03.03.2009:
              >> On 03/03/09 06:40, James Vega wrote:
              >>> On Tue, Mar 03, 2009 at 03:32:45AM +0100, Tony Mechelynck wrote:
              >>>> 2) Vim compiled with the --disable-multibyte configure option cannot use
              >>>> UTF-8, or any other multibyte encoding; in fact it doesn't even accept
              >>>> the 'encoding' option as valid.
              >>> Is there a reason to allow building Vim without multibyte support?
              >>> Always having multibyte support would make the code simpler/smaller.
              >> With +multi_byte is always bigger than -multi_byte: one reason could be
              >> making the Vim binary really "lean and mean". Personally I keep two Vim
              >> builds on this computer: a Huge build named vim, with GTK2/Gnome2 GUI
              >> (and +multi_byte), used via softlinks for most possible executable
              >> names, and a Tiny build named vi (with no GUI and -multi_byte).
              >
              > Why the tiny build without multibyte? Is this only a fallback in case of
              > system problems, when root has to edit config files, where you know,
              > they don't contain multibyte characters?
              >
              > Markus

              That, and also a "sanity check" that the latest patches also work with a
              minimal config, so if they don't I can warn Bram immediately. Once I was
              very happy to have it, in order to be able to intervene halfway through a
              system install run, when my Huge GTK2/Gnome2 build wouldn't load because of
              missing libraries.


              Best regards,
              Tony.
              --
              "Even nowadays a man can't step up and kill a woman without feeling
              just a bit unchivalrous ..."
              -- Robert Benchley

            • Tony Mechelynck
              Message 6 of 21 , Mar 3, 2009
                On 03/03/09 13:12, Dennis Benzinger wrote:
                > Hi Markus!
                >
                > Am 03.03.2009 11:14, Markus Heidelberg schrieb:
                >> Dennis Benzinger, 03.03.2009:
                >>> Hi!
                >>>
                >>> Am 03.03.2009 06:40, James Vega schrieb:
                >>>> [...]
                >>>>> 2) Vim compiled with the --disable-multibyte configure option cannot use
                >>>>> UTF-8, or any other multibyte encoding; in fact it doesn't even accept
                >>>>> the 'encoding' option as valid.
                >>>> Is there a reason to allow building Vim without multibyte support?
                >>>> Always having multibyte support would make the code simpler/smaller.
                >>> It would make the code smaller but compiling without multibyte support
                >>> probably makes the resulting binary smaller. That can make a big
                >>> difference for users on resource constrained systems.
                >> What do you mean exactly with "resource constrained systems"?
                >> On an old PC, Vim with multibyte should still run fast.
                >> [...]
                >
                > I meant systems which have or can use only a small amount of memory. For
                > example (16bit) MS-DOS where you can only use 640KB. These systems may
                > be rare nowadays but if you'll encounter one you'd probably be happy to
                > be able to minimize the size of the binary. But I didn't try it out how
                > much the size differs between a multibyte and a non-multibyte build.
                > Therefore I wrote "_probably_ makes the resulting binary smaller" ;-)
                >
                > So if ripping out non-multibyte support does not make the code much
                > simpler or smaller I'd simply keep it. Do you have any idea much simpler
                > or smaller the code would be?
                >
                >
                > Dennis Benzinger

                I did try:
                - vim (gvim with all bells and whistles except +mzscheme) 3370388 bytes.
                - vi (vim with minimal features) 508048 bytes
                6.63 times smaller

                Both compiled on the same Linux-i686 system with the same 7.2.130
                sources (but different config options), and both binaries "stripped" of
                their debug info. The difference consists not only of +multi_byte but of
                everything which I knew how to enable/disable at compile-time. These are
                32-bit binaries; I suspect 16-bit builds would be smaller -- hopefully
                they would, because 508k is still big for a DOS machine without Extended
                Memory.
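
The ratio quoted above, and the remark that 508k is still big for plain DOS, can be checked with a short Python sketch. The two byte counts are the ones reported in this message; the 640 KB figure is the conventional-memory limit mentioned earlier in the thread, and treating 1 KB as 1024 bytes is an assumption of this sketch. The function names are illustrative only.

```python
# Sizes in bytes as reported in the message above (stripped 32-bit Linux binaries).
HUGE_VIM_BYTES = 3370388   # "vim": gvim with nearly every feature enabled
TINY_VI_BYTES = 508048     # "vi": minimal feature set

# Conventional memory limit mentioned earlier in the thread (assumes 1 KB = 1024 bytes).
DOS_CONVENTIONAL_BYTES = 640 * 1024


def size_ratio(larger: int, smaller: int) -> float:
    """Return how many times larger the first binary is than the second."""
    if smaller <= 0:
        raise ValueError("smaller size must be positive")
    return larger / smaller


def fraction_of(limit: int, size: int) -> float:
    """Return the share of `limit` that a binary of `size` bytes would occupy."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return size / limit


print(f"{size_ratio(HUGE_VIM_BYTES, TINY_VI_BYTES):.2f} times smaller")
print(f"{fraction_of(DOS_CONVENTIONAL_BYTES, TINY_VI_BYTES):.1%} of 640 KB")
```

The second figure only compares the on-disk size with the 640 KB limit; it says nothing about the run-time memory a 16-bit build would actually need.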


                Best regards,
                Tony.
                --
                Remember, UNIX spelled backwards is XINU.

              • Tony Mechelynck
                Message 7 of 21 , Mar 3, 2009
                  On 04/03/09 02:57, Markus Heidelberg wrote:
                  > Dennis Benzinger, 03.03.2009:
                  [...]
                  >> So if ripping out non-multibyte support does not make the code much
                  >> simpler or smaller I'd simply keep it. Do you have any idea much simpler
                  >> or smaller the code would be?
                  >
                  > Not sure, a lot of #ifdef would vanish.
                  >
                  > Markus

                  That would make the source smaller and simpler, but not the object code,
                  since "false #ifdef" sections are removed by the preprocessor before the
                  resulting C code is parsed.

                  Best regards,
                  Tony.
                  --
                  The shortest distance between two points is under construction.
                  -- Noelie Alito

                • James Vega
                  Message 8 of 21 , Mar 3, 2009
                    On Tue, Mar 03, 2009 at 01:12:36PM +0100, Dennis Benzinger wrote:
                    >
                    > Hi Markus!
                    >
                    > Am 03.03.2009 11:14, Markus Heidelberg schrieb:
                    > > Dennis Benzinger, 03.03.2009:
                    > >>
                    > >> Hi!
                    > >>
                    > >> Am 03.03.2009 06:40, James Vega schrieb:
                    > >> > [...]
                    > >> >> 2) Vim compiled with the --disable-multibyte configure option cannot use
                    > >> >> UTF-8, or any other multibyte encoding; in fact it doesn't even accept
                    > >> >> the 'encoding' option as valid.
                    > >> >
                    > >> > Is there a reason to allow building Vim without multibyte support?
                    > >> > Always having multibyte support would make the code simpler/smaller.
                    > >>
                    > >> It would make the code smaller but compiling without multibyte support
                    > >> probably makes the resulting binary smaller. That can make a big
                    > >> difference for users on resource constrained systems.
                    > >
                    > > What do you mean exactly with "resource constrained systems"?
                    > > On an old PC, Vim with multibyte should still run fast.
                    > > [...]
                    >
                    > I meant systems which have or can use only a small amount of memory. For
                    > example (16bit) MS-DOS where you can only use 640KB. These systems may
                    > be rare nowadays but if you'll encounter one you'd probably be happy to
                    > be able to minimize the size of the binary.

                    Indeed, but there are currently checks that prevent Vim from building
                    with multibyte support on such systems (ints that are smaller than 32
                    bit). I guess supporting such OSes would be a reason not to disallow
                    building without multibyte entirely.
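
One plausible reason for such a check, sketched in Python: if a character value is held in a single int (an assumption here, the message above only states that the check exists), then a 16-bit int cannot hold every Unicode code point. The helper name is hypothetical.

```python
import sys


def fits_in_signed_int(value: int, width_bits: int) -> bool:
    """Return True if value is representable as a signed two's-complement
    integer of the given width."""
    low = -(1 << (width_bits - 1))
    high = (1 << (width_bits - 1)) - 1
    return low <= value <= high


# Highest Unicode code point (0x10FFFF on Python 3.3 and later).
MAX_CODE_POINT = sys.maxunicode

for width in (16, 32):
    verdict = "fits" if fits_in_signed_int(MAX_CODE_POINT, width) else "does not fit"
    print(
        f"U+{MAX_CODE_POINT:X} needs {MAX_CODE_POINT.bit_length()} bits "
        f"and {verdict} in a {width}-bit int"
    )
```

Characters up to U+7FFF would still fit in a signed 16-bit int, so this only illustrates why full multibyte support and 16-bit ints do not mix well.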

                    That does raise the question of where the trade-off between keeping
                    legacy, mostly unused code versus dropping support occurs.

                    > But I didn't try it out how
                    > much the size differs between a multibyte and a non-multibyte build.
                    > Therefore I wrote "_probably_ makes the resulting binary smaller" ;-)
                    >
                    > So if ripping out non-multibyte support does not make the code much
                    > simpler or smaller I'd simply keep it. Do you have any idea much simpler
                    > or smaller the code would be?

                    Well, since supporting 16bit systems is still desirable, there'd be no
                    change in code size.

                    Just for the sake of argument, though, it would remove 933
                    '#ifdef FEAT_MBYTE' (or equivalent) conditional parts of the code and 4
                    '#ifndef FEAT_MBYTE' (or equivalent). How many of the #ifdef scenarios
                    have a paired #else would require more investigation than I'm willing to
                    do for the sake of argument. :)

                    As for the resulting binary sizes:

                    features=tiny, with multibyte: 560.9k
                    features=tiny, w/out multibyte: 493.4k
                    67k or 12% saving

                    features=small, with multibyte: 618.7k
                    features=small, w/out multibyte: 551.1k
                    67k or 11% saving

                    features=normal, with multibyte: 1390.3k
                    features=normal, w/out multibyte: 1279.0k
                    111k or 8% saving
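
The savings listed above can be reproduced from the six sizes with a few lines of Python. The percentages match when the saving is taken relative to the size of the multibyte build; that base is inferred from the numbers, not stated in the message.

```python
# (feature set, size with multibyte in KB, size without multibyte in KB),
# copied from the figures above.
BUILDS = [
    ("tiny", 560.9, 493.4),
    ("small", 618.7, 551.1),
    ("normal", 1390.3, 1279.0),
]


def saving(with_mb: float, without_mb: float) -> tuple[float, float]:
    """Return (KB saved, percent saved relative to the multibyte build)."""
    if with_mb <= 0:
        raise ValueError("size of the multibyte build must be positive")
    saved = with_mb - without_mb
    return saved, 100.0 * saved / with_mb


for features, with_mb, without_mb in BUILDS:
    saved, percent = saving(with_mb, without_mb)
    print(f"features={features}: {saved:.1f}k saved, {percent:.1f}% of the multibyte build")
```

The absolute saving is almost identical for tiny and small, which suggests that most of the multibyte code is shared by both feature sets.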

                    --
                    James
                    GPG Key: 1024D/61326D40 2003-09-02 James Vega <jamessan@...>
                  • James Vega
                    Message 9 of 21 , Mar 3, 2009
                      On Wed, Mar 04, 2009 at 01:27:29AM -0500, James Vega wrote:
                      > On Tue, Mar 03, 2009 at 01:12:36PM +0100, Dennis Benzinger wrote:
                      > > I meant systems which have or can use only a small amount of memory. For
                      > > example (16bit) MS-DOS where you can only use 640KB. These systems may
                      > > be rare nowadays but if you'll encounter one you'd probably be happy to
                      > > be able to minimize the size of the binary.
                      >
                      > Indeed, but there are currently checks that prevent Vim from building
                      > with multibyte support on such systems (ints that are smaller than 32
                      > bit). I guess supporting such OSes would be a reason not to disallow
                      > building without multibyte entirely.
                      >
                      > That does raise the question of where the trade-off between keeping
                      > legacy, mostly unused code versus dropping support occurs.

                      Actually, according to <http://www.vim.org/download.php>, the 16-bit DOS
                      executable stopped being provided as of Vim 7.2 because 7.2 was too
                      large for DOS' memory model.

                      > > But I didn't try it out how
                      > > much the size differs between a multibyte and a non-multibyte build.
                      > > Therefore I wrote "_probably_ makes the resulting binary smaller" ;-)
                      > >
                      > > So if ripping out non-multibyte support does not make the code much
                      > > simpler or smaller I'd simply keep it. Do you have any idea much simpler
                      > > or smaller the code would be?
                      >
                      > Well, since supporting 16bit systems is still desirable, there'd be no
                      > change in code size.

                      Since 16-bit DOS is out of the picture, are there any other supported
                      OSes which *don't* have 32-bit integers? If so, that changes the weight
                      behind supporting the ability to build Vim without multibyte support.

                      Of course, this whole tangent is just about speculative advantages to
                      only supporting multibyte-capable Vim builds.

                      The primary point of my original post is still to determine whether
                      there are any impediments preventing Vim from using UTF-8 for the
                      default 'encoding' and determining 'termencoding' from the user's
                      locale. Anything else that would happen because of that is just icing
                      on the cake.

                      --
                      James
                      GPG Key: 1024D/61326D40 2003-09-02 James Vega <jamessan@...>
                    • Tony Mechelynck
                      Message 10 of 21 , Mar 4, 2009
                        On 04/03/09 08:24, James Vega wrote:
                        > On Wed, Mar 04, 2009 at 01:27:29AM -0500, James Vega wrote:
                        >> On Tue, Mar 03, 2009 at 01:12:36PM +0100, Dennis Benzinger wrote:
                        >>> I meant systems which have or can use only a small amount of memory. For
                        >>> example (16bit) MS-DOS where you can only use 640KB. These systems may
                        >>> be rare nowadays but if you'll encounter one you'd probably be happy to
                        >>> be able to minimize the size of the binary.
                        >> Indeed, but there are currently checks that prevent Vim from building
                        >> with multibyte support on such systems (ints that are smaller than 32
                        >> bit). I guess supporting such OSes would be a reason not to disallow
                        >> building without multibyte entirely.
                        >>
                        >> That does raise the question of where the trade-off between keeping
                        >> legacy, mostly unused code versus dropping support occurs.
                        >
                        > Actually, according to <http://www.vim.org/download.php>, the 16-bit DOS
                        > executable stopped being provided as of Vim 7.2 because 7.2 was too
                        > large for DOS' memory model.
                        >
                        >>> But I didn't try it out how
                        >>> much the size differs between a multibyte and a non-multibyte build.
                        >>> Therefore I wrote "_probably_ makes the resulting binary smaller" ;-)
                        >>>
                        >>> So if ripping out non-multibyte support does not make the code much
                        >>> simpler or smaller I'd simply keep it. Do you have any idea much simpler
                        >>> or smaller the code would be?
                        >> Well, since supporting 16bit systems is still desirable, there'd be no
                        >> change in code size.
                        >
                        > Since 16-bit DOS is out of the picture, are there any other supported
                        > OSes which *don't* have 32-bit integers? If so, that changes the weight
                        > behind supporting the ability to build Vim without multibyte support.
                        >
                        > Of course, this whole tangent is just about speculative advantages to
                        > only supporting multibyte-capable Vim builds.
                        >
                        > The primary point of my original post is still to determine whether
                        > there are any impediments preventing Vim from using UTF-8 for the
                        > default 'encoding' and determining 'termencoding' from the user's
                        > locale. Anything else that would happen because of that is just icing
                        > on the cake.
                        >

                        I don't know how large integers are in zOS (with EBCDIC), I guess large
                        enough, since this is a Unix-like OS (but not Linux) for IBM mainframes,
                        but according to the latest os_390.txt (under |zOS-weaknesses|), that
                        port of Vim has no multibyte support. However the zOS port of Vim is
                        apparently a port made by IBM software engineers in their spare time,
                        "just for fun because they liked Vim", and I don't know how active it
                        might still be. Bram might know, but don't ask IBM.

                        Best regards,
                        Tony.
                        --
                        Famous, adj.:
                        Conspicuously miserable.
                        -- Ambrose Bierce

                      • Antonio Colombo
                        Message 11 of 21 , Mar 5, 2009
                          Hi everybody,

                          > I don't know how large integers are in zOS (with EBCDIC), I
                          > guess large
                          > enough, since this is a Unix-like OS (but not Linux) for IBM
                          > mainframes,

                          zOS has 32-bit and 64-bit integers. It never really had
                          16-bit integers (going back to 1964 or 1965). You could use
                          them, but the hardware registers have always been 32 bits
                          long, so the related 16-bit hardware instructions just
                          blanked out the leftmost part of the said registers.

                          zOS itself is NOT Unix-like, but the underlying architecture
                          can support Linux as well. I think we are speaking here of the
                          mainframe part of zOS, which can support a kind of Unix, more
                          or less in the same way Cygwin is supported under Windows.

                          Cheers, Antonio
                          --
                          /||\ | Antonio Colombo
                          / || \ | antonio@...
                          / () \ | azc100@...
                          (___||___) | azc10@...


                        • Matt Wozniski
                          Message 12 of 21 , Mar 13, 2009
                            On Mon, Mar 2, 2009 at 8:40 PM, James Vega wrote:
                            > With Vim's current behavior, 'encoding' is derived from the environment
                            > and 'fileencoding'/'termencoding' derive from 'encoding' (modulo
                            > 'fileencodings' effect on 'fenc').  This seems sub-optimal for various
                            > reasons.
                            >
                            > 1) Vim is using an internal encoding derived from the environment which
                            >   may or may not be able to represent the different file encodings
                            >   encountered when editing various files.
                            > 2) The encoding Vim uses for interpreting input from the user and
                            >   determining how to display to the user is not directly derived from
                            >   the user's environment.
                            > 3) File encoding detection ('fencs') defaults to a value that is
                            >   unlikely to correctly work with most interesting (non-ascii) files.
                            >
                            > Defaulting 'enc' to UTF-8 helps address these problems.
                            >
                            > 1) This is now a non-issue as Vim can internally represent all
                            >   characters by converting them to their unicode counterpart.
                            > 2) This can be addressed by making 'tenc' derive its value from the
                            >   environment instead of from 'enc', which is more in line with the
                            >   behavior implied by the name.
                            > 3) File encoding detection now has a sane default value which means new
                            >   users are less likely to encounter problems when editing files of
                            >   various encodings.
                            >
                            > This change would also allow eliminating 'encoding' as an option or,
                            > less drastic, disallowing changing 'enc' once the startup files have
                            > been sourced.
                            >
                            > Changing 'enc' in a running Vim session is a very common mistake to new
                            > Vim users that are trying to get their file written out in a specific
                            > encoding or editing a file that's not in their environment's encoding.

                            Yeah. We regularly see people in #vim who don't realize that they
                            should be changing 'fenc' instead of 'enc', and I've seen it come up
                            on vim-use a few times as well...

                            > The help already states that changing 'enc' in a running session is a
                            > bad idea, and I know from experience that it can cause Vim to crash[0].
                            > Taking the next logical step and preventing users from doing that
                            > (unless someone can provide a compelling reason to continue allowing it)
                            > makes sense and helps prevent potential data loss.

                            This sounds like a very good idea to me. I don't know of any other
                            programs that allow you to change encoding used internally, and we
                            would be in good company if we chose to always use a unicode encoding
                            internally: Java uses UTF-16 internally, and I believe python does as
                            well. Is there any time when it would be desirable to use a
                            non-unicode 'encoding' (assuming, of course, that +multi_byte is
                            available)? I can't think of any.

                            ~Matt

                          • Mike Williams
                            Message 13 of 21 , Mar 13, 2009
                              Matt Wozniski wrote:
                              > On Mon, Mar 2, 2009 at 8:40 PM, James Vega wrote:
                              >> With Vim's current behavior, 'encoding' is derived from the environment
                              >> and 'fileencoding'/'termencoding' derive from 'encoding' (modulo
                              >> 'fileencodings' effect on 'fenc'). This seems sub-optimal for various
                              >> reasons.
                              >>
                              >> 1) Vim is using an internal encoding derived from the environment which
                              >> may or may not be able to represent the different file encodings
                              >> encountered when editing various files.
                              >> 2) The encoding Vim uses for interpreting input from the user and
                              >> determining how to display to the user is not directly derived from
                              >> the user's environment.
                              >> 3) File encoding detection ('fencs') defaults to a value that is
                              >> unlikely to correctly work with most interesting (non-ascii) files.
                              >>
                              >> Defaulting 'enc' to UTF-8 helps address these problems.
                              >>
                              >> 1) This is now a non-issue as Vim can internally represent all
                              >> characters by converting them to their unicode counterpart.
                              >> 2) This can be addressed by making 'tenc' derive its value from the
                              >> environment instead of from 'enc', which is more in line with the
                              >> behavior implied by the name.
                              >> 3) File encoding detection now has a sane default value which means new
                              >> users are less likely to encounter problems when editing files of
                              >> various encodings.
                              >>
                              >> This change would also allow eliminating 'encoding' as an option or,
                              >> less drastic, disallowing changing 'enc' once the startup files have
                              >> been sourced.
                              >>
                              >> Changing 'enc' in a running Vim session is a very common mistake to new
                              >> Vim users that are trying to get their file written out in a specific
                              >> encoding or editing a file that's not in their environment's encoding.
                              >
                              > Yeah. We regularly see people in #vim who don't realize that they
                              > should be changing 'fenc' instead of 'enc', and I've seen it come up
                              > on vim-use a few times as well...
                              >
                              >> The help already states that changing 'enc' in a running session is a
                              >> bad idea, and I know from experience that it can cause Vim to crash[0].
                              >> Taking the next logical step and preventing users from doing that
                              >> (unless someone can provide a compelling reason to continue allowing it)
                              >> makes sense and helps prevent potential data loss.
                              >
                              > This sounds like a very good idea to me. I don't know of any other
                              > programs that allow you to change encoding used internally, and we
                              > would be in good company if we chose to always use a unicode encoding
                              > internally: Java uses UTF-16 internally, and I believe python does as
                              > well. Is there any time when it would be desirable to use a
                              > non-unicode 'encoding' (assuming, of course, that +multi_byte is
                              > available)? I can't think of any.

                              Yes, editing very large (say a few 100MB) data files that are in a
                              single-byte encoding. For my day job I regularly enjoy having to spelunk my
                              way around large files containing a mix of readable ASCII and binary
                              data. Using a Unicode encoding could make this prohibitive. Yes, this
                              is essentially a raw file edit mode, perhaps that should be an option -
                              or would it be part of setting binary mode?

                              TTFN

                              Mike
                              --
                              I am not young enough to know everything.

                            • Matt Wozniski
                              Message 14 of 21 , Mar 13, 2009
                                On Fri, Mar 13, 2009 at 12:01 PM, Mike Williams wrote:
                                >
                                > Matt Wozniski wrote:
                                >> This sounds like a very good idea to me.  I don't know of any other
                                >> programs that allow you to change encoding used internally, and we
                                >> would be in good company if we chose to always use a unicode encoding
                                >> internally: Java uses UTF-16 internally, and I believe python does as
                                >> well.  Is there any time when it would be desirable to use a
                                >> non-unicode 'encoding' (assuming, of course, that +multi_byte is
                                >> available)?  I can't think of any.
                                >
                                > Yes, editing very large (say a few 100MB) data files that are in a
                                > single-byte encoding.  For my day job I regularly enjoy having to spelunk my
                                > way around large files containing a mix of readable ASCII and binary
                                > data.  Using a Unicode encoding could make this prohibitive.  Yes, this
                                > is essentially a raw file edit mode, perhaps that should be an option -
                                > or would it be part of setting binary mode?

                                How would using Unicode for 'enc' in any way affect this? Sure, you'd
                                want to use a single-byte 'fenc', but no one is suggesting that the
                                'fenc' option should be removed. If there is a reason why editing
                                binary files should be affected at all by what encoding the editor
                                uses for storing the buffer text internally, I don't see it and you'll
                                need to elaborate.

                                ~Matt

                              • Mike Williams
                                Message 15 of 21 , Mar 13, 2009
                                  Matt Wozniski wrote:
                                  > On Fri, Mar 13, 2009 at 12:01 PM, Mike Williams wrote:
                                  >> Matt Wozniski wrote:
                                  >>> This sounds like a very good idea to me. I don't know of any other
                                  >>> programs that allow you to change encoding used internally, and we
                                  >>> would be in good company if we chose to always use a unicode encoding
                                  >>> internally: Java uses UTF-16 internally, and I believe python does as
                                  >>> well. Is there any time when it would be desirable to use a
                                  >>> non-unicode 'encoding' (assuming, of course, that +multi_byte is
                                  >>> available)? I can't think of any.
                                  >> Yes, editing very large (say a few 100MB) data files that are in a
                                  >> single-byte encoding. For my day job I regularly enjoy having to spelunk my
                                  >> way around large files containing a mix of readable ASCII and binary
                                  >> data. Using a Unicode encoding could make this prohibitive. Yes, this
                                  >> is essentially a raw file edit mode, perhaps that should be an option -
                                  >> or would it be part of setting binary mode?
                                  >
                                  > How would using Unicode for 'enc' in any way affect this? Sure, you'd
                                  > want to use a single-byte 'fenc', but no one is suggesting that the
                                  > 'fenc' option should be removed. If there is a reason why editing
                                  > binary files should be affected at all by what encoding the editor
                                  > uses for storing the buffer text internally, I don't see it and you'll
                                  > need to elaborate.

                                  With a UTF-16 internal encoding a 250MB data file blossoms into a nice
                                  round 500MB. For all the cheap memory these days this will still have
                                  an effect on system performance - time to allocate, paging out of idle
                                  apps to disk, etc.

                                  And will VIM internally use a canonical Unicode form? What happens if I
                                  want to insert some 8-bit data whose Unicode character has multiple
                                  forms? Which one is used? How will I know that the 8-bit value I
                                  intend does not appear as a composed sequence? I haven't used VIM for
                                  editing Unicode with composing characters (damn my native English
                                  country) - I see there is some discussion of composing, but at first
                                  glance it is not clear whether it is automatic or not. In my case I
                                  would not want deletion of a data byte to cause other bytes to be
                                  deleted as well.

                                  At the moment I cannot see how supporting Unicode semantics maps to
                                  editing binary data files. Not saying it is impossible, I'd just like
                                  to see the possible way out of the woods if we did go this way.

                                  TTFN

                                  Mike
                                  --
                                  Imagination is more important than knowledge.

                                • Tony Mechelynck
                                  Message 16 of 21 , Mar 14, 2009
                                    On 13/03/09 17:22, Mike Williams wrote:
                                    >
                                    > Matt Wozniski wrote:
                                    >> On Fri, Mar 13, 2009 at 12:01 PM, Mike Williams wrote:
                                    >>> Matt Wozniski wrote:
                                    >>>> This sounds like a very good idea to me. I don't know of any other
                                    >>>> programs that allow you to change encoding used internally, and we
                                    >>>> would be in good company if we chose to always use a unicode encoding
                                    >>>> internally: Java uses UTF-16 internally, and I believe python does as
                                    >>>> well. Is there any time when it would be desirable to use a
                                    >>>> non-unicode 'encoding' (assuming, of course, that +multi_byte is
                                    >>>> available)? I can't think of any.
                                    >>> Yes, editing very large (say a few 100MB) data files that are in a
                                    >>> single-byte encoding. For my day job I regularly enjoy having to spelunk my
                                    >>> way around large files containing a mix of readable ASCII and binary
                                    >>> data. Using a Unicode encoding could make this prohibitive. Yes, this
                                    >>> is essentially a raw file edit mode, perhaps that should be an option -
                                    >>> or would it be part of setting binary mode?
                                    >>
                                    >> How would using Unicode for 'enc' in any way affect this? Sure, you'd
                                    >> want to use a single-byte 'fenc', but no one is suggesting that the
                                    >> 'fenc' option should be removed. If there is a reason why editing
                                    >> binary files should be affected at all by what encoding the editor
                                    >> uses for storing the buffer text internally, I don't see it and you'll
                                    >> need to elaborate.
                                    >
                                    > With a UTF-16 internal encoding a 250MB data file blossoms into a nice
                                    > round 500MB. For all the cheap memory these days this will still have
                                    > an effect on system performance - time to allocate, paging out of idle
                                    > apps to disk, etc.

                                    Vim doesn't use UTF-16 internally but UTF-8 -- even if you set
                                    'encoding' to, let's say, utf-16le, because Vim cannot tolerate actual
                                    nulls in the middle of lines. This also means there is no space loss for
                                    7-bit ASCII, which is represented identically in ASCII, Latin1, UTF-8,
                                    and indeed also in most iso-8859 encodings.
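
To put rough numbers on the size question, here is a small Python sketch (not Vim code; the sample string is made up for illustration). It compares the encoded size of pure ASCII text in UTF-8, Latin1 and UTF-16, and also shows that bytes 0x80-0xFF, when read as Latin1 and held as UTF-8, take two bytes each. So ASCII costs nothing extra, but a binary-heavy file could grow by up to a factor of two in memory.

```python
# Hypothetical sample: 1000 lines of plain 7-bit ASCII.
ascii_text = "plain 7-bit ASCII text\n" * 1000

utf8_size = len(ascii_text.encode("utf-8"))
latin1_size = len(ascii_text.encode("latin-1"))
utf16_size = len(ascii_text.encode("utf-16-le"))  # "-le" variant adds no BOM

# Every byte value from 0x80 to 0xFF, interpreted as Latin1 and then
# re-encoded as UTF-8 (the internal form when 'encoding' is utf-8).
high_bytes = bytes(range(0x80, 0x100))
high_as_utf8 = high_bytes.decode("latin-1").encode("utf-8")

print("ASCII text:  utf-8 =", utf8_size,
      " latin1 =", latin1_size,
      " utf-16 =", utf16_size)
print("high bytes:  raw =", len(high_bytes),
      " as utf-8 =", len(high_as_utf8))
```

The UTF-16 figure is the doubling described in the quoted message; the UTF-8 figure only grows for the non-ASCII portion of the data.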

                                    >
                                    > And will VIM internally use a canonical Unicode form? What happens if I
                                    > want to insert some 8-bit data whose Unicode character has multiple
                                    > forms? Which one is used? How will I know that the 8-bit value I
                                    > intend does not appear as a composed sequence? I haven't used VIM for
                                    > editing Unicode with composing characters (damn my native English
                                    > country) - I see there is some discussion of composing, but at first
                                    > glance it is not clear whether it is automatic or not. In my case I
                                    > would not want deletion of a data byte to cause other bytes to be
                                    > deleted as well.
                                    >
                                    > At the moment I cannot see how supporting Unicode semantics maps to
                                    > editing binary data files. Not saying it is impossible, I'd just like
                                    > to see the possible way out of the woods if we did go this way.
                                    >
                                    > TTFN
                                    >
                                    > Mike

                                    IMHO, binary data should be read "as if" 8-bit because in an 8-bit
                                    'fileencoding' there are no "invalid" byte sequences -- and probably
                                    Latin1 because the conversion Latin1 <=> UTF-8 is trivial and requires
                                    no iconv library. An alternate possibility (but to be used only at
                                    user's explicit request IMHO) is to convert binary to hex and vice-versa
                                    via xxd.
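
The reasoning about Latin1 can be checked outside Vim with a short Python sketch (an illustration only, not what Vim does internally). It shows that all 256 byte values decode as Latin1, survive a trip through UTF-8 and come back unchanged, whereas decoding the same bytes strictly as UTF-8 is rejected.

```python
# Every possible byte value, as stand-in "binary data".
all_bytes = bytes(range(256))

# Latin1 maps byte N to code point U+00NN, so decoding never fails.
text = all_bytes.decode("latin-1")

# Hold the text as UTF-8 (as an editor with a UTF-8 internal encoding
# would), then convert back to Latin1 for writing.
internal = text.encode("utf-8")
restored = internal.decode("utf-8").encode("latin-1")

# Strict UTF-8 decoding of arbitrary bytes is expected to fail.
try:
    all_bytes.decode("utf-8")
    strict_utf8_ok = True
except UnicodeDecodeError:
    strict_utf8_ok = False

print("round trip lossless:", restored == all_bytes)
print("valid as UTF-8:", strict_utf8_ok)
```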

                                    However, this is not what Vim does if you read a file with ++bin: what
                                    it does is "no conversion", which means that if 'encoding' is set to
                                    UTF-8 you'll probably get invalid UTF-8 sequences at many places in your
                                    code. For instance an a-acute in Spanish Latin1 text will appear as <e1>
                                    instead of á and an e-circumflex in French Latin1 text will appear as
                                    <ea> instead of ê. Not very convenient if they happen to be within text
                                    strings -- messages, maybe, to be typed out on the screen. So even if
                                    you know that the code is binary you might prefer to use

                                    :e ++enc=latin1 ++ff=unix foobar.bin

                                    and omit the 'binary' setting. The result, if you make changes and save
                                    them, could be an extra 0x0A at the very end if there wasn't one
                                    already, but I don't expect trouble even if it happens. (Overlong lines
                                    might be split if you were on a 16-bit machine, but on 32-bit machines
                                    the maximum line size and the maximum file size are both 2GB, and even
                                    on a 64-bit machine I don't expect you'll often have to edit a binary
                                    file containing a 2GB stretch of code without a single 0x0A in it.)

                                    Of course, the utmost care should be used when editing binary files
                                    because, if the file is e.g. program code,
                                    - the code can contain displacements in binary, which will become
                                      invalid if the length of the intervening text is modified
                                    - executable code should in general not be touched
                                    - compressed binaries are probably not editable in any way
                                    - and what if the program includes a binary hash of its ASCII text
                                      somewhere?

                                    As for canonical forms: I don't think Vim will spontaneously convert
                                    either way between a spacing character + combining character(s) combo
                                    and a precomposed character. If you type a then Ctrl-V u 0301 you'll get
                                    a spacing a and a combining acute. If your keyboard allows "keying" an
                                    a-acute character, or if you type Ctrl-V x e1, you'll get a precomposed
                                    a-acute. The two results will be indistinguishable if you have a "good"
                                    font but Vim doesn't know that, and searching for the precomposed
                                    character will not match the ASCII + accent two-codepoint combo.
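
The difference between the two forms can be seen in a few lines of Python (a sketch using the standard `unicodedata` module; it says nothing about Vim's own search code). The two strings look the same when rendered but are different code point sequences, so a plain substring search for one does not find the other unless both are normalized first.

```python
import unicodedata

decomposed = "a\u0301"   # spacing 'a' followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "\u00e1"   # U+00E1 LATIN SMALL LETTER A WITH ACUTE

# Different code point sequences, and different lengths once encoded as UTF-8.
same_codepoints = decomposed == precomposed
decomposed_len = len(decomposed.encode("utf-8"))
precomposed_len = len(precomposed.encode("utf-8"))

# A naive search for the precomposed character misses the combining form.
line = "c" + decomposed + "lido"
found_raw = precomposed in line

# Normalizing both sides to NFC makes the search succeed.
found_nfc = precomposed in unicodedata.normalize("NFC", line)

print("equal:", same_codepoints)
print("utf-8 lengths:", decomposed_len, precomposed_len)
print("found without normalization:", found_raw)
print("found after NFC:", found_nfc)
```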


                                    Best regards,
                                    Tony.
                                    --
                                    "Can you hammer a 6-inch spike into a wooden plank with your penis?"

                                    "Uh, not right now."

                                    "Tsk. A girl has to have some standards."
                                    -- "Real Genius"
