UTF-8 input with Terminal.app

  • Charles Collicutt
    Message 1 of 3, Nov 1, 2005
      Hi,

      I'm using Vim 6.4.1 (co from the CVS yesterday) on OS X
      10.3.9 compiled with --with-features=big --enable-multibyte.

      $LANG is set to "en_GB.UTF-8" in both .bashrc and
      environment.plist and Vim correctly notices this and sets
      encoding to "utf-8".

      When I run Vim with its GUI - by double-clicking the Vim.app
      icon or using "open -a Vim" - it automatically sets
      termencoding to "macroman". I have "set termencoding=utf-8"
      in my .gvimrc but it seems to be overridden. Vim will then
      accept UTF-8 input (generated using the option key or the
      character palette.) Inputting with i_CTRL-V_digit or
      digraphs also works.

      That's great, but I usually use Vim in a terminal. With
      Terminal.app set to use UTF-8, vim will accept input using
      i_CTRL-V_digit or digraphs perfectly but won't accept UTF-8
      input directly (such as that generated by the character
      palette or by using the option key). It seems to interpret
      two-byte UTF-8 sequences (such as those in the Latin-1 Supplement)
      as two separate 8-bit characters. If I attempt to enter
      lowercase-a-with-an-umlaut, I get uppercase-a-with-a-tilde
      followed by the international currency symbol. If I'm right,
      the UTF-8 representation of ä consists of two bytes, the
      first of which happens to correspond to à in ISO8859-1 and
      the second to ¤. Why is it assuming that the input is
      ISO8859-1 when encoding is set to UTF-8 and termencoding is
      empty?
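
The byte-level explanation above can be checked outside Vim. The following Python sketch is not part of the original message, and its variable names are only illustrative. It encodes ä as UTF-8 and then decodes the same two bytes as ISO8859-1, which reproduces the two-character garbage described.

```python
# Illustrative sketch: reproduce the garbling of a-umlaut when its
# UTF-8 bytes are read as ISO8859-1. Names here are hypothetical.

char = "\u00e4"  # LATIN SMALL LETTER A WITH DIAERESIS

# UTF-8 encodes U+00E4 as the two bytes C3 A4.
utf8_bytes = char.encode("utf-8")

# Reading those bytes one at a time as ISO8859-1 gives two characters:
# C3 -> A with tilde (U+00C3), A4 -> currency sign (U+00A4).
misread = utf8_bytes.decode("iso-8859-1")

print(utf8_bytes.hex())
print([f"U+{ord(c):04X}" for c in misread])

# The damage is reversible as long as no byte was dropped.
recovered = misread.encode("iso-8859-1").decode("utf-8")
print(recovered == char)
```

Because the round trip back to UTF-8 recovers the original character, this pattern is a useful sign that the bytes arrived intact and only the assumed encoding was wrong.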

      If I set termencoding to "macroman" in my .vimrc then an
      attempt to input "ä" results in "ä " (i.e. an extra space is
      inserted after the character) which is no good but at least
      it has sort of recognised the character properly.
      Unfortunately, now I cannot input Unicode characters with
      digraphs because vim thinks I am limited to macroman.

      I prefer to use digraphs anyway - so it is the same wherever
      I am using vim - so this isn't that bad but I'd like to know
      what is going on.

      As I understand it, Unicode-aware applications in OS X (i.e.
      all Cocoa apps and most recent Carbon apps) should receive
      UTF-8 input. Non-aware apps receive whatever Script is
      specified in International in System Preferences (in my
      case, MacRoman.) Terminal.app is a Unicode-aware app, so it
      should be receiving UTF-8 input. This fits with the fact
      that the garbage I get when I try to enter non-ASCII
      characters into vim does seem to be the result of
      interpreting two-byte UTF-8 sequences as two separate 8-bit
      characters. What I don't understand is why setting
      termencoding to macroman results in the correct character
      followed by a space. I also don't understand why it works in
      gvim and not vim. When running with a GUI, termencoding only
      specifies the input and not the display, whereas in a
      terminal it specifies both, right? Does that have anything
      to do with it? Is there any way to decouple the input and
      display encodings when running in a terminal? And why does
      macroman work anyway, if it's receiving UTF-8 input?

      Any help would be much appreciated...

      --
      Charles
    • Charles Collicutt
      Message 2 of 3, Nov 1, 2005
        I've done some more investigation and can clarify the
        problem slightly now:

        > When I run Vim with its GUI - by double-clicking the
        > Vim.app icon or using "open -a Vim" - it automatically
        > sets termencoding to "macroman". Vim will then accept
        > UTF-8 input.

        It doesn't actually accept UTF-8 input at all. OS X treats
        Vim.app as a non-Unicode-aware application so sends it input
        according to the Script setting in the International panel
        of System Preferences. So it is actually receiving MacRoman
        input (which fits termencoding) and converting it to UTF-8
        internally as encoding is set to utf-8. Therefore you can't
        actually input anything that isn't in the MacRoman character
        repertoire without using digraphs or i_CTRL-V_digit.
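
The conversion described above can be sketched in Python, assuming its standard "mac_roman" codec is a fair stand-in for the MacRoman input the GUI receives. This is only an illustration: the helper name to_macroman is hypothetical, and nothing here reflects Vim's actual code.

```python
# Illustrative sketch: MacRoman input converted to UTF-8, and a character
# that MacRoman cannot represent. The helper name is hypothetical.

def to_macroman(text):
    """Return the MacRoman bytes for text, or None if it is not representable."""
    try:
        return text.encode("mac_roman")
    except UnicodeEncodeError:
        return None

a_umlaut = "\u00e4"  # a-umlaut, present in MacRoman
zhe = "\u0416"       # CYRILLIC CAPITAL LETTER ZHE, absent from MacRoman

# In MacRoman, a-umlaut is the single byte 8A.
mac_bytes = to_macroman(a_umlaut)

# Converting that input to UTF-8 (as happens with encoding=utf-8)
# yields the two bytes C3 A4.
as_utf8 = mac_bytes.decode("mac_roman").encode("utf-8")

print(mac_bytes.hex(), as_utf8.hex())
print(to_macroman(zhe))  # no MacRoman byte exists for this character
```

A character such as the Cyrillic letter used here has no MacRoman byte at all, which is why it can only be entered through digraphs or i_CTRL-V_digit in this setup.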

        Terminal.app is treated as being Unicode-aware, so it
        receives UTF-8 input. However, vim seemed to be treating
        this as ISO8859-1 input. So, for example, ä would appear as
        Ã¤ (because ä is C3A4 in UTF-8 while Ã is C3 and ¤ is A4 in
        ISO8859-1.) This turned out to be the fault of a default
        setting in Terminal.app - in the Emulation pane of Window
        Settings there is an option to "Escape non-ASCII characters"
        which is ticked by default. If unticked, UTF-8 input to vim
        works properly.

        So, all that remains is to get OS X to treat Vim.app as a
        Unicode-aware application so UTF-8 input works with the GUI.

        --
        Charles
      • Benji Fisher
        Message 3 of 3, Nov 3, 2005
          On Tue, Nov 01, 2005 at 06:54:49PM +0000, Charles Collicutt wrote:
          > I've done some more investigation and can clarify the
          > problem slightly now:
          >
          > > When I run Vim with its GUI - by double-clicking the
          > > Vim.app icon or using "open -a Vim" - it automatically
          > > sets termencoding to "macroman". Vim will then accept
          > > UTF-8 input.
          >
          > It doesn't actually accept UTF-8 input at all. OS X treats
          > Vim.app as a non-Unicode-aware application so sends it input
          > according to the Script setting in the International panel
          > of System Preferences. So it is actually receiving MacRoman
          > input (which fits termencoding) and converting it to UTF-8
          > internally as encoding is set to utf-8. Therefore you can't
          > actually input anything that isn't in the MacRoman character
          > repertoire without using digraphs or i_CTRL-V_digit.
          >
          > Terminal.app is treated as being Unicode-aware, so it
          > receives UTF-8 input. However, vim seemed to be treating
          > this as ISO8859-1 input. So, for example, ä would appear as
          > Ã¤ (because ä is C3A4 in UTF-8 while Ã is C3 and ¤ is A4 in
          > ISO8859-1.) This turned out to be the fault of a default
          > setting in Terminal.app - in the Emulation pane of Window
          > Settings there is an option to "Escape non-ASCII characters"
          > which is ticked by default. If unticked, UTF-8 input to vim
          > works properly.
          >
          > So, all that remains is to get OS X to treat Vim.app as a
          > Unicode-aware application so UTF-8 input works with the GUI.
          >
          > --
          > Charles

          Have you tried vim 7.0? I think that some Mac-specific code was
          added that changes how it deals with Unicode. If you do not feel like
          compiling it yourself, you can get a binary at

          http://macvim.org/OSX/index.php#Downloading

          HTH --Benji Fisher