Bug in Vim's locales when calling Python script (?)

  • Marijn
    Message 1 of 8 , Jan 3, 2007
      Hi,

      I think I found a bug in Vim's UTF8 handling. I've spent 2 days debugging, testing and hair pulling, but I couldn't find the solution to the problem and now I think it's a bug. I use Gentoo, Vim 7.0 and UTF8 in the kernel etc. To debug and check the following message I've compiled a fresh Vim from source, but unfortunately the results are the same for either version (source and Gentoo build).

      I've got a Python Vim script that fetches (wordpress) content via xmlrpc (the xmlrpc-source has UTF8 encoding). After the content is fetched it is written to the current buffer. This works fine, but when there are any strange characters in the content the script fails (error below). I've reduced the script to the following:

      =================================================

      if has('python')
      python << EOF
      # -*- coding: utf-8 -*-

      import vim

      def foo():
          u = 'a' # This works fine
          vim.current.buffer.append(u)

      def bar():
          u = unichr(40960) # But this doesn't
          vim.current.buffer.append(u)

      EOF
      endif

      =================================================


      When I call this in Vim, foo() works fine, "a" is appended at the end of the buffer, but calling bar() results in the following error:

      Traceback (most recent call last):
      File "<string>", line 1, in ?
      File "<string>", line 11, in bar
      TypeError: bad argument type for built-in operation
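
One possible explanation, not confirmed in this thread: the Python 2 interface in Vim 7.0 appears to accept only byte strings (`str`) in `buffer.append()`, so a `unicode` object is rejected with this TypeError. A minimal sketch of the usual workaround, encoding the text before handing it to Vim, follows. The helper name `to_vim_line` is hypothetical, and the code is written so that it also runs outside Vim.

```python
def to_vim_line(text, encoding="utf-8"):
    """Return text as a byte string; byte strings pass through unchanged.

    Hypothetical helper, not part of the vim module.
    """
    if isinstance(text, bytes):
        return text
    return text.encode(encoding)


# U+A000 (decimal 40960), the character used in bar() above.
line = to_vim_line(u"\ua000")
print(repr(line))
```

Inside Vim with the Python 2 interface, the call in bar() would presumably become `vim.current.buffer.append(to_vim_line(u, vim.eval('&encoding')))`, so that the bytes match Vim's 'encoding' option.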


      I've started Vim in utf8 mode:

      marijn@srv ~ $ export LC_ALL=en_US.utf8
      marijn@srv ~ $ export LANG=en_US.utf8
      marijn@srv ~ $ vim


      So that can't be the problem.

      I've also created a separate file:

      =================================================

      #!/usr/bin/python
      # -*- coding: utf8 -*-
      # test.py
      print unichr(40960)

      =================================================

      Running this in a shell with LC_ALL=en_EN.utf8 works fine, with LC_ALL=C it fails, which is normal.

      When running it in Vim ":!python test.py" works fine, the character is printed.
      But when I try to insert it in the current buffer ": r !python test.py" it fails:
      "UnicodeEncodeError: *'ascii' codec* can't encode character u'\ua000' in position 0: ordinal not in range(128)"

      Maybe this is because of the 'r' function not handling UTF8 well (http://vimdoc.sourceforge.net/htmldoc/mbyte.html#UTF-8), but I'm not sure of that. But for completeness I wanted to include this as well.
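
A likely cause, offered as an assumption rather than a diagnosis: with `:r !cmd` the output of the command goes to a pipe or temporary file instead of a terminal, and Python 2 does not apply the locale encoding to `sys.stdout` in that case, so `print` falls back to the 'ascii' codec. A sketch that encodes explicitly instead of relying on the stream's encoding is below; the helper name `write_utf8` is hypothetical, and an in-memory byte stream stands in for the pipe.

```python
import io


def write_utf8(text, stream):
    """Write text plus a newline to stream as UTF-8 bytes and return them."""
    # Python 3 text streams expose the raw byte stream as .buffer;
    # Python 2 streams and byte streams accept byte strings directly.
    raw = getattr(stream, "buffer", stream)
    data = (text + u"\n").encode("utf-8")
    raw.write(data)
    return data


demo = io.BytesIO()  # stands in for the pipe that :r reads from
write_utf8(u"\ua000", demo)
print(repr(demo.getvalue()))
```

In test.py itself, under Python 2, the equivalent call would be `write_utf8(unichr(40960), sys.stdout)` in place of the bare `print`.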

      I think this (especially the first part) is a bug in Vim. I hope you can acknowledge this and/or help find a solution.


      Tia and best wishes,

      Marijn Koesen
    • A.J.Mechelynck
      Message 2 of 8 , Jan 3, 2007

        1. Is your Vim executable built with +multi_byte?
        :echo has("multi_byte")
        should answer 1
        If the answer is zero, you should install a Vim executable with +multi_byte
        compiled-in.

        2. Do you have 'encoding' set to UTF-8?
        :set enc?
        should answer
        encoding=utf-8
        If the answer is something else (but it passes test 1 above), tell me what it
        is and I'll tell you what to add to your vimrc.

        If the answer to either question is "no", Vim cannot handle UTF-8 codepoints
        above U+007F (or maybe U+00FF, depending).
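
The two checks above can also be run from a Python script inside Vim. A rough sketch, assuming `vim.eval()` returns numbers as strings (as the Python interface documents); `utf8_ready` and the stand-in `fake_eval` are hypothetical names, used so that the snippet runs outside Vim:

```python
def utf8_ready(vim_eval):
    """Return True if Vim reports +multi_byte and 'encoding' is utf-8.

    vim_eval is expected to behave like vim.eval: it takes an expression
    string and returns the result as a string.
    """
    has_multi_byte = vim_eval('has("multi_byte")') == "1"
    encoding = vim_eval("&encoding").lower()
    return has_multi_byte and encoding == "utf-8"


def fake_eval(expr):
    """Stand-in for vim.eval with canned answers (illustration only)."""
    answers = {'has("multi_byte")': "1", "&encoding": "utf-8"}
    return answers[expr]


print(utf8_ready(fake_eval))
```

Inside Vim the call would be `utf8_ready(vim.eval)` after `import vim`.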


        Best regards,
        Tony.
      • Marijn
        Message 3 of 8 , Jan 3, 2007

          1) Yes, it's compiled with multi_byte:

          Some more details about my vim version:

          :version
          VIM - Vi IMproved 7.0 (2006 May 7, compiled Jan 2 2007 00:32:34)
          Included patches: 1-17
          Modified by Gentoo-7.0.17
          Compiled by root@henk
          Huge version without GUI. Features included (+) or not (-):
          +arabic +autocmd -balloon_eval -browse ++builtin_terms +byte_offset +cindent -clientserver -clipboard +cmdline_compl
          +cmdline_hist +cmdline_info +comments +cryptv -cscope +cursorshape +dialog_con +diff +digraphs -dnd -ebcdic +emacs_tags +eval
          +ex_extra +extra_search +farsi +file_in_path +find_in_path +folding -footer +fork() +gettext -hangul_input +iconv
          +insert_expand +jumplist +keymap +langmap +libcall +linebreak +lispindent +listcmds +localmap +menu +mksession +modify_fname
          +mouse -mouseshape +mouse_dec +mouse_gpm -mouse_jsbterm +mouse_netterm +mouse_xterm +multi_byte +multi_lang -mzscheme
          -netbeans_intg -osfiletype +path_extra +perl +postscript +printer +profile +python +quickfix +reltime +rightleft +ruby
          +scrollbind +signs +smartindent -sniff +statusline -sun_workshop +syntax +tag_binary +tag_old_static -tag_any_white -tcl
          +terminfo +termresponse +textobjects +title -toolbar +user_commands +vertsplit +virtualedit +visual +visualextra +viminfo
          +vreplace +wildignore +wildmenu +windows +writebackup -X11 -xfontset -xim -xsmp -xterm_clipboard -xterm_save
          system vimrc file: "/etc/vim/vimrc"
          user vimrc file: "$HOME/.vimrc"
          user exrc file: "$HOME/.exrc"
          fall-back for $VIM: "/usr/share/vim"
          Compilation: i686-pc-linux-gnu-gcc -c -I. -Iproto -DHAVE_CONFIG_H -march=pentium3 -O2 -pipe -fomit-frame-pointer -pipe
          -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/lib/perl5/5.8.8/i686-linux/CORE -I/usr/include/python2.4 -pthread
          -I/usr/lib/ruby/1.8/i686-linux
          Linking: i686-pc-linux-gnu-gcc -rdynamic -Wl,-export-dynamic -rdynamic -L/usr/local/lib -o vim -lncurses -lgpm
          -rdynamic -L/usr/local/lib /usr/lib/perl5/5.8.8/i686-linux/auto/DynaLoader/DynaLoader.a -L/usr/lib/perl5/5.8.8/i686-linux/CORE
          -lperl -lutil -lc -L/usr/lib/python2.4/config -lpython2.4 -lpthread -lutil -Xlinker -export-dynamic -Wl,-R -Wl,/usr/lib
          -L/usr/lib -L/usr/lib -lruby18 -lm


          2) Yes, all the files that I have used and created have (had) the utf8 encoding. I've tested all the files by "set enc".


          Best regards,

          Marijn
        • A.J.Mechelynck
          Message 4 of 8 , Jan 4, 2007
            Marijn wrote:
            > A.J.Mechelynck wrote:
            >> Marijn wrote:
            >>> Maybe this is because of the 'r' function not handling UTF8 well (http://vimdoc.sourceforge.net/htmldoc/mbyte.html#UTF-8), but I'm not sure of that. But for completeness I wanted to include this as well.

            The last paragraph there (about 'guifontwide' etc.) seems inaccurate. I have
            the same at ":help Unicode" in the Vim help but IMHO what is said under ":help
            'guifontwide'" seems more accurate. Even though 'guifontwide' is not set, my
            gvim uses wide glyphs for wide characters. Maybe because my Chinese 'guifont'
            has them.

            > 1) Yes, it's compiled with multi_byte:
            >
            > Some more details about my vim version:
            >
            > :version
            > VIM - Vi IMproved 7.0 (2006 May 7, compiled Jan 2 2007 00:32:34)
            > Included patches: 1-17
            > Modified by Gentoo-7.0.17
            > Compiled by root@henk
            [...]
            That one (7.0.017) isn't very recent anymore. The current release is 7.0.178.
            See the "table of contents" of the bugfixes at
            http://ftp.vim.org/pub/vim/patches/7.0/README
            >
            >
            > 2) Yes, all the files that I have used and created have (had) the utf8 encoding. I've tested all the files by "set enc".

            'enc' defines the _internal_ encoding used by Vim to represent the characters
            _internally_ in memory. The charset of an edited file can be different. Open
            the buffer where you were trying to add data by means of that python script
            and use

            :verbose set enc? fenc?

            What is the reply?
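
To illustrate the distinction between the internal representation and the charset of the file, here is a small sketch in plain Python (not Vim's actual conversion code): the same text is held once in memory and produces different bytes depending on the charset chosen when it is written out.

```python
text = u"\ua000"  # one character held in memory, as in bar() earlier in the thread

# The same text serialized with two different file charsets.
as_utf8 = text.encode("utf-8")
as_utf16 = text.encode("utf-16-le")

print(len(text), repr(as_utf8), repr(as_utf16))

# Decoding with the matching charset recovers the identical text.
assert as_utf8.decode("utf-8") == as_utf16.decode("utf-16-le") == text
```

In Vim terms, 'encoding' plays the role of the in-memory form and 'fileencoding' the role of the charset passed to `encode()`, which is why the two options can legitimately differ.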


            Best regards,
            Tony.
          • Marijn
            Message 5 of 8 , Jan 4, 2007

              Thanks for the reply Tony, but everything is really utf8:

              All the files return:
              encoding=utf-8
              fileencoding=utf-8


              Best regards,
              Marijn
            • Marijn
              Message 6 of 8 , Jan 4, 2007
                Marijn wrote:
                > A.J.Mechelynck wrote:
                >> 'enc' defines the _internal_ encoding used by Vim to represent the characters _internally_ in memory. The charset of an edited file can be different. Open the buffer where you were trying to add data by means of that python script and use
                >>
                >> :verbose set enc? fenc?
                >>
                >> What is the reply?
                >>
                >>
                >> Best regards,
                >> Tony.
                >>
                >
                > Thanks for the reply Tony, but everything is really utf8:
                >
                > All the files return:
                > encoding=utf-8
                > fileencoding=utf-8
                >
                >
                > Best regards,
                > Marijn
                >


                >> That one (7.0.017) isn't very recent anymore. The current release is 7.0.178. See the "table of contents" of the bugfixes at http://ftp.vim.org/pub/vim/patches/7.0/README

                I've just fetched all the patches, patched vim and tried it again:

                VIM - Vi IMproved 7.0 (2006 May 7, compiled Jan 4 2007 19:12:01)
                Included patches: 1-178
                Compiled by joost@henk
                Normal version with GTK2 GUI. Features included (+) or not (-):
                -arabic +autocmd +balloon_eval +browse +builtin_terms +byte_offset +cindent +clientserver +clipboard +cmdline_compl +cmdline_hist +cmdline_info +comments +cryptv
                -cscope +cursorshape +dialog_con_gui +diff +digraphs +dnd -ebcdic -emacs_tags +eval +ex_extra +extra_search -farsi +file_in_path +find_in_path +folding -footer +fork()
                -gettext -hangul_input +iconv +insert_expand +jumplist -keymap -langmap +libcall +linebreak +lispindent +listcmds +localmap +menu +mksession +modify_fname +mouse
                +mouseshape -mouse_dec +mouse_gpm -mouse_jsbterm -mouse_netterm +mouse_xterm +multi_byte +multi_lang -mzscheme +netbeans_intg -osfiletype +path_extra -perl +postscript
                +printer -profile +python +quickfix +reltime -rightleft -ruby +scrollbind +signs +smartindent -sniff +statusline -sun_workshop +syntax +tag_binary +tag_old_static
                -tag_any_white -tcl +terminfo +termresponse +textobjects +title +toolbar +user_commands +vertsplit +virtualedit +visual +visualextra +viminfo +vreplace +wildignore
                +wildmenu +windows +writebackup +X11 -xfontset +xim +xsmp_interact +xterm_clipboard -xterm_save
                system vimrc file: "$VIM/vimrc"
                user vimrc file: "$HOME/.vimrc"
                user exrc file: "$HOME/.exrc"
                system gvimrc file: "$VIM/gvimrc"
                user gvimrc file: "$HOME/.gvimrc"
                system menu file: "$VIMRUNTIME/menu.vim"
                fall-back for $VIM: "/home/joost/vim7/share/vim"
                Compilation: gcc -c -I. -Iproto -DHAVE_CONFIG_H -DFEAT_GUI_GTK -DXTHREADS -D_REENTRANT -DXUSE_MTSAFE_API -I/usr/include/gtk-2.0 -I/usr/lib/gtk-2.0/include -I/usr/include/atk-1.0 -I/usr/include/pango-1.0 -I/usr/include/freetype2 -I/usr/include/freetype2/config -I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -g -O2 -I/usr/include/python2.4 -pthread
                Linking: gcc -L/usr/local/lib -o vim -lgtk-x11-2.0 -lgdk-x11-2.0 -latk-1.0 -lgdk_pixbuf-2.0 -lm -lpangoxft-1.0 -lpangox-1.0 -lpango-1.0 -lgobject-2.0 -lgmodule-2.0
                -lglib-2.0 -lXt -lncurses -lgpm -L/usr/lib/python2.4/config -lpython2.4 -lpthread -ldl -lutil -lm -Xlinker -export-dynamic




                But the following script

                ===========================
                #!/usr/bin/python
                # -*- coding: utf8 -*-
                print unichr(40960)
                ===========================

                In Vim, ":r !python test.py" still gives the same "'ascii' codec can't encode character" error.

                Also:

                ===========================
                def bar():
                    u = unichr(40960)
                    vim.current.buffer.append(u)
                ===========================

                Still gives the "TypeError: bad argument type for built-in operation" error.


                Best regards,
                Marijn
              • Bram Moolenaar
                Message 7 of 8 , Jan 6, 2007
                  Marijn wrote:

                  > But the following script
                  >
                  > ===========================
                  > #!/usr/bin/python
                  > # -*- coding: utf8 -*-
                  > print unichr(40960)
                  > ===========================
                  >
                  > In vim: "r !python test.py" gives the same: "'ascii' codec can't
                  > encode character" error.

                  The problem is in Python, this is a Python error message. This has
                  nothing to do with Vim.

                  I guess you somehow have to put Python in utf-8 mode first. This page
                  appears to provide info: http://www.amk.ca/python/howto/unicode
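
One way to read "put Python in utf-8 mode" (an assumption about the cause, not something confirmed in this thread): when Python 2 writes to a pipe, as it does under ":r !python test.py", it has no terminal encoding to consult and falls back to ASCII for unicode objects. A minimal sketch that sidesteps this by encoding explicitly before writing. The helper name write_utf8 is hypothetical, and io.BytesIO stands in for a byte-oriented stdout:

```python
import io


def write_utf8(stream, text):
    """Encode text as UTF-8 and write it, plus a newline, to a byte stream."""
    stream.write(text.encode("utf-8") + b"\n")


# io.BytesIO stands in for a byte-oriented stdout so the result can be inspected.
buf = io.BytesIO()
write_utf8(buf, u"\ua000")  # U+A000, the same character as unichr(40960)
```

In a real script the stream would be sys.stdout under Python 2, or sys.stdout.buffer under Python 3.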

                  > Also:
                  >
                  > ===========================
                  > def bar():
                  >     u = unichr(40960)
                  >     vim.current.buffer.append(u)
                  > ===========================
                  >
                  > Still gives the "TypeError: bad argument type for built-in operation" error.

                  Here "u" is of type "unicode", while the append() function requires a
                  string. Perhaps Python has a function to convert type "unicode" to a
                  string with utf-8 characters?

                  --
                  If you had to identify, in one word, the reason why the
                  human race has not achieved, and never will achieve, its
                  full potential, that word would be "meetings."

                  /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
                  /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
                  \\\ download, build and distribute -- http://www.A-A-P.org ///
                  \\\ help me help AIDS victims -- http://ICCF-Holland.org ///
                • Marijn
                  Message 8 of 8 , Jan 6, 2007
                    Bram Moolenaar wrote:
                    > Marijn wrote:
                    >
                    >> But the following script
                    >>
                    >> ===========================
                    >> #!/usr/bin/python
                    >> # -*- coding: utf8 -*-
                    >> print unichr(40960)
                    >> ===========================
                    >>
                    >> In vim: "r !python test.py" gives the same: "'ascii' codec can't
                    >> encode character" error.
                    >
                    > The problem is in Python, this is a Python error message. This has
                    > nothing to do with Vim.
                    >
                    > I guess you somehow have to put Python in utf-8 mode first. This page
                    > appears to provide info: http://www.amk.ca/python/howto/unicode
                    >

                    You were correct. Thanks a million.


                    >> Also:
                    >>
                    >> ===========================
                    >> def bar():
                    >>     u = unichr(40960)
                    >>     vim.current.buffer.append(u)
                    >> ===========================
                    >>
                    >> Still gives the "TypeError: bad argument type for built-in operation" error.
                    >
                    > Here "u" is of type "unicode", while the append() function requires a
                    > string. Perhaps Python has a function to convert type "unicode" to a
                    > string with utf-8 characters?
                    >

                    I added "u.encode('utf-8')" and it worked flawlessly. After adding the same to my original code, that worked as well. I guess I didn't understand Unicode too well after all... after reading your link everything fell into place. Python was passing a unicode object where Vim expected a UTF-8 encoded string. Sorry for the trouble, and thanks a lot.
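
The fix described above can be sketched as follows. The helper name to_utf8 is hypothetical, and a plain list stands in for vim.current.buffer, because the vim module only exists inside Vim. Inside Vim with the Python 2 interface, the call would be vim.current.buffer.append(to_utf8(u)).

```python
try:
    text_type = unicode  # Python 2: the "unicode" type from the traceback
except NameError:
    text_type = str      # Python 3: all text strings are Unicode


def to_utf8(value):
    """Return UTF-8 bytes for text; pass byte strings through unchanged."""
    if isinstance(value, text_type):
        return value.encode("utf-8")
    return value


# A plain list stands in for vim.current.buffer (assumption for illustration).
lines = []
lines.append(to_utf8(u"\ua000"))  # U+A000, same character as unichr(40960)
```

Encoding at the boundary keeps the rest of the script working with unicode objects and converts only at the point where a line is handed to the buffer.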


                    Best regards,
                    Marijn