
Re: BUG: Unicode characters in commands

  • Bram Moolenaar
    Message 1 of 2, Sep 29 1:47 PM
      Matt Wozniski wrote:

      > On Sun, Sep 28, 2008 at 4:35 PM, Tony Mechelynck wrote:
      > >
      > >> On Sun, Sep 28, 2008 at 9:40 AM, John Hughes wrote:
      > >>> I am trying to write a command that substitutes some Ascii characters
      > >>> with a Unicode character. The following substitution works when
      > >>> entered directly:
      > >>>
      > >>> :%s/\.\.\./…/eg
      > >>>
      > >>> However, when defined as a command, it does not work:
      > >>>
      > >>> :com Ellipsis %s/\.\.\./…/eg
      > >>>
      > >>> The command :Ellipsis converts
      > >>>
      > >>> ...
      > >>>
      > >>> into
      > >>>
      > >>> â<80><fe>X¦
      > >>>
      > >>> Why is this? Is there any way of using Unicode characters in
      > >>> substitute commands?
      > >
      > > I'm using gvim 7.2.21, huge build with Gnome2 GUI and 'encoding' set to
      > > UTF-8. Just like the OP, I see the following:
      > >
      > > - Typing the :s command at the command-line works OK.
      > > - Defining that :s command as a user-command text, then running that
      > > user command, replaces every set of three dots by â<80><fe>X¦ (5
      > > characters including two invalid UTF-8 sequences, 7 bytes viz. C3 A2 80
      > > FE 58 C2 A6).
      > > - Recalling that command definition with ":command Ellipsis" displays
      > > the ellipsis character as an ellipsis.
      > > - The ellipsis is U+2026, in UTF-8 0xE2 0x80 0xA6. Notice that 80 and A6
      > > appear (though not consecutively) as part of the replace-text actually
      > > used, and that E2 is C3 A2 which also appears. This makes me suspect
      > > that Vim is applying a spurious Latin1-to-UTF8 conversion to what is
      > > already UTF-8 (with something wrong, maybe buffer-overflow, happening in
      > > the middle). Another possibility would be using a "character length"
      > > instead of a "byte length", or vice-versa, at some point in the
      > > user-command execution.
      >
      > I can confirm this. It looks to me like it's not a spurious
      > Latin1-UTF8 conversion, but an internally-escaped string that's not
      > un-escaped before being used. Sourcediving, it seems that
      > mb_unescape() is called to unescape any multibyte characters when
      > displaying the command, but that mb_unescape() is never called before
      > the command is passed to do_cmdline() to be executed. That seems to
      > explain why it's displayed properly but executed incorrectly. I don't
      > completely follow all of the string escaping being done here, though,
      > so Bram would know for sure. I've cross-posted to the vim-dev list
      > accordingly.
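The byte counts in Tony's report and the escaping Matt describes can be checked against each other with a small sketch. It rests on an assumption taken from our reading of Vim's keymap.h, not from this thread: inside command strings Vim stores a literal 0x80 byte (K_SPECIAL) as the three bytes 80 FE 58 (K_SPECIAL, KS_SPECIAL, KE_FILLER), and mb_unescape() is what turns them back into a single 0x80. The functions `escape_k_special` and `unescape_k_special` are hypothetical Python stand-ins, not Vim code. Escaping the UTF-8 ellipsis E2 80 A6 gives E2 80 FE 58 A6, which matches the visible `<80><fe>X` in the bad output. The remaining difference (E2 and A6 appearing as C3 A2 and C2 A6) matches those two bytes being re-encoded as if they were Latin-1; the sketch only reproduces that observation, it does not explain where it happens.

```python
# Hypothetical illustration; constant values assumed from Vim's keymap.h.
K_SPECIAL = 0x80   # lead byte Vim uses for special keys
KS_SPECIAL = 0xFE  # second byte: "this is an escaped K_SPECIAL"
KE_FILLER = 0x58   # third byte: ASCII 'X'
ESCAPE_SEQ = bytes([K_SPECIAL, KS_SPECIAL, KE_FILLER])


def escape_k_special(data: bytes) -> bytes:
    """Replace every 0x80 byte with the 3-byte sequence 80 FE 58."""
    out = bytearray()
    for b in data:
        if b == K_SPECIAL:
            out += ESCAPE_SEQ
        else:
            out.append(b)
    return bytes(out)


def unescape_k_special(data: bytes) -> bytes:
    """Collapse each 80 FE 58 back to a single 0x80 (the mb_unescape() role)."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i:i + 3] == ESCAPE_SEQ:
            out.append(K_SPECIAL)
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


ellipsis = "\u2026".encode("utf-8")        # bytes E2 80 A6
escaped = escape_k_special(ellipsis)        # bytes E2 80 FE 58 A6
assert unescape_k_special(escaped) == ellipsis

# Tony's 7 observed bytes: C3 A2 80 FE 58 C2 A6.
observed = bytes.fromhex("C3A280FE58C2A6")

# Treat the lone lead byte E2 and trail byte A6 as Latin-1 characters
# and re-encode them as UTF-8; keep the escape bytes 80 FE 58 as they are.
reconstructed = (
    bytes([escaped[0]]).decode("latin-1").encode("utf-8")
    + escaped[1:4]
    + bytes([escaped[4]]).decode("latin-1").encode("utf-8")
)
assert reconstructed == observed
print(escaped.hex(" "))
```

If this reading is right, running the escaped replacement text through an mb_unescape()-style step before it reaches do_cmdline() would restore the original three bytes, which is consistent with Matt's suggestion.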

      I'll add it to the todo list. Don't expect a solution soon...

      --
      hundred-and-one symptoms of being an internet addict:
      116. You are living with your boyfriend who networks your respective
      computers so you can sit in separate rooms and email each other

      /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
      /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
      \\\ download, build and distribute -- http://www.A-A-P.org ///
      \\\ help me help AIDS victims -- http://ICCF-Holland.org ///

      --~--~---------~--~----~------------~-------~--~----~
      You received this message from the "vim_dev" maillist.
      For more information, visit http://www.vim.org/maillist.php
      -~----------~----~----~----~------~----~------~--~---