Re: multibyte in patterns

  • Bram Moolenaar
    Message 1 of 9 , Jan 1, 2003
      Benji Fisher wrote:

      > so it looks pretty good to me. The second =~ test is a little strange,
      > but should probably work this way for backward compatibility.

      Thanks for testing. It's a matter of taste whether foo =~ bar should
      result in TRUE or FALSE. Let's just leave it as it is until someone has
      a good reason why it should be different.

      > On the question of changing "\x" or adding "\u":
      > * Since vim is a *text* editor, I am not convinced that it should be
      > able to enter invalid bytes into my document. (I admit that
      > :put="\xe4" does not count as entering a character *easily*.) Perhaps
      > it would be better to make "\x" act like the new "\u" after all.

      There are always exceptions, e.g. when 'encoding' is not properly set or
      when intentionally creating illegal bytes. I don't think we have a good
      reason to forbid inserting any byte value.

      > * By habit and because of legacy scripts, people will continue to use
      > "\x". I assume that the new "\u" will be recommended for most purposes
      > (and the docs will mention this). It will take a while for people to
      > adjust. Again, this argues for using "\x" to insert valid bytes, and
      > adding a new construct for arbitrary bytes.

      Existing scripts that use "\xab" to insert valid UTF-8 bytes should keep
      on working, that's another reason why changing the meaning of "\xab" is
      a bad idea.
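
To illustrate the point about existing scripts, here is a minimal sketch (not taken from the thread) of how "\x" escapes that spell out the UTF-8 byte sequence of a character compare with converting the single latin1 byte. The variable names are made up for the example, and it assumes a Vim whose iconv() can convert latin1 to UTF-8.

```vim
" Two bytes written with "\x" that together form the UTF-8 encoding of
" U+00AB (the left guillemet). Each "\x" escape inserts one raw byte.
let from_bytes = "\xc2\xab"

" The same character obtained by converting the single latin1 byte 0xAB.
" The target encoding is given explicitly, so 'encoding' does not matter here.
let from_iconv = iconv("\xab", "latin1", "utf-8")

" Compare the two strings byte for byte (case-sensitive comparison).
echo from_bytes ==# from_iconv
```

If "\xab" were redefined to mean the character U+00AB, a script written like the first line would no longer produce the same bytes.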

      > Final question: I want my script to be able to insert "«" without
      > forcing users to adopt the latest patched vim. (I am thinking of the
      > LaTeX suite.) Instead of
      >
      > :let foo = "\uab"
      >
      > with this patch, should
      >
      > :let foo = iconv("\xab", "latin1", &enc)
      >
      > have the same effect? It seems to work, as far as I can tell.

      If iconv() is supported it should work, so long as 'encoding' has a
      character that represents the latin1 "\xab" character (not all 8-bit
      encodings have it).

      --
      hundred-and-one symptoms of being an internet addict:
      269. You receive an e-mail from the wife of a deceased president, offering
      to send you twenty million dollar, and you are not even surprised.

      /// Bram Moolenaar -- Bram@... -- http://www.moolenaar.net \\\
      /// Creator of Vim - Vi IMproved -- http://www.vim.org \\\
      \\\ Project leader for A-A-P -- http://www.a-a-p.org ///
      \\\ Lord Of The Rings helps Uganda - http://iccf-holland.org/lotr.html ///
    • Benji Fisher
      Message 2 of 9 , Jan 1, 2003
        Antoine J. Mechelynck wrote:
        > Benji Fisher <benji@...> wrote:
        >> Final question: I want my script to be able to insert "«" without
        >> forcing users to adopt the latest patched vim. (I am thinking of the
        >> LaTeX suite.) Instead of
        >>
        >> :let foo = "\uab"
        >>
        >> with this patch, should
        >>
        >> :let foo = iconv("\xab", "latin1", &enc)
        >>
        >> have the same effect? It seems to work, as far as I can tell.
        >
        > Have you tried it with encodings for which there is no equivalent for that
        > latin-1 character? (Iconv fails: what happens then?)

        No, I have only tried it with utf-8 and latin1. What other
        encodings should I try?

        > Best wishes -- and a happy New Year
        > Tony.

        Thanks!

        --Benji Fisher
      • Antoine J. Mechelynck
        Message 3 of 9 , Jan 1, 2003
          Benji Fisher <benji@...> wrote:
          > Antoine J. Mechelynck wrote:
          [...]
          > > Have you tried it with encodings for which there is no equivalent for
          > > that latin-1 character? (Iconv fails: what happens then?)
          >
          > No, I have only tried it with utf-8 and latin1. What other
          > encodings should I try?

          As many as possible, of course; but this is not really an answer. Maybe you
          could start, if you have them, with Central-European and Turkish encodings,
          then, if it works OK, with more esoteric ones like Greek, Cyrillic, Big5,
          sjis, euc-kr, ... And wouldn't the digraphs << and >> need to be switched
          around for right-to-left languages like Hebrew, Farsi and Arabic? As you
          see, I'm thinking of what the plugin would need in order to be as general
          as possible, for as many users as possible. Also, as could be inferred from
          Bram's post of a few minutes ago, maybe there ought to be a fallback if
          iconv() fails for any reason, and in particular if !has("iconv").
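
The fallback idea could be sketched roughly as follows. This is only an illustration, not code from the thread or from the LaTeX suite: the function name LeftGuillemet() and the ASCII fallback '<<' are hypothetical. It also assumes that iconv() returns an empty string when a conversion fails completely and "?" for a character it cannot convert, and that a Vim without the +iconv feature can still convert between latin1 and UTF-8.

```vim
" Hypothetical helper: return the character for latin1 byte 0xAB in the
" current 'encoding', or an ASCII fallback when that cannot be done.
function! LeftGuillemet()
  let fallback = '<<'

  " When 'encoding' is already latin1 the byte can be used unchanged.
  if &encoding ==? 'latin1'
    return "\xab"
  endif

  " Assumption: without +iconv only latin1 <-> UTF-8 conversion is available,
  " so give up early for any other 'encoding'.
  if !has('iconv') && &encoding !=? 'utf-8'
    return fallback
  endif

  let converted = iconv("\xab", 'latin1', &encoding)

  " Assumption: '' signals a failed conversion and '?' an unconvertible
  " character.
  if converted ==# '' || converted ==# '?'
    return fallback
  endif
  return converted
endfunction
```

A script could then use `let foo = LeftGuillemet()` in place of the bare iconv() call. Whether '<<' is an acceptable substitute, and how right-to-left languages should be handled, is left open here.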

          Tony.

          >
          > > Best wishes -- and a happy New Year
          > > Tony.
          >
          > Thanks!
          >
          > --Benji Fisher