Combining diacritical marks display as separate character

  • Sven Siegmund
    Message 1 of 7, Mar 12 1:09 AM
      Hello, I have just installed gVim 7.2 on Windows XP SP3 and have set
      utf-8 as the default encoding and a good unicode monospace font
      (DejaVu Sans Mono) as the guifont.

      gVim 7.2 has problems rendering combining diacritical marks on
      characters for which Unicode has no dedicated precomposed
      codepoint. I can imagine why that is.

      When I try to type "n" and then the U+0302 combining circumflex "^" I
      get "n^" displayed instead of "n̂" (n with a circumflex on it). I can
      imagine why this happens: "n" with a combining "^" are technically two
      characters, two Unicode codepoints. It is only OpenType features and
      the font renderer of the OS (on Windows it is Uniscribe) that make
      them overlap instead of displaying side by side.

      gVim does not use Uniscribe for rendering the font displayed. It is
      more low-level. It has very rigid rules to display a given number of
      characters/code-points per line and sticks to it. Hence it is forced
      to display "n" with combined "^" as two separate characters.

      But then I wonder how you can use gVim to write scripts where such
      combining of Unicode codepoints or reordering of letters (as in the
      Devanagari script) or LTR-RTL changes happen. Is there a solution?

      Thanks for your answers.
      --~--~---------~--~----~------------~-------~--~----~
      You received this message from the "vim_multibyte" maillist.
      For more information, visit http://www.vim.org/maillist.php
      -~----------~----~----~----~------~----~------~--~---
    • Tony Mechelynck
      Message 2 of 7, Mar 12 2:53 AM
        On 12/03/09 09:09, Sven Siegmund wrote:
        > Hello, I have just installed gVim 7.2 on Windows XP SP3 and have set
        > utf-8 as the default encoding and a good unicode monospace font
        > (DejaVu Sans Mono) as the guifont.
        >
        > gVim 7.2 has problems rendering combining diacritical marks on
        > characters for which Unicode has no dedicated precomposed
        > codepoint. I can imagine why that is.
        >
        > When I try to type "n" and then the U+0302 combining circumflex "^" I
        > get "n^" displayed instead of "n̂" (n with a circumflex on it). I can
        > imagine why this happens: "n" with a combining "^" are technically two
        > characters, two Unicode codepoints. It is only OpenType features and
        > the font renderer of the OS (on Windows it is Uniscribe) that make
        > them overlap instead of displaying side by side.
        >
        > gVim does not use Uniscribe for rendering the font displayed. It is
        > more low-level. It has very rigid rules to display a given number of
        > characters/code-points per line and sticks to it. Hence it is forced
        > to display "n" with combined "^" as two separate characters.
        >
        > But then I wonder how you can use gVim to write scripts where such
        > combining of Unicode codepoints or reordering of letters (as in the
        > Devanagari script) or LTR-RTL changes happen. Is there a solution?
        >
        > Thanks for your answers.

        I don't have any problems with recent gvim versions (currently 7.2.141
        but it already worked last week) and GTK2 2.14.4-8.6.2 on openSUSE 11.1.
        -- Well, of course I can't reproduce your case exactly since I'm on
        Linux. I'm currently typing a Russian dictionary with lots of combining
        acute accents (U+0301), which Vim correctly displays over the preceding
        spacing Cyrillic vowel. However, IIRC, even when I was on W98 with Vim
        6.1 it could display combining characters correctly in Unicode, using a
        "Courier New" font -- that's when I started my frontpage
        http://users.skynet.be/antoine.mechelynck/ where you can see several
        scripts on a single page, one of them vocalized Arabic. Since then,
        Unicode rendering has gotten progressively better, not worse, over the years.

        Let me try n + U+0302 ... yep, I get the correct overprint, in my
        default font, which happens to be "Bitstream Vera Sans Mono", very
        similar to DejaVu IIUC.

        Current versions of gvim can display (by default) two combining
        characters on any spacing character, which is usually enough for Arabic,
        even IIUC Coranic Arabic, but not always for fully cantillated Hebrew;
        or (by a nondefault 'maxcombine' setting) up to 6 combining characters
        over a single spacing character, which is usually more than you'd need.
        But (IIUC) only if 'encoding' is set to UTF-8. You can set this even if
        you don't tell Windows to use Unicode everywhere, provided that you set
        it near the top of your vimrc. See
        http://vim.wikia.com/wiki/Working_with_Unicode for details.
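
        A minimal sketch of the vimrc lines this describes, assuming a Vim
        built with +multi_byte (the has() guard is an addition for safety,
        not something taken from the message):

        ```vim
        " Near the top of the vimrc, before any non-ASCII text is used.
        if has('multi_byte')
          set encoding=utf-8
          " Default is 2; 6 is the maximum number of combining characters
          " displayed on one spacing character.
          set maxcombine=6
        endif
        ```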

        I'm not sure Vim does devanagari.

        It can do Hebrew or Arabic but not with true bidi: what Vim does is give
        you the option of displaying any window in either all RTL or all LTR.
        You can even have the same file in split-windows, one of them LTR (with
        English OK but Arabic or Hebrew wrong) and the other RTL (with Hebrew
        and/or Arabic OK, including Arabic joining forms if 'arabicshape' is on
        which is the default, but English wrong).
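
        The split-window arrangement could be set up roughly like this,
        assuming a build with the +rightleft and +arabic features:

        ```vim
        " Open a second window on the same buffer; the cursor moves into it.
        split
        " 'rightleft' is local to the window, so only this window becomes RTL.
        setlocal rightleft
        " Show whether Arabic shaping is enabled (it is on by default).
        set arabicshape?
        ```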


        Which exact version and patchlevel of gvim are you using? You might want
        to copy the first handful of lines from the output of ":version" (until
        the line with "Features included (+) or not (-)") -- see ":help :redir"
        about how to capture that kind of output. Also, when you type

        :echo has('multi_byte')

        what answer do you get? If it's zero, you're in trouble.

        Also, what is your _full_ 'guifont' setting? If it ends in cANSI, I
        think you're in trouble -- cDEFAULT is usually better IMHO.
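
        One way to collect these three answers in a single buffer, as a
        sketch (register a is an arbitrary choice):

        ```vim
        " Capture the output of the following commands into register a.
        redir @a
        silent version
        silent echo has('multi_byte')
        silent set guifont?
        redir END
        " Paste the captured text below the cursor line.
        put a
        ```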


        Best regards,
        Tony.
        --
        "Seven years and six months!" Humpty Dumpty repeated
        thoughtfully. "An uncomfortable sort of age. Now if you'd asked MY
        advice, I'd have said `Leave off at seven' -- but it's too late now."
        "I never ask advice about growing," Alice said indignantly.
        "Too proud?" the other enquired.
        Alice felt even more indignant at this suggestion. "I mean,"
        she said, "that one can't help growing older."
        "ONE can't, perhaps," said Humpty Dumpty; "but TWO can. With
        proper assistance, you might have left off at seven."
        -- Lewis Carroll

      • Ron Aaron
        Message 3 of 7, Mar 12 3:56 AM
          On Mar 12, 11:53 am, Tony Mechelynck <antoine.mechely...@...>
          wrote:
          > I don't have any problems with recent gvim versions (currently 7.2.141
          > but it already worked last week) and GTK2 2.14.4-8.6.2 on openSUSE 11.1.

          I use it on Windows and Linux, and it works well on both.

          > It can do Hebrew or Arabic but not with true bidi: what Vim does is give
          > you the option of displaying any window in either all RTL or all LTR.
          > You can even have the same file in split-windows, one of them LTR (with
          > English OK but Arabic or Hebrew wrong) and the other RTL (with Hebrew
          > and/or Arabic OK, including Arabic joining forms if 'arabicshape' is on
          > which is the default, but English wrong).

          That is, in fact, what I regularly do. I open a bilingual (English
          and Hebrew) file, split the window, and have one be LTR and the other
          RTL. Then I use XeLaTeX to produce really nice output :)

        • Sven Siegmund
          Message 4 of 7, Mar 12 6:51 AM
            Hello, thanks for the details,

            On Thu, Mar 12, 2009 at 10:53 AM, Tony Mechelynck
            <antoine.mechelynck@...> wrote:
            > Current versions of gvim can display (by default) two combining
            > characters on any spacing character, which is usually enough for Arabic,

            Yep, two combining marks are enough for me.

            > Which exact version and patchlevel of gvim are you using? You might want
            > to copy the first handful of lines from the output of ":version" (until
            > the line with "Features included (+) or not (-)") -- see ":help :redir"
            > about how to capture that kind of output. Also, when you type

            VIM - Vi IMproved 7.2 (2008 Aug 9, compiled Aug 9 2008 18:46:22)
            MS-Windows 32-bit GUI version with OLE support
            Compiled by Bram@KIBAALE
            Big version with GUI.

            >        :echo has('multi_byte')
            1

            > Also, what is your _full_ 'guifont' setting? If it ends in cANSI, I
            > think you're in trouble -- cDEFAULT is usually better IMHO.

            "unicode encoding:
            set enc=utf-8

            "set gui font
            set guifont=DejaVu_Sans_Mono:h11:cDEFAULT

            set nocompatible
            source $VIMRUNTIME/vimrc_example.vim
            ...
            ...
            ...

            I explored the problem further. There is something wrong with gvim
            interpreting deadkeys of the Windows-Keyboard layout. I could not type
            "n" with combined circumflex because I tried to map the combining
            circumflex on a dead key of my windows keyboard layout. When I map the
            combining circumflex to another key it works and it gets displayed
            well in gvim.

            I will explore the problems of remapping the dead keys of the windows
            keyboard layout later. So far I could not google anything about this
            issue in gvim in Windows.

            S.

          • Kenneth Reid Beesley
            Message 5 of 7, Mar 12 9:07 AM
              On 12 Mar 2009, at 07:51, Sven Siegmund wrote:

              >
              > Hello, thanks for the details,
              >
              > On Thu, Mar 12, 2009 at 10:53 AM, Tony Mechelynck
              > <antoine.mechelynck@...> wrote:
              >> Current versions of gvim can display (by default) two combining
              >> characters on any spacing character, which is usually enough for
              >> Arabic,
              >
              > Yep, two combining marks are enough for me.
              >
              >> Which exact version and patchlevel of gvim are you using? You might
              >> want
              >> to copy the first handful of lines from the output of
              >> ":version" (until
              >> the line with "Features included (+) or not (-)") -- see
              >> ":help :redir"
              >> about how to capture that kind of output. Also, when you type
              >
              > VIM - Vi IMproved 7.2 (2008 Aug 9, compiled Aug 9 2008 18:46:22)
              > MS-Windows 32-bit GUI version with OLE support
              > Compiled by Bram@KIBAALE
              > Big version with GUI.
              >
              >> :echo has('multi_byte')
              > 1
              >
              >> Also, what is your _full_ 'guifont' setting? If it ends in cANSI, I
              >> think you're in trouble -- cDEFAULT is usually better IMHO.
              >
              > "unicode encoding:
              > set enc=utf-8
              >
              > "set gui font
              > set guifont=DejaVu_Sans_Mono:h11:cDEFAULT
              >
              > set nocompatible
              > source $VIMRUNTIME/vimrc_example.vim
              > ...
              > ...
              > ...
              >
              > I explored the problem further. There is something wrong with gvim
              > interpreting deadkeys of the Windows-Keyboard layout. I could not type
              > "n" with combined circumflex because I tried to map the combining
              > circumflex on a dead key of my windows keyboard layout. When I map the
              > combining circumflex to another key it works and it gets displayed
              > well in gvim.
              >
              > I will explore the problems of remapping the dead keys of the windows
              > keyboard layout later. So far I could not google anything about this
              > issue in gvim in Windows.
              >
              > S.
              >
              > >


              I'm using MacVim Snapshot 43, with DejaVu Sans Mono, and the handling
              of Unicode, including the rendering of letters with combining
              diacritical marks, is surprisingly good.

              n+0x0302

              displays perfectly for me, with a circumflex placed nicely above the
              'n'. I sometimes work with orthographies for Native American
              languages, which sometimes require two combining diacritics on the
              same letter, and MacVim again does well. This is one of the (several)
              reasons that I made the painful move from emacs to vim.

              Ken

              ******************************
              Kenneth R. Beesley, D.Phil.
              P.O. Box 540475
              North Salt Lake, UT
              84054 USA






            • Tony Mechelynck
              Message 6 of 7, Mar 12 12:29 PM
                On 12/03/09 14:51, Sven Siegmund wrote:
                > Hello, thanks for the details,

                My pleasure.

                Beware: I'm going to send this email in UTF-8 because of the text I'll
                be typing into it.

                >
                > On Thu, Mar 12, 2009 at 10:53 AM, Tony Mechelynck
                > <antoine.mechelynck@...> wrote:
                [...]
                >> Which exact version and patchlevel of gvim are you using? You might want
                >> to copy the first handful of lines from the output of ":version" (until
                >> the line with "Features included (+) or not (-)") -- see ":help :redir"
                >> about how to capture that kind of output. Also, when you type
                > VIM - Vi IMproved 7.2 (2008 Aug 9, compiled Aug 9 2008 18:46:22)
                > MS-Windows 32-bit GUI version with OLE support
                > Compiled by Bram@KIBAALE
                > Big version with GUI.

                This means 7.2.0. I would recommend that you install a more recent
                bugfixed version, for instance (for Windows) one of Steve Hall's
                distributions at
                https://sourceforge.net/project/showfiles.php?group_id=43866&package_id=39721
                -- click the clipboard-like icon next to a download link to see when
                that build was compiled and what features are included.

                I'm not saying that a more recent build will necessarily cure _this_
                problem, but it is always worth doing, since it might cure _other_
                problems which you might be having. At
                http://ftp.vim.org/pub/vim/patches/7.2/README you can see a text file
                with a one-line description of every bugfix published so far for Vim 7.2
                -- and whenever a new bugfix gets published, that README file is updated
                at the same time.

                >
                >> :echo has('multi_byte')
                > 1

                Good. Nonzero means "feature is present".

                >
                >> Also, what is your _full_ 'guifont' setting? If it ends in cANSI, I
                >> think you're in trouble -- cDEFAULT is usually better IMHO.
                > "unicode encoding:
                > set enc=utf-8
                >
                > "set gui font
                > set guifont=DejaVu_Sans_Mono:h11:cDEFAULT

                This ought to be all right.

                >
                > set nocompatible
                > source $VIMRUNTIME/vimrc_example.vim
                > ...
                > ...
                > ...
                >
                > I explored the problem further. There is something wrong with gvim
                > interpreting deadkeys of the Windows-Keyboard layout. I could not type
                > "n" with combined circumflex because I tried to map the combining
                > circumflex on a dead key of my windows keyboard layout. When I map the
                > combining circumflex to another key it works and it gets displayed
                > well in gvim.

                Aha! To enter any Unicode codepoint by its Unicode codepoint number in
                Vim, use the method described at |i_CTRL-V_digit|. Or if you frequently
                use some particular codepoints, you might want to use a keymap -- either
                a preexisting one if you find one that suits you, or else you can build
                your own: it isn't very hard once you get the hang of it. The
                "accents.vim" and "esperanto.vim" keymaps (in $VIMRUNTIME/keymap/) are
                small examples showing how keymaps are built. The relevant help is at
                |keymap-file-format|.

                -- Note that if you build your own keymap it should NOT go into
                $VIMRUNTIME/keymap/ (where any upgrade may silently destroy it) but into
                either $VIM/vimfiles/keymap/ (if you want to be able to access it from
                any Windows login name) or $HOME/vimfiles/keymap/ (to restrict it to one
                login name, since every "user" has a different $HOME directory). Create
                the needed directory, and maybe its parent too, if they don't yet exist.
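
                As a rough illustration, a personal keymap file for the combining
                circumflex might look like the following. The file name
                "combining_utf-8.vim", the short name "comb" and the key sequence
                ";h" are all made up for this example:

                ```vim
                " Hypothetical file: $HOME/vimfiles/keymap/combining_utf-8.vim
                let b:keymap_name = "comb"
                loadkeymap
                " Typing ;h in Insert mode inserts U+0302 after the previous letter.
                ;h    <char-0x0302>    " COMBINING CIRCUMFLEX ACCENT
                ```

                It would then be activated with ":set keymap=combining", and
                CTRL-^ in Insert mode switches the keymap off and on again.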

                Of course Vim must see the keypress in order to act on it, and I suspect
                that Windows dead keys are retained by Windows (and not given to Vim)
                until you press something else (with which Windows, not Vim, will
                combine the "dead key"). And since "Unicode combining characters" must
                go _after_ the spacing character to which they apply, they are not
                really "dead keys" in the usual typewriter meaning of the expression: on
                my Belgian keyboard I hit "dead-circumflex" followed by c to get the
                _precombined_ Esperanto consonant ĉ (U+0109 LATIN SMALL LETTER C WITH
                CIRCUMFLEX) but in Vim I type c first and ^Vu0302 afterwards to get the
                _composite_ codepoints ĉ [i.e. c (U+0063 LATIN SMALL LETTER C) followed
                by "dead-circumflex" (U+0302 COMBINING CIRCUMFLEX ACCENT)] which
                SeaMonkey 2.0b1pre erroneously does not overprint in the mail
                composition window -- I don't know about your mailer.
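
                The c + U+0302 sequence can also be produced and inspected from
                the command line; a sketch, assuming 'encoding' is utf-8:

                ```vim
                " Append c followed by U+0302, entered as CTRL-V u0302.
                execute "normal! ac\<C-V>u0302\<Esc>"
                " List the codepoints under the cursor, combining ones included.
                ascii
                ```

                When typing by hand on Windows, CTRL-Q may be needed instead of
                CTRL-V if mswin.vim has remapped CTRL-V to paste.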

                >
                > I will explore the problems of remapping the dead keys of the windows
                > keyboard layout later. So far I could not google anything about this
                > issue in gvim in Windows.
                >
                > S.

                As far as I know, everything, but _everything_ about Vim behaviour is
                in the help. (Obviously, the fine points of _Windows_ behaviour are not
                in the _Vim_ help.) To find your precious needle (any needle) in the Vim
                help^H^H^H^Hhaystack (which is admittedly a huge one), use the following
                starting points (magnets, if you will ;-) since sewing needles are
                usually made of steel):

                :help
                :help :help
                :help {subject}
                where {subject} means exactly open-brace, small-ess,
                small-you, small-bee, small-jay, small-eeh, small-cee,
                small-tee, close-brace. No fancy replacing (yet).
                :help :helpgrep

                which will explain progressively more complex methods of finding your
                way about the help.



                Best regards,
                Tony.
                --
                Mustgo, n.:
                Any item of food that has been sitting in the refrigerator so
                long it has become a science project.
                -- Sniglets, "Rich Hall & Friends"

              • Tony Mechelynck
                Message 7 of 7, Mar 12 5:06 PM
                  On 12/03/09 11:56, Ron Aaron wrote:
                  > On Mar 12, 11:53 am, Tony Mechelynck<antoine.mechely...@...>
                  > wrote:
                  >> I don't have any problems with recent gvim versions (currently 7.2.141
                  >> but it already worked last week) and GTK2 2.14.4-8.6.2 on openSUSE 11.1.
                  > I use it on Windows and Linux, and it works well on both.
                  >
                  >> It can do Hebrew or Arabic but not with true bidi: what Vim does is give
                  >> you the option of displaying any window in either all RTL or all LTR.
                  >> You can even have the same file in split-windows, one of them LTR (with
                  >> English OK but Arabic or Hebrew wrong) and the other RTL (with Hebrew
                  >> and/or Arabic OK, including Arabic joining forms if 'arabicshape' is on
                  >> which is the default, but English wrong).
                  > That is, in fact, what I regularly do. I open a bilingual (English
                  > and Hebrew) file, split the window, and have one be LTR and the other
                  > RTL. Then I use XeLaTeX to produce really nice output :)

                  What I use to produce real nice true-bidi output is my browser --
                  SeaMonkey 2.0b1pre, but Firefox 3 (3.0 or 3.1 I'm not sure) uses
                  identically the same rendering engine, and any "good" browser ought to
                  do well, which is not to say all of them indeed do, for the kind of
                  files which I use, namely HTML and plain text.


                  Best regards,
                  Tony.
                  --
                  There was a plane crash over mid-ocean, and only three survivors were
                  left in the life-raft: the Pope, the President, and Mayor Daley.
                  Unfortunately, it was a one-man life-raft, and quickly sinking, so they
                  started debating who should be allowed to stay.

                  The Pope pointed out that he was the spiritual leader of millions all
                  over the world, the President explained that if he died then America
                  would be stuck with the Vice-President, and so forth. Then Mayor Daley
                  said, "Look! We're not solving anything like this! The only fair
                  thing to do is to vote on it." So they did, and Mayor Daley won by 97
                  votes.
