
Real displayed width of a character

  • Jehan Pagès
    Message 1 of 4 , Oct 24, 2008
      Hi all,

      I have a question about "displayed width" (and not encoding length!) of a character. How does vim "decide" the width of a character, in terms of number of columns? Does it use a function like "wcwidth" (a POSIX function)? Some home-made equivalent?

      The reason I ask is that some characters are sometimes single- or double-column depending on the font used. Moreover, as far as I could read, Unicode does not explicitly give a preferred width for characters, with the exception of some characters (mostly East Asian) that sit in a dedicated Unicode block (the fullwidth and halfwidth forms). This is explained in the following Technical Report, for instance (the only paper from the Unicode Consortium I found that deals with character width as its main topic; elsewhere I could only find allusions or small notes, as though it were implicit):
      http://unicode.org/reports/tr11/

      An extract from this:
      "
      Except for a few characters, which are explicitly called out as fullwidth or halfwidth in the Unicode Standard, characters are not duplicated based on distinction in width. Some characters, such as the ideographs, are always wide; others are always narrow; and some can be narrow or wide, depending on the context. The Unicode character property East_Asian_Width provides a default classification of characters, which an implementation can use to decide at runtime whether to treat a character as narrow or wide.
      "

      Even though the report focuses on East Asian characters, I found some other characters that have very different widths in different fonts. For instance, I found a few fonts where '@' is double the width of "typical" Western characters (A-Z, 0-9, etc.). This is also true of the European currency sign (euro: €), and even of the Latin characters œ and æ (used in French, among other languages). I would even say this seems logical, since these characters are formed by merging two characters into one... So being double width seems normal to me, doesn't it?
      Unfortunately, a function like wcwidth considers them to be "one column wide", and apparently so does the function vim uses (whether it is the same one or another). So I must find a font that has these characters at the same width as the rest (monospaced, or close to it). If I don't, the characters are "cut" by vim.
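The East_Asian_Width property quoted from the report can be looked up with Python's standard `unicodedata` module. The sketch below (the sample list and the `classify` and `EAW_NAMES` names are illustrative, not part of the original post) shows that '@' is Narrow while €, œ and æ are all classified Ambiguous, the category whose width UAX #11 leaves to context.

```python
import unicodedata

# Characters mentioned in the message, plus a CJK ideograph for contrast.
SAMPLES = ["A", "@", "\u20ac", "\u0153", "\u00e6", "\u4e2d"]

# Meanings of the East_Asian_Width codes, as listed in UAX #11.
EAW_NAMES = {
    "N": "Neutral",
    "Na": "Narrow",
    "H": "Halfwidth",
    "W": "Wide",
    "F": "Fullwidth",
    "A": "Ambiguous",
}


def classify(ch: str) -> str:
    """Return the East_Asian_Width code of a single character."""
    return unicodedata.east_asian_width(ch)


if __name__ == "__main__":
    for ch in SAMPLES:
        code = classify(ch)
        print(f"U+{ord(ch):04X} {ch}: {code} ({EAW_NAMES[code]})")
```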

      Do you have any idea about this? Couldn't vim be improved so that it takes the font actually used into account? This seems complicated, as the font is defined in the terminal emulator, not in vim itself, and I have not yet found any way to advertise the font in use through a terminal protocol (VT100 or otherwise). But what if there were an option in vim where the user could explicitly say "I am using this font"? Then, when vim displays characters and asks the terminal to "jump" to this or that column, it could calculate the right place to go without cutting text.
      Thanks.

      Jehan

      --~--~---------~--~----~------------~-------~--~----~
      You received this message from the "vim_multibyte" maillist.
      For more information, visit http://www.vim.org/maillist.php
      -~----------~----~----~----~------~----~------~--~---

    • Tony Mechelynck
      Message 2 of 4 , Oct 24, 2008
        On 24/10/08 16:22, Jehan Pagès wrote:
        > Hi all,
        >
        > I have a question about "displayed width" (and not encoding length!) of
        > a character. How does vim "decide" the width of a character, in term of
        > number of columns? Does it use some function like "wcwidth" (POSIX
        > function)? Some home-made similar function?
        >
        > The reason I ask this is that some characters sometimes would be single
        > or double column depending on the used font. Moreover Unicode, as far as
        > I could read, does not explicitely give a prefered size for characters,
        > in the exception of some characters (mostly East-Asian), which are in
        > dedicated Unicode planes (full-width and half-width characters). This is
        > explained in this Technical Report for instance (the only paper from the
        > Unicode Consortium I found which was dealing about character width as
        > the main topic,elsewhere I could only find allusions, or small notes, as
        > though it was implicit)
        > http://unicode.org/reports/tr11/
        >
        > An extract from this:
        > "
        > Except for a few characters, which are explicitly called out as
        > fullwidth or halfwidth in the Unicode Standard, characters are not
        > duplicated based on distinction in width. Some characters, such as the
        > ideographs, are always wide; others are always narrow; and some can be
        > narrow or wide, depending on the context. The Unicode character property
        > East_Asian_Width provides a default classification of characters, which
        > an implementation can use to decide at runtime whether to treat a
        > character as narrow or wide.
        > "
        >
        > Even though it is focused on East-Asian characters, I could find some
        > other characters which have very different sizes in different fonts. For
        > instance I found a few fonts with '@' being double size compared to
        > "typical" western characters (A-Z 0-9, etc.). Also this true for the
        > European money character (euro: €), or even the Latin characters /œ /or
        > æ (used in French among other places). I would even say that this seems
        > logical as these characters are formed by including 2 characters in each
        > other... So being double size seems normal to me, isn't it?
        > Unfortunately a function like wcwidth considers it must be "one column
        > wide", and apparently the function used by vim too (being the same or
        > another). Then I must find a font which has these characters but the
        > same width than the rest (so mono or close). If I don't, the characters
        > are "cut" by vim.
        >
        > Would you have an idea about this? Couldn't vim be improved in such a
        > way it would consider the font really used? This seems complicated as
        > the font is defined in the Terminal Emulator, not in vim itself. And I
        > could not find yet if there is some possible to advertise the used font
        > in any terminal protocol (VT100 or else). But then what if there was an
        > option in vim where the user could explicitely tell "I am using this
        > font". So that when vim displays characters and then ask the terminal to
        > "jump" to this or that column, it can calculate the right place to go,
        > without cutting text?
        > Thanks.
        >
        > Jehan

        Fullwidth characters always occupy two screen columns. Sometimes an
        empty column can be added in the last screen column if a fullwidth
        character would otherwise start in it.
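The wrapping rule just described can be sketched as follows. This is a simplified Python model, not Vim's actual code: `wrap_to_rows` and `cell_width` are hypothetical names, the width rule ignores ambiguous-width, combining and control characters, and the filler column is drawn as a plain space here (what Vim actually shows in that column is an assumption left open).

```python
import unicodedata


def cell_width(ch: str) -> int:
    """Simplified rule: Wide/Fullwidth characters take two cells, all others one."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def wrap_to_rows(text: str, columns: int, filler: str = " ") -> list[list[str]]:
    """Lay `text` out in rows of `columns` screen cells.

    A double-width character fills its own cell plus a continuation cell,
    represented here by "". If it would start in the last column, that
    column is padded with `filler` and the character moves to the next row.
    """
    if columns < 2:
        raise ValueError("need at least two columns to fit a wide character")
    rows: list[list[str]] = []
    row: list[str] = []
    for ch in text:
        width = cell_width(ch)
        if len(row) + width > columns:
            # Pad whatever is left of this row (at most one cell) and wrap.
            row.extend([filler] * (columns - len(row)))
            rows.append(row)
            row = []
        row.append(ch)
        if width == 2:
            row.append("")  # right half of the wide character
    if row:
        rows.append(row)
    return rows


if __name__ == "__main__":
    for r in wrap_to_rows("abc\u4e2d", 4):
        print(r)
```

With a 4-column screen, "abc" fills three columns, so the wide character cannot start in column 4: that column gets the filler and the character opens the next row.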

        Halfwidth characters always occupy one screen column, except the hard
        tab (U+0009 HORIZONTAL TAB) which occupies one or more columns depending
        on 'tabstop', 'list' and 'listchars'. Strictly speaking, the tab is a
        "control character" anyway.

        Ambiguous-width characters are treated as fullwidth or halfwidth
        depending on the setting of the global 'ambiwidth' option.

        See:
        :help 'ambiwidth'
        :help 'tabstop'
        :help 'list'
        :help 'listchars'
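These rules can be modelled in a few lines. The following Python sketch is an approximation under stated assumptions, not Vim's implementation (Vim uses its own internal width tables): it reads the Unicode East_Asian_Width property through the standard `unicodedata` module, counts Wide/Fullwidth as two columns, lets a hypothetical `ambiwidth` parameter decide Ambiguous characters, and expands tabs against `tabstop` as if 'list' were off. Combining marks and other control characters are deliberately ignored.

```python
import unicodedata


def char_cells(ch: str, col: int, ambiwidth: str = "single", tabstop: int = 8) -> int:
    """Screen cells taken by `ch` when it starts at 0-based screen column `col`."""
    if ch == "\t":
        # A hard tab advances to the next multiple of 'tabstop'
        # (this model assumes 'list' is off).
        return tabstop - (col % tabstop)
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        return 2  # fullwidth: always two columns
    if eaw == "A":
        return 2 if ambiwidth == "double" else 1  # follows 'ambiwidth'
    return 1  # halfwidth, narrow and neutral


def display_width(text: str, ambiwidth: str = "single", tabstop: int = 8) -> int:
    """Total screen columns for `text`, starting at column 0."""
    col = 0
    for ch in text:
        col += char_cells(ch, col, ambiwidth, tabstop)
    return col


if __name__ == "__main__":
    sample = "a\t\u20ac\u4e2d"
    for aw in ("single", "double"):
        print(aw, display_width(sample, ambiwidth=aw))
```

For "a<Tab>€中" with the default tabstop of 8, the tab jumps from column 1 to column 8, the euro sign adds one or two columns depending on `ambiwidth`, and the ideograph always adds two.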


        Note also that proportional fonts (fonts where m is much wider than i or
        l, not to mention Arabic final sad vs. isolated alif) are ugly in GTK2
        versions of gvim and cannot be used at all in other versions of gvim,
        nor in Console Vim.


        Best regards,
        Tony.
        --
        Although we modern persons tend to take our electric lights, radios,
        mixers, etc., for granted, hundreds of years ago people did not have
        any of these things, which is just as well because there was no place
        to plug them in. Then along came the first Electrical Pioneer,
        Benjamin Franklin, who flew a kite in a lightning storm and received a
        serious electrical shock. This proved that lightning was powered by the
        same force as carpets, but it also damaged Franklin's brain so severely
        that he started speaking only in incomprehensible maxims, such as "A
        penny saved is a penny earned." Eventually he had to be given a job
        running the post office.
        -- Dave Barry, "What is Electricity?"

      • Mansing
        Message 3 of 4 , Oct 24, 2008
          Wow!  For ages, I knew not to ask this question.  Now with
          :set ambiwidth=double
          my Chinese /open/ quotation mark ( “ code=0x201c ) is displayed correctly, without colliding with the next character.  Strangely, the /close/ quotation mark ( ” code=0x201d ) has always been displayed correctly regardless of the ambiwidth setting?!

          mt 081025


          Tony Mechelynck wrote:
          > On 24/10/08 16:22, Jehan Pagès wrote:
          >> Hi all,
          >>
          >> I have a question about "displayed width" (and not encoding length!) of
          >> a character. How does vim "decide" the width of a character, in term of
          >> number of columns? . . .
          >>
          >> Jehan
          >
          > . . .
          >
          > Ambiguous-width characters are treated as fullwidth or halfwidth
          > depending on the setting of the global 'ambiwidth' option.
          >
          > . . .
          > Tony.


        • Tony Mechelynck
          Message 4 of 4 , Oct 24, 2008
            On 25/10/08 01:40, Mansing wrote:
            > Wow! For ages, I knew not to ask this question. Now with
            >
            > :set ambiwidth=double
            >
            > my Chinese /open/ quotation mark ( “ code=0x201c ) is displayed
            > correctly --without colliding with the next character. Strange that,
            > the /close/ quotation mark ( ” code=0x201d ) has always been displayed
            > well regardless of the ambiwidth setting?!
            >
            > mt 081025

            Hm. Here these characters are displayed with the same (narrow) glyph as
            a plain double quote in Bitstream Vera Sans Mono. With FZFangSong,
            however, U+201C is a "66" quote occupying the right half of its wide
            glyph, while U+201D is a "99" quote in the left half of _its_ wide glyph
            (or rather, the top-right and top-left quarters respectively). So with
            ambiwidth=single, the visible part of U+201C is overprinted on the next
            character, whereas for U+201D only the blank right half of the glyph
            overlaps _its_ follower.
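This explanation can be cross-checked: U+201C and U+201D share the same East_Asian_Width value (Ambiguous), so 'ambiwidth' allots both of them the same number of columns, and the asymmetry Mansing saw must come from where the font places the ink inside each glyph. A minimal Python check, where the `cells` helper is a hypothetical simplification and not a Vim function:

```python
import unicodedata


def cells(ch: str, ambiwidth: str) -> int:
    """Columns for a character under a simplified East_Asian_Width rule."""
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        return 2
    if eaw == "A":
        return 2 if ambiwidth == "double" else 1
    return 1


QUOTES = {"open": "\u201c", "close": "\u201d"}

if __name__ == "__main__":
    for name, ch in QUOTES.items():
        eaw = unicodedata.east_asian_width(ch)
        widths = {aw: cells(ch, aw) for aw in ("single", "double")}
        print(f"{name} U+{ord(ch):04X} EAW={eaw} cells={widths}")
```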

            Best regards,
            Tony.
            --
            Meader's Law:
            Whatever happens to you, it will previously have happened to
            everyone you know, only more so.
