Re: printf() vs. unicode (multi-byte encodings)

  • ZyX
    Message 1 of 9 , Nov 11, 2012
      > Is this intended? Is there an (easy) way to make printf() respect multi-byte encodings?
      It is clearly stated in documentation that printf() operates with bytes.

      --
      You received this message from the "vim_dev" maillist.
      Do not top-post! Type your reply below the text you are replying to.
      For more information, visit http://www.vim.org/maillist.php
    • lith
      Message 2 of 9 , Nov 11, 2012
        On Sunday, 11 November 2012 10:03:49 UTC+1, ZyX wrote:
        > > Is this intended? Is there an (easy) way to make printf() respect multi-byte encodings?
        > It is clearly stated in documentation that printf() operates with bytes.

        My question was rather whether I am missing something, since I personally find this behaviour rather useless in the context of a text editor.

        A formatted string is usually something that should eventually be displayed in a Vim buffer. When I want a string padded with whitespace, I'm almost always more interested in its display width than in its size in bytes, and I can hardly imagine a use case where it would be otherwise.

        I can work around this problem by adjusting the width by the difference between len(s) and strwidth(s), but I personally find this unnecessarily complicated.
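
A minimal sketch of that workaround, assuming a Vim that has strwidth() (7.3 or later). PadCells is a hypothetical name used only for illustration; it is not part of Vim or of any patch in this thread:

```vim
" Hypothetical helper (not from the thread): left-align a:s in a field that
" is a:width display cells wide, using only the byte-based %s conversion.
function! PadCells(s, width) abort
  " Difference between the byte length and the display width of the string.
  let l:extra = len(a:s) - strwidth(a:s)
  " Widen the byte-counted field by that amount; never go below zero.
  let l:field = max([0, a:width + l:extra])
  return printf('%-' . l:field . 's', a:s)
endfunction

" With 'encoding' set to utf-8 the two umlauts take 4 bytes but 2 cells,
" so the field handed to printf() becomes 7 bytes wide.
echo '[' . PadCells("\u00e4\u00f6", 5) . ']'
```

The extra arithmetic on every call is the complication being complained about here.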

      • Christian Brabandt
        Message 3 of 9 , Nov 11, 2012
          Hi lith!

          On Sun, 11 Nov 2012, lith wrote:

          > On Sunday, 11 November 2012 10:03:49 UTC+1, ZyX wrote:
          > > > Is this intended? Is there an (easy) way to make printf() respect
          > > > multi-byte encodings?
          > > It is clearly stated in documentation that printf() operates with
          > > bytes.
          >
          > My question rather was am I missing something since I personally find
          > this behaviour rather useless in the context of a text editor.
          >
          > A formatted string usually is something that should eventually be
          > displayed in a vim buffer. When I want to get a string padded with
          > whitespace, I'm almost always rather interested in its display width
          > than in its size in bytes -- and I can hardly imagine a use case where
          > it would be otherwise.
          >
          > I can work around this problem by adjusting the width dependent on the
          > difference between len(s) and strwidth(s) but I personally find this
          > unnecessarily complicated.

          I guess we could add an 'S' type to printf() to specify that the
          length is given in characters (see patch).

          regards,
          Christian
        • ZyX
          Message 4 of 9 , Nov 11, 2012
            On Sunday, 11 November 2012, 15:40:32 UTC+4, lith wrote:
            > On Sunday, 11 November 2012 10:03:49 UTC+1, ZyX wrote:
            > > > Is this intended? Is there an (easy) way to make printf() respect multi-byte encodings?
            > > It is clearly stated in documentation that printf() operates with bytes.
            >
            > My question rather was am I missing something since I personally find this behaviour rather useless in the context of a text editor.
            >
            > A formatted string usually is something that should eventually be displayed in a vim buffer. When I want to get a string padded with whitespace, I'm almost always rather interested in its display width than in its size in bytes -- and I can hardly imagine a use case where it would be otherwise.
            >
            > I can work around this problem by adjusting the width dependent on the difference between len(s) and strwidth(s) but I personally find this unnecessarily complicated.

            I am not saying this is very useful. I also constantly find myself writing code that uses strdisplaywidth() for this job (with an emulation function for older Vims) whenever I think there may be Unicode characters (sometimes I know there will be none, as when displaying a progress bar like "[==> ]"). I was just answering the first question: yes, this was intended. It is how POSIX printf works, it is what any C programmer expects, and it is kept for backwards compatibility and historical reasons.

          • ZyX
            Message 5 of 9 , Nov 11, 2012
              > I guess, we could use the 'S' type for printf() to specify length is
              > given in char (see patch).

              I actually never need characters in this case. What is *really* needed is called a “display cell”. I see that your patch uses exactly this, but the documentation states that it uses characters.

              Second, you should probably fall back to %s when Vim is compiled without +multi_byte, rather than make %S an error (well, not %S itself, but the number of arguments).

              And it needs tests, though that is easy.

            • Christian Brabandt
              Message 6 of 9 , Nov 11, 2012
                Hi ZyX!

                On Sun, 11 Nov 2012, ZyX wrote:

                > > I guess, we could use the 'S' type for printf() to specify length is
                > > given in char (see patch).
                >
                > I actually never need characters in this case. What is *really* needed
                > is called “display cell”. I see that your patch uses right this, but
                > documentation states it uses characters.

                Call it display cell, I don't care. I think the user would simply call
                it characters.

                > Second is that you should probably fall back to %s in case vim is
                > compiled without +mbyte, not make %S an error (well, not %S itself,
                > but number of arguments).

                I don't mind. But in the failing case it is quite obvious why it
                doesn't work if you use a Vim without the multibyte feature; in the
                other case it isn't.

                > And tests. Though this is easy.

                Sure. Go ahead ;)

                regards,
                Christian
              • ZyX
                Message 7 of 9 , Nov 12, 2012
                  > Call it display cell, I don't care. I think the user would simply call
                  > it characters.
                  Not if he is working with fullwidth ones. mb_string2cells outputs display cells, and the documentation uses the two terms for different things. “Characters” here means “Unicode code point” (ref: :h strchars()).
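
To make the distinction concrete, here is a small sketch that assumes 'encoding' is utf-8 and a Vim with strchars() and strwidth() (7.3 or later). WidthReport is a hypothetical name chosen for illustration:

```vim
" Hypothetical helper: report the three different "lengths" of a string.
function! WidthReport(s) abort
  return {
        \ 'bytes': len(a:s),
        \ 'chars': strchars(a:s),
        \ 'cells': strwidth(a:s),
        \ }
endfunction

" Two fullwidth Latin letters (U+FF21, U+FF22): each one is a single
" code point, three bytes in UTF-8, and two display cells wide.
echo WidthReport("\uff21\uff22")
```

For this string the three numbers all differ, which is why "characters" and "display cells" cannot be used interchangeably in the documentation.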

                  > I don't mind. But in the failing case, it is right obvious, why it
                  > doesn't work, if you use a vim without multibyte feature, in the other
                  > case it isn't.
                  If I were writing a Vim plugin, I would not be at all fond of having to write

                  if (v:version==703 && !has('patch713') || v:version<703) || !has('multi_byte')
                      " Use strdisplaywidth workaround. Respects versions without it because
                      " I have its emulation using another :if, similar to what is now done
                      " with shiftwidth()
                  else
                      " Use printf('%S')
                  endif

                  Once 7.3.713 is history, all you should have to do is purge the first branch, and new plugin authors should not have to care at all. In the current state neither is possible because of the has() condition.

                • Bram Moolenaar
                  Message 8 of 9 , Nov 14, 2012
                    Christian Brabandt wrote:

                    > Hi lith!
                    >
                    > On Sun, 11 Nov 2012, lith wrote:
                    >
                    > > On Sunday, 11 November 2012 10:03:49 UTC+1, ZyX wrote:
                    > > > > Is this intended? Is there an (easy) way to make printf() respect
                    > > > > multi-byte encodings?
                    > > > It is clearly stated in documentation that printf() operates with
                    > > > bytes.
                    > >
                    > > My question rather was am I missing something since I personally find
                    > > this behaviour rather useless in the context of a text editor.
                    > >
                    > > A formatted string usually is something that should eventually be
                    > > displayed in a vim buffer. When I want to get a string padded with
                    > > whitespace, I'm almost always rather interested in its display width
                    > > than in its size in bytes -- and I can hardly imagine a use case where
                    > > it would be otherwise.
                    > >
                    > > I can work around this problem by adjusting the width dependent on the
                    > > difference between len(s) and strwidth(s) but I personally find this
                    > > unnecessarily complicated.
                    >
                    > I guess, we could use the 'S' type for printf() to specify length is
                    > given in char (see patch).

                    Thanks. I think it's good that without the +multi_byte feature 'S'
                    works just like 's', that's easier for script writers.
                    I also fixed a signed/unsigned comparison that my compiler complained
                    about.
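
With 'S' behaving like 's' when +multi_byte is missing, a plugin would only need a version check. A rough sketch, assuming the change is available as patch 7.3.713 (the number used earlier in this thread) and that strwidth() exists in the older Vim; PadRight is a hypothetical name:

```vim
" Hypothetical wrapper (the name PadRight is an assumption for illustration):
" pad a:s on the right to a:width display cells.
if v:version > 703 || (v:version == 703 && has('patch713'))
  " %S is available: the field width counts display cells.
  function! PadRight(s, width) abort
    return printf('%-' . a:width . 'S', a:s)
  endfunction
else
  " Older Vim: pad by hand; repeat() yields '' for a non-positive count.
  function! PadRight(s, width) abort
    return a:s . repeat(' ', a:width - strwidth(a:s))
  endfunction
endif

echo '[' . PadRight("\u00e4\u00f6", 5) . ']'
```

Once the older versions no longer matter, the else branch can simply be dropped.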

                    --
                    How To Keep A Healthy Level Of Insanity:
                    18. When leaving the zoo, start running towards the parking lot,
                    yelling "run for your lives, they're loose!!"

                    /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
                    /// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
                    \\\ an exciting new programming language -- http://www.Zimbu.org ///
                    \\\ help me help AIDS victims -- http://ICCF-Holland.org ///
