Printing 8-bit encodings

  • Valery Kondakoff
    Message 1 of 9, May 7, 2003
      Hello, vim-developers!

      > Added support for PostScript printing of various 8-bit encodings. (Mike
      > Williams)

      I'm not sure this patch was supposed to fix this, but it is still
      impossible to print Russian text when 'encoding' is set to 'utf-8'.
      When I set 'encoding' to 'cp1251' the Russian text is printed
      correctly.

      (Vim 6.2c, WinXP)

      --
      Best regards,
      Valery Kondakoff
      http://www.nbk.orc.ru (Ne Bey Kopytom)
      http://www.nbk.orc.ru/mtb (MTB riding in Moscow)

      PGP key: mailto:pgp-public-keys@...?subject=GET%20strauss@...

      np: Isan - You Can Use Bamboo As A Ruler (Lucky Cat) [stopped]
    • Glenn Maynard
      Message 2 of 9, May 7, 2003
        On Wed, May 07, 2003 at 10:06:27PM +0400, Valery Kondakoff wrote:
        > > Added support for PostScript printing of various 8-bit encodings. (Mike
        > > Williams)
        >
        > I'm not sure this patch was supposed to fix this, but it is still
        > impossible to print Russian text when 'encoding' is set to 'utf-8'.
        > When I set 'encoding' to 'cp1251' the Russian text is printed
        > correctly.

        UTF-8 isn't an 8-bit (SBCS) encoding, it's a multibyte encoding (MBCS).
        (I wonder a little about why UTF-8 doesn't work, but as I never print I
        don't have much interest in fixing it myself ...)

        --
        Glenn Maynard
      • Tony Mechelynck
        Message 3 of 9, May 7, 2003
          Glenn Maynard <glenn@...> wrote:
          > On Wed, May 07, 2003 at 10:06:27PM +0400, Valery Kondakoff wrote:
          > > > Added support for PostScript printing of various 8-bit encodings.
          > > > (Mike Williams)
          > >
          > > I'm not sure this patch was supposed to fix this, but it is still
          > > impossible to print Russian text when 'encoding' is set to 'utf-8'.
          > > When I set 'encoding' to 'cp1251' the Russian text is printed
          > > correctly.
          >
          > UTF-8 isn't an 8-bit (SBCS) encoding, it's a multibyte encoding
          > (MBCS). (I wonder a little about why UTF-8 doesn't work, but as I
          > never print I don't have much interest in fixing it myself ...)
          >
          > --
          > Glenn Maynard

          UTF-8 is in fact a variable encoding. Vim treats it as multibyte; in fact,
          characters 0-127 are printed as one byte in 7-bit ASCII, the others require
          between 2 and (in theory) 6 bytes. In addition, it provides for combining
          (i.e., roughly, "overprinting") characters.
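The byte counts described above can be checked with a minimal sketch. The sample characters and the helper name `utf8_len` are chosen here for illustration and do not come from the thread. Note that Python's codec follows the current UTF-8 definition, which caps sequences at 4 bytes; the 5- and 6-byte forms mentioned above belonged to the older definition and cannot be produced this way.

```python
def utf8_len(ch: str) -> int:
    """Return the number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


# Illustrative samples: ASCII letter, accented Latin letter,
# Cyrillic letter, euro sign, and a character outside the BMP.
samples = ["A", "\u00e9", "\u0416", "\u20ac", "\U0001F600"]

# Map each code point (as "U+XXXX") to its UTF-8 length in bytes.
lengths = {f"U+{ord(ch):04X}": utf8_len(ch) for ch in samples}

for code_point, n in lengths.items():
    print(code_point, n)
```

Characters 0-127 come out as one byte, the Cyrillic letter as two, and nothing in current UTF-8 exceeds four.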

          Valery: I see there is no 'printencoding' option. (Maybe one is needed, but
          that's not for me to decide. Hint: in the DOS console, the printer and the
          display can be set to use different codepages.) Did you try to set
          'termencoding' to cp1251 just before issuing the hardcopy command? (Just a
          wild shot in the dark. I don't really know how Vim determines what character
          set[s] the printer will accept.)

          Tony.
        • Glenn Maynard
          Message 4 of 9, May 7, 2003
            On Wed, May 07, 2003 at 09:22:14PM +0200, Tony Mechelynck wrote:
            > > UTF-8 isn't an 8-bit (SBCS) encoding, it's a multibyte encoding
            > > (MBCS). (I wonder a little about why UTF-8 doesn't work, but as I
            > > never print I don't have much interest in fixing it myself ...)
            >
            > UTF-8 is in fact a variable encoding. Vim treats it as multibyte; in fact,
            > characters 0-127 are printed as one byte in 7-bit ASCII, the others require
            > between 2 and (in theory) 6 bytes. In addition, it provides for combining
            > (i.e., roughly, "overprinting") characters.

            A multibyte encoding *is* an encoding that uses a variable number of
            bytes per character. UTF-8 is a multibyte encoding, and Vim treats it
            as such. (I've never heard of it referred to as a "variable encoding",
            but you seem to be making a distinction where there isn't one ...)

            This is in contrast to double-byte encodings, which take exactly one or two
            bytes per character with a table of initial bytes that determine whether the
            character spans a second.
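As a rough illustration of the lead-byte idea, the sketch below walks a Shift_JIS byte string and counts characters. Shift_JIS is picked here only as an example of a double-byte encoding; the thread does not name one. The lead-byte ranges (0x81-0x9F and 0xE0-0xFC) and the function names are assumptions made for this sketch, not a complete description of any particular code page.

```python
def is_sjis_lead_byte(b: int) -> bool:
    """Assumed Shift_JIS lead-byte ranges; vendor code pages vary at the edges."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def count_sjis_chars(data: bytes) -> int:
    """Count characters by skipping one or two bytes at a time."""
    count = 0
    i = 0
    while i < len(data):
        # A lead byte means the character spans a second byte.
        i += 2 if is_sjis_lead_byte(data[i]) else 1
        count += 1
    return count


text = "Vim\u3042\u3044"           # three ASCII letters plus two hiragana
data = text.encode("shift_jis")    # ASCII stays one byte, hiragana take two

print(len(data), count_sjis_chars(data))
```

The byte length and the character count differ, and only the first byte of each character has to be inspected to find the boundaries.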

            UTF-8 doesn't provide combining; Unicode does that, and UTF-8 is simply an
            encoding for Unicode.
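A small sketch of that separation, using Python's standard `unicodedata` module: the combining accent is a property of the Unicode code points, and UTF-8 merely serializes whichever code points it is given. The example string is chosen for illustration.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points, one glyph.
decomposed = "e\u0301"

# NFC normalization composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# A non-zero combining class marks U+0301 as a combining character.
is_combining = unicodedata.combining("\u0301") != 0

print(len(decomposed), decomposed.encode("utf-8"))
print(len(composed), composed.encode("utf-8"))
print(is_combining)
```

Both forms are valid UTF-8; the encoding has no opinion on which one is used.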

            --
            Glenn Maynard
          • Bram Moolenaar
            Message 5 of 9, May 7, 2003
              Valery Kondakoff wrote:

              > > Added support for PostScript printing of various 8-bit encodings. (Mike
              > > Williams)
              >
              > I'm not sure this patch was supposed to fix this, but it is still
              > impossible to print Russian text when 'encoding' is set to 'utf-8'.
              > When I set 'encoding' to 'cp1251' the Russian text is printed
              > correctly.

              Please read the docs on this: ":help 'printencoding'" and everything
              below ":help :hardcopy".

              --
              I noticed my daughter's Disney-net password on a sticky note:
              "MickeyMinnieGoofyPluto". I asked her why it was so long.
              "Because they say it has to have at least four characters."

              /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
              /// Creator of Vim - Vi IMproved -- http://www.Vim.org \\\
              \\\ Project leader for A-A-P -- http://www.A-A-P.org ///
              \\\ Help AIDS victims, buy at Amazon -- http://ICCF.nl/click1.html ///
            • Tony Mechelynck
              Message 6 of 9, May 7, 2003
                Oops... sorry I hit "send" the first time without composing an answer

                Glenn Maynard <glenn@...> wrote:
                > On Wed, May 07, 2003 at 09:22:14PM +0200, Tony Mechelynck wrote:
                > > > UTF-8 isn't an 8-bit (SBCS) encoding, it's a multibyte encoding
                > > > (MBCS). (I wonder a little about why UTF-8 doesn't work, but as I
                > > > never print I don't have much interest in fixing it myself ...)
                > >
                > > UTF-8 is in fact a variable encoding. Vim treats it as multibyte;
                > > in fact, characters 0-127 are printed as one byte in 7-bit ASCII,
                > > the others require between 2 and (in theory) 6 bytes. In addition,
                > > it provides for combining (i.e., roughly, "overprinting")
                > > characters.
                >
                > A multibyte encoding *is* an encoding that uses a variable number of
                > bytes per character. UTF-8 is a multibyte encoding, and Vim treats it
                > as such. (I've never heard of it referred to as a "variable encoding",
                > but you seem to be making a distinction where there isn't one ...)

                The distinction I make (probably unnecessarily, I agree on that) is that
                there are encodings like UTF-32, which use more than one byte (in this case,
                32 bits fixed-width) for every character (or codepoint, to be more precise);
                while for some languages like English or even (but to a lesser degree)
                French or Spanish, UTF-8 is "almost" a single-byte encoding. In this case of
                course the subject was Russian, and that is almost the opposite case, since
                there every alphabetic character is two bytes, while only digits and
                punctuation signs (spaces, commas, colons, semicolons, full stops...) are
                single-byte.
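To put numbers on this comparison, here is a minimal sketch that measures the same text in several encodings. The sample sentences and the helper `encoded_sizes` are made up for illustration; `cp1251` is included because it is the 8-bit encoding mentioned at the start of the thread.

```python
def encoded_sizes(text, encodings=("utf-8", "utf-32-le", "cp1251")):
    """Return {encoding: byte length}, or None where the text cannot be encoded."""
    sizes = {}
    for enc in encodings:
        try:
            sizes[enc] = len(text.encode(enc))
        except UnicodeEncodeError:
            sizes[enc] = None
    return sizes


english = "Hello, world!"   # 13 characters, all ASCII
russian = "Привет, мир!"    # 12 characters, 9 of them Cyrillic letters

english_sizes = encoded_sizes(english)
russian_sizes = encoded_sizes(russian)

print(english_sizes)
print(russian_sizes)
```

For the English sample UTF-8 is as compact as the 8-bit code page, while for the Russian sample only the punctuation and space stay single-byte.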
                >
                > This is in contrast to double-byte encodings, which take exactly one
                > or two bytes per character with a table of initial bytes that
                > determine whether the character spans a second.
                >
                > UTF-8 doesn't provide combining; Unicode does that, and UTF-8 is
                > simply an encoding for Unicode.

                Well, so anything provided by Unicode is implicitly, or should I say
                vicariously, provided by UTF-8.
                >
                > --
                > Glenn Maynard

                Anyway, this part of my former reply is mostly a question of terminology,
                and probably not very important at that. I would never have taken the
                trouble of writing it if there hadn't been the part that you snipped, which
                was about using Vim...

                Regards,
                Tony.
              • Glenn Maynard
                Message 7 of 9, May 7, 2003
                  On Wed, May 07, 2003 at 10:19:56PM +0200, Tony Mechelynck wrote:
                  > The distinction I make (probably unnecessarily, I agree on that) is that
                  > there are encodings like UTF-32, which use more than one byte (in this case,
                  > 32 bits fixed-width) for every character (or codepoint, to be more precise);
                  > while for some languages like English or even (but to a lesser degree)
                  > French or Spanish, UTF-8 is "almost" a single-byte encoding. In this case of
                  > course the subject was Russian, and that is almost the opposite case, since
                  > there every alphabetic character is two bytes, while only digits and
                  > punctuation signs (spaces, commas, colons, semicolons, full stops...) are
                  > single-byte.

                  MBCS specifically doesn't refer to encodings like UTF-32. Encodings
                  which use a fixed multiple number of bytes per codepoint[1] are wide
                  strings. C makes this distinction; e.g., see mbstowcs and wcstombs.

                  The difference is that SBCS, MBCS and DBCS strings are normally stored
                  in the same way--strings of bytes (char *). Wide strings normally have
                  a different natural layout, e.g. (wchar_t *).

                  [1] ignoring surrogate pairs, which are also used in wide strings
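The distinction can be sketched outside C as well. Below, UTF-16 and UTF-32 stand in for wide strings (the width of `wchar_t` depends on the platform, so this is an assumption for illustration), and the helper names are hypothetical. The same text is viewed as fixed-size units and, for contrast, as a UTF-8 byte string.

```python
import struct


def utf16_units(text: str) -> list:
    """Split text into 16-bit code units, as a 16-bit wide string would store it."""
    data = text.encode("utf-16-le")
    return list(struct.unpack(f"<{len(data) // 2}H", data))


def utf32_units(text: str) -> list:
    """Split text into 32-bit code units, one per code point."""
    data = text.encode("utf-32-le")
    return list(struct.unpack(f"<{len(data) // 4}I", data))


# ASCII letter, Cyrillic letter, and one character outside the BMP.
text = "A\u0416\U0001F600"

units16 = utf16_units(text)
units32 = utf32_units(text)
utf8_bytes = text.encode("utf-8")

print([hex(u) for u in units16])
print([hex(u) for u in units32])
print(len(utf8_bytes))
```

The 32-bit view has exactly one unit per code point, the 16-bit view needs a surrogate pair for the last character, and the UTF-8 form is a plain byte string of varying width per character.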

                  --
                  Glenn Maynard
                • Tony Mechelynck
                  Message 8 of 9, May 7, 2003
                    Bram Moolenaar <Bram@...> wrote:
                    > Valery Kondakoff wrote:
                    >
                    > > > Added support for PostScript printing of various 8-bit encodings.
                    > > > (Mike Williams)
                    > >
                    > > I'm not sure this patch was supposed to fix this, but it is still
                    > > impossible to print Russian text when 'encoding' is set to 'utf-8'.
                    > > When I set 'encoding' to 'cp1251' the Russian text is printed
                    > > correctly.
                    >
                    > Please read the docs on this: ":help 'printencoding'" and everything
                    > below ":help :hardcopy".
                    >
                    > --
                    > I noticed my daughter's Disney-net password on a sticky note:
                    > "MickeyMinnieGoofyPluto". I asked her why it was so long.
                    > "Because they say it has to have at least four characters."
                    >
                    > /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
                    > /// Creator of Vim - Vi IMproved -- http://www.Vim.org \\\
                    > \\\ Project leader for A-A-P -- http://www.A-A-P.org ///
                    > \\\ Help AIDS victims, buy at Amazon -- http://ICCF.nl/click1.html ///

                    'printencoding' seems quite new. New in 6.2 maybe? (I don't see it in the
                    helpfiles that came with 6.1.469, and the corresponding executable spits
                    an error at ":set printencoding?".) Any idea of an ETA (oh, not even a precise
                    date but a "rough order of magnitude", i.e. error -90%, +900%) for a
                    "stable" 6.2? [Say "five weeks" and I will expect it "some time between
                    three days and a year from now" :-).]

                    Regards,
                    Tony.
                  • Tony Mechelynck
                    Message 9 of 9, May 7, 2003
                      Glenn Maynard <glenn@...> wrote:
                      [...]
                      > MBCS specifically doesn't refer to encodings like UTF-32. Encodings
                      > which use a fixed multiple number of bytes per codepoint[1] are wide
                      > strings. C makes this distinction; eg. see mbstowcs and wcstombs.
                      >
                      > The difference is that SBCS, MBCS and DBCS strings are normally stored
                      > in the same way--strings of bytes (char *). Wide strings normally have
                      > a different natural layout, eg (wchar_t *).
                      >
                      > [1] ignoring surrogate pairs, which are also used in wide strings
                      >
                      > --
                      > Glenn Maynard

                      I stand corrected.

                      Tony.