Patch for multibyte printing (long)

  • Mike Williams
    Message 1 of 3 , Jan 19, 2004
      Hi,

      Here is an updated patch for supporting multi-byte printing against
      6.2p195. This patch provides multi-byte printing from VIM for
      printing Japanese, Traditional and Simplified Chinese, and Korean,
      including half-width characters.

      Important Note: The patch does not provide any of the necessary CMap
      or CID font files for actual printing of CJK text. You will need to
      have these already installed in your printer.

      There are also some new runtime files for the print runtime
      sub-directory. Just unzip the zip file into your runtime directory,
      keeping the directory names. prolog.ps has been changed, which means
      you will not be able to use it with an unpatched version of VIM.

      I haven't written any documentation yet since things may change based
      on any feedback. The following should be enough to get you going and
      I can get any feedback. (I didn't get any from the first patch sent
      to vim-multibyte, so now including vim-dev as well.)

      How to do multi-byte printing?

      Setting up multi-byte printing is a bit complicated: you need to
      specify the character set to print, the character encoding to use,
      and which fonts to use.

      1) Specifying the fonts to use.
      There is a new option, printmbfont. This is a comma-separated list
      that specifies the fonts to use for various styles. The fields are:
      r:name font to use for normal text
      b:name font to use for bold text
      i:name font to use for italic text
      o:name font to use for bold italic text

      There are no default font names. The r: field has to be specified;
      the rest are optional. If a field is not specified then VIM will use
      another one as follows:
      if b: is blank, then use r:
      if i: is blank, then use r:
      if o: is blank, then use b: (which if blank will use r:)
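The fallback rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described, not the code in the patch; the function name resolve_fonts and the dictionary layout are made up for the example.

```python
def resolve_fonts(fields):
    """Fill in missing styles using the fallback rules described above.

    `fields` maps the one-letter keys r, b, i, o to font names
    (hypothetical layout, chosen for this sketch only).
    """
    regular = fields.get("r")
    if not regular:
        raise ValueError("the r: field has to be specified")
    bold = fields.get("b") or regular          # blank b: falls back to r:
    italic = fields.get("i") or regular        # blank i: falls back to r:
    bold_italic = fields.get("o") or bold      # blank o: falls back to b:, then r:
    return {"r": regular, "b": bold, "i": italic, "o": bold_italic}
```

With only r: and b: given, italic text ends up in the regular font and bold italic text in the bold font.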

      I would expect only r: and b: to be specified, so an example of
      setting the option would be:

      :set printmbfont=r:WadaMin-Regular,b:WadaMin-Bold

      A common issue is that some multi-byte fonts do not contain
      characters for codes in the ASCII range. If this is the case with
      your font, then there is a field, c:, to tell VIM to use Courier for
      characters in the ASCII range. The field should be set to yes to use
      Courier; by default Courier is not used.

      Normally, each country's ASCII code range differs in a couple of
      characters from, er, true ASCII. If you are using Courier for
      characters in the ASCII code range, then you can specify true ASCII
      character set printing with another field, a:, which takes yes or no
      as its value.

      So, if you want to use Courier for characters in the ASCII range, and
      use the American ASCII character set along with the font example
      above, the option would then be set as follows:

      :set printmbfont=c:yes,a:yes,r:WadaMin-Regular,b:WadaMin-Bold
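To make the shape of the option value concrete, here is a rough Python sketch of how a comma-separated list of key:value fields like this could be read. It is not how VIM parses the option; parse_printmbfont and the error messages are invented for the example, and only the fields described in this message (r, b, i, o, c, a) are accepted.

```python
FONT_KEYS = {"r", "b", "i", "o"}   # fields that take a font name
FLAG_KEYS = {"c", "a"}             # fields that take yes or no


def parse_printmbfont(value):
    """Split a printmbfont-style value into a dict (illustrative only)."""
    fields = {}
    for item in value.split(","):
        key, sep, val = item.partition(":")
        if not sep or not val:
            raise ValueError("expected key:value, got %r" % item)
        if key in FLAG_KEYS:
            if val not in ("yes", "no"):
                raise ValueError("%s: takes yes or no, got %r" % (key, val))
            fields[key] = (val == "yes")
        elif key in FONT_KEYS:
            fields[key] = val
        else:
            raise ValueError("unknown field %r" % key)
    if "r" not in fields:
        raise ValueError("the r: field has to be specified")
    # Courier is not used unless c:yes is given.
    fields.setdefault("c", False)
    return fields
```

No default is filled in for a: because none is stated here.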

      2) Specifying the character encoding to use for printing.
      This works the same way as current printing - VIM first looks for a
      value for printencoding, and if that is empty then it uses the value
      of encoding. If printencoding is set then any necessary conversion is
      done when printing. The recognised multi-byte encodings are the ones
      that VIM knows about. Others may be supported - I have still to
      check what iconv will handle.
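As a small illustration of that lookup order (a sketch in Python, not the patch code; the function name is made up):

```python
def print_encoding(printencoding, encoding):
    """Return (encoding used for printing, whether conversion is needed)."""
    # An empty printencoding means "use the value of encoding".
    target = printencoding or encoding
    return target, target != encoding
```

So print_encoding("", "utf-8") leaves the text alone, while print_encoding("euc-jp", "utf-8") means the text has to be converted before printing.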

      3) Specifying the character set.
      Each country has multiple possible character sets, and this is where
      the fun starts. Not all character sets can be used with all
      character encodings, for example you cannot use a Unicode encoding
      with the Simplified Chinese GBT character set. If you try to specify
      this then VIM will report an error.

      You specify the character set to use with a new VIM option -
      printmbcharset. This takes one of a fixed set of values for each
      country as follows:

      Chinese (Simplified)
      Value          Description
      GB_2312-80
      GBT_12345-90
      MAC            Apple Mac Simplified Chinese
      GBT-90_MAC     GB/T 12345-90 Apple Mac Simplified Chinese
      GBK            GBK (GB 13000.1-93)
      ISO10646       ISO 10646-1:1993

      Chinese (Traditional)
      Value          Description
      CNS_1993       CNS 11643-1993, Planes 1 & 2
      BIG5
      ETEN           Big5 with ETen extensions
      ISO10646       ISO 10646-1:1993

      Japanese
      Value          Description
      JIS_C_1978
      JIS_X_1983
      JIS_X_1990
      MSWINDOWS      Win3.1/95J (JIS X 1997 + NEC + IBM extensions)
      KANJITALK6     Apple Mac KanjiTalk V6.x
      KANJITALK7     Apple Mac KanjiTalk V7.x

      Korean
      Value          Description
      KS_X_1992
      MAC            Apple Macintosh Korean
      MSWINDOWS      KS X 1992 with MS extensions
      ISO10646       ISO 10646-1:1993

      Only certain encodings and character sets can be used together when
      printing. This means if you use utf-8 encoding to edit in VIM you
      will need iconv installed to convert your text to an encoding that
      can be combined with the character set you want. The following
      tables show which character sets can be used with which encoding for
      each country.

      Simplified Chinese
                     euc-cn   gbk      ucs-2    utf-8
      GB_2312-80     x
      GBT_12345-90   x
      MAC            x
      GBT-90_MAC     x
      GBK                     x
      ISO10646                         x        x

      Traditional Chinese
                     euc-tw   big5     ucs-2    utf-8
      CNS_1993       x
      BIG5                    x
      ETEN                    x
      ISO10646                         x        x

      Japanese
                     euc-jp   sjis     ucs-2    utf-8
      JIS_C_1978     x        x
      JIS_X_1983     x        x
      JIS_X_1990     x                 x        x
      MSWINDOWS               x
      KANJITALK6              x
      KANJITALK7              x

      Korean
                     euc-kr   cp949    ucs-2    utf-8
      KS_X_1992      x
      MAC            x
      MSWINDOWS               x
      ISO10646                         x        x
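For anyone who wants to check a combination before trying it, the tables above can be written down as a lookup. This is a Python sketch for illustration only: the column placement of each x is my reading of the tables, the language keys and the can_print name are made up, and none of this is taken from the patch source.

```python
# Character set -> encodings it can be combined with, per language.
# Column positions are an assumption based on the tables above.
VALID = {
    "simplified chinese": {
        "GB_2312-80": {"euc-cn"},
        "GBT_12345-90": {"euc-cn"},
        "MAC": {"euc-cn"},
        "GBT-90_MAC": {"euc-cn"},
        "GBK": {"gbk"},
        "ISO10646": {"ucs-2", "utf-8"},
    },
    "traditional chinese": {
        "CNS_1993": {"euc-tw"},
        "BIG5": {"big5"},
        "ETEN": {"big5"},
        "ISO10646": {"ucs-2", "utf-8"},
    },
    "japanese": {
        "JIS_C_1978": {"euc-jp", "sjis"},
        "JIS_X_1983": {"euc-jp", "sjis"},
        "JIS_X_1990": {"euc-jp", "ucs-2", "utf-8"},
        "MSWINDOWS": {"sjis"},
        "KANJITALK6": {"sjis"},
        "KANJITALK7": {"sjis"},
    },
    "korean": {
        "KS_X_1992": {"euc-kr"},
        "MAC": {"euc-kr"},
        "MSWINDOWS": {"cp949"},
        "ISO10646": {"ucs-2", "utf-8"},
    },
}


def can_print(language, charset, encoding):
    """True if the charset/encoding pair is marked with an x above."""
    return encoding in VALID.get(language, {}).get(charset, set())
```

For example, can_print("japanese", "JIS_X_1983", "euc-jp") is true, while a Unicode encoding with the Simplified Chinese GBT character set is rejected.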


      So, to set up printing in Japanese from a VIM encoding of utf-8:

      :set printencoding=euc-jp
      :set printmbfont=c:yes,r:WadaMin-Regular,b:WadaMin-Bold
      :set printmbcharset=JIS_X_1983

      If you could try things out and report any problems you have, it
      would be appreciated. One known issue is with the page header if you
      use %= to have left and right aligned components, due to printing
      half-width characters.

      If you discover a bug in the generated output, I would appreciate it
      if you could send me details, including the original text file along
      with your encoding and print settings.

      I would also be interested in any comments on the new options
      used to configure printing, on the naming of the various country
      character sets, or any other suggestions you may have.

      TTFN

      Mike
      --
      If at first you don't succeed, give up, no use being a damn fool.
    • Glenn Maynard
      Message 2 of 3 , Jan 19, 2004
        On Mon, Jan 19, 2004 at 10:05:32PM -0000, Mike Williams wrote:
        > Normally, each country's ASCII code range differs in a couple of
        > characters from, er, true ASCII. If you are using Courier for
        > characters in the ASCII code range, then you can specify true ASCII
        > character set printing with another field, a:, which takes yes or no
        > as its value.

        On which systems does the "ASCII range" differ, other than Japanese Windows
        systems having a broken backslash? There are probably a couple more that
        I'm not aware of, but it's definitely not normal at all for ASCII codepoints
        to deviate; it's exceptional (albeit an important exception that must be
        dealt with).

        --
        Glenn Maynard
      • Mike Williams
        Message 3 of 3 , Jan 20, 2004
          On 19 Jan 2004 at 17:18, Glenn Maynard wrote:

          > On which systems does the "ASCII range" differ, other than Japanese Windows
          > systems having a broken backslash? There are probably a couple more that
          > I'm not aware of, but it's definitely not normal at all for ASCII codepoints
          > to deviate; it's exceptional (albeit an important exception that must be
          > dealt with).

          I don't have my reference to hand but each country has its own
          national character set standard for the one byte code range, which is
          similar to the ASCII character set but can differ in a couple of
          characters.

          For the example you give, the Japanese standard defines a Yen symbol
          for the code 0x5c which is a backslash in ASCII (is this what you
          mean by broken?) and an overline (a high horizontal line) in place of
          a tilde. Another example is with Simplified Chinese where the ASCII
          code for the dollar is used for the yuan.
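To show what that means for a one byte character, here is a small Python sketch. It covers only the differences mentioned in this message, not the full national standards; the variant names, the force_ascii flag and the function name are made up for the illustration.

```python
# Code points that differ from US-ASCII, as described above (not a full list).
NATIONAL_VARIANTS = {
    "japanese": {0x5C: "\u00a5", 0x7E: "\u203e"},  # yen sign, overline
    "simplified_chinese": {0x24: "\u00a5"},        # yuan sign
}


def decode_single_byte(data, variant, force_ascii=False):
    """Map bytes in the ASCII range to the characters that would be printed."""
    overrides = {} if force_ascii else NATIONAL_VARIANTS.get(variant, {})
    return "".join(overrides.get(byte, chr(byte)) for byte in data)
```

A Windows path such as C:\dir comes out with a Yen sign in place of the backslash for the Japanese variant, unless US-ASCII is forced.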

          These slight differences can cause problems when printing files that
          are dependent on the US-ASCII character set - such as Perl or PHP
          (especially on Windows!). It is a simple option to let the user
          force the use of US-ASCII for one byte characters in this case.
          Perhaps, what I should do is allow for this to be done without having
          to use Courier as well. I'll have a look.

          TTFN

          Mike
          --
          I is a uni student.