proposal for a GNU Unicode font

  • Roman Czyborra
    Message 1 of 3 , Sep 29, 1998
      Please have a thorough look at

      http://czyborra.com/unifont/
      proposal for a GNU Unicode font

      and share your comments about what must happen
      before this can be released to the general public.

      On 23 Jun 1998, Mark Crispin wrote:

      > Newsgroups: netscape.public.mozilla.i18n
      > Subject: re: About ISO-2022-CN and Mozilla
      >
      > It may be worthwhile to build a few CJK Unicode X (and perhaps
      > MacOS) fonts, perhaps from the existing public domain CJK fonts.

      On 23 Jul 1998, Markus Kuhn wrote:

      > Newsgroups: comp.std.internat,comp.software.international,comp.fonts
      > Subject: Re: Unicode reference fonts
      >
      > William Ehrich wrote:
      > > Would it make sense for the code defining organizations also to
      > > define a bit-mapped representation, at least as an example and
      > > as a reality check?

      Cheers
      Roman



      ------------------------------------------------------------------------
      eGroup home: http://www.eGroups.com/list/gnu-unifont
      Free Web-based e-mail groups by eGroups.com
    • Mark Crispin
      Message 2 of 3 , Sep 29, 1998
        All I can say is "it's about time".

        Yes, we badly need a basic and (reasonably) complete Unicode font. It doesn't
        matter if it is ugly; that can be fixed at leisure.

        What about the uni24.bdf font that came from Ho Yean Fee's group? She was
        from the National University of Singapore, but then formed a separate company.
        I could probably find an address if I looked. I played with it a bit, but at
        6MB it was too big for my X server to swallow... ;-)

        Whatever you do, don't talk to Ohta Masataka (or similar cranks) about this
        project. He'll come up with all sorts of reasons why it shouldn't be done,
        and just waste everybody's time.


      • Jungshik Shin
        Message 3 of 3 , Sep 29, 1998
          On Tue, 29 Sep 1998, Roman Czyborra wrote:

          > Please have a thorough look at
          >
          > http://czyborra.com/unifont/
          > proposal for a GNU Unicode font
          >
          > and share your comments about what must happen
          > before this can be released to the general public.

          Thank you for your efforts and for keeping me informed about
          your project. Let me begin with some comments and then
          move on to some good news :-).


          A couple of comments about your project web page:

          > On the other hand, you can very well pretend to forget about the subtle
          > difference between characters and glyphs and abuse Unicode as a
          > one-to-one glyph numbering scheme. All Unicode characters have exactly
          > one reference glyph in the Unicode book. Pasting these glyphs next to
          > each other on horizontal lines just like you did with ASCII and
          > ISO-8859-1 works for many languages, including most European languages,
          > Ethiopic, Chinese, Japanese and Korean (CJK). Y

          For the Korean Hangul script, the "one glyph, one code point" model
          doesn't work the way you think. Your assumption is valid only if you
          consider just the 11,172 pre-composed modern Hangul syllables in the
          UAC00-UD7A3 block. However, that's only one of a few different ways of
          representing Hangul, one which we're not so fond of but which has been
          more or less forced upon us (for several reasons, including ease of
          implementation). I'm fully aware that your intention is NOT to be ideal
          and that your assumption is very useful for your limited purpose, so
          this is just to dispel a common misconception about the Korean Hangul
          script that is widespread among non-Koreans. A far more ideal approach
          is to use the Hangul Jamos (alphabetic letters) enumerated in the
          U1100-U11FF block and dynamically compose glyphs for Hangul syllables
          made up of two, three or more Hangul Jamos. In this respect, Korean
          Hangul is much more like most Indic scripts and the Thai script. Thus,
          the following statement of yours about the Indic scripts used in South
          (not Central) Asia (mostly the Indian subcontinent) also applies to the
          Korean Hangul script and, to a lesser extent, to the Thai and Lao
          scripts.

          > The native Central Asian languages from in and around India are
          > currently the odd man out as nobody has publically numbered their many
          > ligature glyphs yet. They will appear far from perfect with a bare

          That's why Korean Hangul and Thai scripts are listed as target scripts
          of X11 CTL.
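
For readers unfamiliar with the two Hangul representations contrasted above, here is a minimal Perl sketch of the arithmetic that links them. It assumes the Hangul syllable composition constants from the Unicode Standard (syllable base UAC00, Jamo bases U1100, U1161 and U11A7, and 19 x 21 x 28 = 11,172 combinations). The subroutine names are hypothetical and do not come from any message in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Constants of the Unicode Hangul syllable arithmetic.
my ($SBASE, $LBASE, $VBASE, $TBASE) = (0xAC00, 0x1100, 0x1161, 0x11A7);
my ($LCOUNT, $VCOUNT, $TCOUNT)      = (19, 21, 28);

# Split a pre-composed syllable code point into two or three Jamo code points.
# Returns an empty list for code points outside UAC00-UD7A3.
sub decompose_syllable {
    my ($cp) = @_;
    my $s = $cp - $SBASE;
    return () if $s < 0 || $s >= $LCOUNT * $VCOUNT * $TCOUNT;
    my $l = int($s / ($VCOUNT * $TCOUNT));   # leading consonant index, 0..18
    my $v = int($s / $TCOUNT) % $VCOUNT;     # vowel index, 0..20
    my $t = $s % $TCOUNT;                    # 0 means "no trailing consonant"
    my @jamo = ($LBASE + $l, $VBASE + $v);
    push @jamo, $TBASE + $t if $t > 0;
    return @jamo;
}

# Inverse: leading consonant, vowel and optional trailing consonant
# (all given as Jamo code points) to the pre-composed syllable.
sub compose_syllable {
    my ($l, $v, $t) = @_;
    my $ti = defined $t ? $t - $TBASE : 0;
    return $SBASE + (($l - $LBASE) * $VCOUNT + ($v - $VBASE)) * $TCOUNT + $ti;
}

my @jamo = decompose_syllable(0xD55C);
printf "U+%04X -> %s\n", 0xD55C, join(" ", map { sprintf "U+%04X", $_ } @jamo);
```

Run as is, this prints `U+D55C -> U+1112 U+1161 U+11AB`. A font that carries glyphs only for the pre-composed block can serve the first form directly, while the Jamo form needs the kind of dynamic composition described above.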

          > Others may hopefully find it more interesting to work on proofreading,
          > optimizing and completing parts of the oriental [U+3000..U+FFFF] range.
          > 20'902 - 18'174 = 2'728 Han ideographs are still missing, as well as the
          > 11'172 - 2350 = 8'822 UHC precomposed Hangul syllables for which there
          > does not seem to exist any free font yet.

          "11,172-2,350 = 8,822 UHC" should read

          "11,172-2,350 = 8,822 not covered by KS C 5601-1987 but convered
          by KS X 1001 (KS C 5601-1993) annex 3 and Unicode 2.0 or later" instead.
          UHC has the same character repertoire as Unicode 2.0 and the Johab
          encoding defined in KS X 1001 annex 3, but its encoding has nothing to
          do with any standard (it is just a proprietary Microsoft encoding), so
          referring to it when talking about the 8,822 precomposed Hangul
          syllables is not such a good idea, especially considering that there ARE
          at least two standards that include all of them: Unicode 2.x (which has
          been adopted as KS X 1005) and KS X 1001 annex 3.



          > Bitstream Cyberbit

          > The most impressive Unicode font so far is Bitstream's cyberbit.ttf, a
          > 13 MB serifed TrueType font covering all 20'902 Han ideographs besides
          > basic Latin (no Vietnamese or IPA), Greek, Cyrillic, Arabic, Hebrew,
          > Thai, and 1'153 of the Hangul syllables. It was released in 1997 for
          > free download in the form of MS-DOS *.EXE archives from
          > ftp.bitstream.com. You had to sign a license agreement restricting its
          > use to a single copy. I am confused about the font's latest status.

          Bitstream Cyberbit, if my memory serves me right, includes a full set
          of glyphs for the 11,172 (not 1,153) precomposed Hangul syllables
          (UAC00-UD7A3). Besides, a self-extracting *.exe file can easily be
          decompressed with the Unix version of 'unzip'. If its license terms
          permit, ttf2bdf would be the easiest way to get a (more or less
          complete) X11 BDF Unicode font.

          Now, it's time for the good news. Bitmapped glyphs for the 11,172
          pre-composed Hangul syllables in UCS-2 can easily (with some Perl
          tinkering) be made from several different FREE sources. The reason it
          hasn't been done is that those who are most likely to do the chore
          (Linux and other free Unix users) don't feel the need for it. Why?
          Because there are a few alternatives to huge X11 bitmap fonts: TrueType
          support for X11 is widely available to Korean Linux/FreeBSD users, along
          with a couple of Korean TrueType fonts with 11,172 glyphs. In addition,
          an X11 font server for Linux that presents several free scalable fonts
          used by the most popular Korean word processor as X11 fonts to X11
          applications has been available for about a year. (For details, see
          http://pantheon.yale.edu/~jshin/faq/qa6.html and references therein.)


          > 11'172 - 2350 = 8'822 UHC precomposed Hangul syllables for which there
          > does not seem to exist any free font yet.

          As I wrote above, there are some free fonts with all 11,172
          pre-composed Hangul syllables, including several fonts distributed with
          HLaTeX 0.98, although not in X11 BDF format. Anyway, here's one of a few
          ways to get them in X11 BDF format as well as in your bitmap format.
          Hanterm (the Korean xterm) uses "Johab-encoded fonts" (made up of a few
          sets of glyphs for Hangul jamos) to dynamically compose all modern (and a
          subset of medieval) syllables. These Hanterm fonts, which are available
          in X11 BDF format, can be converted to your format in UCS-2 encoding using
          the Perl script enclosed below. I think it's much better to use the bitmap
          patterns generated with it than those produced from the Daewoo fonts
          included in the X11 distribution. Perhaps johabm16.bdf or iyagi16.bdf would
          be a good match for the rest of the glyphs in your font.
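
The dynamic composition described above comes down to overlaying several partial bitmaps of the same size. The following minimal Perl sketch shows only that overlay step on toy data. The subroutine name and the two tiny "glyphs" are hypothetical and are not taken from Hanterm or from any real font.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Overlay any number of glyph bitmaps given as equal-length hex strings
# (two hex digits per byte, as in a BDF BITMAP section with its rows joined).
sub overlay_hex {
    my @glyphs = @_;
    my $len = length $glyphs[0];
    my @out = (0) x ($len / 2);
    for my $g (@glyphs) {
        die "glyph length mismatch\n" unless length($g) == $len;
        my @bytes = map { hex($_) } unpack("(A2)*", $g);
        # A pixel is set in the result if it is set in any input glyph.
        $out[$_] |= $bytes[$_] for 0 .. $#bytes;
    }
    return join "", map { sprintf "%02X", $_ } @out;
}

# Toy 8x2 bitmaps (hypothetical data): a "consonant" in the left half
# of the top row and a "vowel" in the right half of both rows.
my $lead  = "F000";
my $vowel = "0F0F";
print overlay_hex($lead, $vowel), "\n";   # prints FF0F
```

Because each jamo variant is drawn so that it occupies its own part of the cell, a bitwise OR is enough to combine them; choosing which variant to use for a given syllable is the harder part.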

          Thank you again for your work,

          Jungshik Shin

          P.S. I may try to extract bitmap patterns from Postscript fonts used by
          HLaTeX 0.98 sooner or later.



          ----------------------Cut-------------Here-------------------
          #!/usr/bin/perl -w

          # johab2ucs2.pl
          # This script (working as a filter) converts Hangul "Johab-encoded" fonts
          # with an unofficial XLFD name "-johab" in BDF format
          # to a UCS-2 encoded font in the format defined by
          # Roman Czyborra <roman@...> at
          # http://czyborra.com/unifont/

          # 'hanterm304font.tar.gz' contains about a dozen
          # "Johab-encoded" fonts. The package is available at
          # ftp://ftp.kaist.ac.kr/hangul/terminal/hanterm//hanterm304beta/fonts
          # Please note that this script only works with fonts whose
          # XLFD name ends with
          #
          # --16-160-75-75-c-160-johab-1
          # (and whose file name in the package doesn't include 's' or 'sh' preceding
          # '(m|g)16.bdf').
          #
          # There are four of them :
          # johabg16.bdf,johabm16.bdf,johabp16.bdf,iyagi16.bdf.
          #
          # Fonts in the package with other XLFD names
          # (johabs and johabsh) contain glyphs for about 5000 Hanjas and special symbols
          # defined in KS C 5601-1987.

          # Sep. 29, 1998
          # Jungshik Shin <jshin@...>

          # A more complete routine which not only covers
          # *modern* pre-composed Hangul syllables in UAC00-UD7A3
          # but also supports dynamic rendering of
          # Hangul syllables (medieval as well as modern)
          # using Hangul combining Jamos at [U1100-U11FF]
          # was made by Deog-tae Kim <dtkim@...>
          # to be used in Java font-properties file.
          # It's available at http://calab.kaist.ac.kr/~dtkim/java/


          # Conversion routine from Hangul Jamo index to glyph index
          # of Hangul "Johab-encoded" fonts as used by
          # Hangul xterm, hanterm.
          # The following routine is based on Hanterm by Song, Jaekyung
          # available at ftp://ftp.kaist.ac.kr/hangul/terminal/hanterm



          # The base font index for leading consonants
          @lconBase= (
          1, 11, 21, 31, 41, 51,
          61, 71, 81, 91, 101, 111,
          121, 131, 141, 151, 161, 171,
          181
          );

          # The base font index for vowels

          @vowBase = (
          0,311,314,317,320,323, # (Fill), A, AE, YA, YAE, EO
          326,329,332,335,339,343, # E, YEO, YE, O, WA, WAE
          347,351,355,358,361,364, # OI, YO, U, WEO, WE, WI
          367,370,374,378 # YU, EU, UI, I
          );

          # The base font index for trailing consonants

          @tconBase = (
          # modern trailing consonants (filler + 27)
          0,
          405, 409, 413, 417, 421,
          425, 429, 433, 437, 441,
          445, 449, 453, 457, 461,
          465, 469, 473, 477, 481,
          485, 489, 493, 497, 501,
          505, 509
          );

          # The mapping from vowels to leading consonant type
          # in absence of trailing consonant

          @lconMap1 = (
          0,0,0,0,0,0, # (Fill), A, AE, YA, YAE, EO
          0,0,0,1,3,3, # E, YEO, YE, O, WA, WAE
          3,1,2,4,4,4, # OI, YO, U, WEO, WE, WI
          2,1,3,0 # YU, EU, UI, I
          );

          # The mapping from vowels to leading consonant type
          # in presence of trailing consonant

          @lconMap2 = (
          5,5,5,5,5,5, # (Fill), A, AE, YA, YAE, EO
          5,5,5,6,8,8, # E, YEO, YE, O, WA, WAE
          8,6,7,9,9,9, # OI, YO, U, WEO, WE, WI
          7,6,8,5 # YU, EU, UI, I
          );

          # vowel type ; 1 = o and its alikes, 0 = others

          @vowType = (
          0,0,0,0,0,0,
          0,0,0,1,1,1,
          1,1,0,0,0,0,
          0,1,1,0
          );

          # The mapping from trailing consonants to vowel type

          @tconType = (
          0, 1, 1, 1, 2, 1,
          1, 1, 1, 1, 1, 1,
          1, 1, 1, 1, 1, 1,
          1, 1, 1, 1, 1, 1,
          1, 1, 1, 1
          );

          # The mapping from vowels to trailing consonant type

          @tconMap = (
          0, 0, 2, 0, 2, 1, # (Fill), A, AE, YA, YAE, EO
          2, 1, 2, 3, 0, 0, # E, YEO, YE, O, WA, WAE
          0, 3, 3, 1, 1, 1, # OI, YO, U, WEO, WE, WI
          3, 3, 0, 1 # YU, EU, UI, I
          );



          # read in BITMAP patterns for Jamos from JOHAB-encoded BDF font file
          # thru STDIN

          $BITMAP=0;
          while (<>) {
              if (/^ENCODING\s+(\d+)/) { $i = $1; $jamo[$i]=""; }
              elsif (/^BITMAP/) { $BITMAP=1; }
              elsif (/^ENDCHAR/) { $BITMAP=0; }
              elsif ($BITMAP) {
                  y/a-f/A-F/;    # normalize hex digits to upper case
                  s/\n$//;       # strip the line ending
                  $jamo[$i] = $jamo[$i] . $_;
              }
          }

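          for ( $j=0 ; $j < 11172 ; $j++ ) {

              $init = int( $j / 21 / 28) ;       # leading consonant index, 0..18
              $medial = int($j / 28 ) % 21+1 ;   # vowel index, 1..21 (0 is the filler)
              $final = $j % 28;                  # trailing consonant index, 0 = none

              printf ("%04X: %64s\n", $j+0xAC00, &compose_hangul($init,$medial,$final));

          }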

          sub compose_hangul
          {
              local($l,$m,$f) = @_;

              # Split each 16x16 glyph (64 hex digits) into 32 bytes.
              @l_bit = unpack("a2" x 32, $jamo[&get_ind($l,$m,$f,1)]);
              @m_bit = unpack("a2" x 32, $jamo[&get_ind($l,$m,$f,2)]);
              @f_bit = unpack("a2" x 32, $jamo[&get_ind($l,$m,$f,3)]);

              # OR the three partial bitmaps together, byte by byte.
              for ( $i = 0; $i < 32; $i++) {
                  $bit[$i]=sprintf("%02X",
                      hex($l_bit[$i]) | hex($m_bit[$i]) | hex($f_bit[$i]) );
              }

              return pack("a2" x 32, @bit );

          }

          sub get_ind
          {
              local($l,$m,$f,$wh) = @_;

              # $m runs from 1 to 21 here because index 0 of the vowel tables
              # is the filler.
              # ($l >= 0 && $l < 19 && $m >= 1 && $m < 22 && $f >=0 && $f < 28) or
              #     die ("$0: get_ind() : invalid Jamo index\n");

              if ( $wh == 1 ) { # leading consonant index
                  return $lconBase[$l] +
                      ($f > 0 ? $lconMap2[$m] : $lconMap1[$m] ) ;
              }
              elsif ( $wh == 2 ) { # medial vowel index

                  $ind = $vowBase[$m];
                  if ( $vowType[$m] == 1 ) {
                      # For vowels 'o' and alikes,
                      # Giyeok and Kieuk get special treatment
                      $ind += ( ($l==0 || $l == 15) ? 0 : 1)
                          + ($f > 0 ? 2 : 0 );
                  }
                  else {
                      $ind+= $tconType[$f];
                  }
                  return $ind;
              }
              else { # trailing consonant index
                  return $tconBase[$f] + $tconMap[$m];
              }
          }
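
The script above prints one line per syllable in the form of a four-digit hex code, a colon, and 64 hex digits for a 16x16 bitmap. As a hedged aid for proofreading such output, the sketch below turns one of those lines into rows of '#' and '.' characters. The subroutine name and the sample glyph are hypothetical and are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn one "XXXX: <64 hex digits>" line into the code and 16 text rows.
# Returns an empty list if the line does not have that shape.
sub render_glyph {
    my ($line) = @_;
    my ($code, $hex) = $line =~ /^([0-9A-Fa-f]{4}):\s*([0-9A-Fa-f]{64})\s*$/
        or return;
    # Each row of a 16-pixel-wide glyph is 4 hex digits.
    my @rows = map {
        my $bits = sprintf "%016b", hex($_);
        $bits =~ tr/01/.#/;
        $bits;
    } unpack("(A4)16", $hex);
    return ($code, @rows);
}

# Hypothetical sample: only the two outermost pixels of the top row are set.
my $sample = "AC00: " . "8001" . ("0000" x 15);
my ($code, @rows) = render_glyph($sample);
print "U+$code\n";
print "$_\n" for @rows;
```

Piping a few lines of real output through such a viewer is a quick way to spot a wrong table entry, since a misplaced jamo shows up as an obviously broken syllable.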

