
Re: CSCD to text file conversion utility

  • Bhikkhu Pesala
    Message 1 of 8 , Jun 9, 2004
      Unicode is not a font, but an international standard (www.unicode.org)
      for the allocation of characters in most languages. At the moment, Pali
      scholars use all kinds of different character mappings. Most are limited
      to just the ANSI character set, which is not enough. Unicode was
      originally designed around 16-bit (double-byte) codes, which allow for
      more than 64,000 characters. The Pali characters are in the Latin
      Extended-A and Latin Extended Additional blocks. Windows Unicode fonts
      like Times New Roman (TNR) and Verdana have the Pali long vowels but
      not the dotted consonants, which are all in Latin Extended Additional:

      http://homepage.ntlworld.com/bpesala/clipboard/LatinExtendedAdditional.png
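
As a rough illustration of the point above, the following Python sketch lists the lowercase romanized Pali letters that carry diacritics, the Unicode block each one falls in, and how many bytes each takes in UTF-8 and UTF-16. The block ranges come from the Unicode standard; the helper names `block_of` and `describe` are made up for this example. Note that ñ sits in Latin-1 Supplement, which is why it is already available in ANSI fonts.

```python
# Lowercase romanized Pali letters with diacritics, written as escapes
# so the example does not depend on how this page is encoded:
# a/i/u with macron, n with dot above, n with tilde, and
# t/d/n/l/m with dot below.
PALI_DIACRITICS = "\u0101\u012b\u016b\u1e45\u00f1\u1e6d\u1e0d\u1e47\u1e37\u1e43"

# Block ranges as defined by the Unicode standard.
BLOCKS = [
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x1E00, 0x1EFF, "Latin Extended Additional"),
]


def block_of(ch):
    """Return the name of the Unicode block containing ch (hypothetical helper)."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "other"


def describe(ch):
    """Return (code point, block, UTF-8 byte count, UTF-16 byte count) for ch."""
    return (
        "U+%04X" % ord(ch),
        block_of(ch),
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),
    )


for ch in PALI_DIACRITICS:
    code_point, block, utf8_len, utf16_len = describe(ch)
    print(code_point, block, "utf-8 bytes:", utf8_len, "utf-16 bytes:", utf16_len)
```

The long vowels land in Latin Extended-A and the dotted consonants in Latin Extended Additional, which matches the font coverage problem described above.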

      Not all applications support Unicode yet, but it will come in time. The
      current mishmash of incompatible fonts makes life difficult. If they had
      used Unicode for the CSCD Tipitaka, conversion utilities would not have
      been necessary. Ideally, we need to persuade VRI to bring out a Unicode
      version. Unicode supports Devanagari, Myanmar, Sinhalese, Thai, Khmer,
      Mongolian, and Romanized Pali.

      You can find some ANSI and Unicode fonts on my website. The Titus Unicode
      font is pretty comprehensive.

      http://www.aimwell.org/Fonts/fonts.html
    • Hans Van Slooten
      Message 2 of 8 , Jun 9, 2004
        Dear Bhante,

        Thanks for the information. It should be fairly trivial to modify my application to output a UTF-8 text file with the proper Unicode encodings. I may have something within the next couple of days for the list to examine.

        Thanks again for the information,
        Hans
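
The kind of change described here can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the thread does not give the byte assignments of the CSCD font, so the `LEGACY_TO_UNICODE` table below is a hypothetical placeholder, and the function names are invented for the example. It is not the actual code of the application under discussion.

```python
# HYPOTHETICAL mapping from legacy 8-bit font bytes to Unicode characters.
# The real CSCD font assignments are not given in this thread; these
# three entries are placeholders that only show the shape of the table.
LEGACY_TO_UNICODE = {
    0xE0: "\u0101",  # a with macron (placeholder byte value)
    0xE4: "\u1e6d",  # t with dot below (placeholder byte value)
    0xF1: "\u00f1",  # n with tilde (placeholder byte value)
}


def legacy_to_text(data):
    """Turn legacy single-byte text into a Unicode string.

    Mapped bytes are replaced from the table, plain ASCII passes through
    unchanged, and any other high byte raises an error so that gaps in
    the table are noticed instead of silently producing wrong text.
    """
    out = []
    for b in data:
        if b in LEGACY_TO_UNICODE:
            out.append(LEGACY_TO_UNICODE[b])
        elif b < 0x80:
            out.append(chr(b))
        else:
            raise ValueError("no mapping for byte 0x%02X" % b)
    return "".join(out)


def convert_file(src_path, dst_path):
    """Read a legacy-encoded file and write it back out as UTF-8."""
    with open(src_path, "rb") as src:
        text = legacy_to_text(src.read())
    # newline="" keeps the original line endings untouched.
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


sample = bytes([0x67, 0xE0, 0x74, 0x68, 0xE0])  # "gatha" with placeholder bytes
print(legacy_to_text(sample).encode("utf-8"))
```

Raising on unmapped bytes is a deliberate choice in this sketch: with a partly known font table it is safer to stop than to emit guesses.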


      • Phra Noah Yuttadhammo
        Message 3 of 8 , Jun 10, 2004
          >
          > I have it, but don't use it. What might be useful is a utility that
          > converted the files to Unicode.
          >

          Dear Bhante Pesala,

          Greetings venerable sir :) I just want to let you and others know that Mr. Snow's utility DOES convert the files to Unicode. I just downloaded the file cscdconv3.zip and successfully converted the files to UTF-8 Unicode.

          Best wishes,

          Yuttadhammo (Phra Noah)

          --------------------------------------------------------------------------------
          Chom Tong Insight Meditation Center
          Wat Phradhatu Sri Chom Tong Voravihara
          T. Ban Luang, A. Chom Tong
          Chiang Mai, Thailand, 50160
          Website: - www.sirimangalo.org
          Tel: (66 - int'l.) (0 - in Thailand) 53 342 184
          --------------------------------------------------------------------------------
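
One way to confirm that a converted file really is UTF-8, sketched in Python: decode the bytes strictly and count the non-ASCII characters, which for romanized Pali should be the letters with diacritics. The function names are invented for this example, and the sample bytes stand in for a real converted file, whose name and contents are not given in the thread.

```python
def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def count_non_ascii(data):
    """Count characters above U+007F in UTF-8 encoded bytes."""
    return sum(1 for ch in data.decode("utf-8") if ord(ch) > 0x7F)


# Stand-in samples: the same word as UTF-8 and as raw 8-bit font bytes.
converted = "g\u0101th\u0101".encode("utf-8")
unconverted = b"g\xe0th\xe0"

print(is_valid_utf8(converted), count_non_ascii(converted))
print(is_valid_utf8(unconverted))
```

A file that is valid UTF-8 but contains no non-ASCII characters at all would suggest the diacritics were lost somewhere along the way.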
        • Bhikkhu Pesala
          Message 4 of 8 , Jun 11, 2004
            Thanks for the info. The utility is very simple to use, and as you say, it
            does already convert to Unicode or other formats. I could not believe how
            fast it was. Converting the Patisambhidamagga took about 2 seconds.

            It requires the Indic Times Font, which can be downloaded from:

            http://jbe.gold.ac.uk/fonts/tcrUnicode.TTF

            It views OK in Internet Explorer, but one cannot enlarge the font. Display
            in Opera is poor, though one can then zoom in. I suspect one would need to
            modify the stylesheet, though I don't know how to do this.
          • Bhikkhu Pesala
            Message 5 of 8 , Jun 12, 2004
              I figured out how to edit the stylesheet. The problem lies with the Indic
              Times font. I also find 12pt rather small for body text, so I enlarged it
              to 16pt and changed the font to my own Unicode Optimist. This is the
              revised stylesheet, which is suitable for Opera or Internet Explorer.
              Copy the text and save it as tipitaka4.css after backing up the original.
              It needs to be in the same directory as the converted HTML files. You
              could replace "Unicode Optimist" with the font of your choice, such as
              "TITUS Cyberbit Basic".

              /* This is the stylesheet for Roman UTF-8 Unicode encoding.
              */

              body { font-family:"Unicode Optimist";
              background:white; }

              SPAN {}
              .variant {color: blue}

              p {
              border-top: 0in; border-bottom: 0in;
              padding-top: 0in; padding-bottom: 0in;
              margin-top: 0in; margin-bottom: 0.5cm;
              }
              /* Text paragraphs */
              .c01 { font-size: 16pt; text-indent: 2em; margin-bottom: 0em;}


              .c02 { font-size: 16pt;}


              .c03 { font-size: 16pt; text-indent: 2em;}

              /* Namo tassa, and nitthita -- no unique structural distinction */
              .c06 { font-size: 16pt; text-align:center;}

              /* Unindented text */
              .c07 { font-size: 16pt;}

              /* Book */
              .c10 { font-size: 21pt; text-align:center; font-weight: bold;}

              /* Sutta */
              .c11 { font-size: 18pt; text-align:center; font-weight: bold;}

              /* Nikaaya */
              .c12 { font-size: 24pt; text-align:center; font-weight: bold;}

              /* Section (above 14)*/
              .c13 { font-size: 16pt; text-align:center; font-weight: bold;}

              /* Section */
              .c14 { font-size: 16pt; text-align:center; font-weight: bold;}

              /* Gatha line */
              .c21 { font-size: 16pt; text-indent: 4em; margin-bottom: 0em;}

              /* Gatha line */
              .c22 { font-size: 16pt; text-indent: 7em; margin-bottom: 0.5cm;}

              /* Gatha line */
              .c26 { font-size: 16pt; text-indent: 4em; margin-bottom: 0em;}

              /* Gatha line */
              .c27 { font-size: 16pt; text-indent: 7em; margin-bottom: 0em;}


              /*DN Muula Structure
              {
              06 Namo tassa..
              12 Nikaaya
              10 Book
              }
              |
              |___11 Sutta
              | |
              | |___14 Section
              | |
              | |___ Text paras (01,02,03,07), gathas (21,22,26,27), various nitthitas (06)
              |
              |___11 Sutta
              (etc.)

              */
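
For anyone who would rather script the font swap than edit the file by hand, here is a minimal Python sketch of the steps described above: back up the stylesheet, then replace the font named in the `body` rule. The function names and the `.bak` suffix are assumptions made for this example; only the file name tipitaka4.css and the font names come from the message.

```python
import re
import shutil
from pathlib import Path

# Matches the font-family value inside the first "body { ... }" rule.
BODY_FONT = re.compile(r"(body\s*\{[^}]*?font-family\s*:\s*)[^;]+(;)", re.IGNORECASE)


def set_body_font(css, new_font):
    """Return css with the body font-family replaced by new_font."""
    return BODY_FONT.sub(
        lambda m: '%s"%s"%s' % (m.group(1), new_font, m.group(2)), css, count=1
    )


def update_stylesheet(path, new_font):
    """Back up the stylesheet once, then rewrite it with the new body font."""
    path = Path(path)
    backup = path.with_name(path.name + ".bak")  # assumed backup naming
    if not backup.exists():
        shutil.copy2(path, backup)
    css = path.read_text(encoding="utf-8")
    path.write_text(set_body_font(css, new_font), encoding="utf-8")
    return backup


# Demonstration on a string, so nothing on disk is touched.
sample = 'body { font-family:"Unicode Optimist";\nbackground:white; }'
print(set_body_font(sample, "TITUS Cyberbit Basic"))
```

To apply it to a real file, one would call something like `update_stylesheet("tipitaka4.css", "TITUS Cyberbit Basic")` from the directory holding the converted HTML files.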