Re: [Pali] Re: CSCD to text file conversion utility

  • Hans Van Slooten
    Message 1 of 8 , Jun 8 5:46 PM
      Dear Bhante,

      Yes, it was the work that Frank Snow did (and his explanation
      of the file format) that allowed me to write this utility.

      If you can tell me which Unicode font you would like to be able
      to convert to, I am confident that I can figure it out, and I
      would be willing to put some time into it (I'm a software developer by
      trade). The format is actually very simple to convert, so changes
      are easy to test.

      Regards,
      Hans

      On Jun 8, 2004, at 4:22 PM, Bhikkhu Pesala wrote:

      > You are most probably aware of this utility:
      >
      > The CSCD Conversion Utility (CSCDCONV) is designed to be used with the
      > Vipassana Research Institute's Chattha Sangayana CD-ROM. For information
      > on obtaining the CD, see:
      > http://www.tipitaka.org or http://www.vri.dhamma.org
      >
      > For the latest version of this program, see:
      > http://www.fsnow.com/pali/
      >
      > The output files produced by CSCDCONV are for personal use, and SHOULD NOT
      > BE DISTRIBUTED.
      >
      > CSCDCONV, copyright 2001, Frank Snow, fsnow@...
      > CSCDCONV may be distributed freely.
      >
      > I have it, but don't use it. What might be useful is a utility that
      > converted the files to Unicode.
      >
    • Bhikkhu Pesala
      Message 2 of 8 , Jun 9 6:25 AM
        Unicode is not a font, but an international standard (www.unicode.org) for
        the allocation of characters in most languages. At the moment, Pali
        scholars use all kinds of different character mappings. Most are limited
        to just the ANSI character set, which is not enough. Unicode text is
        commonly stored in a 16-bit (double-byte) encoding, which allows for more
        than 64,000 characters. The Pali characters are in the Latin Extended-A
        and Latin Extended Additional blocks. Windows Unicode fonts like Times
        New Roman (TNR) and Verdana have the Pali vowels but not the dotted
        consonants, which are all in Latin Extended Additional:

        http://homepage.ntlworld.com/bpesala/clipboard/LatinExtendedAdditional.png

        Not all applications support Unicode yet, but it will come in time. The
        current mishmash of incompatible fonts makes life difficult. If they had
        used Unicode for the CSCD Tipitaka, conversion utilities would not have
        been necessary. Ideally, we need to persuade VRI to bring out a Unicode
        version. Unicode supports Devanagari, Myanmar, Sinhala, Thai, Khmer,
        Mongolian, and Romanized Pali.

        You can find some ANSI and Unicode fonts on my website. The TITUS Unicode
        font is pretty comprehensive:

        http://www.aimwell.org/Fonts/fonts.html
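
Editorial note: to make the block names in the message above concrete, here is a minimal Python sketch (not part of the original post) that reports the code point, Unicode block, and UTF-8 byte length of the usual romanized Pali letters. The letter list and the three block ranges are the only data used; the names `PALI_LETTERS`, `BLOCKS`, `block_of`, and `describe` are illustrative. Note that ñ falls in Latin-1 Supplement, which is why ANSI fonts already carry it.

```python
# Romanized Pali letters with diacritics, written as escapes to avoid
# any ambiguity between precomposed and combining forms:
# a-macron, i-macron, u-macron, n-dot-above, n-tilde,
# t-dot-below, d-dot-below, n-dot-below, l-dot-below, m-dot-below
PALI_LETTERS = "\u0101\u012b\u016b\u1e45\u00f1\u1e6d\u1e0d\u1e47\u1e37\u1e43"

# Only the Unicode blocks relevant here (inclusive code point ranges).
BLOCKS = [
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x1E00, 0x1EFF, "Latin Extended Additional"),
]


def block_of(ch):
    """Return the name of the block containing ch, or 'other'."""
    cp = ord(ch)
    for lo, hi, name in BLOCKS:
        if lo <= cp <= hi:
            return name
    return "other"


def describe(ch):
    """Return (code point label, block name, UTF-8 length in bytes)."""
    return (f"U+{ord(ch):04X}", block_of(ch), len(ch.encode("utf-8")))


if __name__ == "__main__":
    for ch in PALI_LETTERS:
        print(ch, *describe(ch))
```

The long vowels land in Latin Extended-A and take two bytes in UTF-8, while the dotted consonants land in Latin Extended Additional and take three.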
      • Hans Van Slooten
        Message 3 of 8 , Jun 9 7:19 AM
          Dear Bhante,

          Thanks for the information. It should be fairly trivial to modify my application to output a UTF-8 text file with the proper Unicode encodings. I may have something within the next couple of days for the list to examine.

          Thanks again for the information,
          Hans
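
Editorial note: the kind of change described in the message above can be sketched roughly as follows. This is not Hans's utility or Frank Snow's CSCDCONV, and the thread does not describe the CSCD file format, so the byte values in `LEGACY_TO_UNICODE` are placeholders invented for illustration. The function names `legacy_to_text` and `convert_file` are likewise hypothetical. The sketch assumes a single-byte legacy encoding in which ASCII is unchanged.

```python
from pathlib import Path

# HYPOTHETICAL table: these byte values are placeholders, NOT the real
# CSCD font encoding. A real converter would need the full mapping.
LEGACY_TO_UNICODE = {
    0xE0: "\u0101",  # a with macron
    0xE4: "\u1e6d",  # t with dot below
}


def legacy_to_text(data: bytes, table=LEGACY_TO_UNICODE) -> str:
    """Map each byte of a single-byte legacy encoding to a Unicode character."""
    out = []
    for b in data:
        if b in table:
            out.append(table[b])
        elif b < 0x80:
            out.append(chr(b))    # plain ASCII passes through unchanged
        else:
            out.append("\ufffd")  # unmapped high byte: replacement character
    return "".join(out)


def convert_file(src: Path, dst: Path) -> None:
    """Read legacy bytes from src and write the result to dst as UTF-8."""
    dst.write_text(legacy_to_text(src.read_bytes()), encoding="utf-8")
```

Emitting U+FFFD for unmapped bytes, rather than guessing, makes gaps in the table easy to spot in the output.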


        • Phra Noah Yuttadhammo
          Message 4 of 8 , Jun 10 6:20 PM
            >
            > I have it, but don't use it. What might be useful is a utility that
            > converted the files to Unicode.
            >

            Dear Bhante Pesala,

            Greetings, venerable sir :) I just want to let you and others know that Mr. Snow's utility DOES convert the files to Unicode. I just downloaded the file cscdconv3.zip and successfully converted the files to UTF-8 Unicode.

            Best wishes,

            Yuttadhammo (Phra Noah)

            --------------------------------------------------------------------------------
            Chom Tong Insight Meditation Center
            Wat Phradhatu Sri Chom Tong Voravihara
            T. Ban Luang, A. Chom Tong
            Chiang Mai, Thailand, 50160
            Website: - www.sirimangalo.org
            Tel: (66 - int'l.) (0 - in Thailand) 53 342 184
            --------------------------------------------------------------------------------
          • Bhikkhu Pesala
            Message 5 of 8 , Jun 11 5:20 AM
              Thanks for the info. The utility is very simple to use, and as you say, it
              does already convert to Unicode or other formats. I could not believe how
              fast it was. Converting the Patisambhidamagga took about 2 seconds.

              It requires the Indic Times Font, which can be downloaded from:

              http://jbe.gold.ac.uk/fonts/tcrUnicode.TTF

              It views OK in Internet Explorer, but one cannot enlarge the font. Display
              in Opera is poor, though one can then zoom in. I suspect one would need
              to modify the stylesheet, though I wouldn't know how to do this.
            • Bhikkhu Pesala
              Message 6 of 8 , Jun 12 9:27 AM
                I figured out how to edit the style sheet. The problem lies with the Indic
                Times font. I also find 12pt rather small for body text, so I enlarged it
                to 16pt and changed the font to my own Unicode Optimist. This is the
                revised style sheet, which is suitable for Opera or Internet Explorer.
                Copy the text and save it as tipitaka4.css after backing up the original.
                It needs to be in the same directory as the converted HTML files. You
                could replace "Unicode Optimist" with the font of your choice, such as
                "TITUS Cyberbit Basic".

                /* This is the stylesheet for Roman UTF-8 Unicode encoding.
                */

                body { font-family:"Unicode Optimist";
                background:white; }

                SPAN {}
                .variant {color: blue}

                p {
                border-top: 0in; border-bottom: 0in;
                padding-top: 0in; padding-bottom: 0in;
                margin-top: 0in; margin-bottom: 0.5cm;
                }
                /* */
                .c01 { font-size: 16pt; text-indent: 2em; margin-bottom: 0em;}


                .c02 { font-size: 16pt;}


                .c03 { font-size: 16pt; text-indent: 2em;}

                /* Namo tassa, and nitthita -- no unique structural distinction */
                .c06 { font-size: 16pt; text-align:center;}

                /* Unindented text */
                .c07 { font-size: 16pt;}

                /* Book */
                .c10 { font-size: 21pt; text-align:center; font-weight: bold;}

                /* Sutta */
                .c11 { font-size: 18pt; text-align:center; font-weight: bold;}

                /* Nikaaya */
                .c12 { font-size: 24pt; text-align:center; font-weight: bold;}

                /* Section (above 14)*/
                .c13 { font-size: 16pt; text-align:center; font-weight: bold;}

                /* Section */
                .c14 { font-size: 16pt; text-align:center; font-weight: bold;}

                /* Gatha line */
                .c21 { font-size: 16pt; text-indent: 4em; margin-bottom: 0em;}

                /* Gatha line */
                .c22 { font-size: 16pt; text-indent: 7em; margin-bottom: 0.5cm;}

                /* Gatha line */
                .c26 { font-size: 16pt; text-indent: 4em; margin-bottom: 0em;}

                /* Gatha line */
                .c27 { font-size: 16pt; text-indent: 7em; margin-bottom: 0em;}


                /*DN Muula Structure
                {
                06 Namo tassa..
                12 Nikaaya
                10 Book
                }
                |
                |___11 Sutta
                | |
                | |___14 Section
                | |
                | |___ Text paras (01,02,03,07), gathas (21,22,26,27), various
                nitthitas (06)
                |
                |___11 Sutta
                (etc.)

                */
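
Editorial note: the manual steps in the message above (back up the original tipitaka4.css, save the revised stylesheet in its place, optionally substitute another font) could be scripted along these lines. This is a minimal sketch, not part of the original post; the function names and the `.bak` backup suffix are assumptions chosen for illustration.

```python
import shutil
from pathlib import Path


def swap_font(css: str, old: str = "Unicode Optimist",
              new: str = "TITUS Cyberbit Basic") -> str:
    """Replace the quoted font name in the stylesheet text."""
    return css.replace(f'"{old}"', f'"{new}"')


def install_stylesheet(target: Path, css: str) -> Path:
    """Back up target (once) as <name>.bak, then overwrite it with css.

    The '.bak' suffix is an assumption; any backup name would do.
    Returns the backup path.
    """
    backup = target.with_name(target.name + ".bak")
    if target.exists() and not backup.exists():
        shutil.copy2(target, backup)  # keep the first original only
    target.write_text(css, encoding="utf-8")
    return backup
```

For example, `install_stylesheet(Path("tipitaka4.css"), swap_font(revised_css))` would be run in the directory holding the converted HTML files, with `revised_css` holding the stylesheet text above.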