Re: CSCD to text file conversion utility

  • Bhikkhu Pesala
    Message 1 of 8 , Jun 8, 2004
      You are most probably aware of this utility:

      The CSCD Conversion Utility (CSCDCONV) is designed to be used with the
      Vipassana Research Institute's Chattha Sangayana CD-ROM. For information
      on obtaining the CD, see:
      http://www.tipitaka.org or http://www.vri.dhamma.org

      For the latest version of this program, see:
      http://www.fsnow.com/pali/

      The output files produced by CSCDCONV are for personal use, and SHOULD NOT
      BE DISTRIBUTED.

      CSCDCONV, copyright 2001, Frank Snow, fsnow@...
      CSCDCONV may be distributed freely.

      I have it, but don't use it. What might be useful is a utility that
      converted the files to Unicode.
    • Hans Van Slooten
      Message 2 of 8 , Jun 8, 2004
        Dear Bhante,

        Yes, it was the work that Frank Snow did (and his explanation of the
        file format) that allowed me to write this utility.

        If you can tell me which Unicode font you would like to convert to, I
        am certain I could figure it out, and I would be willing to put some
        time into it (I'm a software developer by trade). The format is
        actually very simple to convert, so changes are easy to test.

        Regards,
        Hans

      • Bhikkhu Pesala
        Message 3 of 8 , Jun 9, 2004
          Unicode is not a font, but an international standard (www.unicode.org) for
          the allocation of characters in most languages. At the moment, Pali
          scholars use all kinds of different character mappings. Most are limited
          to just the ANSI character set, which is not enough. Unicode text is
          commonly stored with two or more bytes per character (as in UTF-16), which
          allows for far more than 64,000 characters. The Pali characters are in the
          Latin Extended-A and Latin Extended Additional blocks. Windows Unicode
          fonts like Times New Roman and Verdana have the Pali vowels but not the
          consonants, which are nearly all in Latin Extended Additional:

          http://homepage.ntlworld.com/bpesala/clipboard/LatinExtendedAdditional.png

          Not all applications support Unicode yet, but it will come in time. The
          current mishmash of incompatible fonts makes life difficult. If they had
          used Unicode for the CSCD Tipitaka, conversion utilities would not have
          been necessary. Ideally, we need to persuade VRI to bring out a Unicode
          version. Unicode supports Devanagari, Myanmar, Sinhala, Thai, Khmer,
          Mongolian, and Romanized Pali.

          You can find some ANSI and Unicode fonts on my website. The Titus Unicode
          font is pretty comprehensive.

          http://www.aimwell.org/Fonts/fonts.html
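To make the point about character blocks concrete, here is a minimal Python sketch that lists the romanized Pali letters outside plain ASCII, with their code points, blocks, and UTF-8 byte lengths. The helper names `block_name` and `describe` are made up for this illustration, and the block table covers only the three ranges relevant here. Note that the palatal nasal (U+00F1) sits in Latin-1 Supplement rather than in Latin Extended Additional.

```python
import unicodedata

# Romanized Pali letters outside plain ASCII, written as escapes so the
# precomposed code points are unambiguous.
PALI_LETTERS = "\u0101\u012b\u016b\u1e45\u00f1\u1e6d\u1e0d\u1e47\u1e37\u1e43"


def block_name(ch: str) -> str:
    """Return the Unicode block for the few ranges this sketch knows about."""
    cp = ord(ch)
    if 0x0080 <= cp <= 0x00FF:
        return "Latin-1 Supplement"
    if 0x0100 <= cp <= 0x017F:
        return "Latin Extended-A"
    if 0x1E00 <= cp <= 0x1EFF:
        return "Latin Extended Additional"
    return "other"


def describe(ch: str) -> tuple[str, str, int]:
    """Return (code point, block, UTF-8 byte length) for one character."""
    return (f"U+{ord(ch):04X}", block_name(ch), len(ch.encode("utf-8")))


if __name__ == "__main__":
    for ch in PALI_LETTERS:
        print(ch, *describe(ch), unicodedata.name(ch))
```

The long vowels need two bytes each in UTF-8, while the dotted consonants need three, which is one reason single-byte ANSI fonts cannot hold them all.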
        • Hans Van Slooten
          Message 4 of 8 , Jun 9, 2004
            Dear Bhante,

            Thanks for the information. It should be fairly trivial to modify my application to output a UTF-8 text file with the proper Unicode encodings. I may have something within the next couple of days for the list to examine.

            Thanks again for the information,
            Hans
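As a rough illustration of what "output a UTF-8 text file with the proper Unicode encodings" involves, here is a small Python sketch of a table-driven converter. The thread does not describe the CSCD file format, so this is not Hans's or Frank Snow's code: it assumes the input is lowercase ASCII in the Velthuis transliteration scheme as a stand-in, and the names `VELTHUIS_TO_UNICODE`, `to_unicode`, and `convert_file` are hypothetical.

```python
from pathlib import Path

# Velthuis-style two-character sequences mapped to Unicode letters.
# Assumption: this table stands in for the real CSCD mapping, which the
# thread does not describe. Only lowercase forms are handled.
VELTHUIS_TO_UNICODE = {
    "aa": "\u0101",  # a with macron
    "ii": "\u012b",  # i with macron
    "uu": "\u016b",  # u with macron
    '"n': "\u1e45",  # n with dot above
    "~n": "\u00f1",  # n with tilde
    ".t": "\u1e6d",  # t with dot below
    ".d": "\u1e0d",  # d with dot below
    ".n": "\u1e47",  # n with dot below
    ".l": "\u1e37",  # l with dot below
    ".m": "\u1e43",  # m with dot below
}


def to_unicode(text: str) -> str:
    """Replace each two-character sequence in the table with one letter."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in VELTHUIS_TO_UNICODE:
            out.append(VELTHUIS_TO_UNICODE[pair])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def convert_file(src: Path, dst: Path) -> None:
    """Read an ASCII source file and write the converted text as UTF-8."""
    dst.write_text(to_unicode(src.read_text(encoding="ascii")), encoding="utf-8")
```

For example, `to_unicode("Nikaaya")` gives "Nikāya". A real converter would need more care: this sketch would also rewrite a full stop followed by "n", "t", "d", "l", or "m" in ordinary prose or in a URL.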


          • Phra Noah Yuttadhammo
            Message 5 of 8 , Jun 10, 2004
              >
              > I have it, but don't use it. What might be useful is a utility that
              > converted the files to Unicode.
              >

              Dear Bhante Pesala,

              Greetings venerable sir :) I just want to let you and others know that Mr. Snow's utility DOES convert the files to Unicode. I just downloaded the file cscdconv3.zip and successfully converted the files to UTF-8 Unicode.

              Best wishes,

              Yuttadhammo (Phra Noah)

              --------------------------------------------------------------------------------
              Chom Tong Insight Meditation Center
              Wat Phradhatu Sri Chom Tong Voravihara
              T. Ban Luang, A. Chom Tong
              Chiang Mai, Thailand, 50160
              Website: - www.sirimangalo.org
              Tel: (66 - int'l.) (0 - in Thailand) 53 342 184
              --------------------------------------------------------------------------------
            • Bhikkhu Pesala
              Message 6 of 8 , Jun 11, 2004
                Thanks for the info. The utility is very simple to use, and as you say, it
                does already convert to Unicode or other formats. I could not believe how
                fast it was. Converting the Patisambhidamagga took about 2 seconds.

                It requires the Indic Times Font, which can be downloaded from:

                http://jbe.gold.ac.uk/fonts/tcrUnicode.TTF

                It displays OK in Internet Explorer, but one cannot enlarge the font.
                Display in Opera is poor, though one can then zoom in. I suspect one
                would need to modify the stylesheet, though I wouldn't know how to do this.
              • Bhikkhu Pesala
                Message 7 of 8 , Jun 12, 2004
                  I figured out how to edit the style sheet. The problem lies with the Indic
                  Times font. I also find 12pt rather small for body text, so I enlarged it
                  to 16pt and changed the font to my own Unicode Optimist. This is the
                  revised style sheet, which is suitable for Opera or Internet Explorer.
                  Copy the text and save it as tipitaka4.css after backing up the original.
                  It needs to be in the same directory as the converted HTML files. You
                  could replace "Unicode Optimist" with the font of your choice such as
                  "TITUS Cyberbit Basic."

                  /* This is the stylesheet for Roman UTF-8 Unicode encoding.
                  */

                  body { font-family:"Unicode Optimist";
                  background:white; }

                  SPAN {}
                  .variant {color: blue}

                  p {
                  border-top: 0in; border-bottom: 0in;
                  padding-top: 0in; padding-bottom: 0in;
                  margin-top: 0in; margin-bottom: 0.5cm;
                  }
                  /* Text paragraphs (01, 02, 03) */
                  .c01 { font-size: 16pt; text-indent: 2em; margin-bottom: 0em;}


                  .c02 { font-size: 16pt;}


                  .c03 { font-size: 16pt; text-indent: 2em;}

                  /* Namo tassa, and nitthita -- no unique structural distinction */
                  .c06 { font-size: 16pt; text-align:center;}

                  /* Unindented text */
                  .c07 { font-size: 16pt;}

                  /* Book */
                  .c10 { font-size: 21pt; text-align:center; font-weight: bold;}

                  /* Sutta */
                  .c11 { font-size: 18pt; text-align:center; font-weight: bold;}

                  /* Nikaaya */
                  .c12 { font-size: 24pt; text-align:center; font-weight: bold;}

                  /* Section (above 14)*/
                  .c13 { font-size: 16pt; text-align:center; font-weight: bold;}

                  /* Section */
                  .c14 { font-size: 16pt; text-align:center; font-weight: bold;}

                  /* Gatha line */
                  .c21 { font-size: 16pt; text-indent: 4em; margin-bottom: 0em;}

                  /* Gatha line */
                  .c22 { font-size: 16pt; text-indent: 7em; margin-bottom: 0.5cm;}

                  /* Gatha line */
                  .c26 { font-size: 16pt; text-indent: 4em; margin-bottom: 0em;}

                  /* Gatha line */
                  .c27 { font-size: 16pt; text-indent: 7em; margin-bottom: 0em;}


                  /*DN Muula Structure
                  {
                  06 Namo tassa..
                  12 Nikaaya
                  10 Book
                  }
                  |
                  |___11 Sutta
                  |    |
                  |    |___14 Section
                  |    |
                  |    |___ Text paras (01,02,03,07), gathas (21,22,26,27),
                  |         various nitthitas (06)
                  |
                  |___11 Sutta
                       (etc.)

                  */
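For anyone who would rather script the "back up the original, then swap the font" step than edit the file by hand, here is a minimal Python sketch. The function names `set_body_font` and `update_stylesheet` and the `.bak` backup suffix are assumptions made for this example. It changes only the first quoted `font-family` value it finds, which in the stylesheet above is the one in the `body` rule.

```python
import re
import shutil
from pathlib import Path


def set_body_font(css: str, font: str) -> str:
    """Replace the first quoted font-family value in the stylesheet text."""
    return re.sub(
        r'(font-family\s*:\s*)"[^"]*"',
        lambda m: f'{m.group(1)}"{font}"',
        css,
        count=1,
    )


def update_stylesheet(css_path: Path, font: str) -> Path:
    """Copy css_path to a .bak file beside it, then rewrite it with the new font."""
    backup = css_path.with_name(css_path.name + ".bak")
    shutil.copy2(css_path, backup)
    text = css_path.read_text(encoding="utf-8")
    css_path.write_text(set_body_font(text, font), encoding="utf-8")
    return backup
```

Calling `update_stylesheet(Path("tipitaka4.css"), "TITUS Cyberbit Basic")` from the directory holding the converted HTML files would leave the original in `tipitaka4.css.bak`.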