CSCD to text file conversion utility

  • Hans Van Slooten
    Message 1 of 8, Jun 6, 2004
      Hello everyone,

      I'm fairly new to this list and have only been lurking, but I thought
      I'd let you know that I threw together a small Perl utility for
      converting the CSCD files to text format so you don't need a special
      font installed to view them.

      Anyway, it's not fully tested, and I'm going to make it a little more
      user-friendly, but if you are an early adopter, you can get it here:

      http://www.hansvanslooten.com/cscd2txt.zip

      You will need to have a Perl scripting engine installed to use it. Mac
      OS X already has one, so it should work fine. In Windows, you can
      install ActivePerl from ActiveState, which is free:

      http://www.activestate.com/Products/ActivePerl

      Finally, to use the utility, run it like so from the command-line:

      perl cscd2txt.pl <input directory (where CSCD files are)> <output dir>

      e.g.

      perl cscd2txt.pl d:\cscd e:\tempdir

      Any suggestions or questions are welcome.

      Regards,
      Hans
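
For readers without Perl, the same command-line shape (an input directory and an output directory) can be sketched in Python. This is only an illustration, not Hans's script: `convert_text` is a hypothetical placeholder, since the actual CSCD conversion rules are not given in this thread, and the function names are assumptions.

```python
from pathlib import Path


def convert_text(raw: bytes) -> str:
    """Placeholder conversion step (an assumption, not the real CSCD logic)."""
    # Latin-1 maps every byte to exactly one character, so this never fails.
    return raw.decode("latin-1")


def convert_directory(input_dir: Path, output_dir: Path) -> list:
    """Convert every regular file in input_dir into a .txt file in output_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sorted(input_dir.iterdir()):
        if not source.is_file():
            continue  # skip subdirectories
        target = output_dir / (source.stem + ".txt")
        target.write_text(convert_text(source.read_bytes()), encoding="utf-8")
        written.append(target)
    return written


def main(argv: list) -> int:
    """Entry point taking the two directory arguments; returns an exit code."""
    if len(argv) != 2:
        print("usage: <input directory> <output directory>")
        return 2
    for path in convert_directory(Path(argv[0]), Path(argv[1])):
        print(path)
    return 0
```

Calling `main(["d:\\cscd", "e:\\tempdir"])` would mirror the example invocation above; a real tool would replace `convert_text` with the actual format handling.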
    • Bhikkhu Pesala
      Message 2 of 8, Jun 8, 2004
        You are most probably aware of this utility:

        The CSCD Conversion Utility (CSCDCONV) is designed to be used with the
        Vipassana Research Institute's Chattha Sangayana CD-ROM. For information
        on obtaining the CD, see:
        http://www.tipitaka.org or http://www.vri.dhamma.org

        For the latest version of this program, see:
        http://www.fsnow.com/pali/

        The output files produced by CSCDCONV are for personal use, and SHOULD NOT
        BE DISTRIBUTED.

        CSCDCONV, copyright 2001, Frank Snow, fsnow@...
        CSCDCONV may be distributed freely.

        I have it, but don't use it. What might be useful is a utility that
        converted the files to Unicode.
      • Hans Van Slooten
        Message 3 of 8, Jun 8, 2004
          Dear Bhante,

          Yes, it was the work that Frank Snow did (and his explanation
          of the file format) that allowed me to write this utility.

          If you are able to tell me which Unicode font you would like to be able
          to convert to, I am certain that I would be able to figure it out and
          would be willing to put some time into it (I'm a software developer by
          trade). The format is actually very simple to convert, so it is easy
          to test out changes.

          Regards,
          Hans

        • Bhikkhu Pesala
          Message 4 of 8, Jun 9, 2004
            Unicode is not a font, but an international standard (www.unicode.org)
            for the allocation of characters in most languages. At the moment, Pali
            scholars use all kinds of different character mappings. Most are limited
            to just the ANSI character set, which is not enough. Unicode fonts can
            use double-byte encoding, which allows for more than 64,000 characters.
            The Pali characters are in the Latin Extended-A and Latin Extended
            Additional blocks. Windows Unicode fonts like Times New Roman and
            Verdana have the Pali vowels but not the dotted consonants, which are
            all in Latin Extended Additional:

            http://homepage.ntlworld.com/bpesala/clipboard/LatinExtendedAdditional.png
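
As a rough check of where the romanized Pali letters sit, the short Python sketch below looks up each letter's code point and block. The block ranges are the published Unicode ones; the choice of letters and the helper name `block_of` are assumptions made for illustration. Note that ñ falls in Latin-1 Supplement, so it is available even in ANSI-range fonts.

```python
import unicodedata

# Block ranges as published in the Unicode Standard.
BLOCKS = [
    ("Basic Latin", 0x0000, 0x007F),
    ("Latin-1 Supplement", 0x0080, 0x00FF),
    ("Latin Extended-A", 0x0100, 0x017F),
    ("Latin Extended Additional", 0x1E00, 0x1EFF),
]

# Lowercase romanized Pali letters with diacritics, written as escapes so the
# precomposed forms are unambiguous: a/i/u with macron, n with dot above,
# n with tilde, then t, d, n, l, m with dot below.
PALI_LETTERS = "\u0101\u012b\u016b\u1e45\u00f1\u1e6d\u1e0d\u1e47\u1e37\u1e43"


def block_of(ch: str) -> str:
    """Return the name of the Unicode block containing ch, or 'other'."""
    code_point = ord(ch)
    for name, low, high in BLOCKS:
        if low <= code_point <= high:
            return name
    return "other"


for letter in PALI_LETTERS:
    print(f"U+{ord(letter):04X}  {block_of(letter):26}  {unicodedata.name(letter)}")
```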

            Not all applications support Unicode yet, but it will come in time. The
            current mish mash of incompatible fonts makes life difficult. If they had
            used Unicode for the CSCD Tipitaka, conversion utilities would not have
            been necessary. Ideally, we need to persuade VRI to bring out a Unicode
            version. Unicode supports Devanagari, Myanmar, Sinhalese, Thai, Khmer,
            Mongolian, and Romanized Pali.

            You can find some ANSI and Unicode fonts on my website. The Titus
            Unicode font is pretty comprehensive.

            http://www.aimwell.org/Fonts/fonts.html
          • Hans Van Slooten
            Message 5 of 8, Jun 9, 2004
              Dear Bhante,

              Thanks for the information. It should be fairly trivial to modify my application to output a UTF-8 text file with the proper Unicode encodings. I may have something within the next couple of days for the list to examine.

              Thanks again for the information,
              Hans
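
The kind of change described above can be sketched as a byte-to-character lookup followed by UTF-8 encoding. This is a minimal Python illustration, not the utility's actual code: the byte values in `LEGACY_TO_UNICODE` are hypothetical placeholders, because the real CSCD font layout is not given in this thread.

```python
# HYPOTHETICAL mapping from legacy font byte values to Unicode characters.
# The byte values are placeholders for illustration only; the real table
# would come from the documented file format.
LEGACY_TO_UNICODE = {
    0xE0: "\u0101",  # a with macron
    0xE1: "\u012b",  # i with macron
    0xE2: "\u016b",  # u with macron
    0xE3: "\u1e6d",  # t with dot below
}


def legacy_to_text(raw: bytes) -> str:
    """Map each byte through the table; unmapped bytes are read as Latin-1."""
    return "".join(LEGACY_TO_UNICODE.get(byte, chr(byte)) for byte in raw)


def legacy_to_utf8(raw: bytes) -> bytes:
    """Convert legacy-encoded bytes to UTF-8 bytes ready to write to a file."""
    return legacy_to_text(raw).encode("utf-8")
```

With this placeholder table, `legacy_to_text(b"p\xe0li")` yields the word "pāli" with a proper U+0101, and the UTF-8 output can be viewed in any Unicode-aware application without a special font.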


            • Phra Noah Yuttadhammo
              Message 6 of 8, Jun 10, 2004
                >
                > I have it, but don't use it. What might be useful is a utility that
                > converted the files to Unicode.
                >

                Dear Bhante Pesala,

                Greetings venerable sir :) I just want to let you and others know that Mr. Snow's utility DOES convert the files to Unicode. I
                just downloaded the file cscdconv3.zip and successfully converted the files to UTF-8 Unicode.

                Best wishes,

                Yuttadhammo (Phra Noah)

                --------------------------------------------------------------------------------
                Chom Tong Insight Meditation Center
                Wat Phradhatu Sri Chom Tong Voravihara
                T. Ban Luang, A. Chom Tong
                Chiang Mai, Thailand, 50160
                Website: - www.sirimangalo.org
                Tel: (66 - int'l.) (0 - in Thailand) 53 342 184
                --------------------------------------------------------------------------------
              • Bhikkhu Pesala
                Message 7 of 8, Jun 11, 2004
                  Thanks for the info. The utility is very simple to use, and as you say, it
                  does already convert to Unicode or other formats. I could not believe how
                  fast it was. Converting the Patisambhidamagga took about 2 seconds.

                  It requires the Indic Times Font, which can be downloaded from:

                  http://jbe.gold.ac.uk/fonts/tcrUnicode.TTF

                  It views OK in Internet Explorer, but one cannot enlarge the font. Display
                  in Opera is poor, though one can then zoom in. One would need to modify
                  the stylesheet, I suspect, though I wouldn't know how to do this.
                • Bhikkhu Pesala
                  Message 8 of 8, Jun 12, 2004
                    I figured out how to edit the style sheet. The problem lies with the Indic
                    Times font. I also find 12pt rather small for body text, so I enlarged it
                    to 16pt and changed the font to my own Unicode Optimist. This is the
                    revised style sheet, which is suitable for Opera or Internet Explorer.
                    Copy the text and save it as tipitaka4.css after backing up the original.
                    It needs to be in the same directory as the converted HTML files. You
                    could replace "Unicode Optimist" with the font of your choice, such as
                    "TITUS Cyberbit Basic".

                    /* This is the stylesheet for Roman UTF-8 Unicode encoding.
                    */

                    body { font-family:"Unicode Optimist";
                    background:white; }

                    SPAN {}
                    .variant {color: blue}

                    p {
                    border-top: 0in; border-bottom: 0in;
                    padding-top: 0in; padding-bottom: 0in;
                    margin-top: 0in; margin-bottom: 0.5cm;
                    }
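                    /* Text paragraph (first line indented, no gap below) */
                    .c01 { font-size: 16pt; text-indent: 2em; margin-bottom: 0em;}

                    /* Text paragraph */
                    .c02 { font-size: 16pt;}

                    /* Text paragraph (first line indented) */
                    .c03 { font-size: 16pt; text-indent: 2em;}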

                    /* Namo tassa, and nitthita -- no unique structural distinction */
                    .c06 { font-size: 16pt; text-align:center;}

                    /* Unindented text */
                    .c07 { font-size: 16pt;}

                    /* Book */
                    .c10 { font-size: 21pt; text-align:center; font-weight: bold;}

                    /* Sutta */
                    .c11 { font-size: 18pt; text-align:center; font-weight: bold;}

                    /* Nikaaya */
                    .c12 { font-size: 24pt; text-align:center; font-weight: bold;}

                    /* Section (above 14)*/
                    .c13 { font-size: 16pt; text-align:center; font-weight: bold;}

                    /* Section */
                    .c14 { font-size: 16pt; text-align:center; font-weight: bold;}

                    /* Gatha line */
                    .c21 { font-size: 16pt; text-indent: 4em; margin-bottom: 0em;}

                    /* Gatha line */
                    .c22 { font-size: 16pt; text-indent: 7em; margin-bottom: 0.5cm;}

                    /* Gatha line */
                    .c26 { font-size: 16pt; text-indent: 4em; margin-bottom: 0em;}

                    /* Gatha line */
                    .c27 { font-size: 16pt; text-indent: 7em; margin-bottom: 0em;}


                    /*DN Muula Structure
                    {
                    06 Namo tassa..
                    12 Nikaaya
                    10 Book
                    }
                    |
                    |___11 Sutta
                    |   |
                    |   |___14 Section
                    |   |
                    |   |___ Text paras (01,02,03,07), gathas (21,22,26,27),
                    |        various nitthitas (06)
                    |
                    |___11 Sutta
                    (etc.)

                    */