VIPER Volume 1 document(s) now online, RFC (COSMAC VIP)

  • sbirdasn
    Message 1 of 5 , Jun 13, 2005
      Ok, I'm a little slow...

      I finally managed to get everything working at once and the spare time
      to scan in one issue of VIPER volume 1 and make a nice clean document
      of the results.

      I have posted the resulting files on my web page and would like to
      know whether others are interested in using them to make more
      compact versions for general posting in the forum or on other
      COSMAC sites.

      I don't have the tools to make true native PDF files, so the files
      tend to be very big (we've had this file size discussion before). The
      results are pretty clear, but trimming them down properly would
      require some loving attention from someone who has access to full
      Adobe Acrobat or equivalent tools and who would treat it as a "from
      scratch" document generation, keeping the content/look/feel
      substantially the same while avoiding the bloat. The bloat comes
      directly from the fact that the present PDF file is basically just a
      PDF wrapper around a 600 dpi B&W scan graphic that fits the active
      area of an HP LaserJet 1100 printer.

      Note that the files are being made available under the conditions set
      by the VIPER publisher, so I expect everyone who uses them to abide by
      those conditions as I have outlined on my web site.

      Here's the link:

      http://www.xmission.com/~a_naef/cosmac/viper1.html

      Click on the thumbnail to get to the download page.

      So, what does everyone think?

      Are my chosen scanned image formats acceptable? Or should I supply
      grayscale scans to improve OCR accuracy, or the raw text pre-OCR'ed by
      me (and proofread) plus a low resolution scan to show formatting?

      My next priority would be Vol. 1 Issue 3 (VIP ROM listing w/
      commentary), followed by Vol. 1, Issues 6-8 which contain the Studio
      II conversion project that was discussed not so long ago. Followed by
      the rest of the issues and the index to Volume 1.

      But I don't want to waste a lot of time scanning in stuff that is not
      in a format that will make it easier for other people to squash it down
      to more reasonable file sizes (assuming anyone with the right tools is
      interested in doing so).

      Then there is the issue of printing errors and using subsequent
      published errata in later issues to make corrections to the issues
      that have problems so that the final results have accurate code
      listings without having to jump around to the various issues to get a
      working program.

      I'm thinking that putting in corrections is somewhat like
      re-publishing the issue, but if proper attribution is given in the new
      document version and the changes are duly noted, it would be fine (for
      cosmacelf club group use).

      Comments invited.

      Tony.
    • Bill Rowe
      Message 2 of 5 , Jun 17, 2005
        I would have to say this was beautifully done and the content is very
        interesting. Perhaps someone is able to do a better pdf conversion but the
        zipped files are not all that big.
      • sbirdasn
        Message 3 of 5 , Jun 18, 2005
          --- In cosmacelf@yahoogroups.com, "Bill Rowe" <bill_rowe@r...> wrote:
          > I would have to say this was beautifully done and the content is very
          > interesting. Perhaps someone is able to do a better pdf conversion but
          > the zipped files are not all that big.

          Thanks for the compliment. I put in a lot of effort to clean up the
          scans to get maximum clarity considering the condition of the
          originals. I had to experiment with various scanner controls and tried
          various tricks to get the scans to a level that would make other
          people's job easier in converting accurately.

          That being said, I find that grayscale scans often work better
          in OCR tools. I sacrificed file size for resolution, so the images
          I've posted may not OCR as well as one might hope.

          Size is a relative thing. Once all ten issues are scanned at the
          present resolution of 600 DPI, the storage is quite substantial. The
          cosmacelf group storage budget would be busted on just VIPER Vol. 1
          alone, I think.

          I'm not averse to hosting the files permanently, and being linked to
          by the various COSMAC web sites. But it's nice to have things strongly
          related in close proximity to each other for ease of access. Squashing
          the content down for permanent archive storage would be a kindness for
          whoever becomes the final repository for the files. That, and
          downloading all ten issues will be painful for those on dialup if the
          present format is retained.

          Tony.
        • Bill Rowe
          Message 4 of 5 , Jun 19, 2005
            Ok, as I say, the content is great and I bet the printed parts would OCR
            well but I'm still not convinced it's a good idea. An OCR->PDF exercise
            would probably be even more of a labour of love than your initial scanning.
            Your 600 DPI scans PDF'd to 2.9M and zipped to 2.5. If you degraded them
            to 300DPI (probably still lovely for anything other than OCR) you might be
            down in the 750K range. My cat sneezes bigger than that.
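
            A rough sketch of the estimate above, assuming file size tracks
            pixel count (compressed scans will not follow this exactly). The
            function name and the choice of 2900K for "2.9M" are illustrative
            assumptions, not anything taken from the posted files.

```python
def scaled_size_kb(size_kb: float, old_dpi: int, new_dpi: int) -> float:
    """Estimate file size after resampling.

    Assumption: size is proportional to the number of pixels, which
    scales with the square of the DPI ratio. Real compressed PDFs or
    TIFFs will deviate from this.
    """
    return size_kb * (new_dpi / old_dpi) ** 2


# 2.9M at 600 DPI, taken here as 2900K for the sake of the estimate.
pdf_600_kb = 2900

# Halving the DPI quarters the pixel count.
estimate_300_kb = scaled_size_kb(pdf_600_kb, 600, 300)
print(estimate_300_kb)  # 725.0
```

            That lands close to the 750K ballpark quoted above.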

            I did try my OCR program (OmniPage 9.0) but it coughed on the TIFFs. I'll
            see if there's a later version.

          • sbirdasn
            Message 5 of 5 , Jun 19, 2005
              --- In cosmacelf@yahoogroups.com, "Bill Rowe" <bill_rowe@r...> wrote:
              > Ok, as I say, the content is great and I bet the printed parts would
              > OCR well but I'm still not convinced it's a good idea. An OCR->PDF
              > exercise would probably be even more of a labour of love than your
              > initial scanning.

              Agreed, I made mention from the start that a true PDF document will be
              a labor of love and quite time intensive. However, the result is very
              small files if the fonts are native and graphics are only used
              judiciously. As it is now, the whole document is just one giant
              (somewhat) compressed graphic.

              > Your 600 DPI scans PDF'd to 2.9M and zipped to 2.5. If you degraded
              > them to 300DPI (probably still lovely for anything other than OCR)
              > you might be down in the 750K range. My cat sneezes bigger than that.

              Actually, the 600 DPI choice was for several reasons, some entirely
              selfish--

              First, I wanted to have my own electronic archive that is as close to
              the originals in quality as possible with reasonable (to me) local
              storage requirements, so that I can print them out and use them instead
              of wearing out the originals, or replace them in case of damage
              to/loss thereof. 600 DPI is also my laser printer's native resolution,
              thus I get a close approximation to a high-end photocopy result.

              Also, 300 DPI grayscale is double the bytes/inch^2 of 600 DPI B&W, so
              you have to drop down to 150 DPI just to break even in the storage
              requirements, and that sounds like a significant loss of quality for
              what I'm trying to accomplish.
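
              The storage comparison above can be checked with a short sketch.
              It assumes raw, uncompressed pixel data, 1 bit per pixel for B&W
              and 8 bits per pixel for grayscale (the bit depth is an assumption;
              the post does not state it). The function names are illustrative.

```python
import math


def bytes_per_sq_inch(dpi: int, bits_per_pixel: int) -> float:
    """Raw (uncompressed) storage for one square inch of scan."""
    return dpi * dpi * bits_per_pixel / 8


def break_even_dpi(target_bytes: float, bits_per_pixel: int) -> float:
    """DPI at which a scan of the given bit depth matches target_bytes per square inch."""
    return math.sqrt(target_bytes * 8 / bits_per_pixel)


bw_600 = bytes_per_sq_inch(600, 1)    # 1-bit B&W
gray_300 = bytes_per_sq_inch(300, 8)  # 8-bit grayscale (assumed depth)
gray_150 = bytes_per_sq_inch(150, 8)

print(bw_600, gray_300, gray_150)  # 45000.0 90000.0 22500.0
print(round(break_even_dpi(bw_600, 8), 1))  # 212.1
```

              Under these assumptions 300 DPI grayscale is indeed double 600 DPI
              B&W. The exact break-even falls near 212 DPI, so 150 DPI is the
              next common scanner setting below it and comes in at half the raw
              size. Compressed file sizes will differ.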

              I resisted the urge to do 1200 DPI grayscale (!) with the files
              bulging the sides of my hard disk subsystem. That's the native (square
              pixel) scanner resolution and doesn't end up using interpolation
              tricks that really don't do much for you in the final analysis when
              quality is the main goal.

              Putting my versions online was essentially a freebie, and required no
              more work on my part than I was already willing to invest for my own
              use.

              Second, I wanted to demonstrate the quality/condition possible from
              the originals for those who may want to "do the labor of love" and
              make a true PDF version.

              Third, I'm not sure which results in a better copy: scanning at higher
              resolution and then converting to lower resolution, or simply scanning
              at the final (lower) resolution and using it directly.

              I'm not a graphic design artist by trade, so I'm not claiming any real
              knowledge of what I'm doing here, or the best way or tools to do it.

              300 DPI gets a bit ratty on some of the hand drawn line art and some
              of the worst pages that suffered from poor originals as they came from
              the print shop. The cleanup gets quite extensive as the resolution
              goes down to make it look nice.

              > I did try my OCR program (OmniPage 9.0) but it coughed on the TIFFs.
              > I'll see if there's a later version.

              Don't bother with OmniPage 9, as I already have V11, and it is what
              was used to make the present PDFs. It will undoubtedly do better in
              OCR mode than V9. I think that V9 came with my scanner, and I bought
              the full retail V11 later. I found that V11 was noticeably more
              accurate at OCR.

              If you're using OmniPage, then try the .max (ScanSoft products file
              format) stacks instead of TIFF. I chose TIFF instead of JPEG because
              it is not lossy, and I wanted to show the quality of the scans without
              risking quality issues. I didn't know if anyone else had ScanSoft
              products so I put up images in what I thought would be a fairly
              mainstream file format for use by high-end tools that I don't have.

              And one more thing about OmniPage- I have found that it actually does
              better when working with 150 DPI grayscale than higher resolution B&W
              images for OCR accuracy. And I have not gone to the trouble of doing
              those scans yet. But I don't know if that would be a factor for OCR
              tools others might have available.

              It will be interesting to see if others with access to more powerful
              tools and more skills in such things will weigh in on what
              format/resolution is best for the group.

              Tony.