Re: [cosmacelf] Re:

  • Lee Hart
    Message 1 of 7 , Apr 30, 2010
      On 4/30/2010 8:47 PM, schultdw wrote:
      > I did take a shot at using the tools unpaper and gocr but the results were not encouraging.
      >
      > C_nlr_l Proc4sing Unil. The c_nlral procrssor for
      > i the CDPl8S6_8 Micraboard Computer is lh_ 8-bil
      > ; silicon-gale CMOS RCA COSMAC Microproctssor
      > I CDP l802. Thc C0P l802 has l6 gentral-purpost r_g-
      > isl_rs _ach l 6 bits wid_. Any onc oF lh_s_ regist_rs m8y
      > ! b_ dynamically desi_n8__d as lhe program counler
      > _h_r_by giving lhe _'yslem muIliple program stales.

      Drat; I see what you mean. This is about what I saw 10 years ago with
      the software available then. I was hoping it had improved.

      --
      Lee A. Hart | Ring the bells that still can ring
      814 8th Ave N | Forget the perfect offering
      Sartell MN 56377 | There is a crack in everything
      leeahart earthlink.net | That's how the light gets in -- Leonard Cohen
    • Richard
      Message 2 of 7 , May 1, 2010
        Years ago I worked for a company that produces software for archiving documents. Scanning and trying to extract the content with OCR was always the last choice. When you archive many thousands of documents each day, it is impossible to correct the errors manually, even if only one percent of the documents fail. Instead we parsed more generic formats, such as printer streams, whenever that was possible.


        Unfortunately that is not an option here. You may get better results if you try scans at different resolutions to get fewer OCR errors.

        --- In cosmacelf@yahoogroups.com, "schultdw" <david.schultz@...> wrote:
        >
        > -- In cosmacelf@yahoogroups.com, Lee Hart <leeahart@> wrote:
        > > Are there still no decent image-to-text converters? Most of these
        > > manuals are virtually all text.
        > >
        >
        > I did take a shot at using the tools unpaper and gocr but the results were not encouraging.
        >
        > example:
        >
        > C_nlr_l Proc4sing Unil. The c_nlral procrssor for
        > i the CDPl8S6_8 Micraboard Computer is lh_ 8-bil
        > ; silicon-gale CMOS RCA COSMAC Microproctssor
        > I CDP l802. Thc C0P l802 has l6 gentral-purpost r_g-
        > isl_rs _ach l 6 bits wid_. Any onc oF lh_s_ regist_rs m8y
        > ! b_ dynamically desi_n8__d as lhe program counler
        > _h_r_by giving lhe _'yslem muIliple program stales.
        >
        >
        >
        > Getting usable results would require a lot of cleanup and double checking. Given my estimate of the usefulness of the information and the size of the target audience, it simply isn't worth the effort.
        >
      • Richard
        Message 3 of 7 , May 1, 2010
          OCR is basically pattern recognition. The artificial intelligence stuff I ranted about a few days ago is often used for it. The problem is that a trained AI algorithm is usually acceptably good at recognizing characters only at a certain resolution, and it often handles certain fonts better than others. So experimenting with different OCR programs and with scans at different resolutions may improve the results. So far those algorithms are nowhere near the same league as the pattern recognition in our brains.
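
One way to compare several OCR runs without proofreading each one by hand is to score every output by the fraction of its words that appear in a word list, then keep the highest-scoring run. The following is only a minimal sketch of that idea, not something anyone in this thread actually ran. The names `KNOWN_WORDS`, `word_score` and `best_scan`, the labels `scan_a` and `scan_b`, and the tiny vocabulary are all hypothetical; `scan_a` reuses the first line of the gocr output quoted in this thread, while the `scan_b` text is a made-up clean line for illustration, not a real OCR result.

```python
import re

# Hypothetical mini word list; a real run would load a full dictionary file.
KNOWN_WORDS = {
    "central", "processing", "unit", "the", "processor", "for",
    "microboard", "computer", "is", "bit", "silicon", "gate",
}


def word_score(text, vocabulary=KNOWN_WORDS):
    """Return the fraction of alphabetic tokens that are in the vocabulary."""
    tokens = re.findall(r"[A-Za-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in vocabulary)
    return hits / len(tokens)


def best_scan(ocr_outputs):
    """Return the label of the OCR output with the highest word score."""
    return max(ocr_outputs, key=lambda label: word_score(ocr_outputs[label]))


# scan_a: first line of the gocr output quoted in the thread.
# scan_b: invented clean text, standing in for a better scan (assumption).
outputs = {
    "scan_a": "C_nlr_l Proc4sing Unil. The c_nlral procrssor for",
    "scan_b": "Central Processing Unit. The central processor for",
}

for label, text in outputs.items():
    print(label, round(word_score(text), 2))
print("best:", best_scan(outputs))
```

A score like this only ranks the runs against each other. It says nothing about part numbers such as "CDP1802", which no dictionary contains, so the best run would still need checking by eye.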

          --- In cosmacelf@yahoogroups.com, Lee Hart <leeahart@...> wrote:
          >
          > On 4/30/2010 8:47 PM, schultdw wrote:
          > > I did take a shot at using the tools unpaper and gocr but the results were not encouraging.
          > >
          > > C_nlr_l Proc4sing Unil. The c_nlral procrssor for
          > > i the CDPl8S6_8 Micraboard Computer is lh_ 8-bil
          > > ; silicon-gale CMOS RCA COSMAC Microproctssor
          > > I CDP l802. Thc C0P l802 has l6 gentral-purpost r_g-
          > > isl_rs _ach l 6 bits wid_. Any onc oF lh_s_ regist_rs m8y
          > > ! b_ dynamically desi_n8__d as lhe program counler
          > > _h_r_by giving lhe _'yslem muIliple program stales.
          >
          > Drat; I see what you mean. This is about what I saw 10 years ago with
          > the software available then. I was hoping it had improved.
          >
          > --
          > Lee A. Hart | Ring the bells that still can ring
          > 814 8th Ave N | Forget the perfect offering
          > Sartell MN 56377 | There is a crack in everything
          > leeahart earthlink.net | That's how the light gets in -- Leonard Cohen
          >
        • RickC
          Message 4 of 7 , May 1, 2010
            I'm not sure if you saw it, but there was news about spam-defeating software being used to decode everything from the Dead Sea Scrolls to handwriting. It shows people a two-word challenge: one word that is already known and one that still needs to be decoded/recognized. They would use something like
            [Toy boat]
            with 'Toy' known and 'boat' being a clip from a scanned document. If people got 'Toy' right, their guess at 'boat' was weighted more heavily and pooled with the guesses of others who also got 'Toy' right.
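
The pooling described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual algorithm used by any real service: the function names `pool_guesses` and `consensus`, the weights (1.0 for people who got the known word right, 0.0 for everyone else) and the `min_weight` threshold are all hypothetical, and the sample submissions are invented.

```python
from collections import defaultdict


def pool_guesses(submissions, control_answer,
                 trusted_weight=1.0, untrusted_weight=0.0):
    """Sum weighted votes for the unknown word.

    submissions is a list of (control_guess, unknown_guess) pairs.
    A guess counts with trusted_weight if the known (control) word was
    typed correctly, otherwise with untrusted_weight (assumed values).
    """
    votes = defaultdict(float)
    for control_guess, unknown_guess in submissions:
        correct = control_guess.strip().lower() == control_answer.lower()
        weight = trusted_weight if correct else untrusted_weight
        if weight > 0:
            votes[unknown_guess.strip().lower()] += weight
    return dict(votes)


def consensus(submissions, control_answer, min_weight=2.0):
    """Return the best-supported reading, or None if support is too thin."""
    votes = pool_guesses(submissions, control_answer)
    if not votes:
        return None
    word = max(votes, key=votes.get)
    return word if votes[word] >= min_weight else None


# Invented sample data: 'Toy' is the known word, the second word is the clip.
sample = [
    ("Toy", "boat"),
    ("toy", "boat"),
    ("Toy", "boot"),
    ("Tay", "bout"),
    ("Tay", "bout"),
    ("Tay", "bout"),
]

print(pool_guesses(sample, "Toy"))
print(consensus(sample, "Toy"))
```

In the sample, 'bout' is the most frequent raw guess, but every one of those submissions got the known word wrong, so it carries no weight in the pool.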

            It would be nice if the general public could submit scans like this to a general decoding database and get the results back. We'd probably be last on the list, behind historical documents, Dead Sea Scrolls, that kind of stuff.

            --- In cosmacelf@yahoogroups.com, "Richard" <r.dienstknecht@...> wrote:
            >
            > OCR basically is pattern recognition. The artificial intelligence stuff I ranted about a few days ago is often used for it. The problem is that a trained AI algorithm usually is acceptably good at recognizing characters at a certain resolution and also often prefers certain fonts over others. So experimenting with different OCRs and scans at different resolutions may improve the results. Up to now those algorithms by far do not play in the same league as the pattern recognition in our brains.
            <snip>