
Scanning a LOT of 3x5 File cards

  • RAILDATA
    Message 1 of 7 , May 26 6:03 PM
      Hi All,

      I am a railroad historian & researcher. I mostly scan railroad issued paper items including diagram books, Employee Timetables etc. I have scanned tens of thousands of pages and have the equipment, software & techniques down pretty well for my needs.

      I have just been given the opportunity to scan the database of one of the premier steam locomotive historians. The database of his day (1950s on) was 3x5 file cards. They are handwritten with data on both sides. The handwriting pretty well rules out OCR, which is not a big concern.

      The collection size is what is raising questions for me. I estimate it to be in excess of 350,000 cards. Has anyone had experience with this or thoughts they would share? I have found card scanners, but most don't have the durability to live through such a task. I am on a limited budget and need to stay below $400 if at all possible. I have found several in the >$1000 range but they are out of the question. One at a time on a flatbed is also not feasible as I wouldn't live long enough to finish it.

      Any suggestions at all would be most appreciated.

      Allen Stanley
      Greer, SC
    • Jean Pec
      Message 2 of 7 , May 27 5:32 AM
        This may be way too elementary, but have you considered laying the cards out in the style that the NYPL and the Mansell National Union Catalog use? At this moment I'm not sure whether the pages are 4 card columns wide and 6 to 8 cards long or some other "dimension," however, the cards displayed on the pages are easy to follow. Then you could do the versos in the same style. While I'm not sure how large your flatbed is, you could work out a pattern even if you had to rotate the display in your final product. Of course, if you scan the front and then the verso of each card, information is kept together.
        In any event, your project sounds both interesting and time consuming.
        Good luck from a boilermaker's daughter, Jean

        Jean A. Pec, Head of Preservation,
        Gelman Library, George Washington University
        2130 H. Street, N. W., Washington, D.C. 20052
        202-994-8886 fax 202-994-6464

        ----- Original Message -----
        From: RAILDATA <raildata@...>
        Date: Wednesday, May 26, 2010 10:49 pm
        Subject: [digital-text] Scanning a LOT of 3x5 File cards
        To: digital-text@yahoogroups.com


      • Lars Aronsson
        Message 3 of 7 , May 27 6:19 AM

          First, get out of the "limited budget" thinking. This sounds
          like an interesting project, and you should be able to
          get some more money somewhere, somehow. You need to
          invest enough of your own time in this, anyway, that
          maybe you should get a group of people helping you.

          Scanning cards is no different from scanning pages of
          books that have lost their spine. Many desktop document
          scanners are able to scan business cards, such as
          the Fujitsu ScanSnap S1500 or Canon DR-2050 models.
          These are in the $300-600 range, and if one gives up
          when you are halfway through, you only need to get another one.

          For books, which often have fewer than 800 pages, it is
          useful to make one PDF or DjVu document for the
          whole book, containing images and maybe OCR text.
          With the right scanner and compression settings, these
          files stay well under 100 megabytes each.

          For 350,000 cards, each maybe having two sides, all
          700,000 images could not be stored in one file, so you
          need to consider if you should produce 700,000 files,
          one for each image, or if you can find some other
          logical grouping. If there's no other way to group
          them, maybe one day's worth of scanning is a group.
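
One way to picture the grouping question is a file-naming scheme that keeps both sides of a card together and preserves physical order. The following is only a minimal sketch: the batch label (here a scanning date), the `.tif` extension, the zero-padding width, and the names `ScanName` and `names_for_batch` are all hypothetical choices, not something proposed in the thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScanName:
    batch: str  # hypothetical batch label, e.g. one day's worth of scanning
    card: int   # 1-based position of the card within the batch
    side: str   # "a" = front, "b" = back

    def filename(self) -> str:
        # Zero-padding keeps an alphabetical sort identical to card order.
        return f"{self.batch}_{self.card:05d}{self.side}.tif"


def names_for_batch(batch: str, card_count: int) -> List[str]:
    """Return file names for both sides of every card, in physical order."""
    names = []
    for card in range(1, card_count + 1):
        for side in ("a", "b"):
            names.append(ScanName(batch, card, side).filename())
    return names


if __name__ == "__main__":
    for name in names_for_batch("2010-05-27", 3):
        print(name)
```

With names built this way, a plain alphabetical sort of a batch folder reproduces the order of the cards, front before back.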




          --
          Lars Aronsson (lars@...)
          Aronsson Datateknik - http://aronsson.se

          Project Runeberg - free Nordic literature - http://runeberg.org/
        • Terry Smythe
          Message 4 of 7 , May 27 8:26 AM

            Allen,

            Sounds like you and I have followed similar paths. I too am an amateur historian with a special interest in automatic musical instruments of the 1885 to 1935 era, focusing on such things as player pianos, music boxes, nickelodeons, orchestrions, fairground organs, etc. Over the past 5-6 years, I have scanned some 30,000 pages of original literature of that era, such as owner's manuals, promotions, technical manuals, catalogs and trade journals. All have been done the hard way, with a flatbed scanner over mind-numbing hours and hours, ad nauseam. To get a feeling for what I've been up to, see:

            http://members.shaw.ca/paud122/docs.htm

            I'm now faced with a new opportunity to scan about 150,000 pages of a weekly trade journal of that era. All are 11" x 15", all in bound-book form, all visually in good condition. The owner is most reluctant to dis-bind the collection for any kind of automatic scanning.

            As a retired civil servant doing this as a committed volunteer, I too am on a tight budget. I believe there are many of us out there, but it is hard to link up with one another.

            After doing this for about 6 years, I now have 4 flatbed scanners, each useful for a specific purpose. So far, I have not encountered a need for a scanner such as you describe. However, you might consider using a digital camera: faster than a flatbed scanner, slower than a scanner with an automatic document feeder. A new device has just come to my attention that might be useful. See:

            http://tinyurl.com/2wkwfjx

            With a device of this nature, you cannot avoid massive document handling to capture both sides of your index cards, but at the very least it is dramatically faster than conventional flatbed scanners. I am considering such a device for what I have in mind for an equally massive digital archiving initiative.

            My objective is to make it possible for researchers to freely access this data, using the magic and power of the internet. My current website has attracted numerous researchers, ordinarily frustrated by institutions holding such data, but imposing impossible access restrictions, and simultaneously refusing to digitize it for easier access.

            There is another rather convoluted approach you might consider. My working life included the design of record-keeping systems involving the use of microfilm. In those days, back in the '50s, there were microfilm cameras capable of reading both sides of documents at high speed. I suspect banks have contemporary equipment today to scan and capture both sides of banknotes for special applications, such as ransom demands.

            The resulting microfilm is not ordinarily easily accessible, and it is likely not very good quality images. However, many libraries are now equipped with microfilm camera/scanners capable of scanning microfilm reels 16mm and 35mm, or microfiche, and converting the image to a graphic format of your choice. These devices are equipped with a USB port making it possible to do a conversion at little or no cost.

            A good example of this is the production records of the Rudolph Wurlitzer Co in North Tonawanda, NY. These records were converted to 35mm roll microfilm 'levety-7 years ago. A sample reel is on its way to me to experiment with this process, using the microfilm camera/scanner/computer in my local main library. Again, the objective is to get these records into an easily accessible format, most likely PDF, freely available on the 'net for researchers.

            The designer of the digital document scanner I referenced above also makes a version of his device aimed at capturing very good quality images off microfilm or microfiche. He has sent me some image samples which are quite impressive. That assumes of course that the film image is good to begin with. These devices can be seen on his web site at:

            http://www.docsondisc.com.au/

            Your thoughts?

            Regards,

            Terry Smythe


            --
            Terry Smythe (204) 832-3982
            55 Rowand Avenue (204) 981-3229 (cell)
            Winnipeg, MB smythe@...
            Canada R3J2N6 www.amica.org
          • Nick Hodson
            Message 5 of 7 , May 27 9:59 AM
              Regarding the scanning of 350,000 3x5 cards:

              I have been running a Visioneer Strobe XP 300 for several years now. It reads both sides of the paper simultaneously. It would take less than 10 seconds to read each card. I bought it not only to read books that have come to pieces, which allows me to work with poor quality, thus cheaper, originals, but also to work through a huge volume of family correspondence going back over 100 years. It does both jobs extremely well. Just occasionally the read-head needs to be cleaned with a special paper that looks like a coarse stiff sheet of blotting paper, moistened with rubbing alcohol. But even so the size of the task is a bit daunting. If you fed cards in for 15 hours a day, six days a week, it would still take you 11 weeks. The device is not very expensive at $350, and possibly you could find a second-hand one, or borrow one.
              http://www.visioneer.com/products/item.asp?PN=90-0518-200
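
The 11-week figure can be checked with a few lines of arithmetic. This is a minimal sketch, assuming a flat 10 seconds per card with no allowance for jams, cleaning, or breaks; the function name `weeks_to_scan` is made up for illustration.

```python
def weeks_to_scan(cards, seconds_per_card, hours_per_day, days_per_week):
    """Weeks needed to feed every card once at a constant rate."""
    total_hours = cards * seconds_per_card / 3600
    return total_hours / (hours_per_day * days_per_week)


if __name__ == "__main__":
    # 350,000 cards at 10 s each, 15 hours a day, six days a week.
    weeks = weeks_to_scan(350_000, 10, 15, 6)
    print(f"{weeks:.1f} weeks")  # prints "10.8 weeks"
```

Under the same 10-second assumption, a more modest schedule of 3 hours a day, five days a week comes to roughly 65 weeks.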

              With letters or cards like these the best thing is to store them as bundles of DjVu images, but how to access them is the problem. You would have this problem ahead of you however you scanned the cards, and in whatever format you stored them. And to be honest, the work you'd have to do after the 11 weeks of scanning would take far longer than 11 weeks.

              Kind regards, Nick Hodson

            • RAILDATA
              Message 6 of 7 , May 27 8:07 PM
                Hi Lars, Jean, Terry & All,

                Thank you for the kind suggestions. Flatbed scanning is not really an option for me. My scanners only have an 8.5x11" glass. Handling the cards twice would not only be too burdensome but would also introduce the chance of mixing up the order when I flip them, which I can't afford. I only want to handle them once.

                I accidentally stumbled on an e-mail in one of the searches I was doing that got me into a group of messages from people scanning baseball cards with the exact same problems, but on a much smaller scale. They suggested the same scanners as Lars and several others. I had budgeted up to $350 for a scanner and that was stretching it, as I am also having to pay for having the collection shipped to me. Living on Social Security and a part-time job does not leave a lot of discretionary funds. A few friends are pitching in to help too, and I think I can swing $450-500, which now brings the scanners mentioned into the realm of reality. I also found a lot of scanners designed to scan insurance cards, business cards etc. They are all hand fed but the speed is very fast, up to 600/hour. They are not durable enough, but at $100-150 they could be considered throwaways, and I could just use several of them as needed. They also capture both sides in one pass.

                I will probably scan at 300dpi. OCR is probably not possible and not desired, as the cards are all handwritten and the results would have to be verified, and again time is a factor. The only use for the scans is to compare against records in a database already in place. Since the goal is to get the database records as accurate as possible, the introduction of OCR misreads, even if OCR could read the handwriting, is unacceptable. I am only going to the trouble of scanning them so that the job can be spread out to involve others around the world. The cards themselves will be donated to a museum and archives at Rocky Mount, NC and it is imperative they be kept in exact order. They will be shipped out almost as fast as I can scan them as I don't have the room to keep them.
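
For a sense of scale at 300dpi, the raw pixel data can be estimated directly. This is a rough sketch that assumes exactly 3x5 inch cards, both sides of all 350,000 cards, and uncompressed storage; real files would be smaller once compressed, by an amount that depends on the format and the cards themselves. The function name `uncompressed_bytes` is hypothetical.

```python
def uncompressed_bytes(width_in, height_in, dpi, bytes_per_pixel):
    """Raw pixel data for one scanned side, before any compression."""
    return round(width_in * dpi) * round(height_in * dpi) * bytes_per_pixel


if __name__ == "__main__":
    images = 350_000 * 2  # both sides of every card

    gray = uncompressed_bytes(5, 3, 300, 1)   # 8-bit grayscale
    color = uncompressed_bytes(5, 3, 300, 3)  # 24-bit color

    print(gray, color)                      # 1350000 4050000
    print(images * gray // 10**9, "GB")     # 945 GB
    print(images * color // 10**9, "GB")    # 2835 GB
```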

                The cards will be in many groups to begin with. First are the cards of the builders' records. Each indicates data such as date built or shipped, customer, dimensional information, and type of fuel (wood, coal, oil etc). There are over 100 different builders involved, ranging from very large to those building only 1 or 2 locos. This generates the first owner card, for a particular railroad. Each resale generates another card with all the loco's history on that particular line. These cards were then grouped by owner so a roster for that road was created. Some roads were small, with 1-2 locos, and others owned thousands.
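
The card structure described above can be pictured as one record per locomotive per owner, grouped by owner to form a roster. This is only an illustrative sketch: the field names, the `OwnerCard` class, and the sample values ("Builder A", "Railroad X") are invented placeholders, not the layout of the actual cards or database.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class OwnerCard:
    builder: str         # hypothetical field names throughout
    builder_number: int  # the builder's own serial number for the loco
    owner: str           # the railroad this card belongs to
    sequence: int        # 1 = first owner, 2 = first resale, and so on


def rosters(cards: List[OwnerCard]) -> Dict[str, List[OwnerCard]]:
    """Group cards by owner, keeping the order in which they were filed."""
    by_owner = defaultdict(list)
    for card in cards:
        by_owner[card.owner].append(card)
    return dict(by_owner)


if __name__ == "__main__":
    # Made-up sample: one loco resold once, plus a second loco.
    cards = [
        OwnerCard("Builder A", 101, "Railroad X", 1),
        OwnerCard("Builder A", 101, "Railroad Y", 2),
        OwnerCard("Builder A", 102, "Railroad X", 1),
    ]
    for owner, roster in rosters(cards).items():
        print(owner, [c.builder_number for c in roster])
```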


                My situation is probably a bit different than most. Spending time hunting grants etc is out of the question as I have little time to begin with and wouldn't know where to start. This project came on me very quickly and I need to turn it around very quickly. I can scan a couple of thousand loose pages in an evening without breaking a sweat, but the cards require a different solution. Once the funds are together I think the hardware is available. I was hoping someone else had encountered this and had some quick, easy solution.

                Thanks for the suggestions, and if there are any other possibilities I am certainly open to them. With enough money one can do most anything ;~) By the way, this project will be in the public domain, but the cards probably not, as their need would be negated once the database errors have been corrected as best we can from this collection. We are working on the same kinds of info from several other collections, but they were typed or handwritten on letter-sized pages which were easily scanned. Can you imagine copying by hand the Baldwin Locomotive Co.'s sales records of over 60,000 locos and then typing them all out? This was done for every builder in N America starting in the very early 1940s when a few could see the handwriting on the wall for steam locos. These same folks also sat in railroad company offices doing the same thing with people thinking they were crazy. Almost every one of those original records is long gone.

                Our goal is to bring all the work those few dedicated souls did to the digital age. We realize our audience will be limited at best and continually shrinking, but if we don't do it no one else will, and if someone came along years down the road wanting to do it, where would or could they find the files? Just our bit to preserve a slice of American history. The steam locomotive was one of the single greatest contributors to the great country we have become.

                Thanks again for any and all suggestions.

                Allen Stanley
                Railroad Data Exchange
                Greer, SC
              • Eric Jacobs
                Message 7 of 7 , May 28 7:56 AM
                  You may want to consider a digital camera mounted in a copy stand with
                  lighting. Since resolution and linearity only need to be sufficient for
                  transcription OCR, this may work as well as a scanner, and you can load and
                  capture an image probably faster than loading and scanning with a flat bed.

                  Also, consider talking to Brewster Kahle at the Internet Archive
                  (www.archive.org), a non-profit. They rapidly scan books using automated
                  scanners (which use cameras, not flat beds) and also perform OCR. I believe
                  the bindings can be removed from the books (loose pages), so not unlike
                  loose 3x5 cards. They might be able to reconfigure a book scanner to
                  process your 3x5 cards and do the OCR quite affordably or possibly even free
                  if they can identify grant money for the project (if you have the patience
                  for a grant cycle or two).
                  Eric Jacobs

                  The Audio Archive, Inc.
                  tel: 408.221.2128
                  fax: 408.549.9867
                  mailto:EricJ@...
                  http://www.TheAudioArchive.com
                  Disc and Tape Audio Transfer Services and Preservation Consulting


