Double Key Entry: Accuracy and costs?

  • Jon Noring
    Message 1 of 34, Dec 7, 2007
      Everyone,

      There are essentially two methods to digitize ink-on-paper texts:
      manual key entry, and scanning/OCR.

      David Sewell, in a prior message to this group:

      http://groups.yahoo.com/group/digital-text/message/17

      noted that double key entry, combined with software "diffing" and
      software+human reconciliation of differences, is still "the gold
      standard for capturing text from hardcopy" with regard to accuracy.

      For those not familiar with "double key entry", the text is literally
      retyped (or rekeyed) into a text editor, character-by-character, by
      two different people. The resulting two digital texts are digitally
      compared ("diffed"), and any differences are reconciled by both
      software and humans.
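
As a rough illustration of the "diffing" step, here is a minimal sketch using Python's standard difflib module. The function name diff_keyings and the sample strings are made up for this example, and comparing word by word is an assumption; a real workflow might compare character by character or line by line instead.

```python
import difflib


def diff_keyings(first: str, second: str):
    """Return word-level disagreements between two independent keyings.

    Each item is (word index in the first keying, words from the first
    keying, words from the second keying). Hypothetical helper, not part
    of any existing tool.
    """
    a, b = first.split(), second.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    disagreements = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            disagreements.append((i1, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return disagreements


# Invented sample data: each typist makes one different mistake.
first = "It was teh best of times, it was the worst of times"
second = "It was the best of times, it was the worst of tirnes"

for position, from_first, from_second in diff_keyings(first, second):
    print(position, repr(from_first), repr(from_second))
```

Only the places where the two keyings disagree are reported, so a human reconciler looks at those spots instead of rereading the whole text. An error that both typists make identically never shows up in the diff.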

      Obviously double key entry will lead to very high accuracy when done
      right. It can also better handle unusual situations that strain the
      capability of OCR engines.

      I have a few questions to ask of everyone. Feel free to contribute by
      answering any of the following, or to answer other important questions
      I do not ask below:

      1) What is the typical accuracy of double key entry as explained
      above?

      (Does "triple key entry" offer any benefits when very high accuracy
      text is desired, or is that unnecessary?)

      2) Of the errors that remain after double key entry, what are they
      typically?

      Lee Passey suggested to me that they are often the common
      transpositions of letters that are the bane of most typists. For
      example, "teh" instead of "the".

      3) If someone hired a commercial company to do double key entry, what
      is the typical going rate? (Let's only consider the rekeying stage,
      not the digital comparison/processing stage.)

      I would presume a lot of rekeying is done in countries with low
      labor rates, such as India.

      4) Does anyone see a role, as David Sewell suggested in his message
      (see the above link), of adding OCR to the mix so as to reduce
      costs for a given level of accuracy?

      For example, let's consider scanning the text pages, having a single
      key entry done, producing a high-grade OCR version (maybe itself
      produced by a combination of OCR engines, as David mentioned), and
      then "diffing" those two.


      Thanks!

      Jon Noring
    • Ray Saintonge
      Message 34 of 34 , Dec 17, 2007
        Jon Jermey wrote:
        > Accuracy of proofreading can be improved at a minimal cost by an
        > intelligent reading of the text. For instance, if 'Mr Comerford' is
        > mentioned on page 1 and 'Mr Cornerford' on pages 2 to 10, an intelligent
        > English-speaking proofreader can use their knowledge of English
        > orthography to decide that 'Comerford' is correct and 'Cornerford' is
        > incorrect. If they are properly equipped with software they can then do
        > a global change from 'Cornerford' to 'Comerford' and correct what may be
        > dozens of errors without even needing to see them. Other global changes
        > -- 'tlie' to 'the', for instance -- are even more obvious. In fact I
        > have set up a Word macro which globally corrects about thirty common
        > errors of this kind, and I run this whenever I start proofing a new text.
        >
        The 'Comerford' example is very specific to a particular text. A person
        unfamiliar with that text would be at a loss to know which is correct.
        But for the 'Mr' I would have guessed it to be the name of a small
        village. While changing 'tlie' to 'the' may be the most common
        correction, who's to say that the error was not from transposing the
        letters in 'tile'? Failing to see these situations does not exactly
        produce an intelligent reading. The search can be automated, but each
        instance should be given a reality check.
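
A minimal sketch of that idea, automating the search but leaving each decision to a person, might look like the following. The function names find_instances and apply_decisions and the sample sentence are hypothetical, invented for this example; this is not the Word macro mentioned above or any DP tool.

```python
import re


def find_instances(text, suspect, width=20):
    """Yield (offset, surrounding context) for each whole-word hit."""
    pattern = re.compile(r"\b" + re.escape(suspect) + r"\b")
    for match in pattern.finditer(text):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        yield match.start(), text[start:end]


def apply_decisions(text, suspect, decisions):
    """Replace only the hits a reviewer ruled on.

    decisions maps a hit's offset to its chosen replacement; hits
    without an entry are left exactly as they were.
    """
    pattern = re.compile(r"\b" + re.escape(suspect) + r"\b")

    def choose(match):
        return decisions.get(match.start(), match.group(0))

    return pattern.sub(choose, text)


# Invented sample: the same misreading stands for two different words.
text = "He laid tlie floor while tlie others watched."
hits = list(find_instances(text, "tlie"))
for offset, context in hits:
    print(offset, repr(context))

# A reviewer reads each context and decides case by case.
decisions = {hits[0][0]: "tile", hits[1][0]: "the"}
print(apply_decisions(text, "tlie", decisions))
```

The search is fully automatic, but nothing changes until someone has looked at the context of each hit, which is the reality check argued for above.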
        > This is one of my objections to DP - that the same error needs to be
        > manually corrected each time, no matter how obvious the alteration is.
        That strikes me as a lesser evil. It weighs greater accuracy against
        the acceleration of an otherwise tedious task.

        Ray