
scanning format

  • Adrien Verlee
    Message 1 of 8 , Mar 14, 2011
      The best scanning format of the future (for OCR processing)? Does anyone
      have an opinion?
      I want a format that I will still be able to work with more than
      10 years from now.
      TIFF, JPG, PDF (image-only PDF)? Other formats?

      Thanks for any idea!
      --
      Adrien
    • john041650
      Message 2 of 8 , Mar 14, 2011
        TIFF, it's a well established lossless standard that's compatible with pretty much every graphics program.

        john


      • Axel Berger
        Message 3 of 8 , Mar 14, 2011
          Adrien Verlee wrote:
          > The best scanning format of the future (for OCR processing)?

          PNG is the best supported lossless(!) graphics format and the one I
          always scan to and sometimes convert later. For pure black and white,
          or if reduction to 256 colours is possible, the files are also smaller
          than high-quality JPEG. If you need full colour depth and file size is
          an issue (isn't it always?), then JPEG at a quality of 95 to 100 % (as
          used in IrfanView; these values are not standardized across programs)
          yields negligible loss with significantly smaller files.

          Axel
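
To make the "lossless" point concrete, here is a minimal sketch in Python (standard library only) that writes an 8-bit grayscale PNG by hand and reads it back, checking that every pixel survives unchanged. The function names and the synthetic test page are assumptions made for illustration; the decoder only understands files produced by this encoder (filter type 0, no interlacing) and is not a general PNG reader.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind, data):
    """Build one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def encode_gray_png(width, height, pixels):
    """Encode row-major 8-bit grayscale pixels as a PNG byte string."""
    if len(pixels) != width * height:
        raise ValueError("pixel buffer does not match width * height")
    # Every scanline is prefixed with filter type 0 ("None").
    raw = b"".join(
        b"\x00" + pixels[y * width:(y + 1) * width] for y in range(height)
    )
    # IHDR: width, height, bit depth 8, colour type 0 (grayscale),
    # compression 0, filter 0, interlace 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(raw, 9))
        + _chunk(b"IEND", b"")
    )


def decode_gray_png(png):
    """Decode a PNG written by encode_gray_png (hypothetical helper)."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos, idat, width, height = 8, b"", 0, 0
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        kind = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
        if kind == b"IHDR":
            width, height = struct.unpack(">II", data[:8])
        elif kind == b"IDAT":
            idat += data
    raw = zlib.decompress(idat)
    stride = width + 1  # one filter byte per scanline
    pixels = b"".join(
        raw[y * stride + 1:(y + 1) * stride] for y in range(height)
    )
    return width, height, pixels


def synthetic_page(width, height):
    """Made-up 'scan': white page with regular bands of dark strokes."""
    rows = []
    for y in range(height):
        if y % 20 < 8:
            rows.append(bytes(0 if (x // 6) % 3 == 0 else 255 for x in range(width)))
        else:
            rows.append(b"\xff" * width)
    return b"".join(rows)


if __name__ == "__main__":
    page = synthetic_page(200, 100)
    png = encode_gray_png(200, 100, page)
    _, _, restored = decode_gray_png(png)
    print("raw bytes:", len(page), "PNG bytes:", len(png))
    print("identical after round trip:", restored == page)
```

The synthetic page is far more regular than a real scan, so the size it reports says nothing about real-world compression; the point is only that the round trip is bit-exact, which a JPEG round trip generally is not.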
        • Axel Berger
          Message 4 of 8 , Mar 14, 2011
            john041650 wrote:
            > TIFF, it's a well established lossless standard

            Agreed, but as commonly written it is totally uncompressed, which
            means huge. You can ZIP TIFFs to about 2 %, i.e. fifty times smaller,
            but I prefer a lossless and compressed graphics format like PNG.

            N.B. Adrien: PDF is not a raster graphics format but only a wrapper
            around one. Most PDF generators will internally use poor, low-quality
            JPEG, so that one is decidedly out.

            Axel
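
The size argument above can be sketched with a little arithmetic. The Python below (standard library only) computes the uncompressed raster size of a page at a given resolution and bit depth, then compresses a synthetic 1-bit page with zlib, the same DEFLATE family of compression that ZIP and PNG use. The page dimensions, the function names, and the synthetic page pattern are assumptions for illustration; a real scan contains noise and will compress less well than this artificial one.

```python
import zlib


def raw_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed raster size of a page scanned at the given resolution."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    # Rows are padded to a whole number of bytes.
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


def synthetic_bilevel_page(width_px, height_px):
    """Made-up 1-bit page: white background with regular dark 'text' bands."""
    row_bytes = (width_px + 7) // 8
    blank = b"\xff" * row_bytes
    text = (b"\xff\x0f\xf0\x00" * (row_bytes // 4 + 1))[:row_bytes]
    return b"".join(text if y % 40 < 12 else blank for y in range(height_px))


def compression_ratio(data):
    """Compressed size divided by original size, after a lossless check."""
    packed = zlib.compress(data, 9)
    assert zlib.decompress(packed) == data  # nothing is lost
    return len(packed) / len(data)


if __name__ == "__main__":
    # Assumed page: A4 (8.27 x 11.69 inches) at 300 dpi.
    for label, bits in (("1-bit B/W", 1), ("8-bit gray", 8), ("24-bit colour", 24)):
        size = raw_size_bytes(8.27, 11.69, 300, bits)
        print(f"{label}: {size / 1e6:.1f} MB uncompressed")
    page = synthetic_bilevel_page(2481, 3507)
    print(f"synthetic page compresses to {compression_ratio(page):.2%} of raw size")
```

Whether a given scan really shrinks to the 2 % quoted above depends on its content; clean black-and-white text tends to compress far better than grayscale or colour photographs.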
          • Adrien Verlee
            Message 5 of 8 , Mar 17, 2011
              On 15/03/2011 3:39, Axel Berger wrote:
              > PNG is the best supported lossless(!) graphics format and the one I
              > always scan to and sometimes convert later. For pure black and white and

              In fact, the problem is which formats OCR software will still be
              able to import a decade from now. Crystal ball!
              --
              adrien
            • Dave
              Message 6 of 8 , Mar 17, 2011
                Hi,
                If you check most documents you have OCR'ed, they will be TIF; the images are
                usually saved the way the owner wants them, in B/W or colour.
                Thank you, Dave-211

              • Axel Berger
                Message 7 of 8 , Mar 17, 2011
                  Adrien Verlee wrote:
                  > Crystal ball!

                  Anything that is in massive use now won't go away any time soon,
                  as too many customers would be miffed. Look at GIF or IE6. I'd agree
                  that JPEG will possibly always be better supported than PNG, but the
                  difference is not enough to forgo the advantages of the latter.

                  Axel
                • Axel Berger
                  Message 8 of 8 , Mar 17, 2011
                    Dave wrote:
                    > If you check most documents you have OCR'ed they will be TIF,
                    > the images are usually the way the owner want them saved in
                    > B/W or COLOUR.

                    Dave, I'm not sure what you are saying here, but I guess you're talking
                    about graphics extracted from PDFs, where the OCRed text is behind the
                    page image. If so, I disagree. It is true that the extraction (I use
                    pdfimages.exe from the Xpdf package) yields uncompressed TIFFs. The
                    reason seems to be simply that the act of reading (and displaying)
                    already includes the decompression, and the extractor makes do without
                    the extra complication of a compression step (rightly so: one task, one
                    tool).

                    But in all the PDFs I generate myself I notice that there is no
                    recoding; all my graphic material is included as is. Whatever I
                    include (JPEG in all kinds of quality; PNG in B/W, gray, full colour,
                    256 colours, or 16 colours, which is great for diagrams and extremely
                    small but makes the PDF exceedingly slow, so I don't use it), the size
                    of the PDF follows exactly the sum of the sizes of the embedded graphics.

                    So my guess is you are a victim of a misconception here, but I may have
                    misunderstood or be wrong.

                    Axel