sorry for starting a new thread/subject, but i somehow don't find an
appropriate point to jump into the discussion again ;-)
First of all: The schematic files i referred to earlier are not
mine. I didn't scan them, i don't have the 'raw material'- i just
looked around for the bigger files and for what might be
convertible to smaller files. Why i converted to gif: see below.
Second, i did a Very Bad Thing (because i didn't ask): I copied the
images out of the cdp1877.pdf (originally to print 'em- can't
convince Acrobat reader to print without the large border), did a
little gamma correction and saved as b&w gifs. Now it's 220K in size
altogether- i can put a zip file in the files section if anybody
wants it/ wants to pdf-ify it again.
Of course, the quality is much lower now, and some words every now
and then are barely readable. Maybe Photoshop could do better
(the free IrfanView does a poor job, especially at color depth
reduction...), and i only had access to the pdf file, not the
original scan images, if that matters.
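
For anybody who wants to redo the conversion by script: a minimal sketch of the two steps (gamma correction, then reduction to b&w), on plain 8-bit gray values. The gamma of 2.0, the threshold of 128 and the sample pixel values are assumptions for illustration only- they are not the settings used on the cdp1877 images, and what IrfanView does internally may differ.

```python
def gamma_correct(pixels, gamma):
    """Apply gamma correction to 8-bit grayscale values (0..255)."""
    return [round(255 * (p / 255) ** gamma) for p in pixels]


def to_black_and_white(pixels, threshold=128):
    """Map each gray value to 0 (black) or 255 (white)."""
    return [255 if p >= threshold else 0 for p in pixels]


# One hypothetical scan line: dark ink, gray smudges, yellowed paper.
scan_line = [20, 60, 110, 140, 200, 230]

# With this formula, gamma > 1 darkens the midtones, so gray smudges
# around thin strokes fall on the black side of the threshold.
corrected = gamma_correct(scan_line, gamma=2.0)
bw = to_black_and_white(corrected)
print(bw)
```

Whether darkening or lightening helps depends on the scan; faint print on dark paper may need a gamma below 1 instead.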
Don't get me wrong: The guys who provided the 1877 and 1862 pdfs did
a good job- i have done the same before, and i know how difficult it
is to get something readable and small out of an aged book.
And now for the file formats:
ASCII (pure text)
One of the first world-wide, manufacturer-independent standards.
Ideal for anything text-only (at least if there are no international
characters or symbols) and probably readable forever. If file size
is an issue, it compresses well (and if a compression standard is
going to die, it can easily be taken out and compressed with
HTML
The ideal format if there are any graphical elements; careful use
of hyperlinks allows easy navigation (hey- that is what HTML was
meant for, in the first place!). I just think it's convenient to
click the index entries and *hop* to the paragraph :-)
If you don't use MS Word to generate it, an HTML file is only a little
larger than the respective text file. It is based on an independent
standard and compresses well... see above.
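
To make the "clickable index" and the size argument a bit more concrete, here is a rough sketch that wraps plain text sections in very simple HTML and compares the sizes. The function name `text_to_html`, the `s0`, `s1`... anchor ids and the sample text are made up for this example; zlib is used only as a stand-in for whatever compressor is at hand.

```python
import html
import zlib


def text_to_html(title, sections):
    """Build a minimal HTML page with a clickable index.

    sections is a list of (heading, body) pairs of plain text.
    """
    parts = ["<html><head><title>%s</title></head><body>" % html.escape(title)]
    parts.append("<ul>")
    for i, (heading, _) in enumerate(sections):
        # Index entry that jumps to the matching section anchor.
        parts.append('<li><a href="#s%d">%s</a></li>' % (i, html.escape(heading)))
    parts.append("</ul>")
    for i, (heading, body) in enumerate(sections):
        parts.append('<h2 id="s%d">%s</h2>' % (i, html.escape(heading)))
        parts.append("<p>%s</p>" % html.escape(body))
    parts.append("</body></html>")
    return "\n".join(parts)


# Hypothetical content standing in for a typed-in data sheet.
sections = [
    ("Pinout", "The interrupt line is active low. " * 20),
    ("Timing", "All times are given in nanoseconds. " * 20),
    ("Ratings", "Do not exceed the supply voltage. " * 20),
]
plain = "\n\n".join(heading + "\n" + body for heading, body in sections)
page = text_to_html("Hypothetical data sheet", sections)

for name, data in (("text", plain), ("html", page)):
    raw = data.encode("ascii")
    print(name, len(raw), "bytes,", len(zlib.compress(raw)), "compressed")
```

The HTML overhead is a fixed handful of tags per section, so the longer the text, the less it matters.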
One drawback: An HTML document that contains graphics ends up as a
bunch of files. For transport, these can be put together in a
zip file or something.
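
Putting the bunch of files together can be scripted as well. A small sketch, assuming the files are already in memory as bytes; the helper name `bundle`, the file names and the contents are placeholders, not real files from the files section.

```python
import io
import zipfile


def bundle(files):
    """Pack a {name: bytes} mapping into one zip archive, returned as bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


files = {
    "index.html": b'<html><body><img src="fig1.gif"></body></html>',
    "fig1.gif": b"GIF89a...",  # placeholder bytes, not a real image
}
archive_bytes = bundle(files)

# Reading it back shows that the page and its image travel together.
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
    print(archive.namelist())
```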
No, it is not independent- nor is it really free. But it is _very_
widely used, so the chance that future generations will be able
to convert it into whatever format is high. Also it is well
GIF files are most suitable for anything that has few colors and
wide areas- a.k.a. line art, and scans of text. (Just in case one
can't or doesn't want to OCR or re-type something... like large
tables in data sheets with heavy use of formula signs and
Also, GIF does lossless compression. See 'jpg' for why this is a Really
(Note: If you have an image that refuses to be readable in b&w gif,
usually 16-color or even 8-color is not distinguishable from a 32-
bit scan, but still gives relatively small files)
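
What "16-color" means for a gray scan, as a minimal sketch: every pixel is snapped to one of a few evenly spaced shades. The evenly spaced palette and the name `reduce_gray_levels` are assumptions for the example; real converters usually pick the palette from the image itself.

```python
def reduce_gray_levels(pixels, levels):
    """Snap 8-bit gray values (0..255) to `levels` evenly spaced shades."""
    step = 255 / (levels - 1)
    return [round(round(p / step) * step) for p in pixels]


# A hypothetical gradient covering every possible gray value.
scan = list(range(256))
reduced = reduce_gray_levels(scan, 16)
print(len(set(scan)), "shades before,", len(set(reduced)), "after")
```

Fewer distinct values means longer runs of identical pixels, which is what keeps the files small.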
JPG
*the* standard for anything picture. Color, size... everything
compresses quite well with jpg. But you lose detail- the smaller you
want the resulting file, the more detail is lost. I have seen large
JPGs of schematic diagrams that were barely readable because the
literal 'fine print' drowned in compression artifacts. No fun.
And maybe i've missed something, or my image converter is silly...
but with line art, i *always* get significantly larger files at
lower quality than with gifs.
DOC (a.k.a. MS Word)
Well, everything has been said- highly version dependent,
formatting breaks if a font is not installed... Word might be good to
create a 'similar-looking' text version of a scanned document. But
it is not well suited to preserve the formatting across the
borders of versions and platforms- let alone across the borders of
Plus, it is a proprietary standard- if Microsoft decides to drop
support for Word and the doc file reader, you're out of luck.
PDF
Well known for its cross-platform operability. Basically, it's
nearly as good as a printed copy to preserve not only the content,
but also the layout. And that is good.
But what is bad: it's a proprietary standard, too. I'm not sure
whether all details of the file format are freely available. And
Acrobat Reader is not available for every platform- i can't see much
sense in generally ruling out DOS users, C64 surfers, blind or visually
handicapped people from reading a text about ancient microelectronics.
As for file sizes, we seem to reach no compromise. Maybe
Acrobat can produce very small files, but seemingly most other
converters can't. And file size was the reason for all the
If one wants to preserve the layout, PDF is the way to go- no doubt.
But we always have to keep in mind that it is proprietary, that it
has problems with large (as in 'many pages') documents, and that it
requires careful work to get small files.
For conserving content, it's (simple) HTML for text, and appropriate
GIF or jpg for images. Hey, we do it to actually _work_ with these
documents, right? So let's benefit from HTML as a documentation tool,
not only as a means to put graphics in a text.
(No, i don't like HTML e-mails either- but that is a problem of the
e-mail format much more than that of HTML as such)