Apologies to the list for spending yet more bandwidth on this, but Bjørn
actually presented an alternative in a reasonable way (thanks!), and I think
it is important to address that...
From: "Bjørn Vermo" <bv at norbionics.com>
> I have used Acrobat since the first beta.
> ...
> I am writing this with the help of Display PDF, the engine used for the
> display in Mac OS X. It is much superior to the inelegant kludges that
> make up other graphical display systems. Adobe is a company that
> understands nice font rendering and the presentation of things.
> That is also what Acrobat is about: it is a way to make sure that the
> print shop prints your document the way you intended. That is what
> a PDF file is good for, and in every other respect it is an inferior
> format.
> ...
> If you scan a book, you end up with bitmaps of the pages. If you stuff
> these bitmaps into a PDF container, the only value you add is that they
> are kept together in sequence. The value you subtract is that they are
> no longer readily available to everybody, and anybody who wants to OCR
> them to make any kind of index or cross reference will have to use
> proprietary software or first extract them from the container they
> were put in.
Now, by your own admission, I have also added value to the way it will be
displayed and printed by Acrobat users. This is also consistent with my
experience. Acrobat really does understand scaling and viewability, to a
degree generally not matched elsewhere.
There is also the question of whether the PDF format actually makes the
document unavailable to more people, or available to more people.
> Now, if you do it the simple way, you use a suitably named directory
> instead of the PDF file. In that directory, you can keep individual
> PNGs for each page you scanned, named P000 and up (use whatever
> starting value is suitable to reflect the page numbers in the book). If
> the book is divided into chapters, you make a subdirectory for each
> chapter and appendix, and place the page scans in there instead. Thus
> you retain the structure of the original without using any proprietary
> format, and everybody with a graphical display will be able to use the
> scanned book. There are numerous image viewers available for all
> platforms; many are graphical browsers which will navigate your pages
> and directories rather better than an Acrobat reader.
Here I completely disagree. I have a graphical display, and I can't view a
TIFF file or a PNG file easily at all. All the viewers that I have have
non-uniform, kludgy interfaces. They also require me to start a separate
application or plug-in, just as PDF does. None of them render the pages as
well as Acrobat, and the interfaces for browsing directories are laughable.
> Many suitable viewers are available from open source projects, so you
> can build one even on platforms which have been neglected for years.
Yeah, I guess, if I wanted to download, install, and maintain all that
infrastructure just to read documents on my screen.
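For concreteness, here is a minimal sketch of the directory layout proposed above: one PNG per scanned page, named P000 and up, optionally inside a per-chapter subdirectory. The function name `page_path`, the three-digit zero padding (read off the "P000" example), and the directory and chapter names in the usage lines are illustrative assumptions, not part of the original proposal.

```python
from pathlib import PurePosixPath
from typing import Optional


def page_path(root: str, page: int, chapter: Optional[str] = None) -> str:
    """Return the relative path of the scan for one book page.

    Assumed scheme: 'P' + page number zero-padded to three digits + '.png'
    (e.g. P052.png), optionally inside a per-chapter subdirectory.
    Zero padding keeps alphabetical order equal to page order, but only
    up to P999; longer books would need a wider pad.
    """
    if page < 0:
        raise ValueError("page numbers cannot be negative")
    name = f"P{page:03d}.png"
    parts = [root, chapter, name] if chapter else [root, name]
    # PurePosixPath gives forward slashes on every OS, matching URL paths.
    return str(PurePosixPath(*parts))


# Hypothetical book directory and chapter names, for illustration only.
print(page_path("some-manual", 52))               # some-manual/P052.png
print(page_path("some-manual", 52, "chapter03"))  # some-manual/chapter03/P052.png
```

The zero padding is the design choice that matters: with it, a plain alphabetical directory listing shows the pages in book order, which is what makes a generic file browser usable as a page navigator.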
> The most elegant solution, though, is to use an ordinary web browser to
> access the pages. It is trivially simple to make a website out of this
> directory structure, and there are many free server-side products
> available to make access user friendly without any effort by the
> maintainer.
Here, I'd point out that my web browser is awful at viewing image data. It
wants to present the image pixel-for-pixel the way it is stored, or the way
your web page told it to, and neither knows anything about my screen
resolution, let alone my preferred work style. With the Acrobat plug-in, at
least I can set the zoom and still read the work in question.
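To give a sense of how little is needed on the server side of the web scenario, here is a minimal sketch using only Python's standard library (`http.server`, Python 3.7 or later). The function name `serve_scans`, the loopback address, and the default port are illustrative assumptions. It serves the page files as-is, so it does nothing about the zoom and resolution complaints just raised.

```python
import functools
import http.server
import threading


def serve_scans(directory: str, port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Start a background HTTP server exposing a scanned-book directory.

    Each page scan gets its own URL (e.g. /chapter03/P052.png), so fetching
    one page transfers only that file. Directory index pages come from
    SimpleHTTPRequestHandler's built-in listing.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Bind to loopback only; pass port=0 to let the OS pick a free port.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    # Run the accept loop in a daemon thread so the caller is not blocked.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

From a shell, `python3 -m http.server 8000 --directory some-manual` does roughly the same thing (`some-manual` being a hypothetical directory name). Call `shutdown()` and then `server_close()` on the returned server to stop it.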
> It is in the web scenario that we most clearly see the "value subtracted"
> nature of PDF. If I want to look at the information on page 52, I have
> to get the whole document. That wastes bandwidth, and it makes the
> server more expensive to operate.
Well, sorta. What I actually do, and I doubt I am alone in this, is
download the thing, and read it locally (it performs better), which means I
don't have to come back to you the next time I refer to the document. Since
the usual case is that I will read most or all of the document, the PDF
actually saves you significant server bandwidth.
Plus, it automatically creates a back-up of the document, with a nearly
trivial effort on my part.
> Besides, I get thrown out of the normal working mode for my web browser
> and into the different mode of the Acrobat plugin (if that is supported
> on the platform I use; otherwise it will be the standalone reader or
> Ghostview or something).
True, but this happens for essentially everything except GIF and JPEG.
(Most of the formats you are talking about will invoke the hideous QuickTime
plugin by default.)
> The next important feature of the open solution is that it encourages a
> collaborative effort to add useful things like indexes, cross references
> and even full-text versions. Take a look at Wikipedia to see what is
> possible to accomplish when things are kept in open, universally
> accessible form. A repository of technical information could be set up
> the same way, and it would become gradually more useful as people added
> their comments and index hints for the scanned pages as
> metainformation. OCRing the pages just for use as aids to searching and
> indexing would be simple; the raw OCR output could be given the same
> name as the scanned page, just with a .txt extension. If somebody later
> on were to proofread and mark it up, that would lead to a .xml
> document. These kinds of possibilities are only available if the
> documents are kept in a simple, logical structure that is accessible to
> as many people as possible, not just for reading but for further
> refinement.
I concur that working on the document is easier with the pages split up.
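The sidecar naming suggested in the quoted proposal (raw OCR output stored next to the scan under the same name with a .txt extension, a proofread and marked-up version as .xml) can be sketched with `pathlib`'s `with_suffix`. The helper names `sidecar` and `best_text`, and the policy of preferring the .xml file when both exist, are illustrative assumptions rather than anything the proposal specifies.

```python
from pathlib import Path
from typing import Optional


def sidecar(scan: Path, kind: str) -> Path:
    """Map a page scan such as P052.png to its text sidecar.

    kind is 'txt' for raw OCR output or 'xml' for a proofread,
    marked-up version; only the extension changes.
    """
    if kind not in ("txt", "xml"):
        raise ValueError(f"unknown sidecar kind: {kind}")
    return scan.with_suffix("." + kind)


def best_text(scan: Path) -> Optional[Path]:
    """Return the most refined text file that exists for a scan, or None.

    Assumed policy: a proofread .xml beats raw .txt OCR output.
    """
    for kind in ("xml", "txt"):
        candidate = sidecar(scan, kind)
        if candidate.exists():
            return candidate
    return None
```

Because the text files are found purely by name, contributors can add or upgrade them one page at a time without touching the scans or any shared index.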
> In order to avoid technical lock-in today, my preferred document format
> is XML with CSS styling, either with an XHTML DTD or, ideally, a DTD
> tailored to the usage area and reflected by the stylesheets. Bitmap
> images are ideally PNG, photographs JPEG 2000, and vector images
> SVG. There exists a plethora of free tools to work with, transform and
> generate this kind of document.
Here you are mistaken, on several fronts. XML with CSS fails to "keep it
simple" or make it easy for me to contribute. PNG is only renderable on my
system with the nearly useless QuickTime plugin. I don't know what SVG is,
but I doubt I can render it at all. Free tools are well and good until you
count up the time it will cost me to bring up a PC-based development
environment and keep it and the "free" tools working.
> The most harmful things for anybody who wants a useable, syntactical
> web are lock-in formats. The worst by far is Flash, with Microsoft
> Office formats closely following (even when the output is supposed to
> be HTML), and PDF is a good third.
I agree on the hideousness of Flash and Office. (What is a "useable,
syntactical web", and why do I want one?)
Vince