manuals in pdf (resolution, compression)

27 Jun 2004

On Sun, 2004-06-27 at 19:40, Antonio Carlini wrote:
...
  (Well, xpdf on OpenVMS VAX is slow, but then I
 guess my expectations are at fault there :-)) 
It seems to be bloody horrible on modern versions of Linux, too :-(
(well, at least on Red Hat 9). Slow as hell, plus the rendering quality is
pretty awful.
True that most systems have PDF viewers, but they're more likely to have
an image displayer and a text editor ;-)
...
  I believe (but have not tried) that you can go
 from PDF to text in this case without any great
 difficulty (I don't recall what happens to images). 
What's the current licensing for PDF tools? I've pretty much avoided it
since the days when the reader was free, but anything (at least from
Adobe) which created or manipulated PDF files cost $$$.
I believe the data format itself was copyrighted - but presumably isn't
these days what with all the 3rd-party viewers out there?
...
  Actually, I suppose separate images can help here too as people can
 navigate straight away to what they want, plus they don't need to
 download the whole of a huge pdf file before they can start reading. 
 I prefer to grab the whole thing anyway. Today I might
 just want the frobozz pinout, but tomorrow I'm almost
 certain to need the lead engineer's middle initial,
 by which time I'll have forgotten where I found the
 docs in the first place. 
Oh sure, me too. I make use of wget an awful lot to create local copies
of useful bits of websites for instance, but if I'm looking for
something then and there then it's nice to be able to at least look at
the navigation up-front (particularly to see if the whole thing's
actually relevant anyway!) and quickly start reading the most-useful
bits whilst downloading the whole lot as a background job.
...
  problem is that you need to be *really* sure that your OCR versions
 are good before you can risk taking the raw scans offline, which means
 having a lot of 
 Once I've generated a raw scan (or picked up someone else's)
 I expect to keep it around essentially forever. OCR has improved
 immensely in the last few years, but not to the point where
 I can throw a scan of a poor quality photocopy at it and expect
 something that looks like the original with zero errors.
 (The Module/Options list that Eric Smith scanned would be
 an excellent torture test for any candidate "perfect" OCR program).
 Another point is that if you have high quality scans, why
 keep them to yourself? By all means have low-res versions
 available for those who just need a page or two or just
 need to look something up quickly and don't care about
 the artefacts, but make the "masters" available too. If you
 don't have the space yourself, there are people on this list
 who seem to have no problem with online disk space. 
Fair point. I'd never completely delete high-quality scans - but as you
say, there are quite a few people around who seem to be set up for
hosting huge amounts of data!
Hmm, how editable are PDF files by the way? On the OCR front, I'd expect
anyone OCRing anything to proofread it afterwards and correct mistakes
(which is of course vital for technical data anyway - technical data
with mistakes in is useless!). So unless wordprocessor-like tools exist
to edit PDF files then I wouldn't think they're much good as an
intermediate format, because people need to be able to go in there and
easily correct mistakes made by the OCR software.
cheers
Jules
