On Aug 28, 2021, at 12:22 PM, Al Kossow via cctalk <cctalk at classiccmp.org> wrote:
On 8/28/21 8:57 AM, Antonio Carlini via cctalk wrote:
Neatly solved in the document's future (but our past and present) by having documents that are born digital.
Good luck translating documents that HP, DEC and IBM produced in their proprietary "bookreader" formats so they don't look like crap.
The same goes for PDF, in many cases. But I have found that running a PDF document through an OCR program that handles page formatting (like FineReader) can work quite well. The OCR function itself of course works very nicely with input like that, since there is no variability in the letter shapes. So what you're really relying on is the conversion from page geometry to text flow that those programs also offer.