paper -> HTML (and The First PC)

6 Jan 1999

At 01:49 AM 12/30/98 -0600, you wrote:
...
>  1) do a color scan to grab images
>  2) clean up images
>  3) resize based on guess at a good size and res for web pages
I don't think you can do much about these steps.  I usually shoot for 320x240
pixels for web images -- on most monitors that's about 4" by 3" or so.  It
used to be (not sure if this is still true) that the default window width for
Netscape on the Mac gave you 400 pixels across; I believe on the PC it was
480.  (I'll have to dig up that article again.)  Also, 320x240 is a manageable
size for downloads.
Anyway, on a 640x480 screen, you lose some width for scroll bars and all;
plus you need a border/margin...  Sure, you can design your web pages for
800x600, if you don't care that most people won't be able to see it all at
once.
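
For what it's worth, the resize arithmetic can be sketched in a few lines of
Python.  This is only a rough illustration: the function names are made up for
the example, and the 80 pixels-per-inch figure is an assumption backed out of
the "320x240 is about 4 by 3 inches" estimate, not a measured value.

```python
def fit_within(width, height, max_width=320, max_height=240):
    """Scale a pixel size down to fit inside a box, keeping the aspect ratio.

    Images already smaller than the box are left alone (never scaled up).
    """
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def size_in_inches(width_px, height_px, pixels_per_inch=80):
    """Approximate on-screen size; 80 pixels per inch is an assumed value."""
    return width_px / pixels_per_inch, height_px / pixels_per_inch


if __name__ == "__main__":
    # A hypothetical 640x480 scan, shrunk to the 320x240 target.
    new_size = fit_within(640, 480)
    print(new_size, size_in_inches(*new_size))
```

Scaling by the smaller of the two ratios keeps the image from being stretched,
which matters when the scan isn't already 4:3.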
...
>  4) scan again as B/W line art
>  5) OCR
>  6) clean up OCR
>  7) create HTML combining OCR'd text and images
>
> I don't much like PDF for web docs, so an HTML solution would be best.  It
> looks like the "pro" version of Xerox's OCR software might automate the
> task somewhat.  Any recommendations?
Well, your main issue is getting the text into machine-readable format.  My
current belief is that the best way (especially for lower-quality originals) is
to read them aloud into a word processor using Dragon Dictate or similar.  Once
you've got a text file, there are several options for getting it HTML'ized,
including things like MS Word and HTML editors.  (I prefer doing it
manually.)  Depending on what you're doing, a CGI program that
reads/formats text files, inserting images as necessary, might be the way
to go.  (See http://www.sinasohn.com/clascomp/index2.htm for an example.)
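
To give a rough idea of the text-file-to-HTML approach, here is a minimal
Python sketch.  It is not the program behind the page linked above; the
"[img: filename]" marker and the function name are conventions invented for
this example.  Blank lines separate paragraphs, and a paragraph consisting of
just an image marker becomes an <img> tag.

```python
import html
import re


def text_to_html(text, title="Untitled"):
    """Turn plain text into a simple HTML page.

    Paragraphs are separated by blank lines.  A paragraph of the form
    "[img: name.gif]" (a made-up marker for this sketch) becomes an image.
    """
    body = []
    for block in re.split(r"\n\s*\n", text):
        block = block.strip()
        if not block:
            continue
        if block.startswith("[img:") and block.endswith("]"):
            src = block[len("[img:"):-1].strip()
            body.append('<p><img src="%s" alt=""></p>' % html.escape(src))
        else:
            # Join wrapped lines and escape &, <, > so OCR'd text is safe.
            body.append("<p>%s</p>" % html.escape(" ".join(block.split())))
    return (
        "<html><head><title>%s</title></head>\n<body>\n%s\n</body></html>"
        % (html.escape(title), "\n".join(body))
    )


if __name__ == "__main__":
    sample = "The First PC\n\n[img: firstpc.gif]\n\nSome OCR'd text\ngoes here."
    print(text_to_html(sample, title="paper -> HTML"))
```

A real CGI version would read the file named in the request and print a
"Content-Type: text/html" header before the page, but the formatting step
would look much the same.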
--------------------------------------------------------------------- O-
Uncle Roger                       "There is pleasure pure in being mad
roger(a)sinasohn.com                           that none but madmen know."
Roger Louis Sinasohn & Associates
San Francisco, California                       http://www.sinasohn.com/
