On Sun, 2004-06-27 at 20:41, der Mouse wrote:
>> If you've OCRed the data, HTML is probably fine for pure text.
> Ugh!  Please, no!  For pure text, use - pure text!
 
Ahh, my point was that this would be text + images (well, typically).
I *hate* HTML for any document type stuff, but in this case it seems
like it *might* be the best option, considering the alternatives :-(
If it's simple markup without all sorts of extra HTML crap then *maybe*
it'd be OK. I honestly don't know :-)
Of course then there's the question - do you preserve an old document in
the format which the author intended (complete with typos, any bad
layout etc.), or is the intention to just preserve textual/graphical
data - possibly losing font and layout information in the process?
cheers
Jules