It all depends on whether you want the data to still be
"mostly" there if
something goes wrong with the file.
AFAIK PDF and TIFF will be totally unusable with any one byte missing,
whereas HTML, XML and other text-based formats will still have usable and
recognizable data.
The other way of looking at this is that your text/html files could
be utter garbage and you would not know any better. Or more likely, one
or two characters could get corrupted and produce something that is
still readable but incorrect ...
The real advantage of text (I think) is that we are still likely to be
able to do something with it 100 years from now. The storage medium is
most likely to be the problem (7-track tapes, anyone?)
md5sum and CDCheck will help detect errors years down the line (I'm
assuming that you write stuff to CD and immediately verify against the
original sources ... and obviously you keep duplicates of all the really
important CDs ...)
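
For what it's worth, here is a minimal sketch of the checksum idea in
Python, in case md5sum itself is not at hand. The function names
(md5_of_file, build_manifest, verify) are made up for illustration and
are not part of md5sum or CDCheck. It only uses hashlib from the
standard library, records one MD5 per file when the archive is written,
and later reports files that are missing or no longer match. Note that
this detects damage; it does not repair anything.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (by relative path) to its MD5 checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): md5_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Return the relative paths that are missing or whose checksum changed."""
    root = Path(root)
    bad = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or md5_of_file(target) != expected:
            bad.append(rel)
    return bad
```

The manifest would be built from the original sources, stored alongside
the data (and ideally somewhere else too), and checked again right after
burning and every few years after that.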
If getting the stuff back matters, you can try looking at the various
tools that can produce "parity files" for you. I think they are geared
towards having N files of a set size and adding P parity files to
regenerate lost data, but it's a start. Much better to keep a master
copy of your CDs somewhere IMHO.
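
To give a feel for how parity can bring data back, here is a toy sketch
of the simplest case: N equal-sized blocks plus a single parity block
(P = 1) built with plain XOR. This is an assumption-laden illustration,
not how the real tools do it; as far as I know they use Reed-Solomon
style codes so that several parity files can repair several losses. The
names make_parity and recover_missing are invented for this example.

```python
def make_parity(blocks):
    """XOR equal-sized byte blocks together into one parity block."""
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks must have the same size")
    parity = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(blocks, parity):
    """Rebuild the single lost block (marked as None) from the survivors."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) != 1:
        raise ValueError("one parity block can rebuild exactly one lost block")
    survivors = [b for b in blocks if b is not None]
    # XOR of all survivors and the parity cancels everything but the lost block.
    return make_parity(survivors + [parity])
```

With one parity block you can lose any one of the N blocks (or the
parity block itself) and still get everything back; lose two and you
are stuck, which is why the real tools let you choose P.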
Antonio