der Mouse wrote:
> As a simple example, if the high bit of a compressed codon is 1, the
> next bit is significantly more likely to be 0 than 1 - at least until
> the table fills up. For a more complex example, consider the Shannon
> estimate of one bit per letter for normal connected English text. At
With all due respect, we're talking about LZW, not Shannon's English-language
estimate -- I don't think your comments are appropriate in the context of this
discussion. Pop on over to comp.compression and I'd be happy to discuss the
amount of entropy present in human language, but once data gets transformed by
LZW it has very little redundancy left. (If it didn't, then by your argument
you could still recompress it with a bitwise encoder like PAQ -- and you can't,
not by any significant margin, anyway.)
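
If anyone wants to put numbers on either point, both are easy to measure on
toy LZW output. Here's a rough sketch in Python -- my own throwaway code, not
compress or GIF or anything real -- that emits growing-width LZW codes, tallies
der Mouse's conditional bit statistic, and then checks how much zlib (standing
in for a stronger bitwise coder like PAQ) can still squeeze out of the packed
stream. "book1" is just an example input file.

import zlib

def lzw_codes(data: bytes, max_bits: int = 12):
    """Yield (code, width) pairs the way a growing-code-width LZW emits them."""
    table = {bytes([i]): i for i in range(256)}
    next_code, width = 256, 9
    w = b""
    for b in data:
        wc = w + bytes([b])
        if wc in table:
            w = wc
            continue
        yield table[w], width
        if next_code < (1 << max_bits):       # table not full yet
            table[wc] = next_code
            next_code += 1
            if next_code > (1 << width):      # codes no longer fit: widen
                width += 1
        w = bytes([b])
    if w:
        yield table[w], width

def pack(pairs) -> bytes:
    """MSB-first bit packing of (code, width) pairs into a byte string."""
    out, acc, nbits = bytearray(), 0, 0
    for code, width in pairs:
        acc = (acc << width) | code
        nbits += width
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)

if __name__ == "__main__":
    raw = open("book1", "rb").read()
    pairs = list(lzw_codes(raw))

    # (a) the quoted claim: given a 1 in the top bit of a code, what follows?
    zeros = ones = 0
    for code, width in pairs:
        if (code >> (width - 1)) & 1:
            if (code >> (width - 2)) & 1:
                ones += 1
            else:
                zeros += 1
    print(f"top bit 1: next bit is 0 {zeros} times, 1 {ones} times")

    # (b) how much does a second pass recover from the packed LZW stream?
    packed = pack(pairs)
    print(f"raw {len(raw)}  lzw {len(packed)}  lzw+zlib {len(zlib.compress(packed, 9))}")

It's only a toy, but it's enough to sanity-check both claims on whatever file
you care about.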
>> What I think you're getting at is that, since the source data has a
>> lot of redundancy *and* is human parsable, it can be reconstructed
>> more easily in the case of mangled data.

> Well, yeah; "is human parsable" is a form of redundancy, but one that
> is almost impossible for programs to take advantage of - certainly
Actually, WinRK uses a dictionary and currently achieves the very best Calgary
Corpus score, so it is most definitely exploitable.
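
To be clear, WinRK's model is closed, so I'm not claiming this is how it does
it -- but the general idea of exploiting an English dictionary is simple enough
to sketch: substitute common words with spare byte codes before the real
compressor ever sees the text, so the back end has less to model. The toy word
list, token bytes, and file name below are all made up; Python again just to
keep it short.

import re
import zlib

# A tiny stand-in dictionary; real front ends use tens of thousands of words
# and reserve token space properly. Lower-case matches only, to keep it short.
WORDS = ["the", "and", "that", "have", "for", "not", "with", "you", "this"]
CODES = {w: bytes([0xF0 + i]) for i, w in enumerate(WORDS)}  # bytes unused in ASCII text

def encode_words(text: bytes) -> bytes:
    """Replace dictionary words with single-byte tokens so the back-end
    compressor sees shorter, more regular input."""
    pattern = re.compile(rb"\b(" + b"|".join(w.encode() for w in WORDS) + rb")\b")
    return pattern.sub(lambda m: CODES[m.group(0).decode()], text)

if __name__ == "__main__":
    raw = open("book1", "rb").read()          # assumes plain ASCII English text
    plain = zlib.compress(raw, 9)
    dicted = zlib.compress(encode_words(raw), 9)
    print(f"plain zlib: {len(plain)}   dictionary front end + zlib: {len(dicted)}")

The point is only where the gain comes from, not its size -- a serious
dictionary front end (or a word model built into the compressor itself) takes
the same idea much further.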
--
Jim Leonard (trixter at oldskool.org)            http://www.oldskool.org/
Want to help an ambitious games project?         http://www.mobygames.com/
Or check out some trippy MindCandy at            http://www.mindcandydvd.com/