By analogy,
consider what would be needed to archive paper documents.
If you were trying to design a spec that could replicate
dollar bills, I think you would probably need to
store a lot of pixels, plus complex data about paper and
ink composition.
But if you were storing text (such as this discussion),
you would need only the ASCII data stream, plus (optionally) a
tiny amount of formatting/font data. Using a storage
format that carries all of the pixels and the complex data
about paper and ink composition would be inappropriate.
An extensible format that can expand to include paper
and ink data and image pixels is great, but the extensible
format should also be able to shrink down and slough off
unneeded, irrelevant stuff when appropriate.
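
To make the "expand and shrink" idea concrete, here is a rough sketch in
Python. It is only an illustration of the analogy, not a proposed spec: the
class name, the field names, and the choice of JSON as the encoding are all
my own assumptions. The only required part is the text; every heavier part
is optional and is simply left out of the stored record when it is absent.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ArchivedDocument:
    """Hypothetical record: text is required, everything else is optional."""
    text: str
    formatting: Optional[dict] = None   # e.g. font name, size
    pixels: Optional[list] = None       # scanned image rows
    paper_ink: Optional[dict] = None    # paper and ink composition data

    def to_record(self) -> dict:
        # Keep only the parts that are actually present, so a plain-text
        # document does not pay for image or paper/ink fields.
        return {k: v for k, v in asdict(self).items() if v is not None}


def encode(doc: ArchivedDocument) -> str:
    """Serialize the trimmed record (JSON is an arbitrary choice here)."""
    return json.dumps(doc.to_record(), sort_keys=True)


# A text-only document shrinks down to just its text.
plain = ArchivedDocument(text="hello")

# A banknote-like document expands to carry the extra data.
# The values below are placeholders, not real measurements.
banknote = ArchivedDocument(
    text="ONE DOLLAR",
    pixels=[[0, 255], [255, 0]],
    paper_ink={"paper": "placeholder", "ink": "placeholder"},
)

print(encode(plain))
print(sorted(banknote.to_record().keys()))
```

The same record type covers both cases; the stored size tracks what the
document actually needs rather than what the format is capable of holding.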