----- Original Message -----
From: "Jules Richardson" <julesrichardsonuk(a)yahoo.co.uk>
To: "General Discussion: On-Topic and Off-Topic Posts"
<cctalk(a)classiccmp.org>
Sent: Wednesday, August 11, 2004 10:50 AM
Subject: Re: Let's develop an open-source media archive standard
"could be kept" in zip files, yes - but then
that's no use in 50 years
time if someone stumbles across a compressed file and has no idea how to
decompress it in order to read it and see what it is :-)
Hence keeping the archive small would seem sensible so that it can be
left as-is without any compression. My wild guesstimate on archive size
would be to aim for 110 - 120% of the raw data size if possible...
I'd say ZIP is so ubiquitous that even if the format itself is dead and gone
there should be a pile of documentation on how to extract it, especially if
you keep to the basic "deflate" algorithm, which is used in everything from
PNG to gzip to Jar.
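For what it's worth, pulling a deflate stream back apart is only a few
calls into zlib, which is documented to death and ships with just about
every OS these days. Rough sketch only - names made up, error handling
trimmed, not tested:

    /* Sketch only: inflate a raw deflate stream already held in memory. */
    #include <string.h>
    #include <zlib.h>

    long inflate_raw(const unsigned char *comp, unsigned long comp_len,
                     unsigned char *out, unsigned long out_cap)
    {
        z_stream zs;
        int rc;

        memset(&zs, 0, sizeof zs);
        /* windowBits of -15 means raw deflate, i.e. no zlib/gzip wrapper */
        if (inflateInit2(&zs, -15) != Z_OK)
            return -1;
        zs.next_in   = (Bytef *)comp;
        zs.avail_in  = (uInt)comp_len;
        zs.next_out  = out;
        zs.avail_out = (uInt)out_cap;
        rc = inflate(&zs, Z_FINISH);
        inflateEnd(&zs);
        return rc == Z_STREAM_END ? (long)zs.total_out : -1;
    }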
> My background (in XML terms) is with Java - but I've not come across
> software that mixes a human-readable format in with a large amount of
> binary data (whether encoded or not). Typically the metadata's kept
> separate from the binary data itself, either in parallel files (not
> suitable in our case) or as separate sections within the same file.
Fork it: make it a binary file, but keep the metadata in the header as
ASCII/Unicode/XML/whatever. That way a simple cat or less (or opening it in
Notepad) will show you what's in it.
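Something like this at the top of the file, say (tag names and numbers
entirely made up, just to show the shape of the thing), with the raw
sector data following straight after the header:

    <diskimage version="0.1">
      <source>5.25" DSDD floppy, 40 cyls x 2 heads x 9 sectors x 512 bytes</source>
      <data offset="1024" length="368640" encoding="raw"/>
    </diskimage>
    ...368640 bytes of raw binary sector data from offset 1024 onwards...

cat or head shows you everything you need to know, and the binary noise
after the header doesn't get in the way.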
> See my posting the other week when I was trying to convert ASCII-based
> hex data back into binary on a Unix platform :-) There's no *standard*
> utility to do it (which means there certainly isn't on Windows). If the
> data within the file is raw binary, then it's just a case of using dd to
> extract it even if there's no high-level utility available to do it.
On the Macs there's BinHex, which has been the standard ever since file
transfers began. Also, there are dozens of *nix utilities for dealing with
Base64 (MIME) encoded data.
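And if the header tells you where the raw data starts, pulling it out
really is just dd - e.g., with made-up file names and the offsets from the
sketch above:

    dd if=archive.dsk of=tracks.bin bs=1 skip=1024 count=368640

and for a Base64-encoded payload, something along the lines of:

    openssl base64 -d -in tracks.b64 -out tracks.bin

(or whatever Base64 tool happens to be to hand).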
> But the data describing all aspects of the disk image would be readable
> by a human; it's only the raw data itself that wouldn't be - for both
> efficiency and ease of use. The driving force for having human-readable
> data in the archive is so that it can be reconstructed at a later date,
> possibly without any reference to any spec, is it not? If it was
> guaranteed that a spec was *always* going to be available, having
> human-readable data at all wouldn't make much sense as it just introduces
> bloat; a pure binary format would be better.
IMHO if you have a file lying around and you have no idea what it is, it's
still nice to be able to read it in as text and see what's going on, the
Unix "file" command notwithstanding.
> I'm not quite sure what having binary data represented as hex for the
> original disk data gives you over having the raw binary data itself -
> all it seems to do is make the resultant file bigger and add an extra
> conversion step into the decode process.
Usually text moves around between machines/OSes a lot more easily than
binary data. If you can dump a text file to a machine somehow, you can
transfer a media archive and a .c program to compile and decode it.
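The sort of throwaway decoder I mean, for the hex-text case: reads ASCII
hex on stdin, writes the raw bytes to stdout. Sketch only, nothing to do
with any particular archive format:

    #include <stdio.h>
    #include <ctype.h>

    int main(void)
    {
        int c, v, hi = -1;

        while ((c = getchar()) != EOF) {
            if (!isxdigit(c))
                continue;                   /* skip whitespace, line breaks */
            v = isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;
            if (hi < 0)
                hi = v;                     /* first nibble of the pair */
            else {
                putchar((hi << 4) | v);     /* second nibble: write the byte */
                hi = -1;
            }
        }
        return 0;
    }

Compile and run with something like "cc -o unhex unhex.c" and then
"./unhex < image.hex > image.raw".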
> Yep, I'm with you there. CRCs are a nice idea. Question: does it make
> sense to make CRC info a compulsory section in the archive file? Does it
> make sense to have it always present, given that it's *likely* that
> these archive files will only ever be transferred from place to place
> using modern hardware? I'm not sure. If you're spitting data across a
> buggy serial link, then the CRC info is nice to have - but maybe it
> should be an optional inclusion rather than mandatory, so that in a lot
> of cases archive size can be kept down? (and the assumption made that
> there exists source code / spec for a utility to add CRC info to an
> existing archive file if desired)
I think CRCs should be compulsory, with an external hash optional.
Corruption can happen on new hardware as well as old :)
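And even if every CRC tool on earth disappears, the usual CRC-32 (the same
polynomial ZIP, gzip and PNG all use) is tiny to redo from the written
spec. Bit-at-a-time sketch, written for clarity rather than speed:

    /* Plain CRC-32 (IEEE), reflected, polynomial 0xEDB88320.
     * For the 9 bytes "123456789" this should come out as 0xCBF43926. */
    unsigned long crc32_simple(const unsigned char *buf, unsigned long len)
    {
        unsigned long crc = 0xFFFFFFFFUL;
        unsigned long i;
        int bit;

        for (i = 0; i < len; i++) {
            crc ^= buf[i];
            for (bit = 0; bit < 8; bit++)
                crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320UL : crc >> 1;
        }
        return (crc ^ 0xFFFFFFFFUL) & 0xFFFFFFFFUL;
    }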
> cheers
> Jules