As Sellam had suggested, the size at which we represent the information is not that
important. I would encode binary data as hex to keep everything ASCII. Data size would
expand, but the data would also be compressible, so things could be kept in ZIP files or
whatever archive format a person prefers for their archiving purposes.
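To make that concrete, here is a quick sketch (Python, purely for illustration; the
file name is a placeholder) of hex-encoding an image and then compressing it:

    import binascii
    import zlib

    raw = open("floppy.img", "rb").read()    # raw sector data
    hex_text = binascii.hexlify(raw)         # doubles the size, but all ASCII
    packed = zlib.compress(hex_text)         # hex is redundant, so it shrinks well
    print(len(raw), len(hex_text), len(packed))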
XML has become a storage format choice for a lot of different commercial packages. My
knowledge is mostly of the Windows world, but I doubt that other software houses are
avoiding XML. Sun/Java certainly embraces it.
I don't quite understand why representing binary as hex would affect the ability to
have command-line utilities. Certainly more CPU cycles are needed for conversion and the
image file is larger, but we need a readable format, and I would think that CPU cycles
and file size are less of a concern than readability. A utility to create a disk would
only have to run through the conversion once to buffer a representation of the floppy
disk (unless we are talking about a hard drive image, of course). The file to re-create
a floppy disk is only going to be 2 to 3 MB at most (thinking of a 1.2 MB floppy with
formatting info).
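The one-pass conversion might look something like this (again just a sketch; the
whitespace stripping is there because a readable archive would likely wrap the hex
across lines):

    import binascii

    # read the hex payload, drop any line breaks, decode once into a buffer
    hex_text = "".join(open("floppy.hex").read().split())
    buf = binascii.unhexlify(hex_text)   # roughly 1.2 MB in memory for a floppy
    # ...then hand 'buf' to the disk-writing code sector by sector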
The only difference I see in the sections that were described is that the first one
encompasses the format info and the data. My description had the first one as a big
block that contained the two other sections, as well as length and CRC info to verify
data consistency. Adding author, etc. to the big block would make perfect sense.
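Something along these lines is what I was picturing for the big block (a hypothetical
layout; CRC-32 is just one possible consistency check):

    import struct
    import zlib

    def wrap_block(format_info, data):
        # format_info and data are bytes; big block = length + CRC,
        # followed by the two inner sections
        payload = format_info + data
        crc = zlib.crc32(payload) & 0xffffffff
        return struct.pack(">II", len(payload), crc) + payload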
As for GCR, that would have been covered under "etc."... I am not familiar with GCR, but
I would guess that it deals at least with physical tracks and heads. In this case, a
track would consist of whatever format information is needed plus the data blocks
required for the track.
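In XML terms, I could picture a track looking something like this (element and
attribute names invented on the spot, just to show the shape):

    <track cylinder="0" head="0" sectorcount="21">
      <format encoding="GCR" />
      <sector id="0">0123abcd...</sector>  <!-- dummy hex data, truncated -->
      <sector id="1">0123abcd...</sector>
    </track>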
best regards, Steve Thatcher
-----Original Message-----
From: Jules Richardson <julesrichardsonuk(a)yahoo.co.uk>
Sent: Aug 11, 2004 8:08 AM
To: "General Discussion: On-Topic and Off-Topic Posts"
<cctalk(a)classiccmp.org>
Subject: Re: Let's develop an open-source media archive standard
On Wed, 2004-08-11 at 10:50, Steve Thatcher wrote:
> Hi all, after reading all this morning's posts, I
> thought I would throw out some thoughts.
>
> XML as a readable format is a great idea.
I haven't done any serious playing with XML in the last couple of years,
but back when I did, my experience was that XML is not a good format for
mixing human-readable and binary data within the XML structure itself.
To make matters worse, the XML spec (at least at the time) did not
define whether it was possible to pass several XML documents down the
same data stream (or, as we'd likely need for this, XML documents mixed
with raw binary). Typically, parsers of the day expected to take control
of the data stream and expected it to contain one XML document only -
often closing the stream themselves afterwards.
I did end up writing my own parser in a couple of KB of code which was a
little more flexible in data stream handling (so XML's certainly not a
heavyweight format, and could likely be handled on pretty much any
machine), but it would be nice to make use of off-the-shelf parsers for
platforms that have them where possible.
As you've also said, my initial thought for a data format was to keep
human-readable config separate from binary data. The human-readable
config would contain a table of lengths/offsets for the binary data,
giving the actual definition. This does have the advantage that if the
binary data happens to be a linear sequence of blocks (sectors in the
case of a disk image) then the raw image can easily be extracted if
need be (say, to allow conversion to a different format).
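Roughly what I have in mind, sketched with made-up field names (the raw binary would
follow immediately after the config):

    offset=0     length=512   cyl=0 head=0 sector=1
    offset=512   length=512   cyl=0 head=0 sector=2
    offset=1024  length=512   cyl=0 head=0 sector=3

With a layout like that, something as simple as dd could carve the raw image out
(assuming here that the binary starts at byte 1024 and sectors are 512 bytes):

    dd if=archive.img of=raw.img bs=512 skip=2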
Personally, I'm not a fan of mixing binary data in with the
human-readable parts, because then there are issues of character escaping
as well as the structure detracting from the readability. And if encoded
binary data is used instead (say, hexadecimal representation) then
there's still an issue of readability, plus the archive ends up bloated
and extra CPU cycles are needed to decode the data. Neither of those two
approaches lends itself to simply using common command-line utilities to
extract the data, either. I'm perfectly willing to be convinced, though :)
> I looked at the CAPS format and in part that would be okay. I would
> like to throw in an idea that whatever we create as a standard actually
> have three sections to it.
So, first section is all the 'fuzzy' data (author, date, version info,
description etc.), second section describes the layout of the binary
data (offsets, surfaces, etc.), and the third section is the raw binary
data itself? If so, I'm certainly happy with that :-)
One aside - what's the natural way of defining data on a GCR floppy? Do
heads/sectors/tracks still make sense as an addressing mode, but it's
just that the number of sectors per track varies according to the track
number? Or isn't it that simple?
cheers
Jules