On May 31, 22:05, Sellam Ismail wrote:
Responding to an older message...
On Tue, 30 May 2000, Tony Duell wrote:
[...]
> Which means the archive format would have to allow for:
[...]
> It may be a _very_ unusual format, but a proposed archive format should
> be able to handle _anything_.
Well, I agree with that, so far as it's possible.
I suppose a sub-format byte wouldn't hurt. What I don't like about it is
that it will require that someone be maintaining a database of all the
sub-formats. But I guess since we have a machine identifier and this will
have to be maintained as well, a sub-format byte is not too demanding.
Of course, there will have to be a central person who is responsible for
receiving input for new machine and sub-format types, updating the
database with the new computer types and sub-format types, and
disseminating this from a website.
That's part of the reason I think an encoded format is a bad idea. Hans'
suggestion of a tagged format using XML (or something else) is much better.
It allows for decoding without referring to a central archive, and it's
much more flexible and extensible. Sure, it takes more space, but is that
a problem? The tags don't all need to be ASCII text; things like the
sector size could be integers, and field lengths could be limited. I'd
envisage something like nested objects (borrowing from Sellam's slightly
later mail):
{
  Disk Descriptor Header, containing:
    Host computer type string
    "Hard"/"soft" sector flag
    Number of tracks (1 byte)
    Disk drive RPM
    ...
  {
    Track Descriptor Header, containing:
      Track number (with fraction)
      Track format "logical"/"raw"
      Track size in bytes
      Sectors in this track (1 byte)
      Offset to next Track Descriptor Header
      ...
    {
      Sector header descriptor, containing:
        Sector header format FM/MFM/GCR/...
        Sector data format FM/MFM/GCR/... [1]
        Sector number as encoded on the original disk
        Track number as encoded on original disk
        Head number as encoded on original disk
        Physical sector number
        Sector size
        ...
    }
    {
      sector data (binary, hex-coded, whatever)
    }
    {
      Sector header descriptor
    }
    {
      sector data
    }
  }
}
The nested tagging allows you to specify things like RX02 floppies, where
the headers are FM but the data is MFM. It also allows you to specify
different sector sizes on different tracks, or data written in the headers
that doesn't match physical track/sector/side on the original. It also
means that if the database is lost, damaged, incomplete or otherwise
inaccessible, an archive can still be understood, and there's no chance of
inconsistency because two people tried to add new formats at about the same
time, or someone rolled their own.
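To make the nesting concrete, here is a rough sketch in Python. It is only an
illustration: the class and field names are invented for this example, and the
RX02 figures used (26 sectors of 256 bytes per track) are assumed values, not
something agreed in this thread. The point is that each sector carries its own
header and data encodings and its own size, so mixed FM/MFM tracks and odd
sector sizes need no special cases.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sector:
    header_format: str    # "FM", "MFM", "GCR", ...
    data_format: str      # may differ from header_format
    recorded_sector: int  # sector number as encoded on the original disk
    recorded_track: int   # track number as encoded on the original disk
    recorded_head: int    # head number as encoded on the original disk
    physical_sector: int  # where the sector actually sits on the track
    size: int             # sector size in bytes
    data: bytes = b""


@dataclass
class Track:
    number: float         # fractional track numbers are allowed
    track_format: str     # "logical" or "raw"
    sectors: List[Sector] = field(default_factory=list)


@dataclass
class Disk:
    host_type: str        # free-text host computer type
    hard_sectored: bool
    rpm: int
    tracks: List[Track] = field(default_factory=list)


def make_rx02_track(track_no: int, sectors: int = 26, size: int = 256) -> Track:
    """Build one RX02-style track: FM headers, MFM data (assumed geometry)."""
    track = Track(number=float(track_no), track_format="logical")
    for n in range(1, sectors + 1):
        track.sectors.append(Sector(
            header_format="FM",
            data_format="MFM",
            recorded_sector=n,
            recorded_track=track_no,
            recorded_head=0,
            physical_sector=n,
            size=size,
            data=bytes(size),   # placeholder: all-zero sector contents
        ))
    return track
```

A disk whose headers lie about their position would simply set recorded_track
or recorded_sector to something different from the physical values.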
I've seen too many data formats where the decoding information was
unavailable, or was hard to get, or was "location unknown at this time", or
the prospective user simply didn't know where to look. If the information
is in the archive itself, anyone can work out what to do with it, any time.
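As a sketch of what "the information is in the archive itself" could look
like on disk, here is a minimal tagged encoding in Python. It is not XML and
not a proposal for the actual format; the layout (1-byte name length, ASCII
tag name, 1-byte type code, 4-byte payload length, payload) and the function
names are assumptions made up for this example. Integers are stored as binary
rather than text, and a group is just a payload that contains more tagged
fields.

```python
import struct


def encode(fields):
    """Serialise a list of (tag, value) pairs; a list value is a nested group."""
    out = bytearray()
    for tag, value in fields:
        if isinstance(value, int):
            kind, payload = b"I", struct.pack(">q", value)   # 8-byte signed integer
        elif isinstance(value, str):
            kind, payload = b"S", value.encode("ascii")
        elif isinstance(value, bytes):
            kind, payload = b"B", value
        elif isinstance(value, list):
            kind, payload = b"G", encode(value)              # nested group
        else:
            raise TypeError("unsupported value for tag %r" % tag)
        name = tag.encode("ascii")
        out += struct.pack(">B", len(name)) + name
        out += kind + struct.pack(">I", len(payload)) + payload
    return bytes(out)


def decode(buf):
    """Inverse of encode(); needs no external table of tag meanings."""
    fields = []
    pos = 0
    while pos < len(buf):
        name_len = buf[pos]
        pos += 1
        tag = buf[pos:pos + name_len].decode("ascii")
        pos += name_len
        kind = buf[pos:pos + 1]
        pos += 1
        (size,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        payload = buf[pos:pos + size]
        pos += size
        if kind == b"I":
            value = struct.unpack(">q", payload)[0]
        elif kind == b"S":
            value = payload.decode("ascii")
        elif kind == b"B":
            value = bytes(payload)
        elif kind == b"G":
            value = decode(payload)
        else:
            raise ValueError("unknown type code %r" % kind)
        fields.append((tag, value))
    return fields
```

Because every field carries its own name and length, a reader that has never
seen a particular tag can still skip over it or print it, which is the
property a bare sub-format byte doesn't give you.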
--
Pete Peter Turnbull
Dept. of Computer Science
University of York