On Jun 1, 11:10, Sellam Ismail wrote:
On Thu, 1 Jun 2000, Pete Turnbull wrote:
But a lot more voluminous. But this is just my prejudice speaking. Even
though I find HTML useful, I hate it.
It needn't be a whole lot more voluminous. The tags should be concise,
there's no need to write an essay for each part. Keywords might be a good
idea. Tags would be omitted if irrelevant (as many would be for a "raw"
archive, or for a common format with no "funnies"). So a disk descriptor
might look something like this:
{Apple ][<00>soft<00>trks:<40><00>rpm:<15><255><00>
{trk:<00><00>logical<00>length:<12><34><00>sectors:<10>
{sector:<00><00>
{sync{bytes:<16><00>value:<255><00>}
{header:GCR<00>trk:<00><00>sec:<00><00>physsec:<00><00>head:<00><00>size:<00><01><00>}
{data:<---256 binary bytes---->crc:<xx><xx><00>}}
sector: [repeat as reqd] }}
{track: [repeat as reqd] }}
I can't remember some details like the size of a DOS 3.3 track or what the
sync bytes are, so that's just a stylistic example.
The opening "{" marks the start of an object and is matched by a closing
"}"; braces are nested because objects are nested.
Variable-length strings like "Apple ][" are terminated by some agreed
control character (I used ASCII NUL, <00>). Numeric values are stored in
binary (actually it might make more sense to store them in ASCII where they
follow a string description, but probably not for a block of sector data).
So "rpm" is stored as a 2-byte representation of 360. Hmm, we'd need to
decide if it's little-endian or big-endian -- or add another tag!
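
To make the byte-order question concrete, here is a rough Python sketch of
how a few of those fields might be written out. The helper names
(tagged_string, tagged_u16, obj) are made up for illustration, and the
"tag, colon, binary value, NUL" layout is only one reading of the example
above, not an agreed format.

```python
import struct

NUL = b"\x00"  # the agreed string terminator (ASCII NUL, as in the example)

def tagged_string(text: str) -> bytes:
    """Encode a variable-length string followed by the NUL terminator."""
    return text.encode("ascii") + NUL

def tagged_u16(tag: str, value: int, byteorder: str = "big") -> bytes:
    """Encode 'tag:' plus a 2-byte unsigned integer, then a NUL.

    The byte order is an assumption; the format would have to fix it
    (or carry another tag saying which one is in use).
    """
    fmt = ">H" if byteorder == "big" else "<H"
    return tag.encode("ascii") + b":" + struct.pack(fmt, value) + NUL

def obj(*parts: bytes) -> bytes:
    """Wrap already-encoded parts in a matching pair of braces."""
    return b"{" + b"".join(parts) + b"}"

# A cut-down disk object: name, sectoring type, and rotation speed.
descriptor = obj(
    tagged_string("Apple ]["),
    tagged_string("soft"),
    tagged_u16("rpm", 360),
)

# The same value, 360 (0x0168), in the two candidate byte orders.
rpm_big = struct.pack(">H", 360)     # bytes 0x01 0x68
rpm_little = struct.pack("<H", 360)  # bytes 0x68 0x01
```

The two-byte value reads differently depending on which order the writer
chose, which is why the order either has to be fixed once or recorded in
the archive itself.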
> a problem? The tags don't all need to be ASCII text, things like the
> sector size could be integers, and field lengths could be limited. I'd
> envisage something like nested objects (borrowing from Sellam's slightly
> later mail):
I don't like the idea of storing the actual sector data as text though.
I hadn't meant to imply that; I mean you could hexify it if you wanted, but
I don't see any need. Actually one of the things I was thinking of earlier
today, was Acorn's "DrawFile" format, which uses similar objects, but the
data is still binary (it's a computer program that reads the data, not a
human). If a human really did need to read it, you could always use a hex
editor.
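
For anyone who did want a readable form, hexifying is a few lines of
Python. This is only a sketch: the 256-byte sector is dummy data and the
hexify helper is a made-up name, not part of any proposed tool.

```python
# Stand-in for one 256-byte sector; real data would come from the archive.
sector = bytes(range(256))

def hexify(data: bytes, width: int = 16) -> str:
    """Return a simple hex dump: a 4-digit offset, then `width` bytes per line."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_bytes = " ".join(f"{b:02x}" for b in chunk)
        lines.append(f"{offset:04x}  {hex_bytes}")
    return "\n".join(lines)

dump = hexify(sector)

# Plain hex text with no offsets or spaces is exactly twice the binary size,
# and it converts back to the original bytes without loss.
as_text = sector.hex()
restored = bytes.fromhex(as_text)
```

The doubling in size is the cost of storing sector data as text, which is
the main reason to keep it binary in the archive and hexify only on demand.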
I guess in this day and age it doesn't matter much anymore, but when I was
growing up you had to make every byte count, and I know more than 95% of
us here can relate to that.
Yup, I was too, but I think here the benefits greatly outweigh the
disadvantage of extra storage requirement. We want this to be as useful as
possible, and the easier it is to use for unexpected formats (to create
*and* to read), the more it will get used.
> It also means that if the database is lost, damaged, incomplete or
> otherwise inaccessible, an archive can still be understood, and there's
> no chance of inconsistency because two people tried to add new formats
> at about the same time, or someone rolled their own.
I agree with that. Human readability is definitely a compelling advantage,
as is the elimination of the need for a centralized database of system
descriptions.
It would still be good to have a central repository. At the very least, it
would allow those who know where to look, to see what has already been
dealt with, and save a lot of design effort if the format they want is
already there. It would be the place to store the explanation of the tag
system. Plus, the bigger it gets, the more it will encourage others to
archive their treasures, too.
--
Pete
Peter Turnbull
Dept. of Computer Science
University of York