I am not assuming anything about the data. The usual use is to have files... in the case of a paper tape emulator system used for CNC, the disk structure may not resemble a normal file structure. It still contains one or more blocks of data. You can apply whatever name you want to that. The boot sector on a CP/M 8" disk doesn't have a name, but it is a block of data that is separate from everything else. Personally, I would want to be able to "read" the boot sector and potentially even write it back to an image file. It really doesn't make a difference whether you access track 0 and sector 1 or a data block inside the image file that contains the boot code.
best regards, Steve Thatcher
-----Original Message-----
From: "Dwight K. Elvey" <dwight.elvey(a)amd.com>
Sent: Aug 11, 2004 1:39 PM
To: melamy(a)earthlink.net, cctalk(a)classiccmp.org
Subject: RE: Let's develop an open-source media archive standard
>From: "Steve Thatcher" <melamy(a)earthlink.net>
>
>I agree with Sellam on the point about using it both for media re-creation and
>emulation. The trouble with the approach below of just using raw data on a
>track sector basis is that now you have created a file that can only be used
>with an emulator that understands the physical format and OS access for the
>computer system you are emulating. My earlier point of separating the data and
>the format information allows a single file (that would not be much bigger than
>the one described below) to contain multiple platform specific files that can be
>"read" by a simple utility that does not require any knowledge of the OS or the
>platform.
>
>best regards, Steve Thatcher
Hi Steve
You seem to be assuming that the particular disk you are
archiving has a file structure. This is not always the case.
Dwight
comments embedded below...
best regards, Steve Thatcher
-----Original Message-----
From: Hans Franke <Hans.Franke(a)siemens.com>
Sent: Aug 12, 2004 10:06 AM
To: Steve Thatcher <melamy(a)earthlink.net>,
"General Discussion: On-Topic and Off-Topic Posts" <cctalk(a)classiccmp.org>
Subject: RE: Let's develop an open-source media archive standard
On 11 Aug 2004 14:47, Steve Thatcher wrote:
> I realize that the idea is to create a format to make re-creation
> of media possible for a variety of platforms. We can certainly do
> that and have its only function be to maintain a physical data
> format. My added idea is that the data and the formatting be
> separated so that a simple utility on a non-target platform could
> extract data from the image file.
Well, physical format and data are not always separable. In some
circumstances the physical format is part of the information an
application needs, and differs from media to media (e.g. copy
protection schemes).
*** the separated data that I am talking about is what would be accessed through normal OS channels. Copy
protection schemes, special destroyed sectors, etc are not accessible through the OS.
> If we create a physical description only and do not abstract
> the data then any emulator must understand the OS file
> structure in order to retrieve any internal file representation.
> My idea would make the file re-creation simple in that the xml
> image file would be parsed for the actual file data that an
> emulator would need. This makes the emulator easier.
But what if the emulator needs the physical format information?
*** I have not proposed that the physical format information be excluded...
> To retrieve a file from the physical layout that is at the end
> of this message, the emulator must know the actual disk format
> that is used on the target system (the one the image file was
> made for). I have seen CP/M systems where the actual physical
> sectors were sequential on disk and the OS file sector was
> actually virtual to increase speed. Not my idea of the way to
> do it. It is much easier to make the physical sectors slewed
> so that a physical sector is a file sector. These are the types
> of issues you will have to overcome if an emulator must totally
> understand each and every file system for a cp/m version for
> example.
At least within a CP/M system it usually doesn't matter at all
how the files are stored on a disk. Except for some odd apps
that tried to implement system specific copy protection schemes,
each and every CP/M app accesses files just via BDOS, which already
hides the real disk structure.
*** talk to other people here and one of their arguments for keeping things all together was being able to
access track and sector directly. As for BDOS, that is fine, but keep in mind that BDOS was customized for nearly every CP/M platform
that would be fine. Your <archivetype> is what <media> was all about though i.e. rom definition etc. The archive really doesn't have a type, it does however contain media information and then the data to be accessed or put onto the defined media.
best regards, Steve Thatcher
-----Original Message-----
From: Jason McBrien <jbmcb(a)hotmail.com>
Sent: Aug 12, 2004 10:04 AM
To: Steve Thatcher <melamy(a)earthlink.net>,
"General Discussion: On-Topic and Off-Topic Posts" <cctalk(a)classiccmp.org>
Subject: Re: archive file format exmaple
How about breaking up the sections a bit, like HTML <head><body> sections?
Putting card-catalog information up top in its own section would speed
indexing and searches. Like this:
<archive image file version="1.0">
<catalog>
<definitions>
a place to put whatever relevant information is needed about the format or
archive
or your favorite poem or joke
</definitions>
<archiver>Pete Smith</archiver>
<method>ROM Dumper V4.12</method>
<recdate>02004-02-02</recdate>
<title>Omega Race</title>
<author>Midway</author>
<platform>
<mfg>Commodore</mfg>
<system>VIC-20</system>
</platform>
<archivetype>
<fmtcategory>ROM</fmtcategory>
<fmttype>Game Cartridge</fmttype>
</archivetype>
</catalog>
<format>
<default-sector-data value="$AA"/>
....
</format>
<data>
....etc
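A minimal sketch of why a leading catalog section could help indexing: a streaming parser can return as soon as the catalog closes, without looking at the format or data sections. The root tag is simplified to `<archive version="1.0">` so the sample is well-formed XML, and `read_catalog` is a hypothetical helper name, not something proposed in this thread.

```python
import io
import xml.etree.ElementTree as ET

# Well-formed variant of the example above (root tag simplified; an assumption).
SAMPLE = b"""<archive version="1.0">
  <catalog>
    <archiver>Pete Smith</archiver>
    <title>Omega Race</title>
    <platform><mfg>Commodore</mfg><system>VIC-20</system></platform>
  </catalog>
  <data>....</data>
</archive>"""

def read_catalog(stream):
    """Return the leaf fields of <catalog>, stopping once it has been parsed."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "catalog":
            # Keep only leaf elements (no children) as tag -> text pairs.
            return {e.tag: (e.text or "").strip()
                    for e in elem.iter() if len(e) == 0}
    return {}

catalog = read_catalog(io.BytesIO(SAMPLE))
print(catalog["title"], "/", catalog["system"])
```

An indexer built this way never needs to decode the media data itself.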
I realize that the idea is to create a format to make re-creation of media possible for a variety of platforms. We can certainly do that and have its only function be to maintain a physical data format. My added idea is that the data and the formatting be separated so that a simple utility on a non-target platform could extract data from the image file.
If we create a physical description only and do not abstract the data then any emulator must understand the OS file structure in order to retrieve any internal file representation. My idea would make the file re-creation simple in that the xml image file would be parsed for the actual file data that an emulator would need. This makes the emulator easier.
To retrieve a file from the physical layout that is at the end of this message, the emulator must know the actual disk format that is used on the target system (the one the image file was made for). I have seen CP/M systems where the actual physical sectors were sequential on disk and the OS file sector was actually virtual to increase speed. Not my idea of the way to do it. It is much easier to make the physical sectors slewed so that a physical sector is a file sector. These are the types of issues you will have to overcome if an emulator must totally understand each and every file system for a CP/M version for example.
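To illustrate the logical-to-physical translation an emulator would have to know about, here is a rough sketch that builds a sector translate table. The 26-sector track with a skew factor of 6 is the commonly cited layout for 8" single-density CP/M disks and is used only as an example; other systems used other tables, and `skew_table` is a hypothetical name.

```python
def skew_table(sectors, skew):
    """Map logical sector order to 1-based physical sector numbers."""
    table = []
    used = set()
    slot = 0
    for _ in range(sectors):
        # If the slot is already taken, slide forward to the next free one.
        while slot in used:
            slot = (slot + 1) % sectors
        table.append(slot + 1)
        used.add(slot)
        slot = (slot + skew) % sectors
    return table

xlt = skew_table(26, 6)
print(xlt)
# Logical sector n (0-based) of a track lives in physical sector xlt[n].
```

A tool that only has the raw track/sector dump needs this table (or its equivalent) before it can read a file in order, which is the burden described above.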
Sellam, let me know if you would like to discuss this via telephone so I can convey the idea that I am proposing.
best regards, Steve Thatcher
-----Original Message-----
From: Vintage Computer Festival <vcf(a)siconic.com>
Sent: Aug 11, 2004 3:00 PM
To: Steve Thatcher <melamy(a)earthlink.net>,
"General Discussion: On-Topic and Off-Topic Posts" <cctalk(a)classiccmp.org>
Subject: RE: Let's develop an open-source media archive standard
On Wed, 11 Aug 2004, Steve Thatcher wrote:
> I agree with Sellam on the point about using it both for media
> re-creation and emulation. The trouble with the approach below of just
> using raw data on a track sector basis is that now you have created a
> file that can only be used with an emulator that understands the
> physical format and OS access for the computer system you are emulating.
That is the point, really. What we are attempting to do is describe as
faithfully as possible a physical media with logical data in a purely
logical form. The goal would be that the physical media could be
re-created from the imagefile if need be. The parameters of the physical
media are specified so that this can be possible.
> My earlier point of separating the data and the format information
> allows a single file (that would not be much bigger than the one
> described below) to contain multiple platform specific files that can be
> "read" by a simple utility that does not require any knowledge of the OS
> or the platform.
I'm not quite understanding you here. Or maybe I am. An image in the
format shown below could be read by any emulator. Making sense of the
data with respect to that emulator is a different issue altogether, but it
does make it possible for, say, a Northstar Horizon emulator to load up an
Apple ][ disk image and then try to access it.
Anyway, I don't think I am quite getting the point you are trying to make.
> <MEDIA TYPE=FLOPPY SIZE=5.25 SIDES=1 DENSITY=SINGLE FORMAT=GCR TRACKS=35
> SECTORS=16 SECTORSIZE=256>
>
> <VOLUME>Apple ][ System Disk</VOLUME>
>
> </MEDIA>
>
>
> <DATA>
> <TRACK 0><SECTOR 0>
>
> HERE WOULD BE THE ASCII HEX DATA FOR TRACK 0, SECTOR 0
>
> </SECTOR></TRACK>
>
> ...
>
> <TRACK 34><SECTOR 15>
>
> HERE WOULD BE THE ASCII HEX DATA FOR TRACK 34, SECTOR 15
>
> </SECTOR></TRACK>
> </DATA>
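A small sketch of how a tool might turn the ASCII hex sectors of the quoted example into a flat binary image. The geometry comes from the MEDIA tag above; the simple linear track-major ordering and the helper names are assumptions for illustration only.

```python
TRACKS, SECTORS, SECTOR_SIZE = 35, 16, 256  # values from the MEDIA tag above

def sector_offset(track, sector):
    """Byte offset of a sector in a linear, track-major image (assumed order)."""
    if not (0 <= track < TRACKS and 0 <= sector < SECTORS):
        raise ValueError("track/sector out of range")
    return (track * SECTORS + sector) * SECTOR_SIZE

def store_sector(image, track, sector, hex_text):
    """Decode one sector of ASCII hex and copy it into the image buffer."""
    data = bytes.fromhex("".join(hex_text.split()))  # drop all whitespace first
    if len(data) != SECTOR_SIZE:
        raise ValueError("sector data has the wrong length")
    start = sector_offset(track, sector)
    image[start:start + SECTOR_SIZE] = data

image = bytearray(TRACKS * SECTORS * SECTOR_SIZE)
store_sector(image, 34, 15, "AA" * SECTOR_SIZE)
print(len(image), sector_offset(34, 15))
```

Nothing here interprets the contents; making sense of the data is still left to the emulator.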
--
Sellam Ismail Vintage Computer Festival
------------------------------------------------------------------------------
International Man of Intrigue and Danger http://www.vintage.org
[ Old computing resources for business || Buy/Sell/Trade Vintage Computers ]
[ and academia at www.VintageTech.com || at http://marketplace.vintage.org ]
The following is an XML definition I did some 3+ years ago
during a discussion, here on classiccomp (back when there
was one list *G*), as an example of what an XML storage format
good for everything from punch cards to CDs could look like.
This example shows two CCDD structures, one showing an
IBMish tape, the other a disk for a popular 8 Bit micro.
(Back then nobody came up with the system's name :).
Someone asked recently how to handle multiple XML within
one file ... well, that's exactly the way it works :)
Regards
H.
CCDD stands for Classic Computer Device Data
-------------------------
<?xml version="1.0" standalone='yes' ?>
<!DOCTYPE CCDD [
<!ELEMENT CCDD (VORSPANN?, META?, (CHANNEL* | DEVICE* | MEDIA))>
<!ELEMENT VORSPANN (#PCDATA)>
<!ELEMENT META (#PCDATA | SYSTEM | OS)*>
<!ELEMENT SYSTEM (#PCDATA)>
<!ELEMENT OS (#PCDATA)>
<!ELEMENT CHANNEL (META?, DEVICE*)>
<!ELEMENT DEVICE (META?, MEDIA*)>
<!ELEMENT MEDIA (META?, (RAW | HEAD*) )>
<!ELEMENT HEAD (RAW | TRACK*)>
<!ELEMENT TRACK (RAW | BLOCK*)>
<!ELEMENT BLOCK (RAW | DATA*)>
<!ELEMENT DATA (#PCDATA)>
<!ELEMENT RAW (#PCDATA)>
<!ATTLIST CHANNEL
ID ID #IMPLIED>
<!ATTLIST DEVICE
ID ID #IMPLIED>
<!ATTLIST MEDIA
ID ID #IMPLIED
LFD CDATA #IMPLIED
SIZE CDATA #IMPLIED
FILLER CDATA #IMPLIED
FORMAT CDATA #IMPLIED>
<!ATTLIST HEAD
LFD CDATA #IMPLIED
SIZE CDATA #IMPLIED
FILLER CDATA #IMPLIED
FORMAT CDATA #IMPLIED>
<!ATTLIST TRACK
LFD CDATA #IMPLIED
SIZE CDATA #IMPLIED
FILLER CDATA #IMPLIED
FORMAT CDATA #IMPLIED>
<!ATTLIST BLOCK
LFD CDATA #IMPLIED
SIZE CDATA #IMPLIED
FILLER CDATA #IMPLIED
TYPE (DATA|HEADER|UNDEF) "DATA">
<!ATTLIST DATA
SIZE CDATA #IMPLIED
FILLER CDATA #IMPLIED
ENCODING (CHAR|BIN|SED|INTEL|MOT) "SED">
<!ATTLIST RAW
SIZE CDATA #IMPLIED
FILLER CDATA #IMPLIED
CONTENT (DATA|PHYSICAL) "DATA"
ENCODING (CHAR|BIN|SED|INTEL|MOT) "SED">
]>
<CCDD>
<META>
Example for a tape mounted on Drive D0 on Channel 1.
</META>
<CHANNEL ID="C_1">
<META>
Standard type 1 channel
</META>
<DEVICE ID="D_D0">
<META>
T9G (6250bpi)
</META>
<MEDIA LFD="0" FORMAT="T6250">
<META>
First Tape in Device
</META>
<HEAD LFD="0" SIZE="36" FILLER="00">
<TRACK LFD="16" SIZE="16" FILLER="00">
<BLOCK TYPE="HEADER">
<DATA SIZE="80" ENCODING="CHAR" FILLER=" ">VOL1TAPE001 BS2000 TSOS 4</DATA>
</BLOCK>
<BLOCK TYPE="HEADER">
<DATA SIZE="80" ENCODING="CHAR" FILLER=" ">UVL1PRIVATE LABEL</DATA>
</BLOCK>
<BLOCK TYPE="HEADER">
<DATA SIZE="80" ENCODING="CHAR" FILLER=" ">HDR1FILE1 00010001000100000102000102 000000BS2000</DATA>
</BLOCK>
<BLOCK TYPE="HEADER">
<DATA SIZE="80" ENCODING="CHAR" FILLER=" ">HDR2U020480204841 00</DATA>
</BLOCK>
<BLOCK TYPE="HEADER">
<DATA SIZE="80" ENCODING="CHAR" FILLER=" ">HDR3TSOS COMPLETE.FILE.NAME.OF.FILE1 0</DATA>
</BLOCK>
<BLOCK TYPE="DATA">
<DATA SIZE="2048" ENCODING="CHAR" FILLER="�">NO REAL DATA INSIDE THIS BLOCK</DATA>
</BLOCK>
<BLOCK TYPE="HEADER">
<DATA SIZE="80" ENCODING="CHAR" FILLER=" ">EOF1FILE1 00010001000100000102000102 000001BS2000</DATA>
</BLOCK>
<BLOCK TYPE="HEADER">
<DATA SIZE="80" ENCODING="CHAR" FILLER=" ">EOF2U020480204841 00</DATA>
</BLOCK>
<BLOCK TYPE="HEADER">
<DATA SIZE="80" ENCODING="CHAR" FILLER=" ">EOF3TSOS COMPLETE.FILE.NAME.OF.FILE1 0</DATA>
</BLOCK>
</TRACK>
</HEAD>
</MEDIA>
</DEVICE>
</CHANNEL>
</CCDD>
<CCDD>
<META>
This is another CCDD File for a FD of
<SYSTEM>XXXXX</SYSTEM> running under <OS>yyyy</OS>.
</META>
<MEDIA LFD="0" SIZE="2" FORMAT="GCR">
<META>
Floppy disk for xxxxxx
</META>
<HEAD LFD="0" SIZE="36" FILLER="00">
<TRACK LFD="16" SIZE="16" FILLER="00">
<BLOCK LFD="14">
<DATA SIZE="256" ENCODING="SED" FILLER="00">
</DATA>
</BLOCK>
<BLOCK LFD="15">
<DATA SIZE="256" ENCODING="SED" FILLER="00">
000000000000000000000000100E
</DATA>
</BLOCK>
</TRACK>
<TRACK LFD="17" SIZE="16" FILLER="00">
<BLOCK LFD="0">
<DATA SIZE="256" ENCODING="SED" FILLER="00">
04110F030000FE000000000000000000
00000000000000000000000000000000
000000000000007A0000000000000000
23010000231001000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
000000000000000380010000
</DATA>
</BLOCK>
<BLOCK LFD="15">
<DATA SIZE="256" ENCODING="SED" FILLER="00">
0000000000000000000000100F02C8C5
CCCCCFA0A0A0A0A0A0A0A0A0A0A0A0A0
A0A0A0A0A0A0A0A0A0A0A00001
</DATA>
</BLOCK>
</TRACK>
</HEAD>
</MEDIA>
</CCDD>
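A hedged sketch of how a reader might expand one DATA element from the example above to its full SIZE. It assumes that ENCODING="SED" means pairs of hex digits and that FILLER pads the end of a short element; neither is spelled out in the definition, so treat both as guesses. `expand_data` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def expand_data(elem):
    """Decode a CCDD <DATA> element and pad it to SIZE bytes with FILLER."""
    size = int(elem.get("SIZE"))
    encoding = elem.get("ENCODING", "SED")  # "SED" is the DTD default
    text = elem.text or ""
    if encoding == "SED":      # assumption: hex digit pairs
        payload = bytes.fromhex("".join(text.split()))
        filler = bytes.fromhex(elem.get("FILLER", "00"))
    elif encoding == "CHAR":   # plain characters, e.g. tape labels
        payload = text.encode("ascii")
        filler = elem.get("FILLER", " ").encode("ascii")
    else:
        raise ValueError("encoding not handled in this sketch: " + encoding)
    if len(filler) != 1 or len(payload) > size:
        raise ValueError("bad FILLER or oversized payload")
    return payload + filler * (size - len(payload))

sample = ET.fromstring(
    '<DATA SIZE="256" ENCODING="SED" FILLER="00">\n'
    '000000000000000000000000100E\n'
    '</DATA>')
block = expand_data(sample)
print(len(block), block[:14].hex())
```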
--
VCF Europa 6.0 on 30 April and 1 May 2005 in Munich
http://www.vcfe.org/
The keyword arrangement was just a waking thought this morning and is not cast in codecrete.
Required fields are definitely a good thing as long as the info is really necessary.
The only data that was in the media section was data that was specific to the media, such as fill bytes, address marks, etc.
Also, I kept a sub block arrangement because a single datablock was supposed to represent a complete data entity such as a file, a boot routine, or the OS itself, for example. In order to retrieve any block of data, one only had to find a "datablock" and then concatenate the dataitems.
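A minimal sketch of that lookup, assuming element names `datablock` and `dataitem`, a `name` attribute, and hex-encoded item text. All of these are placeholders for illustration; the actual keywords were not settled.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; tag and attribute names are assumptions.
SAMPLE = """<archive>
  <datablock name="boot">
    <dataitem>C3005C</dataitem>
    <dataitem>0000FF</dataitem>
  </datablock>
</archive>"""

def read_datablock(root, name):
    """Find the named datablock and concatenate its dataitems in order."""
    for block in root.iter("datablock"):
        if block.get("name") == name:
            return b"".join(
                bytes.fromhex("".join((item.text or "").split()))
                for item in block.iter("dataitem"))
    raise KeyError(name)

boot = read_datablock(ET.fromstring(SAMPLE), "boot")
print(boot.hex())
```

Note that the utility needs no knowledge of the target OS or disk layout to pull the block out.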
best regards, Steve Thatcher
-----Original Message-----
From: Jules Richardson <julesrichardsonuk(a)yahoo.co.uk>
Sent: Aug 12, 2004 7:50 AM
To: cctalk(a)classiccmp.org
Subject: Re: archive file format exmaple
On Thu, 2004-08-12 at 11:11, Steve Thatcher wrote:
> here is a "crude" example of what I was talking about.
> [snip]
That looks good. How's about moving things like author into the
definitions section? I'm not sure that it belongs at the top level, but
more in the section containing archive info (description, creation date
etc.)
I'd suggest making certain fields mandatory in what you've called the
definitions section. People tend to be lazy, which means the temptation
is there to create an archive and not bother to include much of the
information, thinking that they'll know what it is a few years down the
line. We've all been caught out by that one! (author, date, description
are good candidates for mandatory fields; there'll be others too)
I'm not so sure about having the actual data within the archive element;
I'd rather have it afterwards. Data in your datamap section still points
to chunks of data, using whatever scheme is understood by the
storage/compression system used by the archive. But then the possibility
is there for using an encoding method that doesn't mix with XML data if
needs be. It also makes it easy to scan the archive section by eye as
it's not mixed up with huge chunks of encoded media data. Actually, you
could probably build collections of media archives (say for some common
platform) all within the same file that way, without breaking the format
- the data in the datamap section of the first archive section just
points to other archives within the same file. That's kinda neat.
eg. (simplifying a little for example purposes):
<archive image file version="1.0">
<definitions>
... as before ...
</definitions>
<media>
... as before ...
</media>
<datamap>
<head physical="0">
<track physical="0">
<sector logical="1" physical="4" id="DB1" />
<sector logical="2" physical="8" id="DB2" />
...
</track>
</head>
</datamap>
</archive>
<datablock id="DB1" type="boot">
....
</datablock>
<datablock id="DB2" type="boot">
....
</datablock>
.. that way anything following the archive section doesn't have to be
XML data even, providing the definition within the datamap section for
the encoding/compression scheme used by the archive can reach it. It
might be zipped data, other archive definitions, whatever.
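As a rough sketch of how a reader could use such a datamap, the following builds a lookup from (head, track, logical sector) to the physical sector and the id of the data chunk. The root tag is simplified so the fragment is well-formed XML, and `build_datamap` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Well-formed variant of the archive section sketched above (an assumption).
ARCHIVE = """<archive version="1.0">
  <datamap>
    <head physical="0">
      <track physical="0">
        <sector logical="1" physical="4" id="DB1" />
        <sector logical="2" physical="8" id="DB2" />
      </track>
    </head>
  </datamap>
</archive>"""

def build_datamap(root):
    """Return {(head, track, logical sector): (physical sector, block id)}."""
    table = {}
    for head in root.iter("head"):
        for track in head.iter("track"):
            for sector in track.iter("sector"):
                key = (int(head.get("physical")),
                       int(track.get("physical")),
                       int(sector.get("logical")))
                table[key] = (int(sector.get("physical")), sector.get("id"))
    return table

datamap = build_datamap(ET.fromstring(ARCHIVE))
print(datamap[(0, 0, 1)])
```

How the id is then resolved (an XML element, an offset into zipped data, another archive) is left to whatever storage scheme the archive declares.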
cheers,
Jules
I haven't read the entire thread on this but I did read Steve Thatcher's
idea and it describes about where I was coming out on this myself.
I might have missed what the ultimate use of this archive would be. Will the
archive be used to (1) re-generate original media; (2) operate with
emulators; (3) both?
To ensure integrity of the data I would propose recording the data in the
Intel Hex format -- it's text-based and has a built-in checksum. Now, we'd have to
modify the standard format a bit to accommodate a larger address space and
to add some sort of standardized header (a "Hardware Descriptor"). This data
would be used by the de-archiver to interpret the stream of data read from
the data area (the "Hex Block").
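For reference, a short sketch of the per-record check in standard Intel HEX: each record ends in one byte that is the two's complement of the sum of the preceding record bytes. The extended addressing and "Hardware Descriptor" header proposed above are not modelled, and the helper names are made up for this example.

```python
def ihex_checksum(body: bytes) -> int:
    """Two's complement of the byte sum, as used by Intel HEX records."""
    return (-sum(body)) & 0xFF

def make_record(address: int, rectype: int, data: bytes) -> str:
    """Build one record: ':' + length + 16-bit address + type + data + checksum."""
    body = bytes([len(data), (address >> 8) & 0xFF, address & 0xFF, rectype]) + data
    return ":" + (body + bytes([ihex_checksum(body)])).hex().upper()

def verify_record(line: str) -> bool:
    """True if the length field matches and all bytes sum to zero mod 256."""
    if not line.startswith(":"):
        return False
    raw = bytes.fromhex(line[1:])
    return len(raw) >= 5 and raw[0] == len(raw) - 5 and sum(raw) & 0xFF == 0

record = make_record(0x0100, 0x00, bytes.fromhex("214601360121470136007EFE09D21901"))
print(record, verify_record(record))
```

Because the address field is only 16 bits wide, anything larger than 64 KB needs the extended record types or the kind of modification suggested above.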
I agree that a multi-layer approach offers the best combination of platform
neutrality and portability. I don't really know if we need two or three
layers as Steve described to describe the file in a standard fashion. Using
an Intel Hex-like format would increase the "de-archiving" time, but in my
view it's a fair trade-off. De-archiving software could translate the
platform-neutral file into another format better suited for use in
emulators.
I think that we should start compiling a list of the various media we want
represented and how that media is organized natively. I don't mean "well, it
has blocks and sectors" either. We should examine the exact format down to
the actual numbers (i.e., "2048 blocks of 256 bytes recorded twice"). Seeing
how the various data stores are organized should bring some clarity to how
we should represent it.
Just my $0.02.
Rich
Second try
------- Forwarded message -------
From: Hans Franke <Hans.Franke(a)mch20.sbs.de>
To: "General Discussion: On-Topic and Off-Topic Posts" <cctalk(a)classiccmp.org>
Subject: Re: Let's develop an open-source media archive standard
Date: Wed, 11 Aug 2004 15:16:36 +0200
On 11 Aug 2004 12:08, Jules Richardson wrote:
> On Wed, 2004-08-11 at 10:50, Steve Thatcher wrote:
> > Hi all, after reading all this morning's posts, I thought I would throw out some thoughts.
> > XML as a readable format is a great idea.
> I haven't done any serious playing with XML in the last couple of years,
> but back when I did, my experience was that XML is not a good format for
> mixing human-readable and binary data within the XML structure itself.
Only if you intend to keep it 100% human readable.
> To make matters worse, the XML spec (at least at the time) did not
> define whether it was possible to pass several XML documents down the
> same data stream (or, as we'd likely need for this, XML documents mixed
> with raw binary). Typically, parsers of the day expected to take control
> of the data stream and expected it to contain one XML document only -
> often closing the stream themselves afterwards.
Now, that's a feature of the reading application. XML does not
state what happens next since this is outside its scope. It is
perfectly OK to look for the next document, or the next start
tag of the same document type, or for whatever.
> I did end up writing my own parser in a couple of KB of code which was a
> little more flexible in data stream handling (so XML's certainly not a
> heavyweight format, and could likely be handled on pretty much any
> machine), but it would be nice to make use of off-the-shelf parsers for
> platforms that have them where possible.
Right, but especially when we're coming down to classic platforms,
such building blocks are not always usable, and in general way
oversized. On a 48K Apple (or a 64K 4 MHz CP/M machine) we don't
have the space to just port a C app that has 'only' 100K of code
size. So reader/writer applications for the original environment
have to be small and type-specific.
> As you've also said, my initial thought for a data format was to keep
> human-readable config separate from binary data. The human-readable
> config would contain a table of lengths/offsets for the binary data
> giving the actual definition. This does have the advantage that if the
> binary data happens to be a linear sequence of blocks (sectors in the
> case of a disk image) then the raw image can easily be extracted if
> needs be (say, to allow conversion to a different format)
Well, that is only true if you define binary data as 8 Bit and
all means of transport as 100% transparent. But this hasn't
worked that way in the past, and I doubt that we will be safe
from changes in the future.
As for the character size: we had in the past everything from
6 to 12 Bit (ok, I can't remember 11 Bit characters/words) as
'binary' characters. Of course 6, 7 and 8 Bit Bytes can easily be
stored in an 8 Bit Byte, but what about 9 Bit (Bull) or 12 (DEC)?
At that point you already have to incorporate special
transformation rules which are not necessarily transparent.
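One possible transformation rule, sketched for 12-bit words: pack two words into three 8-bit bytes. This particular packing is just an assumption for illustration (real systems used several different conventions), which is exactly why such a rule has to be written down in the format.

```python
def pack12(words):
    """Pack 12-bit words into bytes, two words per three bytes (zero padded)."""
    padded = list(words) + [0] * (len(words) % 2)
    out = bytearray()
    for a, b in zip(padded[0::2], padded[1::2]):
        if not (0 <= a <= 0xFFF and 0 <= b <= 0xFFF):
            raise ValueError("word does not fit in 12 bits")
        out += bytes([a >> 4, ((a & 0xF) << 4) | (b >> 8), b & 0xFF])
    return bytes(out)

def unpack12(data, count):
    """Inverse of pack12; the word count must be stored separately."""
    words = []
    for i in range(0, len(data), 3):
        x, y, z = data[i:i + 3]
        words.append((x << 4) | (y >> 4))
        words.append(((y & 0xF) << 8) | z)
    return words[:count]

original = [0o7777, 0o0001, 0o1234]
packed = pack12(original)
print(packed.hex(), unpack12(packed, len(original)) == original)
```

The byte stream alone no longer says how many words it holds or how they were split, so it is not transparent without the rule and the count.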
Also for the requirement of a transparent transport: When
transferring files between different architectures we usually
have code or even format conversions. The most notable code
conversion would be, for example, ISO 8859-1 <-> EBCDIC, which
totally destroys the 'binary' part. Or take format conversions
as done on the way between Unix style files and (Win-)DOS, LF
vs CR/LF. Whenever we leave the A-Z and 0-9 range we are
likely to encounter such problems.
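A small demonstration of the point, using Python's `cp037` codec as a stand-in for one EBCDIC code page (the choice of code page is an assumption). Raw bytes that pass through a text conversion come out changed, while a hex representation restricted to 0-9 and A-F survives the same trip.

```python
payload = b"A1\n"  # three "binary" bytes that happen to look like text

# A transfer that treats the file as ISO 8859-1 text and converts it to
# EBCDIC rewrites the byte values.
as_ebcdic = payload.decode("latin-1").encode("cp037")
print(payload.hex(), "->", as_ebcdic.hex())

# A LF -> CR/LF conversion changes even the length.
as_dos = payload.replace(b"\n", b"\r\n")
print(len(payload), "->", len(as_dos))

# The same payload written as hex text survives the code conversion,
# because the reader on the EBCDIC side sees the characters 0-9 and A-F.
hex_text = payload.hex().upper()
on_ebcdic_host = hex_text.encode("cp037")
recovered = bytes.fromhex(on_ebcdic_host.decode("cp037"))
print(recovered == payload)
```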
Sure, one could code an app capable of reading ASCII/Binary on
an EBCDIC Machine and vice versa, but in my experience (programming
for 25 years in mixed environments) it's not
only a boring job, but also one of the most
error-prone.
Any kind of standard format must be truly machine independent.
Thus it must (at least when using the recommended representation)
be transferable across all platforms imaginable.
> > I looked at the CAPS format and in part that would be okay. I would like
> > to throw in an idea of whatever we create as a standard actually have
> > three sections to it.
> So, first section is all the 'fuzzy' data (author, date, version info,
> description etc.), second section describes the layout of the binary
> data (offsets, surfaces, etc.), and the third section is the raw binary
> data itself? If so, I'm certainly happy with that :-)
I would rather go for an annotated format, where more detailed
information can be added at any point, and not necessarily
in certain sections. Especially since the 'fuzzy' data is
usually not needed for the job itself.
> One aside - what's the natural way of defining data on a GCR floppy? Do
> heads/sectors/tracks still make sense as an addressing mode, but it's
> just that the number of sectors per track varies according to the track
> number? Or isn't it that simple?
Well, that's already outside of what a standard definition
can define without doubt.
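As one concrete case of the zoned layout Jules asks about, the Commodore 1541 format is a well-known GCR disk where track/sector addressing still works but the sector count depends on the track. The sketch below only shows the addressing arithmetic; the helper names are hypothetical, and other GCR formats (for example the fixed 16 sectors in the Apple example earlier in this thread) differ.

```python
# (first track, last track, sectors per track) for the 1541's 35 tracks.
ZONES = [(1, 17, 21), (18, 24, 19), (25, 30, 18), (31, 35, 17)]

def sectors_in_track(track):
    """Number of sectors on a 1-based track number."""
    for first, last, count in ZONES:
        if first <= track <= last:
            return count
    raise ValueError("track out of range")

def block_index(track, sector):
    """Linear block number of (track, sector), counting from track 1."""
    if not 0 <= sector < sectors_in_track(track):
        raise ValueError("sector out of range for this track")
    return sum(sectors_in_track(t) for t in range(1, track)) + sector

total = sum(sectors_in_track(t) for t in range(1, 36))
print(total, block_index(18, 0))
```

A format that records the sector count per track, rather than once per medium, can describe this kind of disk without knowing anything else about it.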
To my understanding, interpretation of data is always part
of a real application. As soon as it touches machine or
format specific implementation details a standard may only
give guidelines on how to store them properly, but not how to
interpret them. That's part of an actual reader implementation.
And each reader will of course only understand the parts it's
made for - e.g. an Apple DOS 3.3 reader will have no idea
what a tape label for an IBM tape is, not to mention be able
to differentiate between the various header types.
Reader/Writer apps will always be as specific as they are
right now, when handling a proprietary format. The big
advantage is that intermediate tools, like archiving,
indexing, etc. can be shared. Well, in fact it's the
only advantage, except the fact that one doesn't have to
figure out a new format each time, and the simple format
does allow the ad hoc inclusion of new machines/systems.
Regards
H.
--- End of forwarded message ---
--
VCF Europa 6.0 on 30 April and 1 May 2005 in Munich
http://www.vcfe.org/
>From: "Fred Cisin" <cisin(a)xenosoft.com>
>
>truly demented idea:
>build a hardware device that can read the image file, connects
>via 34 or 50 pin cable to an FDC, and that produces pulses
>that look like disk data to the FDC.
>
>
>
Hi
It has already been done. I can look up the web
page if you like.
Dwight