How would one actually go about regenerating the original media from
the metafile? Do we contemplate connecting some future computer's I/O port
to a 34-pin ribbon cable connected to a 1980's vintage floppy drive? At some
point in this process we're going to have to make some detailed assumptions
on how the metadata will be used 50 or 100 years from now.
Also, the metafile not only has to include information about the
"user" data areas of the disk but also the system areas (the stuff written
to the media by the controller -- address marks, gaps, sync bits, etc.).
This would require us to not only compile the general media format
data but also data on the controller used to generate the media (chip specs,
data gleaned from examining the "format" programs used, etc.).
The reason why I ask is that somehow we're going to have to test the
archival/restoration process to see that it works. It's like making tape
backups but never testing them with a restore.
This might be obvious, but I've been accused of stating that before
:-)
Rich
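One way to make the restore test above concrete is a round-trip check: archive the sectors, restore them from the archive text, and compare the result with the original. The sketch below is only an illustration. The one-line-per-sector text layout and the function names are assumptions made for this example, and the CRC is the common CRC-16/CCITT variant (polynomial 0x1021, initial value 0xFFFF), not necessarily what any given controller wrote to disk.

```python
def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """CRC-16/CCITT-FALSE: polynomial 0x1021, MSB first, no final XOR."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def archive_sectors(sectors: dict) -> str:
    """Write each sector as 'track head sector crc hexdata' (hypothetical layout)."""
    lines = []
    for (track, head, sector), data in sorted(sectors.items()):
        lines.append(
            f"{track} {head} {sector} {crc16_ccitt(data):04X} {data.hex().upper()}"
        )
    return "\n".join(lines)


def restore_sectors(text: str) -> dict:
    """Parse the archive text back into sectors, rejecting any CRC mismatch."""
    sectors = {}
    for line in text.splitlines():
        track, head, sector, crc, hexdata = line.split()
        data = bytes.fromhex(hexdata)
        if crc16_ccitt(data) != int(crc, 16):
            raise ValueError(f"CRC mismatch at track {track}, sector {sector}")
        sectors[(int(track), int(head), int(sector))] = data
    return sectors


# Made-up sector contents, keyed by (track, head, sector).
original = {(0, 0, 1): bytes(range(256)), (0, 0, 2): b"\xE5" * 256}
restored = restore_sectors(archive_sectors(original))
print("round trip ok:", restored == original)
```

A check like this only proves the archive format loses nothing; it says nothing about whether the bytes can be written back to real media, which is the harder half of the question.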
-----Original Message-----
From: cctalk-bounces(a)classiccmp.org
[mailto:cctalk-bounces@classiccmp.org]On Behalf Of Steve Thatcher
Sent: Thursday, August 12, 2004 10:17 AM
To: General Discussion: On-Topic and Off-Topic Posts
Subject: RE: Let's develop an open-source media archive standard
comments embedded below...
best regards, Steve Thatcher
-----Original Message-----
From: Hans Franke <Hans.Franke(a)siemens.com>
Sent: Aug 12, 2004 10:06 AM
To: Steve Thatcher <melamy(a)earthlink.net>,
"General Discussion: On-Topic and Off-Topic Posts"
<cctalk(a)classiccmp.org>
Subject: RE: Let's develop an open-source media archive standard
Am 11 Aug 2004 14:47 meinte Steve Thatcher:
> I realize that the idea is to create a format to make re-creation
> of media possible for a variety of platforms. We can certainly do
> that and have its only function be to maintain a physical data
> format. My added idea is that the data and the formatting be
> separated so that a simple utility on a non-target platform could
> extract data from the image file.
Well, physical format and data are not always separable. In some
circumstances the physical format is part of the information an
application needs, and differs from media to media (e.g. copy
protection schemes).
*** the separated data that I am talking about is what would be accessed
through normal OS channels. Copy protection schemes, special destroyed
sectors, etc. are not accessible through the OS.
> If we create a physical description only and do not abstract
> the data then any emulator must understand the OS file
> structure in order to retrieve any internal file representation.
> My idea would make the file re-creation simple in that the xml
> image file would be parsed for the actual file data that an
> emulator would need. This makes the emulator easier.
But what if the emulator needs the physical format information?
*** I have not proposed that the physical format information be excluded...
> To retrieve a file from the physical layout that it at the end
> of this message, the emulator must know the actual disk format
> that is used on the target system (the one the image file was
> made for). I have seen cp/m systems where the actual physical
> sectors were sequential on disk and the OS file sector was
> actually virtual to increase speed. Not my idea of the way to
> do it. It is much easier to make the physical sectors slewed
> so that a physical sector is a file sector. These are the types
> of issues you will have to overcome if an emulator must totally
> understand each and every file system for a cp/m version for
> example.
At least within a CP/M system it usually doesn't matter at all
how the files are stored on a disk. Except for some odd apps
that tried to implement system-specific copy protection schemes,
each and every CP/M app accesses files just via BDOS, which already
hides the real disk structure.
*** talk to other people here and one of their arguments for keeping things
all together was being able to access track and sector directly. As for
BDOS, that is fine, but keep in mind that BDOS was customized for nearly
every CP/M platform.
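To illustrate the sector skew discussed above: with a skewed (interleaved) layout, consecutive logical sectors sit several physical slots apart, so anything reading the raw track needs the table to follow the data in order. A minimal sketch, assuming a simple fixed skew; the function name and the 1-based numbering are choices made for this example and do not reproduce any particular CP/M system's table.

```python
def interleave_table(sectors_per_track: int, skew: int) -> list:
    """Return the logical sector number stored in each physical slot.

    Logical sectors are placed every `skew` slots; if the target slot is
    already taken, the next free slot is used instead.
    """
    slots = [None] * sectors_per_track
    pos = 0
    for logical in range(1, sectors_per_track + 1):
        while slots[pos] is not None:
            pos = (pos + 1) % sectors_per_track
        slots[pos] = logical
        pos = (pos + skew) % sectors_per_track
    return slots


# Eight sectors per track with a skew of 3.
print(interleave_table(8, 3))   # [1, 4, 7, 2, 5, 8, 3, 6]
```

An archive that records this table (or the per-sector logical and physical numbers) lets a generic tool walk the data in logical order without knowing which machine wrote the disk.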
>From: "Fred Cisin" <cisin(a)xenosoft.com>
>
>I think that capability of including comments is essential!
>
>For example:
>
><COMMENT>CRC error on disk. Not yet determined whether it is
>a read error, or a deliberate component of copy protection
></COMMENT>
>
>or
><COMMENT>Note that HEAD NUMBER field in sector header is wrong.
>Machine uses WD controller, and doesn't care about that field,
>therefore, the incorrect value does not need to be replicated
>for normal use. </COMMENT>
>
>
Hi
A must.
Dwight
>From: "Steve Thatcher" <melamy(a)earthlink.net>
From: "Dwight K. Elvey" <dwight.elvey(a)amd.com>
---snip---
> It may also be that the only way that person has to capture the
>data is the output of a controller chip. The archiving should allow
>this as well ( more in the format that Steve would like it all to
>be in ).
>
>*** not sure why this relates to the format I was proposing. What I was
>describing was a way to do both low level bytes as well as blocks of data
>
Hi Steve
Again, you've missed the point here. The information may not
have anything other than the bit stream and the track. The
person extracting the data may have no knowledge of the
format ( FM, MFM, M2FM, RLL or whatever ). They are just
archiving the data of the disk. I agree that if there is
sufficient information available to include such things
as sector boundaries, that should be encoded in some method.
This may not be in the actual data but as part of the format
description ( requires some work to extract from the data ).
At the lowest level, the archive may only contain a bit stream
that correlates to the signal coming from the drive. Without
information as to the encoding method, this may be useless
to you. As Sellam has stated, your application is secondary
to actually capturing a reproducible medium.
Obviously, most people will not be able to create such information
( I believe I will ). Most will be creating such things as the
output data from some standard disk reading chip. These will
surely have some form of partitioning, either in the header or
embedded in the data.
This could still be a valid archive input for standard formats.
Note the word 'could' and not must.
Dwight
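To show why a raw bit stream is hard to use without knowing the encoding, here is a small sketch of how the same data bytes turn into different cell patterns under FM and MFM. It covers ordinary data bytes only; real address marks deliberately violate the clock rule, and M2FM and RLL are not covered. The function names are made up for this example.

```python
def bits_msb_first(data: bytes):
    """Yield the bits of each byte, most significant bit first."""
    for byte in data:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1


def encode_fm(data: bytes) -> str:
    """FM: every data bit is preceded by a clock bit that is always 1."""
    return "".join("1" + str(bit) for bit in bits_msb_first(data))


def encode_mfm(data: bytes, previous_bit: int = 0) -> str:
    """MFM: the clock bit is 1 only between two consecutive 0 data bits."""
    cells = []
    for bit in bits_msb_first(data):
        clock = 1 if (previous_bit == 0 and bit == 0) else 0
        cells.append(f"{clock}{bit}")
        previous_bit = bit
    return "".join(cells)


# The same byte produces two different cell streams.
print(encode_fm(b"\x4E"))
print(encode_mfm(b"\x4E"))   # 1001001001010100
```

Someone holding only the cell stream has to guess which rule produced it before any bytes can be recovered, which is why the encoding belongs in the format description when it is known.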
here is a "crude" example of what I was talking about.
try:
<archive image file version="1.0">
<definitions>
a place to put whatever relevant information is needed about the format or archive
or your favorite poem or joke
</definitions>
<target system>Tandy Model 100</target system>
<author>Ian Blindly</author>
<media>
<format>5.25" floppy</format>
<default-sector-data value="$AA"/>
<encoding>MFM? RLL? something ... </encoding>
<tracks>35</tracks>
<heads>1</heads>
<sectors size="256">18</sectors>
<wordsize>8</wordsize>
</media>
<datamap>
<head physical="0">
<track physical="0">
<sector logical="1" physical="4" datablock="DB1"
dataitemid="SB1" />
<sector logical="2" physical="8" datablock="DB2"
dataitemid="SB1"/>
<sector logical="3" physical="12" fill="$00"
/>
<sector logical="4" physical="16" fill="$00"
/>
...
</track>
<track physical="1">
<sector logical="1" physical="4" fill="$FF"
/>
<sector logical="9" physical="12" datablock="DB3"
dataitemid="SB1"/>
</track>
<track physical="22" datablock="DB3" dataitemid="SB2"
/>
</head>
</datamap>
<datablock id="DB1" type="boot">
<dataitem id="SB1" encoding="HEX" crc="1234">456789ABCDEF...</dataitem>
<dataitem id="SB2" encoding="HEX" crc="2341">1234567890A...</dataitem>
<dataitem id="SB3" encoding="HEX" crc="3412">890ABCDEF01245...</dataitem>
<dataitem id="SB4" encoding="HEX" crc="4123">456789ABCDEF...</dataitem>
</datablock>
<datablock id="DB2" type="OS">
<dataitem id="SB1" encoding="HEX" crc="1234">456789ABCDEF...</dataitem>
<dataitem id="SB2" encoding="HEX" crc="2341">1234567890A...</dataitem>
<dataitem id="SB3" encoding="HEX" crc="3412">890ABCDEF01245...</dataitem>
<dataitem id="SB4" encoding="HEX" crc="4123">456789ABCDEF...</dataitem>
</datablock>
<datablock id="DB3" type="file" name="DUMP.ASM">
<dataitem id="SB1" encoding="HEX" crc="1234">456789ABCDEF...</dataitem>
<dataitem id="SB2" encoding="HEX" crc="2341">1234567890A...</dataitem>
<dataitem id="SB3" encoding="HEX" crc="3412">890ABCDEF01245...</dataitem>
<dataitem id="SB4" encoding="HEX" crc="4123">456789ABCDEF...</dataitem>
</datablock>
</archive image file>
please don't get hung up on names, etc. The basic structure is what I have been
talking about. A utility can go into the archive file, find a <datablock>,
know what it is and extract it without having to know anything about the OS. Another
utility can read the media, datamap, and datablocks to re-create tracks in memory
to then write out to disk. There was earlier talk about years from now being able
to still re-create disks. I think that is a fine ambition, but that still requires
all the hardware and intimate OS knowledge to still be around. My idea was to at
least be able to extract the data without having to have ANY knowledge of the OS.
best regards, Steve Thatcher
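The data-extraction utility described above could look roughly like the sketch below. It is an illustration under assumptions, not part of the proposal: the XML is a cut-down, well-formed variant of the crude example (hyphenated element names, because XML names cannot contain spaces), and the hex payloads are placeholders invented for the example.

```python
import xml.etree.ElementTree as ET

# A well-formed, cut-down variant of the sketch; the payload bytes are
# placeholders invented for this example, not real disk contents.
ARCHIVE = """
<archive-image-file version="1.0">
  <datablock id="DB1" type="boot">
    <dataitem id="SB1" encoding="HEX">456789ABCDEF</dataitem>
  </datablock>
  <datablock id="DB3" type="file" name="DUMP.ASM">
    <dataitem id="SB1" encoding="HEX">3B2044554D50</dataitem>
    <dataitem id="SB2" encoding="HEX">0D0A</dataitem>
  </datablock>
</archive-image-file>
"""


def extract_datablock(xml_text: str, block_id: str) -> bytes:
    """Concatenate the HEX data items of one datablock, in document order."""
    root = ET.fromstring(xml_text)
    for block in root.iter("datablock"):
        if block.get("id") == block_id:
            return b"".join(
                bytes.fromhex(item.text.strip())
                for item in block.iter("dataitem")
                if item.get("encoding") == "HEX"
            )
    raise KeyError(block_id)


def list_files(xml_text: str) -> list:
    """Names of all datablocks marked type="file"."""
    root = ET.fromstring(xml_text)
    return [b.get("name") for b in root.iter("datablock") if b.get("type") == "file"]


print(list_files(ARCHIVE))
print(extract_datablock(ARCHIVE, "DB3"))
```

Nothing in the utility refers to tracks, sectors or the target OS; it only needs the datablock ids and types.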
I seem to recall that my slow old N* Horizon was doing DD (double density) at 4 MHz with no DMA. In fact it used polled I/O: their wait-for-I/O-available scheme kept locking up, so I modified it.
best regards, Steve Thatcher
-----Original Message-----
From: Fred Cisin <cisin(a)xenosoft.com>
Sent: Aug 12, 2004 5:12 PM
To: "General Discussion: On-Topic and Off-Topic Posts" <cctalk(a)classiccmp.org>
Subject: Re: Let's develop an open-source media archive standard
On Thu, 12 Aug 2004, Dwight K. Elvey wrote:
> Hi Jules
> Here is what I've found. It is a disk drive emulator.
> Unless a PC is DMA driven, bit banging a floppy is not practical.
Disk I/O without DMA is not practical. But it IS possible.
Consider the PCJr and Tandy 1000, both of which do disk I/O
without DMA on 4.77MHz? machines.
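A rough way to see why polled disk I/O is tight but possible on these machines is to count the CPU clock cycles available per byte. The sketch below assumes a 250 kbit/s data rate, which is typical for 5.25" double-density MFM; the actual rate depends on the drive and format, and the function name is made up for this example.

```python
def cycles_per_byte(cpu_hz: float, data_rate_bits_per_s: float) -> float:
    """CPU clock cycles available between consecutive bytes from the controller."""
    byte_time_s = 8 / data_rate_bits_per_s
    return cpu_hz * byte_time_s


# Assumed rate: 250 kbit/s, typical for 5.25" double-density MFM.
DD_RATE = 250_000

for name, clock_hz in [("4 MHz machine", 4_000_000),
                       ("4.77 MHz machine", 4_770_000)]:
    print(f"{name}: {cycles_per_byte(clock_hz, DD_RATE):.0f} clock cycles per byte")
```

Under that assumption a byte arrives every 32 microseconds, which leaves roughly 128 clock cycles at 4 MHz and about 153 at 4.77 MHz: enough for a tight polling loop, but with little room for anything else while a sector is being transferred.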
my crude example was a form of picture because words just didn't seem to be conveying what I had in mind... :)
-----Original Message-----
From: Vintage Computer Festival <vcf(a)siconic.com>
Sent: Aug 12, 2004 5:33 PM
To: "General Discussion: On-Topic and Off-Topic Posts" <cctalk(a)classiccmp.org>
Subject: Re: archive file format example
On Thu, 12 Aug 2004, Jules Richardson wrote:
> On Thu, 2004-08-12 at 11:11, Steve Thatcher wrote:
> > here is a "crude" example of what I was talking about.
> > [snip]
>
> That looks good. How's about moving things like author into the
> definitions section? I'm not sure that it belongs at the top level, but
> more in the section containing archive info (description, creation date
> etc.)
I think it's rather premature to be discussing implementation :(
--
Sellam Ismail Vintage Computer Festival
------------------------------------------------------------------------------
International Man of Intrigue and Danger http://www.vintage.org
[ Old computing resources for business || Buy/Sell/Trade Vintage Computers ]
[ and academia at www.VintageTech.com || at http://marketplace.vintage.org ]
Hi Jules
Here is what I've found. It is a disk drive emulator.
Unless a PC is DMA driven, bit banging a floppy is not practical.
There are just too many other things that the PC is doing on
the side. Having a separate dedicated uP is the best way.
That device can then either communicate with the host through serial
or parallel.
Dwight
Web pointer:
http://www.rothfus.com/SVD/
>From: "Jules Richardson" <julesrichardsonuk(a)yahoo.co.uk>
>
>On Thu, 2004-08-12 at 00:13, Dwight K. Elvey wrote:
>> >From: "Fred Cisin" <cisin(a)xenosoft.com>
>> >
>> >truly demented idea:
>> >build a hardware device that can read the image file, connects
>> >via 34 or 50 pin cable to an FDC, and that produces pulses
>> >that look like disk data to the FDC.
>> >
>> Hi
>> It has already been done. I can look up the web page if you like.
>> Dwight
>
>yes, please :-)
>
>I remember asking about this a while back, specifically wondering if a
>PC parallel port was fast enough to drive it (not without buffering at
>some sort of level, it seemed)
>
>I wouldn't mind seeing what someone else has come up with.
>
>cheers
>
>Jules
>
>
I did not say it was the primary purpose. The data blocks work hand in hand with the formatting information.
It is fine to make a standard extensible, but what good does it do if the (I hate to use the word) file can't be gotten without jumping through major hoops, because the way the data was stored in the image file wasn't extended out to make blocks? If you have to iterate through formatting information to get data, then you have to be intimately familiar with the disk format in use. It means any GENERAL utility that reads an image file for data access will have to KNOW about every machine supported, rather than just getting some type of data identifier and reading the data out.
I think part of the problem here is that the word file is being taken literally to mean filename, size, data, etc. I am using it in the sense of a block of data. It has NOTHING to do with how the data is on the disk, where it is stored, the recording format, etc. It is just a piece of data...
Sellam in a later email thought that I was proposing that the data be duplicated, once for file access and once for format data. NO, that is not what I was talking about. The formatting information contains no data other than what may be necessary for things outside of what is considered a sector on a drive, such as address marks, etc. (note: this does not preclude sequential data; I am only using this as a specific example in this email).
I am talking about two separate sections inside the same image file. One of the sections contains data blocks. The other has specific formatting information and POINTS to the data in the data blocks. A utility program could then SIMPLY get any type of data out of the image file without the utility being out of date as soon as someone added a new physical format.
I have tried to make this email as clear as possible, so there is no misunderstanding of what I have proposed.
best regards, Steve Thatcher
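The two-section layout described above can be sketched as a pair of data structures: a data section holding the blocks, and a format section that holds only pointers and fill values. Everything named here (the class, the field names, padding short data with the fill byte) is an assumption made for illustration, loosely following the attribute names in the earlier crude example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SectorEntry:
    """Formatting-side record: where a sector lives and where its data is."""
    physical: int
    logical: int
    datablock: Optional[str] = None   # pointer into the data section
    dataitem: Optional[str] = None
    fill: int = 0xAA                  # used when no data is referenced


# Data section: block id -> item id -> bytes (placeholder contents).
DATABLOCKS = {
    "DB1": {"SB1": b"\x45\x67\x89\xAB"},
    "DB3": {"SB1": b"; DUMP\r\n"},
}

# Format section for one track: only pointers and fill values, no data.
TRACK0 = [
    SectorEntry(physical=4, logical=1, datablock="DB1", dataitem="SB1"),
    SectorEntry(physical=8, logical=2, datablock="DB3", dataitem="SB1"),
    SectorEntry(physical=12, logical=3, fill=0x00),
]


def build_track(entries, datablocks, sector_size: int) -> dict:
    """Return physical sector number -> contents, padded with the fill byte."""
    track = {}
    for entry in entries:
        data = b""
        if entry.datablock is not None:
            data = datablocks[entry.datablock][entry.dataitem]
        if len(data) > sector_size:
            raise ValueError(f"data too long for sector {entry.physical}")
        padding = bytes([entry.fill]) * (sector_size - len(data))
        track[entry.physical] = data + padding
    return track


def read_block(datablocks, block_id: str) -> bytes:
    """Data-only access: no knowledge of tracks, sectors or encoding needed."""
    return b"".join(datablocks[block_id].values())


track = build_track(TRACK0, DATABLOCKS, sector_size=16)
print(sorted(track))
print(read_block(DATABLOCKS, "DB3"))
```

The data is stored once. `build_track` follows the pointers to recreate the physical layout, while `read_block` never looks at the format section, so adding a new physical format would not change it.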
-----Original Message-----
From: "Dwight K. Elvey" <dwight.elvey(a)amd.com>
Sent: Aug 11, 2004 8:11 PM
To: cctalk(a)classiccmp.org
Subject: RE: Let's develop an open-source media archive standard
>From: "Steve Thatcher" <melamy(a)earthlink.net>
>
>what is wrong with making things easier?
---snip---
Hi
I'm not saying to make it impossible to do, just that it shouldn't
be considered as a primary purpose. Using an extendable language
like I've suggested, one can add such features. Part of the problem
is that when someone creates the archive, they may not even know
the file structure of the disk. I would expect the specification
to be broad enough to allow such. Still, the primary thing is
to be able to recreate the original material.
 To me this means that any input to some emulator may
require some post processing. How would one know what
format a particular emulator wanted? How would a person always
know how to read the directory structure and be able to extract
files? If one wanted it to work in all cases, I'd expect that the
person writing the emulator would provide the needed post processing
to extract such information. Otherwise, they'd only be able to
read archives of disks that were specifically created for their
system. Those archives whose file structure the person didn't
know would be useless.
Such things are secondary functions. They shouldn't be restricted
from being used, it is just that the primary function should
be to capture the entire information in as close to the original
format as possible. Creating post processors could easily be
done as a separate outside function for special purposes.
Dwight
>>XML is platform neutral because it's basically ASCII, right?
Yes, true, but I think of XML more as a Web technology requiring a complex
parsing engine. I'm not a Web programmer so my thoughts on XML are probably
somewhat broken.
Another comment was made about the difference between what's actually on the
media versus what the CPU actually sees. We would thus need to capture the
raw data stream from the "heads" side of the controller in order to
regenerate a usable original media from the metafile.
For emulator use, we can grind this metafile through a translation program
to get the bytestream. OR, the metafile could contain both types of data
(using the container file and metadirectory idea from earlier).
What we really need is PDF for magnetic media :-)
-----Original Message-----
From: cctalk-bounces(a)classiccmp.org
[mailto:cctalk-bounces@classiccmp.org]On Behalf Of Vintage Computer
Festival
Sent: Wednesday, August 11, 2004 3:23 PM
To: General Discussion: On-Topic and Off-Topic Posts
Subject: RE: Let's develop an open-source media archive standard
On Wed, 11 Aug 2004, Cini, Richard wrote:
> This example represents the block data using metatags...I guess along the
> "XML" part of the thread.
>
> I was thinking similarly to you but not using XML metadata:
>
> ;Hardware descriptor
> MFGR
> MACHINE
> SUBTYPE
> DRIVETYPE (this of course defines what follows)
> ;for floppy
> DRIVESIZE
> ENCODING
> TRACKS
> SECTORS
> SECTSIZE
> ;HexData
> ; Each record or group of records contains the related media data. The
> address record would be used for encoding the metadata
> 00TTSSHH: (00-track-sector-head)
>
> I looked to Intel Hex (or Motorola) because it had built-in CRC facilities
> and it was human-readable ASCII. The drive and machine description could be
> encoded in special MOT records probably.
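As a sketch of the hex-record idea quoted above, the following builds a Motorola S3 record whose 4-byte address field carries the 00TTSSHH value. S3 is used here because its 32-bit address fits that layout, whereas a plain Intel HEX data record has only a 16-bit address. Note that both formats protect each record with a one-byte checksum rather than a true CRC. The function names are made up for this example.

```python
def s3_record(track: int, sector: int, head: int, data: bytes) -> str:
    """Build a Motorola S3 record whose 32-bit address is 00TTSSHH."""
    address = bytes([0x00, track, sector, head])
    count = len(address) + len(data) + 1          # +1 for the checksum byte
    if count > 0xFF:
        raise ValueError("too much data for one record")
    body = bytes([count]) + address + data
    checksum = 0xFF - (sum(body) & 0xFF)          # ones' complement of the low byte
    return "S3" + body.hex().upper() + f"{checksum:02X}"


def check_record(record: str) -> bool:
    """True if the bytes from count through checksum sum to 0xFF modulo 256."""
    raw = bytes.fromhex(record[2:])
    return record.startswith("S3") and (sum(raw) & 0xFF) == 0xFF


# Track 1, sector 2, head 0, two data bytes.
record = s3_record(1, 2, 0, b"\xE5\xE5")
print(record)                 # S30700010200E5E52B
print(check_record(record))   # True
```

A 256-byte sector does not fit in a single record (the count byte tops out at 255), so a real archive would have to split each sector across several records or use the low address byte differently.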
I like the XML style because it's more explicit; more human-readable.
> XML is a more "current" technology but I was trying to keep with the
> platform neutrality by sticking to text-only and not assuming the use of any
> other technology like XML.
XML is platform neutral because it's basically ASCII, right?
--
Sellam Ismail Vintage Computer Festival
------------------------------------------------------------------------------
International Man of Intrigue and Danger http://www.vintage.org
[ Old computing resources for business || Buy/Sell/Trade Vintage Computers ]
[ and academia at www.VintageTech.com || at http://marketplace.vintage.org ]