On Sat, 19 Jun 2010, Kieron Wilkinson wrote:
It's certainly nice to talk to people who do do that though! I don't
envy you, but I'm sure it's interesting and gratifying work (?)
It's a helluva hobby :-)
It's great when it will contribute towards paying the bills.
the bulk of floppies in existence, they don't represent the variety
of variations in filesystems, encodings or layouts. Most of the
details of those dedicated systems (say, from someone's PBX) remain
unknown to the current day. There are some people lurking who have
knowledge of certain file formats (e.g. embroidery machines) who can
assist in translating the data to something meaningful. But often,
the best you can get is "this is what the machine does when you feed
it this diskette".
Ouch.
. . . and the data and information from the users are not always reliable.
Such as when they create a single sided 35 track sample disk to analyze,
but record it on a used 40 track double sided disk, send you a disk to
analyze of a completely alien file system that happens to be jammed full
of remains of deleted files, or tell you that "the computer is a
Lear-Sigler with a Northstar Horizon external drive", or a "Pentabs"
computer (rebranded Vector-Graphic)
Just getting them to NAME the computer is a struggle.
If anyone wants to try their hand at it, I can send a time-domain
(i.e. Catweasel) sample of a Lanier 32-sector M2FM (as best I can
determine it) WP disk. You have only to figure out the character
set, file system, floppy encoding and file format...
Nasty. Might as well be a cryptanalyst for that sort of thing!
or just a puzzle solver
I wonder if some statistical-based analysis would help, but perhaps you
are way ahead of me on that?
Some
Chuck is very fond of histograms.
On the other hand, I hardly ever continued with any format that wasn't
going to be possible to convert using relatively stock hardware. "If it
isn't IBM/WD, then just file the disk in the appropriate section of
hardware incompatibles."
I played around with some probabilistic code to come up with what to try
first, particularly in finding and identifying software sector interleaves
(which sector is used after sector number 1?). Feed the code the start and
end bytes of sectors and have it identify which ones are most likely to be
"half a worm" (start of a "word" at the end of one sector, end of the word
at the beginning of another sector); in the absence of adequate language
text (or with excessively unfamiliar languages), multibyte machine language
instructions are adequate.
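A minimal sketch of that boundary test, in Python. This is an illustration only, not the original code: the names `straddles_word` and `successor_votes` are made up here, the sector contents are assumed to be plain ASCII text, and "letter at the end, letter at the start" is a much cruder test than checking for a real word.

```python
from collections import Counter
from itertools import permutations
import string

# Byte values of ASCII letters (assumption: the disk holds ASCII text).
LETTERS = set(string.ascii_letters.encode("ascii"))

def straddles_word(prev_sector: bytes, next_sector: bytes) -> bool:
    """True if a run of letters appears to cross the sector boundary."""
    return (len(prev_sector) > 0 and len(next_sector) > 0
            and prev_sector[-1] in LETTERS and next_sector[0] in LETTERS)

def successor_votes(tracks):
    """tracks: list of dicts mapping sector number -> sector bytes.

    Returns a Counter of (a, b) -> number of tracks on which sector b
    plausibly continues sector a.
    """
    votes = Counter()
    for sectors in tracks:
        for a, b in permutations(sectors, 2):
            if straddles_word(sectors[a], sectors[b]):
                votes[(a, b)] += 1
    return votes

# Tiny made-up track: "wor" + "ld" is split between sectors 1 and 4.
tracks = [{1: b"hello wor", 2: b"1234 5678", 4: b"ld. Next "}]
votes = successor_votes(tracks)
print(votes.most_common(3))
```

On a real disk the counts would be accumulated over every track, and the pairs with the most votes are the ones to try first.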
"Ah HA! The code thinks that there is a high probability that sector 7
follows sector 4. 5 instances of probable sequence on the disk, 3 of
which are words!, and 1 of the non-words started with an upper case
character. Therefore, 1,4,7? But is it 1,4,7,2,5,8,3,6,9 or
1,4,7,3,6,9,2,5,8?" YES, both of those sequences exist.
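One way to choose between the two candidate sequences is to add up the boundary evidence for every consecutive pair in each of them. The sketch below assumes a table of pair counts already exists; the function name `order_support` is made up for illustration, and apart from the 5 instances of 4 followed by 7 mentioned above, the numbers are invented.

```python
def order_support(order, votes):
    """Sum the evidence for each consecutive pair in a candidate sector order."""
    return sum(votes.get((a, b), 0) for a, b in zip(order, order[1:]))

# The two interleave sequences that both exist in the wild.
candidates = [
    (1, 4, 7, 2, 5, 8, 3, 6, 9),
    (1, 4, 7, 3, 6, 9, 2, 5, 8),
]

# Hypothetical counts of "sector b probably follows sector a".
# Only the (4, 7) count comes from the example above; the rest are invented.
votes = {(1, 4): 6, (4, 7): 5, (7, 2): 1, (7, 3): 4, (3, 6): 3, (9, 2): 2}

for order in candidates:
    print(order, order_support(order, votes))

best = max(candidates, key=lambda order: order_support(order, votes))
print("try first:", best)
```

The result is only a suggestion of what to try first; a human still has to look at the reassembled file.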
'e' is the most probable character in English language text. But if a
sector ends with a 'q', then 'u' or '.' are the most probable subsequent
characters. BTW, 'e' is NOT the most probable character following a space
- look at the thicknesses of different starting letters in the dictionary!
An upper case character is much more likely to follow a space that follows
another space or a period, etc.
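Those conditional probabilities can be estimated by counting which character follows which in a body of known text. A rough sketch, assuming Python; `successor_counts` is a made-up name, and the sample sentence is contrived and far too small to give real English statistics.

```python
from collections import Counter, defaultdict

def successor_counts(text):
    """Map each character to a Counter of the characters that follow it."""
    following = defaultdict(Counter)
    for current, nxt in zip(text, text[1:]):
        following[current][nxt] += 1
    return following

# Contrived sample; a real table would be built from a large corpus.
sample = ("The quick brown fox quietly quit the quarry. "
          "Then the queen thought the thing through.")
table = successor_counts(sample.lower())

print(table["q"].most_common(1))   # what most often follows 'q'
print(table[" "].most_common(3))   # what most often starts a word
```

Given the last byte of one sector, the same table ranks the likely first bytes of its successor.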
But for file information, nothing beats the human mind.
--
Grumpy Ol' Fred cisin at xenosoft.com