Software-based floppy disc data separator

19 Jun 2010

On Sat, 19 Jun 2010, Kieron Wilkinson wrote:
...
  It's certainly nice to talk to people who do do that though! I don't
 envy you, but I'm sure it's interesting and gratifying work (?)
It's a helluva hobby   :-)
It's great when it will contribute towards paying the bills.
...
   the bulk of floppies in existence, they don't represent the variety
 of variations in filesystems, encodings or layouts.  Most of the
 details of those dedicated systems (say, from someone's PBX) remain
 unknown to the current day.   There are some people lurking who have
 knowledge of certain file formats (e.g. embroidery machines) who can
 assist in translating the data to something meaningful.  But often,
 the best you can get is "this is what the machine does when you feed
 it this diskette". 
 Ouch. 
. . . and the data and information from the users are not always reliable.
Such as when they create a single sided 35 track sample disk to analyze,
but record it on a used 40 track double sided disk, send you a disk to
analyze of a completely alien file system that happens to be jammed full
of remains of deleted files, or tell you that "the computer is a
Lear-Sigler with a Northstar Horizon external drive", or a "Pentabs"
computer (rebranded Vector-Graphic).
Just getting them to NAME the computer is a struggle.
...
   If anyone wants to try their hand at it, I can send a time-domain
 (i.e. Catweasel) sample of a Lanier 32-sector M2FM (as best I can
 determine it) WP disk.  You have only to figure out the character
 set, file system, floppy encoding and file format...
Nasty. Might as well be a cryptanalyst for that sort of thing! I
or just a puzzle solver
...
  wonder if some statistical-based analysis would help, but perhaps you
 are way ahead of me on that?
Some
Chuck is very fond of histograms.
On the other hand, I hardly ever continued with any format that wasn't
going to be possible to convert using relatively stock hardware.  "If it
isn't IBM/WD, then just file the disk in the appropriate section of
hardware incompatibles."
I played around with some probabilistic code to come up with what to try
first, particularly in finding and identifying software sector interleaves
(which sector is used after sector number 1?).  Feed the code the start and
end bytes of sectors and have it identify which ones are most likely to be
"half a worm" (start of a "word" at the end of one sector, end of the word
at the beginning of another sector).  In the absence of adequate language
text (or with excessively unfamiliar languages), multibyte machine language
instructions are adequate.
"Ah HA!  The code thinks that there is a high probability that sector 7
follows sector 4.  5 instances of probable sequence on the disk, 3 of
which are words!, and 1 of the non-words started with an upper case
character.  Therefore, 1,4,7?  But is it 1,4,7,2,5,8,3,6,9 or
1,4,7,3,6,9,2,5,8?"   YES, both of those sequences exist.
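For anyone who wants to try the same trick, here is a rough sketch of the
vote counting, not the code I actually used.  It assumes ASCII text and
that each track has already been read into a dict of sector number ->
sector bytes; the names straddles_word, successor_votes and
likely_successor are made up for this example.  A pair (a, b) gets a vote
whenever sector a ends in a letter and sector b starts with a lower case
letter.

```python
from collections import Counter

def straddles_word(prev: bytes, nxt: bytes) -> bool:
    """Guess whether a word runs across the boundary from prev to nxt."""
    if not prev or not nxt:
        return False
    a, b = chr(prev[-1]), chr(nxt[0])
    # Last byte of prev is an ASCII letter, first byte of nxt is lower case.
    return a.isascii() and a.isalpha() and b.isascii() and b.islower()

def successor_votes(tracks):
    """tracks: iterable of {sector_number: sector_bytes}, one dict per track.
    Returns a Counter of (a, b) -> how often b looked like it follows a."""
    votes = Counter()
    for sectors in tracks:
        for a, data_a in sectors.items():
            for b, data_b in sectors.items():
                if a != b and straddles_word(data_a, data_b):
                    votes[(a, b)] += 1
    return votes

def likely_successor(votes, sector):
    """Most-voted successor of a sector, or None if nothing voted for it."""
    cands = {b: n for (a, b), n in votes.items() if a == sector}
    return max(cands, key=cands.get) if cands else None
```

A single track gives plenty of false positives (any two text sectors can
look like a split word), so the votes only mean something when summed over
many tracks.  And as above, a strong 1,4,7 still leaves the rest of the
sequence to be settled by looking at the successors of 7.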
'e' is the most probable character in English language text.  But if a
sector ends with a 'q', then 'u' or '.' are the most probable subsequent
characters.  BTW, 'e' is NOT the most probable character following a space
- look at the thicknesses of different starting letters in the dictionary!
An upper case character is much more likely to follow a space that follows
another space or a period, etc.
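The "what follows a 'q'" idea is just a table of character pairs.  A
minimal sketch, assuming you have some sample text in the right language
to count from (bigram_table and p_next are names invented here, and a
useful table needs far more text than a few sentences):

```python
from collections import Counter, defaultdict

def bigram_table(text):
    """Count, for each character, which characters follow it in text."""
    table = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        table[a][b] += 1
    return table

def p_next(table, a, b):
    """Estimated probability that b follows a; 0.0 if a was never seen."""
    total = sum(table[a].values())
    return table[a][b] / total if total else 0.0
```

To score a sector boundary, look up p_next(table, last character of one
sector, first character of the other) and prefer the pairings with the
highest values.  Counting pairs of preceding characters instead of one
would be needed to catch the "space after a period" case.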
But for file information, nothing beats the human mind.
--
Grumpy Ol' Fred                     cisin at xenosoft.com
