Recovering data from disks was a lot easier 30 years ago when most
filesystems had contiguous files and it was just a matter of finding
file boundaries. Was very glad of this when accidentally wiped first
200 blocks of an RT-11 RK05 and just had to write a FORTRAN program
to copy blocks of data and assign the files names. Also wrote a
program to do a disk scan to look for a specific file type, and like to
show people what JPEG files they've left behind on a disk they've
"wiped". Finding images is very easy, as I can display a list of many
image icons on screen and quickly scroll through them when
looking for a particular image. Of course, the longer a hard disk
has been used and not defragmented, the lower the recovery percentage
of files. Got paid for file recovery a few times but mainly use it
to show people what really happens when they "wipe" a disk. Have
convinced a lot of people that a low-level format on a disk they're
giving away is a good idea.
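For anyone curious what such a scan looks like, here is a minimal Python sketch. It is not the program described above, just an illustration of the idea. It assumes the raw disk image fits in memory and that each JPEG is stored contiguously, which is exactly why recovery rates drop on fragmented disks. The function name carve_jpegs and the max_size limit are made up for this example. Embedded thumbnails carry their own end marker, so some hits will be truncated.

```python
JPEG_START = b"\xff\xd8\xff"  # SOI marker plus the first byte of the next marker
JPEG_END = b"\xff\xd9"        # EOI marker


def carve_jpegs(data: bytes, max_size: int = 20 * 1024 * 1024) -> list[tuple[int, int]]:
    """Return (start, end) byte offsets of candidate JPEGs in a raw image.

    Assumes contiguous files; max_size is an arbitrary cap on how far
    to look for the end marker.
    """
    found = []
    pos = 0
    while True:
        start = data.find(JPEG_START, pos)
        if start == -1:
            break
        # Look for the first end marker within max_size bytes of the start.
        end = data.find(JPEG_END, start + len(JPEG_START), start + max_size)
        if end == -1:
            pos = start + 1  # no end marker in range; keep scanning
            continue
        end += len(JPEG_END)
        found.append((start, end))
        pos = end
    return found
```

Each (start, end) pair can then be sliced out of the image and written to its own file for viewing as a thumbnail.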
After thinking about disk imaging tools like
Greaseweazle,
I started thinking about tools that would grab and examine the unused
portions of disks.
It's obviously file-system dependent. At one level we know of
"undelete" tools that could piece together recently deleted files
and restore them intact by using abandoned bits of block table info.
Of course some simple file systems can't even permit that.
But very few systems would bother to zero out the released blocks
of erased or rewritten files, so those blocks are left full of
old data. Text source code would be easy to spot.
I have vague memories of bits of Amiga OS source code being unintentionally
released in unused blocks on OS binary disks that were sent out for
mass duplication and distribution.
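As a rough illustration of how easy text is to spot, the Python sketch below flags blocks of a raw image that are mostly printable ASCII. It is filesystem-agnostic, so it reports blocks belonging to live files too; limiting it to unused blocks would need a per-filesystem free list. The 512-byte block size, the 0.9 threshold, the 32-byte minimum, and the function names are all assumptions chosen for the example.

```python
BLOCK_SIZE = 512  # assumed block size; adjust for the media in question

# Printable ASCII plus tab, line feed, and carriage return.
PRINTABLE = frozenset(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def text_ratio(block: bytes) -> float:
    """Fraction of bytes in the block that look like plain text."""
    if not block:
        return 0.0
    return sum(b in PRINTABLE for b in block) / len(block)


def find_text_blocks(data: bytes, block_size: int = BLOCK_SIZE,
                     threshold: float = 0.9, min_len: int = 32) -> list[int]:
    """Return the numbers of blocks that appear to hold text."""
    hits = []
    for offset in range(0, len(data), block_size):
        # Ignore trailing NUL padding so a short final record still counts.
        body = data[offset:offset + block_size].rstrip(b"\x00")
        if len(body) >= min_len and text_ratio(body) >= threshold:
            hits.append(offset // block_size)
    return hits
```

Running this over an image and eyeballing the flagged blocks is a quick first check before deciding whether an image is safe to share.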
This situation makes me hesitant to release disk images from the past.
It's one thing to do it with disks that were mine and to take responsibility
for my risk; it's another to release disks once owned and used by others.
Do the unused sectors contain their love letters from 1983?
Or if I want to release disk images that contain known personal files,
how will I image, then remove specific files, then zero unused blocks
if I don't want to alter the original media?
Obviously in some situations the relevant files can be pulled and
redistributed in a new filesystem like a Zip.
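One possible approach to the image-then-scrub question, sketched in Python: leave the original image file untouched and do all the zeroing on a copy. This assumes a plain sector-level image, not a flux-level capture, and it assumes the list of block numbers to clear has already been worked out by some filesystem-specific means that is not shown here. The function name and parameters are hypothetical.

```python
import shutil


def zero_blocks_in_copy(src_path: str, dst_path: str, free_blocks,
                        block_size: int = 512) -> None:
    """Copy a sector image and zero the listed blocks in the copy only.

    free_blocks is assumed to come from a filesystem-specific parser
    (not shown); the source image is never opened for writing.
    """
    shutil.copyfile(src_path, dst_path)
    zeros = bytes(block_size)
    with open(dst_path, "r+b") as img:
        img.seek(0, 2)           # seek to end to learn the image size
        size = img.tell()
        for block in sorted(set(free_blocks)):
            offset = block * block_size
            if offset >= size:
                raise ValueError(f"block {block} is beyond the end of the image")
            img.seek(offset)
            # Clip the write so a short final block does not grow the file.
            img.write(zeros[:min(block_size, size - offset)])
```

Removing specific files works the same way once their blocks are known, though the directory entries naming them would also need clearing in the copy.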
The situation only gets worse with distributing larger images of
entire hard disks. And on Windows, a "quick format" doesn't zero blocks.
In another case I encountered while digging through files on an old
RSTS backup tape, we had a program that logged usage data to a file
and for speed purposes it would preallocate a large file (as opposed
to extending the file, which was slower) and then write block records
to it. RSTS reused blocks without zeroing. In the unused blocks
of an extant file I found an email I'd sent in '82 as well as bits
from other users of the same timesharing system.
Certainly the archivists out there have considered these questions.
How are they solved?
Are there notable tools that focus on the files that aren't there?
I don't mean modern forensic carving tools... but some concepts would
be similar.
- John