Spelunking the places where files are not
Fred Cisin
cisin at xenosoft.com
Mon Mar 8 13:56:46 CST 2021
On Mon, 8 Mar 2021, John Foust via cctalk wrote:
> I'm familiar with the various undelete tools for Windows and Linux.
> Such tools may not exist or make sense for older file systems.
Windows/MS-DOS was certainly neither unique nor original in marking file
primary directory entries (FPDE) as deleted without removing all traces.
CP/M, for instance, maintains most of the FPDE. Look in the directory
sectors for entries starting with 00 or E5.
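
A minimal sketch of that directory scan, assuming the usual CP/M 2.2
layout: 32-byte entries, where byte 0 is the user number (00 through 0F
for a live file) or E5 for a deleted one, bytes 1 through 11 are the 8+3
name with attribute flags in the high bits, and byte 12 is the extent
number. The function name and the returned tuple shape are made up for
illustration, and finding the directory sectors on a given disk format
is left to the caller.

```python
def scan_cpm_directory(dir_bytes):
    """Return (state, name, extent) for each plausible 32-byte entry."""
    found = []
    for off in range(0, len(dir_bytes) - 31, 32):
        entry = dir_bytes[off:off + 32]
        status = entry[0]
        # A freshly formatted disk is all E5; that is not a deleted file.
        if entry[1:12] == b"\xe5" * 11:
            continue
        # Strip the attribute bits kept in the high bit of each name byte.
        raw = bytes(b & 0x7F for b in entry[1:12])
        if not all(0x20 <= b < 0x7F for b in raw):
            continue  # not a printable 8+3 name, probably not an entry
        name = raw[:8].decode("ascii").rstrip()
        ext = raw[8:].decode("ascii").rstrip()
        if status == 0xE5:
            state = "deleted"
        elif status <= 0x0F:
            state = "active"   # 00 is simply user 0
        else:
            state = "other"    # e.g. CP/M 3 labels and timestamps
        found.append((state, name + "." + ext if ext else name, entry[12]))
    return found
```

Bytes 16 through 31 of a deleted entry still list the allocation blocks
the file used, which is where the actual recovery would start.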
UCSD P-System is easy, until the disk has been CRUNCH'ed.
> Entire files would be great to find, but I suspect interesting
> fragments may be more likely.
As Chuck pointed out, when a file does not fill the remaining space in the
last block that is allocated to it, that space MIGHT contain residual
content from previous use. Yes, the OS will usually write complete
sectors, and so overwrite the rest of the last SECTOR (with padding that
may or may not be leftover sector buffer content!), but it is unlikely to
clear out the unused sectors in that last block.
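
The arithmetic behind those two kinds of leftover space can be sketched
as below. The function name and the default sizes (512-byte sectors,
2048-byte blocks) are only assumptions for the example; the block size
must be a multiple of the sector size, and a system that records file
length only in whole sectors or records has no byte-level tail to
compute.

```python
def slack_regions(file_size, sector_size=512, block_size=2048):
    """Return ((start, end), (start, end)) byte offsets from file start.

    The first range is the tail of the last sector (rewritten by the OS,
    possibly with stale buffer content). The second range is the untouched
    sectors that finish out the last allocation block.
    """
    # Round the file size up to the next sector and block boundaries.
    end_of_sector = -(-file_size // sector_size) * sector_size
    end_of_block = -(-file_size // block_size) * block_size
    return (file_size, end_of_sector), (end_of_sector, end_of_block)
```

For a 1300-byte file with those defaults, bytes 1300 through 1535 are the
tail of the last sector, and bytes 1536 through 2047 are the sector that
was allocated but never written by this file.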
> Running a Windows-based tool like Recuva on a hard drive leads
> to such a firehose of fragments if you choose the deep scan that
> examines all unused blocks. I've only tried the free version.
> Does the pro version give you a way to exclude all the dozens
> of OS file types that are probably not the user-made files
> that you want?
I'd recommend a two-stage process. Make files out of all of those
fragments. THEN, use other tools to select which of those fragments
contain the type of content that you are looking for. On something as
small as a floppy, of course, a human is cost-effective.
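
A rough sketch of the two stages, assuming the disk image is already in
memory and the list of unused block numbers has been worked out from the
file system. The names carve_blocks and looks_like_text, the output file
naming, and the 85% printable threshold are all arbitrary choices for
the example, not anything a particular tool does.

```python
import os

def carve_blocks(image, free_blocks, block_size, out_dir):
    """Stage 1: write every non-blank unused block to its own file."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for n in free_blocks:
        chunk = image[n * block_size:(n + 1) * block_size]
        if not chunk or len(set(chunk)) == 1:
            continue  # empty or a single fill byte (all 00, all E5, ...)
        path = os.path.join(out_dir, "frag%05d.bin" % n)
        with open(path, "wb") as f:
            f.write(chunk)
        paths.append(path)
    return paths

def looks_like_text(path, threshold=0.85):
    """Stage 2 (one possible filter): mostly printable ASCII?"""
    with open(path, "rb") as f:
        data = f.read()
    if not data:
        return False
    printable = sum(1 for b in data if 0x20 <= b < 0x7F or b in (9, 10, 13))
    return printable / len(data) >= threshold
```

Keeping the stages separate means the carving is done once, and any
number of different filters can be run over the same pile of fragments
afterwards.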
> And for the archaic disk formats, it would be good to have
> platform-specific methods of identifying fragments to guess
> their file type beyond executable and ASCII. Older run-length
> compression image formats may be more possible to recover than
> today's block-compressed images.
PROJECT: Create a program that will take a list of files, and partial
files, and for each one, identify the file type, and attempt to display
the content. There are thousands of file formats to implement.
Knowledge of those file formats, and especially their headers, is
essential.
Obviously, file headers and beginnings of files may be easier to identify
than random pieces from the middle.
For example, in MS-DOS/Windoze, a file that starts with "MZ" (Mark
Zbikowski) is almost always an .EXE file.
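
Header checks like that one are easy to sketch as a table of leading
bytes. The table below is a tiny, illustrative sample (the "MZ" case from
above plus the well-known ZIP and GIF signatures), and guess_type is a
made-up name; a real version would need the thousands of formats
mentioned earlier.

```python
# (leading bytes, description) pairs; first match wins.
SIGNATURES = [
    (b"MZ", "MS-DOS/Windows .EXE"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
]

def guess_type(data):
    """Return a description if the data starts with a known signature."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return None  # unknown, or a fragment from the middle of a file
```

A two-byte signature such as "MZ" will also match the occasional text
fragment that happens to start with those letters, hence "almost always".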
Fortunately, most of the older file formats were simpler, and word
processing files had lots of text strings in them.
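
Pulling those text strings out of a fragment can be sketched in the
manner of the Unix "strings" tool. The function name, the minimum run
length of 4, and the strip_high_bit option are assumptions for the
example; the option is there because some word processors of the era
(WordStar is the usual example) set the high bit on certain characters,
which otherwise breaks words apart.

```python
import re

def text_runs(data, min_len=4, strip_high_bit=False):
    """Return runs of at least min_len printable ASCII characters."""
    if strip_high_bit:
        # Clear bit 7 so high-bit-flagged letters read as plain ASCII.
        data = bytes(b & 0x7F for b in data)
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.decode("ascii") for m in pattern.findall(data)]
```

Stripping the high bit also turns some binary bytes into printable
ones, so it is worth trying both ways and comparing the results.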
--
Grumpy Ol' Fred cisin at xenosoft.com