Spelunking the places where files are not

Fred Cisin cisin at xenosoft.com
Mon Mar 8 13:56:46 CST 2021


On Mon, 8 Mar 2021, John Foust via cctalk wrote:
> I'm familiar with the various undelete tools for Windows and Linux.
> Such tools may not exist or make sense for older file systems.

Windows/MS-DOS was certainly neither unique nor original in marking file 
primary directory entries (FPDE) as deleted without removing all traces.
CP/M, for instance, leaves most of the FPDE intact.  Look in the directory 
sectors for entries whose first byte is 00 (a live file in user area 0) 
or E5 (deleted, or never used).
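A minimal sketch of that directory scan, assuming the standard CP/M 2.2 layout (32-byte entries: user byte, 8+3 name, EX/S1/S2/RC, then 16 bytes of allocation block numbers) and assuming the raw directory sectors have already been pulled off the disk in logical order. Finding them (reserved tracks, sector skew, the disk parameter block) varies by machine and is not handled here. The names parse_directory and DirEntry are hypothetical.

```python
from dataclasses import dataclass
from typing import List

ENTRY_SIZE = 32      # CP/M 2.2 directory entries are 32 bytes each
DELETED = 0xE5       # first byte of a deleted (or never-used) entry


@dataclass
class DirEntry:
    index: int           # position of the entry within the directory
    user: int            # user area 0-15, or 0xE5 if deleted
    name: str            # 8.3 name with attribute bits (bit 7) masked off
    extent: int          # extent number, EX + 32 * S2
    records: int         # RC: 128-byte records used in this extent
    blocks: List[int]    # non-zero allocation block numbers

    @property
    def deleted(self) -> bool:
        return self.user == DELETED


def parse_directory(data: bytes, wide_blocks: bool = False) -> List[DirEntry]:
    """Parse raw CP/M directory sectors.

    wide_blocks=True means 16-bit little-endian block numbers, which CP/M
    uses when the disk has more than 256 blocks; otherwise one byte each.
    """
    blank = bytes([DELETED]) * ENTRY_SIZE
    entries = []
    for i in range(len(data) // ENTRY_SIZE):
        raw = data[i * ENTRY_SIZE:(i + 1) * ENTRY_SIZE]
        if raw == blank:  # untouched since FORMAT: nothing to recover
            continue
        # CP/M keeps file attributes in bit 7 of the name/type bytes.
        name = bytes(b & 0x7F for b in raw[1:9]).decode("ascii", "replace").rstrip()
        ext = bytes(b & 0x7F for b in raw[9:12]).decode("ascii", "replace").rstrip()
        alloc = raw[16:32]
        if wide_blocks:
            blocks = [int.from_bytes(alloc[j:j + 2], "little") for j in range(0, 16, 2)]
        else:
            blocks = list(alloc)
        entries.append(DirEntry(
            index=i,
            user=raw[0],
            name=f"{name}.{ext}" if ext else name,
            extent=(raw[12] & 0x1F) + 32 * (raw[14] & 0x3F),
            records=raw[15],
            blocks=[b for b in blocks if b != 0],  # 0 = unused slot
        ))
    return entries
```

In this layout, undeleting a file generally comes down to writing a valid user number (such as 00) back into byte 0 of each of its extents, provided the listed blocks have not since been reallocated to another file.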

UCSD P-System is easy, until the disk has been CRUNCH'ed.

> Entire files would be great to find, but I suspect interesting
> fragments may be more likely.

As Chuck pointed out, when a file does not fill the remaining space in the 
last block allocated to it, that space MIGHT contain residual content 
from previous use.  Yes, the OS will usually write complete sectors 
(which may or may not include leftover sector-buffer content!), and so 
overwrite the rest of the last SECTOR, but it is unlikely to clear out 
the unused sectors in that last block.
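The arithmetic behind that is simple enough to sketch. Assuming a file of known size, a fixed sector size, and an allocation block that is a whole number of sectors, the bytes after the end of the file split into two regions: the tail of the last sector (whatever was in the sector buffer) and the whole unused sectors after it (most likely to hold old content). The function names are hypothetical, and the disk offset of the last allocated block is assumed to come from the filesystem's own allocation map.

```python
def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def slack_ranges(file_size: int, sector_size: int, block_size: int):
    """Return ((start, end), (start, end)) file-relative byte ranges for
    the last-sector slack and the unused-sector slack of the last block."""
    if file_size <= 0 or block_size % sector_size:
        raise ValueError("need a positive size and whole sectors per block")
    alloc_end = ceil_div(file_size, block_size) * block_size     # end of last block
    sector_end = ceil_div(file_size, sector_size) * sector_size  # end of last sector written
    return (file_size, sector_end), (sector_end, alloc_end)


def last_block_slack(image: bytes, last_block_start: int, file_size: int,
                     sector_size: int, block_size: int) -> bytes:
    """Bytes of the never-rewritten sectors in the file's last block.

    last_block_start is the disk-image offset of the last allocated block.
    """
    _, (start, end) = slack_ranges(file_size, sector_size, block_size)
    offset_in_block = start - (end - block_size)
    return image[last_block_start + offset_in_block:last_block_start + block_size]
```

For example, a 5000-byte file on 512-byte sectors with 2048-byte blocks ends 120 bytes into its tenth sector, leaving two whole sectors (1024 bytes) of the third block untouched.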


> Running a Windows-based tool like Recuva on a hard drive leads
> to such a firehose of fragments if you choose the deep scan that
> examines all unused blocks.  I've only tried the free version.
> Does the pro version give you a way to exclude all the dozens
> of OS file types that are probably not the user-made files
> that you want?

I'd recommend a two-stage process.  Make files out of all of those 
fragments.  THEN, use other tools to select which of those fragments 
contain the type of content that you are looking for.  On something as 
small as a floppy, of course, a human is cost-effective.
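A sketch of those two stages, assuming you already have the image, its block size, and the set of free block numbers from the filesystem's allocation map. Stage one writes each run of consecutive free blocks to its own file; stage two keeps the ones that are mostly printable text. The fill bytes (00, E5 from CP/M-style formatting, F6 from MS-DOS FORMAT), the 0.85 threshold, and all function names are illustrative assumptions.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

FILL = b"\x00\xe5\xf6"  # assumed format fill bytes, stripped before judging content
PRINTABLE = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def free_runs(free_blocks: Iterable[int]) -> List[Tuple[int, int]]:
    """Group free block numbers into (first, last) runs of consecutive blocks."""
    runs: List[List[int]] = []
    for b in sorted(free_blocks):
        if runs and b == runs[-1][1] + 1:
            runs[-1][1] = b
        else:
            runs.append([b, b])
    return [(first, last) for first, last in runs]


def carve(image: bytes, block_size: int, free_blocks: Iterable[int],
          out_dir: str) -> List[Path]:
    """Stage 1: write each non-blank run of free blocks to its own file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for first, last in free_runs(free_blocks):
        data = image[first * block_size:(last + 1) * block_size]
        if not data.strip(FILL):  # nothing but fill bytes: skip
            continue
        path = out / f"frag_{first:05d}_{last:05d}.bin"
        path.write_bytes(data)
        paths.append(path)
    return paths


def text_ratio(data: bytes) -> float:
    """Fraction of printable bytes, ignoring fill at either end."""
    core = data.strip(FILL)
    return sum(b in PRINTABLE for b in core) / len(core) if core else 0.0


def select_texty(paths: List[Path], threshold: float = 0.85) -> List[Path]:
    """Stage 2: keep the fragments that look like text."""
    return [p for p in paths if text_ratio(p.read_bytes()) >= threshold]
```

Keeping the stages separate means the carved files stay on disk, so a different selector (image headers, a particular word processor's format) can be run over the same set later without re-reading the disk.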

> And for the archaic disk formats, it would be good to have
> platform-specific methods of identifying fragments to guess
> their file type beyond executable and ASCII.  Older run-length
> compression image formats may be more possible to recover than
> today's block-compressed images.

PROJECT: Create a program that will take a list of files, and partial 
files, and for each one, identify the file type, and attempt to display 
the content.  There are thousands of file formats to implement.
Knowledge of those file formats, and especially their headers, is 
essential.
Obviously, file headers and beginnings of files may be easier to identify 
than random pieces from the middle.
For example, in MS-DOS/Windoze, a file that starts with "MZ" (Mark 
Zbikowski's initials) is almost always an .EXE file.
Fortunately, most of the older file formats were simpler, and word 
processing files had lots of text strings in them.
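A starting point for that PROJECT might look like the sketch below: check the first bytes against a table of known signatures, then fall back to text heuristics for fragments from the middle of a file. The table is a small illustrative subset, the PCX test follows the ZSoft header (manufacturer 0x0A, version byte, encoding 1 = RLE), and the "bit 7 masked" fallback assumes WordStar-style files that set the high bit on some characters. The 0.9 threshold and the function names are assumptions.

```python
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. Word 97-2003 .DOC)"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"%PDF", "PDF document"),
    (b"\x1f\x8b", "gzip data"),
    (b"MZ", "MS-DOS/Windows .EXE"),
]

# Printable ASCII plus tab, LF, CR, and Ctrl-Z (CP/M and DOS end-of-file padding).
PRINTABLE = frozenset(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D, 0x1A}


def looks_like_pcx(data: bytes) -> bool:
    # ZSoft PCX header: manufacturer 0x0A, version 0/2/3/4/5, encoding 1 (RLE)
    return len(data) >= 3 and data[0] == 0x0A and data[1] in (0, 2, 3, 4, 5) and data[2] == 1


def printable_ratio(data: bytes, mask_high_bit: bool = False) -> float:
    if not data:
        return 0.0
    if mask_high_bit:
        data = bytes(b & 0x7F for b in data)
    return sum(b in PRINTABLE for b in data) / len(data)


def identify(data: bytes, text_threshold: float = 0.9) -> str:
    """Guess a file or fragment type from its leading bytes, then its content."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    if looks_like_pcx(data):
        return "PCX image (run-length encoded)"
    # No header match: likely a mid-file fragment, so judge by content.
    if printable_ratio(data) >= text_threshold:
        return "plain text"
    if printable_ratio(data, mask_high_bit=True) >= text_threshold:
        return "text with bit 7 set on some bytes (possibly WordStar)"
    return "unknown"
```

Short signatures such as "MZ" will occasionally match by coincidence, so a real tool would go on to validate the rest of the header (for .EXE, the page and relocation counts) before trusting the label.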

--
Grumpy Ol' Fred     		cisin at xenosoft.com

