Continuing to let you all know about developments: I expect that many
of you are facing a similar problem - trying to condense and preserve
a lifetime of "collecting digital stuff".
The DFF utility has been very helpful. However, once I started
organizing my files, I realized that although there is a lot of
duplication, much of it was downloaded at different times and/or from
different sites, making much of it different. Many vendors don't go
out of their way to make file content/purpose obvious in the names,
and many files are dependent on other files - so manually reorganizing
the data is NOT always easy.
The best solution I have come up with so far is to invent a new
archive format designed to eliminate duplicate data but capable of
recreating the entire original directory trees (or parts thereof). To
that end I created the two utilities described below (now included in
the web archive). Yeah, I do seem to have a fair bit of spare time on
my hands these days...
;=BDA - Build Dave's Archive
;=EDA - Extract Dave's Archive
Dave's Archives contain the smallest possible representation of a complete
directory tree:
- Only one copy of the data for duplicate files is stored.
- Duplicate filenames are stored only once.
- Path information is stored only once per directory, and only additions
to a path are stored (adding/removing from last path).
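
To illustrate the three points above, here is a minimal Python sketch of
the general idea. It is NOT the BDA/EDA implementation or the .DA file
format: the function names (build_index, extract, delta_encode,
delta_decode), the use of SHA-256 to spot duplicates, and the in-memory
layout are all assumptions made for illustration only.

```python
import hashlib
import os


def build_index(root):
    """Walk root and return (blobs, entries).

    blobs   maps a SHA-256 digest to file content, stored only once.
    entries lists (relative_path, digest) for every file found.
    Hypothetical layout, not the real .DA format.
    """
    blobs = {}
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            with open(full, "rb") as f:
                data = f.read()  # fine for a sketch; a real tool would stream
            digest = hashlib.sha256(data).hexdigest()
            blobs.setdefault(digest, data)  # duplicate content kept once
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            entries.append((rel, digest))
    return blobs, entries


def extract(blobs, entries, dest):
    """Recreate every file under dest from the deduplicated blobs."""
    for rel, digest in entries:
        target = os.path.join(dest, *rel.split("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as f:
            f.write(blobs[digest])


def delta_encode(paths):
    """Store each path as (components dropped from the last path, components added)."""
    prev = []
    out = []
    for p in paths:
        parts = p.split("/") if p else []
        common = 0
        while common < min(len(prev), len(parts)) and prev[common] == parts[common]:
            common += 1
        out.append((len(prev) - common, parts[common:]))
        prev = parts
    return out


def delta_decode(deltas):
    """Rebuild the full paths from the (drop, add) pairs."""
    prev = []
    out = []
    for drop, add in deltas:
        prev = prev[:len(prev) - drop] + add
        out.append("/".join(prev))
    return out
```

In this sketch, files with identical content share a single entry in
blobs, and a directory such as "a/c" that follows "a/b" is recorded as
"drop one component, add c" instead of repeating the full path.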
eg: Starting with a large DIR of support files for one of my systems. This
has duplicates and a lot of pre-compressed install files:
314 dirs, 930 files using 3,762,691,033 bytes.
Just "ZIP"ing it I get: SysSupt.zip 3,352,081,951 bytes
7zip does a bit better: SysSupt.7z 3,245,871,362 bytes
Running BDA, I get:
SysSupt.DA1 9,404 bytes
and SysSupt.DA2 1,912,855,711 bytes
Big improvement, but no compression yet. Using ZIP and 7zip on the .DA files I get:
SysSupt.zip 1,636,965,417 bytes
and SysSupt.7z 1,609,663,862 bytes
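
To put those figures side by side, the short Python calculation below
expresses each result as a percentage of the original 3,762,691,033
bytes. It assumes the second pair of ZIP/7zip sizes are archives holding
both .DA files; the labels are mine. By this arithmetic ZIP alone leaves
about 89%, 7zip alone about 86%, BDA alone about 51%, and BDA followed
by ZIP or 7zip about 43%.

```python
# Sizes in bytes, copied from the figures quoted above.
ORIGINAL = 3_762_691_033

sizes = {
    "ZIP only": 3_352_081_951,
    "7zip only": 3_245_871_362,
    "BDA only (.DA1 + .DA2)": 9_404 + 1_912_855_711,
    # Assumption: these two are archives containing both .DA files.
    "BDA then ZIP": 1_636_965_417,
    "BDA then 7zip": 1_609_663_862,
}

# Each result as a percentage of the original directory size.
percent_of_original = {label: 100 * size / ORIGINAL for label, size in sizes.items()}

for label, pct in percent_of_original.items():
    print(f"{label:24s} {pct:5.1f}% of original")
```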
And YES, using ZIP/7zip to extract the .DA's and then running EDA gives
me a directory with exactly the same content that I started with.
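
If you want to run that kind of check yourself, one way is to hash every
file in both trees and compare the results. The Python sketch below is
only one possible approach and has nothing to do with how EDA works
internally; tree_digests and same_tree are names made up for this
example, and it compares file names and contents only (empty
directories, timestamps and attributes are ignored).

```python
import hashlib
import os


def tree_digests(root):
    """Map each file's path (relative to root) to the SHA-256 of its content."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(full, "rb") as f:
                # Read in 1 MiB chunks so large files need little memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            digests[rel] = h.hexdigest()
    return digests


def same_tree(a, b):
    """True if both trees hold the same relative paths with identical content."""
    return tree_digests(a) == tree_digests(b)


# Example call (directory names are hypothetical):
#   same_tree("SysSupt", "SysSupt.restored")
```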
Like my other tools, these can deal with BIG directory trees, and the
output file format is well documented should you ever want to recover
the files by other means.
Sorry if I've not responded to messages here. I tend not to follow the
list directly much these days due to the high volume, but you can
always reach me through the link on my site - might take me a few days
to respond, but I do get to it from time to time...
Dave
--
----------------------------------------------------------------------------------
Personal site:
http://dunfield.maknonsolutions.com
----------------------------------------------------------------------------------