A tool many of you may find useful!

Dave Dunfield dave.dunfield at gmail.com
Fri Jun 26 11:14:34 CDT 2020

>this tool is really similar to "rdfind", which compares file sizes and
>content, independently from file name, and is able to create a list of
>correspondence, delete duplicate files, and create symbolic links to the
>single instance.
>This can work on large numbers of files, even on complex directory trees.

Sounds good, don't know that I saw that one (tend not to look too hard
as I enjoy creating stuff, and what I do is usually smaller, easier to
use - at least for me - and more reliable).

Didn't want to go into a lot of detail as this isn't exactly classic
computer related, although I expect a lot of classic collectors are
like me and have a use for it.

Couple things I implemented in DFF which I don't know of in other tools:

It uses an "index" file. My first attempt just used the output of the Windows
DIR /S command, but I found it got big and unworkable fast, and its format
changed from one version of Windows to another. DFF creates its own index,
which is small and consistent, having only the DIR names, and file sizes + names.

This is normally a temp file, but you can Keep it, or just Build it without
processing and process it later. You can also have DFF append to it, so you
can deal with as complex a dir structure as you like by /BAing it in various
places. It can deal with files in arbitrary directory trees on multiple drives
quite easily.
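
To give a rough idea of what such an index might look like, here is a small
Python sketch. It is not DFF's code and the line format is made up for
illustration: the `D`/`F` prefixes, the function name `build_index` and the
`append` parameter are all assumptions. It only shows the idea of recording
directory names plus file sizes and names, with an option to append so that
several trees end up in one index.

```python
import os

def build_index(root, out_path, append=False):
    """Write a small text index: one 'D <dir>' line per directory,
    followed by 'F <size> <name>' lines for the files in it.
    The line format is hypothetical, not DFF's real one."""
    mode = "a" if append else "w"
    with open(out_path, mode, encoding="utf-8") as out:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()          # make the traversal order repeatable
            out.write(f"D {dirpath}\n")
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                try:
                    size = os.path.getsize(full)
                except OSError:
                    continue         # skip files that vanish or can't be read
                out.write(f"F {size} {name}\n")
```

Calling it once per tree with `append=True` would collect trees from different
drives into the same text file.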

You can also have it place an END marker in the file, which means that
anything you append will be treated differently. Anything before the END
marker is scanned and reported as you expect. Files after the END marker are
still considered as possible duplicates of the files before it, but are not
checked and reported in their own right.

And since the "index" file is a text file, you can add to it, change it and
retrieve its content very easily - you don't need special programs provided
by the tool maker to do unusual things.  Same is true for its output.

You can also have it list:
  - All files (dups have a dup instance number, see below)
  - Only duplicate files
  - Only single files
  - Under each directory, you can get it to list where all the duplicates
    are (full path)
This combined with the END marker makes some fairly powerful things possible.
(Show me any files occurring here which are not also occurring there).
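
The "here but not there" idea can be sketched in a few lines of Python. This
is only an illustration under assumptions: the `D <dir>` / `F <size> <name>`
line format, the bare `END` line and all function names are invented here and
are not DFF's actual format or logic. The sketch also stops at comparing
sizes: a file before END whose size never appears after END cannot have a
duplicate there, while same-size files would still need a content comparison.

```python
def parse_index(lines):
    """Yield (directory, name, size) from 'D <dir>' / 'F <size> <name>' lines.
    Any other line (blank lines, the END marker) is ignored."""
    current_dir = ""
    for line in lines:
        kind, _, rest = line.rstrip("\n").partition(" ")
        if kind == "D":
            current_dir = rest
        elif kind == "F":
            size, _, name = rest.partition(" ")
            yield current_dir, name, int(size)

def split_at_end(lines):
    """Split index lines into (before, after) around the first END line.
    With no END marker, everything counts as 'before'."""
    lines = list(lines)
    stripped = [line.strip() for line in lines]
    if "END" in stripped:
        i = stripped.index("END")
        return lines[:i], lines[i + 1:]
    return lines, []

def unmatched_by_size(lines):
    """Files before END whose size does not occur anywhere after END."""
    before, after = split_at_end(lines)
    sizes_after = {size for _, _, size in parse_index(after)}
    return [(d, n) for d, n, size in parse_index(before)
            if size not in sizes_after]

index = [
    "D C:\\new",
    "F 10 a.txt",
    "F 20 b.txt",
    "END",
    "D D:\\archive",
    "F 20 x.bin",
]
print(unmatched_by_size(index))   # a.txt has no same-size file after END
```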

Each instance of duplication is assigned a unique "duplicate instance" number
which is shown next to all files which are part of that "duplicate instance".
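
One way such numbering could work is sketched below in Python. How DFF really
decides that two files are duplicates is not described here, so the approach
(group by size first, then by SHA-256 of the content) and the names `_digest`
and `number_duplicates` are assumptions made for the example.

```python
import hashlib
import os
from collections import defaultdict

def _digest(path):
    """SHA-256 of a file's content, read in chunks to keep memory use low."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def number_duplicates(paths):
    """Map each duplicated path to a duplicate instance number (1, 2, ...).
    Files with no duplicate are left out of the result."""
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    instance = {}
    next_id = 1
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[_digest(p)].append(p)
        for group in by_hash.values():
            if len(group) > 1:
                for p in group:
                    instance[p] = next_id
                next_id += 1
    return instance
```

Grouping by size first means only files that share a size ever get read, which
keeps the expensive content check to a minimum on large collections.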

I thought about an automatic "delete duplicates" feature but didn't implement
it, as I am organizing a lot of data, much of it duplicated. Its final resting
place may not be one of the original locations, and I want control over how the
final archive is organized.


Personal site: http://dunfield.maknonsolutions.com
