http://www.nsrl.nist.gov/Project_Overview.htm
for an example of a large database of file hashes (about 18 million unique sums).
Interestingly, they don't claim that every entry is a valid file, only that each one
has a hash and a file/path name from a known source.
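
To make that distinction concrete, here is a rough sketch of checking a file against a set of known hashes. It assumes the hashes are SHA-1 hex strings already loaded into a dict; the names `file_sha1`, `lookup` and `known` are made up for illustration and are not NSRL's own format or tooling.

```python
import hashlib


def file_sha1(path, chunk_size=65536):
    """Return the uppercase hex SHA-1 digest of the file at `path`."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def lookup(path, known):
    """Return the names recorded for this file's hash, or [] if it is unknown.

    `known` is a hypothetical mapping {sha1_hex: [file names]}.
    A hit only means the hash was recorded from a known source;
    it does not say the file is valid or safe.
    """
    return known.get(file_sha1(path), [])
```

A miss just means the hash is not in the set, and a hit just means someone catalogued a file with that hash under the listed name(s).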