Eric Smith wrote:
Chuck Guzis wrote:
I'm sure that if you delved into the
mainframe world software (which
may be hard to do), you'd find similar schemes. Hashing for filename
lookup wasn't all that uncommon--after all, it was used in just about
every compiler I ever ran into for symbol table lookup.
However, using it for in-memory symbol table lookup isn't prior art
for using it to reduce disk access for filename lookup in a directory
on a storage medium, unless you can prove that it is an obvious
extension. You and I would probably think it is obvious.
Unfortunately the patent office "obviousness test", which is supposed
to be about a "person having ordinary skill in the art" (PHOSITA), as
actually applied seems only to consider persons that are complete
morons, for whom nothing is obvious.
But computer software patents weren't allowed
back then, which makes
the claim of "prior art" hard to make.
Whether software patents were allowed back then has no bearing on
whether something that can be shown to have been done at the time
qualifies as prior art today.
If Eric is correct and prior art from before software patents
were allowed is accepted, then this example may be valid:
Back in 1967, the first job I had was for a system using a
time-shared editor which then allowed the user to submit
batch jobs into a queue that ran them one at a time; the
systems were still very slow back then and memory was
very limited. The computer was the CDC 3300. I do not
remember the name of the operating system, but essentially
it was one batch job along with a multi-user editor which
allowed users to create files (programs) which were then
submitted to the batch job queue.
The lab which supported the system had so many users that
the number of files grew quite large, and a suitable on-disk
lookup-table search algorithm was needed to reduce the time
taken to find the index block on the disk drive which pointed
to the actual file label. The algorithm was trivial: add the
words in the file name (there were around 10, if I remember
correctly after 45 years) and divide that sum by the number of
blocks assigned to the search table. The best choice for the
number of blocks in the search table was a prime number; one
of the primes just over 600 was used, since each search block
held 10 file names. Dividing the sum of the words in the file
name by the number of blocks in the search table gave a
remainder, which was then used as the block number at which
to start the search.
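
For anyone who wants to see the idea in modern terms, here is a
minimal C sketch of that lookup hash. The original was CDC 3300
code and is long gone, so the names and constants here are only
illustrative; just the prime table size (607) and the roughly
10-word file names come from my description above.

#include <stdio.h>

#define TABLE_BLOCKS   607  /* prime number of blocks in the search table */
#define WORDS_PER_NAME  10  /* file name stored in about 10 machine words */

/* Sum the words of the file name and take the remainder modulo the
   number of blocks in the search table; that remainder is the block
   number at which the directory search starts. */
unsigned hash_block(const unsigned long name_words[WORDS_PER_NAME])
{
    unsigned long sum = 0;
    for (int i = 0; i < WORDS_PER_NAME; i++)
        sum += name_words[i];              /* trivial additive hash */
    return (unsigned)(sum % TABLE_BLOCKS); /* starting block number */
}

int main(void)
{
    /* Made-up file-name words, just to show the call. */
    unsigned long name[WORDS_PER_NAME] = { 0x464953, 0x4C4531, 0x323334 };
    printf("start the search at block %u\n", hash_block(name));
    return 0;
}
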
A search table of 607 blocks could hold 6070 file
names, and until the search table was over 80% full
(over 4800 files), rarely was more than one block
read in the search table, as opposed to an average
of 240 blocks with a linear search. The penalty
was that 127 blocks in a 607-block search table were
wasted. There are probably better algorithms to
produce the hash sum to be divided for the search table
on the disk drive, but I doubt that changing the
method of producing the hash sum actually changes
the algorithm in any significant way. I suspect that
the most important detail is that the search table
have a prime number of blocks.
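To spell out the arithmetic behind those figures: 607 blocks x 10
names per block = 6070 names; 80% of 6070 is a little under 4900,
hence "over 4800 files"; that many names occupy roughly 480 blocks,
so a linear search would read about 480 / 2 = 240 blocks on average,
and 607 - 480 = 127 blocks were the wasted space.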
Initially, the algorithm was written incorrectly: entries
which had previously been used but were since deleted were
skipped when assigning a new file entry. At that point the
table filled up within two weeks, with about two hours of
warning of the event. I had written a small program which
checked all the entries in the table, and I pointed out the
impending crisis to the manager; as I remember, it was
ignored since he could not believe his code was incorrect.
Within an hour of the system being shut down, he had found
the bug (one instruction) and it was fixed.
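
In today's terms, that bug was the classic "deleted slot" problem in
an open-addressed hash table: on insertion, slots whose entries had
been deleted were never reused, so the table could only fill up. A
rough C sketch of the two behaviours, with the states, names and
linear probing invented for illustration (the real code searched
blocks of 10 names, not individual slots):

enum slot_state { EMPTY, IN_USE, DELETED };

struct slot { enum slot_state state; unsigned long name_hash; };

/* Buggy insert: only never-used (EMPTY) slots are accepted, so
   DELETED slots are skipped and never reclaimed.  Eventually every
   slot has been used at least once and insertion reports "table
   full" even though plenty of slots are actually free. */
int insert_buggy(struct slot t[], int n, unsigned long h)
{
    for (int i = 0; i < n; i++) {
        int j = (int)((h + i) % n);          /* linear probe */
        if (t[j].state == EMPTY) {
            t[j].state = IN_USE;
            t[j].name_hash = h;
            return j;
        }
        /* DELETED slots fall through here (the bug) */
    }
    return -1;
}

/* Fixed insert: a DELETED slot is just as reusable as an EMPTY one,
   so deleted entries no longer clog the table. */
int insert_fixed(struct slot t[], int n, unsigned long h)
{
    for (int i = 0; i < n; i++) {
        int j = (int)((h + i) % n);
        if (t[j].state == EMPTY || t[j].state == DELETED) {
            t[j].state = IN_USE;
            t[j].name_hash = h;
            return j;
        }
    }
    return -1;
}

(Lookups, of course, must still probe past DELETED slots rather than
stop at them.)
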
A couple of years later, I switched jobs and worked
for a time-sharing company which probably purchased
a license to run the same software on a CDC 3500.
Where proof of the software used could be found at
this point, I do not know.
If this hashing algorithm qualifies, then send the information
along. I jumped into this thread just today and did not
follow what started it.
Jerome Fine