Google is particularly bad about fetching documents
over and over
again.
mmm... any evidence they are using OCR to index PDFs?
of all the places I'd *like* them to OCR, it's bitsavers.
in fact, mmmmm, I'd like to connect the two dots. bitsavers
+ google (and any/all of the mit, stanford, cmu, ... software archives)
something to start mentioning at various fundraising
cocktail parties :-)
To be clear - the problem is that Google consumes bandwidth by
repeatedly downloading static documents, versus downloading dynamic
content whose index status might be new or dirty?