No, I get your point. I offered up *a* way to get this done and you
just want to tell me how it won't work and how awesome you are at
doing it, all without actually doing it.
Well, I am awesome (start blowing the horns!) at doing this sort of
thing (but, no, I am not volunteering).
Kicking around the backwaters of the Internet is a circa 1950 document
called (I think) "Cross Index of Electronic Tube Types". It is a raw
text file, about 120 pages of very tight text, so really about 200
pages of normally spaced text. Tube types, descriptions, stock numbers,
equivalents, all sorts of esoteric knowledge. Fair Radio sells
reprints of it. Anyway, about 20 years ago I scanned the third
generation photocopy of the document, OCRed it (on a 486), then
proofread it in my spare time. Some comments:
1) It went faster than I thought it would. The first pages were slow,
but later pages went fast. I noticed that I started to recognize
common OCR errors, and had a heightened sense for finding them. This
became very handy when I started sections where all the data was very
similar, like all the 2J** magnetron tubes. Towards the end I was
doing a page in less than two minutes. If you do a sample using a
"human-OCR" service, you may not see this effect, as the few samples
submitted might not get the guys and girls "in tune". Once "in tune",
I am sure the service will start to fly, and be done much quicker than
expected.
2) Being a tube geek anyway, many of the OCR errors, or even errors in
the source data, stuck out like a sore thumb. These were easy to
correct (and in some cases, correct me). Knowing what you are looking
at is a huge help. Think of it like how your 8th grade math teachers
taught you about story type problems - if the answer you get looks
wrong, it probably is. As it turns out, the final OCR'd data is
remarkably error free - except for original source errors.
3) It was extremely easy to break into chunks. Some days I could do a
bunch, others, none. Putting it aside for even a week did not get me
"out of tune".
4) It was not as much of a drag as I thought it would be. Every so
often I would discover some interesting tidbit in the listing, and
have to research it more. I bet if I was doing the same thing with a
source listing, perhaps with juicy comments from the programmers, I
would probably get distracted just as much, and thus learn a bunch of
things that I probably would never learn any other way.
Anyway, the point of this is that while the firmware listing in
question is quite big, it is certainly very doable. One person could
probably do it very on-and-off in a few months, or it could be broken
into chunks and done in a fraction of the time by a team of people. It
is not an insurmountable obstacle.
--
Will