I'd love to write a program to "OCR" punched card images. Now, if
I only had some spare time. :-)
Ain't that the truth.
Scanners are cheap and ubiquitous. You could lay several cards on the
scanner at once, perhaps placing a specially-colored paper on the
normally white reflective lid, and presto - like chroma-key on video,
you can easily "see" the borders, index notch and holes. I wonder if
any of today's "paper port" auto-feeding cheapo scanners would handle
a punched card - I don't see why not.
I've got a Brother el-cheapo parallel port scanner that was able to
scan punch cards after I put the card (singular) in the torn paper
sleeve, in front of a piece of purple construction paper (I was going
for contrast and it was the darkest I had). The scans at 400 dpi look
great. I shrank them 4:1 in Photoshop and saved them as GIFs. My
thought was to use the GIF library by Tom Boutell to slurp in a GIF,
run a high-pass (edge-detect) filter on it, guess the dpi based on
height and width, look for the notched corner (or be told the orientation
by the user) and then, pixel column by pixel column, look for holes or the
absence thereof.
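For what it's worth, the dpi-guessing and hole-sampling steps might look
something like the sketch below. It is only a sketch under stated
assumptions: the image is already decoded into an 8-bit grayscale buffer
(no GIF loading here), already cropped to the card's edges and upright
with row 12 at the top, and holes show up dark against the backing paper.
It also swaps the edge-detect pass for a plain brightness threshold at
each nominal hole center. The card is taken to be 7.375 by 3.25 inches
with 0.087 inch column pitch and 0.25 inch row pitch; the margin values
and the names guess_dpi and read_card are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Nominal 80-column card geometry, in inches. */
#define CARD_W_IN   7.375   /* card width */
#define COL_PITCH   0.087   /* spacing between column centers */
#define ROW_PITCH   0.25    /* spacing between row centers */
#define COL1_X_IN   0.251   /* ASSUMED: column 1 center from left edge */
#define ROW12_Y_IN  0.25    /* ASSUMED: row 12 center from top edge */
#define NCOLS 80
#define NROWS 12

/* Guess the scan resolution from the width of the cropped card image. */
static double guess_dpi(int width_px)
{
    return width_px / CARD_W_IN;
}

/*
 * Sample the nominal center of each of the 80 x 12 hole positions.
 * img is 8-bit grayscale, row-major, w x h pixels.  A pixel darker than
 * threshold counts as a hole (dark backing paper showing through).
 * out[c] receives a 12-bit mask for card column c+1:
 * bit 11 = row 12 (top), bit 10 = row 11, bit 9 = row 0, ... bit 0 = row 9.
 */
static void read_card(const unsigned char *img, int w, int h,
                      unsigned char threshold, unsigned short out[NCOLS])
{
    double dpi;
    int c, r;

    assert(img != NULL && w > 0 && h > 0);
    dpi = guess_dpi(w);
    memset(out, 0, NCOLS * sizeof out[0]);

    for (c = 0; c < NCOLS; c++) {
        int x = (int)((COL1_X_IN + c * COL_PITCH) * dpi + 0.5);
        for (r = 0; r < NROWS; r++) {
            int y = (int)((ROW12_Y_IN + r * ROW_PITCH) * dpi + 0.5);
            if (x < w && y < h && img[(long)y * w + x] < threshold)
                out[c] |= (unsigned short)(1u << (NROWS - 1 - r));
        }
    }
}
```

Sampling a single pixel per hole is the crudest thing that could work;
averaging a small patch around each center would probably tolerate dirt
and scanner noise better.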
It could save the card data in Jones' proposed file format. One advantage
of this system would be that it could handle aged cards that have some physical
imperfection (like dents from rubber bands, folds, worn edges, etc.) that
might jam a card reader.
I hadn't thought of the benefits to damaged cards, but I agree on using
Doug Jones' format (which I just discovered, thanks to this thread).
The one drawback I've seen is speed. It's a pain in the butt to
stick a card in a scanner, scan the card, save the image, etc., etc. I
can't see reading much more than a couple of cards per minute. With
a modified scanner (to back up the scan area with black, not white), or
with suitably modified cards - say by xeroxing cards on a xerox machine
with a black background and scanning the xeroxes, it could be made to go
faster. Even a digital camera would probably be faster than a scanner.
At that speed, it's more efficient to take a block of lumber, dado
out a channel, drill 12 holes and wire up a dozen phototransistors into
a parallel port. Maybe a Basic Stamp could drive it? That way, any
machine, parallel equipped or not, could read the cards over a serial
port. The BS might even be trainable to output the file in D.J. format.
One issue I haven't resolved is how rigidly oriented the cards have to
be. The technique I've been mulling over will work, provided that the
pixels of one card column don't overlap the pixels of another card column
in the same pixel column. I'm trying to do this in a portable manner,
in C, with code that is completely insensitive to OS and windowing scheme.
Right now, I'm scanning cards with Win95 (because of the PP scanner) and
will be interpreting them on a Linux box (because it's nicer to program).
Anyone who suggests Visual Basic will be forced to stuff the chad back into
a case of lace cards. ;-)
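To put a rough number on the orientation question above, here is a
back-of-the-envelope sketch of how much skew the pixel-column approach
could tolerate before adjacent card columns share a pixel column. It
assumes rectangular holes 0.055 inch wide by 0.125 inch tall (my
recollection of the nominal size, so treat it as an assumption), the
worst case of a column punched in both row 12 and row 9, and a
small-angle conversion from slope to degrees. The function names are
invented for illustration.

```c
#include <assert.h>

#define COL_PITCH  0.087   /* inches between column centers */
#define ROW_PITCH  0.25    /* inches between row centers */
#define HOLE_W     0.055   /* ASSUMED hole width, inches */
#define HOLE_H     0.125   /* ASSUMED hole height, inches */
#define NROWS      12

/*
 * With the card rotated by an angle t, one card column's holes cover a
 * horizontal width of span*sin(t) + HOLE_W*cos(t), where span runs from
 * the top of the row-12 hole to the bottom of the row-9 hole.  Adjacent
 * column centers sit COL_PITCH*cos(t) apart, so the columns stay
 * separate as long as tan(t) <= (COL_PITCH - HOLE_W) / span.
 */
static double max_skew_slope(void)
{
    double gap  = COL_PITCH - HOLE_W;
    double span = (NROWS - 1) * ROW_PITCH + HOLE_H;
    return gap / span;
}

/* Small-angle approximation: the angle in radians is about the slope. */
static double max_skew_degrees(void)
{
    return max_skew_slope() * 180.0 / 3.14159265358979323846;
}

/* Sideways drift, in pixels, allowed between the top and bottom holes. */
static double max_drift_px(double dpi)
{
    assert(dpi > 0.0);
    return (COL_PITCH - HOLE_W) * dpi;
}
```

Under those assumptions the limit works out to roughly 0.64 degrees, or
about 3 pixels of sideways drift over the height of the card at 100 dpi,
which suggests the cards need to sit fairly square on the glass unless
the code corrects for skew first.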
-ethan