Book Scanning

"The first five guys to sign up were two mechanical engineers, two software developers, and an intellectual property lawyer."—Dan Reetz, speaking about the diybookscanner.org forum at the New York Law School D is for Digitize conference, October 9, 2009

I scan the books I've bought because I'm tired of thousands of them cluttering up my house, ending up lost at the bottom of a box marked Dishes, getting eaten by bugs, or being attacked by the acid in the paper. If you don't like the idea of reading electronic editions of books, stop reading here! Also, if you think format-shifting is intellectual property theft, stop reading here. People scan books for as many reasons as there are people:

  • Saving family history documents
  • Format-shifting for the print-disabled
  • Archiving rare books
  • Annoyed at the space a huge collection of physical books occupies
  • Appalled at the prices of college textbooks
  • Increasing access to out-of-copyright works from libraries
  • A cheaper, more book-friendly method of scanning
  • A mobile alternative to hundreds of pounds of reference books

I didn't make these up. Each of these represents a real person on the DIY Book Scanner forum. I won't pull any punches: some uses of this technology are copyright infringement. Some are not. This is technology easily built by anyone with a few hundred U.S. dollars (mostly for the two 8+ megapixel cameras), and there is no reasonable way to stop it. Those against format-shifting should have a long, hard think about what that means. Those who believe the technology can be embargoed should probably stop reading this blog.

Dan Reetz started it all by posting his initial design on instructables.com, winning a laser etcher in a competition there in the process. After pages and pages of comments and refinements, he set up a website just for the design and invited everyone to come on in. It's been going strong ever since.

"If Dan Reetz didn’t exist, it would be necessary for Cory Doctorow to invent him." —Author Robin Sloan, The Future of the Book: Bringing Book Scanning Home, October 12, 2009

I have built a book scanner and digitized a few books on it at about 10 pages a minute (I'm slow and careful; others regularly reach higher speeds). I've found that the images show some distortion because the pages are not pressed completely flat. Lens geometry adds a smaller amount of distortion on top of that. How can the images be postprocessed into nice, undistorted pages?

Several ideas that have been kicked around the DIY Book Scanner forum are:

  • calibration images (see here and here)
  • image analysis and modeling for a mathematical distortion inverse transformation (see here)
  • stereoscopic imaging for direct measurement of distortion (briefly mentioned here)
  • manual page straightening (if all else fails)
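The second idea, modeling the distortion mathematically and then inverting it, can at least be illustrated for the lens-geometry part of the problem. A widely used lens model is radial: a point at distance r from the distortion centre is scaled by a factor 1 + k1·r² + k2·r⁴. The sketch below assumes that model, with hypothetical coefficients k1 and k2 that would have to come from something like a calibration image. It does not model page curl, which is the larger source of distortion, and it is not the method proposed on the forum.

```python
def distort(x, y, k1, k2=0.0):
    """Forward radial model in normalized coordinates (origin at the distortion
    centre, units of focal length): maps an ideal point to where the lens puts it."""
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * factor, y * factor


def undistort(xd, yd, k1, k2=0.0, iterations=20):
    """Approximately invert distort() by fixed-point iteration.

    The radial polynomial has no simple closed-form inverse, so we repeatedly
    divide the distorted point by the factor evaluated at the current estimate.
    This converges quickly for the mild distortion typical of camera lenses.
    """
    xu, yu = xd, yd  # start from the assumption of no distortion
    for _ in range(iterations):
        r2 = xu * xu + yu * yu
        factor = 1.0 + k1 * r2 + k2 * r2 * r2
        xu, yu = xd / factor, yd / factor
    return xu, yu


def undistort_pixel(u, v, cx, cy, focal_px, k1, k2=0.0):
    """Convert a pixel to normalized coordinates, undistort it, and convert back.
    (cx, cy) is the assumed distortion centre and focal_px the focal length in pixels."""
    x, y = (u - cx) / focal_px, (v - cy) / focal_px
    xu, yu = undistort(x, y, k1, k2)
    return cx + xu * focal_px, cy + yu * focal_px


if __name__ == "__main__":
    # Hypothetical coefficients; real ones would come from calibration.
    k1, k2 = -0.1, 0.01
    xd, yd = distort(0.3, -0.2, k1, k2)
    print(undistort(xd, yd, k1, k2))  # recovers approximately (0.3, -0.2)
```

Correcting a whole image this way means computing, for each output pixel, where it came from in the input and resampling there.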

After briefly fiddling with the first idea and spending a lot of time on the second, I now plan to pursue the third, stereoscopic imaging. I believe it holds the most promise for accurately determining page distortion without relying on the content of the page. My next step is to experiment with mounting the existing two cameras on the same side, to see whether stereoscopic imaging is practical without a lot of fiddling, and then to convert my two-camera scanner into a four-camera model.
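The geometry that makes stereo measurement possible is triangulation. For a rectified pair of cameras with focal length f (in pixels) separated by a baseline B, a point seen with a horizontal disparity of d pixels between the two images lies at depth Z = f·B/d. Differentiating gives a depth resolution of roughly Z²/(f·B) per pixel of disparity. The sketch below assumes an already-rectified camera pair and already-matched points (finding those matches is the hard part and is not shown); all the numbers in it are hypothetical, not measurements from my scanner.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_mm):
    """Depth along the optical axis for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return focal_px * baseline_mm / disparity_px


def surface_profile(disparities_px, focal_px, baseline_mm):
    """Convert disparities sampled across a page into heights (mm) measured
    toward the cameras, relative to the farthest sampled point."""
    depths = [depth_from_disparity(d, focal_px, baseline_mm) for d in disparities_px]
    reference = max(depths)
    return [reference - z for z in depths]


def depth_step_per_pixel(depth_mm, focal_px, baseline_mm):
    """Approximate depth change caused by a one-pixel disparity error: Z^2 / (f * B)."""
    return depth_mm * depth_mm / (focal_px * baseline_mm)


if __name__ == "__main__":
    # Hypothetical setup: 3000 px focal length, cameras 60 mm apart.
    f, b = 3000.0, 60.0
    print(depth_from_disparity(450.0, f, b))  # 400.0 (mm)
    print(depth_step_per_pixel(400.0, f, b))  # about 0.89 mm per pixel
    print(surface_profile([450.0, 452.0, 455.0], f, b))
```

Under these assumed numbers a one-pixel matching error shifts the estimated depth by just under a millimetre, which suggests why sub-pixel matching would matter for measuring gentle page curl.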