Genetic programming in the cloud

I've been working with genetic programming (GP) on and off since the early '90s. I haven't done anything in the way of research, but I have applied GP to various problems, image processing among them, and recently I've gotten interested in applying GP to the DIY Book Scanner dewarp problem. I don't have a particular direction or plan in mind (otherwise I'd be a Fully-Baked Maker), but I figure I'll start with some basic image processing routines and work up from there.

I use ECJ, a Java-based evolutionary computation system, as my GP engine.

My original forays into image-processing GP, several years ago, involved 16x16 images; I didn't have a fast enough processor or good enough libraries to handle anything bigger. Today, I'm using bilevel images of actual pages, scaled down 8x. The resulting images are about 300x400, a huge step up from my little 16x16 toy images, and my laptop can still process them in GP well enough.

My first idea was to set up a problem where GP had to find a sequence of operations yielding the following image: IMG.closeBrick(10, 2) & IMG.dilateBrick(2, 3). The closeBrick(w, h) operation performs a morphological close using a structuring element that is wxh, all hits, with center at (w/2, h/2); dilateBrick(w, h) performs a morphological dilate using the same kind of wxh element; and the & operation performs a pixel-by-pixel AND.
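For concreteness, here is a rough equivalent of the target computation in Python with OpenCV (just an illustration, not the code I actually run; it assumes the bilevel page is stored as a 0/255 uint8 array with white foreground):

    import cv2

    def brick(w, h):
        # All-hits rectangular structuring element; OpenCV anchors it at the
        # center by default, matching the (w/2, h/2) center described above.
        return cv2.getStructuringElement(cv2.MORPH_RECT, (w, h))

    def target_image(img):
        # IMG.closeBrick(10, 2) & IMG.dilateBrick(2, 3)
        closed = cv2.morphologyEx(img, cv2.MORPH_CLOSE, brick(10, 2))
        dilated = cv2.dilate(img, brick(2, 3))
        return cv2.bitwise_and(closed, dilated)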

As primitive functions, I used the following list:

IMG: → img: returns the source image
And: img, img → img: returns the pixel-by-pixel AND of two images
Or: img, img → img: returns the pixel-by-pixel OR of two images
Xor: img, img → img: returns the pixel-by-pixel XOR of two images
Inv: img → img: returns the pixel-by-pixel inverse of an image
Shift: int, int, img → img: shifts an image by a number of pixels
CloseBrick: int, int, img → img: morphologically closes an image with a brick
OpenBrick: int, int, img → img: morphologically opens an image with a brick
ErodeBrick: int, int, img → img: morphologically erodes an image with a brick
DilateBrick: int, int, img → img: morphologically dilates an image with a brick
W: → int: returns the width of the image
H: → int: returns the height of the image
One: → int: returns 1
MinusOne: → int: returns -1
+: int, int → int: adds two integers
-: int, int → int: subtracts two integers
*: int, int → int: multiplies two integers
/: int, int → int: safely divides two integers (returns 1 if divisor is 0)

The integer parameters to the morphological operators are negated if negative and limited to the range 0-199. So if, for example, the parameters are H,-H, the brick will actually be 199,199.
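In code, that sanitizing step is just (a one-liner sketch; the function name is mine):

    def clamp_brick_param(n):
        # Negate if negative, then limit to at most 199; with an image height
        # H = 400, both H and -H come out as 199.
        return min(abs(n), 199)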

Individuals are scored against 167 images. Each pixel they get wrong adds one to the error count, and the total is divided by the total number of pixels (167x300x400). Since this is only a scaling factor it doesn't affect the GP algorithm, but it makes the scores readable: 0% is a perfect score, and 100% is the worst score.
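As a sketch, the whole fitness evaluation amounts to (names mine; the real thing lives inside an ECJ problem class):

    import numpy as np

    def score(candidates, targets):
        # Fraction of pixels wrong across all 167 image pairs, given as
        # equally sized boolean arrays; 0.0 is perfect, 1.0 is the worst.
        wrong = sum(np.count_nonzero(c != t)
                    for c, t in zip(candidates, targets))
        total = sum(t.size for t in targets)
        return wrong / total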

I started a run with 3000 individuals and let it go overnight. Twelve or so hours later, I had 20 generations and a score of 5%. Looking at the best individuals, it was clear that GP was approaching the answer, but very slowly. There was no way I was going to solve this problem, let alone a dewarping problem, on my laptop.

Then I remembered this thing that was being trumpeted a few months ago: the Amazon Elastic Compute Cloud. I did some research and found that it was surprisingly (and refreshingly) cheap. There's no cost to sign up; you pay for virtual compute instances by the hour and for storage by the month. Linux compute instances run anywhere from 9 cents to $2.40 per hour, depending on memory, storage, number of cores, and so on, but I've found that spot instances, where you bid on spare capacity, can cut the price by more than half.

The instance I chose was an 8-core, 64-bit instance with low memory, at 68 cents per hour, or 25-30 cents per hour spot. I run the "master" GP program on my laptop, which stores the individuals and performs evolution on them (the fast part), and run "slaves" on the instances to evaluate individuals (the slow part). With ten instances (about three bucks an hour) I can run 80 slaves, and so evaluate roughly 80 times as fast as my laptop. In a single hour I might get 130 generations or so.

The nice thing about ECJ is that slaves can come up and go down while the master runs; the master just keeps distributing individuals for evaluation to whatever slaves are up. So even if my spot instances die or are terminated, ECJ will keep running.
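Setting this up is all parameter files; the excerpts below show the general shape (parameter names as I understand them from the ECJ manual, so double-check them against your ECJ version; host and port are placeholders):

    # master.params (excerpt): hand individuals out to remote slaves
    eval.masterproblem = ec.eval.MasterProblem
    eval.master.port = 15000

    # slave.params (excerpt), started with: java ec.eval.Slave -file slave.params
    eval.master.host = master-host.example.com
    eval.master.port = 15000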

I tried running with a single instance, and after about 40 generations, a little over an hour, GP had solved the problem. The winning individual:

IMG.dilateBrick(1,1).dilateBrick(199,1) &
IMG.dilateBrick(199,2).dilateBrick(199,199).erodeBrick(199,2) &
IMG.closeBrick(199,2) &
IMG.dilateBrick(1,1).dilateBrick(2,3)

Note that dilateBrick(1,1) is a no-op. That this individual accomplishes the same thing as the target operation in a different way is normal for GP.

Runs with larger images take a very long time, even in the cloud. I may have to rethink the whole idea. Still, it's an idea, even if half-baked.

Dewarping pages

So, as I said a while back, one of the major issues with book images from cameras is that the pages are not flat. The ideal dewarper would correct for keystoning (where the margins of the page are not perfectly rectangular) and curling (where the individual text lines are not straight). I did eventually come up with an algorithm that handles many kinds of keystoning and curling. It works like this, starting from a bilevel image that consists only of the page on a white background. I will use sample closeups of an actual page, at around 600 dpi:

1. Find the text lines (a code sketch of this whole step follows the sub-steps below):

a. Label the connected components (8-connected). This is generally fast when using a one-pass algorithm.

b. For each connected component, find the bounding box that fully encloses it. This simply means taking the minimum and maximum x-coordinates and the minimum and maximum y-coordinates over the component's pixels.

c. Create a new, blank white image. For each connected component, and for each x-coordinate in its box, determine the average y-coordinate, and put a black pixel in the new image at the (x, average y) point. The resulting image traces the center of gravity in the y direction of each connected component.

d. In the new image, locate the leftmost pixel. If there is more than one leftmost pixel, choose the topmost of those. If the entire image is blank, we have found all the lines, so go to step 2.

e. Set up a new point array and add the pixel to the array. Remove the pixel from the image.

f. Set up a new array of 30 y-values, and initialize each element to the y-coordinate of the chosen pixel.

g. Compute the average of all 30 y-values. Obviously, initially this result will be the y-coordinate of the chosen pixel. Call the average y.

h. For dx = 1 up to some skip bound (I chose 1/20 of the image's width), search the new image at (x+dx, y+dy) for a black pixel, where dy runs from minus some search bound to plus that bound (I chose +/- 30).

i. If a pixel was found, we're going to treat this as the next pixel in the line: add the coordinates of the found pixel to the point array, remove the pixel from the image, move all values in the y-value array one to the left, and add the y-coordinate of the found pixel to the end of the array. In other words, the y-value array represents a moving average. Go to step g.

j. If a pixel was not found, we've found all the pixels we can for this line. Close out the point array (store it as a found line), then decide whether the line is valid: the number of pixels in the line must be greater than 1/3 the width of the image, and the difference between its topmost and bottommost pixels must be less than 1/20 the height of the image. If the line passes these tests, keep it; otherwise throw it away. In either case, go to step d.
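Here is the promised sketch of step 1 in Python with numpy/scipy. The helper names and array plumbing are mine, but the constants follow the text (skip bound = width/20, search bound = +/- 30, 30-value moving average):

    import numpy as np
    from scipy import ndimage

    def trace_lines(page):
        # page: boolean array, True = black pixel. Returns a list of
        # (x, y) point arrays, one per kept text line.
        h, w = page.shape
        # a. Label 8-connected components.
        labels, n = ndimage.label(page, structure=np.ones((3, 3), int))
        # b/c. Blank image tracing each component's per-column mean y
        # (its center of gravity in the y direction).
        trace = np.zeros_like(page)
        for comp in range(1, n + 1):
            ys, xs = np.nonzero(labels == comp)
            for x in np.unique(xs):
                trace[int(round(ys[xs == x].mean())), x] = True
        lines, skip, search, window = [], w // 20, 30, 30
        while True:
            ys, xs = np.nonzero(trace)
            if len(xs) == 0:
                break                    # d. image blank: all lines found
            i = np.lexsort((ys, xs))[0]  # d. leftmost pixel, topmost on ties
            x, y = int(xs[i]), int(ys[i])
            trace[y, x] = False
            pts, recent = [(x, y)], [y] * window  # e/f.
            while True:
                avg = int(round(sum(recent) / window))  # g. moving average
                found = None
                # h. Scan right up to the skip bound, +/- the search bound.
                for dx in range(1, skip + 1):
                    if x + dx >= w:
                        break
                    lo = max(0, avg - search)
                    hits = np.nonzero(trace[lo:avg + search + 1, x + dx])[0]
                    if len(hits):
                        found = (x + dx, lo + int(hits[0]))
                        break
                if found is None:
                    break                # j. nothing found: line finished
                x, y = found             # i. extend line, slide the average
                trace[y, x] = False
                pts.append((x, y))
                recent = recent[1:] + [y]
            line_ys = [p[1] for p in pts]
            # j. Keep only lines long enough and flat enough.
            if len(pts) > w / 3 and max(line_ys) - min(line_ys) < h / 20:
                lines.append(np.array(pts))
        return lines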

2. Approximate the left and right margins by straight lines (sketched in code after these sub-steps):

a. Create an array of lines for the left side, and another array for the right side. Initialize each array to the full set of found lines.

b. For the left side, pick out the leftmost pixel of each line, and approximate a straight line to those pixels. The equation should be of the form x = Ay + B: the line is more vertical than horizontal, so this form is more appropriate.

c. Do the same for the right side, except use the rightmost pixels of each line.

d. For the left side, for each line in the left-side array, determine whether its leftmost pixel is to the right of the approximated left margin by more than some threshold (I chose 50). That is, take the y-coordinate of the leftmost pixel, compute x = Ay + B, and check whether the x-coordinate of the leftmost pixel is greater than x + 50. If so, set reapproximateLeft to true and remove from the left-side array the line whose leftmost pixel lies farthest to the right of the approximated margin.

e. Do the same for the right side, except determine whether the rightmost pixel is to the left of the approximated right margin by more than the threshold; if so, set reapproximateRight to true and remove the line whose rightmost pixel lies farthest to the left.

f. If reapproximateLeft is true, do step b again, and if reapproximateRight is true, do step c again. If either was true, go to step d. If neither was true, we have found the best approximation to the left and right margins.
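And here is the sketch of step 2 promised above, for one side (numpy again; helper names mine, threshold of 50 per step d):

    import numpy as np

    def fit_margin(lines, side='left', threshold=50):
        # Each line is an array of (x, y) points. Fit x = A*y + B to the
        # extreme pixel of each line; while any extreme pixel sits more than
        # `threshold` past the fitted margin (toward the page interior),
        # drop the worst offender and refit.
        pick = np.argmin if side == 'left' else np.argmax
        sign = 1 if side == 'left' else -1
        lines = list(lines)
        while True:
            pts = np.array([ln[pick(ln[:, 0])] for ln in lines])
            A, B = np.polyfit(pts[:, 1], pts[:, 0], 1)  # x = A*y + B
            err = sign * (pts[:, 0] - (A * pts[:, 1] + B))
            if err.max() <= threshold or len(lines) <= 2:
                return A, B
            lines.pop(int(err.argmax()))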

3. Determine the forward and reverse keystone correction equations. The forward equation transforms an x-coordinate in the original image into the x-coordinate for the dekeystoned image; the reverse equation transforms an x-coordinate in the dekeystoned image back into the x-coordinate for the original image. The y-coordinate is not transformed, because we are only straightening the image based on the tilt of the left and right margins. Also, the larger separation between the margins is kept, while the smaller separation is expanded to match:

Where, writing the left margin's fitted equation as x = A_L y + B_L, the right margin's as x = A_R y + B_R, and h for the image height:

xTL,yTL is the top point of the left margin: (B_L, 0)
xTR,yTR is the top point of the right margin: (B_R, 0)
xBL,yBL is the bottom point of the left margin: (A_L h + B_L, h)
xBR,yBR is the bottom point of the right margin: (A_R h + B_R, h)

Also, if the space between margins at the top (xTR-xTL) is greater than the space between margins at the bottom (xBR-xBL):

x'TL = xTL
x'TR = xTR
x'BL = xTL
x'BR = xTR

Otherwise:

x'TL = xBL
x'TR = xBR
x'BL = xBL
x'BR = xBR

With a0 through a3 solved from the four corner correspondences above, the reverse transform equation is simply x = a0 + a1x' + a2y + a3x'y, while the forward transform equation is x' = (x - a0 - a2y) / (a1 + a3y).
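In code, a0 through a3 fall out of a small linear solve over the four corner correspondences, and the forward transform is its algebraic inverse (a sketch; argument names mine):

    import numpy as np

    def keystone_coeffs(x_orig, x_prime, h):
        # x_orig: original corner x's (xTL, xTR, xBL, xBR); x_prime: the
        # dekeystoned corners (x'TL, x'TR, x'BL, x'BR). Top corners sit at
        # y = 0, bottom corners at y = h. Solves x = a0 + a1x' + a2y + a3x'y.
        ys = [0, 0, h, h]
        M = np.array([[1, xp, y, xp * y] for xp, y in zip(x_prime, ys)], float)
        return np.linalg.solve(M, np.array(x_orig, float))  # a0..a3

    def forward_x(a, x, y):
        # Forward transform: x' = (x - a0 - a2y) / (a1 + a3y).
        return (x - a[0] - a[2] * y) / (a[1] + a[3] * y)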

4. Perform a forward transform on the pixels in every line of the line array.

5. The margins are now straight. Store the left margin x-coordinate as the forward transform of (B_L, 0), and do the same for the right margin with (B_R, 0).

6. Order the lines in the array by their first y-coordinate (lowest first).

7. For every line, compute the topmost (smallest) y-coordinate (topY) and bottommost (largest) y-coordinate (botY).
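Steps 4 through 7 are just bookkeeping; roughly (reusing forward_x from the sketch above, with B_L and B_R the fitted margin intercepts):

    import numpy as np

    def prepare_lines(lines, a, B_L, B_R):
        # 4. Forward-transform every point's x coordinate.
        lines = [np.column_stack([forward_x(a, ln[:, 0], ln[:, 1]), ln[:, 1]])
                 for ln in lines]
        # 5. The now-straight margins, via the forward transform at y = 0.
        left, right = forward_x(a, B_L, 0), forward_x(a, B_R, 0)
        # 6. Order the lines by their first y-coordinate, lowest first.
        lines.sort(key=lambda ln: ln[0, 1])
        # 7. Each line's topmost and bottommost y (topY, botY).
        tops = [float(ln[:, 1].min()) for ln in lines]
        bots = [float(ln[:, 1].max()) for ln in lines]
        return lines, left, right, tops, bots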

8. For every line in order, perform a nonlinear regression to fit the line to the equation y = b0 + b1x + b2(1.1 - x)^b3 [+ b4 sin(2πx + b5)], where the sinusoidal portion is optional for each page. The initial guess for the very first line should be b0 = 10, b1 = 1, b2 = 1, b3 = 2, b4 = 1, b5 = 0. The initial guess for each subsequent line should be the parameters found for the previous line.

Here, however, rather than using the actual x and y coordinates of each line's pixel for the nonlinear regression, we will instead treat x as the fraction across the page from left margin to right margin (i.e. new x = [x-left]/[right-left]) and y similarly as [y-topY]/[botY-topY]. This normalizes the coordinates such that the left margin is x=0, the right margin is x=1, the top of the line is y=0 and the bottom of the line is y=1. This normalization seems to help the nonlinear regression fit.

Optionally, perform a backwards pass: do the nonlinear regression again (still seeding each fit with the previously found parameters) for the lines in reverse order. This helps make the equations for the lines more similar in shape to each other.

And finally, optionally, perform a second forward and backwards pass. Again, this may help the estimated lines look even more similar in shape to each other.

The reason the sinusoid is optional is that sometimes, without it, the estimated lines end up with a wavy component; adding the sinusoid term models that waviness and helps remove it.

The nonlinear regression step is one of the slowest parts of the algorithm, and could stand a great deal of improvement.
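As a sketch, the fitting loop looks like this, with scipy's curve_fit standing in for whatever nonlinear regression routine you prefer (coordinates already normalized as described above; the sinusoidal term is included here):

    import numpy as np
    from scipy.optimize import curve_fit

    def line_model(x, b0, b1, b2, b3, b4, b5):
        # y = b0 + b1x + b2(1.1 - x)^b3 + b4 sin(2*pi*x + b5)
        return (b0 + b1 * x + b2 * (1.1 - x) ** b3
                + b4 * np.sin(2 * np.pi * x + b5))

    def fit_lines(norm_lines):
        # norm_lines: per-line (xs, ys) arrays, each normalized to [0, 1].
        # Each fit is seeded with the previous line's solution; the first
        # line uses the initial guesses given in the text.
        guess, fits = [10, 1, 1, 2, 1, 0], []
        for xs, ys in norm_lines:
            params, _ = curve_fit(line_model, xs, ys, p0=guess, maxfev=10000)
            fits.append(params)
            guess = params
        return fits

A backwards pass is then just the same loop run over the lines in reverse order.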

9. For every (x,y) point in the original image, transform its x coordinate through the forward transform, then determine where it lies relative to the estimated lines and linearly interpolate the y coordinate. The idea here is that every pixel along an estimated line should end up at exactly the same y coordinate, i.e. the curved line should become a straight line.

This is also a slow part of the algorithm, and can also stand some improvement.
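For a single pixel, the whole transform looks something like this rough sketch (reusing forward_x and line_model from above; the text doesn't pin down exactly which flat y each line should map to, so I use its topY as a placeholder):

    def dewarp_point(x, y, a, fits, tops, bots, left, right):
        xp = forward_x(a, x, y)             # forward transform of x
        xn = (xp - left) / (right - left)   # fraction across the page
        # Where each estimated line sits, in image coordinates, at this x.
        line_ys = [t + line_model(xn, *f) * (b - t)
                   for f, t, b in zip(fits, tops, bots)]
        # Linearly interpolate y between the two bracketing lines, sending
        # each estimated line itself to a constant height (its topY here).
        for i in range(len(line_ys) - 1):
            y0, y1 = line_ys[i], line_ys[i + 1]
            if y0 <= y <= y1:
                t = (y - y0) / (y1 - y0)
                return xp, tops[i] + t * (tops[i + 1] - tops[i])
        return xp, y   # above the first or below the last line: unchanged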

Here's an example of the final product, showing before and after. You can pick up the software via the Scan Tailor thread in DIY Book Scanner.

Book Scanning

"The first five guys to sign up were two mechanical engineers, two software developers, and an intellectual property lawyer."—Dan Reetz, speaking about the diybookscanner.org forum at the New York Law School D is for Digitize conference, October 9, 2009

I personally scan the books I've bought because I'm tired of thousands of them cluttering up my house, ending up lost at the bottom of a box marked Dishes, getting eaten by bugs, or being attacked by the acid inherent in their paper. If you don't like the idea of reading electronic editions of books, stop reading here! Likewise, if you think format-shifting is intellectual property theft, stop reading here. There are as many reasons for scanning books as there are people:

  • Saving family history documents
  • Format-shifting for the print-disabled
  • Archiving rare books
  • Annoyed at the space a huge collection of physical books occupies
  • Appalled at the prices for college text books
  • Increasing access to out-of-copyright works from libraries
  • A cheaper, more book-friendly method of scanning
  • A mobile alternative to hundreds of pounds of reference books

I didn't make these up. Each of these represents a real person on the DIY Book Scanner forum. I won't pull any punches: some uses of this technology are copyright infringement. Some are not. This is technology easily built by anyone with a few hundred U.S. dollars (mostly for the two 8+ megapixel cameras), and there is no reasonable way to stop it. Those against format-shifting should have a long, hard think about what that means. Those who believe the technology can be embargoed should probably stop reading this blog.

Dan Reetz started it all by posting his initial design on instructables.com, winning a competition for a laser etcher in the process. After pages and pages of comments and refinements, he decided to set up a website just for the design, and invited everyone to come on in. It's been going strong ever since.

"If Dan Reetz didn’t exist, it would be necessary for Cory Doctorow to invent him." —Author Robin Sloan, The Future of the Book: Bringing Book Scanning Home, October 12, 2009

Having built a book scanner, and having digitized a few books on it at about 10 pages a minute (I'm slow and careful; others regularly attain higher speeds), I've found that the images have some distortion because the pages are not pressed completely flat. A minor additional source of distortion is lens geometry. How do I postprocess the images to end up with nice, undistorted ones?

Several ideas that have been kicked around the DIY Book Scanner forum are:

  • calibration images (see here and here)
  • image analysis and modeling for a mathematical distortion inverse transformation (see here)
  • stereoscopic imaging for direct measurement of distortion (briefly mentioned here)
  • manual page straightening (if all else fails)

After fiddling around briefly with the first, and spending a lot of time on the second, my current objective is to work with the third idea, stereoscopic images. I believe this holds the most promise of accurately determining page distortion without relying on the content of the page. So my next step is to convert my two-camera scanner into a four-camera model, after first experimenting with placing the existing two cameras on the same side to see if stereoscopic imaging is practical without a lot of fiddling.