Dewarping pages

So as I said a while back, one of the major issues with books images from cameras is that the pages are not flat. The ideal dewarper would correct for keystoning (where the margins of the page are not perfectly rectangular) and curling (where the individual text lines are not straight). I did eventually come up with an algorithm that handles many kinds of keystoning and curling. It works like this. Starting with a bilevel image that consists only of the page on a white background. I will use sample closeups of an actual page, at around 600 dpi:

1. Find the text lines:

a. Label the connected components (8-connected). This is generally fast when using a one-pass algorithm.

b. For each connected component, find the upper right and lower left coordinates of a box which fully encloses each component. This simply involves finding the minimum and maximum x-coordinate and minimum and maximum y-coordinate of each pixel in a given component.

c. Create a new, blank white image. For each connected component, and for each x-coordinate in its box, determine the average y-coordinate, and put a black pixel in the new image at the (x, average y) point. The resulting image traces the center of gravity in the y direction of each connected component.

d. In the new image, locate the leftmost pixel. If there is more than one leftmost pixel, choose the topmost of those. If the entire image is blank, we have found all the lines, so go to step 2.

e. Set up a new point array and add the pixel to the array. Remove the pixel from the image.

f. Set up a new array of 30 y-values, and initialize each element to the y-coordinate of the chosen pixel.

g. Compute the average of all 30 y-values. Obviously, initially this result will be the y-coordinate of the chosen pixel. Call the average y.

h. Starting from dx = 1 to some skip bound (I chose 1/20 of the image's width), search the new image at (x+dx, y+dy), where dy runs from negative some search bound to positive some search bound (I chose +/- 30), for a black pixel.

i. If a pixel was found, we're going to treat this as the next pixel in the line: add the coordinates of the found pixel to the point array, remove the pixel from the image, move all values in the y-value array one to the left, and add the y-coordinate of the found pixel to the end of the array. In other words, the y-value array represents a moving average. Go to step g.

j. If a pixel was not found, we've found all the pixels we can for this line. Close out the point array (store it somewhere as a found line), and now determine if the found line should be kept as valid enough for the algorithm: the number of pixels in the line must be greater than 1/3 the width of the image, and the difference between the topmost and bottommost pixel in the line must be less than 1/20 the height of the image. If the line passes these tests, keep the line, otherwise throw it away. In any case, go to step d.

2. Approximate the left and right margins by a straight line:

a. Create an array of lines for the left side, and another array for the right side. Initialize each array to the full set of found lines.

b. For the left side, pick out the leftmost pixel of each line, and approximate a straight line to those pixels. The equation should be of the form x = Ay + B: the line is more vertical than horizontal, so this form is more appropriate.

c. Do the same for the right side, except use the rightmost pixels of each line.

d. For the left side, for each line in the left side array, determine if the leftmost pixel is to the right of the approximated left margin by more than some threshold (I chose 50). That is, take the y coordinate of the leftmost pixel, compute x = Ay + B, and determine if the x-coordinate of the leftmost pixel is greater than x + 50. If so, set reapproximateLeft to true. Remove the line whose leftmost pixel is the farthest past the right of the approximated left margin from the left side array.

e. Do the same for the right side, except determine if the rightmost pixel is to the left of the approximated right margin, and set reapproximateRight, and remove the line whose rightmost pixel is the farthest past the left.

f. If reapproximateLeft is true, do step b again, and if reapproximateRight is true, do step c again. If either was true, go to step d. If neither was true, we have found the best approximation to the left and right margins.

3. Determine the forward and reverse keystone correction equation. The forward equation transforms an x-coordinate in the original image into the x-coordinate for the dekeystoned image. The reverse equation transforms an x-coordinate in the dekeystoned image into the x-coordinate for the original image. The y-coordinate is not transformed because we are only straightening the image based on the tilt of the left and right margins. Also, the largest separation between the margins is kept, while the smallest separation is expanded:


xTL,yTL is the top point at the left margin (B,0)
xTR,yTR is the top point at the right margin (B,0)
xBL,yBR is the bottom point at the left margin (Ah+B,h)
xBR,yBR is the bottom point at the right margin (Ah+b,h)

Also, if the space between margins at the top (xTR-xTL) is greater than the space between margins at the bottom (xBR-xBL):

x'TL = xTL
x'TR = xTR
x'BL = xTL
x'BR = xTR


x'TL = xBL
x'TR = xBR
x'BL = xBL
x'BR = xBR

Now the reverse transform equation is simply x = a0 + a1x' + a2y + a3x'y while the forward transform equation is x' = (x - a0 - a2y) / (a1 + a3y).

4. Perform a forward transform on the pixels in every line of the line array.

5. The margins are now straight. Store the left margin x-coordinate as x'(B,0) (i.e. using the forward transform). Do the same for the right margin x-coordinate.

6. Order the lines in the array by their first y-coordinate (lowest first).

7. For every line, compute the topmost (smallest) y-coordinate (topY) and bottommost (largest) y-coordinate (botY).

8. For every line in order, perform a nonlinear regression to fit the line to an equation y = b0 + b1x + b2(1.1-x)b3 [+b4 sin(2πx + b5) ] where the sinusoidal portion is optional for each page. The initial guess for the very first line should be b0 = 10, b1 = 1, b2 = 1, b3 = 2, b4 = 1, b5 = 0. The initial guess for each subsequent line should be equal to the parameters found for solution of the previous line.

Here, however, rather than using the actual x and y coordinates of each line's pixel for the nonlinear regression, we will instead treat x as the fraction across the page from left margin to right margin (i.e. new x = [x-left]/[right-left]) and y similarly as [y-topY]/[botY-topY]. This normalizes the coordinates such that the left margin is x=0, the right margin is x=1, the top of the line is y=0 and the bottom of the line is y=1. This normalization seems to help the nonlinear regression fit.

Optionally, perform a backwards pass: do nonlinear regression (still keeping the initial guesses) for the lines in reverse order. This will help to make the equations for the lines look more similar in shape to each other.

And finally, optionally, perform a second forward and backwards pass. Again, this may help the estimated lines look even more similar in shape to each other.

The reason that the sinusoid is optional is that sometimes without the sinusoid, the estimated lines end up with a wavy component. Adding a sinusoid will help remove that wavy component.

The nonlinear regression step is one of the slowest parts of the algorithm, and could stand a great deal of improvement.

9. For every (x,y) point in the original image, modify its x coordinate through a forward transform, and then determine where it is relative to the estimated lines, and linearly interpolate the y coordinate. The idea here is that every pixel along an estimated line should have been at exactly the same y coordinate, i.e. the curved line should have been a straight line.

This is also a slow part of the algorithm, and can also stand some improvement.

Here's an example of the final product, showing before and after. You can pick up the software via the Scan Tailor thread in DIY Book Scanner.