Sergei Mikhailovich Prokudin-Gorskii was a photographer who captured the Russian Empire in color decades before color photography became widespread. His technique involved taking three separate exposures through red, green, and blue filters on glass plates. This project implements an algorithm that automatically aligns these three channel images, using metrics such as Euclidean distance or normalized cross-correlation (NCC), and then combines them into full-color photographs, revealing vivid images taken more than a century ago.
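As a concrete picture of the combine step, here is a minimal sketch that splits a digitized plate into its three exposures and stacks them into an RGB image. It assumes the scan is one grayscale NumPy array with the blue, green, and red exposures stacked top to bottom in equal thirds, and that the channels are aligned before stacking; `split_plate` and `to_color` are hypothetical helper names.

```python
import numpy as np


def split_plate(plate: np.ndarray):
    """Split a stacked plate scan into (blue, green, red) exposures.

    Assumes the exposures are stacked top to bottom as B, G, R in equal
    thirds of the height; any leftover rows at the bottom are dropped.
    """
    h = plate.shape[0] // 3
    return plate[:h], plate[h:2 * h], plate[2 * h:3 * h]


def to_color(r: np.ndarray, g: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Stack three aligned, equally sized channels into an (H, W, 3) RGB image."""
    return np.dstack([r, g, b])


if __name__ == "__main__":
    plate = np.arange(36, dtype=float).reshape(12, 3)  # toy 12x3 "plate"
    b, g, r = split_plate(plate)
    print(to_color(r, g, b).shape)  # (4, 3, 3)
```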
For single-scale alignment, I implemented an exhaustive search, used only for the low-resolution images (cathedral.jpg, monastery.jpg, tobolsk.jpg). The algorithm tries every shift in a window of -15 to 15 pixels along each axis and scores each candidate with the L2 (Euclidean) distance. The shift with the smallest L2 distance is taken as optimal, and I then applied it with np.roll to align the channel.
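A minimal sketch of this single-scale search, assuming the channels are equally sized float NumPy arrays. `l2_distance`, `align_exhaustive`, and `apply_shift` are illustrative names rather than the project's actual functions.

```python
import numpy as np


def l2_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean (L2) distance between two equally sized images."""
    return float(np.sqrt(np.sum((a - b) ** 2)))


def align_exhaustive(moving: np.ndarray, reference: np.ndarray, radius: int = 15):
    """Return the (dy, dx) shift of `moving` that minimizes L2 distance to `reference`.

    Tries every integer shift in [-radius, radius] along each axis. np.roll
    wraps pixels that leave one edge back in at the opposite edge.
    """
    best_score, best_shift = np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            candidate = np.roll(moving, (dy, dx), axis=(0, 1))
            score = l2_distance(candidate, reference)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift


def apply_shift(channel: np.ndarray, shift) -> np.ndarray:
    """Apply a (dy, dx) shift to a channel with np.roll."""
    return np.roll(channel, shift, axis=(0, 1))
```

If `moving` is `reference` rolled by (-3, 4), `align_exhaustive(moving, reference)` returns (3, -4), and `apply_shift` with that result restores the reference exactly. Since np.roll wraps pixels around the edges, the wrapped strip still contributes to the score, which is one more reason to ignore the borders when scoring.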
The initial alignment, however, was not perfect because of the black borders in the original glass plate scans. To address this, I cropped 10% from all edges before computing the alignment metric, which significantly improved the L2-based alignment for the single-scale approach.
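One way to apply the crop is to score only each image's interior. This sketch reads "10% from all edges" as removing 10% of the height from the top and from the bottom and 10% of the width from the left and from the right; that reading and the names `crop_border` and `cropped_l2` are assumptions.

```python
import numpy as np


def crop_border(img: np.ndarray, frac: float = 0.10) -> np.ndarray:
    """Keep the interior of an image by removing `frac` of the height from the
    top and bottom and `frac` of the width from the left and right."""
    h, w = img.shape[:2]
    dh, dw = int(h * frac), int(w * frac)
    return img[dh:h - dh, dw:w - dw]


def cropped_l2(candidate: np.ndarray, reference: np.ndarray, frac: float = 0.10) -> float:
    """L2 distance over the interior only, so dark plate borders (and, for shifts
    smaller than the crop width, pixels wrapped around by np.roll) are ignored."""
    a = crop_border(candidate, frac)
    b = crop_border(reference, frac)
    return float(np.sqrt(np.sum((a - b) ** 2)))
```

Two images that differ only in a dark strip along the border score 0 with `cropped_l2`, while the uncropped L2 distance between them is positive.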
For high-resolution images, the single-scale approach was computationally expensive and less accurate. To address this, I implemented a multi-scale pyramid approach together with two key improvements: NCC as the similarity metric and the green channel as the alignment reference.
The pyramid alignment works iteratively. Starting from the original high-resolution image, I repeatedly downsample by a factor of 2 until the image reaches a minimum dimension of 256 pixels. At this coarsest level, I run an exhaustive search over a larger window (-30 to 30 pixels) using NCC. The resulting offset is propagated to the next finer level by multiplying it by 2, and at each finer level I only search within ±2 pixels of this scaled estimate, which dramatically reduces computation time while maintaining accuracy.
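The coarse-to-fine procedure can be sketched as below. Several details are assumptions rather than taken from the description above: each pyramid level averages 2x2 blocks (a library resize function would work equally well), NCC mean-centers both images, shifts are applied with np.roll, and `ncc`, `downsample2`, `search`, and `align_pyramid` are hypothetical names.

```python
import numpy as np


def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """NCC of two images after mean-centering; 1.0 means a perfect match."""
    a = a - a.mean()
    b = b - b.mean()
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def downsample2(img: np.ndarray) -> np.ndarray:
    """Halve resolution by averaging 2x2 blocks (an odd last row/column is dropped)."""
    h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2])


def search(moving, reference, center, radius):
    """Best (dy, dx) by NCC among shifts within +/- radius of `center`."""
    cy, cx = center
    best_score, best = -np.inf, center
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            score = ncc(np.roll(moving, (dy, dx), axis=(0, 1)), reference)
            if score > best_score:
                best_score, best = score, (dy, dx)
    return best


def align_pyramid(moving, reference, min_size=256, coarse_radius=30, fine_radius=2):
    """Coarse-to-fine alignment of `moving` onto `reference`; returns (dy, dx)."""
    # Level 0 is full resolution; keep halving until the smaller side is <= min_size.
    levels = [(moving, reference)]
    while min(levels[-1][0].shape) > min_size:
        m, r = levels[-1]
        levels.append((downsample2(m), downsample2(r)))

    # Exhaustive search over the large window at the coarsest level.
    shift = search(*levels[-1], center=(0, 0), radius=coarse_radius)

    # Walk back up: double the estimate, then refine in a small window.
    for m, r in reversed(levels[:-1]):
        shift = search(m, r, center=(2 * shift[0], 2 * shift[1]), radius=fine_radius)
    return shift
```

For a 512x512 random test image rolled by (-40, 22), the required offset lies outside the ±30 window at full resolution but inside it after one halving: the coarse search finds (20, -11), and the ±2 refinement at full resolution recovers (40, -22).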
The NCC metric computes the normalized dot product between two images, which makes it more robust than L2 to differences in brightness and contrast between channels. By aligning both the blue and red channels to the green channel (instead of to the blue channel), I ensure that alignment errors don't compound, which results in more consistent and accurate color reconstruction across all images.
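To illustrate the metric and the choice of reference, here is a sketch that treats NCC as a normalized dot product and aligns blue and red to green independently. It assumes each image is mean-centered before normalization (so a uniform brightness offset, not only a contrast scale, leaves the score unchanged); `ncc`, `best_shift_ncc`, and `align_to_green` are illustrative names, not the project's code.

```python
import numpy as np


def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Dot product of the two images after each is flattened, mean-centered,
    and scaled to unit length. Lies within [-1, 1]; higher is better."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def best_shift_ncc(moving: np.ndarray, reference: np.ndarray, radius: int = 15):
    """Exhaustive search for the np.roll shift of `moving` that maximizes NCC."""
    shifts = [(dy, dx) for dy in range(-radius, radius + 1)
              for dx in range(-radius, radius + 1)]
    return max(shifts, key=lambda s: ncc(np.roll(moving, s, axis=(0, 1)), reference))


def align_to_green(b: np.ndarray, g: np.ndarray, r: np.ndarray, radius: int = 15):
    """Align blue and red to the green reference independently.

    Returns (blue_shift, red_shift) as (dy, dx) tuples."""
    return best_shift_ncc(b, g, radius), best_shift_ncc(r, g, radius)
```

For instance, `ncc(g, 3.0 * g + 0.5)` equals 1 up to floating-point rounding, while the L2 distance between the same two arrays is large; that is the kind of brightness and contrast mismatch between channels that NCC tolerates.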
After these changes, the Emir image in particular demonstrated the advantage of aligning to the G channel rather than the B channel.
I selected three additional images from the Prokudin-Gorskii collection to test my algorithm.
I also implemented Sobel edge detection as an alternative alignment feature. The Sobel operator highlights edges, and aligning on edges rather than raw intensities can be more robust when the color channels have very different brightness distributions. However, once both channels were aligned to the green channel (rather than the blue), the improvement from Sobel edge detection was smaller than expected: G-channel alignment alone gave sufficiently good results for most images, including the challenging Emir photograph.
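A NumPy-only sketch of aligning on Sobel edge magnitudes rather than raw intensities. The 3x3 Sobel kernels are standard; the edge-replicated padding, the use of NCC on the edge maps, and the names `sobel_magnitude` and `align_on_edges` are assumptions for illustration (`scipy.ndimage.sobel` can compute the per-axis gradients instead).

```python
import numpy as np


def sobel_magnitude(img: np.ndarray) -> np.ndarray:
    """Gradient magnitude from the 3x3 Sobel kernels, with edge-replicated borders."""
    p = np.pad(img.astype(float), 1, mode="edge")
    # x-derivative, kernel [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]: right column minus left column
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # y-derivative, the transposed kernel: bottom row minus top row
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.hypot(gx, gy)


def align_on_edges(moving: np.ndarray, reference: np.ndarray, radius: int = 10):
    """Exhaustive NCC search performed on Sobel edge maps instead of raw intensities."""
    em = sobel_magnitude(moving)
    er = sobel_magnitude(reference)
    er = er - er.mean()
    best_score, best = -np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            s = np.roll(em, (dy, dx), axis=(0, 1))
            s = s - s.mean()
            score = np.sum(s * er) / (np.linalg.norm(s) * np.linalg.norm(er) + 1e-12)
            if score > best_score:
                best_score, best = score, (dy, dx)
    return best
```

Because the gradient magnitude of `1 - img` equals that of `img`, a channel whose intensities are inverted relative to the reference still yields matching edge maps, an extreme case of the different brightness distributions mentioned above.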
| Image | Resolution | Blue Offset (y, x), px | Red Offset (y, x), px |
|---|---|---|---|
| Cathedral | Low | (-5, -2) | (7, 1) |
| Monastery | Low | (3, -2) | (6, 1) |
| Tobolsk | Low | (-3, -3) | (4, 1) |
| Church | High | (-25, -4) | (33, -8) |
| Emir | High | (-49, -24) | (57, 17) |
| Harvesters | High | (-59, -17) | (65, -3) |
| Icon | High | (-41, -17) | (48, 5) |
| Italil | High | (-38, -21) | (39, 15) |
| Lastochikino | High | (3, 2) | (78, -7) |
| Lugano | High | (-41, 16) | (52, -13) |
| Melons | High | (-82, -11) | (96, 3) |
| Self Portrait | High | (-79, -29) | (98, 8) |
| Siren | High | (-49, 6) | (47, -19) |
| Three Generations | High | (-52, -14) | (59, -3) |
| Collection Image 1 | High | (-41, -5) | (64, -4) |
| Collection Image 2 | High | (-60, -28) | (66, 6) |
| Collection Image 3 | High | (-29, -1) | (90, -4) |