proj1/

├── overview/ Colorizing the Prokudin-Gorskii Photo Collection

├── single_scale_alignment/ L2 Distance Method

├── multi_scale_pyramid/ NCC with Image Pyramids

├── additional/ Extra Examples from the Collection

├── bells_and_whistles/ Sobel Edge Detection

└── summary/ Results Table & Conclusions

Images of the Russian Empire

Colorizing the Prokudin-Gorskii Photo Collection

Overview

Sergei Mikhailovich Prokudin-Gorskii was a photographer who captured the Russian Empire in color decades before color photography became widespread. His technique involved taking three separate exposures of each scene on glass plates through red, green, and blue filters. This project implements an algorithm that automatically aligns the three channel images using a similarity metric, either Euclidean (L2) distance or Normalized Cross-Correlation (NCC), and then combines them into a full-color photograph, revealing the vibrant scenes that were captured more than a century ago.

Part 1: Single-Scale Alignment

Using L2 (Euclidean) Distance for Low-Resolution Images

Implementation Approach

For the single-scale alignment, I implemented an exhaustive search for the low-resolution images (cathedral.jpg, monastery.jpg, tobolsk.jpg). The algorithm tries every shift in a window of -15 to 15 pixels and scores each candidate with the L2 (Euclidean) distance. The shift that minimizes the L2 distance is taken as optimal, and I then apply it with np.roll to align the channel.
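The search described above could be sketched roughly as follows. This is an illustration rather than the project's actual code: the function names `l2_distance` and `align_single_scale` are hypothetical, the score is the sum of squared differences (which has the same minimizer as the L2 distance), and the channels are assumed to be 2-D float arrays of equal shape.

```python
import numpy as np

def l2_distance(a, b):
    """Sum of squared differences; minimized by the same shift as the L2 distance."""
    return np.sum((a - b) ** 2)

def align_single_scale(channel, reference, window=15):
    """Return the (dy, dx) shift in [-window, window] that best matches reference."""
    best_shift, best_score = (0, 0), np.inf
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            # np.roll shifts circularly: pixels pushed off one edge wrap around.
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = l2_distance(shifted, reference)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

With a window of 15 this evaluates 31 × 31 = 961 candidate shifts, which is cheap for the small JPEGs but scales poorly as the image and window grow.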

Handling Border Artifacts

The initial alignment, however, was not perfect because of the black borders present in the original glass plate scans. To address this, I cropped 10% from every edge before computing the alignment metric, so that only interior pixels contribute to the score. This preprocessing step significantly improved the L2-based alignment quality for the single-scale approach.
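One way to express the border crop is a small helper that is applied only when scoring a candidate shift. The names `crop_borders` and `cropped_l2` are hypothetical, and cropping a fixed fraction of each dimension is an assumption about how the 10% is measured.

```python
import numpy as np

def crop_borders(img, fraction=0.10):
    """Drop `fraction` of the height and width from each edge."""
    h, w = img.shape[:2]
    dy, dx = int(h * fraction), int(w * fraction)
    return img[dy:h - dy, dx:w - dx]

def cropped_l2(a, b, fraction=0.10):
    """Sum of squared differences computed on the cropped interior only."""
    diff = crop_borders(a, fraction) - crop_borders(b, fraction)
    return np.sum(diff ** 2)
```

Because the crop happens inside the metric, the full-size channels can still be shifted and stacked afterwards; the borders are ignored only while comparing.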

Uncropped Example 1

Uncropped Example 2

Single-Scale Results

Cathedral

B: (-5, -2) | R: (7, 1)

Monastery

B: (3, -2) | R: (6, 1)

Tobolsk

B: (-3, -3) | R: (4, 1)

Part 2: Multi-Scale Pyramid Alignment

Using NCC with Image Pyramids for High-Resolution Images

Transitioning to NCC and Pyramids

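For high-resolution images, the single-scale approach was computationally expensive and less accurate: the required shifts are far larger than the -15 to 15 pixel window (up to 98 pixels in the results below), and widening the window at full resolution makes the exhaustive search much slower. To address this, I implemented a multi-scale pyramid approach with the following key improvements: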

Normalized Cross-Correlation (NCC): Switched from L2 distance to NCC, which is more robust to brightness variations between color channels
Image Pyramids: Repeatedly downsampled each of the B, G, and R channels by a factor of 2 to create multiple resolution levels
Adaptive Search Windows: Searched -30 to 30 pixels at the coarsest level, then ±2 pixels around the propagated estimate at each finer level
Increased Cropping: Extended border cropping from 10% to 15% to remove more of the border frame
Green Channel Reference: Aligned both the B and R channels to G rather than aligning to B, which noticeably improved alignment and left the resulting images nearly perfectly aligned
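The last item above, using G as the common reference, might look like the sketch below. The `colorize` function and its `align` parameter are hypothetical; `align(channel, reference)` stands for any routine that returns a (dy, dx) shift, and the output channel order (R, G, B) is an assumption about how the final image is stored.

```python
import numpy as np

def colorize(b, g, r, align):
    """Align B and R to G, then stack the three channels into an RGB image.

    `align(channel, reference)` must return an integer (dy, dx) shift.
    """
    b_shift = align(b, g)
    r_shift = align(r, g)
    b_aligned = np.roll(b, b_shift, axis=(0, 1))
    r_aligned = np.roll(r, r_shift, axis=(0, 1))
    rgb = np.dstack([r_aligned, g, b_aligned])
    return rgb, b_shift, r_shift
```

The two returned shifts correspond to the "B: (y, x) | R: (y, x)" pairs reported in the results.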

Image Pyramid Algorithm Process

The pyramid alignment process works iteratively. Starting with the original high-resolution image, I downsample it by factors of 2 in a loop until reaching a minimum dimension of 256 pixels. At the coarsest level, I perform an exhaustive search over a larger window (-30 to 30 pixels) using NCC to find the best alignment. This coarse alignment is then propagated up to the next level by scaling the offsets by 2. At each finer level, I only need to search within ±2 pixels of the scaled estimate, dramatically reducing computation time while maintaining accuracy.
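The coarse-to-fine loop can be sketched as below. Several details are assumptions rather than facts from this write-up: downsampling is done by averaging 2×2 blocks, halving stops once the smaller dimension is no larger than `min_size`, NCC is computed on mean-subtracted pixels, and all function names are hypothetical.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized arrays."""
    a, b = a - a.mean(), b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def crop(img, fraction=0.15):
    """Drop `fraction` of each dimension from every edge before scoring."""
    h, w = img.shape
    dy, dx = int(h * fraction), int(w * fraction)
    return img[dy:h - dy, dx:w - dx]

def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (assumed filter)."""
    h, w = img.shape
    h, w = h - h % 2, w - w % 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def search(channel, reference, center, radius):
    """Exhaustive NCC search within `radius` pixels of `center`."""
    ref = crop(reference)
    best_shift, best_score = center, -np.inf
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ncc(crop(shifted), ref)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

def pyramid_align(channel, reference, min_size=256, coarse=30, fine=2):
    """Return the full-resolution (dy, dx) shift aligning channel to reference."""
    # Build the pyramid, finest level first.
    levels = [(channel, reference)]
    while min(levels[-1][0].shape) > min_size:
        c, r = levels[-1]
        levels.append((downsample(c), downsample(r)))
    # Wide search at the coarsest level.
    c, r = levels[-1]
    shift = search(c, r, (0, 0), coarse)
    # Double the estimate at each finer level and refine within +/- fine pixels.
    for c, r in reversed(levels[:-1]):
        shift = search(c, r, (2 * shift[0], 2 * shift[1]), fine)
    return shift
```

Under these assumptions, a ±30 search at a level downsampled k times covers roughly ±30·2^k pixels at full resolution, which is how shifts near 100 pixels become reachable without a huge full-resolution window.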

The NCC metric itself computes the normalized dot product between two images, which makes it less sensitive than L2 distance to differences in brightness and contrast between channels. By aligning both the blue and red channels directly to the green channel (instead of aligning to the blue channel), I ensure that any alignment errors don't compound, resulting in more consistent and accurate color reconstruction across all images.
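As a concrete reading of "normalized dot product", the sketch below flattens both images, subtracts each image's mean, and divides the dot product by the product of the norms. The mean subtraction is an assumption (the write-up does not say whether it is used), and the name `ncc` is hypothetical.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation in [-1, 1]; higher means more similar."""
    a = a.astype(float).ravel()
    b = b.astype(float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.0  # a constant image carries no structure to correlate
    return float(np.dot(a, b) / denom)
```

With this definition, scaling a channel's intensity by a positive factor or adding a constant offset leaves the score unchanged, which is the brightness and contrast robustness described above. Unlike L2 distance, the best shift is the one that maximizes the score.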

The Emir Challenge - Comparing Methods

After making the above changes, the Emir image in particular demonstrated the advantage of aligning to the G channel rather than the B channel.

Blue Channel Alignment

Green Channel Alignment

Multi-Scale Pyramid Results

Emir of Bukhara

B: (-49, -24) | R: (57, 17)

Church

B: (-25, -4) | R: (33, -8)

Harvesters

B: (-59, -17) | R: (65, -3)

Icon

B: (-41, -17) | R: (48, 5)

Self Portrait

B: (-79, -29) | R: (98, 8)

Three Generations

B: (-52, -14) | R: (59, -3)

Melons

B: (-82, -11) | R: (96, 3)

Italil

B: (-38, -21) | R: (39, 15)

Lugano

B: (-41, 16) | R: (52, -13)

Lastochikino

B: (3, 2) | R: (78, -7)

Siren

B: (-49, 6) | R: (47, -19)

Additional Examples from the Collection

I selected three additional images from the Prokudin-Gorskii collection to test my algorithm.

Collection Image 1

B: (-41, -5) | R: (64, -4)

Collection Image 2

B: (-60, -28) | R: (66, 6)

Collection Image 3

B: (-29, -1) | R: (90, -4)

Bells and Whistles

I also implemented Sobel edge detection as an alternative alignment method. The Sobel operator detects edges in the images, which can be more robust for alignment when the color channels have very different brightness distributions. However, after switching to aligning both channels to the green channel (rather than blue), the improvements from Sobel edge detection were not as dramatic as expected. The G channel alignment alone provided sufficiently good results for most images, including the challenging Emir photograph.
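A minimal NumPy-only sketch of a Sobel edge map is shown below. It assumes the gradient magnitude is what gets passed to the alignment metric in place of raw intensities; the function name `sobel_magnitude` and the edge-replicating padding are illustrative choices, not details taken from the project.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude from the 3x3 Sobel kernels, same shape as img."""
    p = np.pad(img.astype(float), 1, mode="edge")
    # Horizontal gradient: right column minus left column, weighted 1-2-1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # Vertical gradient: bottom row minus top row, weighted 1-2-1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.hypot(gx, gy)
```

Because the kernels sum to zero, adding a constant brightness offset to a channel does not change its edge map, which is why edge-based alignment can help when channels differ strongly in intensity.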

Results Summary

Image	Resolution	Blue Offset (y, x)	Red Offset (y, x)
Cathedral	Low	(-5, -2)	(7, 1)
Monastery	Low	(3, -2)	(6, 1)
Tobolsk	Low	(-3, -3)	(4, 1)
Church	High	(-25, -4)	(33, -8)
Emir	High	(-49, -24)	(57, 17)
Harvesters	High	(-59, -17)	(65, -3)
Icon	High	(-41, -17)	(48, 5)
Italil	High	(-38, -21)	(39, 15)
Lastochikino	High	(3, 2)	(78, -7)
Lugano	High	(-41, 16)	(52, -13)
Melons	High	(-82, -11)	(96, 3)
Self Portrait	High	(-79, -29)	(98, 8)
Siren	High	(-49, 6)	(47, -19)
Three Generations	High	(-52, -14)	(59, -3)
Collection Image 1	High	(-41, -5)	(64, -4)
Collection Image 2	High	(-60, -28)	(66, 6)
Collection Image 3	High	(-29, -1)	(90, -4)