proj1/

├── overview/ Colorizing the Prokudin-Gorskii Photo Collection
├── single_scale_alignment/ L2 Distance Method
├── multi_scale_pyramid/ NCC with Image Pyramids
├── additional/ Extra Examples from the Collection
├── bells_and_whistles/ Sobel Edge Detection
└── summary/ Results Table & Conclusions

Images of the Russian Empire

Colorizing the Prokudin-Gorskii Photo Collection

Overview

Sergei Mikhailovich Prokudin-Gorskii was a photographer who captured the Russian Empire in color decades before color photography became widespread. His technique involved taking three separate exposures through red, green, and blue filters on glass plates. This project implements an algorithm that automatically aligns these three exposures using a similarity metric, either Euclidean (L2) distance or Normalized Cross-Correlation (NCC), and then combines the three channels into full-color photographs, revealing the vibrant images that were taken more than a century ago.

Part 1: Single-Scale Alignment

Using L2 (Euclidean) Distance for Low-Resolution Images

Implementation Approach

For the single-scale alignment, I implemented an exhaustive search for the low-resolution images (cathedral.jpg, monastery.jpg, tobolsk.jpg). The algorithm tries every shift in a window of -15 to 15 pixels and scores each candidate with the L2 (Euclidean) distance. The shift with the smallest L2 distance is taken as the optimal one, and I then used np.roll to apply that shift and align the channel.
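A minimal sketch of this exhaustive search, assuming each channel is a 2-D array of the same shape. The function names (l2_distance, align_single_scale) are illustrative and not taken from the project code, and the score is the sum of squared differences, which has the same minimizer as the Euclidean distance.

```python
import numpy as np

def l2_distance(a, b):
    """Sum of squared differences; minimizing it also minimizes the L2 distance."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(diff ** 2))

def align_single_scale(channel, reference, window=15):
    """Return the (dy, dx) shift in [-window, window] with the smallest L2 score."""
    best_shift, best_score = (0, 0), np.inf
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            # np.roll shifts circularly: pixels pushed off one edge wrap around.
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = l2_distance(shifted, reference)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

# Synthetic check: displace a random image, then recover the shift that undoes it.
rng = np.random.default_rng(0)
reference = rng.random((64, 64))
channel = np.roll(reference, (-4, 3), axis=(0, 1))
shift = align_single_scale(channel, reference)
aligned = np.roll(channel, shift, axis=(0, 1))
```

Because np.roll wraps pixels around the image, the wrapped rows and columns contribute to the score; on real scans this is one reason to restrict the metric to the image interior.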

Handling Border Artifacts

The initial alignment, however, was not perfect because of the black borders present in the original glass plate scans. To address this, I cropped 10% from every edge before computing the alignment metric. This preprocessing step significantly improved the L2-based alignment quality for the single-scale approach.
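The border crop can be sketched as follows, assuming the 10% is measured relative to each image dimension. The helper names (crop_borders, cropped_l2) are hypothetical and only illustrate how the metric can be restricted to the interior.

```python
import numpy as np

def crop_borders(img, fraction=0.10):
    """Drop `fraction` of the height and width from each edge."""
    h, w = img.shape[:2]
    dy, dx = int(h * fraction), int(w * fraction)
    return img[dy:h - dy, dx:w - dx]

def cropped_l2(shifted, reference, fraction=0.10):
    """L2 score computed on the interior only, so plate borders are ignored."""
    a = crop_borders(shifted, fraction).astype(np.float64)
    b = crop_borders(reference, fraction).astype(np.float64)
    return float(np.sum((a - b) ** 2))

# Two images that differ only in their outer border score as identical.
clean = np.ones((100, 200))
bordered = clean.copy()
bordered[:5, :] = 0.0   # simulated dark border along the top
bordered[:, :5] = 0.0   # simulated dark border along the left
score = cropped_l2(bordered, clean)
```

Only the score is computed on the cropped view; the shift that wins is still applied to the full-size channel.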

Uncropped Example 1
Uncropped Example 2

Single-Scale Results

Cathedral
B: (-5, -2) | R: (7, 1)
Monastery
B: (3, -2) | R: (6, 1)
Tobolsk
B: (-3, -3) | R: (4, 1)

Part 2: Multi-Scale Pyramid Alignment

Using NCC with Image Pyramids for High-Resolution Images

Transitioning to NCC and Pyramids

For high-resolution images, the single-scale exhaustive search was computationally expensive and less accurate. To address this, I switched to NCC as the similarity metric and implemented a multi-scale image pyramid, as described below.

Image Pyramid Algorithm Process

The pyramid alignment process works iteratively, from coarse to fine. Starting with the original high-resolution image, I downsample it by factors of 2 in a loop until reaching a minimum dimension of 256 pixels. At the coarsest level, I perform an exhaustive search over a larger window (-30 to 30 pixels) using NCC to find the best alignment. This coarse alignment is then propagated to the next finer level by scaling the offsets by 2, since each level doubles the resolution. At each finer level, I only need to search within ±2 pixels of the scaled estimate, which is 25 candidate shifts instead of the 3,721 in the 61 × 61 coarse window, dramatically reducing computation time while maintaining accuracy.
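The coarse-to-fine procedure can be sketched as below. This is an illustration under stated assumptions rather than the project's actual code: downsampling is done by 2 × 2 block averaging, the loop stops once another halving would push the smaller dimension below min_dim, and all function names are hypothetical.

```python
import numpy as np

def ncc(a, b):
    """Normalized dot product of two images."""
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

def downsample2(img):
    """Halve each dimension by averaging 2x2 blocks (an assumed choice)."""
    h, w = img.shape
    h2, w2 = h - h % 2, w - w % 2
    return img[:h2, :w2].reshape(h2 // 2, 2, w2 // 2, 2).mean(axis=(1, 3))

def search(channel, reference, center, radius):
    """Best (dy, dx) by NCC within `radius` pixels of `center`."""
    cy, cx = center
    best, best_score = center, -np.inf
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            score = ncc(np.roll(channel, (dy, dx), axis=(0, 1)), reference)
            if score > best_score:
                best_score, best = score, (dy, dx)
    return best

def align_pyramid(channel, reference, min_dim=256, coarse_radius=30, fine_radius=2):
    # Build the pyramid, finest level first.
    levels = [(channel, reference)]
    while min(levels[-1][0].shape) // 2 >= min_dim:
        c, r = levels[-1]
        levels.append((downsample2(c), downsample2(r)))
    # Wide exhaustive search at the coarsest level.
    shift = search(levels[-1][0], levels[-1][1], (0, 0), coarse_radius)
    # Refine at each finer level around the doubled estimate.
    for c, r in reversed(levels[:-1]):
        shift = search(c, r, (2 * shift[0], 2 * shift[1]), fine_radius)
    return shift

# Small synthetic example (reduced sizes so it runs quickly).
rng = np.random.default_rng(1)
reference = rng.random((64, 64))
channel = np.roll(reference, (-8, 4), axis=(0, 1))
shift = align_pyramid(channel, reference, min_dim=16, coarse_radius=4)
```

With the default parameters, the coarsest level covers up to 30 pixels of displacement at that scale, which corresponds to a much larger displacement at full resolution.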

The NCC metric computes the normalized dot product between two images, which makes it less sensitive than L2 distance to differences in brightness and contrast between channels. By aligning both the blue and red channels directly to the green channel (instead of aligning to the blue channel), each channel is registered against the same reference, resulting in more consistent and accurate color reconstruction across all images.
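A minimal sketch of the normalized dot product described above; the function name ncc is illustrative. In this plain form the score is unchanged when either image is multiplied by a constant. Subtracting each image's mean before normalizing is a common variant that also removes constant offsets, but the write-up does not state that it was used, so it is left out here.

```python
import numpy as np

def ncc(a, b):
    """Dot product of the two images after scaling each to unit length."""
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.0  # at least one image is all zeros; the score is undefined
    return float(np.dot(a, b) / denom)

rng = np.random.default_rng(2)
img = rng.random((32, 32))
same_content_brighter = 3.0 * img                   # global intensity scaling
misaligned = np.roll(img, (5, 0), axis=(0, 1))

score_scaled = ncc(img, same_content_brighter)      # equals 1 up to rounding
score_misaligned = ncc(img, misaligned)             # strictly lower
```

Unlike L2 distance, which is minimized, NCC is maximized: the best shift is the one with the highest score.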

The Emir Challenge - Comparing Methods

After these changes, the Emir image most clearly demonstrated the advantage of aligning to the G channel rather than the B channel.

Emir: Blue Channel Alignment
Emir: Green Channel Alignment
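Composing the final image with green as the reference can be sketched as follows. This assumes the reported offsets are the (y, x) shifts applied to the blue and red channels with np.roll, which the write-up implies but does not state outright; compose_rgb is a hypothetical helper name.

```python
import numpy as np

def compose_rgb(b, g, r, b_shift, r_shift):
    """Shift B and R onto the fixed G channel and stack in R, G, B order."""
    b_aligned = np.roll(b, b_shift, axis=(0, 1))
    r_aligned = np.roll(r, r_shift, axis=(0, 1))
    return np.dstack([r_aligned, g, b_aligned])

# Synthetic example using the Emir offsets reported in this write-up.
rng = np.random.default_rng(3)
g = rng.random((128, 128))
b = np.roll(g, (49, 24), axis=(0, 1))     # blue displaced relative to green
r = np.roll(g, (-57, -17), axis=(0, 1))   # red displaced the other way
rgb = compose_rgb(b, g, r, b_shift=(-49, -24), r_shift=(57, 17))
```

The green channel is never moved, so the two offsets are estimated independently of each other.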

Multi-Scale Pyramid Results

Emir of Bukhara
B: (-49, -24) | R: (57, 17)
Church
B: (-25, -4) | R: (33, -8)
Harvesters
B: (-59, -17) | R: (65, -3)
Icon
B: (-41, -17) | R: (48, 5)
Self Portrait
B: (-79, -29) | R: (98, 8)
Three Generations
B: (-52, -14) | R: (59, -3)
Melons
B: (-82, -11) | R: (96, 3)
Italil
B: (-38, -21) | R: (39, 15)
Lugano
B: (-41, 16) | R: (52, -13)
Lastochikino
B: (3, 2) | R: (78, -7)
Siren
B: (-49, 6) | R: (47, -19)

Additional Examples from the Collection

I selected three additional images from the Prokudin-Gorskii collection to test my algorithm.

Collection Image 1
B: (-41, -5) | R: (64, -4)
Collection Image 2
B: (-60, -28) | R: (66, 6)
Collection Image 3
B: (-29, -1) | R: (90, -4)

Bells and Whistles

I also implemented Sobel edge detection as an alternative alignment method. The Sobel operator extracts edges from each channel, and matching edges rather than raw pixel intensities can be more robust when the color channels have very different brightness distributions. However, after switching to aligning both channels to the green channel (rather than blue), the improvement from Sobel edge detection was less dramatic than expected. Green-channel alignment alone provided sufficiently good results for most images, including the challenging Emir photograph.
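A NumPy-only sketch of a Sobel edge-magnitude map, assuming edge-replicated padding at the image boundary. The function name sobel_magnitude is illustrative, and the write-up does not say which Sobel implementation was used. The edge maps would then be passed to the alignment metric in place of the raw channels.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude from the 3x3 Sobel kernels."""
    p = np.pad(img.astype(np.float64), 1, mode="edge")
    # Horizontal gradient: right column minus left column, rows weighted 1-2-1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    # Vertical gradient: bottom row minus top row, columns weighted 1-2-1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    return np.hypot(gx, gy)

# A vertical step edge: the response is concentrated at the step.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
edges = sobel_magnitude(img)

# Adding a constant brightness offset leaves the edge map unchanged.
edges_brighter = sobel_magnitude(img + 0.5)
```

Because the kernels sum to zero, a constant brightness offset between channels has no effect on the edge map, which is the property that makes edge-based matching attractive here.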

Results Summary

Image              | Resolution | Blue Offset (y, x) | Red Offset (y, x)
Cathedral          | Low        | (-5, -2)           | (7, 1)
Monastery          | Low        | (3, -2)            | (6, 1)
Tobolsk            | Low        | (-3, -3)           | (4, 1)
Church             | High       | (-25, -4)          | (33, -8)
Emir               | High       | (-49, -24)         | (57, 17)
Harvesters         | High       | (-59, -17)         | (65, -3)
Icon               | High       | (-41, -17)         | (48, 5)
Italil             | High       | (-38, -21)         | (39, 15)
Lastochikino       | High       | (3, 2)             | (78, -7)
Lugano             | High       | (-41, 16)          | (52, -13)
Melons             | High       | (-82, -11)         | (96, 3)
Self Portrait      | High       | (-79, -29)         | (98, 8)
Siren              | High       | (-49, 6)           | (47, -19)
Three Generations  | High       | (-52, -14)         | (59, -3)
Collection Image 1 | High       | (-41, -5)          | (64, -4)
Collection Image 2 | High       | (-60, -28)         | (66, 6)
Collection Image 3 | High       | (-29, -1)          | (90, -4)
© 2025 Sukhamrit Singh. All rights reserved.