Abstract

This report demonstrates the application of Principal Component Analysis (PCA) to image compression and color extraction. We analyze the painting “Vorozhinnia” by the Ukrainian artist Mykola Pymonenko to illustrate how dimensionality reduction can compress an image while maintaining visual quality. Additionally, we explore color transfer using k-means clustering and LAB color space transformations to carry the color characteristics of one painting over to new images.

1. Introduction

Principal Component Analysis (PCA) is a fundamental statistical technique used to analyze and simplify complex datasets by reducing their dimensionality. According to a scientific overview on ScienceDirect, PCA is widely used in computer science (and many other fields) for dimensionality reduction and feature extraction: it projects high-dimensional data onto a lower-dimensional linear subspace that retains the maximum possible variance from the original dataset. This is done by identifying directions of greatest variability and representing them as orthogonal (uncorrelated) axes called principal components. The first principal component captures the largest amount of variance in the data, the second captures the largest remaining variance orthogonal to the first, and so on.
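The idea of a first principal component can be illustrated on a toy two-dimensional dataset. The sketch below is not part of the original analysis: the function name and the sample points are hypothetical, and it centers the data before computing the covariance matrix, whereas the report later calls prcomp with center = FALSE. For two variables the eigenvalues of the covariance matrix have a closed form, so no linear algebra library is needed.

```python
import math

def first_principal_component(points):
    """Return (unit direction, share of variance) for a list of 2-D points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Entries of the 2x2 sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1, lam2 = half_trace + root, half_trace - root
    # Eigenvector belonging to the larger eigenvalue.
    if abs(sxy) > 1e-12:
        vx, vy = sxy, lam1 - sxx
    elif sxx >= syy:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm), lam1 / (lam1 + lam2)

# Hypothetical points scattered closely around the line y = x.
points = [(0, 0), (1, 1.1), (2, 1.9), (3, 3.0), (4, 4.1)]
direction, share = first_principal_component(points)
print(direction, share)
```

For these points the first component points roughly along the diagonal and carries almost all of the variance, which is the situation in which discarding the remaining component loses little information.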

2. Data Description

2.1 Loading the Original Image

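# Load packages: jpeg provides readJPEG(), magick provides image_read()
library(jpeg)
library(magick)

# Load image (image_info below reports JPEG data despite the .webp extension)
img_path <- "800px-Mykola_Pymonenko-Vorozhinnia.jpg.webp"
img <- readJPEG(img_path)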

# Display original image
plot(image_read(img_path))
title(main = "Original Color Image: Mykola Pymonenko - Vorozhinnia")

# Image information
img_info <- image_info(image_read(img_path))
print(img_info)
## # A tibble: 1 × 7
##   format width height colorspace matte filesize density
##   <chr>  <int>  <int> <chr>      <lgl>    <int> <chr>  
## 1 JPEG     800   1150 sRGB       FALSE   129585 72x72

Image Specifications:

  • Format: JPEG (as reported by image_info, despite the .webp file extension)
  • Dimensions: 800 × 1150 pixels
  • Total pixels: 920,000 pixels
  • Color space: sRGB (3 channels)

3. Converting to Greyscale

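# Read the image and extract a single grey channel
img2 <- image_read(img_path)
gray <- image_data(img2, channels = "gray")  # raw array: channel × width × height

# The raw values run along each image row first, so fill the matrix by row
photo.bw_rot <- matrix(as.integer(gray), nrow = dim(gray)[3], ncol = dim(gray)[2],
                       byrow = TRUE) / 255

# Draw the greyscale matrix on an empty plot
plot(1, type = "n")
rasterImage(photo.bw_rot, 0.8, 0.5, 1.1, 1.5)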

4. RGB Channel Extraction

The RGB color model represents each pixel using three intensity values (Red, Green, Blue); readJPEG returns them scaled to the range 0 to 1. For this 800 × 1150 image, each color channel is stored as a 1150 × 800 matrix of 920,000 values.

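# Split the image array into its three channels
# (assumed definition: the report uses these objects without showing how they are created)
red_img <- img[, , 1]
green_img <- img[, , 2]
blue_img <- img[, , 3]

print(paste("Red channel dimensions:", dim(red_img)[1], "×", dim(red_img)[2]))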
## [1] "Red channel dimensions: 1150 × 800"
print(paste("Green channel dimensions:", dim(green_img)[1], "×", dim(green_img)[2]))
## [1] "Green channel dimensions: 1150 × 800"
print(paste("Blue channel dimensions:", dim(blue_img)[1], "×", dim(blue_img)[2]))
## [1] "Blue channel dimensions: 1150 × 800"

5. Principal Component Analysis

5.1 PCA Computation

PCA is applied separately to each color channel to identify the principal components that capture the most variance in the data.

# Perform PCA on each channel
r.pca <- prcomp(red_img, center = FALSE)
g.pca <- prcomp(green_img, center = FALSE)
b.pca <- prcomp(blue_img, center = FALSE)

# Display PCA dimensions
print(paste("Number of principal components:", ncol(r.pca$x)))
## [1] "Number of principal components: 800"

5.2 Variance Explained

The scree plots below show the percentage of variance explained by each principal component. The first few components capture the majority of the information, enabling effective compression.

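# Visualize variance explained by principal components
# (fviz_eig comes from factoextra, grid.arrange from gridExtra)
library(factoextra)
library(gridExtra)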
f1 <- fviz_eig(r.pca, main = "Red Channel", barfill = "red", ncp = 5, addlabels = TRUE)
f2 <- fviz_eig(g.pca, main = "Green Channel", barfill = "green", ncp = 5, addlabels = TRUE)
f3 <- fviz_eig(b.pca, main = "Blue Channel", barfill = "blue", ncp = 5, addlabels = TRUE)

# Display all three plots
grid.arrange(f1, f2, f3, ncol = 3)

Key Findings:

  • The first principal component in each channel explains over 50% of the variance
  • The first 5 components cumulatively explain over 80% of the total variance
  • This demonstrates that image data is highly redundant and amenable to compression
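The percentages in a scree plot are derived from the component standard deviations (sdev in the prcomp output): each squared standard deviation is divided by the sum of all of them. The sketch below uses hypothetical standard deviations, not values from this image, and the function name is an assumption. Because the report calls prcomp with center = FALSE, the squared values there are uncentered second moments, so the first component's share also reflects the overall brightness level of the channel.

```python
def explained_variance(sdev):
    """Return per-component and cumulative shares of total variance."""
    variances = [s ** 2 for s in sdev]
    total = sum(variances)
    ratios = [v / total for v in variances]
    cumulative = []
    running = 0.0
    for r in ratios:
        running += r
        cumulative.append(running)
    return ratios, cumulative

# Hypothetical standard deviations of five components.
sdev = [3.0, 2.0, 1.0, 1.0, 1.0]
ratios, cumulative = explained_variance(sdev)
for i, (r, c) in enumerate(zip(ratios, cumulative), start=1):
    print(f"PC{i}: {100 * r:.1f}% (cumulative {100 * c:.1f}%)")
```

Applied to the real sdev vectors of r.pca, g.pca and b.pca, the same calculation would give the exact figures behind the "over 50%" and "over 80%" statements.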

6. Image Compression

6.1 Compression Methodology

Image compression via PCA involves:

  1. Decomposition: \(X = U \Sigma V^T\) (Singular Value Decomposition)
  2. Truncation: Keep only the first \(k\) principal components
  3. Reconstruction: \(\hat{X}_k = U_k \Sigma_k V_k^T\)

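Where \(k\) is the number of components retained. In terms of the prcomp output (with center = FALSE), the score matrix x equals \(U \Sigma\) and rotation equals \(V\), so x[, 1:k] %*% t(rotation[, 1:k]) computes \(\hat{X}_k\).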
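The storage saving implied by the truncation step can be worked out directly. Keeping \(k\) components of a 1150 × 800 channel requires a 1150 × \(k\) score matrix and an 800 × \(k\) loading matrix instead of the full 920,000 values. The sketch below only counts stored numbers; it is an illustration (the function name is hypothetical) and says nothing about the size of a saved JPEG file.

```python
def pca_storage(rows, cols, k):
    """Count the values needed to keep k components of a rows x cols channel."""
    # Scores (rows x k) plus loadings (cols x k).
    return k * (rows + cols)

rows, cols = 1150, 800
full = rows * cols  # values in one uncompressed channel

for k in (45, 30, 20, 12, 6, 3):
    kept = pca_storage(rows, cols, k)
    print(f"k={k:2d}: {kept:6d} values, ratio {full / kept:.1f}:1")
```

With 45 components each channel needs 87,750 values, roughly a tenth of the original matrix.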

6.2 Compressed Images at Different Levels

library(abind)

# Reconstruct one channel from its first k principal components
reconstruct <- function(pca, k) {
  pca$x[, 1:k] %*% t(pca$rotation[, 1:k])
}

# Rebuild and display the image at each compression level
for (k in c(45, 30, 20, 12, 6, 3)) {
  compressed_img <- abind(reconstruct(r.pca, k),
                          reconstruct(g.pca, k),
                          reconstruct(b.pca, k),
                          along = 3)
  plot(image_read(compressed_img))
  title(paste0("Compressed Image (", k, " Components)"))
}

7. Dominant Color Extraction

7.1 K-means Clustering

We extract the 8 most dominant colors from the original image using k-means clustering on the RGB color space.

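# palette8 is not defined in the report. One possible construction (an assumption):
# cluster the pixels in RGB space and convert the 8 cluster centers to hex colors
set.seed(123)
pixels <- cbind(as.vector(red_img), as.vector(green_img), as.vector(blue_img))
km <- kmeans(pixels, centers = 8, iter.max = 50)
palette8 <- rgb(km$centers[, 1], km$centers[, 2], km$centers[, 3])

# Visualize dominant color palette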
par(mar = c(1,1,3,1))
barplot(
  rep(1, 8),
  col = palette8,
  border = NA,
  main = "Dominant Color Palette (K-means, k=8)"
)

Key findings from the palette: The extracted palette is dominated by warm, muted earth tones (soft peach, multiple shades of brown, beige, and deep near-black), suggesting a composition built around natural materials, subdued warmth, and strong tonal contrast. The limited brightness and absence of saturated hues point to a restrained, earthy visual character rather than a vivid or decorative color scheme.
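How k-means arrives at such a palette can be sketched with a plain implementation of Lloyd's algorithm. This is a simplified illustration and not the routine used in the report: the function name, the random initialization, and the six sample pixels are assumptions, and library implementations add refinements such as multiple restarts.

```python
import random

def kmeans_colors(pixels, k, iterations=20, seed=123):
    """Cluster RGB tuples with Lloyd's algorithm and return the k centers."""
    rng = random.Random(seed)
    centers = rng.sample(pixels, k)  # start from k randomly chosen pixels
    for _ in range(iterations):
        # Assignment step: attach every pixel to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in pixels:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each center to the mean color of its cluster.
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                new_centers.append(tuple(sum(ch) / len(members) for ch in zip(*members)))
            else:
                new_centers.append(centers[i])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # assignments no longer change
        centers = new_centers
    return centers

# Hypothetical pixels: three near-black and three light peach tones.
pixels = [(12, 9, 9), (14, 10, 9), (10, 8, 9),
          (244, 220, 186), (240, 222, 190), (246, 218, 184)]
centers = sorted(kmeans_colors(pixels, k=2))
print(centers)
```

Each returned center is the average color of one cluster, which is why palette entries are blends of many pixels rather than colors that necessarily occur in the painting.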

8. Multidimensional Scaling (MDS) Analysis

Multidimensional Scaling (MDS) is a family of multivariate statistical techniques used to represent the structure of complex similarity or dissimilarity data as a spatial configuration of points in a lower-dimensional space. The core aim of MDS is to take information about the pairwise proximities (similarities or dissimilarities) among a set of items and construct a geometric representation in which the distances between points correspond as closely as possible to the observed proximities. In such a configuration, objects that are more similar (or less dissimilar) appear closer together, whereas objects that are less similar are placed further apart in the mapped space.

9. Distance Metrics Comparison

plot(d, asp = 1, main = "MDS: Comparison of Distance Metrics", 
     xlab = "Dimension 1", ylab = "Dimension 2", pch = 16, col = "gray80")

methods <- c("euclidean", "manhattan", "canberra", "maximum")
cols <- c("red", "blue", "green", "orange")

for (i in seq_along(methods)) {
  mds <- cmdscale(dist(d, method = methods[i]))
  points(mds, col = cols[i], pch = 16, cex = 0.7)
}

legend("bottomleft", legend = methods, col = cols, pch = 16, cex = 0.9)
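The four metric names passed to dist correspond to different ways of measuring how far apart two observations are. The sketch below writes them out for a single pair of points. The Canberra version follows the common definition with |x| + |y| in the denominator and simply skips 0/0 terms; R's treatment of such terms may differ in detail. The two sample points are hypothetical.

```python
import math

def euclidean(x, y):
    """Straight-line distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def maximum(x, y):
    """Largest absolute coordinate difference (Chebyshev distance)."""
    return max(abs(a - b) for a, b in zip(x, y))

def canberra(x, y):
    """Sum of coordinate differences, each scaled by the coordinates' magnitudes."""
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:  # skip 0/0 terms
            total += abs(a - b) / denom
    return total

# Two hypothetical pixels with channel values in [0, 1].
p = (0.2, 0.4, 0.6)
q = (0.5, 0.4, 0.2)
for name, fn in [("euclidean", euclidean), ("manhattan", manhattan),
                 ("canberra", canberra), ("maximum", maximum)]:
    print(f"{name}: {fn(p, q):.4f}")
```

Because Canberra divides by the coordinate magnitudes, it weights differences between dark values more heavily than the other three metrics do.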

Interpretation: The MDS analysis produces a ring-shaped configuration under all four distance metrics, indicating that the image data are structured by continuous variation rather than by discrete clusters. This ring geometry reflects the smooth, gradual transitions in color and luminance that characterize the analysed image, “Vorozhinnia”, where tones change progressively across the canvas instead of forming sharply separated regions. Because the ring persists under different distance measures, the structure appears to be intrinsic to the image rather than an artifact of a specific metric. In perceptual terms, most pixels are similar in overall intensity but differ in the direction of their color composition, which leads to angular rather than radial separation in the MDS space. This suggests that relationships among colors in the painting are governed by relative balance and harmony rather than by extreme contrasts. Overall, the MDS result is consistent with a visual organization dominated by continuous tonal gradients, as expected from painterly techniques that emphasize cohesion and subtle modulation.

10. Quality Metrics

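# mse() is assumed to come from the Metrics package; abind() comes from abind
library(Metrics)
library(abind)

n_components <- c(45, 30, 20, 12, 6, 3)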

stats <- data.frame(
  PC = n_components,
  FileSize = NA_real_,
  MSE = NA_real_
)

orig_img <- as.numeric(img_array) / 255

for (i in seq_along(n_components)) {

  k <- n_components[i]

  # Reconstruct image using PCA
  recon_img <- abind(
    r.pca$x[, 1:k] %*% t(r.pca$rotation[, 1:k]),
    g.pca$x[, 1:k] %*% t(g.pca$rotation[, 1:k]),
    b.pca$x[, 1:k] %*% t(b.pca$rotation[, 1:k]),
    along = 3
  )

  # Normalize reconstruction
  recon_img <- recon_img / max(recon_img)

  # Compute MSE directly (no JPEG read/write)
  stats$MSE[i] <- mse(
    as.vector(orig_img),
    as.vector(recon_img)
  )

  # OPTIONAL: estimate file size by actually saving once
  filename <- paste0("photo_", k, "_pc.jpg")
  writeJPEG(recon_img, filename, quality = 1)
  stats$FileSize[i] <- file.info(filename)$size
}

print(stats)
##   PC FileSize         MSE
## 1 45   517876 0.002696035
## 2 30   463048 0.003279155
## 3 20   415071 0.004213164
## 4 12   370709 0.005922245
## 5  6   332076 0.009964953
## 6  3   286438 0.012101386

The reconstruction quality metrics show a clear trade-off between compression and visual fidelity in the PCA-based analysis of “Vorozhinnia”. As the number of retained principal components decreases from 45 to 3, the size of the saved reconstruction falls from 517,876 to 286,438 bytes, while the mean squared error (MSE) rises from 0.0027 to 0.0121, indicating growing deviation from the original image. Every saved reconstruction is larger than the 129,585-byte original file, most likely because writeJPEG is called with quality = 1 (the maximum), so the file sizes are meaningful only relative to one another. Reconstructions using 45–30 components achieve very low MSE values and preserve the painting’s smooth color gradients and overall structure with minimal perceptual loss. At 20–12 components, most global features remain intact, but finer textures and subtle tonal variations begin to deteriorate. Strong compression at 6 and 3 components leads to noticeably higher MSE, reflecting the loss of local detail and the dominance of only the most salient color patterns. Overall, the results show that moderate dimensionality reduction provides an effective balance between compression efficiency and image quality for this painting.
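The MSE reported in the table is the average squared difference between corresponding pixel values. A minimal sketch of the calculation follows, with a hypothetical function name and three made-up pixel values in the range 0 to 1.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical original and reconstructed intensities.
original = [0.0, 0.5, 1.0]
reconstructed = [0.1, 0.5, 0.8]
error = mean_squared_error(original, reconstructed)
print(error)
```

Since the differences are squared, a few strongly wrong pixels raise the MSE more than many slightly wrong ones.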

11. Visualizing Quality Metrics

par(mar = c(5, 5, 4, 2) + 0.1)

plot(stats$PC, stats$MSE, type = "b", pch = 19, col = "darkred", lwd = 2,
     xlab = "Number of Principal Components",
     ylab = "Mean Squared Error",
     main = "Reconstruction Quality")
grid()
text(stats$PC, stats$MSE, labels = stats$PC, pos = 2, cex = 0.8)

The reconstruction quality plot shows that increasing the number of principal components reduces the mean squared error, indicating that more visual information from the original painting is preserved as additional components are retained. The gains diminish quickly: going from 3 to 20 components lowers the MSE from 0.0121 to 0.0042, whereas going from 20 to 45 lowers it only to 0.0027. The image can therefore be compressed substantially while maintaining high visual fidelity.

12. Color Transfer

12.1 Methodology of Computational Color Transfer Between Images

Color transfer uses the LAB color space (Lightness, A, B) to separate luminance from chrominance. The process:

  1. Extracting dominant colors from source image (k-means)
  2. Converting images to LAB color space
  3. Transferring color statistics (mean and standard deviation of the a and b channels) from source to target
  4. Preserving the target's original luminance (L channel) to maintain detail
  5. Optional saturation enhancement for vibrant results
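The statistics transfer in step 3 amounts to shifting and rescaling one channel of the target so that its mean and standard deviation match those of the source. The sketch below reproduces that arithmetic on plain lists with the standard library. The sample values are hypothetical, and the population standard deviation is used because that is what NumPy's std() computes by default.

```python
from statistics import fmean, pstdev

def transfer_channel(target, source, eps=1e-6):
    """Rescale target values so their mean and std match those of source."""
    t_mean, s_mean = fmean(target), fmean(source)
    t_std = max(pstdev(target), eps)  # guard against a constant target channel
    s_std = pstdev(source)
    return [(v - t_mean) * (s_std / t_std) + s_mean for v in target]

# Hypothetical a-channel values of a target and a source image.
target_a = [10.0, 20.0, 30.0]
source_a = [100.0, 110.0, 120.0, 130.0, 140.0]
transferred = transfer_channel(target_a, source_a)
print(transferred)
```

The relative ordering of the target values is preserved; only their center and spread change, so the method moves the overall color cast but does not map individual palette colors onto individual pixels.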

12.2 Loading New Images for Pixel-Level Color Transfer

imgB <- image_read("Olexandr_murashko_Divchyna_v_chervonim_kapeliusi.jpg.webp")
plot(imgB)
title(main = "New Color Image")

12.2.2 Python-Based Color Transfer

library(reticulate)

img_source <- normalizePath("800px-Mykola_Pymonenko-Vorozhinnia.jpg.webp")
img_target <- normalizePath("Olexandr_murashko_Divchyna_v_chervonim_kapeliusi.jpg.webp")

out_palette <- "palette_kmeans.png"
out_result <- "girl_warm_colors.png"
out_enhanced <- "girl_warm_colors_enhanced.png"
out_comparison <- "comparison.png"

py_run_string(paste0(
"
import cv2
import numpy as np
from sklearn.cluster import KMeans
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

img_source = cv2.imread(r'", img_source, "')
img_target = cv2.imread(r'", img_target, "')

if img_source is None or img_target is None:
    raise RuntimeError('Could not load images')

img_source_rgb = cv2.cvtColor(img_source, cv2.COLOR_BGR2RGB)
img_target_rgb = cv2.cvtColor(img_target, cv2.COLOR_BGR2RGB)

h, w, c = img_source_rgb.shape
pixels_source = img_source_rgb.reshape(-1, 3)

n_colors = 8
kmeans = KMeans(n_clusters=n_colors, random_state=123, n_init=10)
kmeans.fit(pixels_source)
palette = kmeans.cluster_centers_.astype(np.uint8)

fig, ax = plt.subplots(figsize=(8,2))
ax.imshow(palette.reshape(1, n_colors, 3))
ax.axis('off')
plt.savefig(r'", out_palette, "', dpi=150, bbox_inches='tight')
plt.close()

lab_source = cv2.cvtColor(img_source_rgb, cv2.COLOR_RGB2LAB).astype(np.float32)
lab_target = cv2.cvtColor(img_target_rgb, cv2.COLOR_RGB2LAB).astype(np.float32)

L_s, a_s, b_s = cv2.split(lab_source)
L_t, a_t, b_t = cv2.split(lab_target)

def transfer_channel(target, source):
    target_std = target.std()
    if target_std < 1e-6:
        target_std = 1e-6
    return (target - target.mean()) * (source.std()/target_std) + source.mean()

a_tr = transfer_channel(a_t, a_s)
b_tr = transfer_channel(b_t, b_s)

lab_tr = cv2.merge([L_t, a_tr, b_tr])
lab_tr[:,:,0] = np.clip(lab_tr[:,:,0], 0, 100)
lab_tr[:,:,1] = np.clip(lab_tr[:,:,1], -128, 127)
lab_tr[:,:,2] = np.clip(lab_tr[:,:,2], -128, 127)

result = cv2.cvtColor(lab_tr.astype(np.uint8), cv2.COLOR_LAB2RGB)

hsv = cv2.cvtColor(result, cv2.COLOR_RGB2HSV).astype(np.float32)
hsv[:,:,1] = np.clip(hsv[:,:,1] * 1.3, 0, 255)
result_enh = cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2RGB)

cv2.imwrite(r'", out_result, "', cv2.cvtColor(result, cv2.COLOR_RGB2BGR))
cv2.imwrite(r'", out_enhanced, "', cv2.cvtColor(result_enh, cv2.COLOR_RGB2BGR))

fig, axes = plt.subplots(1,3, figsize=(15,5))
axes[0].imshow(img_target_rgb); axes[0].axis('off'); axes[0].set_title('Original')
axes[1].imshow(result); axes[1].axis('off'); axes[1].set_title('Warm Palette')
axes[2].imshow(result_enh); axes[2].axis('off'); axes[2].set_title('Enhanced')

plt.savefig(r'", out_comparison, "', dpi=150, bbox_inches='tight')
plt.close()

print('Color transfer complete.')
"
))
## Color transfer complete.

# View results in R
library(magick)

# Display palette
palette_img <- image_read(out_palette)
plot(palette_img)

I extracted the dominant color palette from the original image, “Vorozhinnia”, and will now apply it to the pixels of the newly loaded color image.

# Display comparison
comparison_img <- image_read(out_comparison)
plot(comparison_img)

In this experiment, the dominant chromatic palette was first extracted from “Vorozhinnia” using k-means clustering applied to pixel data in RGB space. The palette serves as a visual summary of the source colors; the transfer itself does not use the eight cluster centers but matches the mean and standard deviation of the a and b channels of the target image, “Girl in a Red Hat”, to those of the whole source image in LAB color space. While the chromatic channels (a and b) were modified to reflect the source, the luminance (L) channel of the target painting was intentionally preserved in order to retain structural shading and tonal depth.

The intermediate result, labeled “Warm Palette”, shows the direct outcome of this chromatic transfer. Contrary to expectations, the image does not exhibit a pronounced infusion of warm tones; instead, it appears noticeably muted, with desaturated greys and subdued blues dominating the composition. This suggests that the darker tonal statistics of the source painting constrained the transferred chromatic range.

To counteract this attenuation, an additional enhancement stage was implemented in the HSV color space, in which the saturation channel was multiplied by 1.3. The “Enhanced” version therefore shows subtle increases in chromatic intensity, most visible in mid-tone fabrics and facial contours, yet these adjustments remain modest rather than transformative. The overall luminance framework of the target image continues to suppress vivid palette manifestation.
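Why a multiplicative saturation boost has only a modest effect on muted images can be seen on single pixels. The sketch below uses the standard-library colorsys module instead of OpenCV, with saturation expressed on a 0 to 1 scale; the function name is hypothetical and the factor of 1.3 is taken from the code above.

```python
import colorsys

def boost_saturation(rgb, factor=1.3):
    """Multiply the HSV saturation of an 8-bit RGB color, capped at full saturation."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    s = min(s * factor, 1.0)
    boosted = colorsys.hsv_to_rgb(h, s, v)
    return tuple(round(c * 255) for c in boosted)

grey = boost_saturation((128, 128, 128))   # a neutral grey pixel
rust = boost_saturation((194, 104, 78))    # one of the extracted palette colors
print(grey, rust)
```

A grey pixel has zero saturation, and multiplying zero by any factor leaves it grey, so regions that the transfer has already rendered nearly neutral gain almost nothing from this step. Only pixels that still carry some color, such as the rust tone, become more vivid.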

# Display final enhanced result
result_img <- image_read(out_enhanced)
plot(result_img)
title("Girl with Warm Color Palette - Enhanced")

The limited perceptual divergence between the ‘Warm Palette’ and ‘Enhanced’ outputs underscores a key methodological constraint: color transfer that excludes luminance adaptation cannot fully reproduce the visual vitality of the source palette. Illumination structure, contrast distribution, and painterly shading exert a dominant influence over final appearance.

library(reticulate)

# Define file paths
img_source <- "800px-Mykola_Pymonenko-Vorozhinnia.jpg.webp"
img_target <- "VG485-1000x1000.webp"
out_palette <- "palette_kmeans.png"
out_result <- "vang_gogh_warm_colors.png"
out_enhanced <- "vang_gogh_warm_colors_enhanced.png"
out_comparison <- "comparison.png"

# Run Python code
py_run_string(
  paste0(
    "import cv2\n",
    "import numpy as np\n",
    "from sklearn.cluster import KMeans\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Load images\n",
    "img_source = cv2.imread('", img_source, "')\n",
    "img_target = cv2.imread('", img_target, "')\n",
    "\n",
    "if img_source is None or img_target is None:\n",
    "    raise RuntimeError('Could not load images')\n",
    "\n",
    "# Convert to RGB\n",
    "img_source_rgb = cv2.cvtColor(img_source, cv2.COLOR_BGR2RGB)\n",
    "img_target_rgb = cv2.cvtColor(img_target, cv2.COLOR_BGR2RGB)\n",
    "\n",
    "print('Images loaded successfully')\n",
    "print('Source shape:', img_source_rgb.shape)\n",
    "print('Target shape:', img_target_rgb.shape)\n",
    "\n",
    "# Extract dominant colors from source using k-means\n",
    "h, w, c = img_source_rgb.shape\n",
    "pixels_source = img_source_rgb.reshape(-1, 3)\n",
    "\n",
    "# Apply k-means to extract 8 dominant colors\n",
    "n_colors = 8\n",
    "kmeans = KMeans(n_clusters=n_colors, random_state=123, n_init=10)\n",
    "kmeans.fit(pixels_source)\n",
    "palette = kmeans.cluster_centers_.astype(np.uint8)\n",
    "\n",
    "print('\\nExtracted', n_colors, 'dominant colors')\n",
    "print('Palette RGB values:')\n",
    "print(palette)\n",
    "\n",
    "# Visualize the extracted palette\n",
    "fig, ax = plt.subplots(1, 1, figsize=(8, 2))\n",
    "palette_display = palette.reshape(1, n_colors, 3)\n",
    "ax.imshow(palette_display)\n",
    "ax.axis('off')\n",
    "ax.set_title('Dominant Color Palette (k-means)', fontsize=14)\n",
    "plt.tight_layout()\n",
    "plt.savefig('", out_palette, "', dpi=150, bbox_inches='tight')\n",
    "plt.close()\n",
    "\n",
    "# Convert both images to LAB color space for better color transfer\n",
    "lab_source = cv2.cvtColor(img_source_rgb, cv2.COLOR_RGB2LAB).astype(np.float32)\n",
    "lab_target = cv2.cvtColor(img_target_rgb, cv2.COLOR_RGB2LAB).astype(np.float32)\n",
    "\n",
    "# Split LAB channels\n",
    "L_source, a_source, b_source = cv2.split(lab_source)\n",
    "L_target, a_target, b_target = cv2.split(lab_target)\n",
    "\n",
    "# Transfer color statistics from source to target\n",
    "def transfer_channel(target, source):\n",
    "    target_std = target.std()\n",
    "    if target_std < 1e-6:\n",
    "        target_std = 1e-6\n",
    "    return (target - target.mean()) * (source.std() / target_std) + source.mean()\n",
    "\n",
    "# Transfer only color channels (a and b), keep original luminance\n",
    "a_transferred = transfer_channel(a_target, a_source)\n",
    "b_transferred = transfer_channel(b_target, b_source)\n",
    "\n",
    "# Merge with original luminance to preserve details\n",
    "lab_transferred = cv2.merge([L_target, a_transferred, b_transferred])\n",
    "\n",
    "# Clip to valid LAB ranges\n",
    "lab_transferred[:,:,0] = np.clip(lab_transferred[:,:,0], 0, 100)\n",
    "lab_transferred[:,:,1] = np.clip(lab_transferred[:,:,1], -128, 127)\n",
    "lab_transferred[:,:,2] = np.clip(lab_transferred[:,:,2], -128, 127)\n",
    "\n",
    "# Convert back to RGB\n",
    "lab_transferred = lab_transferred.astype(np.uint8)\n",
    "result = cv2.cvtColor(lab_transferred, cv2.COLOR_LAB2RGB)\n",
    "\n",
    "# Optional: enhance saturation for warmer colors\n",
    "hsv_result = cv2.cvtColor(result, cv2.COLOR_RGB2HSV).astype(np.float32)\n",
    "hsv_result[:,:,1] = np.clip(hsv_result[:,:,1] * 1.3, 0, 255)\n",
    "result_enhanced = cv2.cvtColor(hsv_result.astype(np.uint8), cv2.COLOR_HSV2RGB)\n",
    "\n",
    "# Save results\n",
    "cv2.imwrite('", out_result, "', cv2.cvtColor(result, cv2.COLOR_RGB2BGR))\n",
    "cv2.imwrite('", out_enhanced, "', cv2.cvtColor(result_enhanced, cv2.COLOR_RGB2BGR))\n",
    "\n",
    "# Display comparison\n",
    "fig, axes = plt.subplots(1, 3, figsize=(15, 5))\n",
    "\n",
    "axes[0].imshow(img_target_rgb)\n",
    "axes[0].set_title('Original Starry Night', fontsize=12)\n",
    "axes[0].axis('off')\n",
    "\n",
    "axes[1].imshow(result)\n",
    "axes[1].set_title('With Warm Palette', fontsize=12)\n",
    "axes[1].axis('off')\n",
    "\n",
    "axes[2].imshow(result_enhanced)\n",
    "axes[2].set_title('Enhanced Saturation', fontsize=12)\n",
    "axes[2].axis('off')\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.savefig('", out_comparison, "', dpi=150, bbox_inches='tight')\n",
    "plt.close()\n",
    "\n",
    "print('\\nColor transfer complete!')\n",
    "print('Files saved:')\n",
    "print('- palette_kmeans.png')\n",
    "print('- vang_gogh_warm_colors.png')\n",
    "print('- vang_gogh_warm_colors_enhanced.png')\n",
    "print('- comparison.png')\n"
  )
)
## Images loaded successfully
## Source shape: (1150, 800, 3)
## Target shape: (833, 1000, 3)
## 
## Extracted 8 dominant colors
## Palette RGB values:
## [[ 86  53  38]
##  [218 164 133]
##  [ 12   9   9]
##  [117  83  59]
##  [154 128 102]
##  [244 220 186]
##  [194 104  78]
##  [ 50  31  25]]
## 
## Color transfer complete!
## Files saved:
## - palette_kmeans.png
## - vang_gogh_warm_colors.png
## - vang_gogh_warm_colors_enhanced.png
## - comparison.png

12.3 Mapping the Source Palette Onto a New Painting

# View results in R
library(magick)

# Display palette
palette_img <- image_read(out_palette)
plot(palette_img)

To continue the experiment, I applied the same source palette to another image to check whether the dominance of grey and dark colors would be reduced and a clearer color-to-color transition would appear.

# Display comparison
comparison_img <- image_read(out_comparison)
plot(comparison_img)

The obtained results differ from the initially expected outcome. The primary objective was to achieve a direct and precise transfer of dominant colors from the source palette to the target image, assuming a near one-to-one correspondence between extracted palette colors and pixel values. However, the resulting images reveal that such exact color replication was not fully accomplished.

This discrepancy arises mainly from the fact that the transfer process did not explicitly account for luminance (light intensity) variations present in the original artwork. Color perception in paintings is strongly influenced not only by chromatic components but also by illumination, shading, and local contrast. As a result, even when chromatic information is transferred, differences in light distribution can significantly alter the visual outcome.

In the produced variants, the palette was successfully imposed onto the target image; however, the overall scene appears darker and less vibrant than anticipated. This indicates that while chromatic channels were modified, the luminance structure of the target image constrained the final appearance. Consequently, the transferred colors did not manifest with the same visual intensity as in the source painting.

# Display final enhanced result
result_img <- image_read(out_enhanced)
plot(result_img)

To mitigate this limitation, an additional enhancement step was introduced: the saturation channel in HSV space was multiplied by 1.3 across the whole image. This adjustment aimed to compensate for the unaccounted lighting differences and to better approximate the warmth and vibrancy of the source palette. Although this approach improved perceptual richness, it remains an approximation rather than an exact chromatic reconstruction.

library(reticulate)

# Define file paths - REVERSED
img_source <- "Olexandr_murashko_Divchyna_v_chervonim_kapeliusi.jpg.webp"  # Extract colors from girl
img_target <- "800px-Mykola_Pymonenko-Vorozhinnia.jpg.webp"  # Apply to Pymonenko
out_palette <- "palette_kmeans_reverse.png"
out_result <- "pymonenko_with_girl_colors.png"
out_enhanced <- "pymonenko_with_girl_colors_enhanced.png"
out_comparison <- "comparison_reverse.png"

# Run Python code
py_run_string(
  paste0(
    "import cv2\n",
    "import numpy as np\n",
    "from sklearn.cluster import KMeans\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Load images\n",
    "img_source = cv2.imread('", img_source, "')\n",
    "img_target = cv2.imread('", img_target, "')\n",
    "\n",
    "if img_source is None or img_target is None:\n",
    "    raise RuntimeError('Could not load images')\n",
    "\n",
    "# Convert to RGB\n",
    "img_source_rgb = cv2.cvtColor(img_source, cv2.COLOR_BGR2RGB)\n",
    "img_target_rgb = cv2.cvtColor(img_target, cv2.COLOR_BGR2RGB)\n",
    "\n",
    "print('Images loaded successfully')\n",
    "print('Source shape (Girl):', img_source_rgb.shape)\n",
    "print('Target shape (Pymonenko):', img_target_rgb.shape)\n",
    "\n",
    "# Extract dominant colors from GIRL using k-means\n",
    "h, w, c = img_source_rgb.shape\n",
    "pixels_source = img_source_rgb.reshape(-1, 3)\n",
    "\n",
    "# Apply k-means to extract 8 dominant colors\n",
    "n_colors = 8\n",
    "kmeans = KMeans(n_clusters=n_colors, random_state=123, n_init=10)\n",
    "kmeans.fit(pixels_source)\n",
    "palette = kmeans.cluster_centers_.astype(np.uint8)\n",
    "\n",
    "print('\\nExtracted', n_colors, 'dominant colors from GIRL image')\n",
    "print('Palette RGB values:')\n",
    "print(palette)\n",
    "\n",
    "# Visualize the extracted palette\n",
    "fig, ax = plt.subplots(1, 1, figsize=(8, 2))\n",
    "palette_display = palette.reshape(1, n_colors, 3)\n",
    "ax.imshow(palette_display)\n",
    "ax.axis('off')\n",
    "ax.set_title('Dominant Colors from Girl in Red Hat', fontsize=14)\n",
    "plt.tight_layout()\n",
    "plt.savefig('", out_palette, "', dpi=150, bbox_inches='tight')\n",
    "plt.close()\n",
    "\n",
    "# Convert both images to LAB color space for better color transfer\n",
    "lab_source = cv2.cvtColor(img_source_rgb, cv2.COLOR_RGB2LAB).astype(np.float32)\n",
    "lab_target = cv2.cvtColor(img_target_rgb, cv2.COLOR_RGB2LAB).astype(np.float32)\n",
    "\n",
    "# Split LAB channels\n",
    "L_source, a_source, b_source = cv2.split(lab_source)\n",
    "L_target, a_target, b_target = cv2.split(lab_target)\n",
    "\n",
    "# Transfer color statistics from GIRL to PYMONENKO\n",
    "def transfer_channel(target, source):\n",
    "    target_std = target.std()\n",
    "    if target_std < 1e-6:\n",
    "        target_std = 1e-6\n",
    "    return (target - target.mean()) * (source.std() / target_std) + source.mean()\n",
    "\n",
    "# Transfer only color channels (a and b), keep original luminance\n",
    "a_transferred = transfer_channel(a_target, a_source)\n",
    "b_transferred = transfer_channel(b_target, b_source)\n",
    "\n",
    "# Merge with original luminance to preserve details\n",
    "lab_transferred = cv2.merge([L_target, a_transferred, b_transferred])\n",
    "\n",
    "# Clip to valid LAB ranges\n",
    "lab_transferred[:,:,0] = np.clip(lab_transferred[:,:,0], 0, 100)\n",
    "lab_transferred[:,:,1] = np.clip(lab_transferred[:,:,1], -128, 127)\n",
    "lab_transferred[:,:,2] = np.clip(lab_transferred[:,:,2], -128, 127)\n",
    "\n",
    "# Convert back to RGB\n",
    "lab_transferred = lab_transferred.astype(np.uint8)\n",
    "result = cv2.cvtColor(lab_transferred, cv2.COLOR_LAB2RGB)\n",
    "\n",
    "# Optional: enhance saturation for more vibrant colors\n",
    "hsv_result = cv2.cvtColor(result, cv2.COLOR_RGB2HSV).astype(np.float32)\n",
    "hsv_result[:,:,1] = np.clip(hsv_result[:,:,1] * 1.3, 0, 255)\n",
    "result_enhanced = cv2.cvtColor(hsv_result.astype(np.uint8), cv2.COLOR_HSV2RGB)\n",
    "\n",
    "# Save results\n",
    "cv2.imwrite('", out_result, "', cv2.cvtColor(result, cv2.COLOR_RGB2BGR))\n",
    "cv2.imwrite('", out_enhanced, "', cv2.cvtColor(result_enhanced, cv2.COLOR_RGB2BGR))\n",
    "\n",
    "# Display comparison\n",
    "fig, axes = plt.subplots(1, 3, figsize=(15, 5))\n",
    "\n",
    "axes[0].imshow(img_target_rgb)\n",
    "axes[0].set_title('Original Pymonenko', fontsize=12)\n",
    "axes[0].axis('off')\n",
    "\n",
    "axes[1].imshow(result)\n",
    "axes[1].set_title('With Girl Colors', fontsize=12)\n",
    "axes[1].axis('off')\n",
    "\n",
    "axes[2].imshow(result_enhanced)\n",
    "axes[2].set_title('Enhanced Saturation', fontsize=12)\n",
    "axes[2].axis('off')\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.savefig('", out_comparison, "', dpi=150, bbox_inches='tight')\n",
    "plt.close()\n",
    "\n",
    "print('\\nREVERSE Color transfer complete!')\n",
    "print('Colors extracted from: Girl in Red Hat')\n",
    "print('Colors applied to: Pymonenko painting')\n",
    "print('\\nFiles saved:')\n",
    "print('- palette_kmeans_reverse.png')\n",
    "print('- pymonenko_with_girl_colors.png')\n",
    "print('- pymonenko_with_girl_colors_enhanced.png')\n",
    "print('- comparison_reverse.png')\n"
  )
)
## Images loaded successfully
## Source shape (Girl): (700, 521, 3)
## Target shape (Pymonenko): (1150, 800, 3)
## 
## Extracted 8 dominant colors from GIRL image
## Palette RGB values:
## [[231 229 194]
##  [ 66  48  40]
##  [115  76  55]
##  [208 202 171]
##  [ 29  24  21]
##  [190 172 146]
##  [222  66  17]
##  [161 120  93]]
## 
## REVERSE Color transfer complete!
## Colors extracted from: Girl in Red Hat
## Colors applied to: Pymonenko painting
## 
## Files saved:
## - palette_kmeans_reverse.png
## - pymonenko_with_girl_colors.png
## - pymonenko_with_girl_colors_enhanced.png
## - comparison_reverse.png

# View results in R
library(magick)

# Display palette extracted from girl
palette_img <- image_read(out_palette)
plot(palette_img)
title("Palette Extracted from Girl in Red Hat")

I hypothesized that the dominant colors extracted from the original image “Vorozhinnia” were too dark, as their appearance suggested. I therefore ran a further experiment, extracting the color palette from a perceptually brighter painting, “Divchyna v chervonim kapelusi” (“Girl in a Red Hat”), to assess how a brighter color source affects the transfer results.

# Display comparison
comparison_img <- image_read(out_comparison)
plot(comparison_img)

To my surprise, however, switching to the new palette made no visible difference, so the palette itself does not appear to be the cause. The objective of this experiment was to extract the dominant colors from the painting “Girl in a Red Hat” and then transfer this palette onto the pixels of “Vorozhinnia” by Mykola Pymonenko. To achieve this, the source image was first processed using k-means clustering in the RGB color space to identify a limited set of representative colors. These dominant colors were then statistically transferred to the target image by matching chromatic distributions in the LAB color space, while preserving the original luminance structure of the target painting.

Despite the use of a perceptually brighter source image, the resulting color-transferred versions of “Vorozhinnia” exhibit a predominantly dark grey appearance with subtle bluish tones. This outcome indicates that the intended chromatic influence of the extracted palette was not fully realized. The failure to achieve vivid color correspondence suggests that the palette itself was not the primary limiting factor in the transformation process.

From a methodological perspective, the code transfers only the chromatic components (a and b channels) of the LAB color space, while explicitly retaining the original luminance (L channel) of the target image. Consequently, the tonal structure, shadows, and illumination patterns of Vorozhinnia dominate the final appearance. Since this painting contains extensive low-luminance regions and strong chiaroscuro, the imposed colors are perceptually suppressed, resulting in muted and desaturated visual output.
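The chroma-only behaviour described above can be illustrated with a minimal sketch. It assumes the transfer matches the mean and standard deviation of the a and b channels and leaves L untouched; the function name `transfer_chroma` and the toy pixel values are hypothetical and are not taken from the report's pipeline.

```python
from statistics import mean, pstdev

def transfer_chroma(target_lab, source_lab):
    """Match a/b mean and spread of the target to the source; keep target L."""
    matched = []
    for ch in (1, 2):  # index 1 = a channel, index 2 = b channel
        t = [p[ch] for p in target_lab]
        s = [p[ch] for p in source_lab]
        t_mean, s_mean = mean(t), mean(s)
        t_std, s_std = pstdev(t), pstdev(s)
        scale = s_std / t_std if t_std > 0 else 1.0
        matched.append([(v - t_mean) * scale + s_mean for v in t])
    # L (index 0) is copied from the target unchanged
    return [(p[0], a, b) for p, a, b in zip(target_lab, matched[0], matched[1])]

# Hypothetical toy pixels as (L, a, b): a dark target and a bright, warm source
target = [(20.0, 2.0, -4.0), (25.0, 4.0, -2.0), (30.0, 6.0, 0.0)]
source = [(70.0, 30.0, 40.0), (80.0, 40.0, 50.0), (90.0, 50.0, 60.0)]
result = transfer_chroma(target, source)
```

In this toy case the a and b values move to the source statistics, but every L value stays between 20 and 30, so the pixels remain dark regardless of how bright the source is.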

# Display final enhanced result
result_img <- image_read(out_enhanced)
plot(result_img)
title("Pymonenko Painting with Girl's Color Palette")

In this stage of the analysis, the dominant color palette extracted from “Girl in a Red Hat” was applied to the target painting, Pymonenko’s “Vorozhinnia”, in order to evaluate whether a brighter chromatic source would yield a more luminous transfer outcome. The procedure followed the same computational pipeline as in previous experiments.

The resulting image demonstrates a modest increase in perceived brightness relative to earlier transfer attempts. Although the overall tonal composition of Vorozhinnia remains largely intact, subtle shifts toward lighter and warmer hues are observable, particularly in mid-tone regions and illuminated surfaces. This indicates that the brighter source palette exerted a measurable, albeit constrained, chromatic influence.

To further enhance visual intensity, an additional post-processing step was implemented in the HSV color space. Specifically, the saturation channel was amplified, increasing the vividness of transferred colors without directly altering structural luminance. This operation contributed to the slight brightening effect by making existing colors appear more pronounced and visually radiant, even though the underlying light distribution of the painting was preserved.
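As a rough illustration of why a saturation boost makes colors more vivid without changing the light distribution, the sketch below uses Python's standard `colorsys` module on a single pixel. The helper `boost_saturation` is hypothetical. Note that `colorsys` works on a 0 to 1 scale, whereas the OpenCV code above uses 0 to 255 for 8-bit saturation.

```python
import colorsys

def boost_saturation(rgb, factor=1.3):
    """Scale the HSV saturation of one (r, g, b) pixel in 0..255, clipping at 1.0."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    s = min(s * factor, 1.0)          # amplify saturation only
    r2, g2, b2 = colorsys.hsv_to_rgb(h, s, v)
    return tuple(round(c * 255) for c in (r2, g2, b2))

dark_pixel = (66, 48, 40)             # one of the palette entries printed above
boosted = boost_saturation(dark_pixel)
```

The HSV value of a pixel is its largest RGB channel, so the boosted pixel keeps the same maximum channel while the other channels move further away from it. The color looks more pronounced, but a dark pixel stays dark.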

13. Obtained Results

13.1 Results of PCA

Principal Component Analysis (PCA) was employed in this project as an unsupervised statistical learning technique to decompose and compress the highlight-rich color image of Vorozhinnia into its fundamental variance structures. By extracting the RGB channels and applying PCA independently to each spectral matrix, the analysis identified orthogonal principal components representing dominant spatial–chromatic patterns within the artwork. Statistically, six separate reconstructions were performed, retaining 45, 30, 20, 12, 6, and 3 principal components, respectively, from each RGB channel. As the number of retained components decreased, the dimensionality of the data representation was progressively reduced, meaning that a smaller proportion of the original variance — and thus visual information — was preserved.

From an applied perspective, this unsupervised framework achieved two complementary statistical outcomes. First, it produced low-rank approximations of the image matrices, enabling efficient storage, transmission, and denoising while preserving perceptually salient features — an outcome later quantified through file size comparisons and Mean Squared Error (MSE) metrics. Second, by integrating k-means clustering on the same pixel distributions, the study extended PCA’s variance modeling into categorical palette extraction, identifying dominant chromatic regimes without prior labeling. Together, these methods illustrate how unsupervised econometric logic — variance maximization, distributional clustering, and distance-based structure mapping — can be leveraged to model visual information as high-dimensional data.

13.2 Results of Unsupervised Palette Transfer

My analysis of computational color transfer was based on k-means palette extraction, which treats all pixels uniformly, without accounting for spatial context, semantic regions, or light distribution. As a result, dominant colors extracted from the source image do not necessarily correspond to perceptually influential regions when transferred to a structurally different painting. The subsequent saturation enhancement partially amplifies color intensity, yet it cannot compensate for the fundamental mismatch in luminance dynamics between the artworks.

The present findings confirm that palette extraction and transfer pipelines are statistically functional yet perceptually incomplete. Chromatic reassignment, when executed without luminance harmonization, remains subordinate to the tonal architecture of the target artwork. Consequently, future research must adopt integrative econometric frameworks that treat color, light, and spatial structure as interdependent variables rather than separable components.

14. Conclusions

14.1 Key Findings

  1. PCA achieves efficient compression: Using only 20-30 principal components (out of 800) per channel retains over 90% of the variance, and hence most of the visual information
  2. Diminishing returns: Beyond 30 components, improvements in image quality become imperceptible to the human eye
  3. Color clustering reveals artistic palette: K-means successfully identifies the dominant warm color scheme characteristic of Ukrainian folk art
  4. LAB color space enables effective transfer: Separating luminance from chrominance preserves image structure during color manipulation

14.2 Statistical Significance

  • The compression ratio at 30 components is approximately 26.7:1 (800 components reduced to 30 per channel)
  • Storage savings: Approximately 96.2% reduction in dimensionality (1 − 30/800)
  • Information retention: >90% of variance explained with 30 components
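The figures above follow from simple arithmetic on the component counts, which the sketch below reproduces. The third function is an added assumption, not a result from the report: it estimates how many numbers a rank-k representation would need if scores, loadings, and column means were stored for a 1150 × 800 channel. It ignores JPEG encoding, so it is not a file-size estimate.

```python
def component_ratio(total, kept):
    """Ratio of available to retained principal components."""
    return total / kept

def dimensionality_reduction(total, kept):
    """Fraction of components discarded."""
    return 1.0 - kept / total

def low_rank_storage_ratio(rows, cols, kept):
    """Assumed storage model: full matrix values divided by the values in
    scores (rows*k), loadings (cols*k) and column means (cols)."""
    return (rows * cols) / (kept * (rows + cols) + cols)

ROWS, COLS, K = 1150, 800, 30          # image height, width, retained components
ratio = component_ratio(COLS, K)       # about 26.7
reduction = dimensionality_reduction(COLS, K)   # 0.9625
storage = low_rank_storage_ratio(ROWS, COLS, K) # about 15.5 under the assumption
```

Under this storage assumption the saving in stored numbers is smaller than the 26.7:1 component ratio, because both the scores and the loadings must be kept to reconstruct the image.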

14.3 Applications

  • Image compression for efficient storage and transmission
  • Feature extraction for computer vision and pattern recognition
  • Color palette extraction for design and artistic analysis
  • Style transfer in digital art and photo editing

15. Recommendations for Future Research

1. Joint Modeling of Chrominance and Luminance

Future analyses should abandon the assumption that chromatic transfer can be meaningfully isolated from luminance structure. Instead, multivariate statistical frameworks should be employed to jointly model L, a, and b channel distributions. Techniques such as:

  • Multivariate Gaussian distribution matching

  • Copula-based dependency modeling

  • Canonical correlation analysis

could better preserve inter-channel relationships governing perceptual brightness and color intensity.

2. Spatially Weighted Clustering

Standard k-means clustering treats each pixel as an independent observation, ignoring spatial autocorrelation. However, paintings exhibit strong regional coherence driven by brushwork, illumination zones, and compositional hierarchy.

Improved approaches may include:

  • Spatial k-means (color distance + pixel location)

  • Gaussian Mixture Models with spatial priors

  • Markov Random Fields for region-aware palette extraction

Such models would ensure that dominant colors correspond not merely to frequency but to perceptual salience.
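One simple way to approach the first option is to append weighted pixel coordinates to each color vector before clustering. The sketch below is a minimal illustration under that assumption, using a plain Lloyd's-algorithm k-means with a deterministic initialization. The function names, the `weight` parameter, and the toy image are hypothetical.

```python
import math

def spatial_features(pixels, width, weight=0.5):
    """pixels: flat row-major list of (r, g, b); append weighted x and y."""
    feats = []
    for i, (r, g, b) in enumerate(pixels):
        y, x = divmod(i, width)
        feats.append((r, g, b, weight * x, weight * y))
    return feats

def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; initial centers are evenly spaced points."""
    last = len(points) - 1
    centers = [points[(i * last) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest center in the joint color + position space
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # update step: move each center to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return centers, labels

# Hypothetical 4x2 image: dark left half, bright right half
dark, bright = (20, 20, 20), (220, 220, 220)
image = [dark, dark, bright, bright] * 2
centers, labels = kmeans(spatial_features(image, width=4), k=2)
```

The `weight` controls the trade-off. At 0 this reduces to ordinary color k-means, while larger values favor spatially compact regions over pure color similarity.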

3. Econometric Treatment of Pixel Distribution

From an econometric perspective, pixel intensities can be conceptualized as high-dimensional stochastic processes. This opens the possibility of applying:

  • Panel data models (pixels across color channels)

  • Heteroskedasticity diagnostics in luminance variance

  • Quantile regression for extreme light regions

  • Distributional shift analysis (Wasserstein distance, KL divergence)

These tools would allow researchers to quantify not only mean color transfer but also variance, skewness, and tail behavior in tonal distributions.
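As an example of the distributional shift measures listed above, the 1-Wasserstein distance between two equal-size samples of a single channel reduces to the mean absolute difference of their sorted values. The sketch below assumes equal sample sizes and uses hypothetical luminance values.

```python
def wasserstein_1d(xs, ys):
    """W1 distance between two equal-size empirical samples:
    mean absolute difference of the sorted values."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

# Hypothetical luminance samples from a dark target and a bright source
target_l = [20.0, 25.0, 30.0]
source_l = [90.0, 70.0, 80.0]
shift = wasserstein_1d(target_l, source_l)
```

A large value on the L channel would quantify the luminance mismatch that the chroma-only transfer leaves uncorrected.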

4. Illumination Normalization and Light Transport Modeling

Given that luminance mismatch was the dominant failure mechanism, future work should incorporate explicit light correction prior to palette transfer. Potential solutions include:

  • Histogram matching in the luminance channel

  • Retinex-based illumination normalization

  • Intrinsic image decomposition (reflectance vs shading)

  • Physically based light transport estimation

Such preprocessing would decouple pigment color from illumination, enabling more faithful palette mapping.
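The first option, histogram matching in the luminance channel, can be sketched as a rank-based quantile mapping: each target pixel receives the source luminance found at the same relative rank. This is a simplified illustration with hypothetical names and values, not a full implementation with interpolation or tie handling.

```python
def match_luminance(target_l, source_l):
    """Map each target L value to the source value at the same rank quantile."""
    src = sorted(source_l)
    # indices of target pixels ordered from darkest to brightest
    order = sorted(range(len(target_l)), key=lambda i: target_l[i])
    n, m = len(target_l), len(src)
    matched = [0.0] * n
    for rank, i in enumerate(order):
        q = rank / (n - 1) if n > 1 else 0.0   # relative rank in 0..1
        matched[i] = src[round(q * (m - 1))]
    return matched

target_l = [30.0, 20.0, 25.0]          # hypothetical dark target
source_l = [70.0, 90.0, 80.0]          # hypothetical bright source
matched_l = match_luminance(target_l, source_l)
```

The brightness ordering of the target pixels is preserved, so the composition stays recognizable while the overall luminance range follows the source.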

5. Semantic and Object-Level Segmentation

Uniform pixel treatment ignores semantic structure — faces, garments, skies, and interiors carry different perceptual weights. Integrating computer vision segmentation could enable:

  • Region-specific palette transfer

  • Skin-tone preservation constraints

  • Background vs foreground color econometrics

This would prevent perceptually implausible recolorization.

6. Perceptual Evaluation Metrics

Future statistical validation should move beyond visual inspection and incorporate perceptual metrics such as:

  • SSIM (Structural Similarity Index)

  • LPIPS (Learned Perceptual Image Patch Similarity)

  • ΔE color difference in CIEDE2000

These measures would quantify whether transferred palettes achieve perceptual — not merely numerical — fidelity.
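CIEDE2000 involves several correction terms, so the sketch below uses the older and much simpler CIE76 formula as a stand-in: the Euclidean distance between two LAB colors. It is meant only to show how a per-pixel color difference could be averaged over an image. The function names are hypothetical.

```python
import math

def delta_e_cie76(lab1, lab2):
    """CIE76 color difference: Euclidean distance in LAB (not CIEDE2000)."""
    return math.dist(lab1, lab2)

def mean_delta_e(image_a, image_b):
    """Average CIE76 difference over two equally sized lists of LAB pixels."""
    return sum(delta_e_cie76(p, q) for p, q in zip(image_a, image_b)) / len(image_a)

single = delta_e_cie76((50.0, 0.0, 0.0), (50.0, 3.0, 4.0))
average = mean_delta_e([(50.0, 0.0, 0.0), (20.0, 1.0, 1.0)],
                       [(50.0, 3.0, 4.0), (20.0, 1.0, 1.0)])
```

For a CIEDE2000 figure, a tested library implementation would be preferable to a hand-written one.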

References

  • Jolliffe, I. T. (2002). Principal Component Analysis. Springer Series in Statistics.
  • Reinhard, E., et al. (2001). “Color Transfer between Images.” IEEE Computer Graphics and Applications.
  • MacQueen, J. (1967). “Some methods for classification and analysis of multivariate observations.” Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability.