This paper demonstrates a practical application of K-Means clustering to image inpainting—the task of repairing corrupted or damaged images. We implement a patch-based repair pipeline that learns a texture dictionary from intact image regions using K-Means clustering, then uses this dictionary to predict missing pixel values. For simplicity and computational efficiency, we apply the method to grayscale images; extension to color (RGB) images is theoretically straightforward but requires more complex data structures and computational resources. The study validates the method’s effectiveness on sparse, random damage and quantifies performance degradation as corruption increases.
Image inpainting is the process of reconstructing missing or damaged parts of an image. In real-world applications, images are often corrupted by:

- Physical damage in scanned photographs (scratches, stains, tears)
- Sensor noise or dead pixels
- Transmission or storage errors that drop pixel values
- Occlusions such as text overlays or watermarks
The goal is to predict plausible pixel values that blend seamlessly with the surrounding intact areas.
Imagine you have a pile of photographs of trees and flowers mixed together, but you cannot see the labels. A simple approach to organize them is to group similar-looking photos together:

- Photos of similar trees cluster together
- Photos of similar flowers cluster together
This process—grouping similar items without labels—is called clustering. It’s an unsupervised learning technique: no one tells you the correct groups beforehand; the algorithm discovers groups by itself based on similarity.
K-Means is a popular clustering algorithm that works as follows:

1. Choose K initial cluster centers (e.g., K randomly selected items).
2. Assign every item to its nearest center.
3. Recompute each center as the mean of the items assigned to it.
4. Repeat steps 2-3 until the assignments no longer change.
The result is K representative items (cluster centers) that capture the essential patterns or modes in your data.
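As a minimal, self-contained illustration (toy 2D data, separate from the image pipeline), base R's kmeans recovers two obvious groups and returns their centers:

set.seed(1)
# Two well-separated point clouds in the plane
pts <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
             matrix(rnorm(100, mean = 5), ncol = 2))
toy_fit <- kmeans(pts, centers = 2, nstart = 10)
toy_fit$centers  # one row near (0, 0), the other near (5, 5)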
In image inpainting, we use K-Means to discover the fundamental texture patterns that appear in the intact parts of the image:

- Smooth regions (low-texture areas)
- Edges (boundaries between objects)
- Detailed textures (fine patterns)
By clustering small patches (e.g., 5×5 pixel neighborhoods) extracted from intact regions, K-Means creates a texture dictionary—a compact set of representative patches. When we encounter a missing pixel, we:

1. Extract the patch centered on the missing pixel.
2. Find the dictionary entry closest to it, measuring distance only over the pixels that are still observed.
3. Fill the missing pixel with the center value of that best-matching entry.
This approach is powerful because:

- It learns from the image itself: no manual rules or external training data are needed.
- It captures local context: patches preserve spatial relationships and texture patterns.
- It is computationally feasible: K-Means is fast and scalable.
- It is interpretable: the texture dictionary visually shows what patterns the algorithm learned.
For this study, we convert the input image to grayscale (single intensity channel) before processing. This simplification:

- Reduces each patch to P×P values rather than 3×P×P, shrinking the clustering problem.
- Lowers memory usage and computation time.
- Keeps the learned texture dictionary easy to visualize and interpret.
In principle, color (RGB) image processing is entirely feasible: one would extract and cluster patches with 3×P×P dimensions instead of P×P. However, this increases memory usage and computation time proportionally.
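A minimal sketch of that extension (extract_rgb_patch is a hypothetical helper, not part of our pipeline), assuming the image is held as a height × width × 3 array:

# Hypothetical sketch: flatten one RGB patch into a 3*P*P vector,
# the color analogue of as.vector(patch) used in the grayscale pipeline
extract_rgb_patch <- function(arr, i, j, P = 5) {
  half_p <- floor(P / 2)
  as.vector(arr[(i - half_p):(i + half_p), (j - half_p):(j + half_p), ])
}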
We start with a lossless image format (PNG or BMP) to preserve exact pixel values. Lossy formats (JPEG) introduce compression artifacts that corrupt the texture patterns needed for clustering.
library(magick)
# https://drive.google.com/file/d/1EZCwt6O_KdksHAR8wQWlIJSNct-f_Fza/view?usp=sharing
img_path <- "~/UL/Wisła w Warszawa.png"
original_img_magick <- image_read(img_path)
# Convert to grayscale
grayscale_img_magick <- original_img_magick %>%
image_quantize(colorspace = "gray")
# Extract pixel data as a numeric matrix in [0,1].
# image_data() returns raw bytes with dim c(channels, width, height),
# so we rescale from 0-255 and transpose to height x width
bitmap <- image_data(grayscale_img_magick, channels = "gray")
img_info <- image_info(grayscale_img_magick)
img_array_backup <- as.numeric(bitmap[1, , ]) / 255
img_matrix_backup <- t(matrix(img_array_backup,
nrow = img_info$width,
ncol = img_info$height))
cat(sprintf("Image loaded: %d × %d pixels\n",
nrow(img_matrix_backup), ncol(img_matrix_backup)))
To evaluate repair quality, we simulate random pixel damage by setting a percentage of pixels to NA (missing):
MISSING_PERCENTAGE <- 0.20 # 20% damage
corrupted_img_matrix <- img_matrix_backup
total_pixels <- prod(dim(corrupted_img_matrix))
num_missing <- round(total_pixels * MISSING_PERCENTAGE)
set.seed(42)
missing_indices <- sample(total_pixels, num_missing)
corrupted_img_matrix[missing_indices] <- NA
cat(sprintf("Damage applied: %.0f%% pixels set to NA\n",
MISSING_PERCENTAGE * 100))
A patch is a small square neighborhood (e.g., 5×5 pixels) centered at each pixel. We extract all patches from intact regions only (those containing no NA values):
P_SIZE <- 5 # Patch size
extract_patches <- function(mat, P = P_SIZE) {
rows <- nrow(mat)
cols <- ncol(mat)
half_p <- floor(P / 2)
patches_list <- list()
for (i in (half_p + 1):(rows - half_p)) {
for (j in (half_p + 1):(cols - half_p)) {
patch <- mat[(i - half_p):(i + half_p), (j - half_p):(j + half_p)]
# Only include complete patches (no NA values)
if (!any(is.na(patch))) {
patches_list[[length(patches_list) + 1]] <- as.vector(patch)
}
}
}
patches_df <- do.call(rbind, patches_list)
return(patches_df)
}
train_patches <- extract_patches(corrupted_img_matrix, P = P_SIZE)
cat(sprintf("Training patches extracted: %d\n", nrow(train_patches)))
Why only intact patches? If we included patches with missing values, the cluster centers would be distorted averages. By training on pure, complete patches, we learn the true underlying textures of the image.
We apply K-Means clustering to the extracted patches to discover K representative texture patterns:
K_CLUSTERS <- 50 # Number of cluster centers (texture patterns)
# Run K-Means
kmeans_model <- kmeans(train_patches,
centers = K_CLUSTERS,
nstart = 50,
iter.max = 300)
# The cluster centers form our texture dictionary
texture_dictionary <- kmeans_model$centers
cat(sprintf("K-Means converged in %d iterations\n", kmeans_model$iter))
cat(sprintf("Dictionary created: %d texture patterns\n", nrow(texture_dictionary)))
Each row of texture_dictionary is a flattened 5×5 patch (a 25-value vector) that represents a typical texture found in the image (e.g., smooth area, edge, fine detail). These are the templates we will use to repair missing pixels.
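The dictionary size K = 50 is a tuning choice rather than something the data fixes. A quick elbow check (a sketch reusing the train_patches matrix from above) shows how the total within-cluster sum of squares flattens as K grows:

# Optional diagnostic: total within-cluster sum of squares for candidate K.
# A bend ("elbow") in the curve suggests a reasonable dictionary size.
k_candidates <- c(10, 25, 50, 100)
wss <- sapply(k_candidates, function(k) {
  kmeans(train_patches, centers = k, nstart = 5, iter.max = 100)$tot.withinss
})
plot(k_candidates, wss, type = "b",
     xlab = "K (dictionary size)", ylab = "Total within-cluster SS")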
For each missing pixel, we:

1. Extract the P×P patch centered on it.
2. Find the nearest dictionary entry, computing the distance only over observed (non-NA) pixels.
3. Replace the missing pixel with the center value of the best-matching entry.
repair_image_kmeans <- function(corrupted_mat, dictionary, P = P_SIZE) {
repaired_mat <- corrupted_mat
rows <- nrow(corrupted_mat)
cols <- ncol(corrupted_mat)
half_p <- floor(P / 2)
# Find all missing pixels in the interior
row_idx <- (half_p + 1):(rows - half_p)
col_idx <- (half_p + 1):(cols - half_p)
na_rel <- which(is.na(corrupted_mat[row_idx, col_idx]), arr.ind = TRUE)
# Convert interior-relative indices back to full-matrix coordinates
coords <- cbind(na_rel[, 1] + half_p, na_rel[, 2] + half_p)
# For each missing pixel, find the best-matching dictionary entry
# and use its center value as the prediction
for (k in seq_len(nrow(coords))) {
i <- coords[k, 1]
j <- coords[k, 2]
local_patch_vec <- as.vector(
corrupted_mat[(i - half_p):(i + half_p), (j - half_p):(j + half_p)]
)
# Find the nearest dictionary entry
min_dist <- Inf
best_idx <- 1
valid_mask <- !is.na(local_patch_vec)
for (m in 1:nrow(dictionary)) {
if (sum(valid_mask) > 0) {
dist_sq <- sum((local_patch_vec[valid_mask] -
dictionary[m, valid_mask])^2)
} else {
dist_sq <- Inf
}
if (dist_sq < min_dist) {
min_dist <- dist_sq
best_idx <- m
}
}
# Predict the missing pixel using the center of the best match
center_patch <- matrix(dictionary[best_idx, ], nrow = P, ncol = P)
repaired_mat[i, j] <- center_patch[half_p + 1, half_p + 1]
}
return(repaired_mat)
}
repaired_img_matrix <- repair_image_kmeans(corrupted_img_matrix, texture_dictionary)
We compare the repaired image to the original using two standard image quality metrics:
\[\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2\] where \(y_i\) is the original pixel and \(\hat{y}_i\) is the repaired pixel. Lower MSE is better.
\[\text{PSNR} = 10 \log_{10} \left( \frac{\text{MAX}^2}{\text{MSE}} \right) \text{ dB}\] where MAX = 1 (since pixel values are in [0,1]). Higher PSNR is better.
library(Metrics)
# Clip to the interior region (where repair was attempted)
half_p <- floor(P_SIZE / 2)
row_indices <- (half_p + 1):(nrow(img_matrix_backup) - half_p)
col_indices <- (half_p + 1):(ncol(img_matrix_backup) - half_p)
original_clipped <- img_matrix_backup[row_indices, col_indices]
repaired_clipped <- repaired_img_matrix[row_indices, col_indices]
mse_value <- mse(original_clipped, repaired_clipped)
psnr_value <- 10 * log10(1^2 / mse_value)
cat(sprintf("MSE: %.6f\n", mse_value))
cat(sprintf("PSNR: %.2f dB\n", psnr_value))
The K-Means algorithm discovered K = 50 texture patterns. These cluster centers represent:

- Smooth, low-texture areas (uniform patches)
- Edges (transitions between light and dark)
- Fine details and textures
Visualizing a few centers reveals that the algorithm automatically learned meaningful image structures without any manual guidance.
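A sketch of such a visualization with base graphics, reshaping rows of texture_dictionary back into 5×5 tiles exactly as the patches were flattened:

# Display the first 10 dictionary entries as 5x5 grayscale tiles
op <- par(mfrow = c(2, 5), mar = c(0.5, 0.5, 0.5, 0.5))
for (m in 1:10) {
  patch <- matrix(texture_dictionary[m, ], nrow = P_SIZE, ncol = P_SIZE)
  image(t(patch[P_SIZE:1, ]), col = gray.colors(256), axes = FALSE)
}
par(op)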
Figure: Original Grayscale Image (left); Corrupted Image with 20% Damage (right).