Abstract

This paper demonstrates a practical application of K-Means clustering to image inpainting—the task of repairing corrupted or damaged images. We implement a patch-based repair pipeline that learns a texture dictionary from intact image regions using K-Means clustering, then uses this dictionary to predict missing pixel values. For simplicity and computational efficiency, we apply the method to grayscale images; the extension to color (RGB) images is conceptually straightforward but requires proportionally more memory and computation. The study validates the method’s effectiveness on sparse, random damage and quantifies performance degradation as corruption increases.


1. Introduction

1.1 What is Image Inpainting?

Image inpainting is the process of reconstructing missing or damaged parts of an image. In real-world applications, images are often corrupted by:

  • Random pixel loss (sensor noise, data transmission errors)
  • Scratches or physical damage
  • Unwanted object removal
  • Missing data regions

The goal is to predict plausible pixel values that blend seamlessly with the surrounding intact areas.

1.2 The K-Means Clustering Approach: An Intuitive Explanation

What is Clustering?

Imagine you have a pile of photographs of trees and flowers mixed together, but you cannot see the labels. A simple approach to organize them is to group similar-looking photos together:

  • Photos of similar trees cluster together
  • Photos of similar flowers cluster together

This process—grouping similar items without labels—is called clustering. It’s an unsupervised learning technique: no one tells you the correct groups beforehand; the algorithm discovers groups by itself based on similarity.

What is K-Means?

K-Means is a popular clustering algorithm that works as follows:

  1. Choose K (the number of groups). For example, if you want 3 groups, K=3.
  2. Randomly place K “centers” (representative items) in the data space.
  3. Assign each item to the nearest center. Every photo goes to the cluster whose center it most closely resembles.
  4. Update the centers by computing the average (mean) of all items in each cluster.
  5. Repeat steps 3–4 until the centers stop moving significantly.

The result is K representative items (cluster centers) that capture the essential patterns or modes in your data.
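
For readers who want to see this concretely, base R’s kmeans() implements exactly this loop. Here is a minimal sketch on toy two-dimensional data (the coordinates below are illustrative, not taken from our image):

# Minimal sketch: K-Means on toy 2-D data (illustrative values only)
set.seed(1)
pts <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),  # one cloud near (0, 0)
             matrix(rnorm(40, mean = 3), ncol = 2))  # another near (3, 3)

toy_model <- kmeans(pts, centers = 2)
toy_model$centers  # the two representative points (cluster means)
toy_model$cluster  # the cluster each point was assigned to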

Why is K-Means Useful for Image Repair?

In image inpainting, we use K-Means to discover the fundamental texture patterns that appear in the intact parts of the image:

  • Smooth regions (low-texture areas)
  • Edges (boundaries between objects)
  • Detailed textures (fine patterns)

By clustering small patches (e.g., 5×5 pixel neighborhoods) extracted from intact regions, K-Means creates a texture dictionary—a compact set of representative patches. When we encounter a missing pixel, we:

  1. Look at the pixel’s surrounding intact neighbors (its local patch).
  2. Find the dictionary entry (cluster center) that best matches this local pattern.
  3. Use that dictionary entry to predict what the missing pixel should be.
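
Step 2 is the only subtle part: the local patch contains missing (NA) entries, so the comparison must use the observed pixels only. A minimal sketch of that masked distance (Section 2.3 performs the same computation inline in the full repair loop):

# Sketch: squared Euclidean distance between an incomplete patch and a
# dictionary entry, computed over the observed (non-NA) pixels only
masked_dist_sq <- function(patch_vec, dict_entry) {
  obs <- !is.na(patch_vec)
  if (!any(obs)) return(Inf)  # no usable neighbors in this patch
  sum((patch_vec[obs] - dict_entry[obs])^2)
}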

This approach is powerful because:

  • It learns from the image itself: No manual rules or training data needed.
  • It captures local context: Patches preserve spatial relationships and texture patterns.
  • It’s computationally feasible: K-Means is fast and scalable.
  • It’s interpretable: The texture dictionary visually shows what patterns the algorithm learned.

1.3 Grayscale Simplification

For this study, we convert the input image to grayscale (single intensity channel) before processing. This simplification:

  • Reduces computational cost: From 3 channels (RGB) to 1 channel.
  • Maintains texture information: Grayscale preserves all luminosity patterns needed for repair.
  • Allows focus on methodology: Validates the K-Means approach without color complexity.

In principle, color (RGB) image processing is entirely feasible: one would extract and cluster patches with 3×P×P dimensions instead of P×P. However, this increases memory usage and computation time proportionally.


2. Methodology

2.1 Data Preparation

2.1.1 Image Loading and Grayscale Conversion

We start with a lossless image format (PNG or BMP) to preserve exact pixel values. Lossy formats (JPEG) introduce compression artifacts that corrupt the texture patterns needed for clustering.

library(magick)

# https://drive.google.com/file/d/1EZCwt6O_KdksHAR8wQWlIJSNct-f_Fza/view?usp=sharing
img_path <- "~/UL/Wisła w Warszawa.png"
original_img_magick <- image_read(img_path)

# Convert to grayscale
grayscale_img_magick <- original_img_magick %>%
  image_quantize(colorspace = "gray")

# Extract pixel data (raw bytes in 0-255) and rescale to [0,1]
img_array_backup <- grayscale_img_magick %>%
  image_data(channels = "gray") %>%
  as.numeric()
img_array_backup <- img_array_backup / 255

# image_data() stores pixels as channel x width x height, so fill the
# height x width matrix row-wise to preserve the image orientation
img_matrix_backup <- matrix(img_array_backup,
                            nrow = image_info(grayscale_img_magick)$height,
                            ncol = image_info(grayscale_img_magick)$width,
                            byrow = TRUE)

cat(sprintf("Image loaded: %d × %d pixels\n",
            nrow(img_matrix_backup), ncol(img_matrix_backup)))

2.1.2 Corruption Simulation

To evaluate repair quality, we simulate random pixel damage by setting a percentage of pixels to NA (missing):

MISSING_PERCENTAGE <- 0.20  # 20% damage

corrupted_img_matrix <- img_matrix_backup
total_pixels <- prod(dim(corrupted_img_matrix))
num_missing <- round(total_pixels * MISSING_PERCENTAGE)

set.seed(42)
missing_indices <- sample(total_pixels, num_missing)
corrupted_img_matrix[missing_indices] <- NA

cat(sprintf("Damage applied: %.0f%% pixels set to NA\n",
            MISSING_PERCENTAGE * 100))

2.1.3 Patch Extraction for Training

A patch is a small square neighborhood (e.g., 5×5 pixels) centered at each pixel. We extract all patches from intact regions only (those containing no NA values):

P_SIZE <- 5  # Patch size

extract_patches <- function(mat, P = P_SIZE) {
  rows <- nrow(mat)
  cols <- ncol(mat)
  half_p <- floor(P / 2)
  patches_list <- list()

  for (i in (half_p + 1):(rows - half_p)) {
    for (j in (half_p + 1):(cols - half_p)) {
      patch <- mat[(i - half_p):(i + half_p), (j - half_p):(j + half_p)]

      # Only include complete patches (no NA values)
      if (!any(is.na(patch))) {
        patches_list[[length(patches_list) + 1]] <- as.vector(patch)
      }
    }
  }

  patches_df <- do.call(rbind, patches_list)
  return(patches_df)
}

train_patches <- extract_patches(corrupted_img_matrix, P = P_SIZE)
cat(sprintf("Training patches extracted: %d\n", nrow(train_patches)))

Why only intact patches? If we included patches with missing values, the cluster centers would be distorted averages. By training on pure, complete patches, we learn the true underlying textures of the image.


2.2 K-Means Texture Dictionary Construction

We apply K-Means clustering to the extracted patches to discover K representative texture patterns:

K_CLUSTERS <- 50  # Number of cluster centers (texture patterns)

# Run K-Means
kmeans_model <- kmeans(train_patches,
                       centers = K_CLUSTERS,
                       nstart = 50,
                       iter.max = 300)

# The cluster centers form our texture dictionary
texture_dictionary <- kmeans_model$centers

cat(sprintf("K-Means converged in %d iterations\n", kmeans_model$iter))
cat(sprintf("Dictionary created: %d texture patterns\n", nrow(texture_dictionary)))

Each row of texture_dictionary is a flattened 5×5 patch (a 25-dimensional vector) that represents a typical texture found in the image (e.g., smooth area, edge, fine detail). These are the templates we will use to repair missing pixels.


2.3 Image Repair via Nearest-Neighbor Matching

For each missing pixel, we:

  1. Extract the local patch around it (including the intact neighboring pixels).
  2. Compare this incomplete patch to all K dictionary entries using Euclidean distance.
  3. Select the best-matching dictionary entry.
  4. Use the center pixel of that entry to predict the missing value.

repair_image_kmeans <- function(corrupted_mat, dictionary, P = P_SIZE) {
  repaired_mat <- corrupted_mat
  rows <- nrow(corrupted_mat)
  cols <- ncol(corrupted_mat)
  half_p <- floor(P / 2)

  # Find all missing pixels in the interior
  row_idx <- (half_p + 1):(rows - half_p)
  col_idx <- (half_p + 1):(cols - half_p)

  # Coordinates (in the full matrix) of the missing interior pixels
  na_rel <- which(is.na(corrupted_mat[row_idx, col_idx]), arr.ind = TRUE)
  coords <- cbind(na_rel[, 1] + half_p, na_rel[, 2] + half_p)

  # For each missing pixel, find the best-matching dictionary entry
  # and use its center value as the prediction
  for (k in seq_len(nrow(coords))) {
    i <- coords[k, 1]
    j <- coords[k, 2]

    local_patch_vec <- as.vector(
      corrupted_mat[(i - half_p):(i + half_p), (j - half_p):(j + half_p)]
    )

    # Find the nearest dictionary entry
    min_dist <- Inf
    best_idx <- 1
    valid_mask <- !is.na(local_patch_vec)

    for (m in 1:nrow(dictionary)) {
      if (sum(valid_mask) > 0) {
        dist_sq <- sum((local_patch_vec[valid_mask] -
                       dictionary[m, valid_mask])^2)
      } else {
        dist_sq <- Inf
      }
      if (dist_sq < min_dist) {
        min_dist <- dist_sq
        best_idx <- m
      }
    }

    # Predict the missing pixel using the center of the best match
    center_patch <- matrix(dictionary[best_idx, ], nrow = P, ncol = P)
    repaired_mat[i, j] <- center_patch[half_p + 1, half_p + 1]
  }

  return(repaired_mat)
}

repaired_img_matrix <- repair_image_kmeans(corrupted_img_matrix, texture_dictionary)

2.4 Evaluation Metrics

We compare the repaired image to the original using two standard image quality metrics:

Mean Squared Error (MSE)

\[\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2\] where \(y_i\) is the original pixel and \(\hat{y}_i\) is the repaired pixel. Lower MSE is better.

Peak Signal-to-Noise Ratio (PSNR)

\[\text{PSNR} = 10 \log_{10} \left( \frac{\text{MAX}^2}{\text{MSE}} \right) \text{ dB}\] where MAX = 1 (since pixel values are in [0,1]). Higher PSNR is better.

library(Metrics)

# Crop to the interior region (where repair was attempted)
half_p <- floor(P_SIZE / 2)
row_indices <- (half_p + 1):(nrow(img_matrix_backup) - half_p)
col_indices <- (half_p + 1):(ncol(img_matrix_backup) - half_p)

original_clipped <- img_matrix_backup[row_indices, col_indices]
repaired_clipped <- repaired_img_matrix[row_indices, col_indices]

mse_value <- mse(original_clipped, repaired_clipped)
psnr_value <- 10 * log10(1^2 / mse_value)

cat(sprintf("MSE: %.6f\n", mse_value))
cat(sprintf("PSNR: %.2f dB\n", psnr_value))

3. Results

3.1 Dictionary Visualization

The K-Means algorithm discovered K = 50 texture patterns. These cluster centers represent:

  • Smooth, low-texture areas (uniform patches)
  • Edges (transitions between light and dark)
  • Fine details and textures

Visualizing a few centers reveals that the algorithm automatically learned meaningful image structures without any manual guidance.
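
A minimal sketch of how one might inspect the dictionary (assuming texture_dictionary and P_SIZE from Section 2 are in scope):

# Sketch: display the first 9 dictionary entries as 5x5 grayscale patches
op <- par(mfrow = c(3, 3), mar = c(1, 1, 2, 1))
for (m in 1:9) {
  patch <- matrix(texture_dictionary[m, ], nrow = P_SIZE, ncol = P_SIZE)
  # transpose and flip so the patch appears in natural reading orientation
  image(t(patch)[, P_SIZE:1], col = gray.colors(256), zlim = c(0, 1),
        axes = FALSE, main = sprintf("Center %d", m))
}
par(op)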


Original Grayscale Image

Corrupted Image (20% Damage)

Repaired Image using K-Means Inpainting

3.2 Repair Performance

For a given corruption level (e.g., 20% damage):

  • Before repair (corrupted): 20% of pixels are missing (shown as black in the output).
  • After repair (repaired): All missing pixels are predicted using the texture dictionary.

The repaired image quality depends on:

  • Damage level: Higher corruption → more difficult repair.
  • Dictionary quality: More training patches → better dictionary.
  • Patch size: Larger patches capture more context but require more training data.

3.3 Performance Degradation Curve

As the corruption ratio increases from 10% to 50%, the repair quality (measured by PSNR) typically:

  • Remains stable at low damage levels (10–30%).
  • Degrades sharply at high damage levels (40–50%).

This occurs because:

  • At low damage, most patches remain intact for training.
  • At high damage, fewer complete patches survive for dictionary construction.
  • When damage is extreme, the dictionary becomes unreliable or nearly empty.
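
A minimal sketch of how such a curve can be produced, reusing the functions and index vectors from Section 2 (nstart is reduced here purely as an assumption to keep the sweep's runtime manageable):

# Sketch: PSNR as a function of corruption level, reusing Section 2 code
damage_levels <- c(0.10, 0.20, 0.30, 0.40, 0.50)

psnr_by_level <- sapply(damage_levels, function(p) {
  corrupted <- img_matrix_backup
  set.seed(42)
  idx <- sample(length(corrupted), round(length(corrupted) * p))
  corrupted[idx] <- NA

  patches <- extract_patches(corrupted, P = P_SIZE)
  dict <- kmeans(patches, centers = K_CLUSTERS,
                 nstart = 10, iter.max = 300)$centers
  repaired <- repair_image_kmeans(corrupted, dict, P = P_SIZE)

  # PSNR over the interior region, as in Section 2.4
  err <- mean((img_matrix_backup[row_indices, col_indices] -
               repaired[row_indices, col_indices])^2)
  10 * log10(1 / err)
})

plot(damage_levels * 100, psnr_by_level, type = "b",
     xlab = "Corruption level (%)", ylab = "PSNR (dB)")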

4. Discussion

4.1 Advantages of K-Means Clustering for Inpainting

  1. Unsupervised: Requires no manual labels or training data beyond the image itself.
  2. Computationally efficient: Fast clustering and nearest-neighbor matching.
  3. Interpretable: The texture dictionary is visually meaningful.
  4. Patch-based: Preserves local spatial structure and coherence.

4.2 Limitations

  1. Sparse damage only: Performs well when missing areas are small and scattered.
  2. No semantic understanding: Cannot infer complex objects or structures (e.g., cannot complete a face that is 50% missing).
  3. Patch size trade-off: Small patches → less context; large patches → need more training data.
  4. Block damage: Contiguous missing regions are harder to repair than scattered pixels.

4.3 Extension to Color (RGB) Images

Theoretically, extending this method to color images is straightforward:

  • Extract 3×P×P patches (P×P per channel) instead of P×P.
  • Cluster these 3×P²-dimensional vectors.
  • Predict all three color channels simultaneously.

In practice, this requires:

  • 3× more memory for patch storage.
  • 3× more computation for distance calculations.
  • 3× more training data to learn a reliable color texture dictionary.
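
A minimal sketch of the required change, assuming a hypothetical height × width × 3 array img_rgb in [0,1] (not produced by the grayscale pipeline above):

# Sketch: extracting a color patch as one 3*P*P vector
# (img_rgb is a hypothetical height x width x 3 array in [0,1])
extract_rgb_patch <- function(img_rgb, i, j, P = P_SIZE) {
  half_p <- floor(P / 2)
  rows <- (i - half_p):(i + half_p)
  cols <- (j - half_p):(j + half_p)
  # Concatenate the R, G, and B patches into a single 3*P^2 vector
  as.vector(img_rgb[rows, cols, 1:3])
}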

For the current study, we focused on grayscale to demonstrate the core methodology clearly and run experiments efficiently.


5. Conclusion

This paper presents a practical, interpretable approach to image inpainting using K-Means clustering. The method discovers a compact texture dictionary from the image itself and uses it to predict missing pixels. The results demonstrate:

  1. K-Means is effective for discovering representative texture patterns in an unsupervised manner.
  2. Patch-based nearest-neighbor matching successfully repairs scattered pixel damage at reasonable corruption levels.
  3. Performance degrades gracefully as damage increases, with a clear threshold beyond which repair becomes unreliable.

The work validates that simple, classical machine learning techniques remain powerful for low-level image processing tasks, offering interpretability and efficiency that deep learning approaches often lack. Future extensions could include color image processing, adaptive patch sizes, or probabilistic matching (Fuzzy C-Means) for smoother predictions.

