1. Introduction

League of Legends is one of the most played competitive games in the world, featuring over 160 unique champions each with distinct combat roles, resource mechanics, and stat profiles. Balancing such a large and diverse roster is a significant design challenge and one that data-driven analysis can help illuminate. This project applies unsupervised learning to a dataset of League of Legends champion base statistics sourced from Kaggle (Cute Dango, League of Legends Champions dataset, available at kaggle.com/datasets/cutedango/league-of-legends-champions) to discover whether champions naturally cluster into distinct statistical archetypes, and which features drive those groupings.

The workflow combines three complementary approaches:

Hard clustering (K-Means, Hierarchical) to identify stable, discrete champion archetypes.

Soft clustering (Fuzzy C-Means) to quantify champion hybridity, that is, how strongly each champion belongs to one archetype versus another.

Dimensionality reduction (PCA, MDS, UMAP, t-SNE, SOM) to visualize the structure of the feature space and cross-check the clustering results with several independent methods.

Research questions

How many distinct champion archetypes emerge when clustering by numerical base statistics?

Which attributes best discriminate between champion types, and what do they reveal about Riot’s design philosophy?

Can fuzzy membership scores meaningfully quantify champion hybridity and identify statistically unusual designs?

Why this matters: understanding the statistical structure of champion design has practical implications for game balance, champion categorization, and the study of how design constraints shape a competitive roster. It is also a case study in applying unsupervised learning to a real-world dataset where ground-truth labels exist (champion roles) but are deliberately withheld, to test whether the data alone recover meaningful structure.


2. Load Data and Libraries

library(tidyverse)
library(cluster)
library(factoextra)
library(NbClust)
library(e1071)
library(gridExtra)
library(ggrepel)
library(DT)
library(mclust)
library(ggplot2)
library(corrplot)

lol <- read.csv("LoL_champions.csv")

cat("Dataset loaded successfully!\n")
## Dataset loaded successfully!
cat("Total champions:", nrow(lol), "\n")
## Total champions: 167
cat("Total variables:", ncol(lol), "\n")
## Total variables: 24

3. Data Preprocessing

To ensure the clustering model discovers patterns based purely on numerical combat statistics, rather than pre-existing classifications, several columns were removed or transformed before analysis.

Removed columns: Tags (champion type, e.g. Mage, Assassin), Role (lane assignment, e.g. top, jungle, mid, bot), and Name were dropped. Including Tags and Role would risk grouping champions by their assigned identity rather than by their underlying stat profiles, and Name is only an identifier.

Resource type encoding: Resourse.type (the column name as spelled in the dataset) was converted into a binary variable, resource_bin, where 1 = Mana and 0 = any other resource. Mana is by far the most common resource in the dataset. The remaining champions use either a champion-specific resource or none at all, so a binary encoding captures the most meaningful distinction without introducing sparse extra categories.

Range deduplication: Range.type (a categorical melee/ranged label) was removed in favour of the numeric Attack.range variable, which carries the same information in a more precise, model-friendly form. Keeping both would roughly double the weight of range in the distance calculations. After these transformations the dataset contains no missing values. It was then standardised to mean = 0, sd = 1 before clustering so that every feature contributes on a comparable scale regardless of its original units.

The final processed dataset contains 167 champions and 20 numerical features, which served as the input for all subsequent clustering and dimensionality reduction methods.
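Why standardisation matters can be checked on a small sample. The sketch below takes the first ten values of Base.HP, Attack.range, and Attack.speed from the str() output in this section. This illustrative three-feature subset is not the full 20-feature input. For the first two champions, the sketch computes each feature's share of the squared Euclidean distance, once on the raw values and once after scale(). The helper name distance_share() is hypothetical.

```r
# Sketch: each feature's share of the squared Euclidean distance between two
# champions, before and after z-score standardisation with scale().
# Values are the first ten entries of three columns in the str() output;
# this three-feature subset is for illustration only.
toy <- data.frame(
  Base.HP      = c(650, 590, 600, 630, 685, 685, 550, 560, 580, 640),
  Attack.range = c(175, 550, 125, 500, 125, 125, 600, 625, 550, 600),
  Attack.speed = c(0.651, 0.668, 0.625, 0.638, 0.625, 0.736, 0.658, 0.61, 0.64, 0.658)
)

# Hypothetical helper: per-feature share of the squared distance between rows i and j
distance_share <- function(X, i = 1, j = 2) {
  X <- as.matrix(X)
  sq <- (X[i, ] - X[j, ])^2
  sq / sum(sq)
}

raw_share    <- distance_share(toy)
scaled_share <- distance_share(scale(toy))  # each column: mean 0, sd 1

round(rbind(raw = raw_share, scaled = scaled_share), 3)
```

On the raw values, Attack.range accounts for about 97.5% of that squared distance because range differences are measured in hundreds of units. After standardisation, Base.HP and Attack.speed also contribute a visible share.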

# Drop label-like columns (Name, Tags, Role) so clustering uses only numeric stats,
# encode the resource type as Mana vs. other, and drop Range.type because it duplicates Attack.range.
lol_clean <- lol %>%
  select(-Name, -Tags, -Role) %>%
  mutate(resource_bin = ifelse(Resourse.type == "Mana", 1, 0)) %>%
  select(-Range.type, -Resourse.type)

cat("Missing values present:", any(is.na(lol_clean)), "\n")
## Missing values present: FALSE
if (any(is.na(lol_clean))) {
  lol_clean <- lol_clean %>% drop_na()
}

cat("\nProcessed data structure:\n")
## 
## Processed data structure:
str(lol_clean)
## 'data.frame':    167 obs. of  20 variables:
##  $ Base.HP                  : int  650 590 600 630 685 685 550 560 580 640 ...
##  $ HP.per.lvl               : int  114 104 119 107 120 94 92 102 102 101 ...
##  $ Base.mana                : int  0 418 200 350 350 285 495 418 348 280 ...
##  $ Mana.per.lvl             : num  0 25 0 40 40 40 45 25 42 35 ...
##  $ Movement.speed           : int  345 330 345 330 330 335 325 335 325 325 ...
##  $ Base.armor               : int  38 21 23 26 47 33 21 19 26 26 ...
##  $ Armor.per.lvl            : num  4.8 4.7 4.7 4.7 4.7 4 4.9 4.7 4.2 4.6 ...
##  $ Base.magic.resistance    : int  32 30 37 30 32 32 30 30 30 30 ...
##  $ Magic.resistance.per.lvl : num  2.05 1.3 2.05 1.3 2.05 2.05 1.3 1.3 1.3 1.3 ...
##  $ Attack.range             : int  175 550 125 500 125 125 600 625 550 600 ...
##  $ HP.regeneration          : num  3 2.5 9 3.75 8.5 9 5.5 5.5 3.25 3.5 ...
##  $ HP.regeneration.per.lvl  : num  0.5 0.6 0.9 0.65 0.85 0.85 0.55 0.55 0.55 0.55 ...
##  $ Mana.regeneration        : num  0 8 50 8.2 8.5 7.4 8 8 6.5 7 ...
##  $ Mana.regeneration.per.lvl: num  0 0.8 0 0.7 0.8 0.55 0.8 0.8 0.4 0.65 ...
##  $ Attack.damage            : int  60 53 62 52 62 57 51 50 55 59 ...
##  $ Attack.damage.per.lvl    : num  5 3 3.3 3 3.75 3.8 3.2 2.65 2.3 2.95 ...
##  $ Attack.speed.per.lvl     : num  2.5 2.2 3.2 4 2.12 ...
##  $ Attack.speed             : num  0.651 0.668 0.625 0.638 0.625 0.736 0.658 0.61 0.64 0.658 ...
##  $ AS.ratio                 : num  0.651 0.625 0.625 0.4 0.625 0.638 0.625 0.625 0.64 0.658 ...
##  $ resource_bin             : num  0 1 0 1 1 1 1 1 1 1 ...
lol_scaled <- scale(lol_clean)
cat("\n✓ Data standardized (mean=0, sd=1)\n")
## 
## ✓ Data standardized (mean=0, sd=1)

4. Determining Optimal Number of Clusters

4.1 Cluster tendency check (Hopkins statistic)

Before running clustering, we test whether the numeric feature space shows non-random structure (i.e., whether clustering is meaningful) using the Hopkins statistic.

set.seed(123)
res <- get_clust_tendency(lol_scaled, n = nrow(lol_scaled)-1, graph = TRUE)

cat("Hopkins Statistic:", round(res$hopkins_stat, 4), "\n")
## Hopkins Statistic: 0.7601
# Interpretation
if(res$hopkins_stat > 0.7) {
  cat("Interpretation: The score above 0.7 indicates a strong clustering tendency (highly non-random structure).\n")
} else if(res$hopkins_stat >= 0.5) {
  cat("Interpretation: A score between 0.5 and 0.7 suggests some structure exists, but clusters may not be sharply defined.\n")
} else {
  cat("Interpretation: A score below 0.5 indicates no clustering tendency (the data look random or regularly spaced).\n")
}
## Interpretation: The score above 0.7 indicates a strong clustering tendency (highly non-random structure).
# Visual Assessment of Cluster Tendency (VAT)
res$plot + 
  labs(title = "Visual Assessment of Cluster Tendency (VAT)",
       subtitle = paste("Hopkins Statistic:", round(res$hopkins_stat, 4)),
       caption = "Interpretation: The presence of distinct red square-shaped blocks along the diagonal \nconfirms the existence of natural clusters in the League of Legends champion data.")

The VAT plot shows the champions' dissimilarity matrix after reordering, so that similar champions are placed next to each other. Red cells indicate low dissimilarity. The red square blocks along the diagonal therefore represent groups of champions that are highly similar to each other but different from other groups. Because these blocks are clearly visible, the plot supports the Hopkins statistic and justifies the use of K-Means and Hierarchical clustering. With the data shown to be clusterable, the next step is to choose the number of clusters.
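The Hopkins statistic in Section 4.1 compares two sets of nearest-neighbour distances:

- u: from uniformly generated points to the data.
- w: among sampled real observations.

It is defined as H = sum(u) / (sum(u) + sum(w)). In clustered data the real points lie close to each other while the uniform points fall in empty regions, so H approaches 1.

The base-R function below is a minimal sketch of that definition. It assumes the same convention that factoextra::get_clust_tendency() reports (values near 1 mean clusterable), but it is not factoextra's exact implementation, and hopkins_sketch() is a hypothetical name. Run it outside the report: it consumes random numbers and would change the results of the later, unseeded fviz_nbclust() calls.

```r
# Minimal base-R sketch of the Hopkins statistic H = sum(u) / (sum(u) + sum(w)).
# Assumption: same convention as factoextra::get_clust_tendency()
# (H near 1 = clusterable, H near 0.5 = random); not factoextra's exact code.
hopkins_sketch <- function(X, n = floor(nrow(X) / 10)) {
  X <- as.matrix(X)
  # n uniform points drawn inside the range of each column
  U <- apply(X, 2, function(col) runif(n, min(col), max(col)))
  U <- matrix(U, nrow = n)
  # n real observations sampled without replacement
  idx <- sample(nrow(X), n)

  # u: distance from each uniform point to its nearest observation
  u <- apply(U, 1, function(p) min(sqrt(colSums((t(X) - p)^2))))
  # w: distance from each sampled observation to its nearest *other* observation
  w <- vapply(idx, function(i) {
    others <- X[-i, , drop = FALSE]
    min(sqrt(colSums((t(others) - X[i, ])^2)))
  }, numeric(1))

  sum(u) / (sum(u) + sum(w))
}

# Usage on two well-separated synthetic blobs: H should be close to 1
set.seed(1)
blobs <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
               matrix(rnorm(100, mean = 5, sd = 0.3), ncol = 2))
hopkins_sketch(blobs, n = 20)
```

Both the uniform points and the sampled observations are random, so H varies between runs unless the seed is fixed.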

4.2 Elbow Method

fviz_nbclust(lol_scaled, kmeans, method = "wss") +
  labs(title = "Elbow Method for Optimal K",
       subtitle = "Look for the 'elbow' where improvement slows") +
  theme_minimal() +
  geom_vline(xintercept = 3, linetype = "dashed", color = "red")

The Elbow Method plots the total within-cluster sum of squares (WSS) against the number of clusters K. In the plot, the curve bends visibly at K = 3. Beyond that point, each additional cluster reduces WSS much less. A red dashed line marks this point. Choosing K = 3 therefore balances model complexity against cluster compactness, in line with the principle of parsimony. Because the elbow is read by eye, it is treated here as a first indication rather than a final decision.
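The elbow curve plots the total within-cluster sum of squares that kmeans() returns for each K. Below is a minimal base-R sketch of that computation, run on the built-in iris measurements rather than the champion data. wss_by_k() is a hypothetical helper. It uses nstart = 25, as in the report's final K-Means fit, rather than the single random start kmeans() defaults to when fviz_nbclust() passes no nstart. Like the Hopkins sketch, it is best run outside the report, because it resets the random seed used by the later, unseeded fviz_nbclust() call.

```r
# Sketch: total within-cluster sum of squares (WSS) per K, the quantity that
# fviz_nbclust(method = "wss") plots, using only stats::kmeans().
wss_by_k <- function(X, ks = 1:10, nstart = 25, seed = 123) {
  set.seed(seed)
  vapply(ks, function(k) kmeans(X, centers = k, nstart = nstart)$tot.withinss,
         numeric(1))
}

# Usage on a built-in dataset (iris measurements, standardised)
X <- scale(iris[, 1:4])
wss <- wss_by_k(X, ks = 1:6)
round(wss, 1)
# Relative drop in WSS from K - 1 to K; the elbow is where this drop shrinks
round(-diff(wss) / head(wss, -1), 3)
```

At K = 1 the WSS equals the total sum of squares, which for standardised data is (n - 1) * p. The relative drop printed last makes the elbow easier to read than the raw curve.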

4.3 Silhouette Heuristic (initial)

fviz_nbclust(lol_scaled, kmeans, method = "silhouette") +
  labs(title = "Silhouette Heuristic for K",
       subtitle = "Higher average silhouette width indicates better clustering") +
  theme_minimal() +
  geom_vline(xintercept = 3, linetype = "dashed", color = "red")

The Silhouette Method assesses cluster quality by measuring how well each observation fits its assigned cluster compared with the nearest other cluster. The blue dashed line drawn by fviz_nbclust() marks the K with the highest average silhouette width under this heuristic (K = 2). For interpretability, our preferred K = 3 is highlighted with a red dashed line. K = 2 gives slightly higher cohesion, but K = 3 may allow a more granular separation of champion types (for example tanks, mages, and marksmen) without a large drop in silhouette quality.

Note that fviz_nbclust() passes no nstart argument to kmeans() here, so each K is fitted from a single random start. This likely explains why the more thorough scan with nstart = 25 in Section 5.3 ranks K = 2 below both K = 3 and K = 4.
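The silhouette width of a single observation i is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where:

- a(i) is the mean distance from i to the other members of its own cluster.
- b(i) is the smallest mean distance from i to the members of any other cluster.

The sketch below implements that definition in base R and averages it for K = 2 to 5 on the built-in iris measurements, using nstart = 25. silhouette_widths() is a hypothetical name; the report itself uses cluster::silhouette().

```r
# Sketch: silhouette width s(i) = (b(i) - a(i)) / max(a(i), b(i)) in base R.
# a(i): mean distance from i to the other members of its own cluster
# b(i): smallest mean distance from i to the members of any other cluster
silhouette_widths <- function(labels, D) {
  D <- as.matrix(D)
  vapply(seq_along(labels), function(i) {
    own <- labels == labels[i]
    if (sum(own) == 1) return(0)          # singleton clusters get s(i) = 0 by convention
    a <- sum(D[i, own]) / (sum(own) - 1)  # D[i, i] = 0, so divide by size - 1
    others <- setdiff(unique(labels), labels[i])
    b <- min(vapply(others, function(cl) mean(D[i, labels == cl]), numeric(1)))
    (b - a) / max(a, b)
  }, numeric(1))
}

# Usage: average silhouette width for K = 2..5 on standardised iris data
X <- scale(iris[, 1:4])
D <- dist(X)
set.seed(123)
avg_sil <- sapply(2:5, function(k) {
  mean(silhouette_widths(kmeans(X, centers = k, nstart = 25)$cluster, D))
})
round(setNames(avg_sil, 2:5), 3)
```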

4.4 NbClust Consensus

To reach a more robust conclusion about the number of clusters, we used the NbClust package. This validation framework provides up to 30 indices, such as Hubert, Duda, and Beale. With the default index = "all" setting used here, every index except Gap, Gamma, Gplus, and Tau is computed, and each non-graphical index proposes its preferred K.

set.seed(123)
nb_result <- NbClust(
  lol_scaled,
  distance = "euclidean",
  min.nc = 2, max.nc = 10,
  method = "kmeans"
)

## *** : The Hubert index is a graphical method of determining the number of clusters.
##                 In the plot of Hubert index, we seek a significant knee that corresponds to a 
##                 significant increase of the value of the measure i.e the significant peak in Hubert
##                 index second differences plot. 
## 

## *** : The D index is a graphical method of determining the number of clusters. 
##                 In the plot of D index, we seek a significant knee (the significant peak in Dindex
##                 second differences plot) that corresponds to a significant increase of the value of
##                 the measure. 
## 
## ******************************************************************* 
## * Among all indices:                                                
## * 1 proposed 2 as the best number of clusters 
## * 1 proposed 4 as the best number of clusters 
## * 2 proposed 9 as the best number of clusters 
## 
##                    ***** Conclusion *****                            
##  
## * According to the majority rule, the best number of clusters is  9 
##  
##  
## *******************************************************************
min_k <- 2
max_k <- 10

row1 <- as.numeric(nb_result$Best.nc[1, ])
row2 <- as.numeric(nb_result$Best.nc[2, ])

is_valid_k_row <- function(x, min_k, max_k) {
  x <- x[is.finite(x)]
  if (length(x) == 0) return(FALSE)
  mean(x >= min_k & x <= max_k & abs(x - round(x)) < 1e-8) > 0.8
}

k_votes <- if (is_valid_k_row(row1, min_k, max_k)) row1 else row2
k_votes <- k_votes[is.finite(k_votes)]
k_votes <- k_votes[k_votes >= min_k & k_votes <= max_k]
k_votes <- round(k_votes)

votes_table <- sort(table(k_votes), decreasing = TRUE)
k_nbclust <- as.integer(names(votes_table)[1])

cat("NbClust consensus (majority vote): K =", k_nbclust, "\n\n")
## NbClust consensus (majority vote): K = 3
cat("Votes per K:\n")
## Votes per K:
print(votes_table)
## k_votes
## 3 2 5 9 8 
## 7 4 4 4 1

The NbClust package provides a consensus-based justification for the number of clusters. Each internal index proposes its own best K, and the proposals are tallied as "votes". The console summary printed above reports only four index proposals and a majority of K = 9. The vote table computed directly from nb_result$Best.nc instead counts every index that returned a valid proposal between K = 2 and K = 10, and that table is used for the decision.

According to that tally, K = 3 received the most votes (7 indices). K = 2, K = 5, and K = 9 each received 4 votes and K = 8 received 1, so the consensus favours a 3-cluster partition, though not overwhelmingly. This result is consistent with the elbow observed in the previous analysis.

The Hubert and D indices are graphical methods. They do not vote; instead, their second-difference plots are inspected visually for a significant knee. Choosing K = 3 yields champion archetypes that are reasonably distinct while remaining broad enough to represent the core gameplay roles in League of Legends.

4.5 Validation Summary

cat("Elbow suggests K ≈ 3\n")
## Elbow suggests K ≈ 3
cat("NbClust majority suggests K =", k_nbclust, "\n")
## NbClust majority suggests K = 3
cat("The initial silhouette heuristic plot may suggest a smaller K (often K=2).\n")
## The initial silhouette heuristic plot may suggest a smaller K (often K=2).
cat("We start with K=3 for interpretability, then re-check using detailed silhouette diagnostics.\n\n")
## We start with K=3 for interpretability, then re-check using detailed silhouette diagnostics.
k_final <- 3

5. K-Means Clustering

set.seed(123)
kmeans_result <- kmeans(lol_scaled, centers = k_final, nstart = 25)
lol$cluster <- factor(kmeans_result$cluster)

cat("K used:", k_final, "\n")
## K used: 3
print(setNames(table(lol$cluster), paste("Clusters size",names(table(lol$cluster)))))
## Clusters size 1 Clusters size 2 Clusters size 3 
##              78              27              62

5.1 Visualization (K = 3)

fviz_cluster(
  kmeans_result, data = lol_scaled,
  geom = "point",
  ellipse.type = "convex",
  palette = c("coral", "darkgreen", "steelblue"),
  ggtheme = theme_minimal()
) +
  labs(
    title = paste0("K-Means Clustering (K=", k_final, ")"),
    subtitle = paste(nrow(lol), "champions clustered in", ncol(lol_clean), "dimensional space")
  )

5.2 Silhouette Analysis (K = 3)

sil <- silhouette(kmeans_result$cluster, dist(lol_scaled))

fviz_silhouette(sil) +
  labs(
    title = paste0("Silhouette Plot for K-Means (K=", k_final, ")"),
    subtitle = "K=3 initial solution"
  )
##   cluster size ave.sil.width
## 1       1   78          0.30
## 2       2   27          0.12
## 3       3   62          0.29

avg3 <- mean(sil[, 3])
neg3 <- sum(sil[, 3] < 0)

cat(" SILHOUETTE SUMMARY (K=3) \n")
##  SILHOUETTE SUMMARY (K=3)
cat("Avg silhouette:", round(avg3, 4), "\n")
## Avg silhouette: 0.2687
cat("Negative silhouettes:", neg3, "\n\n")
## Negative silhouettes: 2
clusters <- sort(unique(sil[, 1]))
sil_summary <- data.frame(
  Cluster = clusters,
  Size = sapply(clusters, function(i) sum(sil[, 1] == i)),
  Avg_Silhouette = sapply(clusters, function(i) mean(sil[sil[, 1] == i, 3]))
)
print(sil_summary, row.names = FALSE)
##  Cluster Size Avg_Silhouette
##        1   78      0.3042883
##        2   27      0.1223551
##        3   62      0.2877652
weak_cluster <- sil_summary$Cluster[which.min(sil_summary$Avg_Silhouette)]
weak_value <- min(sil_summary$Avg_Silhouette)

cat("Interpretation:\n")
## Interpretation:
cat("K=3 is interpretable, but one cluster is relatively weak (lowest avg silhouette).\n")
## K=3 is interpretable, but one cluster is relatively weak (lowest avg silhouette).
cat("Weakest cluster:", weak_cluster, "avg silhouette =", round(weak_value, 3), "\n")
## Weakest cluster: 2 avg silhouette = 0.122

A closer look at the silhouette widths shows that Cluster 2 (average 0.12) is clearly weaker than Clusters 1 and 3 (about 0.29 to 0.30). In addition, two champions have negative silhouette values. On average, each of them is closer to the members of a neighbouring cluster than to the members of its own. This motivated a scan over other values of K.

5.3 Detailed silhouette scan

set.seed(123)

ks <- 2:10
avg_sil_by_k <- sapply(ks, function(k) {
  km <- kmeans(lol_scaled, centers = k, nstart = 25)
  sil_k <- silhouette(km$cluster, dist(lol_scaled))
  mean(sil_k[, 3])
})

sil_df <- data.frame(K = ks, Avg_Silhouette = as.numeric(avg_sil_by_k))
print(sil_df, row.names = FALSE)
##   K Avg_Silhouette
##   2      0.2475940
##   3      0.2687396
##   4      0.2890218
##   5      0.1928662
##   6      0.1973759
##   7      0.1411024
##   8      0.1397115
##   9      0.1460029
##  10      0.1501829
ggplot(sil_df, aes(x = K, y = Avg_Silhouette)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  geom_vline(xintercept = k_final, linetype = "dashed") +
  theme_minimal() +
  labs(
    title = "Average Silhouette Width by K",
    subtitle = "Higher is better; dashed line shows current K",
    y = "Average silhouette width"
  )

set.seed(123)
km4 <- kmeans(lol_scaled, centers = 4, nstart = 25)
sil4 <- silhouette(km4$cluster, dist(lol_scaled))
avg4 <- mean(sil4[, 3])
neg4 <- sum(sil4[, 3] < 0)

seeds <- 1:20
avg_sils4 <- sapply(seeds, function(s) {
  set.seed(s)
  km <- kmeans(lol_scaled, centers = 4, nstart = 25)
  sil_k <- silhouette(km$cluster, dist(lol_scaled))
  mean(sil_k[, 3])
})

cat("\n DECISION CHECK (K=3 vs K=4) \n")
## 
##  DECISION CHECK (K=3 vs K=4)
cat("K=3: avg silhouette =", round(avg3, 4), "| negatives =", neg3, "\n")
## K=3: avg silhouette = 0.2687 | negatives = 2
cat("K=4: avg silhouette =", round(avg4, 4), "| negatives =", neg4, "\n")
## K=4: avg silhouette = 0.289 | negatives = 0
cat("K=4 stability (20 seeds): mean =", round(mean(avg_sils4), 4),
    "| sd =", round(sd(avg_sils4), 4), "\n\n")
## K=4 stability (20 seeds): mean = 0.289 | sd = 0
k_final <- 4
kmeans_result <- km4
lol$cluster <- factor(kmeans_result$cluster)

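After evaluating K = 2 to 10, we switched from K = 3 to K = 4. K = 3 was initially supported by the Elbow method and NbClust, but K = 4 performed better on three counts:

- It reached the highest average silhouette width in the scan (0.289 versus 0.269 for K = 3).
- It removed both negative silhouettes.
- It gave the same average across 20 random seeds (sd = 0 to four decimals).

The gain in average width is modest, so the absence of poorly assigned champions and the stability are the main arguments for K = 4. Next, we check the silhouette plot for K = 4.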

5.4 Visualization (Final K = 4)

fviz_cluster(
  kmeans_result, data = lol_scaled,
  geom = "point",
  ellipse.type = "convex",
  ggtheme = theme_minimal()
) +
  labs(
    title = paste0("K-Means Clustering (Final K=", k_final, ")"),
    subtitle = paste(nrow(lol), "champions clustered in", ncol(lol_clean), "dimensional space")
  )

5.5 Silhouette Analysis (Final K = 4)

sil_final <- silhouette(kmeans_result$cluster, dist(lol_scaled))

fviz_silhouette(sil_final) +
  labs(
    title = paste0("Silhouette Plot for Final K-Means (K=", k_final, ")"),
    subtitle = "Final chosen solution"
  )
##   cluster size ave.sil.width
## 1       1   62          0.29
## 2       2   22          0.22
## 3       3   78          0.30
## 4       4    5          0.37

cat("\n FINAL SILHOUETTE SUMMARY (K=4) \n")
## 
##  FINAL SILHOUETTE SUMMARY (K=4)
cat("Avg silhouette:", round(mean(sil_final[, 3]), 4), "\n")
## Avg silhouette: 0.289
cat("Negative silhouettes:", sum(sil_final[, 3] < 0), "\n")
## Negative silhouettes: 0

The decision to finalize the model with K = 4 rests on the silhouette diagnostics. The average silhouette width rises to 0.289, and every cluster now has a positive average. The red dashed line in the silhouette plot marks that overall average. Clusters 3 and 4 sit above it and Cluster 1 roughly on it. Cluster 2 (0.22) remains below it, although the weakest cluster average rose from 0.12 at K = 3 to 0.22.

A key factor is that no champion has a negative silhouette value: on average, every champion is closer to its own cluster than to the nearest other one. The 2D visualization shows some overlap between two clusters. This is likely a projection artifact of reducing 20-dimensional data to a plane. The silhouette results indicate that the groups are separated in the full feature space, although an average width of 0.289 points to modest rather than sharp separation. The emergence of a small, specialized Cluster 4 (n = 5, average silhouette 0.37) is particularly useful. A coarser model would have absorbed these extreme profiles, and we expect them to separate clearly in the subsequent 3D dimensionality reduction analysis.


6. Hierarchical Clustering

To validate the robustness of our K=4 solution, we performed Hierarchical Clustering and compared the results with the previous K-Means partition. The comparison reveals an exceptional degree of consistency between the two different algorithmic approaches.

dist_matrix <- dist(lol_scaled, method = "euclidean")
hclust_result <- hclust(dist_matrix, method = "ward.D2")

hclust_clusters <- cutree(hclust_result, k = k_final)
lol$hclust_cluster <- factor(hclust_clusters)

cat("K used:", k_final, "\n")
## K used: 4
cat("Cluster sizes:\n")
## Cluster sizes:
print(table(lol$hclust_cluster))
## 
##  1  2  3  4 
## 22 77  5 63

6.1 Dendrogram

fviz_dend(
  hclust_result, k = k_final,
  cex = 0.4,
  color_labels_by_k = TRUE,
  rect = TRUE,
  main = paste0("Hierarchical Clustering Dendrogram (K=", k_final, ")"),
  xlab = "Champions",
  ylab = "Height (Distance)"
)

6.2 Comparison with K-Means (best relabeling + ARI)

comparison_table <- table(KMeans = lol$cluster, Hierarchical = lol$hclust_cluster)

cat("K-MEANS VS HIERARCHICAL (CROSS-TAB) \n\n")
## K-MEANS VS HIERARCHICAL (CROSS-TAB)
print(comparison_table)
##       Hierarchical
## KMeans  1  2  3  4
##      1  0  0  0 62
##      2 22  0  0  0
##      3  0 77  0  1
##      4  0  0  5  0
naive_agreement <- sum(diag(comparison_table)) / nrow(lol)
cat("Naive diagonal agreement (label IDs must match):",
    round(naive_agreement * 100, 1), "%\n")
## Naive diagonal agreement (label IDs must match): 0 %
if (!requireNamespace("gtools", quietly = TRUE)) install.packages("gtools")
perms <- gtools::permutations(n = k_final, r = k_final, v = 1:k_final)

best_agreement <- -Inf
best_perm <- NULL
kmeans_int <- as.integer(lol$cluster)

for (i in seq_len(nrow(perms))) {
  p <- perms[i, ]
  mapped_h <- p[hclust_clusters]
  agree <- mean(kmeans_int == mapped_h)
  if (agree > best_agreement) {
    best_agreement <- agree
    best_perm <- p
  }
}

cat("Best label-matched agreement:", round(best_agreement * 100, 1), "%\n")
## Best label-matched agreement: 99.4 %
cat("Best mapping (Hierarchical old label -> KMeans label):\n")
## Best mapping (Hierarchical old label -> KMeans label):
print(setNames(best_perm, 1:k_final))
## 1 2 3 4 
## 2 3 4 1
ari <- mclust::adjustedRandIndex(kmeans_int, hclust_clusters)
cat("Adjusted Rand Index (ARI; label-invariant):", round(ari, 3), "\n\n")
## Adjusted Rand Index (ARI; label-invariant): 0.978

Key Findings from the Cross-Tabulation

Label switching: the naive diagonal agreement of 0% is merely a result of arbitrary cluster IDs (e.g., K-Means Cluster 1 corresponds to Hierarchical Cluster 4). After the best relabeling, the agreement is 99.4%.

Statistical robustness: the Adjusted Rand Index (ARI) of 0.978 indicates near-perfect overlap. ARI is invariant to label permutations and corrected for chance, so this score confirms that both methods recover essentially the same partition of champions.

Cluster stability: out of 167 champions, only one is classified differently by the two methods (K-Means Cluster 3 vs. Hierarchical Cluster 4).

Conclusion: the agreement between a centroid-based method (K-Means) and a connectivity-based method (Hierarchical) supports the K = 4 solution. Part of this agreement is expected. Ward's linkage (ward.D2) merges clusters so as to minimize the increase in within-cluster variance, the same criterion K-Means optimizes. Even so, the near-identical partitions show that the clusters are not artifacts of K-Means initialization but reflect stable structure in the League of Legends dataset.
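The ARI can be recomputed directly from the cross-tabulation. The sketch below implements the Hubert and Arabie adjusted Rand index from a contingency table; this is the version that mclust::adjustedRandIndex() reports. The matrix is copied from the K-Means vs Hierarchical cross-tab printed in Section 6.2, and ari_from_table() is a hypothetical helper.

```r
# Sketch: Hubert & Arabie adjusted Rand index from a contingency table.
ari_from_table <- function(tab) {
  tab <- as.matrix(tab)
  pairs <- function(x) sum(choose(x, 2))
  n <- sum(tab)
  index    <- pairs(tab)                        # pairs grouped together by both methods
  row_sum  <- pairs(rowSums(tab))
  col_sum  <- pairs(colSums(tab))
  expected <- row_sum * col_sum / choose(n, 2)  # expected index under random labelling
  max_idx  <- (row_sum + col_sum) / 2
  (index - expected) / (max_idx - expected)
}

# Cross-tab from Section 6.2 (rows = K-Means 1..4, columns = Hierarchical 1..4)
ct <- matrix(c( 0,  0, 0, 62,
               22,  0, 0,  0,
                0, 77, 0,  1,
                0,  0, 5,  0), nrow = 4, byrow = TRUE)
round(ari_from_table(ct), 3)  # 0.978, matching the mclust value reported above
```

The numerator counts champion pairs placed together by both methods, minus the number expected by chance given the cluster sizes. With only one champion off the matched diagonal, almost all pairs agree.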


7. Fuzzy C-Means (choose K specifically for fuzzy, then compute versatility)

Why a separate K for fuzzy?
K-Means and Hierarchical clustering produce hard partitions, in which each champion belongs to exactly one group. Fuzzy C-Means (FCM) instead optimizes soft memberships. A champion can have a degree of belonging (0 to 1) to several clusters at once, with each champion's memberships summing to 1.

Because the underlying objective differs, we choose the number of clusters for FCM (K_fuzzy) independently. The goal in fuzzy clustering is not only to minimize distance but also to obtain informative membership grades. If K is too high, memberships become diluted or almost uniform (e.g., 0.25 across four groups) and reveal little about a champion's identity. We therefore select K_fuzzy as the value at which clusters remain distinct and the fuzziness highlights genuinely hybrid champions rather than statistical noise.

7.1 Scan K for fuzzy (membership quality)

set.seed(123)

ks <- 2:10
m_baseline <- 2.0  # baseline m used only for scanning K

fuzzy_metrics <- lapply(ks, function(k) {
  fm <- cmeans(lol_scaled, centers = k, m = m_baseline, iter.max = 300)

  U <- fm$membership
  maxmem <- apply(U, 1, max)

  entropy <- apply(U, 1, function(x) -sum(x * log(x + 1e-10)))
  entropy_norm <- entropy / log(k)

  data.frame(
    K = k,
    Avg_MaxMembership = mean(maxmem),
    Avg_Uncertainty = mean(1 - maxmem),
    Avg_EntropyNorm = mean(entropy_norm)
  )
})

fuzzy_df <- dplyr::bind_rows(fuzzy_metrics) %>%
  dplyr::arrange(K)

knitr::kable(
  fuzzy_df, digits = 4,
  caption = paste0("Fuzzy K scan (baseline m = ", m_baseline,
                   "). Higher Avg_MaxMembership and lower Avg_EntropyNorm are better.")
)
Fuzzy K scan (baseline m = 2). Higher Avg_MaxMembership and lower Avg_EntropyNorm are better.

 K   Avg_MaxMembership   Avg_Uncertainty   Avg_EntropyNorm
 2              0.6843            0.3157            0.8768
 3              0.4727            0.5273            0.9245
 4              0.3423            0.6577            0.9384
 5              0.2802            0.7198            0.9450
 6              0.2282            0.7718            0.9523
 7              0.1925            0.8075            0.9606
 8              0.1760            0.8240            0.9581
 9              0.1576            0.8424            0.9623
10              0.1674            0.8326            0.9437
best_k_maxmem <- fuzzy_df$K[which.max(fuzzy_df$Avg_MaxMembership)]
best_k_entropy <- fuzzy_df$K[which.min(fuzzy_df$Avg_EntropyNorm)]

K_fuzzy <- best_k_maxmem

Model selection (K)
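The scan over K = 2 to 10 indicates that K = 2 yields the most informative memberships: it has the highest average max-membership and the lowest normalized entropy. Average max-membership cannot fall below 1/K, so it naturally shrinks as K grows. Normalized entropy divides by log(K), which makes it comparable across K, and it also favours K = 2.
For larger K, memberships become increasingly uniform, suggesting that the data support only a small number of fuzzy prototypes.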

7.2 Choose fuzziness parameter m

m_val <- 1.5
cat("Chosen m for fuzzy c-means:", m_val, "\n")
## Chosen m for fuzzy c-means: 1.5

Moderate fuzziness for outlier detection: m = 1.5 was chosen for the fuzziness exponent as a middle ground. An initial test at m = 1.3 (taken from previous benchmarks) produced too little fuzziness to distinguish hybrids, whereas m = 1.5 gives a moderate level of overlap. This setting suits outlier and hybrid detection: it identifies champions on the boundary between the two main archetypes without introducing excessive noise. Note that K was scanned at the baseline m = 2, while the final fit uses m = 1.5.
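How m shapes the memberships can be seen from the fuzzy c-means membership rule for fixed centres, u[i, k] = 1 / sum_j (d[i, k] / d[i, j])^(2 / (m - 1)). The sketch below implements only that membership step in base R. A full fit, as in e1071::cmeans(), alternates this step with a centre update. fcm_memberships() and the toy objects are hypothetical. The usage example places one point at distance 1 from one centre and 3 from another, then compares m = 1.5, 2, and 3.

```r
# Sketch of the fuzzy c-means membership step for *fixed* centres:
#   u[i, k] = 1 / sum_j (d[i, k] / d[i, j])^(2 / (m - 1))
# where d[i, k] is the Euclidean distance from observation i to centre k.
# Not a full FCM fit: e1071::cmeans() alternates this step with a centre update.
fcm_memberships <- function(X, centers, m) {
  X <- as.matrix(X)
  centers <- as.matrix(centers)
  # n x K matrix of distances from every observation to every centre
  d <- sapply(seq_len(nrow(centers)), function(k) {
    sqrt(colSums((t(X) - centers[k, ])^2))
  })
  d <- matrix(pmax(d, 1e-12), nrow = nrow(X))  # guard against a zero distance
  e <- 2 / (m - 1)
  # Algebraically equal to the formula above: 1 / (d[i,k]^e * sum_j d[i,j]^(-e))
  1 / (d^e * rowSums(d^(-e)))
}

# Usage: one point at distance 1 from centre A and 3 from centre B
toy_centers <- matrix(c(0, 4), ncol = 1)
toy_point   <- matrix(1, ncol = 1)
round(sapply(c(1.5, 2, 3), function(m) fcm_memberships(toy_point, toy_centers, m)[1, 1]), 3)
# 0.988 0.900 0.750: lower m gives a crisper membership in the nearer centre
```

As m approaches 1 the memberships approach hard 0/1 assignments, and as m grows they approach 1/K. This is why m = 1.5 gives crisper memberships than the m = 2 baseline used for the K scan.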

7.3 Final fuzzy c-means fit + versatility metrics

set.seed(123)
fuzzy_result <- cmeans(lol_scaled, centers = K_fuzzy, m = m_val, iter.max = 300)

lol$fuzzy_cluster <- factor(fuzzy_result$cluster)
membership <- fuzzy_result$membership
lol$max_membership <- apply(membership, 1, max)
lol$uncertainty <- 1 - lol$max_membership

entropy <- apply(membership, 1, function(x) -sum(x * log(x + 1e-10)))
lol$entropy_norm <- entropy / log(K_fuzzy)

cat("Uncertainty Statistics Summary\n")
## Uncertainty Statistics Summary
summary(lol$uncertainty)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.01773 0.06688 0.11848 0.16004 0.23617 0.49706
cat("Normalized Entropy Summary\n")
## Normalized Entropy Summary
summary(lol$entropy_norm)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1285  0.3542  0.5250  0.5616  0.7886  1.0000

7.4 Fuzzy cluster centers (standardized)

Following the selection of K=2 and a fuzziness parameter m=1.5, we conducted a deep-dive into the membership structures to evaluate the “soft” boundaries between champion archetypes.

centers <- round(fuzzy_result$centers, 2)
# Transposed for readability: one row per feature, one column per fuzzy cluster
knitr::kable(t(centers), col.names = paste("Cluster", seq_len(nrow(centers))),
             caption = "Fuzzy c-means cluster centers (standardized).")
Fuzzy c-means cluster centers (standardized).
Feature                      Cluster 1   Cluster 2
Base.HP                          -0.40        0.44
HP.per.lvl                       -0.18        0.18
Base.mana                         0.37       -0.34
Mana.per.lvl                      0.03        0.02
Movement.speed                   -0.54        0.59
Base.armor                       -0.62        0.66
Armor.per.lvl                    -0.04        0.07
Base.magic.resistance            -0.43        0.47
Magic.resistance.per.lvl         -0.78        0.82
Attack.range                      0.79       -0.81
HP.regeneration                  -0.50        0.56
HP.regeneration.per.lvl          -0.41        0.44
Mana.regeneration                 0.05       -0.07
Mana.regeneration.per.lvl         0.27       -0.24
Attack.damage                    -0.59        0.63
Attack.damage.per.lvl            -0.32        0.35
Attack.speed.per.lvl             -0.06        0.04
Attack.speed                     -0.11        0.11
AS.ratio                         -0.09        0.13
resource_bin                      0.27       -0.21
if (!("Attack.range" %in% colnames(fuzzy_result$centers))) {
  stop("Attack.range not found in fuzzy_result$centers. Check preprocessing column names.")
}

range_center <- fuzzy_result$centers[, "Attack.range"]
ranged_id <- which.max(range_center)
melee_id  <- which.min(range_center)

label_map <- rep(NA_character_, K_fuzzy)
label_map[ranged_id] <- "Ranged/Mana-like"
label_map[melee_id]  <- "Melee/Tanky-like"

lol$fuzzy_cluster_label <- factor(
  label_map[as.integer(lol$fuzzy_cluster)],
  levels = c("Melee/Tanky-like", "Ranged/Mana-like")
)

knitr::kable(
  data.frame(Cluster_ID = 1:K_fuzzy, Label = label_map),
  caption = "Fuzzy cluster label mapping based on Attack.range in the centers."
)
Fuzzy cluster label mapping based on Attack.range in the centers.
Cluster_ID Label
1 Ranged/Mana-like
2 Melee/Tanky-like

7.5 Uncertainty distribution

par(mfrow = c(1, 2), mar = c(5, 5, 4, 2))

hist(lol$uncertainty,
     breaks = 30,
     col = "steelblue",
     border = "white",
     main = paste0("Uncertainty Distribution (K_fuzzy=", K_fuzzy, ", m=", m_val, ")"),
     xlab = "Uncertainty (1 - max membership)",
     ylab = "Frequency")
abline(v = median(lol$uncertainty), col = "red", lty = 2, lwd = 2)

hist(lol$entropy_norm,
     breaks = 30,
     col = "coral",
     border = "white",
     main = paste0("Normalized Entropy (K_fuzzy=", K_fuzzy, ", m=", m_val, ")"),
     xlab = "Normalized entropy",
     ylab = "Frequency")
abline(v = median(lol$entropy_norm), col = "red", lty = 2, lwd = 2)

par(mfrow = c(1, 1))
cat("Correlation (uncertainty vs entropy): ",
    round(cor(lol$uncertainty, lol$entropy_norm), 3), "\n", sep = "")
## Correlation (uncertainty vs entropy): 0.964

The initial scan across values of K showed that a binary split gives the most informative fuzzy prototypes:

Avg max membership (0.68): in the baseline scan (m = 2), K = 2 gives the strongest peak membership. The algorithm therefore separates the two primary poles (Melee/Tanky-like vs. Ranged/Mana-like) most clearly at this K.

Entropy and uncertainty: as K increases, normalized entropy rises from 0.88 at K = 2 to 0.92 at K = 3 and to roughly 0.94 to 0.96 for larger K. At that point the memberships are too diluted to be meaningful.

In the final fit (K = 2, m = 1.5), uncertainty (1 - max membership) and normalized entropy are highly correlated (r = 0.964). This is expected by construction. With two clusters, both scores are monotone functions of the same max membership, so either can serve as a proxy for identifying statistical hybrids. The Pearson correlation falls short of 1 only because the entropy curve is nonlinear.

Core archetypes: most champions have low uncertainty (median = 0.118, i.e. max membership of about 0.88). They fit firmly into one of the two prototypes.

The hybrid "bridge": champions with uncertainty approaching 0.5 (max = 0.497) are the true misfits of the dataset. Their memberships in the Ranged/Mana-like and Melee/Tanky-like clusters are nearly equal.

Strategic interpretation: the earlier K = 4 hard clustering (K-Means) provides more granular groups, whereas this fuzzy K = 2 model reveals the fundamental statistical spectrum of the roster. In terms of base statistics, high-uncertainty champions may represent the most versatile designs, blending survivability with range or utility. Whether that versatility carries over to the meta is beyond what base stats alone can show.


8. Principal Component Analysis (PCA)

Why transition to PCA? With 167 champions described by 20 statistical features (HP, Mana, Armor, Attack Speed, etc.), the data cannot be visualized directly, and patterns spread across 20 dimensions are hard to interpret. PCA reduces this complexity by transforming the original correlated variables into a small number of uncorrelated principal components. This suits the dataset because many champion stats are naturally linked (e.g., high Armor often accompanies high HP), so a few components can capture much of a champion’s statistical profile.

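# lol_scaled is already standardized, so prcomp only needs to center it
pca_result <- prcomp(lol_scaled, scale. = FALSE)
lol$PC1 <- pca_result$x[, 1]
lol$PC2 <- pca_result$x[, 2]

# Row 2 of the importance matrix is the proportion of variance per component
var_explained <- summary(pca_result)$importance[2, 1:2] * 100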

cat("PCA VARIANCE EXPLAINED\n\n")
## PCA VARIANCE EXPLAINED
cat("PC1 explains", round(var_explained[1], 1), "%\n")
## PC1 explains 28 %
cat("PC2 explains", round(var_explained[2], 1), "%\n")
## PC2 explains 15.5 %
cat("PC1+PC2 explain", round(sum(var_explained[1:2]), 1), "%\n")
## PC1+PC2 explain 43.5 %

The first two components together capture 43.5% of the dataset’s variance. PC1 is dominant and explains 28% of the differences between champions. The loadings below indicate that it corresponds to a ‘Melee vs. Ranged’ (or ‘Tankiness vs. Squishiness’) spectrum. PC2 adds another 15.5% and captures secondary structure related to resources and scaling. With 20 standardized features, each original variable accounts for 1/20 = 5% of the total variance. PC1 therefore summarizes roughly 5.6 variables’ worth of variation and PC2 about 3.1. The 2D view is informative enough to display the four K-Means clusters, but 56.5% of the variance lies in the remaining components and does not appear in these plots.
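Each percentage reported above is one component’s variance divided by the total variance. The sketch below uses R’s built-in mtcars data as a stand-in for lol_scaled. This substitution is an assumption made only so the block runs on its own. The block shows that summary(prcomp(...))$importance[2, ] is exactly this ratio. It then expresses the reported champion percentages as multiples of the 1/p baseline for p = 20 standardized features.

```r
# mtcars stands in for the champion data; scale() mirrors lol_scaled
X <- scale(mtcars)
pca <- prcomp(X, scale. = FALSE)

# Proportion of variance explained = eigenvalue / sum of eigenvalues
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
round(100 * prop_var[1:2], 1)

# summary() reports the same quantity (rounded to 5 decimals) in row 2
summary(pca)$importance[2, 1:2]

# Baseline share of one standardized feature when p features are used
p <- ncol(X)
100 / p

# Reported champion values (28% and 15.5%, p = 20 features):
# how many features' worth of variance each component summarizes
c(PC1 = 28, PC2 = 15.5) / (100 / 20)
```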

8.1 PCA loadings: which attributes drive PC1/PC2?

loadings <- as.data.frame(pca_result$rotation[, 1:2])
loadings$Feature <- rownames(loadings)

# Rank features by the absolute size of their loading on each component
top_pc1 <- loadings |> dplyr::arrange(dplyr::desc(abs(PC1))) |> head(10)
top_pc2 <- loadings |> dplyr::arrange(dplyr::desc(abs(PC2))) |> head(10)

# Print Feature once, as a column, instead of also as row names
knitr::kable(top_pc1[, c("Feature", "PC1", "PC2")], digits = 3,
             row.names = FALSE, caption = "Top 10 absolute loadings for PC1")
Top 10 absolute loadings for PC1
Feature                        PC1     PC2
Attack.range                 0.378   0.065
Magic.resistance.per.lvl    -0.375  -0.103
Attack.damage               -0.324  -0.114
Base.armor                  -0.304  -0.067
Movement.speed              -0.299   0.013
Base.mana                    0.245  -0.353
HP.regeneration             -0.241  -0.237
Base.magic.resistance       -0.233  -0.153
Base.HP                     -0.221  -0.177
resource_bin                 0.219  -0.444
knitr::kable(top_pc2[, c("Feature", "PC1", "PC2")], digits = 3,
             row.names = FALSE, caption = "Top 10 absolute loadings for PC2")
Top 10 absolute loadings for PC2
Feature                        PC1     PC2
Mana.per.lvl                 0.104  -0.466
resource_bin                 0.219  -0.444
Mana.regeneration.per.lvl    0.214  -0.433
Base.mana                    0.245  -0.353
HP.regeneration             -0.241  -0.237
HP.regeneration.per.lvl     -0.192  -0.221
HP.per.lvl                  -0.110  -0.204
Base.HP                     -0.221  -0.177
Base.magic.resistance       -0.233  -0.153
Attack.damage.per.lvl       -0.187  -0.121

The PCA loadings reveal the underlying “DNA” of the champion archetypes by showing which specific statistics drive the separation on the plots.

PC1: the “Frontline vs. Backline” spectrum. The first principal component (28% of variance) acts as the primary axis for combat positioning:

Attack.range (0.378) is the strongest positive driver, so champions on the positive side of PC1 are characterized by long reach and the safety it provides.

Magic.resistance.per.lvl (-0.375), Attack.damage (-0.324), Base.armor (-0.304) and Movement.speed (-0.299) are the strongest negative drivers.

This captures the fundamental “Range vs. Durability” trade-off. PC1 separates the “glass cannon” marksmen and mages from the sturdy tanks and brawlers, who have higher base damage and defensive scaling. The mana features also load positively on PC1 (Base.mana 0.245, resource_bin 0.219), so the ranged side tends to be mana-based as well.

PC2: the “Resource Dependency and Scaling” axis. The second principal component (15.5% of variance) captures differences in champion resources and in growth over levels:

Mana scaling: The axis is dominated by mana-related stats, specifically Mana.per.lvl (-0.466), resource_bin (-0.444) and Mana.regeneration.per.lvl (-0.433).

Sustain and durability: HP.regeneration (-0.237) and HP.per.lvl (-0.204) also pull in the negative direction.

All ten of the largest PC2 loadings are negative. A low PC2 score therefore marks a champion with a large, fast-growing mana pool combined with above-average health regeneration and health growth. A high PC2 score marks little or no mana and weaker sustain growth, as would be expected for champions that use energy, rage or no resource at all. The sign of a principal component is arbitrary, so only this relative direction carries meaning: flipping PC2 would swap the two ends without changing the interpretation.
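A loading table is easier to read with the scoring rule in mind. A champion’s PC score is the dot product of its standardized stats with the loading vector. A strongly negative PC2 score therefore requires above-average values on the features with large negative PC2 loadings. The sketch below uses mtcars as a stand-in for lol_scaled (an assumption, so the block runs on its own). It reconstructs prcomp’s scores from the rotation matrix and shows that flipping the sign of a component flips its loadings and scores together while leaving the reconstructed data unchanged.

```r
# mtcars stands in for the champion data; scale() mirrors lol_scaled
X <- scale(mtcars)
pca <- prcomp(X, scale. = FALSE)

# Scores are the (already centered) data projected onto the loading vectors
scores <- X %*% pca$rotation
max(abs(scores - pca$x))            # numerically zero

# Flip the sign of PC2: loadings and scores change sign together
rot_flip <- pca$rotation
rot_flip[, 2] <- -rot_flip[, 2]
scores_flip <- X %*% rot_flip

# Reconstructing the data from scores and loadings gives the same result
recon      <- scores %*% t(pca$rotation)
recon_flip <- scores_flip %*% t(rot_flip)
max(abs(recon - recon_flip))        # numerically zero
```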

8.2 PCA colored by final K-means clusters (K = 4)

ggplot(lol, aes(x = PC1, y = PC2, color = cluster)) +
  geom_point(size = 3, alpha = 0.7) +
  theme_minimal() +
  labs(
    title = "PCA (colored by final K-means clusters)",
    subtitle = paste0("Final K = ", k_final),
    x = paste0("PC1 (", round(var_explained[1], 1), "%)"),
    y = paste0("PC2 (", round(var_explained[2], 1), "%)")
  )

This spatial separation is consistent with the convex hull visualization in Section 5, where the same four groupings appeared as non-overlapping regions. The exception is Cluster 4 (purple), which partially overlaps Cluster 2 (green). This overlap suggests that Cluster 4 champions sit on the boundary of the melee archetype rather than forming a fully independent group.

8.3 PCA colored by fuzzy labels (K_fuzzy = 2)

ggplot(lol, aes(x = PC1, y = PC2, color = fuzzy_cluster_label)) +
  geom_point(size = 3, alpha = 0.7) +
  scale_color_manual(values = c("Melee/Tanky-like" = "steelblue",
                                "Ranged/Mana-like" = "coral")) +
  theme_minimal() +
  labs(
    title = "PCA (colored by fuzzy c-means labels)",
    subtitle = paste0("Fuzzy K = ", K_fuzzy, " (main axis)"),
    x = paste0("PC1 (", round(var_explained[1], 1), "%)"),
    y = paste0("PC2 (", round(var_explained[2], 1), "%)"),
    color = "Fuzzy cluster"
  )

Switching to the fuzzy K=2 view reveals the binary structure underlying the four K-Means archetypes. The Melee/Tanky-like (blue) and Ranged/Mana-like (coral) groups are separated along PC1 with very little overlap. This is consistent with the ranged vs. melee axis being the strongest organizing principle in the dataset. Compared with the K=4 plot, this view largely merges the green/pink distinction and the small purple cluster into a single melee pole. The merge is not perfect: Vladimir (K-Means Cluster 2) and Kennen (Cluster 4) receive the Ranged/Mana-like label (see the hybrid table in Section 10). So while these sub-groups are statistically distinct, most of their members share the same fundamental identity. The handful of points near PC1 ≈ 0 are the borderline cases. Their stat profiles sit between both poles, and their in-game design likely reflects a hybrid archetype such as a bruiser or battlemage.

8.4 PCA with uncertainty

ggplot(lol, aes(x = PC1, y = PC2, color = uncertainty)) +
  geom_point(size = 3, alpha = 0.7) +
  scale_color_viridis_c(option = "plasma", direction = -1) +
  theme_minimal() +
  labs(
    title = "Champion Versatility (Fuzzy): Uncertainty (PCA view)",
    subtitle = "Higher = more mixed memberships (more hybrid / potential misfit)",
    x = paste0("PC1 (", round(var_explained[1], 1), "%)"),
    y = paste0("PC2 (", round(var_explained[2], 1), "%)")
  )

The uncertainty plot adds the richest layer to the PCA analysis. Champions deep in either the melee (far left) or ranged (far right) zone are overwhelmingly yellow, meaning the algorithm assigns them to their archetype with high confidence. The highest-uncertainty points (dark purple) are not randomly scattered. They concentrate near the boundary zone around PC1 ≈ 0 and in the upper portion of the melee cluster (high PC2). Some of this is expected, because the fuzzy memberships are computed from distances in the same standardized feature space that PCA projects. Even so, it shows that the scores behave coherently rather than as noise. High uncertainty can arise in two ways: a champion can lie between the two prototypes, or it can lie far from both. The second case is possible because fuzzy c-means memberships depend only on the ratio of a champion’s distances to the prototypes. The isolated outlier at the very top (PC2 ≈ 5.5) likely illustrates the second case. It shows near-maximum uncertainty despite sitting in melee territory on PC1, which suggests an unusual stat profile that neither prototype describes well. That makes it a strong candidate for the hybrid analysis in Section 10.
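In fuzzy c-means, the membership of point i in cluster k is u_ik = 1 / Σ_j (d_ik / d_ij)^(2/(m − 1)). Here d_ik is the distance to prototype k and m is the fuzzifier. Because only ratios of distances enter the formula, a champion far from both prototypes receives memberships near 0.5, just like one lying exactly between them. The sketch below illustrates this with two prototypes in a 2D plane and m = 2. The prototype coordinates, the helper fcm_membership() and the choice of m are hypothetical and chosen for illustration; they are not values from this analysis.

```r
# Fuzzy c-means membership of one point given the prototype coordinates
fcm_membership <- function(point, prototypes, m = 2) {
  d <- sqrt(rowSums(sweep(prototypes, 2, point)^2))   # Euclidean distances
  if (any(d == 0)) return((d == 0) / sum(d == 0))      # point sits on a prototype
  p <- 2 / (m - 1)
  vapply(d, function(dk) 1 / sum((dk / d)^p), numeric(1))
}

# Two hypothetical prototypes on a stand-in 2D plane
prototypes <- rbind(Melee = c(-2, 0), Ranged = c(2, 0))

round(fcm_membership(c(-1.8, 0.2), prototypes), 3)  # close to the melee prototype
round(fcm_membership(c(0, 0), prototypes), 3)       # exactly between both
round(fcm_membership(c(-2, 15), prototypes), 3)     # melee side, far from both
```

The first point gets a melee membership of about 0.995. The second splits 0.5/0.5. The third lies on the melee side but is far from both prototypes, so its memberships come out at about 0.517/0.483, which is an uncertainty close to that of the most hybrid champions.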

The three PCA views together tell a coherent story about champion design in League of Legends. The K=4 plot shows that the four archetypes occupy distinct regions of the feature space. The fuzzy K=2 view shows that they largely reduce to a single fundamental axis: the melee/ranged divide encoded in PC1. The uncertainty plot then shows that most champions are unambiguous members of one pole. Hybridity is a real but relatively uncommon property, concentrated at the boundary between the two poles and in a few atypical profiles. Taken together, these views are consistent with the clustering pipeline. The hard clusters capture meaningful sub-structure, the fuzzy model captures the underlying spectrum, and the uncertainty scores flag the most design-complex champions in the dataset.

8.5 PCA: correlation structure between champion statistics

To understand how champion attributes relate to each other, we visualize (1) a feature correlation heatmap of pairwise correlations and (2) a PCA variable correlation circle, which shows how features align with PC1 and PC2.

corr_mat <- cor(lol_clean, use = "pairwise.complete.obs")
op <- par(mar = c(1, 1, 2, 4))

corrplot(
  corr_mat,
  method = "color",
  type = "upper",
  order = "hclust",
  diag = FALSE,

  # readability
  tl.col = "black",
  tl.cex = 0.75,      
  tl.srt = 45,        
  tl.offset = 0.8,     
  insig = "blank",
  addCoef.col = NULL,
  col = colorRampPalette(c("#B2182B", "#F7F7F7", "#2166AC"))(200),
  cl.cex = 0.9
)

title("Correlation heatmap (numeric features)", cex.main = 1.1)

par(op)

fviz_pca_var(
  pca_result,
  col.var = "contrib",  # color by contribution to PC1/PC2
  gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
  repel = TRUE
) +
  labs(
    title = "PCA variable correlation circle",
    subtitle = "Vectors show how champion statistics relate to PC1/PC2; color = contribution"
  ) +
  theme_minimal()

The correlation heatmap reveals two blocks of inter-related features that explain why PCA finds such clear structure. The most striking pattern is the negative correlation between Attack.range and the melee-side stats. Base.HP, Attack.damage, Base.armor, Magic.resistance.per.lvl and Movement.speed all correlate negatively with range, consistent with reach and physical durability being opposing design principles in League of Legends. The mana-related features (Mana.per.lvl, Base.mana, Mana.regeneration.per.lvl and resource_bin) form a tight positive block among themselves. Champions that use mana therefore tend to score high on all mana stats rather than on just one. This block structure is what makes PCA effective here. Because the features are grouped into correlated blocks rather than being independent, the algorithm can compress each block into a small number of components without losing much information.

The correlation circle presents the same information as the loadings table in a more intuitive form. Attack.range points almost directly to the right (positive PC1) and has the largest PC1 loading of any feature. It is not, however, the single largest contributor to the PC1/PC2 plane. Once PC2 is taken into account, resource_bin, Mana.regeneration.per.lvl and Magic.resistance.per.lvl contribute slightly more, so Attack.range is best described as the dominant PC1 feature. Magic.resistance.per.lvl, Movement.speed, Base.armor and Attack.damage point to the left, directly opposing Attack.range. They point in the same direction as one another and are therefore positively correlated, and together they define the melee/tanky pole. The mana block (Mana.per.lvl, resource_bin, Mana.regeneration.per.lvl, Base.mana) points downward and slightly to the right, roughly perpendicular to the melee vectors. Mana dependency is therefore largely, but not entirely, independent of the ranged/melee axis. The mana features also carry positive PC1 loadings (0.10 to 0.25), so mana users lean toward the ranged side. Features such as Attack.speed and AS.ratio point almost straight upward with short vectors. They contribute little to either PC1 or PC2 and are not useful for distinguishing champion archetypes in this two-dimensional view.
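The ranking of contributions to the PC1/PC2 plane can be recomputed from the two loading tables in Section 8.1. The sketch below assumes that the combined contribution is the eigenvalue-weighted average of the per-component contributions (100 × squared loading), which is how factoextra describes contributions over several axes. It uses the reported variance shares (28% and 15.5%) in place of the eigenvalues, since only their ratio affects the ranking. Vector length in the correlation circle follows the same ordering, because each coordinate is a loading multiplied by that component’s standard deviation.

```r
# PC1/PC2 loadings copied from the two tables in Section 8.1
# (the 15 features that appear in at least one top-10 list)
loadings_tbl <- data.frame(
  feature = c("Attack.range", "Magic.resistance.per.lvl", "Attack.damage",
              "Base.armor", "Movement.speed", "Base.mana", "HP.regeneration",
              "Base.magic.resistance", "Base.HP", "resource_bin",
              "Mana.per.lvl", "Mana.regeneration.per.lvl",
              "HP.regeneration.per.lvl", "HP.per.lvl", "Attack.damage.per.lvl"),
  PC1 = c(0.378, -0.375, -0.324, -0.304, -0.299, 0.245, -0.241,
          -0.233, -0.221, 0.219, 0.104, 0.214, -0.192, -0.110, -0.187),
  PC2 = c(0.065, -0.103, -0.114, -0.067, 0.013, -0.353, -0.237,
          -0.153, -0.177, -0.444, -0.466, -0.433, -0.221, -0.204, -0.121),
  stringsAsFactors = FALSE
)

# Reported variance shares of PC1 and PC2 (only their ratio matters here)
eig <- c(28, 15.5)

# Per-axis contribution (%) is 100 * loading^2; the plane contribution is
# the eigenvalue-weighted average of the two axes
contrib_12 <- with(loadings_tbl,
  (100 * PC1^2 * eig[1] + 100 * PC2^2 * eig[2]) / sum(eig))

ranked <- loadings_tbl[order(contrib_12, decreasing = TRUE), "feature"]
head(ranked, 4)
```

The five features missing from both tables have |PC1| of at most 0.219 and |PC2| of at most 0.121, so none of them can exceed Attack.range. With these inputs the top four are resource_bin, Mana.regeneration.per.lvl, Magic.resistance.per.lvl and Attack.range.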

Together, the correlation heatmap and the variable correlation circle show that the feature space is far from random. Most of its visible structure lies along two axes that appear to reflect deliberate choices in champion design: the ranged vs. melee divide on PC1 and the resource dependency spectrum on PC2. These axes are orthogonal by construction of PCA. The substantive finding is that two such axes capture a large share of the variation. This structure is what makes unsupervised clustering effective on this dataset.


9. Dimension Reduction (four additional methods): MDS, UMAP, t-SNE, SOM

This section complements PCA with:
- Classical MDS (distance-preserving, metric)
- UMAP (non-linear embedding)
- t-SNE (non-linear embedding focused on local neighborhoods)
- SOM (grid-based 2D mapping)

Each embedding is colored by:
1) the final K-means clusters (K = 4)
2) fuzzy uncertainty (the hybrid / misfit signal)


9.1 Classical MDS

D <- dist(lol_scaled, method = "euclidean")

mds <- cmdscale(D, k = 2, eig = TRUE)

lol$MDS1 <- mds$points[, 1]
lol$MDS2 <- mds$points[, 2]

mds_var <- mds$eig[mds$eig > 0]
prop_2d <- sum(mds_var[1:2]) / sum(mds_var)

cat("MDS: approx. proportion captured by 2D =", round(prop_2d, 3), "\n")
## MDS: approx. proportion captured by 2D = 0.435
ggplot(lol, aes(x = MDS1, y = MDS2, color = cluster)) +
  geom_point(size = 3, alpha = 0.7) +
  theme_minimal() +
  labs(
    title = "Classical MDS (2D) colored by K-means clusters",
    subtitle = paste0("K-means K = ", k_final, " | 2D capture ≈ ", round(prop_2d * 100, 1), "%"),
    x = "MDS1", y = "MDS2"
  )

The classical MDS embedding captures approximately 43.5% of the total variance in 2D, which is identical to PCA. This is expected: classical MDS applied to Euclidean distances between standardized observations is mathematically equivalent to PCA. Its coordinates equal the PCA scores up to a reflection of each axis, which is why the ranged cluster now appears on the left rather than the right. The four clusters are clearly separated. Cluster 3 (teal) sits on the left, Clusters 1 (pink) and 2 (green) sit on the right but are separated along MDS2, and Cluster 4 (purple) again forms a small isolated group near the green cluster. Because the two methods are equivalent here, the agreement is a consistency check on the computation rather than independent validation of the clusters. The 2D view also still omits 56.5% of the variance in the original 20-dimensional feature space.
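Classical (Torgerson) MDS double-centers the matrix of squared Euclidean distances, which recovers the Gram matrix of the centered data. Its leading eigenvectors, scaled by the square roots of the eigenvalues, are the PCA scores. The sketch below checks this on R’s built-in USArrests data, standardized with scale() as a stand-in for lol_scaled. It verifies that the cmdscale() coordinates match the prcomp() scores up to the sign of each axis. It also verifies that the 2D share computed from the MDS eigenvalues equals the PCA variance share.

```r
X <- scale(USArrests)                  # stand-in for lol_scaled
pca <- prcomp(X, scale. = FALSE)
mds <- cmdscale(dist(X), k = 2, eig = TRUE)

# Compare one coordinate column with another, allowing the axis to be reflected
same_up_to_sign <- function(a, b, tol = 1e-6) {
  min(max(abs(a - b)), max(abs(a + b))) < tol
}
same_up_to_sign(mds$points[, 1], pca$x[, 1])
same_up_to_sign(mds$points[, 2], pca$x[, 2])

# cmdscale eigenvalues are (n - 1) times the PCA eigenvalues, so the
# "proportion captured by 2D" is the same number for both methods
ev <- mds$eig[mds$eig > 1e-8]
c(mds = sum(ev[1:2]) / sum(ev),
  pca = sum(pca$sdev[1:2]^2) / sum(pca$sdev^2))
```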

ggplot(lol, aes(x = MDS1, y = MDS2, color = uncertainty)) +
  geom_point(size = 3, alpha = 0.7) +
  scale_color_viridis_c(option = "plasma", direction = -1) +
  theme_minimal() +
  labs(
    title = "Classical MDS (2D) colored by fuzzy uncertainty",
    subtitle = "Higher uncertainty = more hybrid / potential misfit",
    x = "MDS1", y = "MDS2"
  )

The uncertainty overlay on the MDS plot shows the same pattern as PCA: high-uncertainty champions (dark purple) concentrate in the transitional zone between clusters rather than at the extremes. The teal cluster on the left is predominantly yellow, indicating that ranged champions form a tight, unambiguous group with low hybridity. The most uncertain points lie along the upper portion of the plot and at the boundary between the pink and green clusters on the right. This suggests that the melee/tanky archetypes produce more hybrid edge cases than the ranged group does. The isolated dark purple point at the very top (MDS2 ≈ 5) is the same outlier seen in the PCA view. Since MDS reproduces the PCA coordinates, its reappearance is expected rather than additional evidence, but it remains an unusual champion worth identifying by name in the hybrid analysis section.


9.2 UMAP

if (!requireNamespace("uwot", quietly = TRUE)) install.packages("uwot")
library(uwot)

set.seed(123)
umap_xy <- uwot::umap(
  lol_scaled,
  n_neighbors = 15,
  min_dist = 0.10,
  metric = "euclidean"
)

lol$UMAP1 <- umap_xy[, 1]
lol$UMAP2 <- umap_xy[, 2]
ggplot(lol, aes(x = UMAP1, y = UMAP2, color = cluster)) +
  geom_point(size = 3, alpha = 0.7) +
  theme_minimal() +
  labs(
    title = "UMAP (2D) colored by K-means clusters",
    subtitle = paste0("K-means K = ", k_final, " | non-linear embedding"),
    x = "UMAP1", y = "UMAP2"
  )

UMAP, as a non-linear embedding, emphasizes local neighborhood structure that PCA and MDS show less clearly. The four K-means groups appear as largely isolated islands. A single teal point sitting near the pink cluster is the only visible cross-boundary case, consistent with the borderline champion identified in previous plots. This separation should be read with care. UMAP does not preserve global distances and tends to exaggerate the gaps between groups of similar points. The islands therefore show that each cluster forms a coherent local neighborhood, not that the clusters are further apart than the linear methods suggest. Particularly notable is that Cluster 4 (purple) now separates cleanly from Cluster 2 (green) and appears as its own island rather than an embedded sub-group. This suggests that these champions form a distinct local neighborhood in feature space rather than simply being outliers within the melee archetype.

ggplot(lol, aes(x = UMAP1, y = UMAP2, color = uncertainty)) +
  geom_point(size = 3, alpha = 0.7) +
  scale_color_viridis_c(option = "plasma", direction = -1) +
  theme_minimal() +
  labs(
    title = "UMAP (2D) colored by fuzzy uncertainty",
    subtitle = "Higher uncertainty = champions between prototypes",
    x = "UMAP1", y = "UMAP2"
  )

The uncertainty overlay on the UMAP reveals a particularly interesting pattern. The two isolated lower clusters (green and purple in the previous plot) are almost entirely dark purple. These champions have the highest fuzzy uncertainty in the dataset despite forming their own tight local neighborhoods. UMAP indicates that they are similar to each other, while the fuzzy model indicates that they do not belong cleanly to either the melee or the ranged prototype. Fuzzy c-means memberships depend on the ratio of distances to the two prototypes, so a coherent group lying far from both prototypes receives memberships close to 0.5, just as a group lying between them would. The combination therefore points to a coherent archetype that neither the melee nor the ranged pole captures, rather than random noise. The pink and teal clusters, on the other hand, shift from orange to yellow toward their cores, indicating that champions at the heart of each archetype are the most statistically unambiguous.


9.3 t-SNE

if (!requireNamespace("Rtsne", quietly = TRUE)) install.packages("Rtsne")
library(Rtsne)

set.seed(123)

# Perplexity must be < (n-1)/3
perp <- 20

tsne_out <- Rtsne(
  lol_scaled,
  dims = 2,
  perplexity = perp,
  pca = TRUE,
  check_duplicates = FALSE,
  verbose = FALSE
)

lol$TSNE1 <- tsne_out$Y[, 1]
lol$TSNE2 <- tsne_out$Y[, 2]
ggplot(lol, aes(x = TSNE1, y = TSNE2, color = cluster)) +
  geom_point(size = 3, alpha = 0.7) +
  theme_minimal() +
  labs(
    title = "t-SNE (2D) colored by K-means clusters",
    subtitle = paste0("perplexity = ", perp, " | K-means K = ", k_final),
    x = "t-SNE 1", y = "t-SNE 2"
  )

The t-SNE embedding (perplexity = 20) shows the same four-cluster structure again, using a different algorithmic approach. All four groups appear as spatially distinct regions with minimal overlap, consistent with UMAP and PCA. Cluster 3 (teal) is the most spread out, and Cluster 2 (green) forms the tightest island at the top. These visual sizes should not be read as differences in homogeneity, because t-SNE adapts its scale to local density. Cluster size and spacing in the embedding therefore do not reflect spread in the original feature space. Cluster 4 (purple) again appears as a small isolated group between the green and pink clusters, in line with the UMAP result that these champions occupy a distinct position in feature space. As with all t-SNE results, distances between clusters should not be over-interpreted. What is informative is which champions are grouped together, and that grouping is consistent with the other methods.
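Because t-SNE adapts its scale to each neighborhood, cluster spread is better measured in the original feature space. The sketch below defines a hypothetical helper, cluster_spread(), that returns the mean Euclidean distance from each point to its cluster centroid. It is demonstrated on simulated data with one tight and one diffuse group. Applied to this report, it would be called as cluster_spread(as.matrix(lol_scaled), lol$cluster), assuming lol_scaled has one row per champion in the same order as lol.

```r
# Mean Euclidean distance from each point to its own cluster centroid,
# computed in the original (standardized) feature space
cluster_spread <- function(X, labels) {
  labels <- as.factor(labels)
  vapply(levels(labels), function(k) {
    Xk <- X[labels == k, , drop = FALSE]
    centroid <- colMeans(Xk)
    mean(sqrt(rowSums(sweep(Xk, 2, centroid)^2)))
  }, numeric(1))
}

# Simulated example: one tight and one diffuse group in 5 dimensions
set.seed(1)
tight   <- matrix(rnorm(50 * 5, mean = 0, sd = 0.3), ncol = 5)
diffuse <- matrix(rnorm(50 * 5, mean = 4, sd = 1.5), ncol = 5)
X_sim <- rbind(tight, diffuse)
labels_sim <- rep(c("tight", "diffuse"), each = 50)

round(cluster_spread(X_sim, labels_sim), 2)
```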

ggplot(lol, aes(x = TSNE1, y = TSNE2, color = uncertainty)) +
  geom_point(size = 3, alpha = 0.7) +
  scale_color_viridis_c(option = "plasma", direction = -1) +
  theme_minimal() +
  labs(
    title = "t-SNE (2D) colored by fuzzy uncertainty",
    subtitle = paste0("perplexity = ", perp, " | higher uncertainty = more hybrid / potential misfit"),
    x = "t-SNE 1", y = "t-SNE 2"
  )

The uncertainty overlay on t-SNE agrees with and sharpens the UMAP findings. The green cluster (top center) is almost entirely dark purple. Despite forming a tight, internally coherent group, these champions are the most hybrid in the dataset relative to the two fuzzy prototypes. The same applies to the small Cluster 4 group near t-SNE coordinates (0, 3), which also shows high uncertainty. In contrast, the teal cluster (bottom left) is predominantly yellow at its core, indicating that ranged champions are the most archetypal and unambiguous group. The pink cluster (right) shows a clear gradient from yellow at the center to orange and pink toward the edges. This suggests that while most melee/tanky champions fit their archetype well, those at the periphery are the likeliest candidates for hybrid or bruiser classifications. The gradient appears in both non-linear embeddings (UMAP and t-SNE) and supports the validity of the fuzzy uncertainty scores.

topN <- 10
misfits <- lol |> dplyr::arrange(dplyr::desc(uncertainty)) |> head(topN)

ggplot(lol, aes(x = TSNE1, y = TSNE2, color = cluster)) +
  geom_point(size = 2.5, alpha = 0.6) +
  geom_point(data = misfits, aes(x = TSNE1, y = TSNE2), color = "black", size = 3.2) +
  ggrepel::geom_text_repel(
    data = misfits, aes(label = Name),
    size = 3, max.overlaps = Inf
  ) +
  theme_minimal() +
  labs(
    title = "t-SNE with top hybrid/misfit champions highlighted",
    subtitle = paste0("perplexity = ", perp, " | black points = highest uncertainty"),
    x = "t-SNE 1", y = "t-SNE 2"
  )

The labeled plot identifies the ten most statistically hybrid champions by name, and the selection is readily interpretable from a game-knowledge perspective. Bel’Veth, Vladimir, Gnar, Briar and Kled all appear within the green cluster but with high uncertainty. Their kits rely on unconventional resource mechanics or transforming playstyles, which makes them genuine design hybrids that the algorithm flags as difficult to classify. Kennen sits isolated in the Cluster 4 (purple) zone. This is plausibly because he is a ranged champion with melee-like durability stats who uses energy rather than mana. Nilah (K-Means Cluster 1) and Ryze (Cluster 3) appear at the edge of the pink cluster. Both are statistically unusual within their archetype: Nilah as a melee ADC, and Ryze as a mana-scaling champion with atypically high resource stats. Thresh sits deep in the teal cluster but is flagged as uncertain. Since he does use mana, his uncertainty more likely reflects other unusual base stats; for example, in the game he gains no armor per level and instead collects armor through his passive. Jhin appears as an isolated outlier at the far bottom of the teal group. This is consistent with his unusual attack speed mechanics, which make his stat profile unlike that of any other ranged champion in the dataset.


9.4 SOM (Self-Organizing Map)

if (!requireNamespace("kohonen", quietly = TRUE)) install.packages("kohonen")
library(kohonen)

set.seed(123)

X <- as.matrix(lol_scaled)

som_grid <- somgrid(xdim = 10, ydim = 10, topo = "hexagonal")

som_model <- kohonen::som(
  X = X,
  grid = som_grid,
  rlen = 200,
  alpha = c(0.05, 0.01),
  keep.data = TRUE
)

lol$SOM_unit <- som_model$unit.classif
total_nodes <- som_model$grid$xdim * som_model$grid$ydim
node_counts <- as.integer(table(factor(lol$SOM_unit, levels = 1:total_nodes)))
empty_nodes <- sum(node_counts == 0)

cat("SOM nodes:", total_nodes, "\n")
## SOM nodes: 100
cat("Empty nodes:", empty_nodes, "(", round(empty_nodes / total_nodes * 100, 1), "% )\n\n")
## Empty nodes: 22 ( 22 % )
codes <- som_model$codes[[1]]
bmu_codes <- codes[lol$SOM_unit, , drop = FALSE]
qe <- sqrt(rowSums((as.matrix(lol_scaled) - bmu_codes)^2))

cat("Mean quantization error:", round(mean(qe), 3), "\n")
## Mean quantization error: 1.82
cat("Median quantization error:", round(median(qe), 3), "\n")
## Median quantization error: 1.642
cat("Max quantization error:", round(max(qe), 3), "\n")
## Max quantization error: 5.497
plot(som_model, type = "dist.neighbours",
     main = "SOM U-Matrix (neighbor distances)")

The U-Matrix shows neighbor distances across the 10×10 SOM grid, where lighter colors indicate greater distance between adjacent nodes. In effect, it reveals the “walls” between clusters. The two bright white/yellow regions in the upper-center and left-center of the grid mark boundaries between champion archetypes. This suggests that the clusters are separated by gaps in feature space rather than gradually merging. The predominantly red background indicates that most neighboring nodes within each region are similar, consistent with the compact cluster structure seen in the previous methods.

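plot(
  som_model, type = "mapping",
  main = "SOM mapping (points colored by K-means cluster)",
  pchs = 19,
  col = as.integer(lol$cluster)   # base palette: 1 black, 2 red/pink, 3 green, 4 blue
)

# add.cluster.boundaries() expects one label per SOM unit, not per champion,
# so label each unit's codebook vector by its nearest K-means centroid
km_centroids <- apply(as.matrix(lol_scaled), 2,
                      function(col) tapply(col, lol$cluster, mean))
unit_cluster <- apply(codes, 1, function(w)
  which.min(colSums((t(km_centroids) - w)^2)))
add.cluster.boundaries(som_model, unit_cluster)

The mapping plot shows that the four K-means clusters occupy largely separate regions of the SOM grid with little mixing. Because col = as.integer(lol$cluster) uses base R’s default palette, the colors here differ from the ggplot figures: Cluster 1 is black, Cluster 2 red/pink, Cluster 3 green and Cluster 4 blue. Green (Cluster 3) therefore dominates the left and upper-left portion of the grid, and black (Cluster 1) covers the lower-right. Pink (Cluster 2) sits in the upper-right. The small blue Cluster 4 group appears as an isolated pair of nodes in the center, again pointing to a distinct archetype. The boundary lines separate neighboring units whose codebook vectors are closest to different K-means centroids. Where these lines coincide with the color changes of the mapped champions, the SOM and K-means partitions agree on this different representational framework.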

node_unc <- tapply(lol$uncertainty, lol$SOM_unit, mean)

prop <- rep(NA_real_, total_nodes)
prop[as.integer(names(node_unc))] <- as.numeric(node_unc)

if (!requireNamespace("viridisLite", quietly = TRUE)) install.packages("viridisLite")

plot(
  som_model, type = "property",
  property = prop,
  palette.name = viridisLite::viridis,
  main = "SOM nodes colored by mean fuzzy uncertainty"
)

The uncertainty map shows a clear spatial pattern. The highest-uncertainty nodes (yellow) are not uniformly distributed. They are concentrated in specific transition zones between cluster regions, particularly in the center and lower-left of the grid. The SOM has thus placed the most hybrid champions at the boundaries between archetypes, which is what a topology-preserving map should do. The deep purple nodes (low uncertainty) dominate the corners and edges, indicating that the most archetypal champions sit at the periphery of the map, away from the transition zones. With 167 champions spread over 100 nodes (about 1.7 per node), some empty nodes are expected. The 22% empty nodes and the mean quantization error of 1.82 (in standardized units) therefore suggest that the map has learned a reasonable representation of the feature space.
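One rough benchmark for the share of empty nodes is the number of nodes that would stay empty if the 167 champions were dropped onto the 100 nodes uniformly at random. This is an illustrative assumption only, since SOM training does not place champions at random, but it gives a sense of scale. Under random placement the expected empty share is (1 − 1/100)^167. The sketch below computes this value directly and checks it by simulation.

```r
# Illustrative benchmark only: SOM training does not place champions at random
n_champions <- 167
n_nodes <- 100

# Probability that a particular node receives none of the champions
p_empty <- (1 - 1 / n_nodes)^n_champions
round(100 * p_empty, 1)

# Monte Carlo check: drop champions on nodes at random and count empty nodes
set.seed(42)
sim_empty <- replicate(2000, {
  hits <- sample.int(n_nodes, n_champions, replace = TRUE)
  mean(tabulate(hits, nbins = n_nodes) == 0)
})
round(100 * mean(sim_empty), 1)
```

The formula gives about 18.7%. The observed 22% is of the same order, so it is not by itself a sign of an oversized grid.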

9.5 Cross-method Summary: Dimension Reduction

The results of all five dimensionality reduction methods (PCA, MDS, UMAP, t-SNE and SOM) are consistent and mutually reinforcing. When colored by the K-means labels, every method shows the same four groups. Cluster 3 (ranged/mana-dependent) and Clusters 1 and 2 (melee/tanky sub-archetypes) form the primary division, and the small Cluster 4 appears as a coherent isolated group rather than noise. PCA and MDS are equivalent for Euclidean distances, and they show that this structure is visible in a linear projection. The non-linear methods (UMAP and t-SNE) show that the clusters also form tight local neighborhoods with little overlap. The SOM reproduces the same layout through a different learning mechanism, with the U-Matrix boundaries broadly matching the K-means partition. The fuzzy uncertainty signal is equally consistent across views. Regardless of the method, hybrid champions appear either at the boundaries between clusters or, as with the green Cluster 2 group, as coherent groups lying far from both fuzzy prototypes. This suggests that the uncertainty scores measure a real property of the data rather than an artifact of a single embedding. Taken together, these results provide multi-method support that the champion archetypes identified in this analysis are stable and interpretable features of League of Legends stat design.


10. Champion Analysis (Fuzzy versatility)

specialists <- lol %>%
  arrange(uncertainty) %>%
  select(Name, uncertainty, max_membership, fuzzy_cluster_label) %>%
  head(10)

knitr::kable(
  specialists, digits = 3,
  col.names = c("Champion", "Uncertainty", "Max Membership", "Fuzzy label"),
  caption = "Top 10 Specialist Champions (Lowest Uncertainty)"
)
Top 10 Specialist Champions (Lowest Uncertainty)
Champion        Uncertainty   Max Membership   Fuzzy label
Twisted Fate          0.018            0.982   Ranged/Mana-like
Syndra                0.020            0.980   Ranged/Mana-like
Ahri                  0.025            0.975   Ranged/Mana-like
Zilean                0.027            0.973   Ranged/Mana-like
Jarvan IV             0.028            0.972   Melee/Tanky-like
Varus                 0.029            0.971   Ranged/Mana-like
Xin Zhao              0.030            0.970   Melee/Tanky-like
Malzahar              0.032            0.968   Ranged/Mana-like
Poppy                 0.032            0.968   Melee/Tanky-like
Ziggs                 0.032            0.968   Ranged/Mana-like

Specialist champions: The ten most archetypal champions are dominated by the Ranged/Mana-like group (seven of ten). Twisted Fate (uncertainty = 0.018, max membership = 0.982) and Syndra (0.020) are the most statistically “pure” champions in the dataset. Their stat profiles (long attack range, large mana pools, strong mana scaling) align so closely with the ranged prototype that the fuzzy model assigns them with near-certainty. The three melee representatives in this list (Jarvan IV, Xin Zhao, Poppy) are equally unambiguous within their archetype. They have high base armor, magic resistance scaling and short attack range, with almost no pull toward the ranged prototype (max membership ≥ 0.968).

hybrids <- lol %>%
  arrange(desc(uncertainty)) %>%
  select(Name, uncertainty, max_membership, fuzzy_cluster_label, cluster, hclust_cluster) %>%
  head(15)

knitr::kable(
  hybrids, digits = 3,
  col.names = c("Champion", "Uncertainty", "Max Membership", "Fuzzy label", "K-means", "Hierarchical"),
  caption = "Top Hybrid / Potential Misfit Champions (Highest Uncertainty)"
)
Top Hybrid / Potential Misfit Champions (Highest Uncertainty)
Champion        Uncertainty   Max Membership   Fuzzy label        K-means   Hierarchical
Briar                 0.497            0.503   Melee/Tanky-like         2              1
Gnar                  0.472            0.528   Melee/Tanky-like         2              1
Thresh                0.443            0.557   Ranged/Mana-like         3              2
Vladimir              0.441            0.559   Ranged/Mana-like         2              1
Kled                  0.436            0.564   Melee/Tanky-like         2              1
Jhin                  0.426            0.574   Ranged/Mana-like         3              2
Nilah                 0.418            0.582   Melee/Tanky-like         1              4
Bel’Veth              0.410            0.590   Melee/Tanky-like         2              1
Kennen                0.404            0.596   Ranged/Mana-like         4              3
Ryze                  0.399            0.601   Ranged/Mana-like         3              4
Wukong                0.398            0.602   Melee/Tanky-like         1              4
Rakan                 0.392            0.608   Melee/Tanky-like         1              4
Graves                0.389            0.611   Melee/Tanky-like         1              4
Taric                 0.386            0.614   Melee/Tanky-like         1              4
Kassadin              0.371            0.629   Ranged/Mana-like         3              2

Hybrid champions: The hybrid table tells a more nuanced story. Briar (0.497) and Gnar (0.472) have max memberships barely above 0.5, meaning the algorithm can hardly decide which archetype they belong to. Relative to the two prototypes, they are about as ambiguous as a champion can be. Comparing K-Means and Hierarchical labels requires care, because each method numbers its clusters arbitrarily. Within this table the labels pair up consistently: K-Means 1 with Hierarchical 4, 2 with 1, 3 with 2, and 4 with 3. Vladimir (2/1), Nilah (1/4) and Kennen (4/3) are therefore placed in matching clusters by both methods. The only genuine disagreement is Ryze. K-Means places him in Cluster 3 alongside Thresh, Jhin and Kassadin, while Hierarchical clustering places him in Cluster 4 with the K-Means Cluster 1 champions. This cross-method disagreement independently supports his borderline status. The top ten champions in this table have plausible in-game explanations for their hybridity, as discussed in the t-SNE section, which suggests these are genuine design patterns rather than data artifacts.
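Because K-means and hierarchical clustering number their groups arbitrarily, the labels have to be matched before disagreement can be counted. A simple approach, sketched below, cross-tabulates the two labelings and maps each K-means cluster to the hierarchical cluster it most often co-occurs with. Here it is applied only to the 15 champions in the table above, so the mapping is illustrative. It should be recomputed on the full data, for example with table(lol$cluster, lol$hclust_cluster).

```r
# K-means and hierarchical labels copied from the hybrid table above
hyb <- data.frame(
  champion = c("Briar", "Gnar", "Thresh", "Vladimir", "Kled", "Jhin", "Nilah",
               "Bel'Veth", "Kennen", "Ryze", "Wukong", "Rakan", "Graves",
               "Taric", "Kassadin"),
  kmeans = c(2, 2, 3, 2, 2, 3, 1, 2, 4, 3, 1, 1, 1, 1, 3),
  hclust = c(1, 1, 2, 1, 1, 2, 4, 1, 3, 4, 4, 4, 4, 4, 2),
  stringsAsFactors = FALSE
)

# Contingency table: rows = K-means labels, columns = hierarchical labels
tab <- table(hyb$kmeans, hyb$hclust)
tab

# Map each K-means cluster to its most frequent hierarchical partner
km_to_h <- setNames(as.numeric(colnames(tab))[apply(tab, 1, which.max)],
                    rownames(tab))
km_to_h

# Champions whose hierarchical label differs from the matched one
hyb$champion[hyb$hclust != km_to_h[as.character(hyb$kmeans)]]
```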


11. Interactive Champion Table

The interactive table provides a complete per-champion summary of all clustering results in one searchable view. It combines the K-Means assignment, Hierarchical clustering label, fuzzy archetype, uncertainty score, max membership, and all dimensionality reduction coordinates. By default it is sorted by descending uncertainty, so the most statistically ambiguous champions appear first.

The table serves as a reference tool. Readers can search for any specific champion to inspect how consistently it was classified across all methods, or filter by fuzzy label to compare uncertainty distributions within each archetype.

When comparing the K-Means and Hierarchical columns, remember that the two partitions are numbered independently. Different numbers can therefore denote the same group: KM1 and H4 correspond for Nilah, as do KM4 and H3 for Kennen. Genuine cross-method disagreement is an independent signal of borderline archetype membership that complements the fuzzy uncertainty scores. Ryze is an example: he is KM3 vs H4, where KM3 otherwise corresponds to H2.

champion_summary <- lol %>%
  select(
    Name, cluster, hclust_cluster, fuzzy_cluster_label,
    uncertainty, max_membership, entropy_norm,
    PC1, PC2, MDS1, MDS2, UMAP1, UMAP2, TSNE1, TSNE2, SOM_unit
  ) %>%
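  mutate(
    # Prefix the labels: the two partitions are numbered independently,
    # so KM1 and H1 need not refer to the same group
    cluster = paste0("KM", as.integer(cluster)),
    hclust_cluster = paste0("H", as.integer(hclust_cluster))
  ) %>%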
  arrange(desc(uncertainty))

datatable(
  champion_summary,
  options = list(pageLength = 15, autoWidth = TRUE),
  caption = "Interactive Champion Results (sorted by uncertainty)",
  filter = "top",
  rownames = FALSE
) %>%
  formatRound(
    columns = c(
      "uncertainty", "max_membership", "entropy_norm",
      "PC1", "PC2", "MDS1", "MDS2",
      "UMAP1", "UMAP2", "TSNE1", "TSNE2"
    ),
    digits = 3
  )

12. Key Findings

The most fundamental discovery of this analysis is that League of Legends champion design is organized around two orthogonal statistical principles that together explain 43.5% of all variance.

The first and strongest principle is the ranged vs. melee divide. Attack.range is the single most important feature in the entire dataset. It correlates negatively with almost every durability stat (armor, magic resistance, base HP) and with movement speed, consistent with a systematic trade-off between reach and survivability built into champion design by Riot Games.

The second principle is resource dependency. Mana-related stats form a tightly correlated cluster that is largely independent of the ranged/melee axis, so a champion’s resource identity is a separate design dimension from their combat positioning.
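The 43.5% figure is the sum of the proportions of variance explained by the first two principal components (28% for PC1 and 15.5% for PC2). For component k with variance λ_k (the squared standard deviation returned by prcomp), that proportion is λ_k / Σ λ_j.

The sketch below assumes the features were standardized before PCA. The champion data are not reproduced here, so the built-in mtcars dataset stands in purely to make the block runnable.

```r
# Sketch: proportion and cumulative proportion of variance explained by PCA.
# mtcars is only a stand-in dataset; the report applies this to champion stats.
pve_table <- function(x) {
  pca <- prcomp(x, center = TRUE, scale. = TRUE)  # standardize features first
  eig <- pca$sdev^2                               # variance of each component
  data.frame(
    component  = paste0("PC", seq_along(eig)),
    proportion = eig / sum(eig),
    cumulative = cumsum(eig) / sum(eig)
  )
}

pve <- pve_table(mtcars)
head(pve, 3)
```

The report's 43.5% is the equivalent of the `cumulative` value in the PC2 row of such a table, computed on the champion features.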

The hard clustering analysis reveals that these two principles together produce four stable archetypes:

- a large ranged/mana-dependent group (marksmen and mages);
- two distinct melee sub-archetypes, separated by resource scaling and sustain;
- a small but genuine fourth group of champions with extreme stat profiles, which a coarser partition would likely have absorbed into the larger groups.

The fuzzy analysis adds an important nuance: while most champions (median uncertainty 0.118) fit cleanly into one archetype, a meaningful minority are genuine statistical hybrids. The labeled t-SNE plot identifies these by name. Jhin, Nilah, Vladimir, Kennen, Thresh, and Gnar are among the champions whose stat profiles most straddle the boundary between archetypes. In each case the statistical hybridity reflects a real design decision, for example:

- a melee ADC (Nilah);
- a support whose armor comes from collected souls rather than level growth (Thresh);
- a transforming champion (Gnar);
- an unconventional resource mechanic (Vladimir’s health costs, Kennen’s energy).


13. Could this reflect data limitations rather than genuine design patterns?

This is an important methodological question, because several data issues could in principle produce apparent hybrids artificially. Three kinds of champion could appear as statistical outliers for reasons unrelated to their actual gameplay identity:

- champions with missing or imputed stats;
- champions added late in the game’s lifecycle with non-standard base values;
- champions whose mechanics are not fully captured by static base stats, such as transformation champions like Gnar, or Thresh, whose armor comes from collected souls rather than level growth.

However, several factors suggest the hybrid signal in this analysis is genuine rather than artifactual.

First, the same champions sit in the boundary regions between archetypes consistently across all five dimensionality reduction methods. If the hybridity were noise or a data error, it would be unlikely to reproduce so reliably.

Second, in every identified case the statistical ambiguity has a clear in-game explanation:

- Nilah is a melee champion designed to fill the ADC role.
- Jhin has a deliberately unique attack speed mechanic.
- Thresh gains armor from collected souls rather than per level.
- Kennen is a ranged champion with unusually high base durability.

Third, the uncertainty scores are continuous and graded rather than binary. This is more consistent with genuine design variation than with data errors, which would tend to produce sharp outliers.

The main genuine limitation is that base stats alone do not capture the full complexity of a champion’s kit: abilities, ability scaling, and itemization are not included in this dataset. Some champions may therefore appear statistically hybrid simply because their power is concentrated in their abilities rather than their base stats. That would be a dataset limitation rather than a reflection of true archetype ambiguity.


14. Conclusion

This analysis set out to answer three research questions using unsupervised learning on League of Legends champion statistics, and all three can now be answered with confidence.

How many distinct champion archetypes exist? The optimal hard partition is K=4, supported by convergent evidence from the Elbow method, NbClust consensus, silhouette analysis, and near-perfect agreement (ARI = 0.978) between K-Means and Hierarchical clustering. These four archetypes are stable, reproducible across algorithms, and visible in every dimensionality reduction method applied.
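The adjusted Rand index compares two partitions through their contingency table, so it is unaffected by how each method numbers its clusters. That is why a value of 0.978 can coexist with labels like KM2 and H1 naming the same group.

Below is a self-contained sketch of the standard ARI formula. In practice, mclust::adjustedRandIndex() (mclust is attached in this session) computes the same quantity, though it is an assumption that the report used it.

```r
# Sketch: adjusted Rand index computed from the contingency table.
adjusted_rand <- function(a, b) {
  tab <- table(a, b)
  sum_ij <- sum(choose(tab, 2))            # pairs together in both partitions
  sum_a  <- sum(choose(rowSums(tab), 2))   # pairs together in partition a
  sum_b  <- sum(choose(colSums(tab), 2))   # pairs together in partition b
  expected  <- sum_a * sum_b / choose(sum(tab), 2)  # value under chance
  max_index <- (sum_a + sum_b) / 2
  (sum_ij - expected) / (max_index - expected)
}

# Relabelling clusters does not change the index
a <- c(1, 1, 2, 2, 3, 3)
b <- c(3, 3, 1, 1, 2, 2)     # same partition, different label numbers
adjusted_rand(a, b)          # 1
# Moving one observation to another cluster lowers the index
adjusted_rand(a, c(3, 3, 1, 1, 2, 1))
```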

Which attributes best discriminate between champion types? Attack.range is the single most discriminative feature in the dataset, driving the primary axis of separation (PC1, 28% of variance) and negatively correlating with almost every durability stat. The secondary axis (PC2, 15.5%) is defined by mana dependency and resource scaling, revealing that champion identity is structured around two orthogonal design dimensions that appear to be deliberately built into Riot Games’ champion design philosophy.

Can hybridity be quantified using fuzzy memberships? Yes. The fuzzy c-means model (K=2, m=1.5) produces uncertainty scores that are geometrically meaningful, consistent across all five dimensionality reduction methods, and interpretable by name. Champions like Briar, Gnar, Thresh, and Jhin score highest not by accident but because their in-game design deliberately blends statistical properties from both archetypes.

Taken together, the results suggest that League of Legends champion design is not arbitrary. It follows an underlying statistical grammar with two primary axes and four stable role families. The unsupervised approach used here recovers this structure without any prior knowledge of champion roles or tags, which supports both the methodology and the insight. Future work could extend this analysis by incorporating ability data, patch history, or win-rate statistics, moving from descriptive archetype discovery toward predictive modeling of champion performance and balance.

Beyond academic interest, this analysis has a direct practical application for players. A player who enjoys a particular champion can use the clustering and uncertainty scores to find statistically similar alternatives: champions that share the same fundamental stat profile and may therefore feel similar to play.

For example, a player who enjoys Twisted Fate, the most archetypal Ranged/Mana-like champion, can look for low-uncertainty neighbours in the same cluster. A player drawn to hybrid champions like Gnar or Vladimir might instead find other high-uncertainty champions equally satisfying, given their similarly flexible stat designs. The interactive table provided in this report makes this kind of personalised exploration directly accessible.
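The recommendation idea above can be sketched as a nearest-neighbour lookup restricted to the chosen champion's cluster. The function is hypothetical. It assumes a data frame with Name, cluster, and uncertainty columns plus some numeric coordinates, for example the PC1/PC2 columns of the interactive table. It ranks same-cluster champions by Euclidean distance, optionally excluding those above an uncertainty cap. The demo data are made-up placeholders, not values from this analysis.

```r
# Hypothetical helper: same-cluster nearest neighbours in a chosen coordinate
# space, with an optional cap on fuzzy uncertainty.
similar_champions <- function(df, name, coords = c("PC1", "PC2"),
                              k = 3, max_uncertainty = Inf) {
  target <- df[df$Name == name, ]
  stopifnot(nrow(target) == 1)
  pool <- df[df$cluster == target$cluster & df$Name != name &
               df$uncertainty <= max_uncertainty, ]
  # Euclidean distance from the target to every remaining candidate
  diffs <- sweep(as.matrix(pool[, coords]), 2, unlist(target[coords]))
  pool$distance <- sqrt(rowSums(diffs^2))
  head(pool[order(pool$distance), c("Name", "distance", "uncertainty")], k)
}

# Made-up placeholder data (not results from this report)
toy <- data.frame(
  Name = c("A", "B", "C", "D", "E"),
  cluster = c(1, 1, 1, 2, 1),
  uncertainty = c(0.02, 0.05, 0.40, 0.03, 0.10),
  PC1 = c(2.0, 1.8, 1.9, -2.0, 2.5),
  PC2 = c(0.1, 0.3, 0.0, 0.2, -0.4)
)
similar_champions(toy, "A", k = 2)                        # C, then B
similar_champions(toy, "A", k = 2, max_uncertainty = 0.2) # B, then E
```

Capping uncertainty trades closeness for archetypal purity. In the toy data, C is nearest to A but is excluded once only low-uncertainty neighbours are allowed.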


Session Info

sessionInfo()
## R version 4.5.2 (2025-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.5
## 
## Matrix products: default
## BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: Europe/Warsaw
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] kohonen_3.0.13   Rtsne_0.17       uwot_0.2.4       Matrix_1.7-4    
##  [5] corrplot_0.95    mclust_6.1.2     DT_0.34.0        ggrepel_0.9.6   
##  [9] gridExtra_2.3    e1071_1.7-17     NbClust_3.0.1    factoextra_1.0.7
## [13] cluster_2.1.8.1  lubridate_1.9.4  forcats_1.0.1    stringr_1.6.0   
## [17] dplyr_1.1.4      purrr_1.2.1      readr_2.1.6      tidyr_1.3.2     
## [21] tibble_3.3.0     ggplot2_4.0.1    tidyverse_2.0.0 
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.6       xfun_0.55          bslib_0.9.0        htmlwidgets_1.6.4 
##  [5] rstatix_0.7.3      lattice_0.22-7     tzdb_0.5.0         crosstalk_1.2.2   
##  [9] vctrs_0.6.5        tools_4.5.2        generics_0.1.4     proxy_0.4-27      
## [13] pkgconfig_2.0.3    RColorBrewer_1.1-3 S7_0.2.1           lifecycle_1.0.4   
## [17] FNN_1.1.4.1        compiler_4.5.2     farver_2.1.2       carData_3.0-5     
## [21] htmltools_0.5.9    class_7.3-23       sass_0.4.10        yaml_2.3.12       
## [25] Formula_1.2-5      pillar_1.11.1      car_3.1-3          ggpubr_0.6.2      
## [29] jquerylib_0.1.4    cachem_1.1.0       viridis_0.6.5      abind_1.4-8       
## [33] RSpectra_0.16-2    gtools_3.9.5       tidyselect_1.2.1   digest_0.6.39     
## [37] stringi_1.8.7      reshape2_1.4.5     labeling_0.4.3     fastmap_1.2.0     
## [41] grid_4.5.2         cli_3.6.5          magrittr_2.0.4     broom_1.0.11      
## [45] withr_3.0.2        scales_1.4.0       backports_1.5.0    timechange_0.3.0  
## [49] rmarkdown_2.30     otel_0.2.0         ggsignif_0.6.4     hms_1.1.4         
## [53] evaluate_1.0.5     knitr_1.51         viridisLite_0.4.2  rlang_1.1.6       
## [57] dendextend_1.19.1  Rcpp_1.1.0         glue_1.8.0         rstudioapi_0.17.1 
## [61] jsonlite_2.0.0     R6_2.6.1           plyr_1.8.9