1. Introduction

This project performs a multivariate analysis of Pokemon battle statistics. We utilize Principal Component Analysis (PCA) for dimensionality reduction and K-Means Clustering to identify distinct battle archetypes (e.g., Tanks, Sweepers).

2. Data Loading & Preparation

We select the 6 numerical battle stats (HP, Attack, Defense, Sp. Atk, Sp. Def, Speed) and standardize each to zero mean and unit variance, so that no single stat dominates the distance calculations simply because of its scale.

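# 0. Packages used throughout this report
library(dplyr)       # select(), %>%
library(corrplot)    # corrplot()
library(factoextra)  # fviz_*() plotting helpers
library(cluster)     # silhouette()

# 1. Load data
df <- read.csv("Pokemon.csv")

# 2. Select only the numeric battle stats
df_stats <- df %>% select(HP, Attack, Defense, Sp..Atk, Sp..Def, Speed)

# Drop incomplete rows and keep the names aligned with the rows that remain
complete_rows <- complete.cases(df_stats)
df_stats <- df_stats[complete_rows, ]
pokemon_names <- df$Name[complete_rows]

# 3. Standardization (zero mean, unit variance per column)
df_scaled <- scale(df_stats)
rownames(df_scaled) <- pokemon_names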
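
As a quick illustration of what the standardization step does, the sketch below applies scale() to a small made-up table and checks the result. The toy values and the names toy_stats and toy_scaled are assumptions for illustration only; they are not taken from Pokemon.csv.

```r
# Toy data standing in for a few battle stats (values are made up)
set.seed(1)
toy_stats <- data.frame(
  HP     = rnorm(20, mean = 70, sd = 25),
  Attack = rnorm(20, mean = 80, sd = 30),
  Speed  = rnorm(20, mean = 65, sd = 28)
)

toy_scaled <- scale(toy_stats)

# After scaling, every column has mean ~0 and standard deviation 1
col_means <- colMeans(toy_scaled)
col_sds   <- apply(toy_scaled, 2, sd)
print(round(col_means, 10))
print(col_sds)

# scale() is equivalent to (x - mean(x)) / sd(x), column by column
manual_hp <- (toy_stats$HP - mean(toy_stats$HP)) / sd(toy_stats$HP)
print(all.equal(manual_hp, as.numeric(toy_scaled[, "HP"])))
```

The same check could be run on df_scaled to confirm that all six stats are on a common scale before clustering.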

3. Exploratory Data Analysis

We start by examining the relationships between different stats.

Correlation Heatmap

The heatmap below shows the pairwise Pearson correlations between the stats (e.g., Sp. Def and Defense tend to rise together).

M <- cor(df_scaled)
corrplot(M, method = "color", type = "upper", addCoef.col = "black",
         tl.col = "black", tl.srt = 45, title = "Correlation Heatmap", mar = c(0, 0, 2, 0))
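
The correlations in the heatmap can also be read as a ranked table. The following sketch uses made-up data in which one column is deliberately built to correlate with another; the names toy_stats and pair_table are hypothetical. It also shows that Pearson correlation is unaffected by standardization, so computing it on the scaled or the raw stats gives the same matrix.

```r
set.seed(2)
defense <- rnorm(50, mean = 70, sd = 30)
toy_stats <- data.frame(
  Defense = defense,
  Sp..Def = 0.6 * defense + rnorm(50, mean = 30, sd = 20),  # built to track Defense
  Speed   = rnorm(50, mean = 65, sd = 28)
)

M_raw    <- cor(toy_stats)
M_scaled <- cor(scale(toy_stats))

# Pearson correlation is unchanged by centering and scaling
print(all.equal(M_raw, M_scaled))

# List each variable pair once, ordered by absolute correlation
pairs_idx <- which(upper.tri(M_raw), arr.ind = TRUE)
pair_table <- data.frame(
  var1 = rownames(M_raw)[pairs_idx[, "row"]],
  var2 = colnames(M_raw)[pairs_idx[, "col"]],
  r    = M_raw[pairs_idx]
)
pair_table <- pair_table[order(-abs(pair_table$r)), ]
print(pair_table)
```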

4. Dimension Reduction (PCA)

We use PCA to compress the 6 dimensions into 2 main components for visualization.

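# Run PCA. df_scaled is already standardized, so center/scale. leave the
# data unchanged here; they are kept to make the intent explicit.
pca_res <- prcomp(df_scaled, center = TRUE, scale. = TRUE)

# Scree plot: percentage of variance explained by each component
fviz_eig(pca_res, addlabels = TRUE, ylim = c(0, 50), main = "Scree Plot")

# Biplot: Pokemon as points, stats as red loading arrows
fviz_pca_biplot(pca_res, label = "var", col.var = "red", alpha.ind = 0.5)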
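
The percentages on a scree plot come directly from the component standard deviations returned by prcomp(). The sketch below reproduces that calculation on random toy data; the data, and the names pca_toy, var_explained and scores_2d, are assumptions for illustration. It may help when judging how much information the first two components retain.

```r
set.seed(3)
stat_names <- c("HP", "Attack", "Defense", "Sp..Atk", "Sp..Def", "Speed")
toy_stats <- matrix(rnorm(100 * 6), ncol = 6, dimnames = list(NULL, stat_names))
# Make one pair of columns correlated so the first component has something to capture
toy_stats[, "Sp..Def"] <- toy_stats[, "Defense"] + rnorm(100, sd = 0.5)

pca_toy <- prcomp(toy_stats, center = TRUE, scale. = TRUE)

# Variance explained by each component = squared sdev / total variance
var_explained <- pca_toy$sdev^2 / sum(pca_toy$sdev^2)
cum_explained <- cumsum(var_explained)
print(round(var_explained, 3))
print(round(cum_explained, 3))

# Loadings: how each stat contributes to the first two components
print(round(pca_toy$rotation[, 1:2], 2))

# Scores: coordinates of each observation on the first two components
scores_2d <- pca_toy$x[, 1:2]
```

On the report's own object, the same quantities would be available from pca_res$sdev, pca_res$rotation and pca_res$x.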

5. K-Means Clustering

We partition the Pokemon into k = 3 groups; the elbow plot below serves as a check on that choice.

Optimal Clusters (Elbow Method)

fviz_nbclust(df_scaled, kmeans, method = "wss") + labs(title = "Elbow Method")
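
The elbow method plots the total within-cluster sum of squares (WSS) against the number of clusters and looks for the point where adding clusters stops paying off. A minimal base-R sketch of that calculation follows, using made-up two-dimensional data with three loose blobs; toy_scaled, k_values and wss are hypothetical names.

```r
set.seed(4)
# Three loose blobs of 30 points each (made-up data)
toy <- rbind(
  matrix(rnorm(60, mean = 0), ncol = 2),
  matrix(rnorm(60, mean = 4), ncol = 2),
  cbind(rnorm(30, mean = 0), rnorm(30, mean = 4))
)
toy_scaled <- scale(toy)

# Total within-cluster sum of squares for k = 1..8
k_values <- 1:8
wss <- sapply(k_values, function(k) {
  kmeans(toy_scaled, centers = k, nstart = 25)$tot.withinss
})
print(data.frame(k = k_values, wss = round(wss, 1)))

# With k = 1 the WSS equals the total sum of squares, (n - 1) * p for scaled data
total_ss <- (nrow(toy_scaled) - 1) * ncol(toy_scaled)
print(total_ss)
```

Calling plot(k_values, wss, type = "b") on these values draws the elbow curve by hand.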

Cluster Visualization

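# Run K-Means (fixed seed for reproducibility; best of 25 random starts)
set.seed(123)
km_res <- kmeans(df_scaled, centers = 3, nstart = 25)

# Visualize the clusters on the first two principal components
fviz_cluster(km_res, data = df_scaled, geom = "point", ellipse.type = "convex",
             palette = "jco", main = "K-Means Clustering on PCA Axes")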

Silhouette Plot (Quality Check)

This plot checks how well each Pokemon fits into its assigned cluster.

# Compute pairwise Euclidean distances and silhouette widths
dist_matrix <- dist(df_scaled, method = "euclidean")
sil_obj <- silhouette(km_res$cluster, dist_matrix)

# Plot
fviz_silhouette(sil_obj, palette = "jco", ggtheme = theme_minimal())
##   cluster size ave.sil.width
## 1       1  243          0.19
## 2       2  343          0.41
## 3       3  214          0.08

The average silhouette widths (0.19, 0.41, and 0.08) indicate that the clusters overlap considerably. Only cluster 2 is reasonably cohesive, while cluster 3's width is close to zero, meaning many of its members lie about as close to a neighboring cluster as to their own. This is expected for the Pokemon dataset, because battle stats vary continuously rather than falling into discrete, isolated groups. While distinct archetypes (e.g., Defensive Tanks vs. Fast Sweepers) exist, many Pokemon possess "hybrid" stats that place them on the boundaries between clusters.
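
For reference, the silhouette width of a point is s = (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is its mean distance to the members of the nearest other cluster. The sketch below computes per-cluster averages on made-up data with the cluster package (normally installed with R as a recommended package); sil_toy, avg_by_cluster and share_negative are hypothetical names.

```r
library(cluster)  # silhouette()

set.seed(5)
# Two blobs of 40 points each (made-up data)
toy <- rbind(
  matrix(rnorm(80, mean = 0), ncol = 2),
  matrix(rnorm(80, mean = 3), ncol = 2)
)
km_toy  <- kmeans(toy, centers = 2, nstart = 25)
sil_toy <- silhouette(km_toy$cluster, dist(toy))

# Per-cluster average width (the quantity reported as ave.sil.width)
avg_by_cluster <- tapply(sil_toy[, "sil_width"], sil_toy[, "cluster"], mean)
print(round(avg_by_cluster, 2))

# Overall average, and the share of points that sit closer to another cluster
overall_avg    <- mean(sil_toy[, "sil_width"])
share_negative <- mean(sil_toy[, "sil_width"] < 0)
print(round(c(overall = overall_avg, negative_share = share_negative), 2))
```

Applying the same two summaries to sil_obj would show how many Pokemon have negative widths, which is one way to quantify the "hybrid" cases.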

6. Hierarchical Clustering (Dendrogram)

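Hierarchical clustering offers an alternative view: instead of fixing the number of groups in advance, it builds a tree showing how Pokemon merge into progressively larger clusters.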

# Hierarchical clustering with Ward's method on the Euclidean distances
hc_res <- hclust(dist_matrix, method = "ward.D2")

# Plot the tree and outline the 3-cluster cut
plot(hc_res, labels = FALSE, main = "Hierarchical Dendrogram", xlab = "", sub = "")
rect.hclust(hc_res, k = 3, border = 2:4)
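
One way to compare the two methods is to cut the tree into the same number of groups and cross-tabulate the result against the K-Means assignment. The sketch below does this on made-up data; hc_groups, km_groups and agreement are hypothetical names. Cluster labels are arbitrary in both methods, so agreement shows up as each row having one dominant column rather than as a filled diagonal.

```r
set.seed(6)
# Three blobs of 30 points each (made-up data), standardized
toy <- scale(rbind(
  matrix(rnorm(60, mean = 0), ncol = 2),
  matrix(rnorm(60, mean = 5), ncol = 2),
  cbind(rnorm(30, mean = 0), rnorm(30, mean = 5))
))

# Ward hierarchical clustering, cut into 3 groups
hc_toy    <- hclust(dist(toy), method = "ward.D2")
hc_groups <- cutree(hc_toy, k = 3)

# K-Means with the same number of groups
km_groups <- kmeans(toy, centers = 3, nstart = 25)$cluster

# Cross-tabulate the two assignments
agreement <- table(hierarchical = hc_groups, kmeans = km_groups)
print(agreement)
```

In the report, the equivalent comparison would be table(cutree(hc_res, k = 3), km_res$cluster).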

6.1 Zoomed-In Dendrogram (Sample of 50)

Since the full dendrogram is too dense to read, we re-cluster a random sample of 50 Pokemon so that the individual names, and the way they merge, are legible.

# 1. Take a random sample of 50 Pokemon (row indices, without replacement)
set.seed(123)
sample_indices <- sample(seq_len(nrow(df_scaled)), 50)
df_sample <- df_scaled[sample_indices, ]

# 2. Compute distances and clustering for just this sample
dist_sample <- dist(df_sample, method = "euclidean")
hc_sample <- hclust(dist_sample, method = "ward.D2")

# 3. Plot the labeled dendrogram, cut into 3 colored groups
fviz_dend(hc_sample,
          k = 3,
          cex = 0.8,
          k_colors = "jco",
          rect = TRUE,
          rect_border = "jco",
          rect_fill = TRUE,
          main = "Dendrogram: Random Sample of 50 Pokemon")

7. Conclusion

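The analysis grouped Pokemon into three tentative archetypes. Given the low silhouette widths, these are best read as regions of a continuum rather than sharply separated classes: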

Cluster 1: Likely “balanced” or weaker Pokemon.

Cluster 2: Pokemon with high Defense/Sp.Def (Tanks).

Cluster 3: Pokemon with high Speed and Attack (Sweepers).
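
Archetype labels like these can be checked by averaging the original (unscaled) stats within each cluster. The sketch below illustrates the idea on made-up data with two hypothetical archetypes, slow and bulky versus fast and frail; toy_stats, km_toy and profile are assumed names, and the numbers are not from the Pokemon dataset.

```r
set.seed(7)
# Made-up stats: 30 bulky/slow and 30 frail/fast observations
toy_stats <- data.frame(
  Defense = c(rnorm(30, mean = 110, sd = 10), rnorm(30, mean = 60, sd = 10)),
  Speed   = c(rnorm(30, mean = 45, sd = 10),  rnorm(30, mean = 105, sd = 10))
)

# Cluster on standardized values, as in the report
km_toy <- kmeans(scale(toy_stats), centers = 2, nstart = 25)

# Profile each cluster by the mean of the original stats
profile <- aggregate(toy_stats, by = list(cluster = km_toy$cluster), FUN = mean)
print(round(profile, 1))
```

For the report's objects, aggregate(df_stats, by = list(cluster = km_res$cluster), FUN = mean) would give the corresponding table, against which the three descriptions above can be verified.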