The objective is to identify distinct profiles of offensive players (all roles except support and jungle) in League of Legends matches, characterized by performance metrics such as kills, deaths, damage dealt, and gold earned. To achieve this, a Principal Component Analysis (PCA) is first used to understand the relationships between the individual performance variables. A clustering algorithm (K-means) is then applied to identify natural groupings of players based on their playstyle and combat effectiveness. This approach allows exploring which profiles tend to achieve higher win rates, offering a more nuanced perspective than simply analyzing the played or system-suggested position.
# Main libraries for the analysis
library(dplyr) # Data manipulation
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2) # Visualization
library(tidyr) # Data reshaping
library(FactoMineR) # PCA
## Warning: package 'FactoMineR' was built under R version 4.4.3
library(factoextra) # PCA and clustering visualization
## Warning: package 'factoextra' was built under R version 4.4.3
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
library(cluster) # Clustering algorithms
library(class) # KNN, used to validate the clusters
library(fastDummies) # Dummy variable creation
## Warning: package 'fastDummies' was built under R version 4.4.3
library(grid)
library(gridExtra) # Multi-panel plot layout
## Warning: package 'gridExtra' was built under R version 4.4.3
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
library(caret) # Data partitioning and confusion matrix
## Warning: package 'caret' was built under R version 4.4.3
## Loading required package: lattice
library(lattice) # Required by caret
df <- read.csv("D:/UPV/2º/Proyecto II/lol_dataset.csv")
Before applying multivariate analysis techniques, it was necessary to perform a series of transformations on the data to ensure its quality and suitability for modeling.
# 2.1- Data Filtering
Records were filtered to retain only those players who played offensive roles: TOP, MID, and ADC. This decision is based on the fact that SUPPORT and JUNGLE roles have different gameplay dynamics, with less emphasis on directly earning gold and dealing damage, which could introduce noise into the analysis of offensive profiles.
# Keep only offensive players (exclude SUPPORT and JUNGLE)
df <- df %>%
  filter(!(team_position %in% c("SUPPORT", "JUNGLE")))
Variables reflecting individual player performance in combat and their impact on the match were selected:
From the first dataset (match data):
- gold_earned: Total gold earned (target variable of the analysis).
- kills: Number of kills.
- deaths: Number of deaths.
- assists: Number of assists.
- total_damage_dealt_to_champions: Total damage dealt to enemy champions.
From the second dataset (champion data):
- damage: Total damage dealt (including structures and minions).
- damage_dealt_to_turrets: Damage dealt to turrets.
- herotype and adaptivetype: Champion types, which will be transformed into dummy categorical variables.
These variables capture both direct offensive performance and the player’s playstyle.
data <- df %>%
select(gold_earned, kills, deaths, total_damage_dealt_to_champions,
herotype, adaptivetype, damage, damage_dealt_to_turrets,
assists)
The variables herotype and adaptivetype were transformed into dummy variables to be included in the Principal Component Analysis (PCA). The first category of each variable was removed to avoid collinearity.
All numerical variables were scaled (normalized) to have a mean of zero and a standard deviation of one. This step is essential in PCA, as it prevents variables with larger scales (such as damage or gold) from dominating the analysis.
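As a rough illustration of these two steps, the sketch below dummy-codes a categorical column and standardizes the result on a tiny made-up table. The values are hypothetical, and base R's `model.matrix` is used here in place of `fastDummies::dummy_cols`; dropping the intercept column leaves one dummy per non-reference level, analogous to `remove_first_dummy = TRUE`.

```r
# Toy data: hypothetical values, not taken from the League of Legends dataset
toy <- data.frame(
  kills    = c(2, 8, 5, 11),
  gold     = c(7000, 12000, 9500, 14000),
  herotype = c("Mage", "Fighter", "Mage", "Marksman")
)

# Dummy-code herotype. model.matrix() uses the first level ("Fighter") as the
# reference category; removing the intercept column leaves n - 1 dummies.
dummies <- model.matrix(~ herotype, data = toy)[, -1, drop = FALSE]

# Combine the numeric columns with the dummies and standardize every column
X <- cbind(as.matrix(toy[, c("kills", "gold")]), dummies)
X_scaled <- scale(X)

# After scaling, each column has mean 0 and standard deviation 1,
# so gold (thousands) no longer outweighs kills (units)
col_means <- colMeans(X_scaled)
col_sds   <- apply(X_scaled, 2, sd)
```

Note that `scale()` also standardizes the dummy columns, which is what happens in the analysis when the whole converted table is passed to it.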
# Convert herotype and adaptivetype into dummy variables
data_convertido <- data %>%
  mutate(across(c(herotype, adaptivetype), as.factor)) %>%
  mutate(across(where(is.factor), as.character)) %>%
  dummy_cols(select_columns = c("herotype", "adaptivetype"), remove_first_dummy = TRUE) %>%
  select(-herotype, -adaptivetype)
# Keep the champion names for later analysis
champions <- df$champion_name
# Standardize all variables (mean 0, standard deviation 1)
data_scaled <- scale(data_convertido)
The goal of this analysis is to divide offensive players (TOP, MID, and ADC) into groups with similar playstyles and, within each group, identify which champions generate the most gold, which is the chosen metric for individual performance.
To achieve this, a two-phase strategy is followed:
A Principal Component Analysis (PCA) is applied, not with the goal of reducing dimensionality, but to understand how performance variables (such as kills, damage, deaths, assists, etc.) relate to each other and how they cluster around the key variable: gold earned.
This analysis allows us to:
- Detect which variables tend to appear together in the same player profiles.
- Identify combinations of features that explain different offensive playstyles.
- Facilitate the later interpretation of the groups formed, helping to better define each player’s profile.
In summary, PCA acts as an exploratory tool that helps us understand what defines an effective offensive player and how playstyles differ from one another.
Based on the relationships observed in the PCA, the K-means algorithm is applied to group players into clusters or offensive profiles. Each cluster represents a type of player with a characteristic playstyle.
Once the groups are defined, the analysis focuses on:
- Which champions are most frequent within each profile.
- Which of those champions generate the most gold in that playstyle.
This enables personalized recommendations: if a player identifies with a specific profile, they can know which champions to choose to maximize their gold income without changing their playstyle.
To begin the analysis, a Principal Component Analysis (PCA) was applied to the previously normalized variables. Although dimensionality reduction was not the main goal, PCA was key as an exploratory tool to understand how the various offensive performance metrics relate to each other, especially around the target variable: gold earned.
The first 10 principal components were extracted, and their explained variance was analyzed. In the scree plot, a red horizontal line was added to represent the average explained variance.
The first 5 components were selected for further analysis, as all of them exceeded the average explained variance threshold and together explained more than 70% of the total variance. These components reveal combinations of variables that explain different aspects of offensive performance and will be essential for interpreting the player profiles obtained in the clustering phase.
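The selection rule described above (keep the components whose explained variance exceeds the average) can be sketched as follows. This is only an illustration on the built-in `mtcars` data with base R's `prcomp`, not the FactoMineR call or the dataset used in this report. With p standardized variables the average explained variance is 100 / p percent, which is equivalent to keeping eigenvalues greater than 1.

```r
# Illustration on the built-in mtcars data (not the League of Legends dataset)
pca <- prcomp(mtcars, center = TRUE, scale. = TRUE)

# Eigenvalues of the correlation matrix and percentage of variance explained
eigenvalues   <- pca$sdev^2
pct_explained <- 100 * eigenvalues / sum(eigenvalues)

# Average percentage explained per component: 100 / number of variables
avg_pct <- 100 / length(eigenvalues)

# Components above the average line, and the cumulative variance they reach
keep    <- which(pct_explained > avg_pct)
cum_pct <- cumsum(pct_explained)[max(keep)]
```

The same threshold is what the red dashed line represents in a scree plot.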
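# Run the PCA on the standardized data, keeping 10 components
res.pca <- PCA(data_scaled, scale.unit = TRUE, graph = FALSE, ncp = 10)
eig.val <- get_eigenvalue(res.pca)
# Average percentage of variance explained per component
VPmedio <- 100 * (1 / nrow(eig.val))
# Scree plot with the average explained variance as a red dashed line
fviz_eig(res.pca, addlabels = TRUE) +
  geom_hline(yintercept = VPmedio, linetype = 2, color = "red")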
Through the correlation and variable contribution plots generated by the PCA, patterns were identified that help understand how the different offensive performance metrics relate to each other. This stage is key to gaining a structured view of player behavior before applying any clustering technique.
Principal components do not represent individual variables, but rather linear combinations of them. Therefore, their interpretation helps uncover natural groupings of variables that tend to appear together in the same players. The main findings are summarized below:
Component 1 – Direct Offensive Impact
This component is dominated by variables such as gold_earned, kills, total_damage_dealt_to_champions, and damage. This indicates that these variables are strongly correlated and collectively define a pure offensive performance axis. In other words, players who deal a lot of damage and secure many kills also tend to generate more gold. This direct relationship with the target variable is especially relevant for the subsequent analysis.
Component 2 – Participation and Risk
This axis is characterized by deaths and assists, suggesting a more participatory but also riskier playstyle. Players appearing in this component are often involved in many team actions, which can result in both assists and deaths. This component helps distinguish players who contribute to the team without necessarily being the top damage dealers.
Components 3, 4, and 5 – Tactical and Structural Nuances
These components capture more specific aspects of gameplay. For example, damage_dealt_to_turrets appears as a separate dimension from champion damage, indicating that not all offensive players contribute to map progression in the same way. Additionally, the transformed categorical variables (herotype, adaptivetype) also weigh into these components, suggesting that the type of champion chosen influences playstyle and how other metrics combine.
Overall, PCA allows us to visualize how variables cluster around certain behavioral axes and how these groupings relate to gold generation. This information will be essential for interpreting the clustering results, as it will help us understand what kind of performance defines each player group and how they differ from one another.
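To make the idea of components as linear combinations concrete, the following sketch (again on a few built-in `mtcars` columns with base R's `prcomp`, chosen only for illustration) rebuilds the component scores by hand and computes the variable-to-component correlations, which are the quantities a correlation circle displays.

```r
# Illustration on four built-in mtcars columns (not the League of Legends dataset)
X   <- scale(mtcars[, c("mpg", "hp", "wt", "qsec")])
pca <- prcomp(X)

# Each component score is a weighted sum of the standardized variables,
# with the weights given by the columns of the rotation (loading) matrix
scores_manual <- X %*% pca$rotation

# Correlation of each original variable with each component
var_coord <- cor(X, pca$x)

# For standardized data these correlations equal loading * component sd
var_coord_check <- pca$rotation %*% diag(pca$sdev)
```

Variables with large correlations of the same sign on a component are the ones that tend to move together across observations.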
# Variable plots colored by contribution (the argument is gradient.cols)
fviz_pca_var(res.pca,
             axes = c(1, 2),
             repel = TRUE,
             col.var = "contrib",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"))
fviz_pca_var(res.pca,
             axes = c(3, 4),
             repel = TRUE,
             col.var = "contrib",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"))
fviz_pca_var(res.pca,
             axes = c(1, 5),
             repel = TRUE,
             col.var = "contrib",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"))
fviz_contrib(res.pca, choice = "var", axes = 1)
fviz_contrib(res.pca, choice = "var", axes = 2)
fviz_contrib(res.pca, choice = "var", axes = 3)
fviz_contrib(res.pca, choice = "var", axes = 4)
fviz_contrib(res.pca, choice = "var", axes = 5)
# Variable loadings (coordinates of the variables on the components)
var_cargas <- res.pca$var$coord
# Contribution of each variable to each component
var_contrib <- res.pca$var$contrib
To identify how many distinct offensive profiles exist among players, the K-means algorithm was applied, which requires predefining the number of groups (clusters) to form. To determine the optimal value of k, two complementary methods were used:
- Elbow Method (WSS): Evaluates intra-cluster variation. The goal is to find the point where the improvement from increasing the number of clusters begins to level off, indicating a balance between simplicity and accuracy.
- Silhouette Coefficient: Measures the internal cohesion of clusters and their separation from others. Higher values indicate better-defined groups.
Both methods agreed that k = 4 is an appropriate value, suggesting that there are four distinct offensive profiles among the players analyzed.
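The two criteria can be computed by hand as in the sketch below. It uses synthetic data with three well-separated groups (an assumption made purely for illustration), base R's `kmeans`, and `cluster::silhouette`, rather than the `fviz_nbclust` helper applied in the analysis.

```r
library(cluster)  # silhouette()

set.seed(1)
# Synthetic data: three well-separated 2-D groups (hypothetical, for illustration)
centers <- rbind(c(0, 0), c(6, 0), c(0, 6))
X <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(50, centers[i, 1], 0.5), rnorm(50, centers[i, 2], 0.5))
}))

ks  <- 2:6
wss <- numeric(length(ks))  # total within-cluster sum of squares (elbow)
sil <- numeric(length(ks))  # average silhouette width
d   <- dist(X)

for (i in seq_along(ks)) {
  km     <- kmeans(X, centers = ks[i], nstart = 25)
  wss[i] <- km$tot.withinss
  sil[i] <- mean(silhouette(km$cluster, d)[, "sil_width"])
}

# The silhouette criterion picks the k with the highest average width;
# the elbow is read off where wss stops dropping sharply
best_k <- ks[which.max(sil)]
```

On real data the two criteria do not always agree, so it helps to inspect both curves before fixing k.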
# Try different values of k (number of clusters)
fviz_nbclust(data_scaled, kmeans, method = "wss") +
  labs(title = "Elbow Method for Selecting K")
set.seed(123) # Fix a seed for reproducible results
k_optimo <- 4
# Silhouette and elbow (WSS) plots side by side
p1 <- fviz_nbclust(x = data_scaled, FUNcluster = kmeans, method = "silhouette",
                   k.max = 10, verbose = FALSE) +
  labs(title = "K-means: silhouette")
p2 <- fviz_nbclust(x = data_scaled, FUNcluster = kmeans, method = "wss",
                   k.max = 10, verbose = FALSE) +
  labs(title = "K-means: elbow (WSS)")
grid.arrange(p1, p2, nrow = 1)
Once the number of clusters was defined, the K-means algorithm was applied to the normalized data. Each player was assigned to one of the four groups, and the distribution of the data in the multidimensional space was visualized.
The visualization shows that the clusters are reasonably well separated, indicating that the identified profiles have distinct characteristics. This segmentation will be key in the following sections to analyze what defines each group and which champions are most effective within each playstyle. However, to ensure this, a nearest neighbor analysis was also applied.
# Run K-means
kmeans_result <- kmeans(data_scaled, centers = k_optimo, nstart = 25)
# Store the cluster labels alongside the PCA results
res.pca$cluster <- as.factor(kmeans_result$cluster)
fviz_cluster(kmeans_result, data = data_scaled) +
  labs(title = "K-means Clustering Visualization")
To assess the coherence and stability of the clusters generated by K-means, a supervised classification model was applied using the K-Nearest Neighbors (KNN) algorithm. The goal is to verify whether the identified offensive profiles can be automatically recognized based on the original variables, which would reinforce the validity of the clustering.
Procedure:
- The dataset was split into 80% for training and 20% for testing, maintaining the proportion of each cluster (stratified split).
- A KNN model was trained with k = 4 neighbors.
- The model’s performance was evaluated using a confusion matrix and classification metrics.
Results:
- Overall Accuracy: 99.47%
- Kappa Index: 0.9929, indicating near-perfect agreement between the model’s predictions and the actual clusters.
- 95% Confidence Interval for Accuracy: (0.9878, 0.9983)
- P-value vs. No Information Rate: < 2.2e-16, confirming that the model performs significantly better than always predicting the most frequent cluster.
The confusion matrix shows that the model correctly classifies almost all cases: the only errors are 5 observations from cluster 2 that were predicted as cluster 1. This indicates that the generated clusters are internally consistent and well-defined, and that the variables used are sufficient to distinguish between the different offensive profiles.
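The headline metrics can be recomputed directly from the counts in the confusion matrix reported in this section. The sketch below types those counts in by hand and uses only base R; it is meant as a check of the arithmetic, not as a replacement for `caret::confusionMatrix`.

```r
# Counts copied from the reported confusion matrix
# (rows = predicted cluster, columns = reference cluster)
cm <- matrix(
  c(237,   5,   0,   0,
      0, 276,   0,   0,
      0,   0, 184,   0,
      0,   0,   0, 250),
  nrow = 4, byrow = TRUE,
  dimnames = list(Prediction = as.character(1:4), Reference = as.character(1:4))
)

n        <- sum(cm)              # 952 test observations
accuracy <- sum(diag(cm)) / n    # share of correct predictions

# Cohen's kappa: agreement corrected for the agreement expected by chance
expected  <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa_hat <- (accuracy - expected) / (1 - expected)

# Per-class sensitivity (recall) and positive predictive value (precision)
sensitivity <- diag(cm) / colSums(cm)
ppv         <- diag(cm) / rowSums(cm)

round(c(accuracy = accuracy, kappa = kappa_hat), 4)  # 0.9947 and 0.9929
```

The 5 off-diagonal cases all sit in row 1, column 2, which is why only the sensitivity of class 2 and the positive predictive value of class 1 fall below 1.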
# Build a KNN model that predicts the cluster, as a validation step
set.seed(42)
# data.frame() keeps the numeric columns numeric (cbind() with the champion names would coerce everything to character)
data_clustered <- data.frame(data_scaled, cluster = as.factor(kmeans_result$cluster), champion_name = champions)
# Stratified partition based on the clusters
trainIndex <- createDataPartition(data_clustered$cluster, p = 0.8, list = FALSE)
# Split the scaled data
trainData <- data_scaled[trainIndex, ]
testData <- data_scaled[-trainIndex, ]
# Cluster labels
trainLabels <- as.factor(data_clustered$cluster[trainIndex])
testLabels <- as.factor(data_clustered$cluster[-trainIndex])
# Run KNN
knn_pred <- knn(train = trainData, test = testData, cl = trainLabels, k = 4)
# Evaluate with a confusion matrix
confusionMatrix(knn_pred, testLabels)
## Confusion Matrix and Statistics
##
## Reference
## Prediction 1 2 3 4
## 1 237 5 0 0
## 2 0 276 0 0
## 3 0 0 184 0
## 4 0 0 0 250
##
## Overall Statistics
##
## Accuracy : 0.9947
## 95% CI : (0.9878, 0.9983)
## No Information Rate : 0.2952
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9929
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 1 Class: 2 Class: 3 Class: 4
## Sensitivity 1.0000 0.9822 1.0000 1.0000
## Specificity 0.9930 1.0000 1.0000 1.0000
## Pos Pred Value 0.9793 1.0000 1.0000 1.0000
## Neg Pred Value 1.0000 0.9926 1.0000 1.0000
## Prevalence 0.2489 0.2952 0.1933 0.2626
## Detection Rate 0.2489 0.2899 0.1933 0.2626
## Detection Prevalence 0.2542 0.2899 0.1933 0.2626
## Balanced Accuracy 0.9965 0.9911 1.0000 1.0000
Once players were assigned to their respective clusters using the K-means algorithm, the average profile of each group was analyzed. To do this, the means of all normalized variables within each cluster were calculated, allowing us to identify which characteristics stand out in each group.
These averages are graphically represented using a line profile chart, where each colored line represents a different cluster. The X-axis shows the analyzed variables (such as damage, kills, assists, etc.), and the Y-axis shows their standardized mean values.
This type of visualization allows us to:
- Directly compare differences between clusters.
- Identify which variables define each group (e.g., one cluster with high damage and gold, another with many assists, etc.).
- Begin to infer the different offensive playstyles that exist among players, although a more detailed interpretation will be developed in the next section.
This analysis is key to moving from a mathematical grouping to a practical and understandable interpretation of offensive profiles, which will be completed with the analysis of the most frequent and effective champions in each group.
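A minimal sketch of the cluster-profile computation, using the built-in `iris` measurements and base R's `aggregate` (both chosen only for illustration), shows what these averages are: the mean of each standardized variable within each cluster, which coincides with the K-means centroids.

```r
set.seed(1)
# Illustration on the built-in iris measurements (not the League of Legends dataset)
X  <- scale(iris[, 1:4])
km <- kmeans(X, centers = 3, nstart = 25)

# Mean of every standardized variable within each cluster
profile <- aggregate(as.data.frame(X), by = list(cluster = km$cluster), FUN = mean)

# These per-cluster means are the K-means centroids
same_as_centroids <- isTRUE(all.equal(as.matrix(profile[, -1]), km$centers,
                                      check.attributes = FALSE))
```

Because the variables are standardized, a positive value in `profile` means the cluster is above the overall average on that variable and a negative value means it is below.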
# Build a data frame with the cluster labels
data_scaled_df <- as.data.frame(data_scaled)
data_scaled_df$cluster <- kmeans_result$cluster
# Compute the mean of each variable per cluster
# (anonymous function instead of the deprecated `...` argument of across())
mediasCluster <- data_scaled_df %>%
  group_by(cluster) %>%
  summarise(across(where(is.numeric), \(x) mean(x, na.rm = TRUE)))
# Convert to a matrix for plotting
mediasCluster_matrix <- as.matrix(mediasCluster[, -1])
# One color per cluster
colores <- rainbow(nrow(mediasCluster))
# Line profile chart
matplot(t(mediasCluster_matrix), type = "l", col = colores, ylab = "", xlab = "", lwd = 2,
        lty = 1, main = "Mean cluster profile", xaxt = "n")
# X-axis labels
axis(side = 1, at = 1:ncol(mediasCluster_matrix), labels = colnames(mediasCluster_matrix), las = 2)
# Legend
legend("topleft", legend = paste("Cluster", 1:nrow(mediasCluster)), col = colores, lwd = 2, ncol = 3, bty = "n")
Cluster 1: Players in this cluster show a profile focused on damage and kills, with above-average values in kills, gold earned, and damage dealt. Although their assist numbers are lower, they tend to play Marksman-type champions, suggesting a predominant role as ADCs or assassins in the bottom lane. This group represents an aggressive, self-sufficient playstyle aimed at securing kills.
Cluster 2: This group is characterized by a high number of assists and significant participation in objectives like turrets, although they also show a moderate number of deaths. Players often use Fighter-type champions, suggesting a role as top laners or active junglers in team fights. These players are constantly involved in the action, facilitating plays for the team and balancing damage, durability, and support.
Cluster 3: Members of this cluster excel in damage dealt but have low levels of kills, gold, and direct turret involvement. They are mostly Mage-type users, indicating a preference for ability power and zone control champions, likely in the mid lane or as long-range poke. Their approach is more strategic and positional, prioritizing poke and utility over direct aggression.
Cluster 4: This cluster represents players with high assist participation but very low values in kills, gold, damage, and objective control. Support and Tank roles dominate here, indicating a focus on team support, ally protection, and fight control. These players are key to team success despite not standing out in individual stats. Their style centers on teamwork, vision, and utility.
data_clustered_champions <- data.frame(
  champion_name = champions,
  cluster = as.factor(kmeans_result$cluster),
  gold_earned = df$gold_earned
)
min_porcentaje <- 0.025 # Minimum share of the cluster: 2.5%
# Count appearances and average gold per champion within each cluster
top10_champions <- data_clustered_champions %>%
  group_by(cluster) %>%
  mutate(total_cluster = n()) %>%
  group_by(cluster, champion_name) %>%
  summarise(
    count = n(),
    avg_gold = mean(gold_earned, na.rm = TRUE),
    total_cluster = first(total_cluster), # Cluster size for this group
    .groups = "drop"
  ) %>%
  mutate(porcentaje = count / total_cluster) %>%
  filter(porcentaje >= min_porcentaje) %>% # Keep champions above the minimum share
  arrange(cluster, desc(count)) %>%
  group_by(cluster) %>%
  slice_max(order_by = count, n = 10) # Ties are kept, so a cluster can show more than 10
# Summary per cluster
resumen_cluster <- top10_champions %>%
  group_by(cluster) %>%
  summarise(
    total_campeones_cluster = sum(count), # Total appearances of the retained champions
    n_campeones_que_cumplen = n(), # Champions that meet the minimum share
    oro_promedio_cluster = mean(avg_gold, na.rm = TRUE), # Mean of the champions' average gold
    .groups = "drop"
  )
print(resumen_cluster)
## # A tibble: 4 × 4
## cluster total_campeones_cluster n_campeones_que_cumplen oro_promedio_cluster
## <fct> <int> <int> <dbl>
## 1 1 628 10 8808.
## 2 2 789 10 12570.
## 3 3 566 13 11773.
## 4 4 598 10 10939.
clusters_a_mostrar <- unique(top10_champions$cluster)
# Keep only the data for those clusters
data_para_facetas <- top10_champions %>%
  filter(cluster %in% clusters_a_mostrar)
# Bar chart of average gold per champion, one panel per cluster
ggplot(data_para_facetas, aes(x = reorder(champion_name, -avg_gold), y = avg_gold, fill = champion_name)) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  labs(title = "Top 10 Champions per Cluster",
       x = "Champion",
       y = "Average Gold") +
  facet_wrap(~ cluster, ncol = 2, scales = "free_x") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Although the general profile of Cluster 1 is associated with offensive and balanced players, the most frequent champions in this group (Malphite, Cho’Gath, and Tahm Kench) are tanks with initiation and crowd control capabilities, suggesting a more robust version of the offensive style. These champions stand out for having high average gold, indicating that despite their more defensive or hybrid roles, they manage to accumulate resources efficiently, possibly through consistent farming or objective participation. Their presence may reflect a tendency to play self-sufficiently in solo lanes like TOP, where they can scale well and impact both fights and lane pressure.
Although Cluster 2 is generally associated with active and participatory fighters, the most frequent champions in this group are ADCs, suggesting that this cluster also includes bottom lane players with a highly participatory focus. These champions are known for high mobility, sustained damage output, and strong team fight presence, aligning with the profile of players who actively engage in objectives and assists. The presence of champions like Samira, Twitch, or Ezreal indicates a dynamic and aggressive playstyle, while Jhin or Caitlyn contribute zone control and lane pressure.
Although Cluster 3 is generally associated with tactical and safe mages, the most frequent champions in this group are melee fighters and hybrid assassins, such as Yasuo, Yone, Riven, or Irelia. These champions are characterized by an aggressive yet technical playstyle that requires strong individual mechanics and positioning, which may explain their association with a more “strategic” and “safe” profile. The presence of champions like Garen, Sett, or Darius also suggests a preference for top lane champions with durability and sustained damage.
Although Cluster 4 is generally associated with pure supports and tanks, the most frequent champions in this group are utility and control mages, such as Lux, Karma, Xerath, and Ahri, as well as scaling mages like Veigar and Viktor. These champions typically play poke, zone control, and utility roles, fitting a more passive, strategic, and team-oriented playstyle. Their average gold is moderate, reinforcing the idea that their impact is not measured by resource accumulation, but by their ability to support the team from a distance and maintain map control.
From the intersection of statistical performance and champion choice, four distinct offensive profiles crystallize:
Cluster 1 – Offensive and Balanced Players
General Profile: High damage, many kills, and strong economic performance. Aggressive and self-sufficient playstyle.
Representative Champions: Malphite, Cho’Gath, Tahm Kench.
Interpretation: Although ADCs were expected, tanky champions with offensive capabilities dominate, suggesting a robust and adaptable approach. These players combine durability and damage, excelling in gold efficiency and impact in fights and objectives.
Cluster 2 – Active and Team-Oriented Fighters
General Profile: High participation in assists and objectives, with a balance between damage and durability.
Representative Champions: Miss Fortune, Vayne, Jhin, Samira, Caitlyn, Twitch, Ezreal, Jinx, Corki, Ashe.
Interpretation: Although fighters were expected, this cluster is dominated by ADCs with strong presence in team fights. They represent a collaborative offensive style, with high mobility and constant map pressure.
Cluster 3 – Technical and Self-Sufficient Players
General Profile: High damage dealt, but low gold and direct objective involvement. Strategic and positional playstyle.
Representative Champions: Irelia, Garen, Sett, Riven, Darius, Mordekaiser, Yone, Yasuo, Aatrox, Jax.
Interpretation: Although mages were expected, this group consists of fighters and hybrid assassins. These are mechanically skilled players focused on dueling and lane control, with strong individual carry potential.
Cluster 4 – Tactical and Utility Supports
General Profile: High assist participation, but low values in kills, gold, and damage. Team-focused playstyle.
Representative Champions: Veigar, Viktor, Syndra, Vladimir, Aurora, Ahri, Helel, Lux, Xerath, Karma.
Interpretation: Although tanks were expected, utility and poke mages dominate. These players prioritize zone control, vision, and backline support, playing a fundamental role in team coordination.
These profiles are more than statistical clusters: they are tools for self-awareness and strategic refinement. Players who understand their natural style can adjust their champion pool and in-game decisions to align with their strengths, ultimately increasing their performance and impact.
To sum up, there is no single formula for playing offensively in League of Legends. Some players dominate by sheer damage output, others by mobility and coordination, others by raw mechanical skill or tactical support. Recognizing and embracing your own style is the first step to mastering it. Players can focus on champions and roles that naturally complement their style, thereby maximizing both performance and satisfaction in the game.