objective 1

The dataset is first read, and variables are selected based on their direct influence on a player’s performance in League of Legends, allowing for the measurement of different aspects of success:

champion_mastery_points & champion_mastery_level: These reflect a player's knowledge and experience with a particular champion. Higher mastery indicates a better understanding of its mechanics, which often translates into more consistent performance.

flex_tier, flex_wins & flex_losses: Matches in the flexible mode allow for the evaluation of skill in a variable team environment. A high number of victories in this mode indicates adaptability and teamwork.

solo_tier, solo_wins & solo_losses: Solo ranked matches serve as a key metric for assessing individual player skill. A high rank in this mode demonstrates strategic and mechanical ability to influence match outcomes without fully relying on the team.

vision_score: It is essential for evaluating map control and decision-making. A high vision score indicates effective placement and denial of vision, which significantly contributes to victory.

gold_earned: It indicates efficiency in resource accumulation. Players who generate more gold tend to have a greater impact, as they can acquire items more quickly, enhancing their performance in the match.

win: The most direct metric of success: winning matches. Analyzing this variable in combination with others helps identify the factors that contribute to a higher win rate.

summoner_level: Although not always a determinant of skill level, it reflects the player's overall experience in the game and may indicate extended playtime.

difficulty: This variable may be related to the difficulty of the champions used or the level of the opponents faced. It serves as an important measure for assessing how challenging the matches have been.

lol_dataset <- read.csv("D:/UPV/2º/Proyecto II/lol_dataset.csv")
columns <- c("champion_mastery_points", "champion_mastery_level", "flex_tier", "flex_wins", "flex_losses", "solo_tier", "solo_wins", "solo_losses", "vision_score", "gold_earned", "win", "summoner_level", "difficulty")  

data <- lol_dataset[, columns]

This code snippet calculates key metrics related to player performance in League of Legends and selects variables for further analysis, specifically for clustering players based on their characteristics.

First, two new variables are created to represent the win ratio in two game modes: solo ranked and flexible ranked. This allows for measuring each player’s efficiency in each type of match, based on the number of games won relative to the total played.

Next, the variables relevant for analyzing similarities between players are selected: champion mastery points and level, summoner level (total in-game experience), and the win ratios in both modes. A new dataset is then created using only these variables, so that players with similar characteristics can be grouped with a clustering algorithm.

data$solo_ratio <- data$solo_wins / (data$solo_wins + data$solo_losses)
data$flex_ratio <- data$flex_wins / (data$flex_wins + data$flex_losses)

v_clust <- c( "champion_mastery_points", "champion_mastery_level", "flex_ratio", "solo_ratio", "summoner_level")

df_clust <- data[, v_clust]

df_scaled <- data.frame(scale(df_clust))

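library(ggplot2)     # labs()
library(factoextra)  # fviz_nbclust()
library(gridExtra)   # grid.arrange()
set.seed(123)        # k-means starts from random centers
p1 <- fviz_nbclust(x = df_scaled, FUNcluster = kmeans, method = "silhouette",
                   k.max = 10, verbose = FALSE) + labs(title = "K-means: silhouette")
p2 <- fviz_nbclust(x = df_scaled, FUNcluster = kmeans, method = "wss",
                   k.max = 10, verbose = FALSE) + labs(title = "K-means: WSS")
grid.arrange(p1, p2, nrow = 1)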
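The win-ratio lines at the top of the code above divide wins by total games, which yields NaN for a player with no games in a queue, and scale(), prcomp() and kmeans() do not handle missing values gracefully. A minimal sketch of a guarded version follows; the helper name win_ratio and the toy data are hypothetical and not part of the original analysis.

```r
# Hypothetical helper (not in the original code): win ratio that returns NA
# instead of NaN when a player has no games in the queue.
win_ratio <- function(wins, losses) {
  total <- wins + losses
  ifelse(total > 0, wins / total, NA_real_)
}

# Toy data standing in for solo_wins / solo_losses
toy <- data.frame(solo_wins = c(10, 0, 3), solo_losses = c(5, 0, 9))
toy$solo_ratio <- win_ratio(toy$solo_wins, toy$solo_losses)
print(toy)

# Rows with a missing ratio have to be dropped (or imputed) before clustering
toy_complete <- toy[complete.cases(toy), ]
```

Whether dropping or imputing such players is appropriate depends on how many of them the dataset contains.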

Although the silhouette method suggests two clusters, we chose three: we wanted to understand why the method does not favor more than two groups and to see what kind of players make up the third cluster.
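The "wss" curve used above is the total within-cluster sum of squares for each candidate k. As a rough sketch of what the plot contains, it can be recomputed with base R alone; the synthetic matrix toy is an assumption standing in for df_scaled.

```r
set.seed(123)
# Synthetic stand-in for df_scaled: three well-separated groups in two dimensions
toy <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 6, sd = 0.3), ncol = 2)
)
toy <- scale(toy)

# Total within-cluster sum of squares for k = 1..10 (the "wss" criterion)
wss <- sapply(1:10, function(k) kmeans(toy, centers = k, nstart = 25)$tot.withinss)

# The "elbow" is the k after which the curve flattens out
plot(1:10, wss, type = "b",
     xlab = "Number of clusters k", ylab = "Total within-cluster SS")
```

On real data the elbow is often much less sharp than on this toy example, which is one reason to look at the silhouette criterion as well.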

The next step applies Principal Component Analysis (PCA) and then clusters the players with K-means. The reasoning behind each step is as follows:

Data normalization: The selected variables are scaled so that all have equal importance in the analysis, preventing those with larger values from dominating the results.

Library loading: Essential packages such as dplyr (for data manipulation), ggplot2 (for visualization), cluster, and factoextra (for clustering and PCA analysis) are loaded.

Principal Component Analysis (PCA): PCA is applied to reduce the dimensionality of the data and capture the most relevant information in just a few components.

Explained variance visualization: A plot is generated to show how much information is retained by each principal component, helping to determine how many to use.

Component selection: The first two principal components are retained so that the clusters can be plotted in two dimensions; the explained-variance plot shows how much of the total variability they capture.

Application of K-means on principal components: Players are grouped into three clusters using the K-means algorithm, allowing for the identification of similarity patterns.

Cluster assignment to the data: A cluster label is added to each player, identifying the group they belong to based on their performance.

Cluster visualization: A plot is created to show the distribution of players based on the first two principal components, with colors representing different groups.

pca <- prcomp(df_scaled, center = TRUE, scale. = TRUE)

#  Visualize the explained variance
fviz_eig(pca)

#  Obtain the principal components
pca_data <- as.data.frame(pca$x[, 1:2]) # Keep the first two components

#  Apply k-means to the principal components
set.seed(123)
kmeans_result <- kmeans(pca_data, centers = 3, nstart = 25)

#  Add the clusters to the PCA dataset
pca_data$cluster <- as.factor(kmeans_result$cluster)

# Visualize the clusters
ggplot(pca_data, aes(x = PC1, y = PC2, color = cluster)) +
  geom_point(size = 3) +
  labs(title = "Clustering with PCA + K-means") +
  theme_minimal()
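The explained variance that fviz_eig() plots can also be read numerically from the prcomp object, which makes it easier to judge how much information the first two components keep. The sketch below uses a synthetic data frame (toy, with made-up columns v1 to v5) rather than the real df_scaled, and the 80% threshold is an arbitrary assumption.

```r
set.seed(1)
# Synthetic stand-in for df_scaled: five variables, two of them nearly collinear
n <- 200
base <- rnorm(n)
toy <- data.frame(
  v1 = base,
  v2 = base + rnorm(n, sd = 0.1),
  v3 = rnorm(n),
  v4 = rnorm(n),
  v5 = rnorm(n)
)

pca_toy <- prcomp(toy, center = TRUE, scale. = TRUE)

# Proportion of variance per component and its running total
var_explained <- pca_toy$sdev^2 / sum(pca_toy$sdev^2)
cum_var <- cumsum(var_explained)
print(round(rbind(var_explained, cum_var), 3))

# Smallest number of components reaching the chosen threshold (80% is an assumption)
n_comp <- which(cum_var >= 0.80)[1]
```

If the first two components fall well short of the threshold, clusters found in the PC1-PC2 plane ignore a sizeable share of the variation between players.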

When PCA analysis and K-means clustering are applied to League of Legends player data, interesting conclusions can be drawn about their performance and play styles.

Player differentiation based on performance: Through the principal components, players can be grouped into different profiles. Some clusters may represent players with high champion knowledge, while others may focus on ranked match efficiency.

Influence of champion mastery: Players with higher mastery levels may form a distinct group in the results. Whether that deep knowledge of a champion also goes with a higher win rate has to be checked against the cluster means.

Patterns in win rates: Win ratios in soloQ and flexQ can help differentiate players who excel in individual strategies from those who rely more on team coordination.

Impact of game experience: The summoner level could be related to the generated clusters, indicating that more experienced players tend to have more stable performance.

Segmentation of play style: The visual distribution of the clusters in the plot gives clues about different player profiles, such as players characterized mainly by champion mastery and players characterized by their ranked win ratios.

K-means clustering is designed to group League of Legends players into three clusters based on their performance characteristics. A breakdown of what is happening:

Player grouping: The K-means algorithm is applied to divide the data into three groups, each containing players with similar performance patterns.

Cluster mean calculation: The average of each variable is obtained within each group, allowing key differences between clusters to be identified. For example, one cluster might represent players with high champion mastery, while another might group players with a high win rate.

Cluster interpretation: By analyzing the means, it is possible to determine which characteristics predominate in each group. This helps to understand which factors differentiate players and how they relate to success in the game.

p1 = fviz_nbclust(x = df_scaled, FUNcluster = hcut, method = "silhouette", 
                  hc_method = "ward.D2", k.max = 10, verbose = FALSE, hc_metric = "manhattan")
p2 = fviz_nbclust(x = df_scaled, FUNcluster = hcut, method = "wss", 
                  hc_method = "ward.D2", k.max = 10, verbose = FALSE, hc_metric = "manhattan")
grid.arrange(p1, p2, nrow = 1)

Here, the Manhattan distance is used, which measures the sum of absolute differences between observations. Unlike the Euclidean distance, which squares each difference and therefore penalizes large deviations more heavily, it weights every unit of difference equally. Manhattan distance is a reasonable choice when the variables are treated as independent dimensions and each unit of change matters equally. In this case, the dimensions of a player are the standardized clustering variables: champion mastery points, champion mastery level, the flex and solo win ratios, and summoner level. Manhattan distance accumulates the difference in each variable separately (for example, the gap in mastery points plus the gap in solo win ratio) rather than combining them into a single straight-line distance.
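A small worked example may make the contrast concrete. The three players below are hypothetical and described by three already standardized variables; only base R's dist() is used.

```r
# Three hypothetical players described by three standardized variables
players <- rbind(
  A = c(0, 0, 0),
  B = c(1, 1, 1),   # differs from A by 1 in every variable
  C = c(3, 0, 0)    # differs from A by 3 in a single variable
)

d_manhattan <- as.matrix(dist(players, method = "manhattan"))
d_euclidean <- as.matrix(dist(players, method = "euclidean"))

d_manhattan["A", c("B", "C")]  # B: 3, C: 3 -> same total absolute difference
d_euclidean["A", c("B", "C")]  # B: sqrt(3), about 1.73, C: 3 -> the single large gap weighs more
```

Under Manhattan distance B and C are equally far from A, whereas Euclidean distance treats C, with its one large deviation, as clearly farther away.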

set.seed(111)
distance <- dist(df_scaled, method = "manhattan")

# Apply hierarchical clustering
hc <- hclust(distance, method = "ward.D2")

# Dendrogram
plot(hc, labels = FALSE, main = "Dendrogram - Hierarchical Clustering")

# Cut the tree into k clusters
clusters <- cutree(hc, k = 3)

library(cluster)  # silhouette()
library(ggsci)    # pal_npg()

colores <- pal_npg("nrc")(3)  # one color per cluster
par(mfrow = c(1, 2))          # two silhouette plots side by side
plot(silhouette(clusters, distance), col = colores, border = NA, main = "WARD")
plot(silhouette(kmeans_result$cluster, distance), col = colores, border = NA, main = "K-MEANS")

K-means achieved a slightly higher average silhouette width (0.18) than Ward's method (0.15), so it separates the clusters marginally better. Both values are low, however, which indicates that the cluster structure is weak and the groups overlap considerably.
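For reference, the average silhouette width behind these figures can be computed by hand from a distance matrix. The function avg_silhouette below is a hypothetical base-R sketch of the standard definition (it assumes at least two clusters) and is demonstrated on synthetic data, not on the player dataset.

```r
# For each observation i: a = mean distance to the rest of its own cluster,
# b = smallest mean distance to any other cluster, s = (b - a) / max(a, b).
avg_silhouette <- function(d, cl) {
  d <- as.matrix(d)
  s <- sapply(seq_along(cl), function(i) {
    own <- cl == cl[i]
    own[i] <- FALSE
    if (!any(own)) return(0)  # singleton cluster: s = 0 by convention
    a <- mean(d[i, own])
    b <- min(sapply(setdiff(unique(cl), cl[i]), function(k) mean(d[i, cl == k])))
    (b - a) / max(a, b)
  })
  mean(s)
}

set.seed(42)
# Two well-separated synthetic groups of 30 points each
toy <- rbind(matrix(rnorm(60, 0, 0.5), ncol = 2),
             matrix(rnorm(60, 4, 0.5), ncol = 2))
d_toy <- dist(toy, method = "manhattan")

km_cl <- kmeans(toy, centers = 2, nstart = 25)$cluster
hc_cl <- cutree(hclust(d_toy, method = "ward.D2"), k = 2)

sil <- c(kmeans = avg_silhouette(d_toy, km_cl), ward = avg_silhouette(d_toy, hc_cl))
print(round(sil, 3))
```

Values near 1 indicate compact, well-separated clusters; values near 0 indicate observations lying between clusters, which is the situation the 0.18 and 0.15 reported above point to.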

df_scaled$cluster <- as.factor(kmeans_result$cluster)

# Mean of every numeric variable within each cluster
means_cluster <- df_scaled %>%
  group_by(cluster) %>%
  summarise(across(where(is.numeric), \(x) mean(x, na.rm = TRUE)))

print(means_cluster)

## # A tibble: 3 × 6
##   cluster champion_mastery_points champion_mastery_level flex_ratio solo_ratio
##   <fct>                     <dbl>                  <dbl>      <dbl>      <dbl>
## 1 1                        -0.162                 -0.164     0.570      0.551 
## 2 2                         3.04                   3.06     -0.0718     0.0244
## 3 3                        -0.229                 -0.229    -0.586     -0.579 
## # ℹ 1 more variable: summoner_level <dbl>

df_plot <- melt(means_cluster, id.vars = "cluster")
ggplot(df_plot, aes(x = variable, y = value, fill = cluster)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Cluster Profile", x = "Variables", y = "Mean") +
  theme_minimal()

Conclusions based on the clusters obtained from the analysis of League of Legends players:

Cluster 1: Players with High Win Ratios and Average Mastery

Values close to the average in champion_mastery_points and champion_mastery_level, indicating moderate knowledge of the champions played.

High flex_ratio and solo_ratio, indicating strong performance in both flex and solo queue matches.

Average summoner_level, indicating an intermediate level of experience.

Conclusion: These players win consistently in both ranked modes without relying on deep mastery of a single champion.

Cluster 2: Experienced Players with High Champion Mastery

High values in champion_mastery_points and champion_mastery_level, indicating deep knowledge of specific champions.

Flex_ratio and solo_ratio close to the average, suggesting average performance in both modes.

Above-average summoner_level, indicating that these players are experienced and have been playing for a long time.

Conclusion: This group represents experienced players who excel with certain champions and possess a solid gameplay foundation, although their performance in ranked matches does not stand out significantly.

Cluster 3: Players with Lower Specialization and Underperformance

Low values in champion_mastery_points and champion_mastery_level, indicating less specialization in champions.

Low flex_ratio and solo_ratio, suggesting below-average performance in both game modes.

Lower summoner_level, indicating that these players have less experience in the game.

Conclusion: This group represents players who are still developing their skills. They may be new to the game or have yet to find an optimal strategy to improve their performance in ranked matches.

This code performs a series of steps to analyze and visualize the gold_earned variable across different clusters:

Data normalization: The scale() function is used to standardize the gold_earned variable, ensuring that the values have a mean of 0 and a standard deviation of 1. This helps compare the data more effectively.

Mean gold by cluster: The data is grouped by cluster, and the mean of the standardized gold_earned is calculated (stored as success_rate in the code), providing a measure of the average economic performance of each group. Additionally, the number of observations in each cluster is counted.

Bar chart visualization: The ggplot2 package is used to generate a bar chart where:

The X-axis represents the different clusters.

The Y-axis shows the mean standardized gold_earned of each cluster.

Each bar is filled with a different color according to the cluster.

Numeric labels are added above the bars to enhance interpretation.

The Y-axis scale is adjusted using scale_y_continuous(expand = c(0, 0)) to improve visualization.

Overall, this code allows for evaluating how economic performance varies across clusters and facilitates visual comparison between them.

# Standardize 'gold_earned' (as.numeric() drops the matrix wrapper returned by scale())
df_scaled$gold_earned <- as.numeric(scale(data$gold_earned))

# Mean standardized gold by cluster (stored as success_rate)
cluster_success <- df_scaled %>%
  group_by(cluster) %>%
  summarise(
    success_rate = mean(gold_earned, na.rm = TRUE),
    count = n()
  )

# Improved graphic
ggplot(cluster_success, aes(x = factor(cluster), y = success_rate, fill = factor(cluster))) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = round(success_rate, 2)), vjust = -0.5, size = 5) +
  scale_y_continuous(expand = c(0, 0)) +
  scale_fill_viridis(discrete = TRUE) + 
  labs(title = "Mean gold per cluster", x = "Cluster", y = "Mean standardized gold_earned") +
  theme_minimal()
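The standardization step at the start of the code above can be checked on a handful of made-up values. The sketch also shows that scale() returns a one-column matrix, which is why converting it to a plain vector before storing it in a data frame can avoid surprises; the gold totals are invented for illustration.

```r
gold <- c(8000, 12000, 15000, 9500, 20000)  # made-up gold totals

z_matrix <- scale(gold)           # one-column matrix with centering/scaling attributes
z_vector <- as.numeric(z_matrix)  # plain numeric vector

is.matrix(z_matrix)  # TRUE
mean(z_vector)       # 0, up to floating-point error
sd(z_vector)         # 1

# Same result by hand: subtract the mean, divide by the sample standard deviation
z_manual <- (gold - mean(gold)) / sd(gold)
all.equal(z_vector, z_manual)
```

Because the standardized variable has mean 0 across all players, cluster means are expressed in standard deviations above or below the overall average.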

Cluster 1 has the highest mean gold: It is the only cluster with a positive mean, approximately 0.004 standard deviations above the overall average, so its players earn marginally more gold than those in the other clusters.

Clusters 2 and 3 have negative means: Both sit slightly below the overall average, with values close to -0.01. Differences this small give little evidence that these groups use their resources less efficiently.

Practical implications: If the goal of the analysis is to improve gold income, understanding what characterizes Cluster 1 (player habits, strategies employed, or available resources) could help, but the gaps between clusters first need to be tested for statistical significance.

This code performs a detailed analysis of the clusters based on the variable gold_earned and other relevant metrics. Here is the breakdown of what it does:

Cluster success rate calculation: The data is grouped by cluster, and the mean of gold_earned is calculated, providing a measure of the average economic performance of each group.

Boxplot Visualization: A boxplot is generated to observe the distribution of gold_earned in each cluster, allowing for the identification of differences in dispersion and potential outliers.

ANOVA for Statistical Differences: An analysis of variance (ANOVA) is run to determine if there are significant differences in gold_earned between clusters.

Post Hoc (Tukey) Test: If the ANOVA is significant, a Tukey test is performed to identify which clusters present significant differences.

Nonparametric (Kruskal-Wallis) Test: If the data do not meet the ANOVA assumptions, a nonparametric test is run to verify differences between groups.

Correlation Calculation: The correlation between gold_earned and other key variables such as vision_score, champion_mastery_points, summoner_level, among others, is calculated.

Cluster to Numeric Conversion: The cluster variable is transformed into numeric format for inclusion in the correlation analysis.

Correlation matrix visualization: A correlation graph is generated using corrplot, allowing you to visually identify relationships between variables.

This analysis provides a comprehensive view of cluster performance and helps identify key factors influencing success.

df_scaled$vision_score <- as.numeric(scale(data$vision_score))

# Mean standardized gold_earned and number of players per cluster
exito_por_cluster <- df_scaled %>%
  group_by(cluster) %>%
  summarise(
    tasa_exito = mean(gold_earned, na.rm = TRUE),
    conteo = n()
  )

# Boxplot of the gold_earned distribution in each cluster
ggplot(df_scaled, aes(x = factor(cluster), y = gold_earned, fill = factor(cluster))) +
  geom_boxplot() +
  scale_fill_brewer(palette = "Set1") +
  labs(title = "Gold Earned Distribution by Cluster", x = "Cluster", y = "Gold Earned") +
  theme_minimal()

# ANOVA to test for differences in mean gold_earned between clusters
anova_model <- aov(gold_earned ~ cluster, data = df_scaled)
summary(anova_model)

##               Df Sum Sq Mean Sq F value Pr(>F)
## cluster        2      3  1.5200    1.52  0.219
## Residuals   5936   5935  0.9998

# Post hoc Tukey test (only meaningful if the ANOVA is significant)
TukeyHSD(anova_model)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = gold_earned ~ cluster, data = df_scaled)
## 
## $cluster
##            diff        lwr        upr     p adj
## 2-1 -0.02345462 -0.1548866 0.10797734 0.9080515
## 3-1 -0.04668822 -0.1094588 0.01608237 0.1891118
## 3-2 -0.02323359 -0.1550009 0.10853367 0.9101330

# Nonparametric alternative in case the ANOVA assumptions are not met
kruskal.test(gold_earned ~ cluster, data = df_scaled)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  gold_earned by cluster
## Kruskal-Wallis chi-squared = 1.3856, df = 2, p-value = 0.5002

# Correlations between gold_earned and the other relevant variables
cor(df_scaled[, c("gold_earned", "vision_score", "champion_mastery_points", "champion_mastery_level", "flex_ratio", "solo_ratio", "summoner_level")], use = "complete.obs")

##                          gold_earned vision_score champion_mastery_points
## gold_earned              1.000000000  0.010844962             0.006137293
## vision_score             0.010844962  1.000000000             0.047099943
## champion_mastery_points  0.006137293  0.047099943             1.000000000
## champion_mastery_level   0.008360617  0.044311642             0.995674736
## flex_ratio               0.054664864 -0.009733826            -0.005752769
## solo_ratio               0.030502416  0.004483129             0.025506460
## summoner_level          -0.024058422  0.083432545             0.225686285
##                         champion_mastery_level   flex_ratio  solo_ratio
## gold_earned                        0.008360617  0.054664864 0.030502416
## vision_score                       0.044311642 -0.009733826 0.004483129
## champion_mastery_points            0.995674736 -0.005752769 0.025506460
## champion_mastery_level             1.000000000 -0.002589294 0.026231911
## flex_ratio                        -0.002589294  1.000000000 0.073982441
## solo_ratio                         0.026231911  0.073982441 1.000000000
## summoner_level                     0.228787688 -0.004678797 0.036148439
##                         summoner_level
## gold_earned               -0.024058422
## vision_score               0.083432545
## champion_mastery_points    0.225686285
## champion_mastery_level     0.228787688
## flex_ratio                -0.004678797
## solo_ratio                 0.036148439
## summoner_level             1.000000000

# Convert the cluster label to numeric so it can enter the correlation matrix
df_scaled$cluster <- as.numeric(as.factor(df_scaled$cluster))

# Compute the correlation matrix
correlaciones <- cor(df_scaled, use = "complete.obs")

# Visualize it with corrplot
library(corrplot)
corrplot(correlaciones, method = "color", type = "upper", tl.col = "black", tl.srt = 45)
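The testing logic described earlier (a Tukey test only when the ANOVA is significant, with Kruskal-Wallis as a nonparametric check) can be written as an explicit condition. The sketch below runs on synthetic data with no real group difference; the 5% level is a conventional assumption, not a choice stated in the analysis.

```r
set.seed(7)
# Synthetic stand-in: standardized gold for three clusters with no real difference
toy <- data.frame(
  cluster = factor(rep(1:3, each = 50)),
  gold_earned = rnorm(150)
)

fit <- aov(gold_earned ~ cluster, data = toy)
p_anova <- summary(fit)[[1]][["Pr(>F)"]][1]

# Run the post hoc test only when the omnibus ANOVA rejects at the 5% level
if (p_anova < 0.05) {
  print(TukeyHSD(fit))
} else {
  message("ANOVA not significant (p = ", round(p_anova, 3), "); skipping Tukey HSD")
}

# Rank-based alternative that does not assume normally distributed residuals
p_kruskal <- kruskal.test(gold_earned ~ cluster, data = toy)$p.value
```

Applied to the output above, the same rule would skip the Tukey comparisons, since the ANOVA p-value is 0.219.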

General Conclusion on the Distribution of Gold Earned by Cluster

Similar Distributions

The three clusters have a fairly similar distribution in terms of Gold Earned, both in terms of the median and the interquartile range (IQR).

This suggests that no group clearly stands out in terms of average gold earnings during matches.

Outliers in the Three Groups

All clusters have positive outliers, indicating that there are players in each group who earn significantly more gold than average.

There are also some negative outliers, but they are less frequent.

Cluster 3 Appears to Have a Slight Advantage

Although the differences are subtle, Cluster 3 (green) shows a slight tendency to have greater positive dispersion (high values), which could indicate that some players in this group are able to generate gold more efficiently.

This code generates a correlation heatmap using ggplot2 and reshape2. Here is the breakdown of what it does:

Converting the correlation matrix: melt(correlaciones) transforms the correlation matrix into long format, with one row per pair of variables, which is the shape ggplot2 expects.

Creating the heatmap:

ggplot(cor_data, aes(x = Var1, y = Var2, fill = value)) places the variables on the X and Y axes and uses the correlation value as the fill color; geom_tile() then draws one tile per pair of variables.

Using a color scale:

scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0) assigns colors based on the correlation value:

Blue for negative correlations.

White for values close to 0 (no correlation).

Red for positive correlations.

Applying a clean design:

theme_minimal() uses a minimalist style to improve readability.

labs(title = "Correlation Heatmap", x = "", y = "") adds a title and leaves the axis labels empty for a cleaner presentation.

This chart will allow you to intuitively visualize which variables are most closely related to each other.

library(ggplot2)
library(reshape2)

# Convert the correlation matrix to a long-format data frame
cor_data <- melt(correlaciones)

# Heatmap with ggplot2
ggplot(cor_data, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0) +
  theme_minimal() +
  labs(title = "Correlation Heatmap", x = "", y = "")
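To see what the long-format conversion produces, the same reshaping can be sketched in base R on a tiny made-up data frame. The use of as.table() here is a stand-in for reshape2::melt(), chosen only so the example has no package dependencies.

```r
# Tiny made-up data frame: y rises with x, z falls exactly linearly with x
toy <- data.frame(x = c(1, 2, 3, 4), y = c(2, 4, 6, 9), z = c(4, 3, 2, 1))
cor_mat <- cor(toy)

# One row per (Var1, Var2) pair, the same layout melt() gives for a matrix
cor_long <- as.data.frame(as.table(cor_mat))
names(cor_long) <- c("Var1", "Var2", "value")

nrow(cor_long)  # 9 rows for a 3 x 3 matrix
head(cor_long, 3)
```

Each row of cor_long corresponds to one tile of the heatmap, with value mapped to the fill color.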

High Correlation between Champion Mastery Level and Champion Mastery Points

Correlation very close to 1.0 (r ≈ 0.996, deep red).

This indicates that the higher a player's mastery level with a champion, the more mastery points they have. This relationship is expected, as both indicators measure the player’s experience with a specific champion.

Weak-to-Moderate Positive Correlations

Summoner Level has a weak-to-moderate correlation with Champion Mastery Level and Champion Mastery Points (r ≈ 0.23, light reddish tones).

This suggests that players with higher summoner levels tend to have more accumulated experience with champions.

Flex Ratio and Solo Ratio are not highly correlated with other variables.

This matters because both ratios summarize a player’s results over their whole ranked history, and only one cluster has high values for them.

Vision Score and Gold Earned have a low correlation with most variables.

This could imply that performance in terms of vision and gold is not strongly determined by experience (mastery or level), but by the player’s role or play style.

This code creates and visualizes a decision tree model with the goal of predicting the cluster variable based on the other characteristics of the dataset. Here is the breakdown of what it does:

Loading Required Packages:

rpart: Library used to create decision tree models in R.

rpart.plot: Package that allows you to visualize decision trees in a clear and structured way.

Installing the rpart.plot Package: Ensures the package is installed to use its visualization features.

Building the Decision Tree Model:

rpart(cluster ~ ., data = df_scaled, method = "class") fits a classification tree in which cluster is the target variable.

The model learns splitting rules from the other variables in df_scaled to predict cluster assignment.

Tree visualization:

rpart.plot(modelo_arbol) displays the decision tree graphically, allowing you to interpret the generated rules and the most important factors in the classification.

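library(rpart)
library(rpart.plot)

# Make sure the target is a factor so rpart treats it as a class label
df_scaled$cluster <- as.factor(kmeans_result$cluster)

# Classification tree: cluster is the target, every other column is a predictor
# (with kmeans_result$cluster on the left, "." would also include df_scaled$cluster)
modelo_arbol <- rpart(cluster ~ ., data = df_scaled, method = "class")

# Plot the tree
rpart.plot(modelo_arbol)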
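One detail of the formula interface is worth illustrating: the "." on the right-hand side expands to every column of the data that is not on the left-hand side, so the class label should be a column of the data frame itself and appear only as the response. The sketch below uses the built-in iris data as a stand-in, with Species playing the role of the cluster label.

```r
library(rpart)

# iris ships with R; Species plays the role of the cluster label here
toy <- iris

# The response is a factor column of the data and appears only on the left,
# so "." expands to the four measurement columns and the label cannot leak
tree <- rpart(Species ~ ., data = toy, method = "class")

pred <- predict(tree, toy, type = "class")
table(predicted = pred, actual = toy$Species)
mean(pred == toy$Species)  # training accuracy, optimistic by construction
```

A tree that reaches perfect accuracy with a single split on a label-like column is usually a sign that the target has leaked into the predictors.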

The decision tree graph provides a clear view of how players have been classified into the three clusters based on key variables. A more in-depth analysis is provided below:

Cluster 1: Specialization and Consistency

Key Variables: Champion Mastery Points & Champion Mastery Level → Indicate a high level of knowledge of specific champions.

Solo Ratio & Flex Ratio → Performance differences between solo and team modes.

Behavior and Interpretation: This group consists of players who have developed a clear specialization in certain champions, accumulating high levels of mastery.

Advantages: These players often demonstrate greater mastery of specific mechanics, ensuring more consistent performance with their favorite champions.

Limitations: Their summoner level (summoner_level) is relatively low or moderate, indicating that they may not have as much experience in the game overall.

Conclusion: Highly competent players with their preferred champions, capable of excelling in games based on their specific knowledge, even if they are not yet veterans in general terms.

Cluster 2: Intermediate and Varied Group

Key Variables: There is no single variable that stands out, but rather a balanced combination of Champion Mastery Points, Summoner Level, Flex Ratio, and Solo Ratio.

Behavior and Interpretation: This cluster groups the largest number of players, showing a more heterogeneous performance without a clearly defined pattern.

Advantages: They possess moderate characteristics in almost all metrics, suggesting versatile players.

Limitations: They do not have a decisive advantage in any of the variables analyzed, making them less specialized or strategically outstanding.

Conclusion: Group representing players with average performance, without a clear inclination toward extreme specialization or poor performance.

Cluster 3: Emerging and Talented Players

Key Variables: Low Champion Mastery Points → Little accumulated experience with champions.

High Flex Ratio → Excellent performance in flex games.

Low Summoner Level → Indicates that they are relatively new players or have secondary accounts (smurfs).

Behavior and Interpretation: Although these players have been in the game for a short time, their flex stats indicate great performance.

Advantages: Their ability to excel in flex may be due to previous skills in other games, quick learning, or efficient team strategies.

Limitations: Because they have less overall experience in the game, they may face a steeper learning curve in other modes.

Conclusion: These players progress quickly and are talented in flex games, suggesting that, despite their lesser experience, they have the potential to become strong players in the future.

library(ggplot2)
library(dplyr)

# Convert 'difficulty' to a categorical variable
df_scaled$difficulty <- as.factor(data$difficulty)

# Filter the dataset for cluster 3
data_cluster <- df_scaled[df_scaled$cluster == "3", ]

# Bar plot of champion complexity within the cluster
ggplot(data_cluster, aes(x = difficulty)) +
  geom_bar(fill = "skyblue") +
  labs(title = "Champion complexity in cluster 3",
       x = "Complexity",
       y = "Count") +
  theme_minimal()

# Filter the dataset for cluster 2
data_cluster <- df_scaled[df_scaled$cluster == "2", ]

# Bar plot of champion complexity within the cluster
ggplot(data_cluster, aes(x = difficulty)) +
  geom_bar(fill = "skyblue") +
  labs(title = "Champion complexity in cluster 2",
       x = "Complexity",
       y = "Count") +
  theme_minimal()

# Filter the dataset for cluster 1
data_cluster <- df_scaled[df_scaled$cluster == "1", ]

# Bar plot of champion complexity within the cluster
ggplot(data_cluster, aes(x = difficulty)) +
  geom_bar(fill = "skyblue") +
  labs(title = "Champion complexity in cluster 1",
       x = "Complexity",
       y = "Count") +
  theme_minimal()

GENERAL CONCLUSION OF THE OBJECTIVE:

As this analysis has shown, the players fall into three clusters. Summarizing the previous description, Cluster 3 consists of low-level, inexperienced players with generally low stats. It is the least informative of the three and contributes little beyond separating out novice players. Clusters 1 and 2 both consist of higher-level players and differ primarily in champion mastery, which is far higher in Cluster 2. The cluster division was explored further with the decision tree above.

Based on this differentiation, the following question was posed: Is it more worthwhile, in terms of match performance and career results, to master one character or to divide one’s time learning the mechanics of several? League of Legends is a highly complex game, with over 160 characters and over 200 items, of which a player can combine any six in each match, resulting in billions of possible combinations, and even the smallest variation matters. Each champion is unique, and knowing how to use them all is virtually impossible. Many players choose to dedicate themselves entirely to one champion and master it; others prefer to learn several and control each of them to a lesser extent.
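The "billions of combinations" figure can be checked with a quick count. The sketch below assumes exactly 160 champions and 200 items (the text only says "over"), treats a build as an unordered set of six distinct items, and ignores any in-game restrictions on which items can be combined.

```r
n_champions <- 160  # assumption: the text says "over 160"
n_items     <- 200  # assumption: the text says "over 200"

# Unordered sets of six distinct items
item_sets <- choose(n_items, 6)

# One champion paired with one six-item set
builds <- n_champions * item_sets

format(item_sets, big.mark = ",", scientific = FALSE)
format(builds, big.mark = ",", scientific = FALSE)
```

Under these simplifying assumptions there are roughly 82 billion six-item sets, and about 13 trillion champion and build pairings.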

To address this question, we sought to determine which cluster performed better in matches: Cluster 1, with low mastery levels, or Cluster 2, with high levels. Using gold as a measure of individual performance in the match, we found that Cluster 1 had the highest average gold, although its advantage was very small. We then checked whether any variable correlated with gold_earned and found no strong relationship; only a slight positive correlation with flex_ratio (r ≈ 0.05) was observed.

By definition, flex_ratio measures the number of wins in flex mode relative to the number of games played, meaning it acts as a measure of success throughout a player’s career in the game. It makes sense that there is a slight correlation between gold_earned and flex_ratio, as players who excel in flex mode may have a better understanding of team strategy, which is very important in the game. Another similar variable by definition is solo_ratio, although this refers to solo mode, where the player’s individual skill takes precedence over team strategy, unlike in flex.

Focusing on Clusters 1 and 2, both flex_ratio and solo_ratio stand out in the first cluster, which means its players have performed much better than the other clusters throughout their careers. The gold earned by players in the first cluster is also slightly higher than in the rest, although neither the ANOVA (p = 0.219) nor the Kruskal-Wallis test (p = 0.50) finds this difference statistically significant. Taken together, and mainly on the strength of the win ratios, players in this cluster perform better in the game. Finally, it is worth noting that Cluster 1 differs from Cluster 2 primarily in its lower champion mastery levels, which suggests that its players vary more in their strategies and character selection.

In conclusion, the results suggest that players with greater diversity in their character selection perform better in their games and have better career trajectories. This diversification may be the reason for their success: by knowing more game variants, they better understand how to counter their opponents’ strategies and skills.

Alexa

2025-05-15