## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.0.6     ✓ dplyr   1.0.4
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows

Abstract

League of Legends is an online video game that has been around for over a decade and had a monthly active user base of 115 million people in 2021. People who play the game, pros and casual players alike, claim that the first step to becoming proficient is to practice CS’ing. CS is short for creep score, a tally of how many minions a player kills in a game. Since players can spend hundreds or even thousands of hours on this game, I wanted to test the validity of this claim: do better players kill more minions per game on average?

\[ H_{0}: \text{There is no correlation between rank and creep score} \]
\[ H_{a}: \text{There is a correlation between rank and creep score} \]
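The hypotheses are tested with a simple linear regression of creep score on rank. They can therefore be restated in terms of the slope \(\beta_1\) of the population regression line. This is the standard textbook formulation. In simple linear regression, a slope of zero is equivalent to zero correlation, so it is the same test:

\[ \text{Creep Score} = \beta_0 + \beta_1 \times \text{Rank} + \varepsilon \]
\[ H_{0}: \beta_1 = 0 \qquad H_{a}: \beta_1 \neq 0 \]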

My personal hypothesis is that, on average, higher-ranked players kill more minions per game than lower-ranked players. To test it, I collected data directly from the League of Legends API. I then fit a linear regression of total minions killed on rank for players in laning roles, with rank encoded as a number from 0 (Iron IV) to 26 (Challenger). The model shows a positive correlation between rank and creep score (r ≈ 0.23), and the fitted line is:

\[ \text{Creep Score} = 126.9362 + 2.7321 \times \text{Rank} \]

Based on these results, I would advise a new player to practice and get better at CS’ing.

Part 1 - Introduction

I have been playing a game named “League of Legends” on and off for approximately a decade. During my journey as a player, I began to practice and study methods to get better at the game. One of the most common pieces of advice I heard from friends and from professional players online was: “Get better at CS’ing.” I won’t go too deep into the mechanics of the game, but CS stands for creep score: how many creeps (minions) a player kills throughout the game. It’s a simple aspect of the game, but becoming proficient at it is important for getting better overall. I want to answer the question: are higher-ranked players better at CS’ing than lower-ranked players?

I will try to answer this question by comparing the creep scores of lower-ranked players with those of higher-ranked players, using a linear model to check for a positive correlation between rank and creep score. The data was pulled from the League of Legends API, which can be found here: https://developer.riotgames.com/apis. Unfortunately, I do not have a production API key, and development keys are more tightly rate-limited. As a result, it took my script quite a while to pull the data necessary for this analysis. The script can be found here: https://github.com/jglendrange/data606/blob/main/leagueAPI.ipynb. This is something I have always been interested in, and I’m glad I have the opportunity to test its validity.

Part 2 - Data

As I mentioned in the introduction, the data was obtained using the League of Legends API. I used a website that generates random matchIds (each matchId corresponds to an individual game). Each match returns performance metrics for every player who participated in it. I queried 1,000 matches, which gave me data on 10,000 random players (10 per match).

Now, let’s take a look at the data set I put together. It has eight columns plus a row index (X). The columns we are most interested in are rank and totalMinionsKilled, which we will use to create the linear model. After our first attempt at the model, we may also be interested in role and lane, because these can influence totalMinionsKilled in a game. For example, when role = “DUO_SUPPORT” we can expect a low creep score, since support players are not focused on killing minions.

head(playerStats) %>%
  kbl() %>%
  kable_styling()
| X | gameId | visionScore | goldEarned | totalMinionsKilled | role | lane | userId | rank |
|---|---|---:|---:|---:|---|---|---|---|
| 0 | 3856873255 | 13 | 9173 | 139 | SOLO | MIDDLE | F_7N8wSdjoPdUw1SBn6P2t62PeDArMYt_GmtbU4RJe9rr9k | SILVERII |
| 1 | 3856873255 | 39 | 8568 | 12 | NONE | JUNGLE | TAdsWNVJiqCj0hA5npp4-z5EKXkkBJw7ss5vOhdFldZw-kI | GOLDIV |
| 2 | 3856873255 | 10 | 8377 | 155 | SOLO | TOP | M01izMLGcM-Q5WjIkGooQHDSljWmjY0uIDHyeV-lCO6hEYM | GOLDIV |
| 3 | 3856873255 | 10 | 11397 | 168 | DUO_CARRY | BOTTOM | hQwpywrGvpGKyK0Dcz006NBlzUe4mEewMjEQuPgkgq5JrYk | GOLDIV |
| 4 | 3856873255 | 37 | 7035 | 25 | DUO_SUPPORT | BOTTOM | v1MLAUQfrq1FadXN0cPPbxkpAKxxLsdkcNUmv0kfuOM6TiA | GOLDIV |
| 5 | 3856873255 | 13 | 12398 | 147 | SOLO | TOP | 4DnCOG5yQV5SFo5n59BaBU2gXk-AjGgZUTJflaHOaIvqaXk | GOLDIII |

Every player enters a game with a rank, which they receive after playing a minimum number of placement games. Ideally, players rise through the ranks as they get better at the game. The top ranks are sparsely populated: the highest, Challenger, holds only hundreds of players. There are 27 ranks in total. Six tiers (Iron, Bronze, Silver, Gold, Platinum, Diamond) have four divisions each, and Master, Grandmaster, and Challenger complete the list. Our sample’s rank distribution is fairly close to the published distribution at https://www.leagueofgraphs.com/rankings/rank-distribution.

playerStats %>%
  group_by(rank) %>%
  summarise(count = n()) %>%
  ggplot(aes(x=reorder(rank, -count), y=count)) + geom_bar(stat="identity") + coord_flip() 

Part 3 - Exploratory data analysis

First, let’s look at the summary statistics for all of our fields to get a better understanding of our data.

summary(playerStats)
##        X             gameId           visionScore       goldEarned   
##  Min.   :    0   Min.   :3.857e+09   Min.   :  0.00   Min.   :  667  
##  1st Qu.: 3137   1st Qu.:3.857e+09   1st Qu.: 12.00   1st Qu.: 8011  
##  Median : 6274   Median :3.857e+09   Median : 19.00   Median :10326  
##  Mean   : 6274   Mean   :3.857e+09   Mean   : 23.62   Mean   :10554  
##  3rd Qu.: 9412   3rd Qu.:3.857e+09   3rd Qu.: 29.00   3rd Qu.:12821  
##  Max.   :12549   Max.   :3.857e+09   Max.   :171.00   Max.   :38844  
##  totalMinionsKilled     role               lane              userId         
##  Min.   :  0.0      Length:12550       Length:12550       Length:12550      
##  1st Qu.: 35.0      Class :character   Class :character   Class :character  
##  Median :112.0      Mode  :character   Mode  :character   Mode  :character  
##  Mean   :105.3                                                              
##  3rd Qu.:162.0                                                              
##  Max.   :403.0                                                              
##      rank          
##  Length:12550      
##  Class :character  
##  Mode  :character  
##                    
##                    
## 

Now let’s look at our target variable, totalMinionsKilled. Interestingly, its histogram is bimodal. I suspect the lower peak comes from the support and jungle roles, which do not kill as many minions throughout the game. We will likely have to separate the roles.

hist(playerStats$totalMinionsKilled)

Let’s look at the average and median totalMinionsKilled by lane and role. My suspicions were correct: DUO_SUPPORT and NONE have low averages.

playerStats %>%
  group_by(lane, role) %>%
  summarise(avg = mean(totalMinionsKilled),
            med = median(totalMinionsKilled)) %>%
  kbl() %>%
  kable_styling()
## `summarise()` has grouped output by 'lane'. You can override using the `.groups` argument.
| lane | role | avg | med |
|---|---|---:|---:|
| BOTTOM | DUO | 109.80000 | 94.0 |
| BOTTOM | DUO_CARRY | 164.94041 | 163.0 |
| BOTTOM | DUO_SUPPORT | 34.04340 | 32.0 |
| BOTTOM | SOLO | 109.20253 | 111.5 |
| JUNGLE | NONE | 46.03748 | 34.0 |
| MIDDLE | DUO | 158.50820 | 155.0 |
| MIDDLE | DUO_CARRY | 159.62000 | 152.5 |
| MIDDLE | DUO_SUPPORT | 47.48077 | 37.0 |
| MIDDLE | SOLO | 153.28166 | 151.0 |
| NONE | DUO | 50.22530 | 14.0 |
| NONE | DUO_SUPPORT | 60.41154 | 61.0 |
| TOP | DUO | 168.65972 | 165.0 |
| TOP | DUO_CARRY | 160.08571 | 156.0 |
| TOP | DUO_SUPPORT | 52.04762 | 36.0 |
| TOP | SOLO | 161.00209 | 158.0 |
ggplot(data = playerStats, aes(x=totalMinionsKilled, y=role)) + geom_boxplot()

Let’s see how the histogram changes when I remove these two roles. It becomes a fairly normal-looking, unimodal distribution, which is a good sign for our analysis.

playerStats %>%
  filter(role != "DUO_SUPPORT" & role != "NONE") %>%
  ggplot(aes(x = totalMinionsKilled)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Here I break the distribution out by rank to get a visual feel for how each rank’s distribution looks. Some ranks have much larger populations than others. Making the groups equal in size would be difficult, so for now I use them as they are.

playerStats %>%
  filter(role != "DUO_SUPPORT" & role != "NONE") %>%
  ggplot(aes(x = totalMinionsKilled)) + geom_histogram() + 
  facet_wrap(vars(rank))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

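This section of code converts the ranks into numeric form, from 0 (Iron IV) to 26 (Challenger). Each tier below Master contributes a multiple of 4, and the division (IV, III, II, I) adds 0 to 3 on top of that.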

# Tier component: each tier below Master spans four divisions (Iron = 0 ... Diamond = 20).
# GRANDMASTER must be tested before MASTER, because grepl("MASTER", ...) also matches "GRANDMASTER".
playerStats$rank_num1 <- ifelse(grepl("IRON", playerStats$rank), 0,
                         ifelse(grepl("BRONZE", playerStats$rank), 4,
                         ifelse(grepl("SILVER", playerStats$rank), 8,
                         ifelse(grepl("GOLD", playerStats$rank), 12,
                         ifelse(grepl("PLATINUM", playerStats$rank), 16,
                         ifelse(grepl("DIAMOND", playerStats$rank), 20,
                         ifelse(grepl("GRANDMASTER", playerStats$rank), 25,
                         ifelse(grepl("MASTER", playerStats$rank), 24,
                         ifelse(grepl("CHALLENGER", playerStats$rank), 26, 0)))))))))

# Division component: IV = 0, III = 1, II = 2, I = 3; apex tiers have no divisions.
# "III$" is tested before "II$" and "I$" because the shorter patterns also match "III".
playerStats$rank_num2 <- ifelse(grepl("III$", playerStats$rank), 1,
                         ifelse(grepl("II$", playerStats$rank), 2,
                         ifelse(grepl("MASTER|CHALLENGER", playerStats$rank), 0,
                         ifelse(grepl("I$", playerStats$rank), 3, 0))))

# Combined numeric rank: 0 (Iron IV) to 26 (Challenger)
playerStats$rank_num <- playerStats$rank_num1 + playerStats$rank_num2
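Nested `ifelse()` chains like this are easy to get wrong. For example, "MASTER" is a substring of "GRANDMASTER", so the order of the patterns matters. A lookup-table version is one alternative. The sketch below assumes rank strings look like the ones in this data set: a tier name immediately followed by a Roman-numeral division (e.g. "GOLDIV"), with apex tiers optionally followed by "I". The function name `rank_to_num` is hypothetical, not part of the original script. Unlike the code above, it returns `NA` for unrecognized strings such as "INVALID" instead of silently mapping them to 0.

```r
# Hypothetical helper (not from the original script): encode ranks such as
# "GOLDIV" or "CHALLENGERI" as numbers from 0 (Iron IV) to 26 (Challenger).
rank_to_num <- function(rank) {
  # Base value of each divisioned tier (four divisions per tier)
  tiers <- c(IRON = 0, BRONZE = 4, SILVER = 8, GOLD = 12,
             PLATINUM = 16, DIAMOND = 20)
  # Apex tiers have a single fixed value
  apex <- c(MASTER = 24, GRANDMASTER = 25, CHALLENGER = 26)
  # Offset added by the division within a tier
  divisions <- c(IV = 0, III = 1, II = 2, I = 3)

  encode_one <- function(r) {
    # Strip a trailing Roman numeral to get the tier name
    tier <- sub("(IV|III|II|I)$", "", r)
    # Whatever was stripped is the division ("" if there was none)
    div <- substring(r, nchar(tier) + 1)
    if (tier %in% names(apex)) return(apex[[tier]])
    if (tier %in% names(tiers) && div %in% names(divisions)) {
      return(tiers[[tier]] + divisions[[div]])
    }
    NA_real_  # unrecognized rank string, e.g. "INVALID"
  }
  vapply(rank, encode_one, numeric(1), USE.NAMES = FALSE)
}

print(rank_to_num(c("SILVERII", "GOLDIV", "GOLDIII", "GRANDMASTERI", "INVALID")))
```

Applied to this data set, it would be used as `playerStats$rank_num <- rank_to_num(playerStats$rank)`. Returning `NA` means invalid ranks drop out of `lm()` by default rather than being counted as Iron IV.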

The first plot below shows totalMinionsKilled against rank_num for the entire data set. Based on the previous graphs, I expect that removing the support and jungle roles will help our model considerably.

# geom_jitter() alone; adding geom_point() as well would draw every point a second time
ggplot(data = playerStats, aes(x = rank_num, y = totalMinionsKilled)) + geom_jitter()

Filtering the data set down to laning players (and dropping records with an INVALID rank or with five or fewer minions killed) reveals a clear trend: the higher the rank, the higher the player’s creep score tends to be on average.

laners <- playerStats %>%
  filter(role != 'DUO_SUPPORT' & role != 'NONE' & rank != "INVALID" & totalMinionsKilled > 5)

ggplot(data = laners, aes(x = rank_num, y = totalMinionsKilled)) + geom_jitter() + geom_smooth(method = "lm")
## `geom_smooth()` using formula 'y ~ x'

As a check, we can compute the correlation between the two variables and confirm that it is positive.

cor(laners$rank_num,laners$totalMinionsKilled)
## [1] 0.23255
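In a simple linear regression, the Pearson correlation is tied directly to the fitted line: the slope equals \(r \, s_y / s_x\), and the \(R^2\) equals \(r^2\). Here \(0.23255^2 \approx 0.0541\), which matches the Multiple R-squared of 0.05408 reported for the model in Part 4. The sketch below checks these identities in base R on a small made-up data set. The vectors `rank_num` and `cs` are invented for illustration and are not taken from the match data.

```r
# Toy data, invented for illustration only (not the League of Legends sample)
rank_num <- c(0, 3, 8, 12, 14, 17, 20, 24)
cs       <- c(118, 140, 131, 165, 150, 172, 160, 190)

# Pearson correlation from its definition: covariance over the product of SDs
r_manual <- cov(rank_num, cs) / (sd(rank_num) * sd(cs))

# Simple linear regression of creep score on numeric rank
fit <- lm(cs ~ rank_num)

# The slope equals r * s_y / s_x, and R^2 equals r^2
slope_from_r <- r_manual * sd(cs) / sd(rank_num)
r_squared    <- summary(fit)$r.squared

cat("r =", round(r_manual, 4), "\n")
cat("slope from lm =", round(unname(coef(fit)["rank_num"]), 4),
    " slope from r =", round(slope_from_r, 4), "\n")
cat("R^2 =", round(r_squared, 4), " r^2 =", round(r_manual^2, 4), "\n")
```

A correlation of 0.23 therefore means rank accounts for only about 5% of the variance in creep score, even though the relationship is clearly positive.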
ggplot(data = playerStats, aes(x = rank_num, y = totalMinionsKilled)) + geom_jitter() + facet_wrap(vars(role)) + geom_smooth(method = "lm")
## `geom_smooth()` using formula 'y ~ x'

Part 4 - Inference

Now let’s fit a linear regression model to the laners data set and see what line is returned. Note the Pr(>|t|) column. It gives the probability, assuming the true slope is zero (no relationship), of observing a t statistic at least as extreme in absolute value as the one we got. Our p-value for rank_num is extremely small (< 2e-16), so a relationship this strong would be very unlikely to appear by chance alone if rank and creep score were truly unrelated. We can therefore conclude that the result is statistically significant.

\[ \text{Creep Score} = 126.9362 + 2.7321 \times \text{Rank} \]
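Plugging a few ranks into this equation shows what the slope means in practice. Rank numbers follow the 0 to 26 encoding from Part 3.

\[ \text{Iron IV } (\text{Rank} = 0): \quad 126.9362 + 2.7321 \times 0 \approx 126.94 \]
\[ \text{Gold IV } (\text{Rank} = 12): \quad 126.9362 + 2.7321 \times 12 \approx 159.72 \]
\[ \text{Challenger } (\text{Rank} = 26): \quad 126.9362 + 2.7321 \times 26 \approx 197.97 \]

Moving up one full tier (four divisions) adds about \(4 \times 2.7321 \approx 10.9\) minions per game on average.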

minion_model <- lm(totalMinionsKilled ~ rank_num, data = laners)
summary(minion_model)
## 
## Call:
## lm(formula = totalMinionsKilled ~ rank_num, data = laners)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -161.649  -27.721   -1.721   27.671  248.743 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 126.9362     1.6138   78.66   <2e-16 ***
## rank_num      2.7321     0.1391   19.64   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 46.02 on 6748 degrees of freedom
## Multiple R-squared:  0.05408,    Adjusted R-squared:  0.05394 
## F-statistic: 385.8 on 1 and 6748 DF,  p-value: < 2.2e-16
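The t value, p-value, and F-statistic in this output all follow from the slope estimate and its standard error. The sketch below recomputes them in base R using only numbers copied from the summary above: the estimate 2.7321, the standard error 0.1391, and the 6748 residual degrees of freedom. The 95% confidence interval is an extra quantity not reported in the original output.

```r
# Values copied from summary(minion_model)
estimate <- 2.7321   # slope for rank_num
std_err  <- 0.1391   # standard error of the slope
df_resid <- 6748     # residual degrees of freedom

# t statistic for H0: slope = 0
t_stat <- estimate / std_err

# Two-sided p-value: probability, if the true slope were 0,
# of a t statistic at least this large in absolute value
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

# With a single predictor, the overall F statistic is the square of the slope's t
f_stat <- t_stat^2

# 95% confidence interval for the slope
ci <- estimate + c(-1, 1) * qt(0.975, df = df_resid) * std_err

cat("t =", round(t_stat, 2), "\n")
cat("p =", format(p_value, digits = 3), "\n")
cat("F =", round(f_stat, 1), "\n")
cat("95% CI:", round(ci, 3), "\n")
```

The recomputed t (about 19.64) and F (about 385.8) agree with the summary. The interval lies entirely above zero, which is another way of seeing that the positive slope is significant.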

Now that we have created our model, let’s check whether linear regression is appropriate for this data set by going through each condition:

Linearity: The scatter plot of totalMinionsKilled against rank_num shows a positive linear trend. This condition is met.

Nearly normal residuals: The Q-Q plot and histogram below show that the residuals are approximately normally distributed. This condition is met.

Constant variability: The residual plot below shows no noticeable trend, and the spread of the residuals looks roughly constant. This condition is met.

Independent observations: The site I pulled the matchIds from claims that they are random, and our rank distribution is fairly close to that of the actual player population. This condition is met.

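All of the conditions for a least squares line have been met, so a linear model is appropriate for describing this relationship. Keep in mind, though, that rank explains only about 5% of the variation in creep score (R² = 0.054). Most of the game-to-game variation comes from other factors.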

qqnorm(minion_model$residuals)
qqline(minion_model$residuals)

hist(minion_model$residuals)

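# Residuals vs. fitted values, used to check for constant variability
plot(fitted(minion_model), residuals(minion_model), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)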

Part 5 - Conclusion

In conclusion, there is a positive relationship between rank and creep score. The p-value for the slope (Pr(>|t|) < 2e-16) is far below the usual 0.05 significance level, so a relationship this strong would be very unlikely if rank and creep score were truly unrelated. We therefore reject the null hypothesis that there is no correlation between the two variables.

Furthermore, after evaluating the residuals, the necessary conditions for linear regression appear to be met, so a linear model is a reasonable fit for this data set. With these results, I would feel comfortable telling a beginner getting into the game to invest time in improving this skill.

References

https://developer.riotgames.com/apis