League of Legends is an online video game that has been around for over a decade and had a monthly active user base of 115 million people in 2021. Players, pros and casuals alike, claim that the first step to becoming proficient at the game is to practice CS’ing. CS is short for creep score, a tally of how many minions a player has killed in a game, and CS’ing is the act of killing them. Users can spend hundreds or even thousands of hours playing this game, so I wanted to test the validity of this claim: do better players kill more minions in a game on average?
\[ H_{0}: \text{There is no correlation between rank and creep score} \] \[ H_{a}: \text{There is a correlation between rank and creep score} \]
My personal hypothesis is that higher-ranked players kill more minions per game, on average, than lower-ranked players. To test this hypothesis, I collected data directly from the League of Legends API and fit a linear regression of total minions killed on rank. The model shows a positive correlation between rank and creep score, and the fitted line is: \[ \text{Creep Score} = 126.9362 + 2.7321 \times \text{Rank} \] Based on these results, I would advise new players to practice and get better at CS’ing.
I have been playing a game named “League of Legends” on and off for approximately a decade. During my journey as a player, I began to practice and study methods to get better at the game. One of the most common pieces of advice I heard from friends and from professional players online was: “Get better at CS’ing”. I won’t go too deep into the mechanics of the game, but CS stands for creep score, or, in other words, how many creeps (minions) a player kills throughout the game. It’s a simple aspect of the game, but becoming proficient at it is important for improving. I want to answer the question: are higher-ranked players “better” at CS’ing than lower-ranked players?
I will try to answer this question by comparing the creep scores of lower-ranked players with those of higher-ranked players, using a linear model to see whether there is a positive correlation between rank and creep score. The data was pulled using the League of Legends API, which can be found here: https://developer.riotgames.com/apis. Unfortunately, since I do not have a production API key, it took my script quite a while to pull the data necessary for this analysis. The script can be found here: https://github.com/jglendrange/data606/blob/main/leagueAPI.ipynb. This is something I have always been interested in, and I’m glad I have the opportunity to test its validity.
As I mentioned in the introduction, the data was obtained using the League of Legends API. I used a website that generates random matchIds (each matchId corresponds to an individual game), and each match returns performance metrics for every player who participated in it. I queried over 1,000 matches. With 10 players per match, the final data set contains 12,550 player records.
Now, let’s take a look at the data set I put together. Besides the row index X, we have 8 columns. The ones we are most interested in are rank and totalMinionsKilled, which we will use to build the linear model. We may also be interested in role and lane after our first attempt at the model, because these can influence totalMinionsKilled in a game. For example, when role is “DUO_SUPPORT”, we can expect the creep score to be low, since supports are not focused on killing minions.
head(playerStats) %>%
kbl() %>%
kable_styling()
| X | gameId | visionScore | goldEarned | totalMinionsKilled | role | lane | userId | rank |
|---|---|---|---|---|---|---|---|---|
| 0 | 3856873255 | 13 | 9173 | 139 | SOLO | MIDDLE | F_7N8wSdjoPdUw1SBn6P2t62PeDArMYt_GmtbU4RJe9rr9k | SILVERII |
| 1 | 3856873255 | 39 | 8568 | 12 | NONE | JUNGLE | TAdsWNVJiqCj0hA5npp4-z5EKXkkBJw7ss5vOhdFldZw-kI | GOLDIV |
| 2 | 3856873255 | 10 | 8377 | 155 | SOLO | TOP | M01izMLGcM-Q5WjIkGooQHDSljWmjY0uIDHyeV-lCO6hEYM | GOLDIV |
| 3 | 3856873255 | 10 | 11397 | 168 | DUO_CARRY | BOTTOM | hQwpywrGvpGKyK0Dcz006NBlzUe4mEewMjEQuPgkgq5JrYk | GOLDIV |
| 4 | 3856873255 | 37 | 7035 | 25 | DUO_SUPPORT | BOTTOM | v1MLAUQfrq1FadXN0cPPbxkpAKxxLsdkcNUmv0kfuOM6TiA | GOLDIV |
| 5 | 3856873255 | 13 | 12398 | 147 | SOLO | TOP | 4DnCOG5yQV5SFo5n59BaBU2gXk-AjGgZUTJflaHOaIvqaXk | GOLDIII |
Every player has a rank going into the game, which they receive after playing a minimum number of placement games. Ideally, you rise through the ranks as you get better at the game. The highest ranks contain very few players; the top rank, Challenger, has only hundreds. There are 27 ranks in total: six tiers (Iron through Diamond) with four divisions each, plus Master, Grandmaster, and Challenger. The rank distribution in our sample is close to the one reported by League of Graphs: https://www.leagueofgraphs.com/rankings/rank-distribution
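As a quick check on the rank count, the sketch below enumerates the ladder in the same "TIERDIVISION" string format as the rank column (e.g. "SILVERII"). It assumes the 2021 ladder of six tiers with four divisions each, plus Master, Grandmaster, and Challenger. The apex labels are written without a division, which is an assumption that may not match how the API reports them. A rank's position in this ladder, counted from 0, matches the 0-26 numeric encoding used later in this analysis.

```r
# Sketch: enumerate the ranked ladder in the "TIERDIVISION" format of the rank column.
# Assumes six divisional tiers (IV lowest, I highest) plus three apex tiers.
tiers     <- c("IRON", "BRONZE", "SILVER", "GOLD", "PLATINUM", "DIAMOND")
divisions <- c("IV", "III", "II", "I")

# rep(..., each = 4) repeats each tier once per division; paste0 recycles the divisions.
ladder <- c(paste0(rep(tiers, each = length(divisions)), divisions),
            "MASTER", "GRANDMASTER", "CHALLENGER")  # apex labels assumed, no division

# Position in the ladder, counted from 0, gives the 0-26 numeric rank.
rank_value <- setNames(seq_along(ladder) - 1, ladder)

length(ladder)
rank_value[c("IRONIV", "SILVERII", "CHALLENGER")]
```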
playerStats %>%
group_by(rank) %>%
summarise(count = n()) %>%
ggplot(aes(x=reorder(rank, -count), y=count)) + geom_bar(stat="identity") + coord_flip()
First let’s take a look at the summary statistics on all of our fields to get a better understanding of our data.
summary(playerStats)
## X gameId visionScore goldEarned
## Min. : 0 Min. :3.857e+09 Min. : 0.00 Min. : 667
## 1st Qu.: 3137 1st Qu.:3.857e+09 1st Qu.: 12.00 1st Qu.: 8011
## Median : 6274 Median :3.857e+09 Median : 19.00 Median :10326
## Mean : 6274 Mean :3.857e+09 Mean : 23.62 Mean :10554
## 3rd Qu.: 9412 3rd Qu.:3.857e+09 3rd Qu.: 29.00 3rd Qu.:12821
## Max. :12549 Max. :3.857e+09 Max. :171.00 Max. :38844
## totalMinionsKilled role lane userId
## Min. : 0.0 Length:12550 Length:12550 Length:12550
## 1st Qu.: 35.0 Class :character Class :character Class :character
## Median :112.0 Mode :character Mode :character Mode :character
## Mean :105.3
## 3rd Qu.:162.0
## Max. :403.0
## rank
## Length:12550
## Class :character
## Mode :character
##
##
##
Now let’s look at our target variable, totalMinionsKilled. Plotting it reveals something interesting: a bimodal distribution. I suspect the lower peak comes from the support and jungle roles, which do not kill as many minions throughout the game. We will likely have to separate the roles.
hist(playerStats$totalMinionsKilled)
Let’s take a look at the average totalMinionsKilled by lane and role. My suspicion was correct: DUO_SUPPORT and NONE (the jungle role in this data) have low averages.
playerStats %>%
group_by(lane, role) %>%
summarise(avg = mean(totalMinionsKilled),
med = median(totalMinionsKilled),
.groups = "drop") %>%
kbl() %>%
kable_styling()
| lane | role | avg | med |
|---|---|---|---|
| BOTTOM | DUO | 109.80000 | 94.0 |
| BOTTOM | DUO_CARRY | 164.94041 | 163.0 |
| BOTTOM | DUO_SUPPORT | 34.04340 | 32.0 |
| BOTTOM | SOLO | 109.20253 | 111.5 |
| JUNGLE | NONE | 46.03748 | 34.0 |
| MIDDLE | DUO | 158.50820 | 155.0 |
| MIDDLE | DUO_CARRY | 159.62000 | 152.5 |
| MIDDLE | DUO_SUPPORT | 47.48077 | 37.0 |
| MIDDLE | SOLO | 153.28166 | 151.0 |
| NONE | DUO | 50.22530 | 14.0 |
| NONE | DUO_SUPPORT | 60.41154 | 61.0 |
| TOP | DUO | 168.65972 | 165.0 |
| TOP | DUO_CARRY | 160.08571 | 156.0 |
| TOP | DUO_SUPPORT | 52.04762 | 36.0 |
| TOP | SOLO | 161.00209 | 158.0 |
ggplot(data = playerStats, aes(x=totalMinionsKilled, y=role)) + geom_boxplot()
Let’s see how the histogram changes when I remove these two roles. The distribution becomes unimodal and roughly normal, which is a good sign for our analysis.
playerStats %>%
filter(role != "DUO_SUPPORT" & role != "NONE") %>%
ggplot(aes(x = totalMinionsKilled)) + geom_histogram(bins = 30)
Here I break the distribution out by rank to get a visual feel for how these distributions look. Some ranks have a much larger population than others. Making every rank equally represented would be difficult, so for now this will have to do.
playerStats %>%
filter(role != "DUO_SUPPORT" & role != "NONE") %>%
ggplot(aes(x = totalMinionsKilled)) + geom_histogram(bins = 30) +
facet_wrap(vars(rank))
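This section of code converts each rank into a number from 0 (Iron IV) to 26 (Challenger). Each tier contributes a base value: a multiple of 4 for Iron through Diamond, then 24, 25, and 26 for Master, Grandmaster, and Challenger. Each division then adds 0 (IV) to 3 (I).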
# Base value for each tier. GRANDMASTER must be tested before MASTER,
# because grepl("MASTER", ...) also matches "GRANDMASTER".
playerStats$rank_num1 <- ifelse(grepl("IRON", playerStats$rank),0,
ifelse(grepl("BRONZE", playerStats$rank),4,
ifelse(grepl("SILVER",playerStats$rank),8,
ifelse(grepl("GOLD",playerStats$rank),12,
ifelse(grepl("PLATINUM",playerStats$rank),16,
ifelse(grepl("DIAMOND",playerStats$rank),20,
ifelse(grepl("GRANDMASTER",playerStats$rank),25,
ifelse(grepl("MASTER",playerStats$rank),24,
ifelse(grepl("CHALLENGER",playerStats$rank),26,0
)))))))))
# Offset for the division: IV = 0, III = 1, II = 2, I = 3.
# III$ is tested before II$ and I$ so the longer suffix wins;
# Master and above have no divisions and get 0.
playerStats$rank_num2 <- ifelse(grepl("III$",playerStats$rank), 1,
ifelse(grepl("II$",playerStats$rank),2,
ifelse(grepl("MASTER",playerStats$rank) | grepl("CHALLENGER",playerStats$rank) ,0,
ifelse(grepl("I$",playerStats$rank),3,
ifelse(grepl("IV$",playerStats$rank),0,0
)))))
# Final 0-26 rank (unrecognized values such as "INVALID" end up as 0).
playerStats$rank_num <- playerStats$rank_num1 + playerStats$rank_num2
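The nested ifelse() chain works, but it depends on test order (GRANDMASTER before MASTER, III before II). A lookup-table version avoids that: it strips the division suffix first, then looks up the tier and the division separately. This is a hypothetical alternative, not the code used to produce the results in this report. It assumes rank strings look like "SILVERII" or "GRANDMASTERI", and it returns NA for unrecognized values such as "INVALID" rather than 0.

```r
# Hypothetical helper: convert rank strings such as "SILVERII" to a 0-26 scale.
rank_to_num <- function(rank) {
  tier_base  <- c(IRON = 0, BRONZE = 4, SILVER = 8, GOLD = 12, PLATINUM = 16,
                  DIAMOND = 20, MASTER = 24, GRANDMASTER = 25, CHALLENGER = 26)
  div_offset <- c(IV = 0, III = 1, II = 2, I = 3)

  tier     <- sub("(IV|III|II|I)$", "", rank)   # strip the trailing division, if any
  division <- substring(rank, nchar(tier) + 1)  # "" when there was no division
  base     <- unname(tier_base[tier])           # NA for unknown tiers such as "INVALID"
  offset   <- unname(div_offset[division])      # NA when the division is ""

  apex <- !is.na(base) & base >= 24             # Master and above have no divisions
  offset[is.na(offset) | apex] <- 0
  base + offset
}

rank_to_num(c("IRONIV", "SILVERII", "DIAMONDI", "GRANDMASTERI", "CHALLENGERI", "INVALID"))
```

Using named vectors as lookup tables keeps the mapping in one place, so adding or reordering tiers cannot silently change which pattern matches first.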
The first view below plots totalMinionsKilled against rank_num for the entire data set. As the previous graphs suggested, removing the support and jungle roles should help our model considerably.
ggplot(data=playerStats,aes(x=rank_num,y=totalMinionsKilled)) + geom_jitter()
Filtering out supports, junglers (role NONE), players with an INVALID rank, and players with 5 or fewer minions killed reveals a positive trend: the higher the rank, the higher a player’s creep score tends to be.
laners <- playerStats %>%
filter(role != 'DUO_SUPPORT' & role != 'NONE' & rank != "INVALID" & totalMinionsKilled > 5)
ggplot(data=laners,aes(x=rank_num,y=totalMinionsKilled)) + geom_jitter() + geom_smooth(method = "lm", formula = y ~ x)
To confirm, we can compute the correlation between the two variables. It is positive, though fairly weak (r ≈ 0.23).
cor(laners$rank_num,laners$totalMinionsKilled)
## [1] 0.23255
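By default, cor() computes the Pearson correlation coefficient, r = Σ(x - x̄)(y - ȳ) / sqrt(Σ(x - x̄)² · Σ(y - ȳ)²). The sketch below computes it from that definition on a small made-up example, not our data. It also shows that the least squares slope equals r · s_y / s_x, which ties this correlation to the regression slope fitted later.

```r
# Pearson correlation from its definition; the values below are made up.
pearson_r <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

rank_num <- c(2, 8, 10, 12, 15, 20)         # made-up numeric ranks
cs       <- c(120, 150, 130, 170, 160, 190) # made-up creep scores

r_manual <- pearson_r(rank_num, cs)
all.equal(r_manual, cor(rank_num, cs))      # agrees with cor()

# The least squares slope equals r times the ratio of standard deviations.
slope_from_r <- r_manual * sd(cs) / sd(rank_num)
all.equal(slope_from_r, unname(coef(lm(cs ~ rank_num))[2]))
```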
ggplot(data=playerStats,aes(x=rank_num,y=totalMinionsKilled)) + geom_jitter() + facet_wrap(vars(role)) + geom_smooth(method = "lm", formula = y ~ x)
Now let’s fit a linear regression model to the filtered laners data set and see what function is returned. Note the Pr(>|t|) column. It is the probability, assuming the true slope is zero, of observing a t statistic at least as large in absolute value as the one we computed. Our p-value is extremely small (< 2e-16), so a relationship this strong would be very unlikely to appear by chance if rank and creep score were unrelated. We can therefore conclude that the results are statistically significant.
\[ \text{Creep Score} = 126.9362 + 2.7321 \times \text{Rank} \]
minion_model <- lm(totalMinionsKilled ~ rank_num, data = laners)
summary(minion_model)
##
## Call:
## lm(formula = totalMinionsKilled ~ rank_num, data = laners)
##
## Residuals:
## Min 1Q Median 3Q Max
## -161.649 -27.721 -1.721 27.671 248.743
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 126.9362 1.6138 78.66 <2e-16 ***
## rank_num 2.7321 0.1391 19.64 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 46.02 on 6748 degrees of freedom
## Multiple R-squared: 0.05408, Adjusted R-squared: 0.05394
## F-statistic: 385.8 on 1 and 6748 DF, p-value: < 2.2e-16
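Several numbers in this output are tied together, and recomputing them by hand is a useful sanity check. The sketch below uses values copied from the regression output above, so they are rounded, together with the correlation printed earlier. The t value is the estimate divided by its standard error, and the two-sided p-value comes from a t distribution with 6748 degrees of freedom. With a single predictor, the F-statistic equals t² and R² equals r².

```r
# Values copied (rounded) from summary(minion_model) and cor() above.
est_slope <- 2.7321
se_slope  <- 0.1391
df_resid  <- 6748
r         <- 0.23255

t_slope <- est_slope / se_slope                  # about 19.64, the reported t value
p_value <- 2 * pt(-abs(t_slope), df = df_resid)  # two-sided p-value, far below 2.2e-16
f_stat  <- t_slope^2                             # about 385.8: F = t^2 with one predictor
r_sq    <- r^2                                   # about 0.0541: R^2 = r^2 in simple regression

round(c(t = t_slope, F = f_stat, R2 = r_sq), 4)
p_value < 2.2e-16
```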
Now that we have created our model, let’s make sure a linear regression fits this data set by checking each condition:
- Linearity: the scatter plot shows a positive linear trend. This condition is met.
- Nearly normal residuals: the QQ plot and histogram below show that the residuals are nearly normally distributed. This condition is met.
- Constant variability: the residual plot (the third plot below) shows no noticeable trend, and the variability appears constant. This condition is met.
- Independent observations: the site I pulled the matchIds from claims they are random, and the rank distribution is fairly close to the actual population. This condition is met.
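All of the conditions for a least squares line have been met, so a linear model is appropriate for describing this relationship. Note, however, that rank explains only about 5% of the variation in creep score (R² = 0.054), so individual games vary widely around the fitted line.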
qqnorm(minion_model$residuals)
qqline(minion_model$residuals)
hist(minion_model$residuals)
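# Residuals vs fitted values, to check constant variability
plot(minion_model$fitted.values, minion_model$residuals, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)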
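A few numbers can complement the diagnostic plots. The sketch below uses simulated data; the sample size, coefficients, and noise level are made-up values chosen to loosely resemble our model, not our data. Least squares guarantees that the residuals average zero and are uncorrelated with the fitted values, so a residuals-versus-fitted plot is really a check on the spread. Comparing the residual standard deviation in the lower and upper halves of the fitted values gives a rough numeric version of the constant variability check.

```r
set.seed(606)
n        <- 500
rank_num <- sample(0:26, n, replace = TRUE)           # simulated ranks
cs       <- 127 + 2.7 * rank_num + rnorm(n, sd = 46)  # simulated creep scores

fit <- lm(cs ~ rank_num)
res <- residuals(fit)

mean(res)              # essentially 0 by construction
cor(fitted(fit), res)  # essentially 0 by construction

# Residual spread for the lower vs upper half of fitted values; similar
# values are consistent with constant variability.
upper_half <- fitted(fit) > median(fitted(fit))
tapply(res, upper_half, sd)
```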
In conclusion, there is a positive relationship between rank and creep score, and the slope’s p-value (Pr(>|t|) < 2e-16) shows that the result is statistically significant. A relationship this strong would be extremely unlikely if the two variables were unrelated, so we reject the null hypothesis that there is no correlation between them.
Furthermore, evaluating the residuals confirms that the necessary conditions are met and that a linear regression model fits this data set. With these results, I would feel comfortable telling a beginner getting into the game to invest time in improving this skill.
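To put the slope in perspective, the sketch below plugs a few numeric ranks into the fitted line from the regression output above, assuming the line holds across the whole 0-26 range. predicted_cs() is a hypothetical helper written for illustration. These are average predictions only: the residual standard error of 46.02 means individual games scatter widely around them.

```r
# Coefficients from summary(minion_model); predicted_cs is a hypothetical helper.
intercept <- 126.9362
slope     <- 2.7321
predicted_cs <- function(rank_num) intercept + slope * rank_num

predicted_cs(c(0, 12, 26))          # Iron IV, Gold IV, Challenger
slope * 4                           # gain from climbing one full tier (four divisions)
predicted_cs(26) - predicted_cs(0)  # about 71 more minions per game, Iron IV to Challenger
```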