Heya, Go4ino here again with another SGC data analytics report. The original inspiration for this report was RebelFox on Twitter asking if I could investigate whether performance differences exist between champions with different types of player bases.
For example: Is there a difference between Ornn and Sona? Ornn has been played extensively by practically every top laner in SGC, while Sona is more or less exclusively played by Dean. Does this mean Sona has higher average performance? Or is it overall better to stick primarily to the meta?
Thankfully @pookarGG’s SGC data set exists and has concrete data to analyze. I did heavy data modification / rearranging / tinkering / etc to get the data sets I used in R-Studio.
As mentioned previously, the original source for this data was Pookar's SGC stats doc. Specifically, this report pulled data from the Champion By Player and DATA tabs. I put all this data together into an Excel spreadsheet to read into R-Studio.
Pookar's data covers match results from almost every AM league up to 7/27, with only BIG League missing. Even with some matches absent, there are 493 games to analyze.
The sheet is heavily automated, meaning every tab of interest updates as match histories are entered. I chose the Champion By Player tab as it let me separate champions by position to distinguish flex picks, since I felt lumping flex champs together would skew results (eg: Top, Jungle, and Support Sett have vastly different statistics). The DATA tab was used for calculations on the entire sample population.
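If you want to follow along in R, loading the cleaned-up CSVs (named as in the appendix) is the only setup needed. A minimal sketch, assuming the files sit in your working directory:

```r
# Load the per-player champion data (already filtered to the 10-game minimums)
# and the per-game raw data. File names match the CSVs listed in the appendix.
champs <- read.csv("all role champs 10g min 7-27.csv")
raw    <- read.csv("raw data entry 7-27.csv")

str(champs)  # variable names/types (renamed to be "code friendly")
nrow(raw)    # should line up with the number of recorded games
```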
fig 1: Interactive dot-plot of player champ play rate vs win rate
note: only champions with at least 10 total games, and players with at least 10 games on a champ, are shown.
To interact with the graph, hover over it to display options, hover over data points to display their key information, and click the positions in the legend to show/hide data points for a specific role.
The vertical lines at 20% and 60% of total play rate are where I drew the boundaries separating the 3 factor groups. I decided on these 3 groupings based off what I perceived to be 3 distinct clusters in the data, and picked 20% and 60% since they divide the groups fairly well. Of course, these separations are by no means written in stone; they're just my personal interpretation, based purely on Proportion of Picks and my eyeballs' guessing power.

Important to note is that due to the minimum game requirements, this graph is naturally biased towards players who have played the most games. For example, 100T FallenBandit moved to starting top during UPL playoffs and has only 8 games in the recorded data, meaning it is impossible for him to appear anywhere on fig 1. Likewise, teams who play more games per set on average are more likely to have their players appear on the graph. Let's say Team X is extremely dominant and 2-0s every opponent, while Team Y is less consistent and all their series go to game 3. Team Y then plays 50% more games than Team X despite playing the same number of Bo3 series, so the graph is inherently biased in favor of Team Y's players.
Points to the left are more widely played meta champs, whereas points to the right tend to have players who represent a significant portion of a given champ's playerbase. As such I have categorized the 3 playerbase groups as: Broad, Medium, and Narrow. You can think of Broad as meta champions, Medium as somewhat meta champions that certain players/teams favor more heavily than others, and lastly Narrow as champions that are almost exclusively played by a single player.
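In R, this bucketing is a one-liner with `cut()`. A minimal sketch, assuming the player's share of a champion's picks lives in a column I'm calling `prop_picks` here (the actual "code friendly" name in my CSVs may differ):

```r
# Bucket each player-champion pair into the 3 playerbase groups using
# the 20% / 60% cutoffs from fig 1. prop_picks is assumed to be that
# player's share of the champion's total picks, on a 0-1 scale.
champs$playerbase <- cut(champs$prop_picks,
                         breaks = c(0, 0.20, 0.60, 1),
                         labels = c("Broad", "Medium", "Narrow"),
                         include.lowest = TRUE)
table(champs$playerbase)  # N per group, as in table 1
```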
As expected, the further along the x-axis you go the fewer data points there are. But are there any differences to be noted?
So we now have a concrete definition for the 3 primary types of playerbases. This means we can look at each of the 3 groups and try and discern any trends, differences, anomalies, etc.
| Playerbase | N | Distinct Champions | Mean Win Rate | Mean Champion Total Games |
|---|---|---|---|---|
| Broad | 74 | 23 | 55.48% | 141.39 |
| Medium | 10 | 10 | 59.18% | 52.70 |
| Narrow | 4 | 4 | 75.76% | 19.75 |

table 1: Playerbase descriptive stats from fig 1
The Broad category has far more data points than the other 2 groups, however there is plenty of overlap, with a relatively meager 23 distinct champions. For example, Aphelios appears 9 different times in the Broad category (click the legend to change which positions show on fig 1 to make it easier to visualize). On the flip side, Medium and Narrow are both entirely made up of unique champion picks. It should be noted that duplicate champions in the same role are impossible in Narrow by definition, since a single player must account for over 60% of the champion's picks. Furthermore, while duplicates are possible in Medium, they're unlikely, since all the champions in it are played somewhat widely and it just happens that some teams / players value them more highly. As for Narrow's champions, 3 of the 4 were explicitly named as examples when the topic was brought up, with Hunter's Nunu being a surprise to me at least. This is one frontier where I absolutely hate the missing data from certain series / leagues / etc., because I'm sure there would be more points in Narrow otherwise.
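For reference, table 1's columns fall out of a simple grouped aggregation. A sketch using dplyr, where `champion`, `win_rate`, and `champ_total_games` are stand-ins for whatever the actual "code friendly" column names are:

```r
library(dplyr)

# Reproduce table 1: group size, distinct champions, and means per playerbase.
champs %>%
  group_by(playerbase) %>%
  summarise(
    N                = n(),
    distinct_champs  = n_distinct(champion),
    mean_win_rate    = mean(win_rate),
    mean_total_games = mean(champ_total_games)
  )
```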
Now for win rates. I should preface this by saying to take it all with a grain of salt: as noted previously, the sample sizes for Medium and Narrow are really small due to my minimum games requirement. Even with those caveats, I still think it is worthwhile to examine the results for trends. With that said, there is no significant difference in win rate between Broad and Medium champions aside from a slight bump upwards. I personally expected at least somewhat more of a difference. All the Medium champions do have decent sized playerbases though, and are seen often enough that most players are at least familiar with the matchups, so maybe familiarity is the reason for only a slight deviation? I'm not entirely sure, as it could be a variety of factors, and this is something the players themselves might be able to give good insight into. Now for Narrow we see a massive jump to a win rate over 75%. Everyone from pro players to silver analysts on Reddit can tell you a 75% win rate is really high, but I want to put this into perspective with numbers using the raw data from raw data 10g min.csv.
Because a match's result can only be a win or a loss, and the sample is large (\(n=66\)), I will approximate the binomial distribution with the following normal distribution:
\[ p=0.5, n=66\\ \mu=np=33\\ \sigma = \sqrt{np(1-p)} \approx 4.062019 \]
Where \(p\) is the population average win rate, \(n\) is the sample size (aka the total games in Narrow), \(\mu\) is our distribution's mean, and \(\sigma\) is the distribution's standard deviation. Now we can run a one-sided, single sample hypothesis test to check how much of an outlier that win rate is. For overkill we will test at \(\alpha = 0.01\) (a 99% confidence level), so there's only a 1% chance of a Type I error.
\(H_0\): the number of wins on Narrow champions is the same as \(\mu\)
\(H_1\): the number of wins on Narrow champions is higher than \(\mu\)
\[ x=50, \alpha=0.01\\ Z_0=\frac{x-\mu}{\sigma} \approx 4.185111 \\ Pvalue=1-P(Z_0) \approx 1.425136e-05 \]
Here \(x=50\) is the observed number of wins across the 66 Narrow games. Because the P-value is far below the chosen significance level \(\alpha\), we can reject \(H_0\) and very confidently claim that the champions in the Narrow category have significantly higher win rates.
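If you'd rather let R do the arithmetic, here's the same test as a minimal sketch, with an exact binomial test tacked on as a sanity check:

```r
# One-sided test: is Narrow's win rate significantly above the 50% average?
n <- 66   # total games on Narrow champions
x <- 50   # observed wins (~75.76% win rate)
p <- 0.5  # population average win rate

mu    <- n * p                  # 33
sigma <- sqrt(n * p * (1 - p))  # ~4.062019

z0   <- (x - mu) / sigma  # ~4.185111
pval <- 1 - pnorm(z0)     # ~1.425136e-05

# Exact version, no normal approximation needed:
binom.test(x, n, p = 0.5, alternative = "greater")
```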
Performing a full ass hypothesis test was kind of overkill ngl, but it does help illustrate my point that these picks bring in really poggers results. Keep in mind that in most instances the champions in this category receive few balance changes (usually buffs), sometimes across entire seasons. This means the potency of these picks holds up over competitive splits, forcing enemy teams to either prepare for a pick that maybe only one person plays, or in most cases use up a ban. "Just use 1 ban, no biggie" may be how some interpret this, but I want to paint a picture here to show this in action:
So it's no secret that for the majority of the split, red side was forced to ban Varus in the first ban phase every game. Now let's say your team is red side against SuperNova: you ban Varus of course, and then Always Plan Ahea's A-Sol. But now you only have 1 ban left in first phase, meaning a strong pick you wanted to ban slips through and SuperNova grabs it. And the cherry on top is that Always Plan Ahea just locks in Syndra and still performs well.
That may sound like hyperbole, but as Always Plan Ahea himself said, it was fairly common:
“It for sure made atleast the first half of pick/ban easier with them banning asol every game. It would make it so that we could draft around getting certain picks because we only know that they are working with at most 2 bans”
A glance at the Team Win/Loss % - Side tab on Pookar's sheet shows that the average blue side win rate is 53.75%, while SuperNova's blue side win rate is significantly higher at 72.41%; having the enemy usually burn a ban on A-Sol may be part of the reason why.
Lastly, we have the average number of total games a champion has been played across all of SGC, depending on which playerbase category the data point falls into. Unsurprisingly this number drops off heavily, much like column N. Turns out if you have 10-15 games on a champion but account for only 10% of its picks overall, that champion must have a very high total game count! For the Narrow category I do wish I had ban data, to see the overall (picks+bans)/total games ratio for those points, which would make it easier to draw definitive conclusions. For A-Sol, Always Plan Ahea was able to share with me that SuperNova would always pick it if it made it through bans, but I lack that critical information for the other 3. Are these players picking their Narrow champions whenever they can? Or do they save them for specific situations?
This was a question that popped into my head for the 4 players in the Narrow category. While boasting high performance on the champions they're known for is great, can they still perform when not on said champions? Because if they struggle on other champions, then said power picks could be a potential weakness for their teams rather than a strength.
To analyze this question I decided to perform hypothesis testing to check the differences.
Due to time limitations I've decided to only analyze win rate, as it is a fairly popular metric for judging one's performance. However, win rate won't give as complete a picture of champion performance as including other metrics would. For example, comparing gold/minute or lane difference @ 10 minutes could help show how well the players can generate leads.
My original plan was to compare the mean win rate for each of the 4 players on and off their champions using 2 sample hypothesis testing. However, in order to do that each player would need at least 10 wins and 10 losses on their signature champion, and at least 10 wins and 10 losses off it. Sadly, only Dean meets these criteria. Thus for the 4 players I will instead run a single sample hypothesis test to check whether their off-signature win rate is still significantly better than the population average of 50%, alongside a direct comparison of win rates for each player. These single sample tests work basically the same as the hypothesis test in 3.1.1, just with some numbers changed. For all 4 of these tests we will be testing the following null and alternative hypotheses at a 95% confidence level:
\[ H_0: x=\mu\\ H_1: x > \mu \]
where \(x\) is the player's observed number of wins off their signature champion, and \(\mu\) is the expected number under a 50% win rate.
Furthermore we already know that the population proportion is \(p=0.5\), aka a 50% wr.
Thankfully, when all 4 players are combined we do meet the requirements for a 2 sample hypothesis test, so I will run one on the pooled data as well.
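Since the same single sample test gets run 4 times below, here's a small helper sketch in R. The wins/games figures come straight from tables 2-5, and the outputs match the math blocks that follow:

```r
# One-sided single sample test: is a player's off-signature win rate
# significantly above the 50% population average? (Normal approximation.)
off_champ_test <- function(wins, games, p = 0.5) {
  mu    <- games * p
  sigma <- sqrt(games * p * (1 - p))
  z0    <- (wins - mu) / sigma
  c(z = z0, p.value = 1 - pnorm(z0))
}

off_champ_test(24, 46)  # Always Plan Ahea off A-Sol
off_champ_test(17, 33)  # Dean off Sona
off_champ_test(32, 60)  # Hunter off Nunu
off_champ_test(26, 46)  # Lobozz off Jinx
```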
| Metric | Total | A-Sol | Non A-Sol Champs |
|---|---|---|---|
| Games | 61 | 15 | 46 |
| Wins | 36 | 12 | 24 |
| Win rate | 0.5902 | 0.8000 | 0.5217 |

table 2: Always Plan Ahea Averages
\[ n = 46\\ \mu = np = 23\\ \sigma =\sqrt{46 \cdot 0.5(1-0.5)} \approx 3.3912 \\ Z_0 = \frac{24-23}{\sigma} \approx 0.2949\\ Pvalue = 1 - P(Z_0) \approx 0.3840 > 0.05 = \alpha \]
We fail to reject \(H_0\) because our P-value is greater than \(\alpha\). Thus there is no statistically significant difference in win rate between Always Plan Ahea when he's not on A-Sol and the overall average.
| Metric | Total | Sona | Non Sona Champs |
|---|---|---|---|
| Games | 59 | 26 | 33 |
| Wins | 37 | 20 | 17 |
| Win rate | 0.6271 | 0.7692 | 0.5152 |

table 3: Dean Averages
\[ n = 33\\ \mu = np = 16.5\\ \sigma =\sqrt{33 \cdot 0.5(1-0.5)} \approx 2.8723 \\ Z_0 = \frac{17-16.5}{\sigma} \approx 0.1741\\ Pvalue = 1 - P(Z_0) \approx 0.4309 > 0.05 = \alpha \]
We fail to reject \(H_0\) because our P-value is greater than \(\alpha\). Thus there is no statistically significant difference in win rate between Dean when he's not on Sona and the overall average.
| Metric | Total | Nunu | Non Nunu Champs |
|---|---|---|---|
| Games | 70 | 10 | 60 |
| Wins | 39 | 7 | 32 |
| Win rate | 0.5571 | 0.7000 | 0.5333 |

table 4: Hunter Averages
\[ n = 60\\ \mu = np = 30\\ \sigma =\sqrt{60 \cdot 0.5(1-0.5)} \approx 3.8730 \\ Z_0 = \frac{32-30}{\sigma} \approx 0.5164\\ Pvalue = 1 - P(Z_0) \approx 0.3028 > 0.05 = \alpha \]
We fail to reject \(H_0\) because our P-value is greater than \(\alpha\). Thus there is no statistically significant difference in win rate between Hunter when he's not on Nunu and the overall average.
| Metric | Total | Jinx | Non Jinx Champs |
|---|---|---|---|
| Games | 61 | 15 | 46 |
| Wins | 37 | 11 | 26 |
| Win rate | 0.6066 | 0.7333 | 0.5652 |

table 5: Lobozz Averages
\[ n = 46\\ \mu = np = 23\\ \sigma =\sqrt{46 \cdot 0.5(1-0.5)} \approx 3.3912 \\ Z_0 = \frac{26-23}{\sigma} \approx 0.8847\\ Pvalue = 1 - P(Z_0) \approx 0.1882 > 0.05 = \alpha \]
We fail to reject \(H_0\) because our P-value is greater than \(\alpha\). Thus there is no statistically significant difference in win rate between Lobozz when he's not on Jinx and the overall average.
| Metric | Total | Signature Champ | Non Signature Champs |
|---|---|---|---|
| Games | 250 | 66 | 184 |
| Wins | 149 | 50 | 99 |
| Win rate | 0.5960 | 0.7576 | 0.5380 |

table 6: All Four Averages
Now for the based 2 sample hypothesis test. This will tell us if there's any significant drop in win rate when these players are not playing their signature champs. Ideally we want the players to be just as good off their signature champions as on them, so in this instance we want to fail to reject \(H_0\), as that would mean there is no significant difference in the means.
Using the data from table 6 and a confidence level of 95%, we get the following values and equations:
\[ \alpha = 0.05\\ H_0:p_1=p_2\\ H_1: p_1 > p_2\\ p_1 = \frac{50}{66} \approx0.7576 \\ p_2 = \frac{99}{184}\approx 0.5380 \\ \hat{p}=\frac{149}{250} \approx 0.5960 \\ Z_0 = \frac{p_1-p_2}{\sqrt{\hat{p} (1- \hat{p})(\frac{1}{66} + \frac{1}{184})}} \approx 3.1181 \]
Thus our resulting P-value is:
\[ 1-P(Z_0) \approx 0.00091 < 0.05 = \alpha \]
Because the P-value is lower than our \(\alpha\), we reject \(H_0\). Sadly this means that in general, these players perform significantly better when they are on their signature champs. I'm not significantly surprised though, since maintaining close to a 75% wr overall is giga hard to do.
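For completeness, here's the pooled comparison as an R sketch. `prop.test` without continuity correction gives an equivalent answer, since its chi-squared statistic is just \(Z_0^2\):

```r
# Two sample proportion test: signature vs non-signature win rates,
# using the pooled counts from table 6.
x1 <- 50;  n1 <- 66    # wins / games on signature champs
x2 <- 99;  n2 <- 184   # wins / games off signature champs

p1    <- x1 / n1                # ~0.7576
p2    <- x2 / n2                # ~0.5380
p_hat <- (x1 + x2) / (n1 + n2)  # pooled proportion, ~0.5960

z0   <- (p1 - p2) / sqrt(p_hat * (1 - p_hat) * (1/n1 + 1/n2))  # ~3.1181
pval <- 1 - pnorm(z0)                                          # ~0.00091

# Equivalent built-in test (its X-squared equals z0^2):
prop.test(c(x1, x2), c(n1, n2), alternative = "greater", correct = FALSE)
```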
Given the results of 3.2.1.1-4, it appears 'just ban their champ lmao' is a good strat with low time investment, since none of these 4 players are significantly better than the average win rate when off their signature champs.
So as we have seen, the Narrow category champions are very high performers in terms of win rate, with that astonishing 75%+ overall mark. Of course, it was also shown that when these players are off their signature champions they drop to basically the average win rate, which lends merit to the argument for banning said champs.
But I guess that brings us to the big question of this paper: "Are there differences in performance between champions of different playerbase types?"
To which I can confidently say yes. The Narrow players have insane average win rates on their signature champions. It is a massive difference, and the good kind at that.
Below is a list of things I wish I had included in this report, and things one could do to further explore the data in the future. Basically all of them come down to a perfect universe where I had more time, but SGC ended like 2 months ago and school has started again, so sadly I wanted to get this out before I died of workload.
- Hypothesis testing on more metrics than just Win rate.
- The Champion By Player, and DATA tabs in CSV format, so I can load them into R-Studio.
  + They also have new variables added that weren't in the original dataset. Furthermore, most existing variables were renamed to be "code friendly".
- `all role champs 7-27.csv` is just a fusion of all 5 roles' individual spreadsheets.
- `all role champs 10g min 7-27.csv` is a filtered version of `all role champs 7-27.csv`, keeping only champions with a minimum of 10 total games and players who have played at least 10 games on said champion.
- `top/jung/mid/adc/sup champs 7-27.csv` are the spreadsheets for each individual position. All 5 combine into `all role champs 7-27.csv`.
- `APA/Dean/Hunter/Lobozz avgs table.csv` are just compiled averages for the metrics shown in tables 2-5.
- `raw data entry 7-27.csv` is modified data from Pookar's DATA tab.
- `raw data entry 7-27 with double data.csv` is `raw data entry 7-27.csv` but with the data doubled, for the purposes of having an all roles category and being able to put the 5 positions + the overall results on the same graph.
- Some random supplementary stuff that isn't exactly relevant to the report.
fig 2: Interactive dot-plot of player champ play rate vs win rate
I had originally planned to also compare KDA alongside win rate for the players via hypothesis testing. Sadly, I have an acute case of being all dummy and no thicc, and after doing a ton of data transforming I realized I couldn't do a proper hypothesis test on KDA: there are always zero-death games, and I can't plug infinity into the formulas. I didn't want all that work to go to waste though, so I'm plopping the graphs and tables here in the appendix. I could have grouped the data by player to get non-infinite values, but I felt that wouldn't be accurate enough for my tastes.
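To make the problem concrete, here's a tiny sketch of why per-game KDA breaks the formulas (the sample numbers are made up):

```r
# Per-game KDA: (kills + assists) / deaths. A single deathless game
# makes the value infinite, which breaks means and test statistics.
kills   <- c(4, 7, 2)
deaths  <- c(2, 0, 3)  # game 2 is deathless
assists <- c(8, 5, 9)

kda <- (kills + assists) / deaths
kda        # 6.000000      Inf 3.666667
mean(kda)  # Inf -- no finite mean, so no z-test on per-game KDA
```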
| | Kills | Deaths | Assists | KDA |
|---|---|---|---|---|
| APA Total | 4.3279 | 2.7049 | 7.2623 | 4.2848 |
| A-Sol | 4.8667 | 2.0667 | 7.0667 | 5.7742 |
| Non A-Sol Champs | 4.1522 | 2.9130 | 7.3261 | 3.9403 |
| Overall Mid Avg | 4.5142 | 3.3966 | 6.6410 | 3.2843 |

table 7: Always Plan Ahea KDA Averages
fig 3
| | Kills | Deaths | Assists | KDA |
|---|---|---|---|---|
| Dean Total | 1.2542 | 3.0339 | 12.8136 | 4.6369 |
| Sona | 1.4231 | 2.8077 | 14.2308 | 5.5753 |
| Non Sona Champs | 1.1212 | 3.2121 | 11.6970 | 3.9906 |
| Overall Sup Avg | 1.4351 | 3.9260 | 10.7424 | 3.1018 |

table 8: Dean KDA Averages
fig 4
| | Kills | Deaths | Assists | KDA |
|---|---|---|---|---|
| Hunter Total | 3.4571 | 3.4714 | 10.8714 | 4.1276 |
| Nunu | 2.8000 | 2.9000 | 14.7000 | 6.0345 |
| Non Nunu Champs | 3.5667 | 3.5667 | 10.2333 | 3.8692 |
| Overall Jung Avg | 3.8377 | 3.8651 | 8.1258 | 3.0953 |

table 9: Hunter KDA Averages
fig 5
| | Kills | Deaths | Assists | KDA |
|---|---|---|---|---|
| Lobozz Total | 5.2295 | 3.4590 | 6.6066 | 3.4218 |
| Jinx | 7.0000 | 3.8000 | 7.4667 | 3.8070 |
| Non Jinx Champs | 4.6522 | 3.3478 | 6.3261 | 3.2792 |
| Overall ADC Avg | 5.1572 | 3.3732 | 6.2535 | 3.3827 |

table 10: Lobozz KDA Averages
fig 6
| Position | Kills | Deaths | Assists | KDA |
|---|---|---|---|---|
| Sup | 1.4351 | 3.9260 | 10.7424 | 3.1018 |
| ADC | 5.1572 | 3.3732 | 6.2535 | 3.3827 |
| Mid | 4.5142 | 3.3966 | 6.6410 | 3.2843 |
| Jung | 3.8377 | 3.8651 | 8.1258 | 3.0953 |
| Top | 3.5730 | 4.0071 | 6.1978 | 2.4384 |
| All | 3.7034 | 3.7136 | 7.5921 | 3.0417 |

table 11: Overall Averages
fig 7
Basically the overall population average KDAs by role.