MA500 Presentation: League of Legends Analysis and Modeling

Mark Ira Galang

Introduction to Data set

What is the dataset about?

The datasets are about a MOBA (Multiplayer Online Battle Arena) video game called League of Legends.

There are two:

match_player_stats_info_full.csv
match_general_info_full.csv

Both datasets contain metadata about League of Legends matches:

match_player_stats_info_full.csv holds each player's results at the end of a match.

match_general_info_full.csv holds information from before the match started (the champion bans), plus which team won.

What is in the dataset?

match_player_stats_info_full.csv includes each player's stats, such as KDA (kills, deaths, assists), CS (minions killed), gold earned, and towers destroyed, as well as general match information such as how long the game lasted.

match_general_info_full.csv includes which champion each player banned before the match and which team won!


Where did I get the dataset?

I made the dataset myself! I used Riot's API to fetch all of this match data.

The code below (in Python) fetches the match data. First, set up the API key:

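import time
import pandas as pd
import requests

# Riot API key (replace with your own); every request below uses it
api = "YOUR_API_KEY_HERE"

# the key goes in the query string: "?" if it is the first parameter, "&" otherwise
captcha = "?api_key="
captcha2 = "&api_key="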

Next, a function that fetches the IDs of a player's 100 most recent ranked solo/duo matches (queue 420):

def get_matches(api, puuid):
    url = f"https://americas.api.riotgames.com/lol/match/v5/matches/by-puuid/{puuid}/ids?queue=420&start=0&count=100{captcha2}{api}"
    data = requests.get(url)

    return data.json()

Then fetch the players in the Challenger league (the highest-ranked players in North America, aka NA). The 100 players used below come from this list:

def fetch_challenger_players(api):
    url = f"https://na1.api.riotgames.com/lol/league/v4/challengerleagues/by-queue/RANKED_SOLO_5x5{captcha}{api}"

    # this returns it as a dictionary
    return requests.get(url).json()

challenger_players = fetch_challenger_players(api)

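Then, for each of the 100 players (their PUUIDs are stored in puuid_players_100), grab their 100 most recent matches. Riot's rate limit allows 100 requests every 2 minutes (one every 1.2 seconds on average), so I pause for 1.3 seconds after each request to stay safely under it.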

matches = []
for puuid in puuid_players_100:
    # let user know what is going on
    print(f"Doing {puuid}")

    match_data = get_matches(api, puuid)

    print (f"Here are the matches {match_data}")

    time.sleep(1.3)
    for match in match_data:
        matches.append(match)
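The fixed time.sleep(1.3) works, but it always waits, even at the start when none of the 2-minute budget has been used. A slightly more robust option is a sliding-window limiter. It only sleeps when the last 100 requests all fall inside the past 120 seconds. Below is a minimal sketch that assumes a hypothetical SlidingWindowLimiter class (it is not part of requests or any Riot library). The clock and sleep functions can be swapped out, so the limiter can be tested without actually waiting:

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical helper: allow at most max_calls calls in any window_s-second window."""

    def __init__(self, max_calls=100, window_s=120.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self.clock()
            # forget calls that have slid out of the window
            while self.calls and now - self.calls[0] >= self.window_s:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # window is full: sleep until the oldest call in it expires
            self.sleep(self.window_s - (now - self.calls[0]))
```

In the loop above, you would create limiter = SlidingWindowLimiter(max_calls=100, window_s=120) and call limiter.wait() before each get_matches(...) request instead of time.sleep(1.3). Riot development keys also have a per-second limit (20 requests per second), so in practice you would also call the wait() of a second limiter, such as SlidingWindowLimiter(20, 1.0).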

Then, for EACH match, we fetch that match's full metadata:

# fetch the full metadata for a single match
def get_match_data(match_num):
    url = f"https://americas.api.riotgames.com/lol/match/v5/matches/{match_num}{captcha}{api}"
    data = requests.get(url).json()
    info_data = data['info']

    # one row per player in the match, tagged with the match ID
    player_rows = [{"match": match_num, **player} for player in info_data['participants']]
    df_temp_match = pd.DataFrame(player_rows)

    # one row per ban (five per team), tagged with the team and whether it won;
    # each ban is a dict like {"championId": 517, "pickTurn": 1}
    ban_rows = [
        {"match": match_num, **ban, "teamId": team['teamId'], "win": team['win']}
        for team in info_data['teams']
        for ban in team['bans']
    ]
    df_temp_match_info = pd.DataFrame(ban_rows)

    # building each DataFrame once avoids repeated pd.concat calls inside a loop
    return [df_temp_match, df_temp_match_info]

Since I have 100 players with 100 matches each, and can only make 100 requests per 2 minutes (50 per minute), you can imagine how long it took me to get the data :(
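The sketch below gives a back-of-the-envelope estimate. It assumes every request (one match-ID lookup per player plus one detail request per match) is followed by the 1.3 s pause, and it ignores the time each request itself takes. estimated_hours is a hypothetical helper:

```python
def estimated_hours(n_requests, seconds_per_request=1.3):
    """Rough wall-clock hours if every request is followed by a fixed pause (latency ignored)."""
    return n_requests * seconds_per_request / 3600


players, matches_per_player = 100, 100
id_requests = players                              # one match-ID list per player
detail_requests = players * matches_per_player     # one request per match, duplicates included
print(round(estimated_hours(id_requests + detail_requests), 2))  # 3.65
print(round(estimated_hours(id_requests + 5485), 2))             # 2.02, after removing duplicates
```

Once request time is added, the first figure is consistent with the roughly 4 hours mentioned below for fetching the data with duplicates.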

Finally, we save the dataframes:

# save the dataframes
df_match.to_csv("match_player_stats_info_full.csv", index=False)
df_match_info.to_csv("match_general_info_full.csv", index=False)

How big is the dataset?

The same match often shows up in several players' histories (they play in the same games), so I removed duplicate match IDs beforehand. I didn't want to wait 4 hours to fetch a dataset full of duplicates.

After filtering, I have 5,485 unique matches.
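The filtering code itself is not shown. Here is a minimal sketch of the idea. It assumes matches is the flat list of match IDs collected by the loop above, and it keeps the first occurrence of each ID (unique_match_ids is a hypothetical helper):

```python
def unique_match_ids(match_ids):
    """Drop repeated match IDs while keeping first-seen order."""
    # dict keys are unique and (since Python 3.7) keep insertion order
    return list(dict.fromkeys(match_ids))


print(unique_match_ids(["NA1_5361398251", "NA1_5361352205", "NA1_5361398251"]))
# ['NA1_5361398251', 'NA1_5361352205']
```

Running matches = unique_match_ids(matches) before fetching match details means each match is requested only once.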

For match_player_stats_info_full.csv, each match has 10 players (one row per player), and each row has 145 columns (including the match ID).

That makes this dataset 54,850 rows and 145 columns!

For match_general_info_full.csv, each match again has 10 rows (one per ban, five per team), but there are only 5 columns (including the match ID), making this dataset 54,850 rows and 5 columns.

Question Formulation and Hypothesis

General Questions

So my general questions are the following:

Which side statistically wins more games?
Which 10 champions are picked most often in each role?
Which champions are banned most often in each role?
Which bot duo is picked most often?
What are the most popular matchups (per role)?

The more ambitious goal is to use the results screen after every game to try to predict which team is more likely to win. For example:

Does having a given KDA at a given point in the game give you a higher chance of winning?
How much do objectives like turrets, dragons, and barons affect your win rate as time goes on?
Does a player's role affect their win rate?
How do damage, mitigation, healing, etc. for each role affect the win rate?

And so my main goal in general is the following:

Create the best model I can, using as many (or as few) predictors as needed, for the response: whether a team wins (its win rate).

Cleaning/Mutating the dataset

Preliminary

First, let's load the libraries needed for the rest of the presentation:

library(tidyverse) # reshaping, manipulating data
library(jsonlite) # load in json files
library(ggplot2) # for visuals
library(knitr) # general utilities (kable tables)
library(caret) # for assessing how good the model is
library(pROC) # for finding the best classification threshold (ROC curves)

match_info <- read_csv("datasets/match_general_info_full.csv")
match_data <- read_csv("datasets/match_player_stats_info_full.csv")

NOTE: For the purposes of this presentation, I will only use a reduced set of columns to save time (otherwise we will be here for longer than 10 minutes).

Those columns are: match, teamId, teamPosition, championId, championName, kills, deaths, assists, totalMinionsKilled, turretKills, dragonKills, baronKills, challenges, turretsLost, inhibitorsLost, objectivesStolen, timePlayed, win

Cleaning/Mutating Datasets

The match_general_info_full.csv dataset has a couple of issues. First, the banned champions are stored as numeric IDs rather than as names.

print (match_info)

# A tibble: 54,850 × 5
   match          championId pickTurn teamId win  
   <chr>               <dbl>    <dbl>  <dbl> <lgl>
 1 NA1_5361398251        517        1    100 TRUE 
 2 NA1_5361398251         -1        2    100 TRUE 
 3 NA1_5361398251         78        3    100 TRUE 
 4 NA1_5361398251        268        4    100 TRUE 
 5 NA1_5361398251         92        5    100 TRUE 
 6 NA1_5361398251         53        6    200 FALSE
 7 NA1_5361398251         -1        7    200 FALSE
 8 NA1_5361398251        119        8    200 FALSE
 9 NA1_5361398251          9        9    200 FALSE
10 NA1_5361398251         80       10    200 FALSE
# ℹ 54,840 more rows

To a reader, what do these numbers mean? They are not consecutive. There are 171 champions, so we might expect IDs 1 through 171, yet we see championIds like 517, 887, and 268 (and -1, which means no champion was banned). To decode them, the Riot API documentation points to a JSON file with data for every champion (Data Dragon's champion.json), including each champion's numeric ID.

We can load this file straight into R:

url <- "https://ddragon.leagueoflegends.com/cdn/15.18.1/data/en_US/champion.json"
champion_data <- fromJSON(url)

# champion name -> numeric key (the key is stored as a string, e.g. "517")
champion_lookup_table <- sapply(champion_data$data, function(x) x$key)
names(champion_lookup_table) <- names(champion_data$data)

# reverse the mapping: numeric ID -> champion name
champion_map_rev <- setNames(names(champion_lookup_table), as.integer(champion_lookup_table))

From here, we can add a new column, championName, that looks up each championId in the reversed lookup table:

match_info <- match_info %>%
  mutate(championName = champion_map_rev[as.character(championId)]) %>%
  # IDs missing from the lookup (e.g. -1, meaning no ban) come back as NA; label them "NONE"
  mutate(championName = ifelse(is.na(championName), "NONE", championName))

The result is the following:

head (match_info)

# A tibble: 6 × 6
  match          championId pickTurn teamId win   championName
  <chr>               <dbl>    <dbl>  <dbl> <lgl> <chr>       
1 NA1_5361398251        517        1    100 TRUE  Sylas       
2 NA1_5361398251         -1        2    100 TRUE  NONE        
3 NA1_5361398251         78        3    100 TRUE  Poppy       
4 NA1_5361398251        268        4    100 TRUE  Azir        
5 NA1_5361398251         92        5    100 TRUE  Riven       
6 NA1_5361398251         53        6    200 FALSE Blitzcrank
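For readers more comfortable with Python (the language used to collect the data), the same ID-to-name lookup is sketched below. It uses a hand-written three-champion excerpt in the shape of Data Dragon's champion.json, and the IDs match the output above. build_id_to_name and champion_name are hypothetical helpers:

```python
def build_id_to_name(champion_json):
    """Map numeric champion IDs to names from a champion.json-shaped dict."""
    # each entry looks like {"key": "517", ...}; "key" is the numeric ID stored as a string
    return {int(info["key"]): name for name, info in champion_json["data"].items()}


def champion_name(champion_id, id_to_name):
    # unknown IDs (e.g. -1, meaning no ban) fall back to "NONE", like the R code
    return id_to_name.get(champion_id, "NONE")


# tiny excerpt in the shape of champion.json (only the field used here)
sample = {"data": {"Sylas": {"key": "517"}, "Azir": {"key": "268"}, "Riven": {"key": "92"}}}
id_to_name = build_id_to_name(sample)
print([champion_name(i, id_to_name) for i in [517, -1, 268, 92]])
# ['Sylas', 'NONE', 'Azir', 'Riven']
```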


The match_player_stats_info_full.csv dataset has several issues.

teamPosition

For some reason, the support role is labeled “UTILITY”, so we rename it to “SUPPORT”. Also, teamId is numeric (100 for the blue team and 200 for red), so we rename that too. Finally, we drop rows with an empty teamPosition.

match_data <- match_data %>%
  mutate(teamPosition = ifelse(teamPosition == "UTILITY", "SUPPORT", teamPosition)) %>%
  mutate(teamId = ifelse(teamId == 100, "Blue", "Red")) %>%
  filter(teamPosition != "")

We do the same teamId renaming for match_info:

match_info <- match_info %>%
  mutate(teamId = ifelse(teamId == 100, "Blue", "Red"))


Elder Dragon is a strong objective that can swing the outcome of a game.

There is currently no column that says which team killed the Elder Dragon. However, the challenges column (a collection of achievement-like stats) contains several Elder Dragon fields: earliestElderDragon, elderDragonKillsWithOpposingSoul, elderDragonMultikills, and teamElderDragonKills.

For simplicity's sake, we will just extract teamElderDragonKills.

print (match_data$challenges[1])

[1] "{'12AssistStreakCount': 0, 'HealFromMapSources': 392, 'InfernalScalePickup': 25, 'SWARM_DefeatAatrox': 0, 'SWARM_DefeatBriar': 0, 'SWARM_DefeatMiniBosses': 0, 'SWARM_EvolveWeapon': 0, 'SWARM_Have3Passives': 0, 'SWARM_KillEnemy': 0, 'SWARM_PickupGold': 0, 'SWARM_ReachLevel50': 0, 'SWARM_Survive15Min': 0, 'SWARM_WinWith5EvolvedWeapons': 0, 'abilityUses': 322, 'acesBefore15Minutes': 0, 'alliedJungleMonsterKills': 5, 'baronBuffGoldAdvantageOverThreshold': 1, 'baronTakedowns': 0, 'blastConeOppositeOpponentCount': 0, 'bountyGold': 0, 'buffsStolen': 1, 'completeSupportQuestInTime': 0, 'controlWardTimeCoverageInRiverOrEnemyHalf': 0.03365060556653759, 'controlWardsPlaced': 2, 'damagePerMinute': 698.2248328341794, 'damageTakenOnTeamPercentage': 0.21064628016827877, 'dancedWithRiftHerald': 0, 'deathsByEnemyChamps': 3, 'dodgeSkillShotsSmallWindow': 0, 'doubleAces': 0, 'dragonTakedowns': 0, 'earliestBaron': 1558.636567753, 'earlyLaningPhaseGoldExpAdvantage': 0, 'effectiveHealAndShielding': 0, 'elderDragonKillsWithOpposingSoul': 0, 'elderDragonMultikills': 0, 'enemyChampionImmobilizations': 8, 'enemyJungleMonsterKills': 1, 'epicMonsterKillsNearEnemyJungler': 0, 'epicMonsterKillsWithin30SecondsOfSpawn': 0, 'epicMonsterSteals': 0, 'epicMonsterStolenWithoutSmite': 0, 'firstTurretKilled': 1, 'firstTurretKilledTime': 856.058216872, 'fistBumpParticipation': 0, 'flawlessAces': 1, 'fullTeamTakedown': 0, 'gameLength': 1775.8252846250002, 'getTakedownsInAllLanesEarlyJungleAsLaner': 0, 'goldPerMinute': 390.70391695530105, 'hadOpenNexus': 0, 'immobilizeAndKillWithAlly': 1, 'initialBuffCount': 0, 'initialCrabCount': 0, 'jungleCsBefore10Minutes': 0, 'junglerTakedownsNearDamagedEpicMonster': 0, 'kTurretsDestroyedBeforePlatesFall': 0, 'kda': 2.3333333333333335, 'killAfterHiddenWithAlly': 0, 'killParticipation': 0.2413793103448276, 'killedChampTookFullTeamDamageSurvived': 0, 'killingSprees': 0, 'killsNearEnemyTurret': 0, 'killsOnOtherLanesEarlyJungleAsLaner': 0, 
'killsOnRecentlyHealedByAramPack': 0, 'killsUnderOwnTurret': 0, 'killsWithHelpFromEpicMonster': 0, 'knockEnemyIntoTeamAndKill': 0, 'landSkillShotsEarlyGame': 0, 'laneMinionsFirst10Minutes': 87, 'laningPhaseGoldExpAdvantage': 0, 'legendaryCount': 0, 'legendaryItemUsed': [6692, 3161, 6699], 'lostAnInhibitor': 0, 'maxCsAdvantageOnLaneOpponent': 1, 'maxKillDeficit': 0, 'maxLevelLeadLaneOpponent': 1, 'mejaisFullStackInTime': 0, 'moreEnemyJungleThanOpponent': 0, 'multiKillOneSpell': 0, 'multiTurretRiftHeraldCount': 0, 'multikills': 0, 'multikillsAfterAggressiveFlash': 0, 'outerTurretExecutesBefore10Minutes': 0, 'outnumberedKills': 0, 'outnumberedNexusKill': 0, 'perfectDragonSoulsTaken': 0, 'perfectGame': 0, 'pickKillWithAlly': 6, 'playedChampSelectPosition': 1, 'poroExplosions': 0, 'quickCleanse': 0, 'quickFirstTurret': 0, 'quickSoloKills': 0, 'riftHeraldTakedowns': 0, 'saveAllyFromDeath': 0, 'scuttleCrabKills': 0, 'skillshotsDodged': 9, 'skillshotsHit': 0, 'snowballsHit': 0, 'soloBaronKills': 0, 'soloKills': 0, 'stealthWardsPlaced': 7, 'survivedSingleDigitHpCount': 0, 'survivedThreeImmobilizesInFight': 2, 'takedownOnFirstTurret': 0, 'takedowns': 7, 'takedownsAfterGainingLevelAdvantage': 0, 'takedownsBeforeJungleMinionSpawn': 0, 'takedownsFirstXMinutes': 3, 'takedownsInAlcove': 0, 'takedownsInEnemyFountain': 0, 'teamBaronKills': 1, 'teamDamagePercentage': 0.21906032820656154, 'teamElderDragonKills': 0, 'teamRiftHeraldKills': 0, 'tookLargeDamageSurvived': 0, 'turretPlatesTaken': 0, 'turretTakedowns': 4, 'turretsTakenWithRiftHerald': 0, 'twentyMinionsIn3SecondsCount': 0, 'twoWardsOneSweeperCount': 0, 'unseenRecalls': 0, 'visionScoreAdvantageLaneOpponent': -0.3411335349082947, 'visionScorePerMinute': 0.7272037436479267, 'voidMonsterKill': 0, 'wardTakedowns': 4, 'wardTakedownsBefore20M': 1, 'wardsGuarded': 0}"

# each challenges entry is a Python-style dict string: swap single quotes for double
# quotes so it parses as JSON, then extract teamElderDragonKills
match_data$elderDragonsKilled <- map_int(match_data$challenges, function(x) {
  fromJSON(gsub("'", "\"", x))$teamElderDragonKills
})

summary(match_data$elderDragonsKilled)

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
0.000000 0.000000 0.000000 0.009938 0.000000 2.000000

Despite the low numbers, this is fine: the Elder Dragon only spawns late in the game, so most games end before either team takes one.
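The challenges strings are Python dict literals (single quotes, Python lists), which is what you get when a Python dict is written to CSV as text. The quote-swapping trick in the R code works here, but it would break if a value ever contained an apostrophe. In Python, ast.literal_eval reads such a string directly. Here is a minimal sketch. team_elder_kills is a hypothetical helper, and defaulting to 0 when the field is absent is my assumption:

```python
import ast


def team_elder_kills(challenges_str):
    """Parse one challenges string (a Python dict literal) and return teamElderDragonKills."""
    challenges = ast.literal_eval(challenges_str)  # safely evaluates literals only
    return challenges.get("teamElderDragonKills", 0)  # assumption: missing field means 0


row = "{'kda': 2.3333333333333335, 'legendaryItemUsed': [6692, 3161, 6699], 'teamElderDragonKills': 0}"
print(team_elder_kills(row))  # 0
```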

Answering General Questions (Exploratory Analysis)

Which side statistically wins more games?

To do this, we need to convert the data from long to wide format, so that each team in each match is a single row (the pick1 to pick5 columns hold that team's five bans).

match_info_wide <- match_info %>% 
  group_by(match, teamId, win) %>%
  mutate(player = paste0("pick", row_number())) %>%
  pivot_wider(id_cols = c(match, teamId, win),
              names_from = player,
              values_from = championId)

head (match_info_wide)

# A tibble: 6 × 8
# Groups:   match, teamId, win [6]
  match          teamId win   pick1 pick2 pick3 pick4 pick5
  <chr>          <chr>  <lgl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 NA1_5361398251 Blue   TRUE    517    -1    78   268    92
2 NA1_5361398251 Red    FALSE    53    -1   119     9    80
3 NA1_5361352205 Blue   FALSE   887    35    -1   555    37
4 NA1_5361352205 Red    TRUE    887    96    -1     9   141
5 NA1_5360829750 Blue   FALSE    92    -1    -1   200   201
6 NA1_5360829750 Red    TRUE     20   200    92    -1    -1

From here, we can count how many games each side played and won, and compute its win rate:

match_wins_stats <- match_info_wide %>%
  group_by(teamId) %>%
  summarise(games = n(),
            wins = sum(win), 
            winrate = wins / games * 100)


Using ggplot, we can compare how many games each side won:

ggplot(data = match_wins_stats, aes(teamId, wins)) +
  geom_col(fill = c("blue", "red"), col = "black")
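The bar chart shows the raw counts. One way to check whether the difference between sides is more than noise is a one-proportion z-test (this test is not part of the original analysis). Every match has exactly one winner, so red's wins are determined by blue's, and it is enough to test whether blue's win rate differs from 50%. Below is a minimal sketch using the normal approximation. side_winrate_ztest is a hypothetical helper; you would pass in the Blue row's wins and games from match_wins_stats:

```python
import math


def side_winrate_ztest(blue_wins, games):
    """One-proportion z-test of H0: blue side's win rate is 50%.

    Returns (blue win rate, z statistic, two-sided p-value) using the normal approximation.
    """
    p_hat = blue_wins / games
    se = math.sqrt(0.5 * 0.5 / games)            # standard error of p_hat under H0
    z = (p_hat - 0.5) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return p_hat, z, p_value


# hypothetical counts: 60 blue wins in 100 games gives z ≈ 2.0 and p ≈ 0.0455
print(side_winrate_ztest(60, 100))
```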

Which champion is most picked per role?

To do this, we look at match_data and count how many times each champion was picked in each teamPosition (role).

total_games <- sum(match_wins_stats$wins)  # every match has exactly one winner, so this is the number of matches

reshape_data_champions <- match_data %>% 
  group_by(teamPosition, championName) %>%
  summarise(total_picks = n(),
            percent_picked = total_picks / total_games * 100,
            .groups = "drop")

Since I am looking for the top 10 per role, I store those in a new variable. I also filter out one-off or off-meta picks by keeping only champions picked in at least 1% of games for that role (about 55 games).

top_per_role <- reshape_data_champions %>%
  group_by(teamPosition) %>%
  filter(percent_picked >= 1) %>%
  arrange(desc(percent_picked))  %>%
  slice_head(n = 10) %>%
  ungroup()
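The same count, threshold, and top-10 logic is sketched below in Python, using only the standard library. top_picks_per_role is a hypothetical helper, and rows are assumed to be (teamPosition, championName) pairs, one per player per match. The example data is made up:

```python
from collections import Counter, defaultdict


def top_picks_per_role(rows, total_games, min_percent=1.0, n=10):
    """rows: (teamPosition, championName) pairs, one per player per match.

    Returns {role: [(champion, percent_picked), ...]} sorted by pick rate, keeping at
    most n champions per role and only those picked in >= min_percent of games.
    """
    counts = Counter(rows)                      # (role, champion) -> number of picks
    by_role = defaultdict(list)
    for (role, champion), picks in counts.items():
        percent = picks / total_games * 100
        if percent >= min_percent:              # drop one-off / off-meta picks
            by_role[role].append((champion, percent))
    return {role: sorted(champs, key=lambda c: c[1], reverse=True)[:n]
            for role, champs in by_role.items()}


rows = [("TOP", "Gwen")] * 4 + [("TOP", "Sion")] * 2 + [("TOP", "Riven")] + [("MIDDLE", "Azir")] * 2
print(top_picks_per_role(rows, total_games=8, min_percent=20))
# {'TOP': [('Gwen', 50.0), ('Sion', 25.0)], 'MIDDLE': [('Azir', 25.0)]}
```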


We can make it look nicer with a bar plot:

ggplot(top_per_role, aes(x = fct_reorder(championName, -percent_picked),
                         y = percent_picked, fill = championName)) +
  geom_col() +
  facet_wrap(~ teamPosition, scales = "free") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "none") +
  labs(title = "Champion Pick Rates by Position", x = "Champion", y = "Percent Picked")

Which champion is most banned per role?

For this, we go back to match_info and count the bans.

head (match_info, 15)

# A tibble: 15 × 6
   match          championId pickTurn teamId win   championName
   <chr>               <dbl>    <dbl> <chr>  <lgl> <chr>       
 1 NA1_5361398251        517        1 Blue   TRUE  Sylas       
 2 NA1_5361398251         -1        2 Blue   TRUE  NONE        
 3 NA1_5361398251         78        3 Blue   TRUE  Poppy       
 4 NA1_5361398251        268        4 Blue   TRUE  Azir        
 5 NA1_5361398251         92        5 Blue   TRUE  Riven       
 6 NA1_5361398251         53        6 Red    FALSE Blitzcrank  
 7 NA1_5361398251         -1        7 Red    FALSE NONE        
 8 NA1_5361398251        119        8 Red    FALSE Draven      
 9 NA1_5361398251          9        9 Red    FALSE Fiddlesticks
10 NA1_5361398251         80       10 Red    FALSE Pantheon    
11 NA1_5361352205        887        1 Blue   FALSE Gwen        
12 NA1_5361352205         35        2 Blue   FALSE Shaco       
13 NA1_5361352205         -1        3 Blue   FALSE NONE        
14 NA1_5361352205        555        4 Blue   FALSE Pyke        
15 NA1_5361352205         37        5 Blue   FALSE Sona

Note that, due to Riot's policy, the data does not say which role banned which champion, so we can only count bans across all roles. Also, both sides can ban the same champion; we count that as two bans rather than one.

banned_champions <- match_info %>%
  filter (championName != "NONE") %>%
  group_by(championName) %>%
  summarise(total_banned = n(), 
            percent_banned = total_banned / total_games * 100) %>%
  arrange(desc(percent_banned))


Since bans cannot be split by role, we plot the 20 most banned champions across all roles:

ggplot(data = banned_champions %>% slice_head(n = 20), 
       aes(x = reorder(championName, -percent_banned), y = percent_banned, fill = championName)) +
  geom_col() +
  labs(title = "Top 20 most banned champions (all roles)", 
       x = "Champion Name", y = "Percent Banned in over 5000 matches") + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5),
        legend.position = "none")

What bot duo is most picked?

Now for the bot duos, we filter to the SUPPORT and BOTTOM positions. We also need to make each pair order-independent, so that Yunara + Pyke counts the same as Pyke + Yunara. Sorting the two names alphabetically before pasting them together does this.

bot_duos <- match_data %>%
  filter(teamPosition %in% c("SUPPORT", "BOTTOM")) %>%
  group_by(match, teamId) %>%
  summarise(bot_duo = paste(sort(championName), collapse = " + "), .groups = "drop")
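The key idea is sketched below in Python. Sorting the two names gives every pair one canonical label, and the pick rate is count / total_games * 100. duo_key and duo_pick_rates are hypothetical helpers:

```python
from collections import Counter


def duo_key(champions):
    """Order-independent label, e.g. ["Yunara", "Pyke"] -> "Pyke + Yunara"."""
    return " + ".join(sorted(champions))


def duo_pick_rates(duos, total_games, top=10):
    """duos: one [bottom, support] name pair per team per match.

    Returns (label, count, percent of matches) for the `top` most common duos.
    """
    counts = Counter(duo_key(d) for d in duos)
    return [(label, n, n / total_games * 100) for label, n in counts.most_common(top)]


print(duo_key(["Yunara", "Pyke"]), "|", duo_key(["Pyke", "Yunara"]))
# Pyke + Yunara | Pyke + Yunara
```

As a check on the table below, Lucian + Nami's 167 picks over 5,485 matches gives 167 / 5485 * 100 ≈ 3.0447, matching its Percent Picked value. One caveat: Python's sorted() orders by character code, while R's sort() follows the locale, so the two can order some names (e.g. those with inner capitals) differently.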

From here, we can summarise the overall pick rate for each bot duo and keep the top 10:

bot_duos_summary <- bot_duos %>%
  group_by(bot_duo) %>%
  summarise(count = n(),
            percent_picked = count / total_games * 100) %>%
  arrange(desc(count)) %>%
  head(n = 10)

Now we can make a table of the most picked bot duos:

kable(bot_duos_summary,
      col.names = c("Bot Duo", "Total Count", "Percent Picked"),
      caption = "Top 10 Most Played Bot Duos")

Top 10 Most Played Bot Duos
Bot Duo	Total Count	Percent Picked
Lucian + Nami	167	3.044667
Lulu + Yunara	149	2.716499
Alistar + Yunara	85	1.549681
Braum + Lucian	80	1.458523
Kaisa + Nautilus	80	1.458523
Lucian + Milio	75	1.367366
Ezreal + Karma	73	1.330903
Aphelios + Lulu	72	1.312671
Sona + Yunara	69	1.257976
Aphelios + Thresh	66	1.203282

Most popular matchups

For matchups, similar to bot duos, we pair up the two champions (one per team) playing the same position in each match, ordering the names alphabetically so the order does not matter:

popular_matchups_per_role <- match_data %>%
  filter(!is.na(championName)) %>%
  select(match, teamPosition, teamId, championName) %>%
  # one row per (match, teamPosition), with the Blue and Red champions side by side
  pivot_wider(names_from = teamId, 
              values_from = championName) %>%
  # pmin/pmax compare the two names row by row, so each matchup is ordered alphabetically
  mutate(match_up = paste(pmin(Red, Blue), pmax(Red, Blue), sep = " vs ")) %>%
  select(match, teamPosition, match_up)

Then summarise by counting each matchup:

# count each matchup
matchup_stats <- popular_matchups_per_role %>%
  group_by(teamPosition, match_up) %>%
  summarise(picked_total = n(), .groups = "drop") %>%
  group_by(teamPosition) %>%
  mutate(percent_picked = picked_total / total_games * 100)

Note that each match contributes exactly one matchup per position (pivot_wider already combined the two laners into a single row). Dividing by total_games and multiplying by 100 therefore gives the percentage of matches featuring that matchup.

Then get the top 10 per role:

top_matchups_per_role <- matchup_stats %>%
  group_by(teamPosition) %>%
  slice_max(percent_picked, n = 10) %>%
  filter(teamPosition != "")

Most popular matchups (by role)

Now we can plot the top 10 matchups for each role:

ggplot(top_matchups_per_role, aes(x = fct_reorder(match_up, -percent_picked), 
                                  y = percent_picked, fill = match_up)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ teamPosition, scales = "free") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Top 10 Most Picked Champion Matchups per Role",
       x = "Matchup",
       y = "Percent Picked")

Statistical Analysis

Logistic Equation

The next goal is to create a model that estimates the probability that a team wins from a set of predictors.

Since the response is binary (0 for a loss, 1 for a win), linear regression is not a good fit (it can predict values outside 0 to 1), so we use logistic regression instead.

Recall that the equation for logistic regression is the following:

\[\begin{align*} \hat{Y} &= \beta_0 + \beta_1X_1 + \beta_2X_2 + ... + \beta_nX_n \\ p &= \dfrac{e^{\hat{Y}}}{1 + e^{\hat{Y}}} = \dfrac{e^{\beta_0 + \beta_1X_1 + \beta_2X_2 + ... + \beta_nX_n }}{1 + e^{\beta_0 + \beta_1X_1 + \beta_2X_2 + ... + \beta_nX_n }} \end{align*}\]

Here, p is the probability of winning. Ŷ, the linear combination of the predictors X_1, ..., X_n, is the log-odds of winning, ln(p / (1 - p)).
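To make the link between Ŷ and p concrete, here is a small Python sketch of the two transformations: the inverse logit (the second equation above) and its inverse, the logit. The function names are my own. Branching on the sign of the input is a standard trick to keep math.exp from overflowing:

```python
import math


def inverse_logit(y_hat):
    """p = e^y / (1 + e^y): map the linear predictor (log-odds) to a probability."""
    if y_hat >= 0:
        return 1.0 / (1.0 + math.exp(-y_hat))   # same value, rewritten for y_hat >= 0
    z = math.exp(y_hat)                          # y_hat < 0, so this cannot overflow
    return z / (1.0 + z)


def logit(p):
    """Inverse of the above: the log-odds ln(p / (1 - p)) for 0 < p < 1."""
    return math.log(p / (1.0 - p))


print(inverse_logit(0.0))                    # 0.5
print(round(logit(inverse_logit(1.7)), 6))   # 1.7
```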

Assumptions

Since this is a team-based game, we assume each member has their own probability of winning, and the team's chance of winning is the average of the five. For example:

If the bottom lane has a 45% chance of winning, support has 52%, middle lane has 90%, jungle has 80%, and top has 72%, the team's chance of winning is:

\[\begin{align*} P(\text{team wins}) = \dfrac{0.45 + 0.52 + 0.90 + 0.80 + 0.72}{5} = 0.678 \end{align*}\]
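The averaging assumption above fits in a one-line Python function (hypothetical name team_win_probability), applied here to the same five numbers:

```python
def team_win_probability(role_probs):
    """Assumed team probability: the plain average of the per-role probabilities."""
    return sum(role_probs) / len(role_probs)


# bottom, support, middle, jungle, top from the example above
print(round(team_win_probability([0.45, 0.52, 0.90, 0.80, 0.72]), 3))  # 0.678
```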

So we build the model at the level of each role (one row per player), rather than one row per team.

Based on my experience playing League of Legends, I chose these columns of match_data as our predictors:

totalMinionsKilled (CS)
kills, deaths, assists (KDA)
objectives (turretKills, dragonKills, baronKills, elderDragonsKilled)

Creating our logistic equation

TL;DR: I added predictors one group at a time and checked whether the AIC improved (lower is better). I also compared the nested models with analysis of deviance (anova).

The first step was to convert the win column from TRUE/FALSE to 1/0, so the response is numeric:

head(match_data$win, 10)

 [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE

match_data <- match_data %>%
  mutate(win = as.numeric(win))  # TRUE -> 1, FALSE -> 0

head(match_data$win, 10)

 [1] 1 1 1 1 1 0 0 0 0 0

The TLDR; Creating our logistic equation

I started with totalMinionsKilled alone, then added kills, deaths, and assists:

winrate_cs <- glm(win ~ totalMinionsKilled, data = match_data)
summary(winrate_cs)


Call:
glm(formula = win ~ totalMinionsKilled, data = match_data)

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)        4.832e-01  3.599e-03 134.250   <2e-16 ***
totalMinionsKilled 1.357e-04  2.321e-05   5.848    5e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 0.2498533)

    Null deviance: 13710  on 54838  degrees of freedom
Residual deviance: 13701  on 54837  degrees of freedom
AIC: 79575

Number of Fisher Scoring iterations: 2

full_cs_KDA <- glm(win ~ kills + deaths + assists + totalMinionsKilled, 
                           data = match_data, family = "binomial")
summary(full_cs_KDA)


Call:
glm(formula = win ~ kills + deaths + assists + totalMinionsKilled, 
    family = "binomial", data = match_data)

Coefficients:
                     Estimate Std. Error z value Pr(>|z|)    
(Intercept)        -0.4503628  0.0311190  -14.47   <2e-16 ***
kills               0.2044906  0.0033722   60.64   <2e-16 ***
deaths             -0.5545717  0.0053916 -102.86   <2e-16 ***
assists             0.2383885  0.0026258   90.79   <2e-16 ***
totalMinionsKilled  0.0030048  0.0001369   21.95   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 76023  on 54838  degrees of freedom
Residual deviance: 46808  on 54834  degrees of freedom
AIC: 46818

Number of Fisher Scoring iterations: 5

anova(winrate_cs, full_cs_KDA)

Analysis of Deviance Table

Model 1: win ~ totalMinionsKilled
Model 2: win ~ kills + deaths + assists + totalMinionsKilled
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1     54837      13701                     
2     54834      46808  3   -33106

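Note that winrate_cs above was fit without family = "binomial", so glm() defaulted to an ordinary linear (gaussian) model, as the "Dispersion parameter for gaussian family" line in its summary shows. Its deviance (and its AIC) is on a different scale from the binomial models', so this table, with its negative change in deviance, is not a meaningful comparison. Refitting it as glm(win ~ totalMinionsKilled, data = match_data, family = "binomial") would make the comparison valid.

Then I made life simpler by combining kills, deaths, and assists into a single KDA value. I used the following equation (treating 0 deaths as 1 so the ratio stays finite) and the code below: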

\[\begin{align*} KDA = \dfrac{Kills + Assists}{\max(Deaths, 1)} \end{align*}\]

# vectorised: (kills + assists) / deaths, with 0 deaths treated as 1
match_data <- match_data %>%
  mutate(KDA = (kills + assists) / pmax(deaths, 1))
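The same KDA rule is shown below in Python as a quick check of the zero-deaths handling (kda is a hypothetical helper). The challenges string printed earlier has 'kda': 2.3333333333333335 alongside 7 takedowns and 3 deaths by enemy champions. That is consistent with (kills + assists) / deaths = 7 / 3:

```python
def kda(kills, deaths, assists):
    """(kills + assists) / deaths, counting 0 deaths as 1 so the ratio stays finite."""
    return (kills + assists) / max(deaths, 1)


print(kda(4, 3, 3))    # 2.3333333333333335
print(kda(5, 0, 10))   # 15.0
```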

winrate_cs_KDA <- glm(win ~ (KDA + totalMinionsKilled),
                      data = match_data, family = "binomial")

summary(winrate_cs_KDA)


Call:
glm(formula = win ~ (KDA + totalMinionsKilled), family = "binomial", 
    data = match_data)

Coefficients:
                     Estimate Std. Error z value Pr(>|z|)    
(Intercept)        -2.8342571  0.0297092  -95.40   <2e-16 ***
KDA                 0.9049826  0.0085637  105.68   <2e-16 ***
totalMinionsKilled  0.0019684  0.0001226   16.05   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 76023  on 54838  degrees of freedom
Residual deviance: 48125  on 54836  degrees of freedom
AIC: 48131

Number of Fisher Scoring iterations: 6
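To read the fitted model, plug the coefficients from summary(winrate_cs_KDA) into the logistic equation. The sketch below does this in Python for a hypothetical player with a KDA of 3 and 200 CS. win_probability is my own helper, and the coefficients are copied from the output above. exp(β) also gives an odds ratio: each extra point of KDA multiplies the odds of winning by about 2.47, holding CS fixed.

```python
import math

# coefficients copied from summary(winrate_cs_KDA) above
INTERCEPT = -2.8342571
B_KDA = 0.9049826
B_CS = 0.0019684


def win_probability(kda, cs):
    """Predicted probability of winning from the KDA + CS logistic model."""
    y_hat = INTERCEPT + B_KDA * kda + B_CS * cs   # linear predictor (log-odds)
    return 1.0 / (1.0 + math.exp(-y_hat))


print(round(win_probability(3.0, 200), 3))  # 0.568
print(round(math.exp(B_KDA), 2))            # 2.47, odds ratio per +1 KDA
```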

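Combining the three into KDA costs some fit (AIC 48131 here versus 46818 with kills, deaths, and assists as separate predictors), but it replaces three predictors with one.

Then I added objectives: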

winrate_cs_KDA_objectives <- glm(win ~ (KDA + totalMinionsKilled + turretKills + dragonKills +
                                          baronKills + elderDragonsKilled),
                                 data = match_data, family = "binomial")

anova(winrate_cs_KDA, winrate_cs_KDA_objectives)

Analysis of Deviance Table

Model 1: win ~ (KDA + totalMinionsKilled)
Model 2: win ~ (KDA + totalMinionsKilled + turretKills + dragonKills + 
    baronKills + elderDragonsKilled)
  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
1     54836      48125                          
2     54832      44766  4   3358.6 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
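The numbers in these tables can be reproduced by hand. For a 0/1 response, the saturated model's log-likelihood is 0, so AIC = residual deviance + 2 × (number of coefficients). anova() compares nested models using the drop in deviance, which is approximately chi-square with df equal to the number of added coefficients. Python's standard library has no chi-square CDF, so the sketch below uses the closed form that holds for even df only. aic_binary and lrt_pvalue_even_df are hypothetical helpers:

```python
import math


def aic_binary(residual_deviance, n_params):
    """AIC for a logistic model with a 0/1 response: deviance + 2 * (number of coefficients)."""
    # with 0/1 data the saturated log-likelihood is 0, so deviance = -2 log L
    return residual_deviance + 2 * n_params


def lrt_pvalue_even_df(deviance_drop, df):
    """Upper-tail chi-square probability; this closed form is valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = deviance_drop / 2
    # P(X > x) = exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))


print(aic_binary(46808, 5), aic_binary(48125, 3))  # 46818 48131
print(lrt_pvalue_even_df(3358.6, 4))               # 0.0 (underflows; far below 2.2e-16)
```

Both printed AIC values (for full_cs_KDA and winrate_cs_KDA) are reproduced. The 4-df drop of 3358.6 gives a p-value far below 2.2e-16, as in the table.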

Then I added the objectives lost (turretsLost, inhibitorsLost) and objectivesStolen:

winrate_cs_KDA_objectives_negatives <- glm(win ~ (KDA + totalMinionsKilled + turretKills +
                                                    dragonKills + baronKills + elderDragonsKilled +
                                                    turretsLost + inhibitorsLost + objectivesStolen),
                                           data = match_data, family = "binomial")
anova(winrate_cs_KDA_objectives, winrate_cs_KDA_objectives_negatives)

Analysis of Deviance Table

Model 1: win ~ (KDA + totalMinionsKilled + turretKills + dragonKills + 
    baronKills + elderDragonsKilled)
Model 2: win ~ (KDA + totalMinionsKilled + turretKills + dragonKills + 
    baronKills + elderDragonsKilled + turretsLost + inhibitorsLost + 
    objectivesStolen)
  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
1     54832      44766                          
2     54829      23479  3    21287 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Finally, I added interactions between every predictor and both game length (timePlayed) and role (teamPosition):

winrate_cs_KDA_objectives_negatives_with_time_and_position <- glm(
  win ~ (KDA + totalMinionsKilled + turretKills +
           dragonKills + baronKills + elderDragonsKilled +
           turretsLost + inhibitorsLost + objectivesStolen) * (timePlayed + teamPosition),
  data = match_data, family = "binomial")
summary(winrate_cs_KDA_objectives_negatives_with_time_and_position)


Call:
glm(formula = win ~ (KDA + totalMinionsKilled + turretKills + 
    dragonKills + baronKills + elderDragonsKilled + turretsLost + 
    inhibitorsLost + objectivesStolen) * (timePlayed + teamPosition), 
    family = "binomial", data = match_data)

Coefficients:
                                         Estimate Std. Error z value Pr(>|z|)
(Intercept)                            -1.337e+00  1.594e-01  -8.389  < 2e-16
KDA                                     1.047e+00  6.029e-02  17.368  < 2e-16
totalMinionsKilled                     -3.046e-02  2.001e-03 -15.223  < 2e-16
turretKills                             3.148e+00  1.187e-01  26.528  < 2e-16
dragonKills                             1.096e+00  2.266e-01   4.837 1.32e-06
baronKills                              8.547e+00  6.493e-01  13.163  < 2e-16
elderDragonsKilled                      4.125e+00  1.478e+00   2.790 0.005264
turretsLost                            -1.767e+00  5.502e-02 -32.123  < 2e-16
inhibitorsLost                          3.755e-01  2.132e-01   1.762 0.078140
objectivesStolen                       -9.925e-01  8.847e-01  -1.122 0.261887
timePlayed                              3.899e-03  1.068e-04  36.503  < 2e-16
teamPositionJUNGLE                     -2.410e+00  2.155e-01 -11.183  < 2e-16
teamPositionMIDDLE                     -2.453e-02  1.953e-01  -0.126 0.900066
teamPositionSUPPORT                    -1.393e+00  2.050e-01  -6.794 1.09e-11
teamPositionTOP                         9.559e-02  1.898e-01   0.504 0.614571
KDA:timePlayed                         -2.001e-04  3.004e-05  -6.660 2.74e-11
KDA:teamPositionJUNGLE                 -2.035e-01  4.387e-02  -4.639 3.50e-06
KDA:teamPositionMIDDLE                 -1.483e-01  4.388e-02  -3.379 0.000726
KDA:teamPositionSUPPORT                -1.144e-01  4.080e-02  -2.804 0.005043
KDA:teamPositionTOP                    -2.062e-02  4.759e-02  -0.433 0.664872
totalMinionsKilled:timePlayed           1.169e-05  8.237e-07  14.190  < 2e-16
totalMinionsKilled:teamPositionJUNGLE   1.299e-02  3.718e-03   3.494 0.000477
totalMinionsKilled:teamPositionMIDDLE   4.408e-03  1.541e-03   2.860 0.004235
totalMinionsKilled:teamPositionSUPPORT -2.702e-03  4.442e-03  -0.608 0.542975
totalMinionsKilled:teamPositionTOP      3.340e-03  1.519e-03   2.199 0.027886
turretKills:timePlayed                 -1.273e-03  5.758e-05 -22.104  < 2e-16
turretKills:teamPositionJUNGLE         -1.373e-01  7.681e-02  -1.788 0.073768
turretKills:teamPositionMIDDLE         -2.074e-01  6.494e-02  -3.194 0.001403
turretKills:teamPositionSUPPORT         4.359e-03  1.044e-01   0.042 0.966710
turretKills:teamPositionTOP            -4.035e-01  5.929e-02  -6.805 1.01e-11
dragonKills:timePlayed                 -3.398e-04  9.514e-05  -3.572 0.000355
dragonKills:teamPositionJUNGLE          1.527e-01  1.424e-01   1.072 0.283526
dragonKills:teamPositionMIDDLE         -2.743e-01  2.246e-01  -1.221 0.221980
dragonKills:teamPositionSUPPORT        -2.038e-01  3.069e-01  -0.664 0.506672
dragonKills:teamPositionTOP             1.628e-01  2.370e-01   0.687 0.491974
baronKills:timePlayed                  -3.742e-03  2.742e-04 -13.650  < 2e-16
baronKills:teamPositionJUNGLE           3.303e-01  2.997e-01   1.102 0.270318
baronKills:teamPositionMIDDLE          -6.422e-02  4.597e-01  -0.140 0.888893
baronKills:teamPositionSUPPORT         -4.644e-01  6.974e-01  -0.666 0.505488
baronKills:teamPositionTOP             -3.217e-02  4.491e-01  -0.072 0.942892
elderDragonsKilled:timePlayed          -1.005e-03  5.849e-04  -1.719 0.085667
elderDragonsKilled:teamPositionJUNGLE   1.338e+00  6.924e-01   1.933 0.053239
elderDragonsKilled:teamPositionMIDDLE  -1.622e-01  6.954e-01  -0.233 0.815589
elderDragonsKilled:teamPositionSUPPORT  1.173e-01  6.707e-01   0.175 0.861185
elderDragonsKilled:teamPositionTOP     -6.157e-01  6.667e-01  -0.924 0.355722
turretsLost:timePlayed                  3.361e-04  2.669e-05  12.593  < 2e-16
turretsLost:teamPositionJUNGLE         -4.600e-02  4.723e-02  -0.974 0.330089
turretsLost:teamPositionMIDDLE         -1.815e-02  4.522e-02  -0.401 0.688092
turretsLost:teamPositionSUPPORT         1.235e-01  4.286e-02   2.883 0.003941
turretsLost:teamPositionTOP             1.713e-02  4.412e-02   0.388 0.697800
inhibitorsLost:timePlayed              -4.749e-04  8.995e-05  -5.280 1.29e-07
inhibitorsLost:teamPositionJUNGLE       4.937e-01  1.389e-01   3.554 0.000380
inhibitorsLost:teamPositionMIDDLE      -3.673e-03  1.338e-01  -0.027 0.978099
inhibitorsLost:teamPositionSUPPORT      2.685e-02  1.342e-01   0.200 0.841371
inhibitorsLost:teamPositionTOP          3.440e-03  1.322e-01   0.026 0.979243
objectivesStolen:timePlayed             2.205e-04  3.469e-04   0.636 0.525048
objectivesStolen:teamPositionJUNGLE     7.474e-01  6.101e-01   1.225 0.220574
objectivesStolen:teamPositionMIDDLE     5.304e-01  7.260e-01   0.731 0.465008
objectivesStolen:teamPositionSUPPORT   -2.056e-01  8.027e-01  -0.256 0.797805
objectivesStolen:teamPositionTOP        1.709e-01  7.574e-01   0.226 0.821514
                                          
(Intercept)                            ***
KDA                                    ***
totalMinionsKilled                     ***
turretKills                            ***
dragonKills                            ***
baronKills                             ***
elderDragonsKilled                     ** 
turretsLost                            ***
inhibitorsLost                         .  
objectivesStolen                          
timePlayed                             ***
teamPositionJUNGLE                     ***
teamPositionMIDDLE                        
teamPositionSUPPORT                    ***
teamPositionTOP                           
KDA:timePlayed                         ***
KDA:teamPositionJUNGLE                 ***
KDA:teamPositionMIDDLE                 ***
KDA:teamPositionSUPPORT                ** 
KDA:teamPositionTOP                       
totalMinionsKilled:timePlayed          ***
totalMinionsKilled:teamPositionJUNGLE  ***
totalMinionsKilled:teamPositionMIDDLE  ** 
totalMinionsKilled:teamPositionSUPPORT    
totalMinionsKilled:teamPositionTOP     *  
turretKills:timePlayed                 ***
turretKills:teamPositionJUNGLE         .  
turretKills:teamPositionMIDDLE         ** 
turretKills:teamPositionSUPPORT           
turretKills:teamPositionTOP            ***
dragonKills:timePlayed                 ***
dragonKills:teamPositionJUNGLE            
dragonKills:teamPositionMIDDLE            
dragonKills:teamPositionSUPPORT           
dragonKills:teamPositionTOP               
baronKills:timePlayed                  ***
baronKills:teamPositionJUNGLE             
baronKills:teamPositionMIDDLE             
baronKills:teamPositionSUPPORT            
baronKills:teamPositionTOP                
elderDragonsKilled:timePlayed          .  
elderDragonsKilled:teamPositionJUNGLE  .  
elderDragonsKilled:teamPositionMIDDLE     
elderDragonsKilled:teamPositionSUPPORT    
elderDragonsKilled:teamPositionTOP        
turretsLost:timePlayed                 ***
turretsLost:teamPositionJUNGLE            
turretsLost:teamPositionMIDDLE            
turretsLost:teamPositionSUPPORT        ** 
turretsLost:teamPositionTOP               
inhibitorsLost:timePlayed              ***
inhibitorsLost:teamPositionJUNGLE      ***
inhibitorsLost:teamPositionMIDDLE         
inhibitorsLost:teamPositionSUPPORT        
inhibitorsLost:teamPositionTOP            
objectivesStolen:timePlayed               
objectivesStolen:teamPositionJUNGLE       
objectivesStolen:teamPositionMIDDLE       
objectivesStolen:teamPositionSUPPORT      
objectivesStolen:teamPositionTOP          
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 76023  on 54838  degrees of freedom
Residual deviance: 17194  on 54779  degrees of freedom
AIC: 17314

Number of Fisher Scoring iterations: 8

Because several terms are not significant, we used stepwise selection (R's step(), which adds or drops terms as long as AIC goes down) to find a simpler model that still fits well.

# stepwise selection
reduced_model <- step(winrate_cs_KDA_objectives_negatives_with_time_and_position, direction = "both")

summary(reduced_model)

Stepwise selection took about 20 minutes to run, so instead of rerunning it, here is the model it selected:

reduced_model <- glm(
  formula = win ~ KDA + totalMinionsKilled + turretKills + dragonKills + baronKills +
    elderDragonsKilled + turretsLost + inhibitorsLost + timePlayed + teamPosition +
    KDA:timePlayed + KDA:teamPosition +
    totalMinionsKilled:timePlayed + totalMinionsKilled:teamPosition +
    turretKills:timePlayed + turretKills:teamPosition +
    dragonKills:timePlayed + dragonKills:teamPosition +
    baronKills:timePlayed +
    elderDragonsKilled:timePlayed + elderDragonsKilled:teamPosition +
    turretsLost:timePlayed + turretsLost:teamPosition +
    inhibitorsLost:timePlayed + inhibitorsLost:teamPosition,
  family = "binomial", data = match_data)
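step() keeps adding or dropping terms as long as AIC goes down. For a binomial GLM with a 0/1 response, AIC is the residual deviance plus twice the number of estimated coefficients, and that count equals the number of observations minus the residual degrees of freedom. As a quick check, the small Python sketch below (Python being the language used for the data collection) recomputes both AIC values from the printed summaries: the full model above (residual deviance 17194 on 54779 df, AIC 17314) and the reduced model summarized later in this section (17205 on 54789 df, AIC 17305). It then runs a likelihood-ratio test between the two nested models. The printed deviances are rounded to whole numbers, so the test statistic here is only approximate.

```python
import math

N_OBS = 54839  # null deviance df (54838) + 1

def n_coefficients(residual_df: int, n_obs: int = N_OBS) -> int:
    """Estimated coefficients = observations - residual degrees of freedom."""
    return n_obs - residual_df

def aic_binomial(deviance: float, k: int) -> float:
    """AIC for a 0/1 binomial GLM: residual deviance + 2 * (number of coefficients)."""
    return deviance + 2 * k

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; this closed form only works for even df."""
    if df % 2:
        raise ValueError("closed form needs an even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))

# Values copied from the two printed summaries (rounded to whole numbers by R)
full = {"deviance": 17194, "df": 54779}
reduced = {"deviance": 17205, "df": 54789}

k_full = n_coefficients(full["df"])
k_reduced = n_coefficients(reduced["df"])
aic_full = aic_binomial(full["deviance"], k_full)
aic_reduced = aic_binomial(reduced["deviance"], k_reduced)

# Likelihood-ratio test: extra deviance of the smaller model vs. coefficients it drops
lr_stat = reduced["deviance"] - full["deviance"]
lr_df = k_full - k_reduced
p_value = chi2_sf_even_df(lr_stat, lr_df)

print(k_full, k_reduced, aic_full, aic_reduced)
print(f"LR = {lr_stat} on {lr_df} df, p = {p_value:.2f}")
```

The recomputed AICs match the summaries (60 coefficients for the full model, 50 for the reduced one), and a p-value of roughly 0.36 suggests the ten dropped terms do not improve the fit significantly, consistent with step() preferring the smaller model. In R, anova(reduced_model, winrate_cs_KDA_objectives_negatives_with_time_and_position, test = "Chisq") should run the same test on the unrounded deviances.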

Testing the model

To measure accuracy, we predict a win probability for every row, pick a cut-off from the ROC curve, and build a confusion matrix:

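library(pROC)   # roc(), coords()
library(caret)  # confusionMatrix()

# predicted probability of winning for every player row
match_data$predition_win <- predict(reduced_model, newdata = match_data, type = "response")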
# ROC curve
roc_obj <- roc(as.numeric(match_data$win), as.numeric(match_data$predition_win))

# find the best threshold of the dataset
best_thresh <- coords(roc_obj, "best", ret = "threshold", best.method = "youden")
best_thresh

  threshold
1 0.5386129
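coords(..., best.method = "youden") picks the cut-off that maximizes Youden's J = sensitivity + specificity - 1, i.e. the point on the ROC curve farthest above the diagonal. The Python sketch below shows the idea on eight made-up predicted probabilities and outcomes (hypothetical data, not match_data). It tries each observed score as a threshold and predicts a win when the score is at or above it; pROC builds its own list of candidate thresholds, so the exact value it reports can differ slightly from a raw score.

```python
from typing import List, Tuple

def youden_best_threshold(scores: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A row is predicted as a win (1) when its score is >= the threshold.
    Candidate thresholds are the distinct observed scores; ties keep the lowest one.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best = (float("nan"), -1.0)
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = tp / positives + tn / negatives - 1
        if j > best[1]:
            best = (t, j)
    return best

# Hypothetical predicted win probabilities and true outcomes (1 = win)
scores = [0.10, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90, 0.30]
labels = [0,    0,    1,    0,    1,    1,    1,    0]

threshold, j = youden_best_threshold(scores, labels)
print(threshold, round(j, 2))
```

On this toy data the thresholds 0.40 and 0.60 tie at J = 0.75, and the sketch keeps the lower one.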

# then find how good it is!
prediction_win1 <- ifelse(match_data$predition_win > 0.5386129, 1, 0)
conf_matrix5 <- confusionMatrix(as.factor(prediction_win1), as.factor(match_data$win),positive="1")
print(conf_matrix5)

Confusion Matrix and Statistics

          Reference
Prediction     0     1
         0 26027  1869
         1  1387 25556
                                          
               Accuracy : 0.9406          
                 95% CI : (0.9386, 0.9426)
    No Information Rate : 0.5001          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.8813          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.9319          
            Specificity : 0.9494          
         Pos Pred Value : 0.9485          
         Neg Pred Value : 0.9330          
             Prevalence : 0.5001          
         Detection Rate : 0.4660          
   Detection Prevalence : 0.4913          
      Balanced Accuracy : 0.9406          
                                          
       'Positive' Class : 1

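We get an accuracy of about 94% (95% CI 93.9% to 94.3%)! Note that this is in-sample accuracy: the model was fit, and the threshold chosen, on the same match_data rows it is evaluated on, so accuracy on new matches may be somewhat lower.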
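All of caret's headline numbers follow from the four counts in the confusion matrix. Recomputing them in a few lines of Python makes each definition explicit, with wins (1) as the positive class to match positive = "1": sensitivity is the share of actual wins predicted as wins, specificity the share of actual losses predicted as losses, and kappa compares accuracy with the agreement expected by chance from the row and column totals.

```python
def confusion_stats(tn: int, fn: int, fp: int, tp: int) -> dict:
    """Recompute caret-style summary statistics from a 2x2 confusion matrix (positive class = 1)."""
    n = tn + fn + fp + tp
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # actual wins predicted as wins
    specificity = tn / (tn + fp)   # actual losses predicted as losses
    ppv = tp / (tp + fp)           # predicted wins that really were wins
    npv = tn / (tn + fn)           # predicted losses that really were losses
    # Cohen's kappa: agreement beyond what the row/column totals give by chance
    p_pred1, p_ref1 = (tp + fp) / n, (tp + fn) / n
    expected = p_pred1 * p_ref1 + (1 - p_pred1) * (1 - p_ref1)
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy, "sensitivity": sensitivity, "specificity": specificity,
        "ppv": ppv, "npv": npv, "kappa": kappa,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }

# Counts from the confusion matrix above (rows = prediction, columns = reference)
stats = confusion_stats(tn=26027, fn=1869, fp=1387, tp=25556)
for name, value in stats.items():
    print(f"{name:>17}: {value:.4f}")
```

Run on those counts, this reproduces caret's output: accuracy 0.9406, sensitivity 0.9319, specificity 0.9494, kappa 0.8813.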

Testing the model

To really test our model, let's look at a game from October 6, Asia Invitational, DK vs JDG.

[Screenshot: game state at 20:45]

This is the game state at 20:57 (it's actually 20:45, but the code below uses 20:57). I grabbed each player's stats from it:

# DK
time = 20*60 + 57  # game clock in seconds
Siwoo <- data.frame(KDA = 4, totalMinionsKilled = 183, turretKills = 1, dragonKills = 0, baronKills = 0, elderDragonsKilled = 0, turretsLost = 2, inhibitorsLost = 0, timePlayed = time,  teamPosition = "TOP")

Lucid <- data.frame(KDA = as.numeric(8/3), totalMinionsKilled = 136, turretKills = 0, dragonKills = 1, baronKills = 0, elderDragonsKilled = 0, turretsLost = 2, inhibitorsLost = 0, timePlayed = time,  teamPosition = "JUNGLE")

ShowMaker <- data.frame(KDA = as.numeric(3), totalMinionsKilled = 183, turretKills = 1, dragonKills = 0, baronKills = 0, elderDragonsKilled = 0, turretsLost = 2, inhibitorsLost = 0, timePlayed = time,  teamPosition = "MIDDLE")

Aiming <- data.frame(KDA = as.numeric(7), totalMinionsKilled = 189, turretKills = 1, dragonKills = 0, baronKills = 0, elderDragonsKilled = 0, turretsLost = 2, inhibitorsLost = 0, timePlayed = time,  teamPosition = "BOTTOM")

BeryL <- data.frame(KDA = 1, totalMinionsKilled = 38, turretKills = 1, dragonKills = 0, baronKills = 0, elderDragonsKilled = 0, turretsLost = 2, inhibitorsLost = 0, timePlayed = time,  teamPosition = "SUPPORT")

p1 <- predict(reduced_model, newdata = Siwoo, type = "response")
p2 <- predict(reduced_model, newdata = Lucid, type = "response")
p3 <- predict(reduced_model, newdata = ShowMaker, type = "response")
p4 <- predict(reduced_model, newdata = Aiming, type = "response")
p5 <- predict(reduced_model, newdata = BeryL, type = "response")

DK_winrate = (p1 + p2 + p3 + p4 + p5) / 5
DK_winrate

        1 
0.8432843

# JDG
Xiaoxu <- data.frame(KDA = as.numeric(2/3), totalMinionsKilled = 186, turretKills = 1, dragonKills = 0, baronKills = 0, elderDragonsKilled = 0, turretsLost = 3, inhibitorsLost = 0, timePlayed = time,  teamPosition = "TOP")

Xun <- data.frame(KDA = as.numeric(8), totalMinionsKilled = 159, turretKills = 0, dragonKills = 1, baronKills = 0, elderDragonsKilled = 0, turretsLost = 3, inhibitorsLost = 0, timePlayed = time,  teamPosition = "JUNGLE")

Scout <- data.frame(KDA = as.numeric(5), totalMinionsKilled = 191, turretKills = 0, dragonKills = 0, baronKills = 0, elderDragonsKilled = 0, turretsLost = 3, inhibitorsLost = 0, timePlayed = time,  teamPosition = "MIDDLE")

Peyz <- data.frame(KDA = as.numeric(3), totalMinionsKilled = 178, turretKills = 1, dragonKills = 0, baronKills = 0, elderDragonsKilled = 0, turretsLost = 3, inhibitorsLost = 0, timePlayed = time,  teamPosition = "BOTTOM")

Zhou <- data.frame(KDA = 4.5, totalMinionsKilled = 24, turretKills = 1, dragonKills = 0, baronKills = 0, elderDragonsKilled = 0, turretsLost = 3, inhibitorsLost = 0, timePlayed = time,  teamPosition = "SUPPORT")

p6 <- predict(reduced_model, newdata = Xiaoxu, type = "response")
p7 <- predict(reduced_model, newdata = Xun, type = "response")
p8 <- predict(reduced_model, newdata = Scout, type = "response")
p9 <- predict(reduced_model, newdata = Peyz, type = "response")
p10 <- predict(reduced_model, newdata = Zhou, type = "response")

JDG_winrate = (p6 + p7 + p8 + p9 + p10) / 5
JDG_winrate

        1 
0.6784764

The model gives DK a higher predicted win rate than JDG. Is this true?

Testing the Model

[Screenshot: game state at 28:01]

This is the same game about seven minutes later.

Who has the higher win rate now?

# DK
time = 28*60 + 1
Siwoo <- data.frame(KDA = 7, totalMinionsKilled = 254, turretKills = 1, dragonKills = 0, baronKills = 0, elderDragonsKilled = 0, turretsLost = 2, inhibitorsLost = 0, timePlayed = time,  teamPosition = "TOP")

Lucid <- data.frame(KDA = as.numeric(13/3), totalMinionsKilled = 176, turretKills = 0, dragonKills = 2, baronKills = 0, elderDragonsKilled = 0, turretsLost = 2, inhibitorsLost = 0, timePlayed = time,  teamPosition = "JUNGLE")

ShowMaker <- data.frame(KDA = as.numeric(6), totalMinionsKilled = 246, turretKills = 1, dragonKills = 0, baronKills = 0, elderDragonsKilled = 0, turretsLost = 2, inhibitorsLost = 0, timePlayed = time,  teamPosition = "MIDDLE")

Aiming <- data.frame(KDA = as.numeric(12), totalMinionsKilled = 272, turretKills = 1, dragonKills = 0, baronKills = 0, elderDragonsKilled = 0, turretsLost = 2, inhibitorsLost = 0, timePlayed = time,  teamPosition = "BOTTOM")

BeryL <- data.frame(KDA = as.numeric(7/6), totalMinionsKilled = 45, turretKills = 1, dragonKills = 0, baronKills = 0, elderDragonsKilled = 0, turretsLost = 2, inhibitorsLost = 0, timePlayed = time,  teamPosition = "SUPPORT")

p1 <- predict(reduced_model, newdata = Siwoo, type = "response")
p2 <- predict(reduced_model, newdata = Lucid, type = "response")
p3 <- predict(reduced_model, newdata = ShowMaker, type = "response")
p4 <- predict(reduced_model, newdata = Aiming, type = "response")
p5 <- predict(reduced_model, newdata = BeryL, type = "response")

DK_winrate = (p1 + p2 + p3 + p4 + p5) / 5
DK_winrate

        1 
0.9842533

#JDG
Xiaoxu <- data.frame(KDA = as.numeric(2/5), totalMinionsKilled = 242, turretKills = 1, dragonKills = 0, baronKills = 0, elderDragonsKilled = 0, turretsLost = 3, inhibitorsLost = 0, timePlayed = time,  teamPosition = "TOP")

Xun <- data.frame(KDA = as.numeric(10/4), totalMinionsKilled = 192, turretKills = 0, dragonKills = 1, baronKills = 0, elderDragonsKilled = 0, turretsLost = 3, inhibitorsLost = 0, timePlayed = time,  teamPosition = "JUNGLE")

Scout <- data.frame(KDA = as.numeric(5), totalMinionsKilled = 267, turretKills = 0, dragonKills = 0, baronKills = 0, elderDragonsKilled = 0, turretsLost = 3, inhibitorsLost = 0, timePlayed = time,  teamPosition = "MIDDLE")

Peyz <- data.frame(KDA = as.numeric(4), totalMinionsKilled = 262, turretKills = 1, dragonKills = 0, baronKills = 0, elderDragonsKilled = 0, turretsLost = 3, inhibitorsLost = 0, timePlayed = time,  teamPosition = "BOTTOM")

Zhou <- data.frame(KDA = as.numeric(11/3), totalMinionsKilled = 33, turretKills = 1, dragonKills = 0, baronKills = 0, elderDragonsKilled = 0, turretsLost = 3, inhibitorsLost = 0, timePlayed = time,  teamPosition = "SUPPORT")

p6 <- predict(reduced_model, newdata = Xiaoxu, type = "response")
p7 <- predict(reduced_model, newdata = Xun, type = "response")
p8 <- predict(reduced_model, newdata = Scout, type = "response")
p9 <- predict(reduced_model, newdata = Peyz, type = "response")
p10 <- predict(reduced_model, newdata = Zhou, type = "response")

JDG_winrate = (p6 + p7 + p8 + p9 + p10) / 5
JDG_winrate

        1 
0.8689398

Although the game still looks fairly even, the model says DK has a clear advantage over JDG.
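Each team's number is the average of five separate player-level win probabilities, so DK's and JDG's values do not sum to 1 (0.84 and 0.68 at 20:57, 0.98 and 0.87 at 28:01). One simple way to turn them into a head-to-head split, an assumption rather than something the model provides, is to divide each team's average by the sum of both. The Python sketch below applies that normalization to the printed values.

```python
def head_to_head(team_a: float, team_b: float) -> float:
    """Team A's share of the two team averages (hypothetical normalization, not a fitted probability)."""
    return team_a / (team_a + team_b)

# Team averages printed above: (DK, JDG)
snapshots = {
    "20:57": (0.8432843, 0.6784764),
    "28:01": (0.9842533, 0.8689398),
}

for clock, (dk, jdg) in snapshots.items():
    share = head_to_head(dk, jdg)
    print(f"{clock}: DK {share:.1%}, JDG {1 - share:.1%}")
```

Under this normalization DK is ahead at both snapshots, at roughly 55% at 20:57 and 53% at 28:01.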

Testing the Model

Finally, we check on them about three minutes later:

[Screenshot: game state at 31:03]

It's clear now that DK is winning.

Summary and Overview

Exploratory Analysis

We found which side has the higher win rate:

ggplot(data = match_wins_stats, aes(teamId, wins)) +
  geom_col(fill = c("blue", "red"), col = "black")

Exploratory Analysis

We found which champions are most picked for each role:

ggplot(top_per_role, aes(x = fct_reorder(championName, -percent_picked), y = percent_picked, fill = championName)) + geom_col() + facet_wrap(~ teamPosition, scales = "free") +  
  theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "none") +
  labs(title = "Champion Pick Rates by Position", x = "Champion", y = "Percent Picked")

Exploratory Analysis

Found which champions were most banned

ggplot(data = banned_champions %>% slice_head(n = 20), 
       aes(x = reorder(championName, -percent_banned), y = percent_banned, fill = championName)) +
  geom_col() +
  labs(title = "Most banned champion in every role", 
       x = "Champion Name", y = "Percent Banned in over 5000 matches") + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  theme(legend.position = "none")

Exploratory Analysis

Found which bot duos were popular

kable(bot_duos_summary,
      col.names = c("Bot Duo", "Total Count", "Percent Picked"),
      caption = "Top 10 Most Played Bot Duos")

Top 10 Most Played Bot Duos

Bot Duo             Total Count   Percent Picked
Lucian + Nami               167         3.044667
Lulu + Yunara               149         2.716499
Alistar + Yunara             85         1.549681
Braum + Lucian               80         1.458523
Kaisa + Nautilus             80         1.458523
Lucian + Milio               75         1.367366
Ezreal + Karma               73         1.330903
Aphelios + Lulu              72         1.312671
Sona + Yunara                69         1.257976
Aphelios + Thresh            66         1.203282
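The duo names in the table are written in alphabetical order ("Braum + Lucian", "Kaisa + Nautilus"), which suggests each team's BOTTOM and SUPPORT champions were combined into an order-independent key before counting. A minimal Python sketch of that idea follows; the row layout and field order (match_id, team_id, team_position, champion) and the sample rows are hypothetical, and the actual R pipeline may differ.

```python
from collections import Counter, defaultdict

# Hypothetical player rows: (match_id, team_id, team_position, champion)
rows = [
    ("m1", 100, "BOTTOM", "Lucian"), ("m1", 100, "SUPPORT", "Nami"),
    ("m1", 200, "BOTTOM", "Kaisa"),  ("m1", 200, "SUPPORT", "Nautilus"),
    ("m2", 100, "BOTTOM", "Yunara"), ("m2", 100, "SUPPORT", "Lulu"),
    ("m2", 200, "BOTTOM", "Lucian"), ("m2", 200, "SUPPORT", "Nami"),
    ("m2", 200, "TOP", "Garen"),  # non-bot roles are ignored
]

def count_bot_duos(rows):
    """Count BOTTOM + SUPPORT pairs per (match, team), using an alphabetical key."""
    teams = defaultdict(dict)
    for match_id, team_id, position, champion in rows:
        if position in ("BOTTOM", "SUPPORT"):
            teams[(match_id, team_id)][position] = champion
    duos = Counter()
    for lane in teams.values():
        if len(lane) == 2:  # skip teams missing either role
            duos[" + ".join(sorted(lane.values()))] += 1
    return duos

duo_counts = count_bot_duos(rows)
for duo, count in duo_counts.most_common(10):
    print(duo, count)
```

Sorting the two names means "Nami + Lucian" and "Lucian + Nami" land in the same bucket, regardless of which role each champion played.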

Exploratory Analysis

Finally, found the most popular matchups for each role

ggplot(top_matchups_per_role, aes(x = fct_reorder(match_up, -percent_picked), 
                                  y = percent_picked, fill = match_up)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ teamPosition, scales = "free") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Top 10 Most Picked Champion Matchups per Role",
       x = "Matchup",
       y = "Percent Picked")
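A lane matchup pairs the two champions that played the same teamPosition on opposite teams in one match. The Python sketch below, again with hypothetical field names and made-up rows, builds an alphabetical matchup key per (match, role), keeps the most frequent matchups within each role, and expresses each as a percentage of that role's matchups. Whether the original analysis used the same denominator is an assumption.

```python
from collections import Counter, defaultdict

# Hypothetical player rows: (match_id, team_position, champion); two rows per role per match
rows = [
    ("m1", "TOP", "Garen"), ("m1", "TOP", "Darius"),
    ("m1", "MIDDLE", "Ahri"), ("m1", "MIDDLE", "Syndra"),
    ("m2", "TOP", "Darius"), ("m2", "TOP", "Garen"),
    ("m2", "MIDDLE", "Ahri"), ("m2", "MIDDLE", "Orianna"),
    ("m3", "TOP", "Jax"), ("m3", "TOP", "Garen"),
]

def top_matchups_per_role(rows, k=10):
    """Return {role: [(matchup, percent_picked), ...]} with the k most common matchups per role."""
    lanes = defaultdict(list)
    for match_id, position, champion in rows:
        lanes[(match_id, position)].append(champion)
    per_role = defaultdict(Counter)
    for (_, position), champions in lanes.items():
        if len(champions) == 2:  # one champion from each team
            per_role[position][" vs ".join(sorted(champions))] += 1
    result = {}
    for position, counts in per_role.items():
        total = sum(counts.values())
        result[position] = [(m, 100 * c / total) for m, c in counts.most_common(k)]
    return result

for role, matchups in top_matchups_per_role(rows).items():
    print(role, matchups)
```

Normalizing within each role keeps roles comparable, which fits the facet_wrap(~ teamPosition, scales = "free") layout of the plot.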

Statistical Analysis

Made a logistic regression model that predicts each player's chance of winning; averaging a team's five players gives that side's estimated win rate

summary(reduced_model)


Call:
glm(formula = win ~ KDA + totalMinionsKilled + turretKills + 
    dragonKills + baronKills + elderDragonsKilled + turretsLost + 
    inhibitorsLost + timePlayed + teamPosition + KDA:timePlayed + 
    KDA:teamPosition + totalMinionsKilled:timePlayed + totalMinionsKilled:teamPosition + 
    turretKills:timePlayed + turretKills:teamPosition + dragonKills:timePlayed + 
    dragonKills:teamPosition + baronKills:timePlayed + elderDragonsKilled:timePlayed + 
    elderDragonsKilled:teamPosition + turretsLost:timePlayed + 
    turretsLost:teamPosition + inhibitorsLost:timePlayed + inhibitorsLost:teamPosition, 
    family = "binomial", data = match_data)

Coefficients:
                                         Estimate Std. Error z value Pr(>|z|)
(Intercept)                            -1.343e+00  1.594e-01  -8.425  < 2e-16
KDA                                     1.050e+00  6.033e-02  17.402  < 2e-16
totalMinionsKilled                     -3.031e-02  2.000e-03 -15.155  < 2e-16
turretKills                             3.150e+00  1.185e-01  26.589  < 2e-16
dragonKills                             1.045e+00  2.244e-01   4.657 3.21e-06
baronKills                              8.684e+00  5.903e-01  14.712  < 2e-16
elderDragonsKilled                      4.136e+00  1.479e+00   2.797 0.005163
turretsLost                            -1.772e+00  5.496e-02 -32.242  < 2e-16
inhibitorsLost                          3.763e-01  2.127e-01   1.769 0.076881
timePlayed                              3.901e-03  1.067e-04  36.550  < 2e-16
teamPositionJUNGLE                     -2.410e+00  2.153e-01 -11.192  < 2e-16
teamPositionMIDDLE                     -2.143e-02  1.952e-01  -0.110 0.912613
teamPositionSUPPORT                    -1.391e+00  2.051e-01  -6.783 1.18e-11
teamPositionTOP                         9.589e-02  1.898e-01   0.505 0.613489
KDA:timePlayed                         -2.016e-04  3.004e-05  -6.711 1.94e-11
KDA:teamPositionJUNGLE                 -2.053e-01  4.381e-02  -4.686 2.79e-06
KDA:teamPositionMIDDLE                 -1.485e-01  4.393e-02  -3.381 0.000721
KDA:teamPositionSUPPORT                -1.144e-01  4.086e-02  -2.801 0.005095
KDA:teamPositionTOP                    -2.213e-02  4.761e-02  -0.465 0.642098
totalMinionsKilled:timePlayed           1.159e-05  8.216e-07  14.104  < 2e-16
totalMinionsKilled:teamPositionJUNGLE   1.335e-02  3.693e-03   3.614 0.000302
totalMinionsKilled:teamPositionMIDDLE   4.394e-03  1.540e-03   2.853 0.004336
totalMinionsKilled:teamPositionSUPPORT -2.679e-03  4.447e-03  -0.602 0.546939
totalMinionsKilled:teamPositionTOP      3.341e-03  1.519e-03   2.200 0.027825
turretKills:timePlayed                 -1.275e-03  5.748e-05 -22.182  < 2e-16
turretKills:teamPositionJUNGLE         -1.361e-01  7.656e-02  -1.778 0.075453
turretKills:teamPositionMIDDLE         -2.055e-01  6.490e-02  -3.166 0.001544
turretKills:teamPositionSUPPORT         9.110e-03  1.045e-01   0.087 0.930507
turretKills:teamPositionTOP            -4.021e-01  5.923e-02  -6.790 1.12e-11
dragonKills:timePlayed                 -3.235e-04  9.427e-05  -3.431 0.000601
dragonKills:teamPositionJUNGLE          1.757e-01  1.413e-01   1.244 0.213611
dragonKills:teamPositionMIDDLE         -2.648e-01  2.213e-01  -1.197 0.231466
dragonKills:teamPositionSUPPORT        -2.422e-01  3.014e-01  -0.804 0.421667
dragonKills:teamPositionTOP             1.607e-01  2.343e-01   0.686 0.492841
baronKills:timePlayed                  -3.699e-03  2.734e-04 -13.531  < 2e-16
elderDragonsKilled:timePlayed          -1.010e-03  5.857e-04  -1.724 0.084728
elderDragonsKilled:teamPositionJUNGLE   1.322e+00  6.980e-01   1.894 0.058236
elderDragonsKilled:teamPositionMIDDLE  -1.615e-01  6.985e-01  -0.231 0.817147
elderDragonsKilled:teamPositionSUPPORT  1.303e-01  6.746e-01   0.193 0.846839
elderDragonsKilled:teamPositionTOP     -6.050e-01  6.693e-01  -0.904 0.366071
turretsLost:timePlayed                  3.393e-04  2.662e-05  12.743  < 2e-16
turretsLost:teamPositionJUNGLE         -3.919e-02  4.700e-02  -0.834 0.404341
turretsLost:teamPositionMIDDLE         -1.871e-02  4.519e-02  -0.414 0.678847
turretsLost:teamPositionSUPPORT         1.217e-01  4.284e-02   2.840 0.004508
turretsLost:teamPositionTOP             1.723e-02  4.409e-02   0.391 0.695981
inhibitorsLost:timePlayed              -4.795e-04  8.969e-05  -5.346 8.98e-08
inhibitorsLost:teamPositionJUNGLE       4.988e-01  1.381e-01   3.612 0.000304
inhibitorsLost:teamPositionMIDDLE       4.746e-03  1.332e-01   0.036 0.971577
inhibitorsLost:teamPositionSUPPORT      2.991e-02  1.338e-01   0.224 0.823120
inhibitorsLost:teamPositionTOP          8.622e-03  1.317e-01   0.065 0.947791
                                          
(Intercept)                            ***
KDA                                    ***
totalMinionsKilled                     ***
turretKills                            ***
dragonKills                            ***
baronKills                             ***
elderDragonsKilled                     ** 
turretsLost                            ***
inhibitorsLost                         .  
timePlayed                             ***
teamPositionJUNGLE                     ***
teamPositionMIDDLE                        
teamPositionSUPPORT                    ***
teamPositionTOP                           
KDA:timePlayed                         ***
KDA:teamPositionJUNGLE                 ***
KDA:teamPositionMIDDLE                 ***
KDA:teamPositionSUPPORT                ** 
KDA:teamPositionTOP                       
totalMinionsKilled:timePlayed          ***
totalMinionsKilled:teamPositionJUNGLE  ***
totalMinionsKilled:teamPositionMIDDLE  ** 
totalMinionsKilled:teamPositionSUPPORT    
totalMinionsKilled:teamPositionTOP     *  
turretKills:timePlayed                 ***
turretKills:teamPositionJUNGLE         .  
turretKills:teamPositionMIDDLE         ** 
turretKills:teamPositionSUPPORT           
turretKills:teamPositionTOP            ***
dragonKills:timePlayed                 ***
dragonKills:teamPositionJUNGLE            
dragonKills:teamPositionMIDDLE            
dragonKills:teamPositionSUPPORT           
dragonKills:teamPositionTOP               
baronKills:timePlayed                  ***
elderDragonsKilled:timePlayed          .  
elderDragonsKilled:teamPositionJUNGLE  .  
elderDragonsKilled:teamPositionMIDDLE     
elderDragonsKilled:teamPositionSUPPORT    
elderDragonsKilled:teamPositionTOP        
turretsLost:timePlayed                 ***
turretsLost:teamPositionJUNGLE            
turretsLost:teamPositionMIDDLE            
turretsLost:teamPositionSUPPORT        ** 
turretsLost:teamPositionTOP               
inhibitorsLost:timePlayed              ***
inhibitorsLost:teamPositionJUNGLE      ***
inhibitorsLost:teamPositionMIDDLE         
inhibitorsLost:teamPositionSUPPORT        
inhibitorsLost:teamPositionTOP            
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 76023  on 54838  degrees of freedom
Residual deviance: 17205  on 54789  degrees of freedom
AIC: 17305

Number of Fisher Scoring iterations: 8
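Because of the interaction terms, a main-effect coefficient such as turretKills cannot be read on its own. For a BOTTOM player (the reference level, since BOTTOM has no teamPosition row of its own), one extra turret kill changes the log-odds of winning by 3.150 - 0.001275 × timePlayed, combining the turretKills and turretKills:timePlayed estimates above, with timePlayed in seconds. The Python sketch below evaluates that expression and the matching odds ratio at the two game times used in the DK vs JDG example; it uses the rounded printed estimates and ignores their standard errors.

```python
import math

# Estimates from the reduced model summary above (BOTTOM is the reference position)
B_TURRET = 3.150e+00         # turretKills
B_TURRET_TIME = -1.275e-03   # turretKills:timePlayed

def turret_log_odds_effect(time_played_s: float) -> float:
    """Change in log-odds of winning per extra turret kill for a BOTTOM player."""
    return B_TURRET + B_TURRET_TIME * time_played_s

for label, seconds in [("20:57", 20 * 60 + 57), ("28:01", 28 * 60 + 1)]:
    effect = turret_log_odds_effect(seconds)
    print(f"{label}: log-odds {effect:+.3f}, odds ratio {math.exp(effect):.2f}")

# Game length at which one more turret kill no longer raises the odds
break_even = -B_TURRET / B_TURRET_TIME
print(f"effect crosses zero at about {break_even / 60:.0f} minutes")
```

With these estimates, a turret kill multiplies a BOTTOM player's odds of winning by about 4.7 at 20:57 but only about 2.7 at 28:01, and the effect would reach zero near 41 minutes. Other positions add their own turretKills:teamPosition term (for example -0.4021 for TOP).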

Questions?