Welcome to my Data Science project: predicting the winner of everyone’s favorite game, Hearthstone. Please enjoy.
Hearthstone is a complicated card game, but for our purposes we’re going to call it inconsistent rock paper scissors. What I mean by this is simple: in one game of Hearthstone, we choose a deck to play. That deck has good matchups and bad matchups, and each deck beats different things. My “deck 1” might beat “deck 2”, but it might not beat “deck 3”. That “deck 3” might beat “deck 2” as well.
My goal with this project was not to predict a single game of Hearthstone, but rather a whole match. The specific format of this match is called Last Hero Standing, or LHS for short. To concisely explain how this works, I will show a short video.
By simulating matches of LHS, I hope to find ways a player can improve their chances of winning a match. To fully explain this idea, I need to delve into the code.
Let’s take a look at the code, chunk by chunk.
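rm(list=ls())
library(readr)  # read_csv()
library(dplyr)  # %>%, filter(), mutate(), select(), top_n(), summarise()
num.sim <- 1000
# One row per simulated match, filled in as the simulation runs
full_results <- data.frame(match_result = rep(NA, num.sim), my_ban = rep(NA, num.sim),
                           opp_ban = rep(NA, num.sim), first_queue = rep(NA, num.sim),
                           game_result_1 = rep(NA, num.sim))
Hearthstone_MU_Data <- read_csv("metastats_rank_all.csv")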
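What we have here is the creation of our data frame, full_results. It is created up front, with one empty row per simulation, so that later we can store the values from each simulated match in it.
Hearthstone_MU_Data is our matchup data, and I will show you the source and what it means. The code below uses three of its columns: player_archetype, opponent_archetype, and win_rate, which is a percentage.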
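The pre-allocate-then-fill idea can be sketched like this. It is only an illustration under assumptions, not the project's actual loop body: toy_results, n_toy, and the randomly drawn values are placeholders standing in for whatever a simulated match would produce.

```r
# Minimal sketch: pre-allocate a results table, then fill one row per simulation.
# toy_results, n_toy and the random draws are placeholders, not the project's values.
set.seed(1)  # seed only so the example is repeatable
n_toy <- 5
toy_results <- data.frame(match_result = rep(NA, n_toy),
                          first_queue  = rep(NA, n_toy))
my_lineup <- c("Midrange Hunter", "Khadgar Mage", "Control Warrior", "Tempo Rogue")
for (i in 1:n_toy) {
  # In the real loop these values would come out of the simulated match.
  toy_results$match_result[i] <- sample(c(0, 1), 1)
  toy_results$first_queue[i]  <- sample(my_lineup, 1)
}
print(toy_results)
```

Filling by row index keeps every simulation's values lined up in the same row, which is what makes the later filter-and-summarise steps possible.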
Next, we create our most important friend, decks:
for (i in 1:num.sim){
  Matchup_Sim <- function(my_decks=c("Midrange Hunter", "Khadgar Mage", "Control Warrior", "Tempo Rogue"),
                          opp_decks=c("Bomb Warrior", "Token Druid", "Midrange Hunter", "Murloc Shaman")){
    # Pick one deck at random from each lineup to be banned
    n1 <- sample.int(4, 1)
    n2 <- sample.int(4, 1)
    decks <- list()
    original_decks <- list()
    post_ban_decks <- list()
    banned_decks <- list()
    # The three decks each player can still use after the ban
    post_ban_decks$my <- my_decks[-n1]
    post_ban_decks$opp <- opp_decks[-n2]
    original_decks$my <- my_decks
    original_decks$opp <- opp_decks
    banned_decks$my <- my_decks[n1]
    banned_decks$opp <- opp_decks[n2]
    upcoming_matchup <- list()
    first_queue <- list()
    unused_decks <- post_ban_decks
    # Placeholders for the result of each game, filled in as the match is played
    game_5_result <- list()
    game_4_result <- list()
    game_3_result <- list()
    game_2_result <- list()
    game_1_result <- list()
    game_5_result$my <- "tbd"
    game_5_result$opp <- "tbd"
    game_4_result$my <- "tbd"
    game_4_result$opp <- "tbd"
    game_3_result$my <- "tbd"
    game_3_result$opp <- "tbd"
    game_2_result$my <- "tbd"
    game_2_result$opp <- "tbd"
    game_1_result$my <- "tbd"
    game_1_result$opp <- "tbd"
    # Bundle everything into one list that gets passed from step to step
    decks <- list(original_decks=original_decks,
                  post_ban_decks=post_ban_decks,
                  banned_decks=banned_decks,
                  first_queue=first_queue,
                  upcoming_matchup=upcoming_matchup,
                  unused_decks=unused_decks,
                  game_1_result=game_1_result,
                  game_2_result=game_2_result,
                  game_3_result=game_3_result,
                  game_4_result=game_4_result,
                  game_5_result=game_5_result)
    return(decks)
  }
  decks <- Matchup_Sim()
}
You might also realize that this is the start of our for loop.
decks is a list filled with tons of fun things. First, we take four different decks for each player and store them in original_decks. Then we take one away from each player, like the banning process in the video, and store the 3 remaining decks in post_ban_decks. Everything else created here is a placeholder to be filled in later, which we’ll get to in a second.
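The banning step on its own can be sketched in a few lines. This is a stripped-down illustration of the same sample.int() and negative-indexing idea; the names lineup, ban_index, banned, and post_ban are made up for the example.

```r
# Sketch of the random ban: drop one deck from a four-deck lineup with negative indexing.
set.seed(42)  # seed only so the example is repeatable
lineup <- c("Midrange Hunter", "Khadgar Mage", "Control Warrior", "Tempo Rogue")
ban_index <- sample.int(4, 1)    # position of the deck that gets banned
banned    <- lineup[ban_index]   # the banned deck
post_ban  <- lineup[-ban_index]  # the three decks that remain
print(banned)
print(post_ban)
```

Because the index is drawn at random, every deck is equally likely to be banned, which is what lets the results later be compared ban by ban.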
Now we get to the fun part: predicting the winner of each game. Let’s take a look:
Game_1_Matchup <- function(decks){
  # Each player queues a random deck from their three post-ban decks
  n1 <- sample.int(3, 1)
  n2 <- sample.int(3, 1)
  my_deck_1 <- decks$post_ban_decks$my[n1]
  opp_deck_1 <- decks$post_ban_decks$opp[n2]
  # The queued decks are no longer unused
  decks$unused_decks$my <- decks$unused_decks$my[-n1]
  decks$unused_decks$opp <- decks$unused_decks$opp[-n2]
  decks$upcoming_matchup$my <- my_deck_1
  decks$first_queue <- my_deck_1
  decks$upcoming_matchup$opp <- opp_deck_1
  return(decks)
}
decks <- Game_1_Matchup(decks)
Game_Result_1 <- function(decks, matchup_stats=Hearthstone_MU_Data){
  # Look up the win rate for this matchup and draw a win (1) or loss (0) from it
  result <- matchup_stats %>%
    filter(player_archetype==decks$upcoming_matchup$my,
           opponent_archetype==decks$upcoming_matchup$opp) %>%
    mutate(win = 1*(runif(1)<win_rate/100)) %>% select(win) %>% as.numeric()
  decks$game_1_result$my <- NA
  decks$game_1_result$opp <- NA
  if(result>0){
    decks$game_1_result$my = "win"; decks$game_1_result$opp = "loss"
    decks$upcoming_matchup$opp = NA  # the losing deck is eliminated
  } else {
    decks$game_1_result$my = "loss"; decks$game_1_result$opp = "win"
    decks$upcoming_matchup$my = NA   # the losing deck is eliminated
  }
  return(decks)
}
decks <- Game_Result_1(decks)
First we need to take a deck from each side and play them against each other. In the first chunk of code, all we are doing is selecting a random deck from each player’s 3, removing it from unused_decks, and storing it in upcoming_matchup. Then in the second chunk, we get to project a winner. We do this by looking up the win rate for the matchup in the matchup stats and treating it as a probability: the code draws a random number and awards the win based on that probability. So a deck that wins 30% of the time in the data will be picked to win about 30% of the time. Finally, we store the result under decks for future use.
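To see why comparing runif(1) against the win rate behaves like a weighted coin flip, here is a small standalone sketch. The helper name simulate_game is hypothetical and exists only for this example.

```r
# Sketch: turn a win rate (in percent) into a simulated win (1) or loss (0).
set.seed(123)  # seed only so the example is repeatable
simulate_game <- function(win_rate) {
  # runif(1) is uniform between 0 and 1, so it falls below win_rate/100
  # with probability win_rate/100
  as.numeric(runif(1) < win_rate / 100)
}
draws <- replicate(10000, simulate_game(30))
print(mean(draws))  # share of simulated wins, expected to be near 0.30
```

Any single game is still random, but over many simulations the share of wins settles near the matchup’s win rate.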
Onwards.
Game_Matchup_2 <- function(decks, matchup_stats = Hearthstone_MU_Data){
  if(decks$game_1_result$my=="win"){
    # We won: our deck stays, and the opponent counter-picks from their unused decks
    decks$upcoming_matchup$my = decks$upcoming_matchup$my
    # Lowest win rate for us is the best matchup for the opponent
    decks$upcoming_matchup$opp = matchup_stats %>%
      filter(player_archetype==decks$upcoming_matchup$my,
             opponent_archetype %in% decks$unused_decks$opp) %>%
      top_n(1, desc(win_rate)) %>% select(opponent_archetype) %>% as.character()
    left.over <- !(decks$unused_decks$opp==decks$upcoming_matchup$opp)
    decks$unused_decks$opp <- decks$unused_decks$opp[left.over]
  } else {
    # We lost: the opponent's deck stays, and we counter-pick from our unused decks
    decks$upcoming_matchup$opp = decks$upcoming_matchup$opp
    # Highest win rate for us is our best matchup
    decks$upcoming_matchup$my = matchup_stats %>%
      filter(player_archetype %in% decks$unused_decks$my,
             opponent_archetype==decks$upcoming_matchup$opp) %>%
      top_n(1, win_rate) %>% select(player_archetype) %>% as.character()
    left.over <- !(decks$unused_decks$my==decks$upcoming_matchup$my)
    decks$unused_decks$my <- decks$unused_decks$my[left.over]
  }
  return(decks)
}
decks <- Game_Matchup_2(decks)
Game_Result_2 <- function(decks, matchup_stats=Hearthstone_MU_Data){
  # Same win-rate draw as game 1, applied to the new matchup
  result <- matchup_stats %>%
    filter(player_archetype==decks$upcoming_matchup$my,
           opponent_archetype==decks$upcoming_matchup$opp) %>%
    mutate(win = 1*(runif(1)<win_rate/100)) %>% select(win) %>% as.numeric()
  decks$game_2_result$my <- NA
  decks$game_2_result$opp <- NA
  if(result>0){
    decks$game_2_result$my = "win"; decks$game_2_result$opp = "loss"
    decks$upcoming_matchup$opp = NA
  } else {
    decks$game_2_result$my = "loss"; decks$game_2_result$opp = "win"
    decks$upcoming_matchup$my = NA
  }
  return(decks)
}
After picking our first winner, we need to see what our next matchup is. Following the video from earlier, the losing player picks their best remaining deck against the winning player’s deck. So looking at our code, we see a big if/else statement. It checks whether or not we won the first game. If we did, our deck stays the same, and our opponent uses the matchup data to find the best deck against ours among their unused decks. That deck gets moved to upcoming_matchup and removed from unused_decks. If we lost, it’s the opposite. Game 2 is predicted just like game 1 was. This exact process is repeated for the rest of the games, but something special is added after Game 3.
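The counter-pick rule can be sketched on its own with a tiny made-up matchup table. The win rates below are invented for illustration, and the sketch uses base R subsetting with which.min() rather than the dplyr pipeline in the project code. One difference to keep in mind: which.min() returns a single row when win rates tie, whereas top_n() keeps every tied row.

```r
# Sketch of the counter-pick rule with a made-up matchup table (win rates are invented).
toy_stats <- data.frame(
  player_archetype   = c("Tempo Rogue", "Tempo Rogue", "Tempo Rogue"),
  opponent_archetype = c("Bomb Warrior", "Token Druid", "Murloc Shaman"),
  win_rate           = c(40, 55, 60),
  stringsAsFactors   = FALSE
)
winner_deck <- "Tempo Rogue"                       # we won, so our deck stays
opp_unused  <- c("Bomb Warrior", "Murloc Shaman")  # decks the opponent can still pick
candidates <- toy_stats[toy_stats$player_archetype == winner_deck &
                        toy_stats$opponent_archetype %in% opp_unused, ]
# The opponent wants the matchup where OUR win rate is lowest.
counter <- candidates$opponent_archetype[which.min(candidates$win_rate)]
opp_unused <- setdiff(opp_unused, counter)         # the chosen deck is no longer unused
print(counter)
print(opp_unused)
```

Here Token Druid is ignored even though it is in the table, because it is not among the opponent’s unused decks.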
The goal of this simulation is to predict a winner, so we need the simulation to check for a winner. This is how I did that:
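decks$match_result <- NA
if(decks$upcoming_matchup$my == "character(0)"){
  decks$match_result = 0   # we have no decks left: we lose the match
} else {
  decks$match_result = NA  # no winner yet
}
if(decks$upcoming_matchup$opp == "character(0)"){
  decks$match_result = 1   # the opponent has no decks left: we win the match
}
if(is.na(decks$match_result)){
  # no winner yet: the next game's matchup and result steps run here
}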
This code uses if/else statements to look for a deck called "character(0)". That is not a real deck: when a player has no unused decks left, the filter in the matchup function returns no rows, and as.character() turns that empty result into the literal string "character(0)". Since the slot ends up holding that string, we can check for it: if we don’t have any more decks to play, we lose. If our opponent doesn’t have any more decks to play, we win. Otherwise the match result is still unknown, and we go back to predicting the next individual game. This check runs after every game until game 5, by which point one player must have won.
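The "character(0)" behaviour can be reproduced in isolation. The sketch below uses a plain base R data frame as a stand-in for the empty filter result, and the length() check at the end is an alternative I am suggesting for comparison, not what the project code does.

```r
# Sketch: why an exhausted lineup shows up as the string "character(0)".
no_rows <- data.frame(opponent_archetype = character(0), stringsAsFactors = FALSE)
# A data frame is a list of columns, so as.character() writes out the empty column as text.
as_text <- as.character(no_rows)
print(as_text)
# A check that does not depend on that string: pull the column and test its length.
remaining <- no_rows$opponent_archetype
out_of_decks <- length(remaining) == 0
print(out_of_decks)
```

Comparing against the string works, but testing the length states the intent (no decks left) more directly.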
Now we know how to get our data. So let’s get it! Fortunately I already ran this for 10,000,000 simulations last night (sorry Jared), so our data will be ready to delve into!
Let’s look at the data:
head(full_results)
##   match_result          my_ban         opp_ban     first_queue game_result_1
## 1            0    Khadgar Mage     Token Druid Control Warrior          loss
## 2            0    Khadgar Mage   Murloc Shaman Midrange Hunter          loss
## 3            0    Khadgar Mage     Token Druid     Tempo Rogue          loss
## 4            1 Midrange Hunter    Bomb Warrior Control Warrior           win
## 5            0 Midrange Hunter   Murloc Shaman Control Warrior          loss
## 6            1 Control Warrior Midrange Hunter Midrange Hunter          loss
Here are some of the most important ways we can use this data:
full_results %>% summarise(mean(match_result))
## mean(match_result)
## 1 0.4914
full_results %>% filter(my_ban=="Tempo Rogue") %>% summarise(mean(match_result))
## mean(match_result)
## 1 0.4722675
full_results %>% filter(opp_ban=="Bomb Warrior") %>% summarise(mean(match_result))
## mean(match_result)
## 1 0.485786
full_results %>% filter(first_queue=="Midrange Hunter") %>% summarise(mean(match_result))
## mean(match_result)
## 1 0.4868316
full_results %>% filter(game_result_1=="win") %>% summarise(mean(match_result))
## mean(match_result)
## 1 0.653723
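The filters above check one value at a time. One way to compare every ban side by side is a grouped summary. The sketch below runs on a small made-up table, not on the project’s results, and uses base R’s aggregate(); toy_results and win_by_ban are names invented for the example.

```r
# Sketch: win rate for every ban at once, on made-up results (not the project's data).
toy_results <- data.frame(
  match_result = c(1, 0, 1, 1, 0, 0),
  my_ban = c("Tempo Rogue", "Tempo Rogue", "Khadgar Mage",
             "Khadgar Mage", "Midrange Hunter", "Midrange Hunter"),
  stringsAsFactors = FALSE
)
# Mean of match_result within each value of my_ban
win_by_ban <- aggregate(match_result ~ my_ban, data = toy_results, FUN = mean)
print(win_by_ban)
```

The same call with opp_ban or first_queue in place of my_ban would give the matching breakdown for those columns.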
To close, here is the code being put to work on a real match:
Me: its a 45 across the board if he bans ur tempo rogue. hunter queue is technically best. your best win rate if he bans tempo rogue is 45% and thats u ban mid hunter and queue mid hunter. If he doesn’t ban tempo rogue u queue that.
Player: Okay, So if he no ban rogue I queue rogue. If he does I queue hunter?
Me: yes and ban hunter ofc. ur amazing if u ban bomb warrior and queue rogue, but the chance he doesnt ban rogue is small and if u miss on that its rough.
Player: Won 3-2. He banned Mid hunter and I queued and won with rogue until he countered with warrior, then the match just played out, but since I had the g1 rogue win it was over.