Goal & Motivation

Use the player-reported Hearthstone: Heroes of Warcraft Arena deck composition to predict deck success. This investigation looks at whether a player’s success can be predicted from the cards that player selects during the deck-selection phase. Skill is required at all levels of the game, from deck selection to the gameplay itself; however, deck selection is one aspect where a player can look to a variety of outside sources for help.

Methods and Techniques

This study made use of MySQL to access card data and made extensive use of the dplyr and caret libraries for data frame manipulation and machine learning, respectively. For modeling, I used Random Forest and Gradient Boosting Machines (GBM), evaluating model performance using area under the receiver operating characteristic (ROC) curve. The source code for this report can be found at my GitHub repository.
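As a rough illustration of the evaluation metric, here is a minimal Python sketch (not the R/caret code used in this study). It computes the area under the ROC curve as the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The function name `roc_auc` and the toy labels and scores are hypothetical, and the way deck success was turned into a binary label is an assumption of the example.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up data: 1 = "successful" deck, 0 = "unsuccessful" deck.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking, which is why it is a convenient single number for comparing models.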

About the Game

Hearthstone is a turn-based electronic card game where players assume the role of one of nine heroes, including a mage, a rogue, and a warrior. Each hero competes using a 30-card deck of creatures (minions), spells, and weapons to reduce the opponent’s health points to zero. These decks may contain any combination of cards from a pool of common cards and cards that only that hero may wield. For instance, mages have access to Fireball but not to Fiery War Axe (a warrior card), whereas both have access to the Abomination. This study focuses on Arena Mode, where players build their decks by drafting one card at a time, choosing among three random cards each round. Since players do not know which cards may appear in later rounds, they must choose based on the perceived value of the cards shown or on any synergy with the cards they have already selected.

The Data Source

There are no built-in quantitative datasets available to the player by way of a profile or ranking list. Dedicated players can record and track their progress in these arena matches on websites such as Arena Mastery. A player may keep track of which cards they had in their deck, how many wins and losses they ended their run with, and even which heroes they lost to. Because Blizzard Entertainment, the game’s creator, has not released an API for this game, these websites are the best way for outsiders to acquire quantitative information on gameplay. The webmaster and developer of Arena Mastery made available a de-personalized portion of this data for use in this project. This subset covers the period from the game’s release until late November 2014 and includes over 90,000 completed decks from over 9,000 unique players. It is because of his hard work and the self-reporting of scores and decks by players around the world that this work was made possible.

The data set contains several SQL tables that track player performance. The tables accessed in this study are described below:

The RMySQL library was essential to reading these data. It made it possible to load only the variables of interest, which saved memory.
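The idea of loading only the variables of interest can be sketched as follows. This is a Python illustration using the standard-library sqlite3 module as an in-memory stand-in for the MySQL database, not the RMySQL code from this study. The column `unusedColumn` and the inserted row are hypothetical and exist only to show that unselected columns never leave the database.

```python
import sqlite3

# In-memory stand-in for the MySQL database (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cardPool (cardId INTEGER, cardName TEXT, cardCost INTEGER, "
    "cardText TEXT, unusedColumn TEXT)"
)
conn.execute(
    "INSERT INTO cardPool VALUES (121, 'Abomination', 5, "
    "'Taunt. Deathrattle: Deal 2 damage to ALL characters.', 'ignored')"
)

# Name the wanted columns explicitly instead of using SELECT *.
wanted = ["cardId", "cardName", "cardCost"]
query = "SELECT {} FROM cardPool".format(", ".join(wanted))
rows = conn.execute(query).fetchall()
conn.close()
```

Listing the columns in the query keeps the filtering on the database side, so wide text columns that are not needed are never transferred or held in memory.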

Data Tidying

The data were generally clean. Most of the work in this section consisted of merges, joins, and filters using the dplyr package.

Load and clean the cardPool data, i.e. details about the cards that could be chosen

Here, the primary variables I was concerned with were:

  • cardId: A unique numeric card identifier

  • cardName: Name of the card

  • cardSet: The set to which the card belongs (i.e. original release or expansion)

  • cardRarity: A factor determining the level of rarity of the card (common, rare, epic, or legendary)

  • cardType: Is the card a minion/creature, a spell, or a weapon

  • cardClass: Which class can use the card (0 is common to all classes)

  • cardCost: The card’s mana cost

  • cardText: Any additional text that described other card attributes.

For example, the Abomination card appears like this to a player:

And has a corresponding entry in cardPool that looks like this:

cardId  cardName     cardSet  cardRarity  cardType  cardClass  cardCost  cardText
121     Abomination  1        Rare        Minion    Neutral    5         Taunt. Deathrattle: Deal 2 damage to ALL characters.

Later, the cardText was parsed in order to pick out special features like “Taunt” or “Damage.”

Merge arena results and deck selection records

With the card pool in place, the next step was to load the arena records themselves. The variables of interest are as follows:

  • arenaId: A unique identifier for the arena run

  • arenaPlayerId: A unique player ID (account number)

  • arenaClassId: The ID of the class being played during the arena run

  • arenaOfficialWins: Number of wins

  • arenaOfficialLosses: Number of losses

  • arenaWins: Number of wins (not including “disconnects”)

  • arenaLosses: Number of losses (not including “disconnects”)

  • arenaRetireEarly: Did the player end the run early

  • arenaStartDate: When was the arena deck selected, starting that run

arenaId  arenaPlayerId  arenaClassId  wins  losses  arenaStartDate
107774   6925           Mage          8     3       2014-03-13 04:00:00
109627   9605           Shaman        9     3       2014-05-31 04:00:00
145994   10193          Mage          12    2       2014-03-12 04:00:00
150369   8624           Druid         1     3       2014-03-16 04:00:00
153911   8530           Rogue         4     3       2014-03-12 04:00:00

Official vs. Reported wins/losses

Here, I chose to look only at reported (“unofficial”) wins and losses, rather than official outcomes. Players have the option to flag a win or a loss as having ended because either player disconnected early. For example, if a player’s connection to the server fails and causes a game loss, that player may choose to report the loss, or evaluate their standing at the time of the disconnect and flag the game as a “win.” Out of 235,159 arena games, games flagged in this way account for an over-reporting of wins by 0.3% and an under-reporting of losses by 0.76%.

The final stage was to join arena records and card choices. This is where most of the data were discarded: only 33.694% of games had full decks associated with them. This subset was not randomly selected, and a Student’s t-test indicates that players who recorded their decks had a mean win rate lower than those who did not, by about 0.5 at 95% confidence.
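The filtering step can be sketched as below. This is a Python illustration, not the dplyr code used here: it counts the selected cards recorded for each arena run and keeps only runs with a full 30-card deck. The variables `arenas` and `picks` and their contents are made-up toy data.

```python
from collections import Counter

DECK_SIZE = 30  # an Arena deck holds 30 cards

# Toy arena records (hypothetical values).
arenas = [
    {"arenaId": 1, "wins": 8},
    {"arenaId": 2, "wins": 3},
    {"arenaId": 3, "wins": 12},
]

# Toy (arenaId, cardId) rows, one per selected card.
# Run 1 has a full deck, run 2 a partial deck, run 3 no deck at all.
picks = [(1, card) for card in range(30)] + [(2, card) for card in range(12)]

# Count recorded cards per run, then keep only the complete decks.
cards_per_arena = Counter(arena_id for arena_id, _ in picks)
complete = [a for a in arenas if cards_per_arena[a["arenaId"]] == DECK_SIZE]
fraction_complete = len(complete) / len(arenas)
```

Because runs with missing or partial decks are dropped rather than imputed, the surviving subset reflects only those players who recorded every pick, which is the selection effect noted above.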

To sum up, deck lists associated with the win rates of over 90,000 games have been cleaned, merged, and subsetted. The next steps were to determine which features would help summarize deck quality and determine if “powerful” deck lists were associated with high win rates.

Feature Generation

The data were rich with information about each card; however, some other features might also be useful in predicting the success of a deck. For example, some players value cards that allow them to draw more cards when played. Some might value minions that “taunt” other minions, thereby protecting their life points. Furthermore, the original data set had information about which cards were selected and which were seen in a given draft round. From this information, I could estimate the popularity of a card among the community of players reporting the data and get some measure of the strength (or perceived strength) of that card. Finally, I wanted some idea of the strength of an individual card, measured by how the mean win rate of decks with one or more copies of that card differed from that of decks with no copies.

Detailed Card/Deck Attributes (Parsing Card Text)

Some of the most powerful aspects of a card are written in the card text, so this text was parsed for certain abilities, each of which was stored as a boolean variable.

For example, the Abomination would show the following relevant TRUE flags (the FALSE flags have been omitted for brevity):

cardName hasTaunt hasDeathrattle hasDamage hasAOEdmg

The complete list of card abilities was added to the provided list of card attributes so that the more detailed characteristics of each deck (beyond card cost, card class, etc.) could be summarized later.
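A minimal sketch of this kind of text parsing is shown below, in Python rather than the R used for this study. The keyword patterns in `PATTERNS` are assumptions chosen to reproduce the Abomination flags above; the actual parsing rules are not reproduced here, and `hasDraw` is a hypothetical extra flag.

```python
import re

# Hypothetical keyword patterns, matched against lower-cased card text.
PATTERNS = {
    "hasTaunt": r"\btaunt\b",
    "hasDeathrattle": r"\bdeathrattle\b",
    "hasDamage": r"\bdamage\b",
    "hasAOEdmg": r"\bdamage to all\b",
    "hasDraw": r"\bdraw\b",
}


def parse_card_text(text):
    """Return one boolean flag per pattern for the given card text."""
    lowered = text.lower()
    return {
        flag: re.search(pattern, lowered) is not None
        for flag, pattern in PATTERNS.items()
    }


flags = parse_card_text("Taunt. Deathrattle: Deal 2 damage to ALL characters.")
true_flags = sorted(name for name, value in flags.items() if value)
```

Simple keyword matching like this will misclassify some cards (for instance, text that mentions an ability without granting it), so the resulting flags are best treated as approximate.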

Determine Card Popularity Ranks

Since every player is shown a random set of cards, it is difficult to separate those who make poor selections from those who have bad luck, i.e. those who do not pick the best cards vs. those who cannot. In the data set, each offered card is recorded as selected (isSelected==1) or not (isSelected==0), so an objective measure of card popularity is the number of times a card was selected divided by the number of times it appeared as an option.
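The normalization can be sketched as follows, in Python with made-up offer records rather than the real data. Each record is a (cardName, isSelected) pair, and the function name `fraction_picked` is hypothetical.

```python
from collections import defaultdict

# Toy offers: one (cardName, isSelected) row per card shown in a draft round.
offers = [
    ("Fireball", 1), ("Fireball", 1), ("Fireball", 0),
    ("Wisp", 0), ("Wisp", 1), ("Wisp", 0), ("Wisp", 0),
]


def fraction_picked(offers):
    """Times selected divided by times offered, per card."""
    shown = defaultdict(int)
    picked = defaultdict(int)
    for name, is_selected in offers:
        shown[name] += 1
        picked[name] += is_selected
    return {name: picked[name] / shown[name] for name in shown}


fractions = fraction_picked(offers)
# Rank cards from most to least popular.
ranking = sorted(fractions, key=fractions.get, reverse=True)
```

Dividing by the number of appearances means a rare card that is almost always taken can outrank a common card that is picked more often in absolute terms.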

For example, the top 10 cards selected by Mages (among all deck win rates, 0-12) are:

rank  cardName               fractionPicked
1     Ragnaros the Firelord  0.9683908
2     Azure Drake            0.9663462
3     Fireball               0.9428052
4     Argent Commander       0.9258144
5     Pyroblast              0.9179663
6     Flamestrike            0.9158199
7     Ysera                  0.9003115
8     Water Elemental        0.8986598
9     Frostbolt              0.8894714
10    Chillwind Yeti         0.8656741

These ranks, when averaged over a 30-card deck, give some clue as to how many popular cards made it into the deck in question. More popular cards might indicate access to or selection of cards that the population as a whole agreed were “good.” Furthermore, each individual card’s rank was used to weight the card attributes when summarizing the decks, since not all cards with the taunt attribute are created equal. This ensured that a deck with two popular taunt cards would receive a different score than a deck with two unpopular taunt cards.
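One way such a weighting could work is sketched below in Python. The exact weighting scheme is not specified here, so this example simply assumes that each card with the attribute contributes its popularity weight to the deck score. The card names, the weights, and the function `weighted_attribute_score` are all hypothetical.

```python
def weighted_attribute_score(deck, has_attribute, weight):
    """Sum the popularity weights of the cards in the deck that have the attribute."""
    return sum(weight[card] for card in deck if has_attribute.get(card, False))


# Hypothetical popularity weights and taunt flags.
weight = {"popular_taunt": 0.9, "unpopular_taunt": 0.2, "filler": 0.5}
has_taunt = {"popular_taunt": True, "unpopular_taunt": True}

deck_a = ["popular_taunt", "popular_taunt", "filler"]
deck_b = ["unpopular_taunt", "unpopular_taunt", "filler"]

score_a = weighted_attribute_score(deck_a, has_taunt, weight)
score_b = weighted_attribute_score(deck_b, has_taunt, weight)
```

Both decks contain two taunt cards, yet deck_a scores higher than deck_b, which is the distinction the weighting is meant to capture.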

Card and Deck Swing

The last feature I wanted to generate was the effective swing of a card. For each card, I calculated the mean win rate of decks without that card and the mean win rate of decks with one or more copies of that card, comparing the change in win rate with respect to the mean win rate of the class. For example, the “0-1” swing for Chillwind Yeti is +0.1, meaning that decks with one copy of that card average 0.1 wins more than decks with no copies of that card; decks with three copies perform even better. Flamestrike has much greater positive swing values, perhaps suggesting it is a very important card to draft. These values have also been tabulated on wowmetrics.com using a similar data set.
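A simplified version of the swing calculation is sketched below in Python with made-up decks. It takes the difference in mean wins between decks holding a given number of copies of a card and decks holding none. It leaves out the comparison against the class mean win rate described above, and the function name `card_swing` is hypothetical.

```python
from statistics import mean

# Toy decks: (wins, {cardName: copies}). Values are invented for illustration.
decks = [
    (7, {"Chillwind Yeti": 1}),
    (5, {"Chillwind Yeti": 1}),
    (4, {}),
    (6, {}),
    (9, {"Chillwind Yeti": 2}),
]


def card_swing(decks, card, copies):
    """Mean wins with exactly `copies` of the card minus mean wins with none."""
    with_card = [wins for wins, cards in decks if cards.get(card, 0) == copies]
    without = [wins for wins, cards in decks if cards.get(card, 0) == 0]
    if not with_card or not without:
        return None  # swing is undefined when either group is empty
    return mean(with_card) - mean(without)


swing_0_1 = card_swing(decks, "Chillwind Yeti", 1)
swing_0_2 = card_swing(decks, "Chillwind Yeti", 2)
```

Swings computed from very few decks (such as the single two-copy deck here) are noisy, so in practice a minimum group size would be a sensible guard.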

Feature Generation Summary

With these features created, each deck was summarized, resulting in the following predictors:

Predictor Name  Description
cost.mean       mean deck mana cost
cost.median     median deck mana cost
skew            skew of the mana curve
taunt           score of cards with taunt, weighted by card popularity
draw            score of cards with draw, weighted by card popularity
destroy         score of cards with destroy, weighted by card popularity
aoe             score of cards with damage all, weighted by card popularity
silence         score of cards with silence, weighted by card popularity
charge          score of cards with charge, weighted by card popularity
heal            score of cards with heal, weighted by card popularity
rattle          score of cards with deathrattle, weighted by card popularity
enrage          score of cards with enrage, weighted by card popularity
damage          score of cards that deal damage, weighted by card popularity
buff            score of cards with buffs, weighted by card popularity
battlecry       score of cards with battlecry, weighted by card popularity
blank           score of cards with no special abilities, weighted by card popularity
dmgSpell        score of spell cards with damage, weighted by card popularity
minions         score of minion cards, weighted by card popularity
spells          score of spell cards, weighted by card popularity
classCard       score of cards available only to that class, weighted by card popularity
avgRank         average popularity of the cards in the deck
top15           total cards in the top 15 most popular
deckSwing       sum of cardSwing

The first rows of the numeric data frame of predictors are shown below:

## Source: local data frame [6 x 25]
##   winCount cost.mean cost.median       skew   rarity taunt draw destroy
## 1        8  3.633333         4.0 0.13490137 1.233333   116  252       0
## 2       12  3.900000         4.0 1.25304461 1.166667   203  155       0
## 3       12  4.066667         3.5 3.51218185 1.366667    69  160      82
## 4        3  3.633333         3.5 0.37719214 1.266667    93  257       0
## 5        7  4.033333         4.0 0.92222271 1.500000    17  204      97
## 6        3  3.866667         4.0 0.04184354 1.300000    23  157      19
## Variables not shown: aoe (dbl), silence (dbl), charge (dbl), heal (dbl),
##   rattle (dbl), enrage (dbl), damage (dbl), buff (dbl), battlecry (dbl),
##   blank (dbl), dmgSpell (dbl), minions (dbl), spells (dbl), classCard
##   (dbl), avgRank (dbl), top15 (int), deckSwing (dbl)
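The three mana-curve predictors (cost.mean, cost.median, skew) can be sketched as below in Python. The skew formula used here is the moment coefficient of skewness, which is an assumption, since the exact definition used for this report is not stated. The list `costs` is a shortened, made-up deck.

```python
from statistics import mean, median


def skewness(values):
    """Moment coefficient of skewness: third central moment over variance^1.5."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # all costs identical, so there is no asymmetry
    return m3 / m2 ** 1.5


# Hypothetical mana costs (a real Arena deck would have 30 entries).
costs = [1, 2, 2, 3, 3, 3, 4, 4, 5, 7]

summary = {
    "cost.mean": mean(costs),
    "cost.median": median(costs),
    "skew": skewness(costs),
}
```

A positive skew indicates a curve weighted toward cheap cards with a tail of expensive ones, while a value near zero indicates a roughly symmetric curve.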

Notably missing from this list is a metric for card synergy. Many cards are much stronger in tandem than individually. Cursory attempts were made to derive these interactions from the raw data; however, professional players have a stronger sense of this classification. This would be a great metric to add in future work.

Feature Selection

Each hero class plays with a different style, so I chose to model the classes separately. Furthermore, gameplay may have changed with the release of the expansion. Here, I used the Mage data from the original release of the game as a test case. This subset contains approximately 11,000 games.

Overview of Features: Correlation Plots

First, I looked at a correlation plot of the features in question, which provides some hint of collinearity among features.
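The numbers behind such a plot are pairwise correlation coefficients. The following Python sketch computes a Pearson correlation matrix for a few predictors. The feature values are invented for illustration, the function name `pearson` is hypothetical, and the choice of Pearson (rather than, say, Spearman) correlation is an assumption.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Made-up values for three predictors across five decks.
features = {
    "cost.mean": [3.6, 3.9, 4.1, 3.6, 4.0],
    "cost.median": [4.0, 4.0, 3.5, 3.5, 4.0],
    "spells": [120.0, 95.0, 80.0, 130.0, 90.0],
}

# Nested dict: corr[a][b] is the correlation between features a and b.
corr = {
    a: {b: pearson(features[a], features[b]) for b in features}
    for a in features
}
```

Pairs with coefficients close to +1 or -1 carry largely redundant information, which is the kind of collinearity a correlation plot makes visible.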