Introduction

League of Legends is a competitive multiplayer game in which match outcomes are influenced by a wide range of in-game factors, including kills, gold earned, items, objectives, and other measures of team performance. Because these variables change throughout the course of a match, interval-based snapshot data provides a useful setting for studying whether the eventual winner can be predicted from the current state of play. Having played the game for nearly a decade, I have often noticed that matches can feel unpredictable, yet certain in-game patterns seem to signal when one team is beginning to take control. That personal experience motivated my interest in exploring whether those patterns can be captured more formally through machine learning.

In this project, I use the League of Legends Match Interval Snapshots 2026 dataset from Kaggle to build classification models for predicting match outcomes. Rather than pooling all snapshots together, this project focuses on fixed points in the match to examine how predictive performance changes over time. The goal is to compare several machine learning approaches and determine which model is most effective at predicting the winner from structured in-game features. This is a classification problem because the response variable, the winning team, is categorical, while the predictors include numerical and potentially categorical game-state variables recorded over time.

This project is relevant because it applies machine learning methods to a large real-world dataset with many possible predictors and complex relationships. It also provides a useful example of how classification models can be used on structured gaming data, where information evolves over time and may contain patterns that are not immediately obvious from simple summary statistics alone. By building and comparing multiple models, the aim is to evaluate how accurately match outcomes can be predicted from interval snapshot data.

Project Roadmap

To carry out this analysis, I will begin with exploratory data analysis, examine missing data, and prepare the dataset for modeling. I will then use a stratified training and testing split, along with cross-validation on the training set, to fit and tune several machine learning models. The strongest models will then be evaluated on a held-out testing set to determine which method performs best on this prediction task.

Exploring Our Data

Before fitting any machine learning models, it is important to first understand the structure and quality of the data. Even when a dataset has already been compiled, it is rarely ready for modeling without additional inspection and preparation. Variables may be incorrectly coded, contain missing values, or otherwise require cleaning. In this section, I summarize and visualize the data, inspect missingness, and begin preparing the dataset for modeling.

Loading and Exploring Raw Data

We obtain our dataset from Kaggle, using the League of Legends Match Interval Snapshots 2026 dataset. The dataset contains structured in-game observations collected at different time intervals throughout thousands of League of Legends matches. The data includes a variety of gameplay features such as kills, gold, items, objectives, and other indicators of team or player performance over time.

Since the raw data comes from multiple files, we first load the relevant tables and determine which files are most useful. The primary focus is on the interval snapshot data, since it contains the in-game features we’ll use to predict match outcomes, but supporting files, such as match-level metadata or lookup tables, may also be useful.

We also can’t forget to load any libraries that we’ll use.

library(tidyverse)    # loads ggplot2, dplyr, tidyr, readr, and purrr
library(tidymodels)
library(bonsai)       # provides the lightgbm engine for boost_tree()
library(lightgbm)
library(doParallel)
library(vip)
library(corrplot)
library(ggcorrplot)

intervals <- read_csv("data/intervals.csv")
matches   <- read_csv("data/matches.csv")
summoners <- read_csv("data/processed_summoner_data.csv")
champions <- read_csv("data/ChampionTbl.csv")
items     <- read_csv("data/ItemTbl.csv")

Structure of the Raw Data

Now that the raw data has been loaded, I want to get a better sense of how large these datasets are and what they contain.

dim(intervals)

## [1] 2108090      36

dim(matches)

## [1] 39954    12

dim(summoners)

## [1] 399540      7

dim(champions)

## [1] 173   2

dim(items)

## [1] 635   2

This provides an initial sense of how large the data is and what types of variables are available for modeling. Since the interval snapshot data is the main focus of this project, I next inspect its contents more closely.

head(intervals)

names(intervals)

##  [1] "id"                    "match_id"              "player_id"            
##  [4] "minute"                "current_gold"          "total_gold"           
##  [7] "cs"                    "jungle_cs"             "xp"                   
## [10] "level"                 "kills"                 "deaths"               
## [13] "assists"               "item_0"                "item_1"               
## [16] "item_2"                "item_3"                "item_4"               
## [19] "item_5"                "item_6"                "team_kills"           
## [22] "team_inhibitors"       "team_towers"           "team_dragons_fire"    
## [25] "team_dragons_water"    "team_dragons_earth"    "team_dragons_air"     
## [28] "team_dragons_chemtech" "team_dragons_hextech"  "team_dragons"         
## [31] "team_barons"           "team_void_grubs"       "team_heralds"         
## [34] "gold_diff"             "xp_diff"               "team_gold_diff"

glimpse(intervals)

## Rows: 2,108,090
## Columns: 36
## $ id                    <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1…
## $ match_id              <chr> "EUW1_7688004322", "EUW1_7688004322", "EUW1_7688…
## $ player_id             <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6,…
## $ minute                <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 10, 10, 10, 10, 10…
## $ current_gold          <dbl> 487, 310, 384, 468, 338, 1269, 699, 1009, 1144, …
## $ total_gold            <dbl> 1587, 1710, 1684, 1368, 1313, 1769, 2099, 2159, …
## $ cs                    <dbl> 33, 1, 40, 20, 4, 43, 0, 41, 33, 6, 66, 6, 77, 5…
## $ jungle_cs             <dbl> 0, 30, 0, 0, 0, 0, 40, 0, 0, 0, 0, 68, 0, 0, 0, …
## $ xp                    <dbl> 2382, 1589, 2258, 1399, 1220, 2382, 1898, 2408, …
## $ level                 <dbl> 5, 4, 5, 4, 4, 5, 5, 6, 4, 4, 8, 8, 8, 7, 6, 8, …
## $ kills                 <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 1, 3, 1, …
## $ deaths                <dbl> 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 3, 0, 1, 0, …
## $ assists               <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 2, 2, 0, …
## $ item_0                <dbl> 1056, 1101, 1056, 1055, 3340, 1055, 1102, 1056, …
## $ item_1                <dbl> 1001, 1001, 3340, 1036, 1028, 3340, 3340, 1082, …
## $ item_2                <dbl> 1029, 1052, 3070, 2003, 0, 0, 2003, 1027, 0, 0, …
## $ item_3                <dbl> 0, 2022, 1052, 0, 0, 0, 2508, 0, 0, 0, 0, 3108, …
## $ item_4                <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3802, 0, 0, …
## $ item_5                <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1052, 0, 0, …
## $ item_6                <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ team_kills            <dbl> 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 6, 6, 6, 6, 6, 4, …
## $ team_inhibitors       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ team_towers           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ team_dragons_fire     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ team_dragons_water    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ team_dragons_earth    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ team_dragons_air      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ team_dragons_chemtech <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ team_dragons_hextech  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ team_dragons          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ team_barons           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ team_void_grubs       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 1, …
## $ team_heralds          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ gold_diff             <dbl> -182, -389, -475, -276, 48, 182, 389, 475, 276, …
## $ xp_diff               <dbl> 0, -309, -150, -243, -17, 0, 309, 150, 243, 17, …
## $ team_gold_diff        <dbl> -1274, -1274, -1274, -1274, -1274, 1274, 1274, 1…

I will also inspect the match-level data, since it contains variables related to our final outcome.

head(matches)

names(matches)

##  [1] "match_id"      "game_duration" "patch_version" "winning_team" 
##  [5] "game_date"     "game_version"  "game_mode"     "queue_id"     
##  [9] "region"        "average_rank"  "blue_bans"     "red_bans"

glimpse(matches)

## Rows: 39,954
## Columns: 12
## $ match_id      <chr> "EUN1_3798533851", "EUN1_3798666914", "EUN1_3809012024",…
## $ game_duration <dbl> 1957, 1770, 1621, 1886, 2132, 1752, 1874, 2213, 1844, 93…
## $ patch_version <dbl> 15, 15, 15, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, …
## $ winning_team  <dbl> 100, 200, 200, 100, 100, 100, 100, 100, 100, 200, 100, 1…
## $ game_date     <dttm> 2025-06-29 13:33:40, 2025-06-29 18:52:23, 2025-07-27 17…
## $ game_version  <chr> "15.13.693.4876", "15.13.693.4876", "15.14.697.2104", "1…
## $ game_mode     <chr> "CLASSIC", "CLASSIC", "CLASSIC", "CLASSIC", "CLASSIC", "…
## $ queue_id      <dbl> 420, 420, 420, 420, 420, 420, 420, 420, 420, 420, 420, 4…
## $ region        <chr> "euw1", "euw1", "euw1", "euw1", "euw1", "euw1", "euw1", …
## $ average_rank  <chr> "UNRANKED", "UNRANKED", "UNRANKED", "UNRANKED", "UNRANKE…
## $ blue_bans     <chr> "35,112,54,117,36", "35,350,92,90,555", "43,31,200,38,59…
## $ red_bans      <chr> "25,54,119,-1,800", "203,89,36,119,134", "54,711,804,107…

Let’s take a look at when these snapshots occur and how long League of Legends matches last on average.

ggplot(intervals, aes(x = minute)) +
  geom_histogram(binwidth = 1, fill = "#0A3D62", color = "#D4AF37") +
  labs(
    title = "Distribution of Match Snapshot Times",
    x = "Minute",
    y = "Count"
  )

ggplot(matches, aes(x = game_duration / 60)) +
  geom_histogram(binwidth = 2, fill = "#0A3D62", color = "#D4AF37") +
  labs(
    title = "Distribution of Match Duration",
    x = "Game Duration (minutes)",
    y = "Count"
  )

The first graph shows that most interval snapshots occur during the early and middle stages of matches, with substantially fewer observations from later stages of play. The second graph shows that most matches last roughly 20 to 35 minutes, with a peak in the upper 20-minute range. A small concentration of very short games near the 5-minute mark is likely explained by the League of Legends remake feature, which allows certain matches to end early.

Choice of Match Timepoints

To compare prediction performance across different stages of the game, I first examined the snapshot times available in the interval data. The dataset contains observations at 5-minute increments from 5 to 60 minutes. Note that the number of observations declines noticeably at later time points, especially after 25 minutes.

##  [1]  5 10 15 20 25 30 35 40 45 50 55 60

## 
##      5     10     15     20     25     30     35     40     45     50     55 
## 395120 393850 392650 362160 303490 175280  64170  17070   3450    710    110 
##     60 
##     30
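
The listing above is the kind of output produced by calling unique() and table() on the minute column. The chunk that generated it is not shown in the report, so the following is only a minimal sketch on a small made-up data frame; the name toy_intervals and all of its values are illustrative assumptions, not taken from the dataset.

```r
# Toy stand-in for the interval data; values are invented for illustration.
toy_intervals <- data.frame(
  match_id = rep(c("A", "B", "C"), times = c(3, 2, 1)),
  minute   = c(5, 10, 15, 5, 10, 5)
)

# Distinct snapshot times, in increasing order
snapshot_times <- sort(unique(toy_intervals$minute))
snapshot_times

# Number of rows recorded at each snapshot time
snapshot_counts <- table(toy_intervals$minute)
snapshot_counts
```

In the toy data, as in the real data, longer matches contribute rows at more snapshot times, which is why the counts shrink at later minutes.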

The primary focus is on snapshots taken at 10, 15, 20, and 25 minutes. These time points provide a useful progression from early to mid game while also retaining a large number of observations for modeling.

Response Variable and Unit of Observation

The response variable for this project is winning_team from matches.csv, which indicates which team won the match. This will serve as the target variable in the classification models, and it can be joined to the interval data using match_id.

In intervals.csv, each row represents a single player at a specific snapshot time within a match. This is shown by variables such as match_id, player_id, and minute, along with player-level features like gold, experience, kills, deaths, and assists. Because the raw interval data is recorded at the player level, it will need to be prepared further so that the final modeling datasets match the prediction task at 10, 15, 20, and 25 minutes.
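
One quick way to confirm the unit of observation is to count rows per match and snapshot time. The sketch below uses a hypothetical toy table (toy_intervals, with invented values) rather than the real data, and assumes that a complete snapshot contains ten player rows.

```r
library(dplyr)

# Toy player-level rows: one match, two snapshot times, ten players each.
toy_intervals <- tibble(
  match_id  = "TOY_1",
  minute    = rep(c(10, 15), each = 10),
  player_id = rep(1:10, times = 2)
)

# Count rows per match and snapshot; a complete snapshot has 10 player rows.
rows_per_snapshot <- toy_intervals %>%
  count(match_id, minute, name = "n_players")

rows_per_snapshot
```

Running the same count on the real interval data would flag any match and minute combination with fewer or more than ten rows.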

Constructing the Modeling Dataset

The final modeling datasets are created by combining intervals.csv with matches.csv using match_id. Since the raw interval data is recorded at the player level, it must first be transformed so that each observation corresponds to a single match at a specific snapshot time.

For this project, I focus on snapshots taken at 10, 15, 20, and 25 minutes. I merge the interval data with player metadata from processed_summoner_data.csv to identify team membership, aggregate player-level variables into match-level features, and then join the final match outcome, winning_team, from matches.csv. This results in one row per match at each selected timepoint, making it possible to compare model performance across different stages of the game.

team_lookup <- summoners %>%
  select(match_id, participant_id, team_id) %>%
  mutate(team = if_else(team_id == 100, "blue", "red"))

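snapshots <- intervals %>%
  # Keep only the snapshot times used for modeling
  filter(minute %in% c(10, 15, 20, 25)) %>%
  # Map player_id onto a 1-10 slot within the match (row-wise, so no grouping is needed)
  mutate(within_match_id = ((player_id - 1) %% 10) + 1) %>%
  # Attach team membership and drop rows without a matching summoner record
  left_join(team_lookup, by = c("match_id", "within_match_id" = "participant_id")) %>%
  filter(!is.na(team)) %>%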
  group_by(match_id, minute, team) %>%
  summarize(
    total_gold   = sum(total_gold, na.rm = TRUE),
    xp           = sum(xp, na.rm = TRUE),
    kills        = sum(kills, na.rm = TRUE),
    deaths       = sum(deaths, na.rm = TRUE),
    assists      = sum(assists, na.rm = TRUE),
    cs           = sum(cs, na.rm = TRUE),
    team_towers  = first(team_towers),
    team_dragons = first(team_dragons),
    team_barons  = first(team_barons),
    .groups = "drop"
  ) %>%
  pivot_wider(
    names_from  = team,
    values_from = c(total_gold, xp, kills, deaths, assists, cs,
                    team_towers, team_dragons, team_barons)
  ) %>%
  left_join(matches %>% select(match_id, winning_team), by = "match_id") %>%
  drop_na() %>%
  mutate(
    blue_win = factor(if_else(winning_team == 100, "blue", "red"))
  )

dim(snapshots)

## [1] 145215     22

table(snapshots$minute)

## 
##    10    15    20    25 
## 39385 39265 36216 30349

The code above transforms the raw player-level interval data into a match-level modeling dataset. The data is first restricted to the 10, 15, 20, and 25 minute snapshots and joined to the team lookup, then aggregated within each match, minute, and team to produce team-level features. These summaries are reshaped so that each row corresponds to one match at one timepoint, with separate blue-team and red-team variables. Finally, the match outcome is added from matches.csv, and the binary response variable blue_win is created for classification.
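
To make the aggregate-then-reshape step concrete, here is a minimal sketch on a hypothetical four-player table. The name toy_players and its values are invented, and two players per team stand in for the usual five.

```r
library(dplyr)
library(tidyr)

# Toy player-level rows for one match at one snapshot. Values are invented.
toy_players <- tibble(
  match_id   = "TOY_1",
  minute     = 10,
  team       = c("blue", "blue", "red", "red"),
  total_gold = c(3000, 3500, 2800, 3100),
  kills      = c(1, 2, 0, 1)
)

toy_match <- toy_players %>%
  # Sum player-level values within each team
  group_by(match_id, minute, team) %>%
  summarize(
    total_gold = sum(total_gold),
    kills      = sum(kills),
    .groups = "drop"
  ) %>%
  # One row per match and minute, with a _blue and _red column per feature
  pivot_wider(names_from = team, values_from = c(total_gold, kills))

toy_match
```

The result has a single row with columns total_gold_blue, total_gold_red, kills_blue, and kills_red, which mirrors the layout of the snapshots table.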

Exploratory Data Analysis (EDA)

The first step in the exploratory data analysis is to examine the distribution of the response variable. This helps determine whether match outcomes are reasonably balanced across the dataset. I also examine whether the distribution of wins remains similar across the selected snapshot times of 10, 15, 20, and 25 minutes.

snapshots <- snapshots %>%
  mutate(
    outcome = factor(blue_win, levels = c("red", "blue"),
                     labels = c("Red Win", "Blue Win")),
    minute  = factor(minute, levels = c(10, 15, 20, 25))
  )

ggplot(snapshots, aes(x = outcome)) +
  geom_bar(fill = "#0A3D62", color = "#D4AF37") +
  labs(title = "Distribution of Match Outcomes",
       x = "Match Outcome", y = "Count") +
  theme_minimal()

ggplot(snapshots, aes(x = outcome)) +
  geom_bar(fill = "#0A3D62", color = "#D4AF37") +
  facet_wrap(~ minute) +
  labs(title = "Distribution of Match Outcomes by Snapshot Time",
       x = "Match Outcome", y = "Count") +
  theme_minimal()

The outcome of interest is blue_win, which records whether the match was ultimately won by the blue side or the red side. Across the 145,215 match-timepoint rows, the distribution is fairly balanced, with 73,268 red-side wins and 71,947 blue-side wins (about 50.5% versus 49.5%), so neither class dominates the dataset. This suggests that severe class imbalance is not a major concern for the models considered here. The faceted plot by snapshot time shows a similarly balanced split at 10, 15, 20, and 25 minutes, indicating that the class distribution remains stable across the selected timepoints.

Missing Data and Preprocessing

We can’t start model fitting without cleaning our data first. We need to examine the constructed snapshot dataset for missing values, duplicated observations, and variable types. This helps identify any data quality issues that need to be addressed before training our models.

colSums(is.na(snapshots))

##          match_id            minute   total_gold_blue    total_gold_red 
##                 0                 0                 0                 0 
##           xp_blue            xp_red        kills_blue         kills_red 
##                 0                 0                 0                 0 
##       deaths_blue        deaths_red      assists_blue       assists_red 
##                 0                 0                 0                 0 
##           cs_blue            cs_red  team_towers_blue   team_towers_red 
##                 0                 0                 0                 0 
## team_dragons_blue  team_dragons_red  team_barons_blue   team_barons_red 
##                 0                 0                 0                 0 
##      winning_team          blue_win           outcome 
##                 0                 0                 0

sum(duplicated(snapshots))

## [1] 0

table(snapshots$minute)

## 
##    10    15    20    25 
## 39385 39265 36216 30349

After aggregating the data to the match level, the resulting snapshot dataset contained no missing values and no duplicated observations. The final dataset includes 39,385 matches at 10 minutes, 39,265 at 15 minutes, 36,216 at 20 minutes, and 30,349 at 25 minutes, providing a strong sample size across all selected time points.

Creating Advantage Features

Because match outcomes depend more naturally on relative team advantage than on absolute team totals alone, I create several difference-based predictors. These variables measure the blue team’s advantage over the red team at each selected snapshot time and provide a cleaner summary of game state for both exploratory analysis and modeling.

snapshots <- snapshots %>%
  mutate(
    # Blue-minus-red differences: positive values favor the blue team.
    # outcome and minute were already converted to factors above.
    gold_diff    = total_gold_blue - total_gold_red,
    xp_diff      = xp_blue - xp_red,
    kills_diff   = kills_blue - kills_red,
    deaths_diff  = deaths_blue - deaths_red,
    assists_diff = assists_blue - assists_red,
    cs_diff      = cs_blue - cs_red,
    tower_diff   = team_towers_blue - team_towers_red,
    dragon_diff  = team_dragons_blue - team_dragons_red,
    baron_diff   = team_barons_blue - team_barons_red
  )

Predictor Relationships

Let’s take a look at some plots, focusing on how gold relates to other factors and how the difference features correlate with one another.

snapshots <- snapshots %>%
  mutate(
    total_kills_game = kills_blue + kills_red,
    total_gold_game  = total_gold_blue + total_gold_red
  )

# Subset to the 20-minute snapshot after the game totals have been added
snap_20 <- snapshots %>% filter(minute == 20)

ggplot(snap_20, aes(x = total_kills_game, y = total_gold_game)) +
  geom_point(alpha = 0.15, size = 0.8, color = "#0A3D62") +
  geom_smooth(method = "lm", se = FALSE, color = "#D4AF37", linewidth = 1) +
  labs(title = "Total Kills vs Total Gold at 20 Minutes",
       x = "Total Kills in Match",
       y = "Total Gold in Match") +
  theme_minimal()

Examining the plot above, we can see a clear positive relationship between total kills and total gold at 20 minutes. Matches with more kills generally have higher total gold, suggesting that more action-heavy games tend to generate more resources overall. There is some spread around the line, so kills are related to gold but do not explain all of the variation on their own. It’s worth noting that the data was collected during a season in which games are much faster paced than in prior years.

ggplot(snapshots, aes(x = outcome, y = gold_diff, fill = outcome)) +
  geom_boxplot(alpha = 0.7) +
  facet_wrap(~ minute) +
  scale_fill_manual(values = c("Red Win" = "#C0392B", "Blue Win" = "#0A3D62")) +
  labs(title = "Gold Difference by Outcome at Each Snapshot Time",
       x = "Match Outcome",
       y = "Gold Difference (Blue - Red)",
       fill = "Outcome") +
  theme_minimal()

In the plot above, gold differences become more informative as the game progresses. At each snapshot, blue wins tend to have positive gold differences while red wins tend to have negative ones. Separation between the two groups becomes clearer by the 20 and 25 minute marks, suggesting that relative gold advantage is increasingly associated with the eventual match outcome. Let’s take a look at the correlation matrix next!

diff_vars <- snapshots %>%
  select(gold_diff, xp_diff, kills_diff, cs_diff, 
         tower_diff, dragon_diff, blue_win) %>%
  mutate(blue_win = if_else(blue_win == "blue", 1, 0))

cor_matrix <- cor(diff_vars, use = "complete.obs")

ggcorrplot(cor_matrix,
  method = "square",
  type = "lower",
  lab = TRUE,
  lab_size = 3,
  colors = c("#C0392B", "white", "#0A3D62"),
  title = "Correlation Matrix of Difference Features and Outcome")

Examining the correlation matrix, we can see that many of the difference-based features are correlated with one another, with especially strong correlations among gold_diff, xp_diff, and kills_diff. This supports the idea that measures of team advantage tend to move together during a match, since a lead in one usually feeds into the others.

Data Splitting

The 25-minute snapshot is used as the primary modeling dataset, as it represents the richest in-game state while still retaining a large number of observations. The data are split into an 80% training set and a 20% held-out testing set using stratified sampling on blue_win to preserve the class balance observed in EDA. A 5-fold stratified cross-validation scheme is then applied to the training set for model tuning.

snap_25 <- snapshots %>%
  filter(minute == 25) %>%
  select(blue_win,
         gold_diff, xp_diff, kills_diff, deaths_diff,
         assists_diff, cs_diff, tower_diff, dragon_diff, baron_diff)

# Stratified 80/20 split
set.seed(131)
snap_split <- initial_split(snap_25, prop = 0.80, strata = blue_win)
snap_train <- training(snap_split)
snap_test  <- testing(snap_split)

dim(snap_train)

## [1] 24278    10

dim(snap_test)

## [1] 6071   10

snap_folds <- vfold_cv(snap_train, v = 5, strata = blue_win)
snap_folds

The resulting split produces 24,278 observations in the training set and 6,071 observations in the test set. Cross-validation is performed only on the training data so that model tuning remains separate from the final evaluation.
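
To illustrate what stratifying on the outcome does, the sketch below splits a simulated data frame with a known 60/40 class mix and checks that both pieces keep roughly that mix. The toy data frame and its values are assumptions made for illustration only.

```r
library(rsample)

set.seed(131)
# Toy outcome with a known 60/40 class split; values are invented.
toy <- data.frame(
  blue_win  = factor(rep(c("blue", "red"), times = c(600, 400))),
  gold_diff = rnorm(1000)
)

# Same call pattern as the report: 80/20 split, stratified on the outcome
toy_split <- initial_split(toy, prop = 0.80, strata = blue_win)
toy_train <- training(toy_split)
toy_test  <- testing(toy_split)

# Class proportions should be close to 0.6 / 0.4 in both pieces
prop.table(table(toy_train$blue_win))
prop.table(table(toy_test$blue_win))
```

The same proportion check could be run on snap_train and snap_test to confirm that the near 50/50 balance seen in EDA carries over to both sets.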

Recipe

A single recipe is defined using seven difference-based predictors: gold_diff, xp_diff, kills_diff, cs_diff, tower_diff, dragon_diff, and baron_diff. The deaths_diff and assists_diff columns remain in snap_25 but are not used in the recipe. All predictors are centered and scaled using step_normalize(). While normalization is not strictly required for tree-based models, applying it uniformly keeps the workflow consistent across all four model types.

To verify that the recipe is working as intended, I prep and bake it on the training data and inspect the transformed output.

snap_recipe <- recipe(blue_win ~ gold_diff + xp_diff + kills_diff +
                        cs_diff + tower_diff + dragon_diff + baron_diff,
                      data = snap_train) %>%
  step_normalize(all_predictors())

# Verify recipe
snap_recipe %>% prep() %>% bake(new_data = NULL) %>% head()
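
As a rough illustration of what step_normalize() does, the sketch below compares the baked output with a manual (x - mean) / sd calculation on simulated data. The toy_train data frame and its values are assumptions, not the project data.

```r
library(dplyr)
library(recipes)

set.seed(131)
# Simulated predictors on very different scales; values are invented.
toy_train <- data.frame(
  blue_win   = factor(rep(c("blue", "red"), 50)),
  gold_diff  = rnorm(100, mean = 500, sd = 3000),
  tower_diff = rnorm(100, mean = 0, sd = 2)
)

toy_recipe <- recipe(blue_win ~ gold_diff + tower_diff, data = toy_train) %>%
  step_normalize(all_predictors())

# prep() estimates the means and standard deviations; bake() applies them
baked <- toy_recipe %>% prep() %>% bake(new_data = NULL)

# Manual centering and scaling for comparison
manual <- (toy_train$gold_diff - mean(toy_train$gold_diff)) / sd(toy_train$gold_diff)
all.equal(baked$gold_diff, manual)
```

Because the means and standard deviations are estimated during prep(), they come from the training data only and are then reused when the recipe is applied to new data.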

Model Specifications

Four classification models are specified for tuning. Logistic regression with an elastic net penalty tunes both the regularization strength (penalty) and the mix of L1/L2 regularization (mixture). The decision tree tunes cost_complexity to control pruning. The random forest holds trees fixed at 500 and tunes mtry (number of predictors sampled per split) and min_n (minimum node size). The gradient boosted tree tunes the number of trees, learning rate, and tree depth.

# Elastic net logistic regression
log_spec <- logistic_reg(penalty = tune(), mixture = tune()) %>%
  set_engine("glmnet") %>%
  set_mode("classification")

# Decision tree
tree_spec <- decision_tree(cost_complexity = tune()) %>%
  set_engine("rpart") %>%
  set_mode("classification")

# Random forest
rf_spec <- rand_forest(mtry = tune(), min_n = tune(), trees = 500) %>%
  set_engine("ranger", importance = "impurity") %>%
  set_mode("classification")

# Gradient boosted tree
bt_spec <- boost_tree(trees = tune(), learn_rate = tune(),
                      tree_depth = tune()) %>%
  set_engine("lightgbm") %>%
  set_mode("classification")

Workflows

Next, I create a separate workflow for each model by pairing the shared recipe with its corresponding model specification. This ensures that all four models use the same pre-processing steps during tuning and evaluation.

log_wf  <- workflow() %>% add_recipe(snap_recipe) %>% add_model(log_spec)
tree_wf <- workflow() %>% add_recipe(snap_recipe) %>% add_model(tree_spec)
rf_wf   <- workflow() %>% add_recipe(snap_recipe) %>% add_model(rf_spec)
bt_wf   <- workflow() %>% add_recipe(snap_recipe) %>% add_model(bt_spec)

Tuning + Fitting

To tune each model, a separate grid is defined based on its key hyperparameters. These grids specify the candidate values to be evaluated during cross-validation.

# Elastic net logistic regression
log_grid <- grid_regular(
  penalty(range = c(-4, 0)),
  mixture(range = c(0, 1)),
  levels = 5
)

# Decision tree
tree_grid <- grid_regular(
  cost_complexity(range = c(-4, -1)),
  levels = 10
)

# Random forest
rf_grid <- grid_regular(
  mtry(range = c(2, 7)),
  min_n(range = c(10, 40)),
  levels = 4
)

# Boosted tree
bt_grid <- grid_regular(
  trees(range = c(100, 1000)),
  learn_rate(range = c(-3, -1)),
  tree_depth(range = c(2, 6)),
  levels = 3
)

Saving Results

Rather than rerunning the tuning process each time the report is rendered, I load the saved cross-validation results for each model from .rds files.

log_res  <- readRDS("log_res.rds")
tree_res <- readRDS("tree_res.rds")
rf_res   <- readRDS("rf_res.rds")
bt_res   <- readRDS("bt_res.rds")
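
The tuning calls that produced these .rds files are not shown in the report. One way such a file could be generated is sketched below for the elastic net on simulated data, assuming tune_grid() was used. The toy_ objects, the temporary file path, and the choice of metrics are illustrative assumptions, not the author's actual code.

```r
library(tidymodels)

set.seed(131)
# Simulated stand-in for the training data; values are invented.
n_obs <- 400
toy_train <- tibble(
  gold_diff = rnorm(n_obs),
  xp_diff   = rnorm(n_obs)
) %>%
  mutate(blue_win = factor(if_else(gold_diff + xp_diff + rnorm(n_obs) > 0,
                                   "blue", "red")))

toy_folds <- vfold_cv(toy_train, v = 5, strata = blue_win)

toy_wf <- workflow() %>%
  add_recipe(recipe(blue_win ~ gold_diff + xp_diff, data = toy_train) %>%
               step_normalize(all_predictors())) %>%
  add_model(logistic_reg(penalty = tune(), mixture = tune()) %>%
              set_engine("glmnet") %>%
              set_mode("classification"))

toy_grid <- grid_regular(penalty(range = c(-4, 0)),
                         mixture(range = c(0, 1)),
                         levels = 3)

# Evaluate every grid row on every fold
toy_res <- tune_grid(toy_wf,
                     resamples = toy_folds,
                     grid      = toy_grid,
                     metrics   = metric_set(roc_auc, accuracy))

# Save once, then reload on later renders (temp file used for illustration)
toy_path <- file.path(tempdir(), "toy_log_res.rds")
saveRDS(toy_res, toy_path)
toy_loaded <- readRDS(toy_path)

show_best(toy_loaded, metric = "roc_auc", n = 1)
```

Caching the results this way keeps rendering fast, at the cost of having to rerun and resave the tuning step whenever the recipe, grids, or folds change.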

Comparing Tuned Model Performance

For each model, I select the best tuning result based on cross-validated ROC AUC and combine them into one table for easier comparison.

log_best  <- show_best(log_res,  metric = "roc_auc", n = 1)
tree_best <- show_best(tree_res, metric = "roc_auc", n = 1)
rf_best   <- show_best(rf_res,   metric = "roc_auc", n = 1)
bt_best   <- show_best(bt_res,   metric = "roc_auc", n = 1)

model_comparison <- bind_rows(
  log_best  %>% mutate(model = "Elastic Net Logistic"),
  tree_best %>% mutate(model = "Decision Tree"),
  rf_best   %>% mutate(model = "Random Forest"),
  bt_best   %>% mutate(model = "Boosted Tree")
) %>%
  select(model, mean, std_err, n) %>%
  rename(roc_auc = mean)

model_comparison

Cross-validation results show that the elastic net logistic regression and boosted tree models performed best, with ROC AUC scores of 0.911 and 0.910, respectively. The random forest was close behind at 0.907, while the decision tree trailed at 0.884. Since the elastic net and boosted tree were the top two performers, I move forward with those models for final testing on the held-out test set.

Visualizing Performance

To visualize how performance changes across the tuning grids, I plot the cross-validation results for each model. The goal here is to show which tuning values performed best and how sensitive each model was to changes in its hyperparameters.

autoplot(log_res)

autoplot(tree_res)

autoplot(rf_res)

autoplot(bt_res)

Penalized Logistic Regression: The model performed best at smaller penalty values, and performance dropped once the regularization became too large. The pure lasso setting was especially sensitive at the highest penalty level. Overall, this suggests that some regularization helped, but too much of it reduced predictive performance.
Decision Tree: The decision tree performed best at smaller cost-complexity values. As pruning became more aggressive, both accuracy and ROC AUC decreased, suggesting that overly simple trees were not flexible enough for this predictive task.
Random Forest: The random forest model was fairly stable across the tuning grid, but the strongest results came from smaller values of mtry and larger minimum node sizes. This suggests that the model benefited from some randomness while still keeping terminal nodes from becoming too small.
Boosted Trees: The boosted tree model performed best at a moderate learning rate with a medium-to-large number of trees. More aggressive settings, especially with larger depth and learning rate, appeared to reduce performance, suggesting some overfitting.

Test Set Evaluation

The two best models from cross-validation are finalized using their optimal hyperparameters and evaluated on the held-out test set. We’re aiming for a clean, unbiased look at how well each model actually generalizes.

best_log  <- select_best(log_res, metric = "roc_auc")
best_bt   <- select_best(bt_res,  metric = "roc_auc")

# Finalize workflows with best hyperparameters
final_log_wf <- finalize_workflow(log_wf, best_log)
final_bt_wf  <- finalize_workflow(bt_wf,  best_bt)

# Fit final models on full training set and evaluate on test set
final_log_fit <- last_fit(final_log_wf, snap_split)
final_bt_fit  <- last_fit(final_bt_wf,  snap_split)

# Collect test set metrics
log_test_metrics <- collect_metrics(final_log_fit) %>% mutate(model = "Elastic Net Logistic")
bt_test_metrics  <- collect_metrics(final_bt_fit)  %>% mutate(model = "Boosted Tree")

# Summary table
test_comparison <- bind_rows(log_test_metrics, bt_test_metrics) %>%
  select(model, .metric, .estimate) %>%
  pivot_wider(names_from = .metric, values_from = .estimate)

test_comparison

On the held-out test set, the elastic net logistic regression and boosted tree models performed nearly the same, with accuracies of 0.829 and 0.830, respectively. Even though the boosted tree was slightly higher, the difference is small, suggesting that both models generalize well.

ROC Curves

Next, I plot the ROC curves for both final models on the test set to compare their classification performance across different thresholds.

log_roc <- collect_predictions(final_log_fit) %>%
  roc_curve(truth = blue_win, .pred_blue) %>%
  mutate(model = "Elastic Net Logistic")

bt_roc <- collect_predictions(final_bt_fit) %>%
  roc_curve(truth = blue_win, .pred_blue) %>%
  mutate(model = "Boosted Tree")

bind_rows(log_roc, bt_roc) %>%
  ggplot(aes(x = 1 - specificity, y = sensitivity, color = model)) +
  geom_path(linewidth = 1) +
  geom_abline(lty = 3) +
  scale_color_manual(values = c("Elastic Net Logistic" = "#D4AF37",
                                "Boosted Tree" = "#0A3D62")) +
  labs(title = "ROC Curves on Test Set",
       x = "1 - Specificity", y = "Sensitivity", color = "Model") +
  theme_minimal()

Both models reached roughly 83% accuracy on the held-out test set, and the ROC curves tell the same story. This suggests that at the 25-minute mark, the relationship between game-state features and match outcome is linear enough that the simpler elastic net model captures it just as well as the more complex boosted tree. For the cross-time comparison, we’ll move forward with the elastic net since it’s faster to fit and performs equivalently.
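
A detail worth making explicit is that yardstick treats the first factor level as the event by default, which is why .pred_blue is the right column to pass when the levels of blue_win are ordered blue, then red. The sketch below computes ROC AUC by hand-checkable toy predictions; the preds table and its values are invented.

```r
library(tibble)
library(yardstick)

# Four toy predictions; "blue" is the first level and so counts as the event.
preds <- tibble(
  blue_win   = factor(c("blue", "blue", "red", "red"), levels = c("blue", "red")),
  .pred_blue = c(0.9, 0.6, 0.7, 0.2)
)

# Of the four blue/red pairs, three rank the blue match higher: AUC = 3/4
toy_auc <- roc_auc(preds, truth = blue_win, .pred_blue)
toy_auc
```

ROC AUC can be read as the probability that a randomly chosen blue win receives a higher predicted blue probability than a randomly chosen red win.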

Cross-Time Comparison

To examine how prediction accuracy changes as the game progresses, the finalized elastic net model is refit at each of the four snapshots using the same hyperparameters selected at 25 minutes. A fresh stratified split is applied at each timepoint so that results are comparable across time.

time_results <- map_dfr(c(10, 15, 20, 25), function(t) {
  snap_t <- snapshots %>%
    filter(minute == t) %>%
    select(blue_win, gold_diff, xp_diff, kills_diff,
           cs_diff, tower_diff, dragon_diff, baron_diff)

  # Same seed at every timepoint so each split is reproducible
  set.seed(131)
  split_t <- initial_split(snap_t, prop = 0.80, strata = blue_win)

  # last_fit() trains on the training portion and scores the test portion
  final_log_wf %>%
    last_fit(split_t) %>%
    collect_metrics() %>%
    filter(.metric == "accuracy") %>%
    mutate(minute = t)
})

time_results %>% select(minute, .metric, .estimate)

Accuracy here refers to the proportion of matches in which the model correctly predicted the winning team. For example, an accuracy of 83% at 25 minutes means that, given a snapshot of the game state at that point, the model correctly identified the eventual winner in roughly 83 out of every 100 matches. At 10 minutes, accuracy drops to 69%, which is better than a coin flip but still misclassifies nearly a third of matches. This reflects how unpredictable the game still is at earlier stages compared to later time points.
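
As a small worked example of this definition, the sketch below computes accuracy from a confusion table in base R. The five toy labels are invented for illustration.

```r
# Toy true outcomes and predictions for five matches; values are invented.
truth <- factor(c("blue", "red", "blue", "red", "blue"), levels = c("blue", "red"))
pred  <- factor(c("blue", "red", "red",  "red", "blue"), levels = c("blue", "red"))

# Rows are true outcomes, columns are predictions
conf_mat <- table(truth = truth, predicted = pred)
conf_mat

# Accuracy = correctly predicted matches / all matches
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy
```

Here four of five matches are predicted correctly, so the accuracy is 0.8.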

Accuracy Plots Across Timepoints

Let’s visualize this.

ggplot(time_results, aes(x = minute, y = .estimate)) +
  geom_line(color = "#0A3D62", linewidth = 1) +
  geom_point(color = "#D4AF37", size = 3) +
  scale_x_continuous(breaks = c(10, 15, 20, 25)) +
  scale_y_continuous(limits = c(0.60, 0.90),
                     labels = scales::percent_format(accuracy = 1)) +
  labs(title = "Prediction Accuracy Across Match Timepoints",
       x = "Minute", y = "Test Set Accuracy") +
  theme_minimal()

The plot shows a clear upward trend in prediction accuracy as the match progresses, which makes intuitive sense. Early in the game, advantages are small and matches are still up for grabs. By 25 minutes, gold leads, tower differences, and objective control have compounded enough that the model can predict the eventual winner with much greater confidence. This is a nice confirmation that League of Legends is a very snowball-heavy game: the longer a team stays ahead, the harder it becomes for the other team to come back.

Conclusion

What started as a passion for games became an opportunity to explore a real classification problem through data. Using structured in-game snapshots from League of Legends, the project showed that match outcomes can be predicted reasonably well from team advantage features such as gold, experience, kills, and objectives. Among the models considered, elastic net logistic regression and boosted trees performed best, with nearly identical results on both cross-validation and the held-out test set. Because the elastic net model achieved similar performance while remaining simpler and easier to interpret, it was selected as the final model.

Predicting League of Legends Match Outcomes at Different Stages of Play

Using Machine Learning Models with Match Interval Snapshots | UCSB Spring 2026

Nicholas Heng

2026-03-12