League Q

League of Legends (Background)

League of Legends (LoL) is a multiplayer online battle arena (MOBA) game developed and published by Riot Games, first released in 2009. Inspired by Defense of the Ancients (DotA), a popular Warcraft III mod, LoL became a global phenomenon, offering a competitive and strategic gaming experience. Players assume the role of “champions,” each with unique abilities, and compete in team-based matches to destroy the enemy’s Nexus, the structure at the heart of the opposing base. Known for its dynamic gameplay, frequent updates, and vast roster of champions, LoL supports a thriving esports scene, with tournaments like the League of Legends World Championship drawing millions of viewers worldwide. Its rich lore and engaging gameplay have cemented its place as one of the most iconic and influential games in the industry.

Purpose

The purpose of this analysis was to explore League of Legends statistics using a set of CSV files I obtained from Kaggle.

In all honesty, I was motivated by boredom.

Rows: 1855 Columns: 29
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (3): team, region, matchType
dbl (26): Baron, Dra, Turts, kills, deaths, assists, CS, gold, damage, tanki...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 21076 Columns: 25
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (5): player_name, team, region, matchType, position
dbl (20): kills, deaths, assists, CS, gold, damage, tanking, matches_played,...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 0 Columns: 1
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): 03_hero.csv

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 7044 Columns: 29
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (5): Team1_name, Team1_region, Team2_name, Team2_region, matchType
dbl (24): matches_played, minutes_played, matches_won_1, matches_won_2, Team...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 7044 Columns: 97
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (25): Team1_name, Team1_region, Team2_name, Team2_region, matchType, Tea...
dbl (72): matches_played, minutes_played, Team1_player1_kills, Team1_player1...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 11458 Columns: 17
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (14): player_name, player_team, player_region, matchType, hero_chosen_1,...
dbl  (3): matches_played, minutes_played, matches_won

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
Rows: 21147 Columns: 116
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (37): matchType, MatchDate, win, Team1, Team1_region, Team1_ban1, Team1...
dbl  (78): MatchID, gameset, Team1_Baron, Team1_Dra, Team1_Turts, Team2_Baro...
time  (1): Duration

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 27270 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): hero_name, position, matchType
dbl (7): pick_count, ban_count, win_count, matches_played, pick_rate, ban_ra...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 201253 Columns: 23
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (5): hero1_name, hero1_position, hero2_name, hero2_position, matchType
dbl (18): matches_played, minutes_played, matches_won_1, matches_won_2, hero...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Exploratory Data Analysis and Visualizations

`01_team`$team <- abbreviate(`01_team`$team)
Warning in abbreviate(`01_team`$team): abbreviate used with non-ASCII chars
team_win <- `01_team` %>%
  select(team, matches_won, matches_lose) %>%
  # The win percentage is computed row by row, so no grouping is needed
  mutate(win_percentage = matches_won / (matches_won + matches_lose) * 100) %>%
  arrange(desc(matches_won))

datatable(team_win)

Regional Wins

region_win <- `01_team` %>%
  select(region, matches_won, matches_lose) %>%
  group_by(region) %>%
  summarise(total_wins = sum(matches_won), 
            total_losses = sum(matches_lose), 
            total_win_percentage = total_wins / (total_wins + total_losses) * 100) %>%
  arrange(desc(total_wins))


datatable(region_win)

Head to Head Record

# Build an order-independent pairing so that "A vs B" and "B vs A" fall into the
# same group, and credit each win to the correct side of the pair.
`04_team_vs` <- `04_team_vs` %>%
  mutate(
    Team_A = pmin(Team1_name, Team2_name),
    Team_B = pmax(Team1_name, Team2_name),
    team_pair = paste(Team_A, Team_B, sep = "-"),
    wins_A = ifelse(Team1_name == Team_A, matches_won_1, matches_won_2),
    wins_B = ifelse(Team1_name == Team_A, matches_won_2, matches_won_1)
  )

# Group on the two name columns directly; splitting team_pair on "-" would
# break for any team whose name contains a hyphen.
head_to_head <- `04_team_vs` %>%
  group_by(Team_A, Team_B) %>%
  summarise(
    matches_played = sum(matches_played, na.rm = TRUE),
    team1_wins = sum(wins_A, na.rm = TRUE),  # wins credited to Team_A
    team2_wins = sum(wins_B, na.rm = TRUE),  # wins credited to Team_B
    .groups = "drop"
  ) %>%
  select(Team_A, Team_B, matches_played, team1_wins, team2_wins)
server <- function(input, output, session) {
  # Reactive data filtering
  filtered_data <- reactive({
    req(input$team1, input$team2) # Ensure inputs are available
    head_to_head %>%
      filter((Team_A == input$team1 & Team_B == input$team2) |
             (Team_A == input$team2 & Team_B == input$team1))
  })
  
  # Render the Highcharter plot
  output$plot <- renderHighchart({
    data <- filtered_data()
    
    if (nrow(data) == 0) {
      return(
        highchart() %>%
          hc_title(text = paste("No Data for Selected Teams:", input$team1, "vs", input$team2)) %>%
          hc_chart(type = "line")
      )
    }
    
    # Prepare data for Highcharter; label each bar with the team it belongs to,
    # since the stored Team_A/Team_B order may differ from the selection order
    categories <- c(paste(data$Team_A[1], "Wins"), paste(data$Team_B[1], "Wins"))
    values <- c(sum(data$team1_wins, na.rm = TRUE), sum(data$team2_wins, na.rm = TRUE))
    
    highchart() %>%
      hc_chart(type = "column") %>%
      hc_title(text = paste("Head-to-Head: ", input$team1, " vs ", input$team2)) %>%
      hc_xAxis(categories = categories, title = list(text = "Result")) %>%
      hc_yAxis(title = list(text = "Count")) %>%
      hc_series(
        list(
          name = "Wins",
          data = values
        )
      ) %>%
      hc_tooltip(pointFormat = "<b>{point.category}:</b> {point.y}")
  })
}
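
The pairing step above depends on giving "A vs B" and "B vs A" the same key while keeping each win attached to the right team. Here is a minimal base-R sketch of one way to do that with pmin() and pmax(). The team names and win counts are made up for illustration and are not taken from the dataset.

```r
# Toy fixtures: the same two teams appear in both orders (hypothetical values)
games <- data.frame(
  Team1_name = c("T1", "G2", "T1"),
  Team2_name = c("G2", "T1", "FNC"),
  matches_won_1 = c(3, 1, 2),
  matches_won_2 = c(1, 2, 0),
  stringsAsFactors = FALSE
)

# The alphabetically smaller name becomes Team_A, the other Team_B
games$Team_A <- pmin(games$Team1_name, games$Team2_name)
games$Team_B <- pmax(games$Team1_name, games$Team2_name)

# Credit wins to Team_A/Team_B regardless of which column the team sat in
is_a_first <- games$Team1_name == games$Team_A
games$wins_A <- ifelse(is_a_first, games$matches_won_1, games$matches_won_2)
games$wins_B <- ifelse(is_a_first, games$matches_won_2, games$matches_won_1)

# One row per pairing, with wins summed across both orientations
pair_totals <- aggregate(cbind(wins_A, wins_B) ~ Team_A + Team_B,
                         data = games, FUN = sum)
print(pair_totals)
```

In this toy data the two T1/G2 rows collapse into a single G2-T1 pairing. Summing matches_won_1 directly would instead mix wins from both teams.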

Player Stats and Performance

player_stats <- `02_player` %>%
  group_by(player_name) %>%
  summarise(mean_kills_per_game = mean(kills_per_game), 
            mean_CS_per_minute = mean(CS_per_minute), 
            mean_damage_per_game = mean(damage_per_game), 
            mean_deaths_per_game = mean(deaths_per_game), 
            mean_tanking_per_game = mean(tanking_per_game), 
            mean_KDA = mean(KDA), 
            mean_assists_per_game = mean(assists_per_game)) 

numeric_columns <- sapply(player_stats, is.numeric)
preprocessor <- preProcess(as.data.frame(player_stats[, numeric_columns]), method = "range")
normalized_data <- predict(preprocessor, player_stats[, numeric_columns])

non_numeric_data <- player_stats[, !numeric_columns, drop = FALSE]

normalized_stats <- cbind(normalized_data, non_numeric_data)
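
For readers unfamiliar with `method = "range"`: with its default bounds, caret rescales each numeric column to the interval [0, 1] using (x - min) / (max - min). The sketch below reproduces that arithmetic in base R. The helper `rescale_01` and the toy values are hypothetical, and the handling of constant columns is a choice made here, not a statement about what caret does.

```r
# Min-max scaling: (x - min) / (max - min) per column
rescale_01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  # Constant column: return zeros to avoid 0/0 (an assumption of this sketch)
  if (rng[1] == rng[2]) return(rep(0, length(x)))
  (x - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical per-player averages, for illustration only
toy_stats <- data.frame(
  mean_kills_per_game = c(2, 4, 6),
  mean_KDA = c(1.5, 3.0, 4.5)
)

scaled <- as.data.frame(lapply(toy_stats, rescale_01))
print(scaled)
```

Because every column ends up on the same 0 to 1 scale, statistics with very different units can share one radar chart axis. The scaled values are relative to the players in the table, not absolute measures.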
server <- function(input, output, session) {
  # Reactive data filtering
  filtered_data_player <- reactive({
    req(input$Player) # Ensure inputs are available
    normalized_stats %>%
     filter(player_name == input$Player)
  })
  
    # Render the Highcharter plot
  output$plot <- renderHighchart({
    data <- filtered_data_player()
    
    if (nrow(data) == 0) {
      return(
        highchart() %>%
          hc_title(text = paste("No Data for Selected Player:", input$Player)) %>%
          hc_chart(type = "line")
      )
    }
    
    # Define categories (excluding player column)
    categories <- setdiff(colnames(data), "player_name") # Adjust column name as needed
    
    # Ensure categories are numeric
    performance_data <- as.numeric(data[ 1,categories, drop = TRUE])
    
    if (any(is.na(performance_data))) {
      return(
        highchart() %>%
          hc_title(text = paste("Invalid Data for Player:", input$Player)) %>%
          hc_chart(type = "line")
      )
    }
    
    highchart() %>%
      hc_chart(type = "line", polar = TRUE) %>%
      hc_title(text = paste("Performance of", input$Player)) %>%
      hc_xAxis(categories = categories, tickmarkPlacement = "on", lineWidth = 0) %>%
      hc_yAxis(gridLineInterpolation = "polygon", lineWidth = 0, min = 0, max = 1) %>%
      hc_series(
        list(
          name = input$Player,
          data = performance_data
        )
      ) %>%
      hc_tooltip(pointFormat = "<b>{point.y}</b>")
  })
}
shinyApp(ui = fluidPage(
  titlePanel("Player Performance"),
  sidebarLayout(
    sidebarPanel(
      selectInput("Player", "Select Player", choices = unique(player_stats$player_name))
    ),
    mainPanel(
      highchartOutput("plot")
    )
  )
), server = server)
Warning: The select input "Player" contains a large number of options; consider
using server-side selectize for massively improved performance. See the Details
section of the ?selectizeInput help topic.

Hero Stats

hero$year <- gsub(".*?(\\d{4}).*", "\\1", hero$matchType)
hero <- hero[grepl("^\\d+$", hero$year), ]
hero$year <- ymd(hero$year, truncated = 2L)
hero_stats <- hero %>%
  group_by(hero_name, year) %>%
  summarise(mean_pick_rate = mean(pick_rate), 
            mean_ban_rate = mean(ban_rate),
            .groups = "drop")  # return an ungrouped tibble and silence the grouping message
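
The year-extraction step can be tried on its own with a few sample strings. This is a sketch under assumptions: the `matchType` labels below are invented (the real values in the CSV may look different), and base R's as.Date() stands in for lubridate's ymd(..., truncated = 2L) so the snippet runs without extra packages.

```r
# Hypothetical matchType labels, for illustration only
match_types <- c("2021 LPL Spring", "Worlds 2019", "All-Star")

# Pull out a four-digit run; strings without one come back unchanged
year_text <- gsub(".*?(\\d{4}).*", "\\1", match_types)

# Keep only the entries that are purely digits, then anchor each to 1 January
has_year <- grepl("^\\d+$", year_text)
year_date <- as.Date(paste0(year_text[has_year], "-01-01"))
print(year_date)
```

Rows whose label has no four-digit year are dropped by the `grepl()` filter, which is why the hero table can shrink after this step.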
server <- function(input, output, session) {
  # Reactive data filtering
  filtered_data_hero <- reactive({
    req(input$Hero) # Ensure the Hero input is available
    hero_stats %>%
      filter(hero_name == input$Hero)
  })
     
  # Render the Highcharter plot
 output$plot <- renderHighchart({
  # Retrieve the filtered data
  data <- filtered_data_hero()
  
  # Define numeric categories excluding non-relevant columns
  categories <- colnames(data)[sapply(data, is.numeric) & !(colnames(data) %in% c("player", "year"))]
  
  # Ensure categories are numeric
  hero_data <- suppressWarnings(as.numeric(data[1, categories, drop = TRUE]))
  
  # Check for invalid data
  if (any(is.na(hero_data))) {
    return(
      highchart() %>%
        hc_title(text = paste("Invalid Data for Hero:", input$Hero)) %>%
        hc_chart(type = "line")
      )
  }
  
  # Convert year data to milliseconds since the Unix epoch for Highcharter
  year_data <- as.numeric(as.POSIXct(data$year)) * 1000

  # Create the Highchart line graph with two series
  highchart() %>%
    hc_chart(type = "line") %>%
    hc_title(text = "Mean Pick and Ban Rates") %>%
    hc_xAxis(
      type = "datetime",
      title = list(text = "Year"),
      labels = list(format = "{value:%Y}") # Display only the year
    ) %>%
    hc_yAxis(title = list(text = "Rate")) %>%
    hc_series(
      list(
        name = "Mean Pick Rate",
        data = mapply(function(x, y) list(x, y), year_data, data$mean_pick_rate, SIMPLIFY = FALSE)
      ),
      list(
        name = "Mean Ban Rate",
        data = mapply(function(x, y) list(x, y), year_data, data$mean_ban_rate, SIMPLIFY = FALSE)
      )
    )
  })
}
shinyApp(ui = fluidPage(
  titlePanel("Hero Pick and Ban Rate"),
  sidebarLayout(
    sidebarPanel(
      selectInput("Hero", "Select Hero", choices = unique(hero_stats$hero_name))
    ),
    mainPanel(
      highchartOutput("plot")
    )
  )
), server = server)

Modeling Time!!!!

To start, we will fit a simple logistic regression that predicts whether a team won more matches than it lost, and go from there. After that, we will try either a convolutional neural net or a random forest model. We shall see.

Fitting model

# Load necessary libraries
library(tidymodels)

# Read the dataset
data <- read.csv("01_team.csv")

# Create the binary target variable based on win/loss
data <- data %>%
  mutate(win_binary = ifelse(matches_won > matches_lose, 1, 0)) %>%
  select(win_binary, Baron_per_game, Dra_per_game, Turts_per_game, kills_per_game,
         deaths_per_game, assists_per_game, CS_per_minute, gold_per_minute,
         damage_per_game, tanking_per_game) 

data$win_binary <- as.factor(data$win_binary)

# Split the data into training and testing sets
set.seed(42)
data_split <- initial_split(data, prop = 0.8, strata = win_binary)
train_data <- training(data_split)
test_data <- testing(data_split)

# Define the logistic regression model
logistic_model <- logistic_reg() %>%
  set_engine("glm") %>%
  set_mode("classification")

# Create a recipe for preprocessing
logistic_recipe <- recipe(win_binary ~ ., data = train_data) %>%
  step_normalize(all_predictors())


# Create a workflow
logistic_workflow <- workflow() %>%
  add_model(logistic_model) %>%
  add_recipe(logistic_recipe)

# Train the model
logistic_fit <- logistic_workflow %>%
  fit(data = train_data)

# Make predictions on the test set
test_predictions <- logistic_fit %>%
  predict(new_data = test_data) %>%
  bind_cols(test_data)

# Evaluate the model (object names chosen so they do not shadow
# yardstick's metrics() and conf_mat() functions)
model_metrics <- test_predictions %>%
  metrics(truth = win_binary, estimate = .pred_class)

confusion <- test_predictions %>%
  conf_mat(truth = win_binary, estimate = .pred_class)

# Print the results
print(model_metrics)
# A tibble: 2 × 3
  .metric  .estimator .estimate
  <chr>    <chr>          <dbl>
1 accuracy binary         0.900
2 kap      binary         0.797
print(confusion)
          Truth
Prediction   0   1
         0 192  15
         1  22 142
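
As a sanity check, both reported metrics can be recomputed by hand from the confusion matrix above. The sketch below uses only base R. The variable names are illustrative, and the kappa formula is the standard Cohen's kappa, (observed agreement - chance agreement) / (1 - chance agreement).

```r
# Confusion matrix from the output above: rows = prediction, columns = truth
cm <- matrix(c(192, 22, 15, 142), nrow = 2,
             dimnames = list(Prediction = c("0", "1"), Truth = c("0", "1")))

n <- sum(cm)                        # test-set size
accuracy <- sum(diag(cm)) / n       # observed agreement

# Agreement expected by chance, from the row and column totals
expected <- sum(rowSums(cm) * colSums(cm)) / n^2
cohen_kappa <- (accuracy - expected) / (1 - expected)

print(round(c(accuracy = accuracy, kap = cohen_kappa), 3))
```

The rounded values come out to 0.900 and 0.797, matching the yardstick output. The test set has 371 rows, about 20% of the 1855 rows in the team file.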