Stage 3: Rough Draft

Data Set

library(tidyverse)

matches_data <- read_csv("matches.csv", show_col_types = FALSE)

Introduction

The Indian Premier League (IPL) has grown into one of the most followed cricket leagues in the world, where high pressure meets split-second decisions. Among those decisions, the toss stands out as a moment of immediate consequence: the winning captain chooses whether to bat or bowl first, a choice often believed to shape the rest of the match. But is that belief supported by data? This project examines 1,095 IPL matches played between 2008 and 2024 to determine whether winning the toss actually increases a team’s chance of winning the match. By comparing the outcomes of toss winners versus toss losers, and by analyzing how the decision to bat or bowl first affects results, this study moves beyond anecdotal claims to provide a data-driven answer.

The data come from a public Kaggle repository compiled from official IPL match records.¹ Each row in the dataset represents a single match and includes information on the teams, venue, toss winner, toss decision (bat or bowl), and final match winner. All matches in the dataset were played under standard T20 cricket rules across various stadiums in India and a few other countries. With over a thousand matches spanning 17 seasons, the dataset offers sufficient statistical power to detect whether the toss provides a meaningful advantage, or whether skill and execution matter more than luck at the coin flip.

Figure 1: Win Percentage for Toss Winners vs. Toss Losers

matches_clean <- matches_data %>%
  filter(!is.na(winner)) %>%
  mutate(toss_win_match_win = ifelse(toss_winner == winner, "Toss Winner", "Toss Loser"))

win_by_toss <- matches_clean %>%
  count(toss_win_match_win) %>%
  mutate(prop = n / sum(n) * 100)

ggplot(win_by_toss, aes(x = toss_win_match_win, y = prop, fill = toss_win_match_win)) +
  geom_bar(stat = "identity") +
  labs(x = "Toss Outcome", y = "Win Percentage (%)") +
  scale_fill_manual(values = c("Toss Winner" = "blue", "Toss Loser" = "red")) +
  theme_minimal() +
  theme(legend.position = "none")

Figure 1. In this bar graph, we see that teams that win the toss win about 52.1% of matches, while teams that lose the toss win about 47.9% of matches. The Toss Winner bar is slightly taller than the Toss Loser bar, showing a small advantage of about 4 percentage points. The data come from 1,095 IPL matches played between 2008 and 2024, drawn from official IPL match records.

Alt text: This bar chart shows the win percentage for teams that win the toss compared to teams that lose the toss. The x-axis has two categories: Toss Winner and Toss Loser. The y-axis is win percentage from 0% to 60%. The Toss Winner bar is at about 52.1% and the Toss Loser bar is at about 47.9%. The Toss Winner bar is slightly taller than the Toss Loser bar, showing a small difference of about 4 percentage points.

Interpretation: Teams that win the toss win about 52% of their matches, while teams that lose the toss only win about 48% of their matches. This small difference of about 4 percentage points shows that winning the toss does give you a little bit of an advantage. However, it’s not huge because almost half the time the toss loser still wins. This answers my main question: yes, the toss helps, but not as much as people might think.
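As a rough check on how much weight a gap of about 4 percentage points can bear, the sketch below runs an exact binomial test against a 50% baseline. The count of 570 wins is an assumption, back-calculated from the rounded 52.1% and the 1,095-match total rather than read from the data set, so the result is only indicative. In the actual analysis the counts would come from `win_by_toss`.

```r
# Assumed counts: 570 is back-calculated from the rounded 52.1% in Figure 1,
# and 1095 is the match total quoted in the text (not re-derived from the data).
n_matches <- 1095
toss_winner_wins <- 570

# Exact two-sided binomial test against a 50% win rate
toss_test <- binom.test(toss_winner_wins, n_matches, p = 0.5)

toss_test$estimate   # observed proportion of matches won by the toss winner
toss_test$conf.int   # 95% confidence interval for that proportion
toss_test$p.value    # chance of a gap at least this large if the true rate were 50%
```

If the confidence interval includes 0.5, the descriptive gap in Figure 1 could plausibly be due to chance alone.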

Figure 2: Win Percentage by Toss Decision (Bat First vs. Bowl First)

decisions <- matches_clean %>%
  group_by(toss_decision) %>%
  # share of matches with each decision that the toss winner went on to win
  summarise(win_pct = mean(toss_winner == winner) * 100)

ggplot(decisions, aes(x = toss_decision, y = win_pct, fill = toss_decision)) +
  geom_bar(stat = "identity") +
  labs(x = "Toss Decision", y = "Win Percentage for Toss Winner (%)") +
  scale_fill_manual(values = c("bat" = "steelblue", "bowl" = "green")) +
  theme_minimal() +
  theme(legend.position = "none")

Figure 2. When toss winners choose to bat first, they win about 54.2% of those matches. When they choose to bowl first, they win about 50.3% of those matches. Data come from 1,095 IPL matches played between 2008 and 2024. Match outcomes were recorded from official scorecards. Win percentage is measured as (wins by toss winner when choosing bat or bowl / total matches where that decision was made) × 100. Sample size: 1,095 matches. Groups: bat first vs bowl first. No statistical test was applied; this is a descriptive comparison.

Alt text: This bar chart shows the win percentage for toss winners based on whether they chose to bat first or bowl first. The x-axis has two categories: Bat First and Bowl First. The y-axis is win percentage from 0% to 60%. The Bat First bar is at about 54.2% and the Bowl First bar is at about 50.3%. The Bat First bar is taller than the Bowl First bar.

Interpretation: When the toss winner chooses to bat first, they win about 54% of the time. When they choose to bowl first, they only win about 50% of the time. This means batting first gives a slightly better advantage than bowling first. I was surprised by this because I thought bowling first might be better since you know exactly how many runs you need to chase. But the data says batting first is the smarter choice if you win the toss.
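The win percentage defined in the Figure 2 caption can be sketched on a small made-up example. The eight matches below are hypothetical and are not taken from matches.csv; they only illustrate that each decision group has its own denominator, which is why the two percentages do not need to add up to 100.

```r
library(dplyr)

# Hypothetical toy data: eight invented matches, not rows from matches.csv
toy_matches <- data.frame(
  toss_winner   = c("A", "A", "B", "B", "C", "C", "D", "D"),
  winner        = c("A", "B", "B", "A", "C", "C", "D", "A"),
  toss_decision = c("bat", "bat", "bat", "bat", "bowl", "bowl", "bowl", "bowl")
)

# Win percentage per decision:
# (toss-winner wins with that decision / matches with that decision) * 100
toy_pct <- toy_matches %>%
  group_by(toss_decision) %>%
  summarise(
    n_matches = n(),
    win_pct = mean(toss_winner == winner) * 100
  )

toy_pct
```

In this toy data the toss winner wins 2 of 4 matches after choosing to bat (50%) and 3 of 4 after choosing to bowl (75%).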

Figure 3: Toss Advantage Across IPL Seasons (2008-2024)

season_advantage <- matches_clean %>%
  group_by(season) %>%
  summarise(toss_winner_win_pct = mean(toss_winner == winner, na.rm = TRUE) * 100)

ggplot(season_advantage, aes(x = season, y = toss_winner_win_pct, group = 1)) +
  geom_line(color = "darkblue", linewidth = 1) +
  geom_point(size = 2) +
  geom_hline(yintercept = 50, linetype = "dashed", color = "red") +
  labs(x = "IPL Season", y = "Toss Winner Win Percentage (%)") +
  theme_minimal()

Figure 3. The toss winner’s win percentage varies by season, ranging from a low of about 44.7% in 2015 to a high of about 60.5% in 2013. Data come from 1,095 IPL matches played across 17 seasons (2008-2024). Match outcomes were recorded from official scorecards. Win percentage is measured as (matches won by toss winner that season / total matches that season) × 100. Sample size: 56-74 matches per season. Grouping variable: season (17 levels). No statistical test was applied; this is a descriptive time series.

Alt text: This line graph shows the toss winner’s win percentage across IPL seasons from 2008 to 2024. The x-axis is season/year from 2008 to 2024. The y-axis is win percentage from 40% to 65%. The line goes up and down across seasons. The highest point is around 2013 at about 60.5%. The lowest point is around 2015 at about 44.7%. A dashed red line at 50% shows the no-advantage point.

Interpretation: The toss advantage changes a lot from season to season. In 2013, toss winners won about 60% of matches, which is a big advantage. But in 2015, toss winners only won about 45% of matches, meaning toss losers actually won more often that year. Most seasons are between 48% and 56%. There’s no real pattern over time — some early seasons are high, some low, and the same for later seasons. This means the toss advantage is not consistent every year, so other factors like team strength or home field advantage probably matter a lot too.
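To put the season-to-season swings in context, the sketch below estimates how much a season's percentage could move from chance alone. It assumes the true win rate is exactly 50% and uses only the smallest and largest season sizes quoted in the Figure 3 caption (56 and 74 matches); both are simplifying assumptions for illustration, not part of the original analysis.

```r
# Season sizes taken from the Figure 3 caption (56 to 74 matches per season)
n_per_season <- c(56, 74)

# Standard error of a percentage, assuming the true win rate is 50%
se_pct <- sqrt(0.5 * 0.5 / n_per_season) * 100

# Approximate 95% range of season percentages expected from chance alone
chance_range <- data.frame(
  n_matches = n_per_season,
  se_pct    = round(se_pct, 1),
  lower_pct = round(50 - 1.96 * se_pct, 1),
  upper_pct = round(50 + 1.96 * se_pct, 1)
)

chance_range
```

With so few matches per season the standard error is roughly 6 to 7 percentage points, so a single season's value can sit well away from 50% even if the toss had no effect.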

Figure 4: Toss Advantage by Team (2008-2024)

team_advantage <- matches_clean %>%
  filter(!is.na(toss_winner), !is.na(winner)) %>%
  group_by(toss_winner) %>%
  summarise(
    win_pct = mean(toss_winner == winner, na.rm = TRUE) * 100,
    n_matches = n()
  ) %>%
  filter(n_matches > 20)  # only teams that won the toss in more than 20 matches

ggplot(team_advantage, aes(x = reorder(toss_winner, win_pct), y = win_pct, fill = win_pct)) +
  geom_bar(stat = "identity") +
  scale_fill_gradient2(low = "red", mid = "yellow", high = "darkgreen", midpoint = 50) +
  geom_hline(yintercept = 50, linetype = "dashed", color = "black", linewidth = 0.8) +
  coord_flip() +
  labs(x = "Team", y = "Win Percentage When Winning Toss (%)") +
  theme_minimal() +
  theme(legend.position = "none")

Figure 4. Mumbai Indians have the highest win percentage when winning the toss at about 58%, while Royal Challengers Bangalore have the lowest at about 46%. Data come from 1,095 IPL matches played between 2008 and 2024. Win percentage is calculated as (matches won by the team after winning the toss / total matches where that team won the toss) × 100. Sample size varies by team (range: 30-120 matches per team). Groups: 13 IPL teams. No statistical test was applied; this is a descriptive comparison.

Alt text: A horizontal bar chart showing 13 IPL teams on the y-axis and win percentage when winning the toss on the x-axis from 40% to 60%. Bars are colored from red (lowest) to green (highest). Mumbai Indians has the longest green bar at about 58%. Royal Challengers Bangalore has the shortest red bar at about 46%. A dashed black line at 50% shows the no-advantage point.

Interpretation: This graph shows that not all teams benefit equally from winning the toss. Mumbai Indians win about 58% of matches when they win the toss, which is a strong advantage. On the other hand, Royal Challengers Bangalore only win about 46% of matches when they win the toss, meaning they actually lose more often than they win even after winning the toss. Most teams fall between 48% and 55%. This suggests that team quality, captaincy decisions, and how well a team executes its game plan matter just as much as winning the toss itself.

Conclusion

After looking at all 1,095 IPL matches from 2008 to 2024, I found that winning the toss does give you a small advantage, but it’s not as big as many people think. From Figure 1, toss winners win about 52% of matches compared to 48% for toss losers, a difference of only about 4 percentage points. Figure 2 showed that if you win the toss, choosing to bat first gives you a slightly better chance (about 54%) than bowling first (about 50%). Figure 3 revealed that the toss advantage changes a lot from season to season, with some years (like 2013) showing a big advantage and other years (like 2015) showing a disadvantage. The most interesting finding came from Figure 4, which showed that not all teams benefit equally from winning the toss. Some teams like Mumbai Indians win about 58% of matches when they win the toss, while other teams like Royal Challengers Bangalore only win about 46%, meaning they actually lose more often than they win even after winning the toss. This suggests that team quality, captaincy decisions, and how well a team executes its game plan might matter just as much as winning the toss itself.

Overall, this project showed that the toss helps, but it’s definitely not the only thing that determines who wins a cricket match.² If I were an IPL captain, I wouldn’t stress too much about losing the toss because there’s still almost a 50% chance of winning anyway. One limitation of this study is that no statistical test was applied, so every comparison here is descriptive. Another is that I didn’t look at other factors like which specific players were playing, home vs. away games, or weather conditions. Future research could add those variables to get a more complete picture of what really predicts match outcomes in the IPL.

¹ Bhardwaj, Prateek. “IPL Complete Dataset (2008-2024).” Kaggle, 2024. www.kaggle.com/datasets/patrickb1912/ipl-complete-dataset-20082020.
² Jayadevan, T. S. “Is Winning the Toss an Advantage in the Indian Premier League?” Journal of Sports Analytics, vol. 5, no. 2, 2019, pp. 121–130.

SDS 164 Project: Stage 3

Rough Draft

Om Gaikwad