Introduction

In this project, I analyze the relationship between MLB team payroll and team performance. My goal is to determine whether higher spending is associated with more wins, which teams perform most efficiently relative to their payroll, and whether higher-payroll teams reach the playoffs more often.

Dataset Description

The dataset contains MLB team payroll and performance data across multiple seasons. Each row represents one team in one season. The main variables used in this project are team name, year, total payroll, wins, losses, and postseason result. The data cover the 2011-2024 seasons.
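Before modeling, it can help to confirm the "one row per team per season" structure and to check how many games each season contains, since the 2020 MLB season was shortened to 60 games and raw win totals from that year are not directly comparable to a 162-game season. The sketch below is only an illustration: `toy` is a made-up data frame, not values from `mlb_payrolls.csv`, and the column name `losses` is an assumption (the other names follow the code used later in this report).

```r
# Illustrative values only: NOT taken from mlb_payrolls.csv.
# Column name `losses` is assumed; adjust to match the real file.
toy <- data.frame(
  team_name = c("A", "B", "A", "B"),
  year      = c(2019, 2019, 2020, 2020),
  wins      = c(95, 70, 35, 26),
  losses    = c(67, 92, 25, 34)
)

# Each team-season should appear exactly once
n_dupes <- sum(duplicated(toy[, c("team_name", "year")]))

# Games played and win percentage make seasons of different length comparable
toy$games   <- toy$wins + toy$losses
toy$win_pct <- toy$wins / toy$games

# Flag seasons noticeably shorter than the usual 162 games
short_seasons <- unique(toy$year[toy$games < 150])

cat("Duplicate team-seasons:", n_dupes, "\n")
cat("Short seasons:", short_seasons, "\n")
```

If a short season does show up in the real data, one option is to use `win_pct` instead of `wins` as the outcome, or to drop that season; either choice would change the numbers reported below.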

Load Packages and Data

library(readr)
library(dplyr)
library(ggplot2)
library(ggrepel)
library(scales)

df <- read_csv("mlb_payrolls.csv")

Question 1: Does higher payroll lead to more wins?

plot1 <- ggplot(df, aes(x = total_payroll, y = wins)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  scale_x_continuous(labels = scales::label_dollar(scale = 1e-6, suffix = "M")) +
  labs(
    title = "Relationship Between Payroll and Wins",
    x = "Total Payroll ($ Millions)",
    y = "Wins"
  )

plot1

This visualization shows a positive relationship between payroll and wins: teams with higher payrolls tend to win more games. However, the wide scatter of points around the trend line shows that payroll is not the only factor that affects performance, since teams with similar payrolls can still have very different win totals.
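To put a number on the strength of this relationship, the correlation and the R-squared of the linear fit can be reported alongside the plot. The sketch below uses a small made-up data frame (`toy`, not the real `mlb_payrolls.csv` values) with the same column names as `df`; swapping in `df` should work the same way, assuming `total_payroll` is stored in dollars.

```r
# Illustrative values only: NOT taken from mlb_payrolls.csv.
toy <- data.frame(
  total_payroll = c(60e6, 90e6, 120e6, 150e6, 200e6, 240e6),
  wins          = c(70, 78, 75, 88, 84, 95)
)

# Pearson correlation between payroll and wins
r <- cor(toy$total_payroll, toy$wins)

# Simple linear regression; payroll is rescaled so the slope reads as
# "extra wins per additional $10M of payroll"
fit <- lm(wins ~ I(total_payroll / 1e7), data = toy)
slope_per_10m <- unname(coef(fit)[2])
r_squared     <- summary(fit)$r.squared

cat("Correlation:", round(r, 2), "\n")
cat("Wins per extra $10M:", round(slope_per_10m, 2), "\n")
cat("R-squared:", round(r_squared, 2), "\n")
```

With a single predictor, R-squared is the square of the correlation, so it gives the share of the variation in wins that payroll alone accounts for; the remainder reflects everything else.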

Question 2: Which teams are most efficient with payroll?

# Expected wins as a linear function of payroll
model <- lm(wins ~ total_payroll, data = df)

# Residual = actual wins minus wins expected for that payroll
efficiency <- df %>%
  mutate(predicted_wins = predict(model, newdata = df),
         residual = wins - predicted_wins,
         label = paste(team_name, year))

# Average each team's residual, keeping teams with at least 5 seasons
team_efficiency <- efficiency %>%
  group_by(team_name) %>%
  summarize(avg_residual = mean(residual),
            avg_payroll = mean(total_payroll),
            avg_wins = mean(wins),
            seasons = n()
            ) %>%
  filter(seasons >= 5)

plot2 <- ggplot(team_efficiency, aes(x = avg_payroll, y = avg_residual)) +
  geom_point(size = 3) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_text_repel(aes(label = team_name), size = 3) +
  scale_x_continuous(labels = scales::label_dollar(scale = 1e-6, suffix = "M")) +
  labs(
    title = "Team Payroll Efficiency (Average Over Time)",
    x = "Average Payroll ($ Millions)",
    y = "Average Over/Under Performance (Wins vs Expected)"
  )

plot2

For this question, I used a linear regression model to estimate how many games a team would be expected to win given its payroll. I then compared actual wins to expected wins for each season and averaged the difference by team, keeping only teams with at least five seasons in the data. Teams above zero won more games than expected for their payroll, while teams below zero won fewer. This identifies which teams were more or less efficient at turning payroll into wins.
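The residual calculation can be seen on a tiny example. This is a minimal sketch in base R using a made-up data frame (`toy`, with hypothetical teams "A", "B", and "C"), not the real data; it mirrors the steps above without the five-season filter.

```r
# Illustrative values only: NOT taken from mlb_payrolls.csv.
toy <- data.frame(
  team_name     = c("A", "A", "B", "B", "C", "C"),
  total_payroll = c(80e6, 90e6, 150e6, 160e6, 220e6, 230e6),
  wins          = c(88, 84, 80, 85, 90, 86)
)

fit <- lm(wins ~ total_payroll, data = toy)

# Expected wins given payroll, and wins above (+) or below (-) expectation
toy$predicted_wins <- predict(fit, newdata = toy)
toy$residual       <- toy$wins - toy$predicted_wins

# Average residual per team, sorted from most to least efficient
team_avg <- aggregate(residual ~ team_name, data = toy, FUN = mean)
team_avg <- team_avg[order(-team_avg$residual), ]
print(team_avg)
```

Because the regression includes an intercept, the residuals sum to zero across all team-seasons used to fit it. Efficiency here is therefore relative: a positive average means a team did better than the league-wide trend for its payroll, not that it met some absolute standard.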

Question 3: Do higher payroll teams make the playoffs more often?

team_playoff <- df %>%
  group_by(team_name) %>%
  summarize(
    avg_payroll = mean(total_payroll),
    playoff_rate = mean(postseason != "No Playoffs"),
    seasons = n()
  ) %>%
  filter(seasons >= 5)

plot3 <- ggplot(team_playoff, aes(x = avg_payroll, y = playoff_rate)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", se = FALSE) +
  geom_text_repel(aes(label = team_name), size = 3) +
  scale_x_continuous(labels = scales::label_dollar(scale = 1e-6, suffix = "M")) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  labs(
    title = "Average Payroll vs Playoff Appearance Rate",
    x = "Average Payroll ($ Millions)",
    y = "Playoff Appearance Rate"
  )

plot3

This plot shows a positive relationship between average payroll and playoff appearance rate. Teams with higher payrolls typically make the playoffs more often, although there are exceptions: some lower-payroll teams still reach the playoffs frequently, while some higher-payroll teams are less successful than their spending would suggest.
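The plot works with team-level averages. A different way to look at the same question, not the method used above, is a season-level logistic regression that estimates how the odds of making the playoffs change with payroll. The sketch below is only illustrative: `toy` is made up, and every `postseason` label other than "No Playoffs" is a hypothetical placeholder for whatever values the real column contains.

```r
# Illustrative values only: NOT taken from mlb_payrolls.csv.
# Postseason labels other than "No Playoffs" are hypothetical.
toy <- data.frame(
  total_payroll = c(60, 80, 100, 120, 140, 160, 180, 200, 220, 240) * 1e6,
  postseason    = c("No Playoffs", "No Playoffs", "Wild Card", "No Playoffs",
                    "No Playoffs", "Division Series", "No Playoffs",
                    "Wild Card", "World Series", "Division Series")
)

# 1 if the team made the playoffs that season, 0 otherwise
toy$made_playoffs <- as.integer(toy$postseason != "No Playoffs")

# Logistic regression; payroll is rescaled to units of $10M
fit <- glm(made_playoffs ~ I(total_payroll / 1e7),
           data = toy, family = binomial)

# Multiplicative change in playoff odds per additional $10M of payroll
odds_ratio <- unname(exp(coef(fit)[2]))

# Predicted playoff probability at $100M and $200M payroll
new_teams <- data.frame(total_payroll = c(100e6, 200e6))
pred <- predict(fit, newdata = new_teams, type = "response")

cat("Odds ratio per $10M:", round(odds_ratio, 2), "\n")
print(round(pred, 2))
```

An odds ratio above 1 would point in the same direction as the plot. Note that seasons from the same team are not independent, so standard errors from a model like this should be read with caution.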

Conclusion

Overall, this analysis shows that payroll matters in MLB but does not guarantee success. Higher payroll is associated with more wins and more frequent playoff appearances, which suggests that financial resources provide an advantage. However, the efficiency analysis shows that some teams are much better than others at converting payroll into results. The main takeaway is that money helps, but the variation it leaves unexplained suggests that other factors, such as management and roster construction, still play a major role in team success.