Rough Draft

We have gone beyond the requirements for this assignment, but it is not yet a full final project. If needed, ignore additional pieces like the 3rd and 4th graphs, extra citations, etc. Thank you.

Due 5/7 at 11 pm: Published RMarkdown document with link submitted to Moodle

Each group should submit ONE assignment.

On May 9th, we will spend time peer-reviewing each other’s visualizations during class. Choose ONE visualization that you feel would be best supported by peer review and be ready to present it to the class.

In a Published RMarkdown document, your rough draft must include:

  1. Your two (three) paragraph written introduction from Part 2 with Dr. Zillig’s edits incorporated DONE
  2. At least two (4) figures, ONE of which is ready for peer-review during class time DONE
  3. Alt text for all figures (not done for figure 4) (see below for instructions) DONE
  4. Figure captions for all figures (not done for figure 4) (see below for instructions) DONE
  5. For each figure, 3-4 sentences that interpret/explain your graph or the trends you observe. (not done for figure 4) These explanations should answer the questions you laid out in your written introduction paragraphs. DONE
  6. At least two relevant citations included as footnotes. (see below for instructions) DONE

Check out the Grading Rubric for more specifics. This is the grading rubric for the Final Project, not the rough draft, but will give you a good idea of what I am looking for.

Introduction

Our topic involves MLB (baseball) statistics. We are both athletes at St. Olaf, and we love watching and playing baseball. In today’s game, payroll is becoming a more and more important factor in a team’s success. Last MLB season, the World Series was between the Los Angeles Dodgers and the New York Yankees, who were 5th and 2nd in total payroll allocations with 241 million USD and 309 million USD, respectively. Additionally, the New York Mets, the team with the highest total payroll allocations in the MLB last season at 317 million USD, lost to the Dodgers in the National League Championship Series.¹ As high payrolls are seemingly becoming essential for teams to perform well in the MLB today, we were interested in the actual correlation between payroll and success. We found a dataset that looks at various statistics for each team by year from 2011 to 2024, focusing primarily on payroll allocations. Through this dataset, we can measure success through both win rate (a column we created by dividing wins by total games) and regular season results (the standings before the postseason: no playoffs, wildcard, or division winner). This topic is especially important in baseball today because of the growing importance of payroll and the disparity in payroll among teams. Some teams have an abundance of wealth, while others are already at their spending limit. Teams with minimal payroll are scrambling to match bigger teams like the New York Yankees, who have seen a large amount of success in the recent past. But is payroll really what creates this success? This leads to our research question: “How does a team’s total payroll allocations in a given year affect success in that year?” On a simpler level, is success directly correlated with payroll? Or are there other, more important ways to win in the MLB? One example of a team whose low payroll didn’t limit its success is the “Moneyball” Oakland Athletics. In 2002, the Athletics had the third-lowest payroll in Major League Baseball at only 41 million USD. However, by using advanced analytics to construct their roster, the Athletics were able to win 103 games that year and advance to the American League Division Series in the playoffs.² Was this case an outlier, or is payroll not as much of an indicator of success as we are made to believe?

There is prior research on the correlation between payroll and success, but there are few definitive conclusions. In fact, multiple sources found conflicting results about the impact of a higher payroll on wins. On the one hand, a higher payroll may lead to more financial freedom, but not necessarily success.³ On the other hand, payroll can help a team acquire better, more skilled players who in turn help it win more games.⁴ So which effect is more apparent? Our study aims to address this uncertainty about the importance of payroll in the MLB today. As MLB fans, we find this question extremely interesting, and the answer could help explain the underlying causes of a franchise’s success, not only in baseball but in any professional sport (assuming that the findings can be generalized).

The dataset⁵ we will be diving into comes from kaggle.com, with statistics gathered directly from Spotrac, a company that provides detailed information on player contracts, salaries, and other financial data for professional sports. This is an observational study, with each row representing a given team in a given year. The data include variables such as average age, payroll allocations, and wins/losses measured for each team from 2011-2024. The data come from end-of-year statistics for each of these 14 years. For this project, we will primarily analyze team name, year, payroll allocations, win rate, and pre-playoff standing. Team name is a nominal categorical variable with 30 levels, one for each team in the MLB. Year is also a nominal categorical variable with 14 levels, 2011 through 2024. Payroll allocations is a discrete numeric variable measured in dollars. Win rate is a continuous numeric variable measured as the proportion of games won in a given year. Pre-playoff standing is an ordinal categorical variable broken down into “no playoffs”, “wild card”, and “division winner”. The purpose of this dataset is to enable analysis of various metrics for each team in a given year. We will use these data to assess the importance of payroll for a team’s success or failure, applying data science and data visualization to the five variables described above.
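To make the variable types above concrete, here is a minimal sketch in base R of how an ordinal variable such as pre-playoff standing could be encoded. The toy values and the object name `standing` are hypothetical and are not taken from the dataset; the level labels follow the ones used in our plotting code.

```r
# Hypothetical toy values; in our data the column is Regular_Season_Result
standing <- factor(
  c("Wildcard", "No Playoffs", "Division Winner", "No Playoffs"),
  levels = c("No Playoffs", "Wildcard", "Division Winner"), # ordered worst to best
  ordered = TRUE
)

is.ordered(standing)              # TRUE: the levels carry a ranking
made_playoffs <- standing >= "Wildcard" # comparisons respect the level order
table(standing)                   # counts are reported in level order
```

Encoding the ranking once in the factor means that later plots and summaries list the categories from least to most successful without any extra releveling.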


library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggthemes)

mlb_payrolls <- read_csv("~/Sds_164_S25/Project/Kobe_Kirk_Kieran_Haaland/mlb_payrolls_new.csv")
## Rows: 420 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): Team, Team Name, Injured, Retained, Buried, Postseason
## dbl (4): Year, Average Age, Wins, Losses
## num (2): Total Payroll Allocations, Active 26-Man
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# time to tidy the data; renaming columns and creating new, important variables
mlb_payrolls <- mlb_payrolls |> 
  mutate(
    Win_Rate = Wins / (Wins + Losses), .after = Losses, # creating a new column, win rate, from wins and losses
    Injured = parse_number(Injured), # converts the injured column to numeric; "-" entries cannot be parsed and become NA (the source of the warnings below)
    Retained = parse_number(Retained), # converts the retained column to numeric in the same way
    Buried = parse_number(Buried) # converts the buried column to numeric in the same way
    ) |>
  rename( # renaming all of the following columns to be in tidy format (no spaces)
    Team_Name = `Team Name`,
    Average_Age = `Average Age`,
    Total_Payroll_Allocations = `Total Payroll Allocations`,
    Active_26_Man = `Active 26-Man`,
    Regular_Season_Result = Postseason
    )
## Warning: There were 3 warnings in `mutate()`.
## The first warning was:
## ℹ In argument: `Injured = parse_number(Injured)`.
## Caused by warning:
## ! 109 parsing failures.
## row col expected actual
## 186  -- a number      -
## 225  -- a number      -
## 229  -- a number      -
## 237  -- a number      -
## 301  -- a number      -
## ... ... ........ ......
## See problems(...) for more details.
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 2 remaining warnings.
mlb_wider <- mlb_payrolls |> # creating a wider dataset, which can be used for future analysis
  pivot_wider(
    names_from = Year, # gives each year between 2011 and 2024 its own column
    values_from = Total_Payroll_Allocations, # numeric values come from total payroll allocations
    id_cols = c(Team, Team_Name) # keeps the columns Team and Team_Name from the original dataset
  )

mlb_payrolls |>
  ggplot(aes( # initiates a ggplot (each row is already one team in one year, so no grouping is needed)
    x = Total_Payroll_Allocations / 100000000, # maps payroll allocations (by hundreds of millions) to the x-axis
    y = Win_Rate)) + # maps win rate to the y-axis
  geom_point() + # initiates a scatter plot
  geom_smooth(method = "lm", se = FALSE, color = "blue", linewidth = 1.5) + # creates a blue regression line (linewidth replaces the deprecated size argument)
  labs(
    title = "Relationship Between Total Payroll Allocations and Win Rate", # titles our plot
    subtitle = "MLB Teams 2011-2024", # creates a subtitle for our plot
    x = "Total Payroll Allocations (100,000,000s USD)", # labels our x-axis
    y = "Win Rate" # labels our y-axis
  ) +
  theme_minimal() # uses the minimal theme for our plot
## `geom_smooth()` using formula = 'y ~ x'
This is a scatter plot that shows the relationship between total payroll allocations and win rate for all 30 Major League Baseball teams by year (from 2011-2024). The x-axis represents total payroll allocations in units of 100 million USD, ranging from 0 to 3.5 (350 million). The y-axis represents win rate, as a proportion of games won out of total games, ranging from 0.25 to 0.75. Each point represents a given team in a certain year, with 420 points in total. A blue fitted trend line runs through the data points, indicating a positive correlation (positive slope) between payroll allocations and win rates: teams with larger payrolls generally achieve higher win rates.

Figure 1: Higher total payroll allocations are associated with higher win rates for MLB teams. Data are derived from official MLB statistics via Kaggle.com and cover each of the 30 MLB teams from 2011-2024 (420 total observations). Payroll allocations represent the amount of money spent on a given team, measured in hundreds of millions of US dollars. Win rate is calculated as total wins divided by total games, measured as a proportion between 0 and 1. A test of slope (p-value < 0.001) shows a statistically significant association between payroll allocations and an MLB team’s success in that given year.

summary(lm(Win_Rate ~ Total_Payroll_Allocations, data = mlb_payrolls)) # runs a linear regression model on the correlation between win rate and payroll allocations, giving us a p-value
## 
## Call:
## lm(formula = Win_Rate ~ Total_Payroll_Allocations, data = mlb_payrolls)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.249169 -0.050610 -0.002326  0.051822  0.218050 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               4.403e-01  8.891e-03  49.520  < 2e-16 ***
## Total_Payroll_Allocations 4.678e-10  6.371e-11   7.344 1.09e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07367 on 418 degrees of freedom
## Multiple R-squared:  0.1143, Adjusted R-squared:  0.1122 
## F-statistic: 53.93 on 1 and 418 DF,  p-value: 1.094e-12

#interpretation/analysis

The data shown in figure 1 depict a clear positive relationship between payroll allocations and win rate in Major League Baseball. Teams with higher payrolls tend to achieve higher win rates, as seen in the upward trend of the plotted points and the resulting positive slope of the fitted trend line. A slope of about 0.047 per 100,000,000 USD (4.678e-10 per dollar in the regression output) suggests that for every additional 100,000,000 USD, win rate is expected to increase by about 4.7 percentage points. While there are certainly exceptions (some teams with lower payrolls still manage strong win rates, and vice versa), and payroll alone explains only about 11% of the variation in win rate (R-squared of 0.114), the general pattern suggests that higher payroll allocations correlate with better performance. This aligns with existing ideas about competitive advantages in baseball, where franchises with higher payroll allocations attract top talent, which results in more wins. In answering the research question, the evidence in figure 1 suggests that payroll allocations are positively associated with a team’s success in the MLB over the 2011-2024 seasons.
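As a check on the units, the regression output above reports the slope per single dollar, so it has to be rescaled before it can be read per 100,000,000 USD. The short sketch below does that arithmetic using the estimate printed in the summary (4.678e-10). The variable names are ours, and the conversion to wins assumes a full 162-game regular season.

```r
slope_per_dollar <- 4.678e-10             # estimate copied from the lm() summary
slope_per_100m <- slope_per_dollar * 1e8  # change in win rate per extra 100,000,000 USD
wins_per_100m <- slope_per_100m * 162     # same change in wins (162-game season assumed)

round(slope_per_100m, 4) # 0.0468
round(wins_per_100m, 1)  # 7.6
```

Under the fitted line, an extra 100,000,000 USD in payroll corresponds to roughly 4.7 percentage points of win rate, or about seven to eight additional wins in a full season.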


mlb_payrolls |>
  mutate( # relevels our data to be in order of ascending regular season result (makes our graph more intuitive)
    Regular_Season_Result = fct_relevel(Regular_Season_Result, 
                                   "No Playoffs", "Wildcard", "Division Winner")) |>
  group_by(Regular_Season_Result) |> # groups the data by the three categories of regular season result
  summarize(average_payroll_allocations = mean(Total_Payroll_Allocations / 100000000)) |> # creates an average payroll allocations variable, which is the mean (in hundreds of millions of dollars) for each regular season result
  ggplot(aes( # initiates a ggplot
    x = Regular_Season_Result, # plots regular season result to the x-axis
    y = average_payroll_allocations, # plots average payroll to the y-axis
    fill = Regular_Season_Result)) + # colors each bar by regular season results (more aesthetically pleasing)
  geom_col(show.legend = FALSE) + # initiates a bar chart of the summarized values, without a legend
  scale_fill_colorblind() + # makes the plot colorblind friendly
  labs(
    title = "Average Team Payroll Allocations by Regular Season Result", # titles the plot
    subtitle = "Measured using all 30 MLB teams from 2011-2024", # creates a subtitle for the plot
    x = "Regular Season Result", # labels our x-axis
    y = "Total Payroll Allocations (in $100,000,000s)" # labels our y-axis
  ) +
  theme_economist() # uses the theme "economist" to make the graph more visually appealing
This is a bar chart displaying the average team payroll allocations of Major League Baseball teams based on their categorical regular season results from 2011 to 2024. The x-axis categorizes teams into three regular season result groups: No Playoffs, Wildcard, and Division Winner. The y-axis represents total payroll allocations, measured in 100 million USD increments, ranging from 0 to 1.5 (150 million). This chart shows that teams failing to make the playoffs have the lowest payroll allocations on average, followed by teams that qualify as wildcard contenders, while division winners allocate the highest payroll amounts, on average. This trend suggests that higher payroll investments generally correlate with stronger regular-season performance.

Figure 2: Higher average payroll allocations are associated with better regular season results. Data are based on 30 MLB teams from 2011-2024, measured by official MLB statistics. Average payroll allocations are measured in hundreds of millions of US dollars. Teams are categorized by regular season result, which is broken down into ‘no playoffs’, ‘wildcard’, and ‘division winner’, listed from least to most successful. Visually, we see a strong relationship between payroll and regular season results; payrolls are higher, on average, for teams with more successful regular season results.

#interpretation/analysis

Payroll allocations vary noticeably by regular season result. In figure 2, we see that teams that fail to make the playoffs tend to have the lowest payrolls on average, while teams that do make the playoffs (wildcard and division winners) tend to have higher payrolls. More specifically, the higher average payroll of division-winning teams suggests that financial resources contribute to success in securing postseason berths. This pattern reinforces the notion that payroll allocations are a strong predictor of team success, in that higher payrolls are often accompanied by more success in the MLB, which speaks directly to our overall research question.


mlb_payrolls |>
  group_by(Team, Year) |> # groups our data by unique combinations of team and year (each observation)
  filter(Total_Payroll_Allocations > 240000000 | Total_Payroll_Allocations < 52000000) |> # filters to include only payrolls above 240,000,000 or below 52,000,000 (which are calculated to be the top and bottom 5% of team-seasons)
  mutate(Payroll_Group = ifelse(Total_Payroll_Allocations > 240000000, "High Payroll (More than $240,000,000)", "Low Payroll (Less than $52,000,000)")) |>  # classifies the points as either high or low payroll
  ggplot(aes( # initiates a ggplot
    x = Total_Payroll_Allocations / 100000000, # maps payroll allocations (by hundreds of millions) to the x-axis
    y = Win_Rate, # maps win rate to the y-axis
    color = Payroll_Group)) + # colors each point by its payroll group
  geom_point() + # initiates a scatter plot with the color of each dot representing its payroll group
  ggrepel::geom_text_repel(aes(label = Team), size = 3) + # labels points on the graph
  scale_color_manual(values = c(
    "High Payroll (More than $240,000,000)" = "forestgreen", # makes the high payroll points green 
    "Low Payroll (Less than $52,000,000)" = "darkred")) + # makes the low payroll points red
  labs(
    title = "Relationship Between Payroll and Win Rate For Top and Bottom 5% of Payrolls", # titles our plot
    subtitle = "MLB Teams 2011-2024", # creates a subtitle for our plot
    x = "Total Payroll Allocations (100,000,000s USD)", # labels our x-axis
    y = "Win Rate", # labels our y-axis
  ) +
  theme_minimal() + # uses the minimal theme for our plot
  theme(legend.position = "bottom") # moves legend to the bottom of the graph
This is a scatter plot displaying the relationship between total payroll allocations of MLB teams and corresponding win rates, for the top and bottom 5% of payrolls (the top and bottom 21 team-seasons). The x-axis displays payroll, which is measured in hundreds of millions of dollars, with the bottom 5% ranging from 0.25 (25,000,000 USD) to 0.52 (52,000,000 USD) and the top 5% ranging from 2.4 (240,000,000 USD) to 3.5 (350,000,000 USD). On the y-axis is win rate, which is measured as a proportion of games won out of total games, and ranges from 0.3 to 0.7. Each point represents an MLB team in a certain year from 2011-2024, and is labeled by the team's abbreviation on the graph. To enhance appearance, the points are colored green for the top 5% payroll group, and red for the bottom 5% payroll group. We see that the average win rate for the higher payroll teams is clearly higher than that of the lower payroll teams. Thus, there is visual evidence that higher payrolls are associated with more successful seasons for MLB teams, based on win rate.

Figure 3: Teams with higher payroll allocations tend to be more successful (in terms of win rate) than teams with lower payroll allocations in the MLB. Based on teams with the top and bottom 5% of total payrolls, higher-payroll teams experience higher win rates in a given season than lower-payroll teams (mean of 0.58 vs 0.46, respectively). Data are based on 30 MLB teams from 2011-2024, measured by official MLB statistics. Total payroll allocations are measured in hundreds of millions of US dollars, while win rate is measured as a proportion of games won. Visually, we see a strong relationship between payroll and win rate.
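The 240,000,000 and 52,000,000 USD cutoffs are described in the code as the top and bottom 5% of payrolls. A minimal sketch of how such cutoffs could be computed with `quantile()` is shown below. It uses simulated payroll values (the vector `payroll` and the seed are assumptions for illustration only), so the resulting cutoffs will not match ours. With the real data, `mlb_payrolls$Total_Payroll_Allocations` would take the place of the simulated vector.

```r
set.seed(164)                                    # arbitrary seed so the sketch is reproducible
payroll <- runif(420, min = 25e6, max = 350e6)   # simulated stand-in for 420 team-season payrolls

cutoffs <- quantile(payroll, probs = c(0.05, 0.95)) # 5th and 95th percentiles
n_low <- sum(payroll < cutoffs[1])               # team-seasons below the bottom cutoff
n_high <- sum(payroll > cutoffs[2])              # team-seasons above the top cutoff

cutoffs
c(low = n_low, high = n_high)                    # 21 in each tail for 420 distinct values
```

Computing the cutoffs from the data, rather than typing them in, keeps the filter and the "top and bottom 5%" description in agreement if the dataset is ever updated.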

mlb_payrolls |>
  filter(Total_Payroll_Allocations < 52000000) |>
  summarize(Average_Win_Rate_Low = mean(Win_Rate)) # gives the mean win rate for the bottom 5% of payrolls
## # A tibble: 1 × 1
##   Average_Win_Rate_Low
##                  <dbl>
## 1                0.458
mlb_payrolls |>
  filter(Total_Payroll_Allocations > 240000000) |>
  summarize(Average_Win_Rate_High = mean(Win_Rate)) # gives the mean win rate for the top 5% of payrolls
## # A tibble: 1 × 1
##   Average_Win_Rate_High
##                   <dbl>
## 1                 0.577
summary_table <- tibble( # initiates a table
  Payroll_Group = c("High Payroll (>$240M)", "Low Payroll (<$52M)"), # creates column Payroll_Group
  Avg_Win_Rate = c(0.58, 0.46)) # creates column Avg_Win_Rate

print(summary_table) # displays the table
## # A tibble: 2 × 2
##   Payroll_Group         Avg_Win_Rate
##   <chr>                        <dbl>
## 1 High Payroll (>$240M)         0.58
## 2 Low Payroll (<$52M)           0.46

#analysis/interpretation

Figure 3 shows a strong difference in win rates between the higher and lower payroll clusters. The observations in the top 5% of payroll allocations have an average win rate of 0.58, while the observations in the bottom 5% of payroll allocations have an average win rate of 0.46. This 12 percentage point difference is substantial; it is a difference of nearly 20 wins in a given season (out of 162 total games). Thus, while there are certainly exceptions to the trend outlined in figure 3 (like the Tampa Bay Rays in 2020), there is strong visual evidence that higher payrolls are correlated with more winning seasons. This notion aligns with the general expectation that more money allows for more talent and, in turn, more wins. Overall, this plot provides further evidence that payroll allocations are positively correlated with a team’s success in the MLB.
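The conversion from a win-rate gap to wins per season can be checked with a few lines of arithmetic. The sketch below uses the two group means printed in the output above (0.577 and 0.458); the variable names are ours, and a 162-game season is assumed.

```r
avg_high <- 0.577                 # mean win rate, top 5% of payrolls (from the output above)
avg_low <- 0.458                  # mean win rate, bottom 5% of payrolls (from the output above)

gap <- avg_high - avg_low         # difference in win rate
gap_in_wins <- gap * 162          # difference in wins over a 162-game season (assumed length)

round(gap, 3)         # 0.119
round(gap_in_wins, 1) # 19.3
```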


NEED 1 MORE GRAPH

mlb_payrolls |>  
  filter(Year %in% c(2012, 2018, 2024)) |> # filters to include data from only the 2012, 2018, and 2024 seasons
  group_by(Team) |> # groups the data by team
  ggplot(aes( # initiates a ggplot
    x = Active_26_Man / 100000000, # maps active 26-man payroll allocations (in hundreds of millions USD) to the x-axis
    y = Win_Rate, # maps win rate to the y-axis
    color = Team)) + # assigns each team to a color
  geom_point(show.legend = FALSE) + # initiates scatterplot and removes color legend
  geom_smooth(method = "lm", # creates a linear model through points for EACH team
              se = FALSE, # removes standard error shading
              show.legend = FALSE) + # removes color legend (because it makes the plot confusing)
  geom_smooth(aes(group = 1), method = "lm", # creates an overall linear model through all points
              color = "black", se = FALSE, linetype = "dashed", linewidth = 1.5) + # makes the line black and dashed (linewidth replaces the deprecated size argument)
  labs(
    title = "Trends Between Active 26-Man Roster Payroll and Win Rate", #titles the plot
    subtitle = "MLB Teams for Years 2012, 2018, and 2024", # creates a subtitle for the plot
    x = "Active 26-Man Payroll Allocations (100,000,000s USD)", # labels the x-axis
    y = "Win Rate" # labels the y-axis
  ) +
  theme_gray() # uses theme "gray" for enhanced visual appearance
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'

# Step 1: Filter to selected years and get slopes per team
slope_df <- mlb_payrolls |>
  filter(Year %in% c(2012, 2018, 2024)) |>  # keep only data from 2012, 2018, and 2024
  group_by(Team) |>  # group by team to run individual models
  do({  
    model = lm(Win_Rate ~ Active_26_Man, data = .)  # use linear model of Win Rate ~ Payroll
    tidy_model = broom::tidy(model)  # tidy the model output to extract coefficients
    slope = tidy_model$estimate[2]  # extract the slope
    tibble(slope = slope)  # return slope as a tibble
  }) |>  
  mutate(slope_sign = ifelse(slope > 0, "positive", "negative"))  # classify slope as positive or negative

# Step 2: Join back slope info to original data
plot_data <- mlb_payrolls |>
  filter(Year %in% c(2012, 2018, 2024)) |>  # keep only relevant years (again)
  left_join(slope_df, by = "Team")  # join slope info by team

# Step 3: Plot with color mapped to slope_sign
ggplot(plot_data, aes(  # initiate ggplot using filtered/joined data
    x = Active_26_Man / 100000000,  #map payroll to x-axis (in hundreds of millions)
    y = Win_Rate,  # map win rate to y-axis
    group = Team)) +  # group lines by team
  geom_point(color = "grey50", show.legend = FALSE) +  # plot points in grey (team names are not in the manual color scale below, so mapping color to Team would not give per-team colors)
  geom_smooth(aes(color = slope_sign), method = "lm", se = FALSE, show.legend = FALSE) +  # lines colored by slope sign
  geom_smooth(aes(group = 1), method = "lm",  # add an overall trend line across all teams
              color = "black", se = FALSE, linetype = "dashed", linewidth = 1.5) +  # make the overall line black and dashed (linewidth replaces the deprecated size argument)
  scale_color_manual(values = c("positive" = "forestgreen", "negative" = "darkred")) +  # set colors for slope sign
  labs( 
    title = "Trends Between Active 26-Man Roster Payroll and Win Rate",  # create title for plot
    subtitle = "MLB Teams for Years 2012, 2018, and 2024",  # create subtitle
    x = "Active 26-Man Payroll Allocations (100,000,000s USD)",  # label x-axis
    y = "Win Rate"  # label y-axis
  ) +
  theme_gray()  # use the gray theme for plot styling
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
This is a scatter plot including regression lines for each team (as well as an overall trend line), showing the relationship between Active 26-Man Payroll Allocations and Win Rate in the years 2012, 2018, and 2024 (6-year intervals). The x-axis displays Active 26-Man Payroll Allocations, which are measured in hundreds of millions of US dollars and range from 0.1 (10 million USD) to about 2.75 (275 million USD). The y-axis displays win rate, which is measured as a proportion of games won out of total games, and ranges from 0.25 to about 0.7. Lines with a positive slope are colored green, while lines with a negative slope are colored red. We see that a large majority of lines (26/30) have a positive slope (green), and the overall trend line (black, dashed) is increasing as well.

Figure 4: …

#analysis/interpretation

…indicating that there is a positive correlation between active payroll allocations and win rate. In addition, the overall trend line (black, dashed) is increasing, which provides further evidence…
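The per-team slope classification behind Figure 4 can also be sketched in base R, as a possible alternative to the `do()` step in our code. The data frame `toy`, its team abbreviations, and its values are hypothetical and only illustrate the technique: fit one linear model per team, extract the payroll slope, and label its sign.

```r
# Hypothetical toy data: two made-up teams, three seasons each
toy <- data.frame(
  Team = rep(c("AAA", "BBB"), each = 3),
  Active_26_Man = c(50e6, 80e6, 120e6, 60e6, 90e6, 150e6),
  Win_Rate = c(0.45, 0.50, 0.55, 0.60, 0.55, 0.50)
)

# Fit Win_Rate ~ Active_26_Man separately for each team and keep only the slope
slopes <- sapply(split(toy, toy$Team), function(d) {
  coef(lm(Win_Rate ~ Active_26_Man, data = d))[["Active_26_Man"]]
})

# Classify each team's slope as positive or negative
slope_sign <- ifelse(slopes > 0, "positive", "negative")
table(slope_sign)
```

Because each team contributes only three seasons to Figure 4, every slope is estimated from three points, so the sign of any single team's line should be read with caution.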


  1. Blasi, Weston (25 Oct. 2024). “The Yankees-Dodgers World Series Features the Biggest Combined Payroll Ever.” MarketWatch. https://www.marketwatch.com/story/the-yankees-dodgers-world-series-features-the-biggest-combined-payroll-ever-316dff5c

  2. Popdust (28 Feb. 2025). “How Moneyball Changed the Way We See Sports Forever.” Popdust. https://www.popdust.com/how-moneyball-changed-sports-forever

  3. Hall, S., Szymanski, S., & Zimbalist, A. S. (2002). “Testing Causality Between Team Performance and Payroll: The Cases of Major League Baseball and English Soccer.” Journal of Sports Economics. https://doi.org/10.1177/152700250200300204

  4. Schwartz, Noah L. and Zarrow, Jason M. (2009). “An Analysis of the Impact of Team Payroll on Regular Season and Postseason Success in Major League Baseball.” Undergraduate Economic Review. https://digitalcommons.iwu.edu/uer/vol5/iss1/3

  5. Treasure, C. (2024). MLB Team Payrolls 2011-2024. Kaggle.com. https://www.kaggle.com/datasets/christophertreasure/mlb-team-payrolls-2011-2024