Following the 1968 Major League Baseball (MLB) season, officials decided to lower the pitching mound by 5 inches. This came after a year in which pitching and defense dominated the game and run scoring was in decline. MLB officials expected the change to increase run scoring. We can use MLB data to examine the effect this had on run scoring. The data was sourced from Sean Lahman’s database, which is publicly available.
First we must set our working directory and then load the dataset called Teams. This includes team statistics for every season from 1871 to 2022.
setwd("~/R_Files/R_Baseball/baseballdatabank-2023.1/core")
teams = read.csv("Teams.csv")
Next we can create a new dataframe to work with. This will initially contain 2 columns: the year of each season in the initial dataset, and the aggregate number of runs scored in that season across all games played.
# Sum runs (R) across all teams within each season; yearID is kept numeric
runs = aggregate(teams['R'], list(year = teams$yearID), FUN = sum, na.rm = TRUE)
We will add 3 new variables to the runs dataframe.
1. time : A counter that starts at 1 in the first year of the dataframe (1871) and increases by 1 each year until the last year of the dataframe.
2. treatment : An indicator that is 0 for every year before the mound adjustment (through 1968) and 1 for every year from 1969 onward.
3. timesince : The time since the treatment (mound adjustment) was introduced. It is 0 for every year before the mound adjustment, then counts upwards by 1, starting at 1 in 1969.
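runs$time = seq_len(nrow(runs))
runs$treatment = ifelse(as.numeric(runs$year) < 1969, 0, 1)
# 0 through 1968, then 1, 2, 3, ... from 1969 onward (same values as c(rep(0,98), 1:54) for 1871-2022)
runs$timesince = pmax(0, as.numeric(runs$year) - 1968)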
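To make the coding of these three variables concrete, here is a small sketch on a hypothetical six-year window around the change. The `toy` data frame and its year range are illustrative only and are not part of the Lahman data; in the full dataset, time would start at 1 in 1871 rather than in 1966.

```r
# Hypothetical six-year window around the 1969 rule change (illustration only)
toy = data.frame(year = 1966:1971)

# Running counter: 1 for the first row, increasing by 1 per season
toy$time = seq_len(nrow(toy))

# Indicator: 0 before the mound was lowered, 1 from 1969 onward
toy$treatment = ifelse(toy$year < 1969, 0, 1)

# Seasons elapsed since the change: 0 before 1969, then 1, 2, 3, ...
toy$timesince = pmax(0, toy$year - 1968)

print(toy)
```

Deriving the variables from the year column, rather than from fixed row counts, keeps the coding correct if seasons are later added to or removed from the data.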
Now that we have created a dataframe with all the relevant data in it, we can input this into a linear model using lm() to calculate the coefficients.
reg_runs = lm(R ~ time + treatment + timesince, data = runs)
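Written out, the call above fits what is usually called a segmented (interrupted time series) regression. The symbols below are notation introduced here for illustration: $R_t$ is the total runs scored in season $t$, $\varepsilon_t$ is the error term, and the $\beta$ terms correspond to the intercept and the three coefficients that lm() estimates.

```latex
R_t = \beta_0 + \beta_1\,\mathrm{time}_t + \beta_2\,\mathrm{treatment}_t + \beta_3\,\mathrm{timesince}_t + \varepsilon_t
```

Under this coding, $\beta_1$ is the yearly trend before 1969, $\beta_2$ is the shift in level once the mound is lowered, and $\beta_3$ is the change in the yearly trend afterwards, so the trend from 1969 onward is $\beta_1 + \beta_3$.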
Next we can view the results in a table using the stargazer package.
When we do this, a few things are apparent. First, before the mound was lowered, run scoring was already rising by ~60 runs per year. Second, lowering the pitching mound for the 1969 season coincided with an immediate jump in run scoring of ~3,100 runs. Last, the yearly trend steepened by a further ~60 runs per year after the mound was lowered, so run scoring has risen by ~120 runs per year since 1969. All of these effects are significantly different from 0 (the pre-1969 trend and the jump at the 1% level, the change in trend at the 5% level). Based on this we can interpret that MLB lowering the pitching mound by 5 inches prior to the 1969 season significantly increased run scoring.
library(stargazer)
##
## Please cite as:
## Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
stargazer(reg_runs,
          type = "text",
          dep.var.labels = "Annual Run Scoring",
          column.labels = "",
          covariate.labels = c("Time", "Mound Lowered", "Time Since Mound Lowering"),
          omit.stat = "all",
          digits = 3,
          out = "runs_lm.html")
##
## =====================================================
##                               Dependent variable:
##                           ---------------------------
##                               Annual Run Scoring
##
## -----------------------------------------------------
## Time                               59.604***
##                                     (9.766)
##
## Mound Lowered                    3,101.930***
##                                    (932.957)
##
## Time Since Mound Lowering          60.117**
##                                    (25.799)
##
## Constant                         7,262.146***
##                                    (556.792)
##
## =====================================================
## =====================================================
## Note:                     *p<0.1; **p<0.05; ***p<0.01
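Because the Time Since Mound Lowering coefficient measures a change in slope, the trends before and after 1969 are read off by combining coefficients. The sketch below shows that arithmetic on simulated data. The `sim` data frame, the seed, and the parameter values used to generate it are made up for illustration and are not the Lahman figures; with the real model, the same `coef()` calls could be applied to `reg_runs`.

```r
# Simulated seasons with a known level shift and slope change (illustration only)
set.seed(1)
sim = data.frame(year = 1871:2022)
sim$time = seq_len(nrow(sim))
sim$treatment = ifelse(sim$year < 1969, 0, 1)
sim$timesince = pmax(0, sim$year - 1968)
sim$R = 7000 + 60 * sim$time + 3000 * sim$treatment +
  60 * sim$timesince + rnorm(nrow(sim), sd = 500)

fit = lm(R ~ time + treatment + timesince, data = sim)
b = coef(fit)

# Trend before the change is the 'time' coefficient alone
pre_slope = unname(b["time"])

# Trend after the change adds the 'timesince' coefficient
post_slope = unname(b["time"] + b["timesince"])

# Gap in 1969 between the fitted value and the projected pre-1969 trend
jump_1969 = unname(b["treatment"] + b["timesince"])

round(c(pre = pre_slope, post = post_slope, jump = jump_1969), 1)
```

The 1969 gap includes the `timesince` coefficient because timesince already equals 1 in the first season after the change.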
Finally, we can use the predict() function to compute the model's fitted run scoring for every year in our data, and plot both the observed and fitted values on a graph.
The mound lowering in 1969 is indicated by the red line, and the increase in run scoring is evident in the graph. Clearly the change worked as planned!
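pred_runs = predict(reg_runs, runs)
# Plotting the data (year converted to numeric so it can be used as the x-axis)
yr = as.numeric(runs$year)
plot(yr, runs$R,
     col = 'black',
     xlab = "Time (Years)",
     ylab = "Total Runs Scored",
     main = 'Annual MLB Run Scoring Total (1871-2022)')
lines(yr, pred_runs, col = "blue", lwd = 3)
# Dashed vertical line at the first season with the lowered mound
abline(v = 1969, col = "red", lty = 5)
# Legend line types and widths match the lines drawn above
legend(1872, 25000, legend = c("Runs Scored Prediction", "Mound Lowered (1969)"),
       col = c("blue", "red"), lty = c(1, 5), lwd = c(3, 1))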