Giancarlo Stanton of the Miami Marlins, who hit 59 home runs in 2017.
Baseball is America’s pastime, and statistics have been an integral part of the game since the 19th century. Over time, the statistical analysis of baseball has expanded to cover nearly every facet of the game. One recent controversy, however, centers on the 2017 season, in which 6,105 home runs were hit, the most of any season in history. Many fans, players, and analysts alike were alarmed by this total, and many came to believe that such an abnormal number of home runs must be caused by a change in some inherent aspect of the game. Pitchers in particular have said that the baseballs themselves feel different than in the past and that they cannot grip them as well, which they argue makes it easier for batters to take advantage of pitches and hit them out of the park. Therefore, our group applies statistical procedures to determine whether this record-setting home run season was simply an anomaly or a symptom of a larger problem within the game.
Within these CSV files, we focused on the columns relevant to our test (year, total home runs, and games played in a season) and used them to compute home runs per game (HR/Game) for each season.
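As a minimal sketch of the HR/Game calculation, the snippet below divides a season's home run total by its games played. The column names `year`, `HR`, and `G` are hypothetical stand-ins for whatever headers the CSV files actually use. It also assumes that "games" counts team games (30 teams × 162 = 4,860 in 2017). That is the convention under which 6,105 home runs works out to the 1.256 HR/Game reported for 2017 in the t-test output.

```r
# Hypothetical one-row stand-in for a season-level CSV file;
# the column names year, HR, and G are assumptions, not the real headers.
seasons <- data.frame(
  year = 2017,
  HR   = 6105,  # total home runs hit in 2017
  G    = 4860   # team games: 30 teams x 162 games (assumed convention)
)

# Home runs per (team) game for each season
seasons$HR_per_game <- seasons$HR / seasons$G

print(round(seasons$HR_per_game, 6))  # prints [1] 1.256173
```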
Lastly, we saved the HR/Game values for 2017 and for the 1920-2016 seasons in separate text files so that R could read them directly.
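A sketch of that export step follows. It assumes one numeric value per line with no header, which is the layout the project's `read.table(...)[,1]` calls expect. The values and the temporary file are illustrative placeholders, not the project's data.

```r
# Illustrative placeholder values, not real HR/Game data
hr_per_game <- c(0.55, 0.80, 1.10)

# Write one value per line, with no header or row names,
# so that read.table() returns a single-column data frame
path <- tempfile(fileext = ".txt")
write.table(hr_per_game, path, row.names = FALSE, col.names = FALSE)

# Read it back the same way the analysis does
v <- read.table(path)
m <- v[, 1]
print(m)
```

Omitting the header and row names keeps the file to a single column, so `v[, 1]` recovers the original vector unchanged.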
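# Historical HR/Game values, 1920-2016 (one value per line)
v0 = read.table('mlb0hrg.txt')
m0 = v0[,1]
# 2017 HR/Game values
v17 = read.table('mlb17hrg.txt')
m17 = v17[,1]
# One-sided, one-sample t-test: is the 2017 mean greater than the historical mean?
t.test(m17, alternative = 'greater', mu = mean(m0), conf.level = 0.95)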
##
## One Sample t-test
##
## data: m17
## t = 15.808, df = 29, p-value = 4.305e-16
## alternative hypothesis: true mean is greater than 0.7630291
## 95 percent confidence interval:
## 1.203168 Inf
## sample estimates:
## mean of x
## 1.256173
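The sketch below recomputes by hand what `t.test()` reports in the call above. It calculates the one-sided, one-sample statistic \(t = (\bar{x} - \mu_0)/(s/\sqrt{n})\) with \(n - 1\) degrees of freedom, together with its upper-tail p-value and one-sided 95% lower confidence bound, and checks them against `t.test()`. The sample `x` and reference mean `mu0` are simulated placeholders, not the project's data. With the real data, `x` would be `m17` and `mu0` would be `mean(m0)`.

```r
set.seed(1)
# Simulated placeholder sample of 30 values (NOT the real 2017 data)
x <- rnorm(30, mean = 1.2, sd = 0.17)
mu0 <- 0.76  # placeholder reference mean

n  <- length(x)
df <- n - 1
se <- sd(x) / sqrt(n)                          # standard error of the mean
t_stat <- (mean(x) - mu0) / se                 # one-sample t statistic
p_value <- pt(t_stat, df, lower.tail = FALSE)  # upper-tail p-value
lower_bound <- mean(x) - qt(0.95, df) * se     # one-sided 95% lower bound

# Compare with R's built-in test
res <- t.test(x, alternative = "greater", mu = mu0, conf.level = 0.95)
print(c(manual = t_stat, builtin = unname(res$statistic)))
print(c(manual = lower_bound, builtin = res$conf.int[1]))
```

The upper confidence limit is `Inf` for a one-sided "greater" test, which is why the output above shows `1.203168 Inf`.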
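# Historical HR/Game values, 1920-2016
v0 = read.table('mlb0hrg.txt')
m0 = v0[,1]
# Season for each historical value (assumes the file lists seasons in order)
year = seq(1920, 2016)
# Simple linear regression of HR/Game on season
model = lm(m0 ~ year)
summary(model)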
##
## Call:
## lm(formula = m0 ~ year)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.241337 -0.100692 0.002098 0.086136 0.244672
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.268e+01 8.727e-01 -14.53 <2e-16 ***
## year 6.829e-03 4.434e-04 15.40 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1223 on 95 degrees of freedom
## Multiple R-squared: 0.7141, Adjusted R-squared: 0.711
## F-statistic: 237.2 on 1 and 95 DF, p-value: < 2.2e-16
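The fitted regression can also be used to compare 2017 against the upward 1920-2016 trend rather than against the flat historical mean. A least-squares line always passes through \((\bar{x}, \bar{y})\), so the trend value for 2017 can be recovered from the printed slope and the historical mean. The numbers below are copied from the printed output and carry its rounding. With the data loaded, `predict(model, newdata = data.frame(year = 2017), interval = "prediction")` should give approximately the same point estimate, along with a prediction interval.

```r
# Values copied from the printed R output (rounded as printed)
slope     <- 6.829e-03  # estimated change in HR/Game per season
mean_hist <- 0.7630291  # mean HR/Game, 1920-2016 (mu in the t-test)
mean_2017 <- 1.256173   # mean HR/Game in 2017 (t-test sample estimate)
years     <- 1920:2016

# An OLS line passes through (mean(year), mean(HR/Game)), so the
# trend value at 2017 is mean_hist + slope * (2017 - mean(years))
trend_2017 <- mean_hist + slope * (2017 - mean(years))

# Gap between the observed 2017 mean and the trend line, in HR/Game
gap <- mean_2017 - trend_2017
print(round(c(trend = trend_2017, gap = gap), 3))
```

With these rounded inputs, the trend value for 2017 is about 1.098 HR/Game, so the observed 1.256 sits roughly 0.159 HR/Game above the trend line. That gap is much smaller than the 0.493 gap to the flat 1920-2016 mean.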
Based on the results of our hypothesis test, the p-value was extremely small (\(4.305 \times 10^{-16}\)), effectively 0. We therefore reject the null hypothesis at the 5% significance level in favor of the alternative hypothesis. We conclude that the true mean HR/Game in 2017 was greater than the 1920-2016 historical average of 0.763. The observed 2017 average of 1.256 HR/Game, and the entire one-sided 95% confidence interval (1.203, \(\infty\)), lie well above that level.
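For reference, the test can be written out with the numbers from the R output. The sample size \(n = 30\) follows from the reported \(\text{df} = 29\). The standard error is not printed, so it is backed out from the reported t statistic, and the resulting lower confidence bound agrees with the printed 1.203168 up to rounding.

```latex
\begin{aligned}
H_0&:\ \mu_{2017} = \mu_0 = 0.763029, \qquad H_a:\ \mu_{2017} > 0.763029,\\
t &= \frac{\bar{x} - \mu_0}{s/\sqrt{n}}
   = \frac{1.256173 - 0.763029}{s/\sqrt{30}} = 15.808,
   \qquad \text{df} = n - 1 = 29,\\
\frac{s}{\sqrt{n}} &= \frac{0.493144}{15.808} \approx 0.03120,\\
\text{lower bound} &= \bar{x} - t_{0.95,\,29}\,\frac{s}{\sqrt{n}}
   \approx 1.256173 - 1.699 \times 0.03120 \approx 1.2032.
\end{aligned}
```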
The results of our hypothesis test can be interpreted in the context of our initial problem statement. Since our test indicates that the mean HR/Game in 2017 was greater than the historical average, we can then consider possible reasons for this drastic change. These include a difference in the baseballs used during the season, or a shift in hitters' strategy toward a more power-oriented approach at the plate.
Our project also had some limitations in data collection. We initially wanted to use an SQL database to test whether each player's mean home run total in 2017 differed significantly from the historical average. However, this would have required considerably more coding, and we were unsure whether it would yield results we could meaningfully interpret.
Another limitation is that our analysis cannot determine the cause of the increase in home runs; identifying it would require a more thorough investigation by MLB.
One key idea we could apply in future testing is incorporating MLB's Statcast data into our analysis of hitting. Statcast was introduced in 2015 and uses radar-based tracking to measure details of each play, such as a batted ball's exit velocity and launch angle, far more precisely than traditional statistics.
Statcast leaderboard for the 2017 season
Example of Statcast applications.