We are again using Sean Lahman’s Baseball Database, focusing on batting statistics for players who had 502 or more plate appearances in a season (the MLB threshold to qualify for the batting title) during the 1962-2019 seasons.
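The construction of batting_names is not shown in this section. As a rough sketch only, the qualifying filter could look like the following, assuming input tables laid out like the Batting and People tables of the Lahman R package. The helper name qualified_batters, its arguments, and the toy rows are all hypothetical, and plate appearances are approximated as AB + BB + HBP + SH + SF.

```r
# Hypothetical helper, not the code that produced batting_names.
# `batting` and `people` are assumed to have the column layout of
# Lahman's Batting and People tables.
qualified_batters <- function(batting, people, years = 1962:2019, min_pa = 502) {
  b <- batting[batting$yearID %in% years, ]
  # Sum counting stats over stints so a traded player gets one row per season.
  cols <- c("AB", "H", "HR", "BB", "HBP", "SH", "SF")
  totals <- aggregate(b[cols],
                      by = list(playerID = b$playerID, yearID = b$yearID),
                      FUN = sum)
  totals$PA <- with(totals, AB + BB + HBP + SH + SF)  # approximate plate appearances
  totals$BA <- totals$H / totals$AB                   # batting average
  keep <- totals[totals$PA >= min_pa, ]
  merge(keep,
        people[c("playerID", "nameFirst", "nameLast", "bats", "throws")],
        by = "playerID")
}

# Invented rows for illustration: player "a" has two stints in one season.
toy_batting <- data.frame(
  playerID = c("a", "a", "b"), yearID = c(2000, 2000, 2000),
  AB = c(300, 250, 400), H = c(90, 75, 100), HR = c(10, 5, 20),
  BB = c(20, 10, 30), HBP = c(1, 1, 2), SH = c(0, 0, 0), SF = c(2, 1, 3)
)
toy_people <- data.frame(
  playerID = c("a", "b"), nameFirst = c("Al", "Bo"), nameLast = c("X", "Y"),
  bats = c("L", "R"), throws = c("R", "R")
)
q <- qualified_batters(toy_batting, toy_people)
print(q)
```

Summing over stints first matters because the 502 threshold applies to a player's full season, not to each team he played for.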
What is the mean batting average? Is the mean batting average different from .300 at the 95% confidence level?
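Mean = 0.2767106. Yes, the mean batting average is different from .300 at the 95% confidence level: the two-sided one-sample t-test gives t = -71.951 with p-value < 2.2e-16, and the 95% confidence interval (0.2760760, 0.2773451) excludes .300.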
mean(batting_names$BA)
## [1] 0.2767106
t.test(batting_names$BA, mu = 0.3, alternative = "two.sided")
##
## One Sample t-test
##
## data: batting_names$BA
## t = -71.951, df = 7161, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0.3
## 95 percent confidence interval:
## 0.2760760 0.2773451
## sample estimates:
## mean of x
## 0.2767106
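To make the reported t, p-value, and confidence interval less of a black box, here is a minimal sketch that recomputes them by hand and compares them with t.test. The vector ba is invented toy data, not the Lahman values, and the variable names are illustrative.

```r
# Toy data only: invented batting averages, not values from the Lahman data.
ba  <- rep(c(0.250, 0.262, 0.271, 0.280, 0.291, 0.305), times = 20)
mu0 <- 0.300                      # hypothesized mean

n      <- length(ba)
se     <- sd(ba) / sqrt(n)        # standard error of the mean
t_stat <- (mean(ba) - mu0) / se   # one-sample t statistic
p_two  <- 2 * pt(-abs(t_stat), df = n - 1)                   # two-sided p-value
ci     <- mean(ba) + c(-1, 1) * qt(0.975, df = n - 1) * se   # 95% confidence interval

# The built-in test should report the same quantities.
res <- t.test(ba, mu = mu0, alternative = "two.sided")
print(c(by_hand = t_stat, t.test = unname(res$statistic)))
```

The hypothesized mean is rejected at the 95% level exactly when it falls outside this interval, which is why the interval and the two-sided p-value always agree.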
What is the mean number of home runs? Is the mean number of home runs greater than 15 at the 95% confidence level?
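Mean = 17.72843. Yes, the mean number of home runs is greater than 15 at the 95% confidence level. The call below uses alternative = "two.sided" although the question is one-sided (alternative = "greater"); the conclusion is the same, because t = 21.099 is positive and the entire two-sided interval (17.47493, 17.98193) lies above 15.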
mean(batting_names$HR)
## [1] 17.72843
t.test(batting_names$HR, mu = 15, alternative = "two.sided")
##
## One Sample t-test
##
## data: batting_names$HR
## t = 21.099, df = 7161, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 15
## 95 percent confidence interval:
## 17.47493 17.98193
## sample estimates:
## mean of x
## 17.72843
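Because the question asks whether the mean is greater than 15, a one-sided test is the more direct fit. The sketch below, on invented toy home run counts rather than the Lahman data, shows how the two forms relate: the t statistic and degrees of freedom are identical, and when t is positive the one-sided p-value is half the two-sided one.

```r
# Toy data only: invented home run totals, not values from the Lahman data.
hr <- rep(c(8, 12, 15, 17, 20, 24), times = 30)

two_sided <- t.test(hr, mu = 15, alternative = "two.sided")
one_sided <- t.test(hr, mu = 15, alternative = "greater")

# Same statistic; only the p-value and the interval change.
print(c(t = unname(one_sided$statistic),
        p_two_sided = two_sided$p.value,
        p_greater   = one_sided$p.value))

# The one-sided interval is a lower bound with an infinite upper limit.
print(one_sided$conf.int)
```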
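Many people argue that left-handed hitters perform better than right-handed hitters. One reason is that batters can see and track the ball better out of the pitcher’s hand when the pitcher’s and batter’s handedness do not match (i.e., left-handed hitter vs. right-handed pitcher). Since more pitchers are right-handed, left-handed hitters are more likely to enjoy this handedness advantage on a per-at-bat basis.
Say we want to evaluate this claim in the case of batting averages. Note that the call below selects its second group with throws=='R' (players who throw right-handed) rather than bats=='R' (right-handed hitters), and uses alternative = "less". Its p-value of 1 therefore says only that there is no evidence that left-handed hitters have a lower mean batting average than the comparison group (0.2801965 vs. 0.2756148). A direct test of the claim would compare bats=='L' with bats=='R' under alternative = "greater".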
t.test(batting_names$BA[which(batting_names$bats=='L')],batting_names$BA[which(batting_names$throws=='R')], alternative = "less")
##
## Welch Two Sample t-test
##
## data: batting_names$BA[which(batting_names$bats == "L")] and batting_names$BA[which(batting_names$throws == "R")]
## t = 6.6512, df = 3898.9, p-value = 1
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf 0.005715039
## sample estimates:
## mean of x mean of y
## 0.2801965 0.2756148
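A direct test of the handedness claim would put left-handed hitters in the first group, right-handed hitters in the second, and ask whether the first mean is greater. The following is a minimal sketch on an invented toy data frame, so its numbers say nothing about the real data. The toy column names mirror batting_names; switch hitters (bats == "B") and missing values are left out of both groups.

```r
# Toy stand-in for batting_names; every value here is invented.
toy <- data.frame(
  bats   = c("L", "L", "L", "L", "R", "R", "R", "R", "R", "B", NA),
  throws = c("R", "L", "R", "L", "R", "R", "R", "L", "R", "R", "R"),
  BA     = c(0.284, 0.291, 0.270, 0.302, 0.268, 0.275, 0.281,
             0.259, 0.290, 0.277, 0.265)
)

# Group on batting hand for both samples; which() drops NA comparisons.
left  <- toy$BA[which(toy$bats == "L")]
right <- toy$BA[which(toy$bats == "R")]

# H1: mean BA of left-handed hitters is greater than that of right-handed hitters.
res <- t.test(left, right, alternative = "greater")
print(res)
```

By default t.test does not assume equal variances, so this is the same Welch test as in the output above, only with the groups and the direction matched to the claim.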
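Many people argue that batting performance was artificially inflated by the use of steroids during the “Steroid Era” (approximately 1994-2005). Mainly, people suggest that steroids helped batters build strength to hit baseballs farther, such that home run totals soared.
Say we want to test this claim about the Steroid Era and home run totals. The one-sided Welch test below compares the 1994-2005 seasons with the seasons from 2005 onward. Note that 2005 falls in both windows as written, and that the upper bound of 2024 has no effect because the data end in 2019. With p-value = 0.01764, the test rejects equal means at the 95% confidence level in favor of a higher Steroid Era mean (20.73477 vs. 19.94404 home runs per qualifying season).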
steroidHR <- batting_names$HR[which(batting_names$yearID >= 1994 & batting_names$yearID <= 2005)]
modernHR <- batting_names$HR[which(batting_names$yearID >= 2005 & batting_names$yearID <= 2024)]
t.test(steroidHR, modernHR, alternative = 'greater')
##
## Welch Two Sample t-test
##
## data: steroidHR and modernHR
## t = 2.1061, df = 3130, p-value = 0.01764
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 0.172979 Inf
## sample estimates:
## mean of x mean of y
## 20.73477 19.94404
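Since both windows above include 2005, those seasons are counted in both samples. A minimal sketch of a non-overlapping split is shown here on invented toy rows, not the Lahman data; the cut points (Steroid Era through 2005, later era from 2006) are one possible choice and an assumption of this example.

```r
# Toy stand-in for batting_names; every value here is invented.
toy <- data.frame(
  yearID = c(1994, 1998, 2001, 2005, 2006, 2010, 2015, 2019),
  HR     = c(22, 31, 27, 19, 18, 24, 16, 21)
)

# Assumed windows: 1994-2005 vs. 2006-2019, so no season lands in both.
steroid_rows <- which(toy$yearID >= 1994 & toy$yearID <= 2005)
modern_rows  <- which(toy$yearID >  2005 & toy$yearID <= 2019)

# Sanity check: the two groups share no rows.
stopifnot(length(intersect(steroid_rows, modern_rows)) == 0)

# H1: mean home runs in the Steroid Era is greater than in the later era.
res <- t.test(toy$HR[steroid_rows], toy$HR[modern_rows], alternative = "greater")
print(res)
```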