Shane Hylton
12/9/2021
## [1] "The mean batting average in 2021 is: 0.237"
## [1] "The median batting average in 2021 is: 0.243"
## [1] "Designated Hitter"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##  0.2100  0.2555  0.2640  0.2605  0.2740  0.2860
## [1] "Infield"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##  0.1410  0.2245  0.2520  0.2493  0.2710  0.3420
## [1] "Outfield"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##  0.1600  0.2218  0.2475  0.2447  0.2645  0.3380
## [1] "Catcher"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##  0.1430  0.2020  0.2320  0.2256  0.2470  0.3040
##   Position    med     iqr
## 1       dh  0.264  0.0185
## 2      inf  0.252  0.0465
## 3       of 0.2475 0.04275
## 4        c  0.232   0.045
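The `iqr` column can be checked against the quartiles printed in the position summaries, since the interquartile range is the third quartile minus the first. The following is a minimal Python sketch, not the code used in the original analysis (which appears to have been run in R). The dictionary copies the printed quartiles, and the variable names are illustrative.

```python
# First and third quartiles copied from the per-position summaries above.
quartiles = {
    "dh": (0.2555, 0.2740),
    "inf": (0.2245, 0.2710),
    "of": (0.2218, 0.2645),
    "c": (0.2020, 0.2470),
}

# IQR = Q3 - Q1; a smaller value means a tighter middle 50% of players.
iqr = {pos: round(q3 - q1, 4) for pos, (q1, q3) in quartiles.items()}

for pos, value in iqr.items():
    print(f"{pos}: {value}")
```

Because the printed quartiles are rounded to four decimal places, the outfield value computed this way can differ in the last digit from the 0.04275 shown in the table. The designated hitter group has by far the narrowest spread, which matters when a player is chosen with no other information.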
The goal of this analysis is to determine which position is the best one to rely on for a hit on a team of nine randomly selected players: three outfielders, one designated hitter, one catcher, and four infielders.
I hypothesize that the designated hitter will prove to be the best choice when a hit is needed and nothing else is known about the players.
## [1] "OF"
The simulation supports the hypothesis that the designated hitter is the best position to choose on any given team to maximize the chance of recording a hit.
The second iteration, which removes the advantage of having four infielders, shows that a randomly selected infielder does not provide the highest probability of success.
The first simulation would be valid if the manager were allowed to know the players’ batting averages after they are assigned to the team. That may be closer to reality, but it falls outside the assumptions of this experiment.
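The report does not show the simulation code, so the following Python sketch is only one plausible reading of the two iterations described above: the first builds a full nine-player roster and records which position supplies the best hitter, and the second draws one player per position so that roster counts no longer matter. The function names, the `pools` structure, and the placeholder averages are all assumptions for illustration. They are not the 2021 data.

```python
import random
from collections import Counter

# Roster shape described in the text: 3 OF, 1 DH, 1 C, 4 INF.
ROSTER = {"of": 3, "dh": 1, "c": 1, "inf": 4}


def best_position(pools, slots, rng):
    """Draw slots[pos] averages per position (with replacement) and
    return the position of the single highest average on the team."""
    team = [
        (rng.choice(pools[pos]), pos)
        for pos, n in slots.items()
        for _ in range(n)
    ]
    return max(team)[1]


def simulate(pools, slots, trials=10_000, seed=0):
    """Count how often each position supplies the team's best hitter."""
    rng = random.Random(seed)
    return Counter(best_position(pools, slots, rng) for _ in range(trials))


# Placeholder pools (NOT the real data): replace with actual averages.
pools = {
    "dh": [0.255, 0.264, 0.274],
    "inf": [0.224, 0.252, 0.271],
    "of": [0.222, 0.248, 0.265],
    "c": [0.202, 0.232, 0.247],
}

full_roster = simulate(pools, ROSTER)                   # first iteration
one_each = simulate(pools, {pos: 1 for pos in ROSTER})  # second iteration
print(full_roster)
print(one_each)
```

Under this reading, the first iteration favors infielders simply because four draws give more chances at a high average than one, and it relies on knowing each player's average after the roster is set. Drawing one player per position removes that head-count advantage.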
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
##
## Call:
## lm(formula = AVG ~ Age, data = mlb)
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.103573 -0.024422  0.003395  0.023475  0.097522
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  2.454e-01  1.456e-02  16.854   <2e-16 ***
## Age         -3.178e-05  5.064e-04  -0.063     0.95
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03501 on 390 degrees of freedom
## Multiple R-squared: 1.01e-05, Adjusted R-squared: -0.002554
## F-statistic: 0.003938 on 1 and 390 DF, p-value: 0.95
## [1] "Predicted Batting Average for a given Age 24 : 0.246"
## [1] 0.252
## [1] "Actual Sample Batting Average for a given Age 24 : 0.252"
## [1] "Residual: -0.00584"
The equation for the regression line is \(AVG = 0.2454 - 0.00003178*Age\).
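As a worked check, the fitted line can be evaluated at age 24 using the coefficients exactly as printed in the `lm()` summary for this model (intercept 2.454e-01, slope -3.178e-05). This is a small Python sketch with illustrative names. It uses the rounded coefficients, so the last digits are approximate, and it takes the residual in the conventional direction, observed minus predicted.

```python
# Coefficients as printed in the lm() summary for AVG ~ Age.
INTERCEPT = 0.2454
SLOPE = -3.178e-05  # negative: the fitted line falls very slightly with age


def predict_avg(age):
    """Fitted batting average at a given age."""
    return INTERCEPT + SLOPE * age


observed = 0.252  # sample batting average at age 24, from the output above
predicted = predict_avg(24)
residual = observed - predicted  # conventional: observed minus fitted

print(round(predicted, 4), round(residual, 4))
```

The printed output reports a prediction of 0.246 and a residual of -0.00584. Those two figures appear consistent with adding the slope term instead of subtracting it and with taking predicted minus observed, so they differ slightly from the values computed here. Either way, the slope is so small that the prediction is essentially the intercept.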
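Overall, age has no measurable effect on batting average for players with more than 125 at-bats: the slope is not significantly different from zero (p = 0.95), and age explains essentially none of the variance (R-squared of about 0.00001).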
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
##
## Call:
## lm(formula = AVG ~ Age, data = mlb_50)
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.158109 -0.026752  0.006391  0.027998  0.118185
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.2245924  0.0143195  15.684   <2e-16 ***
## Age         0.0004265  0.0004973   0.858    0.391
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04212 on 554 degrees of freedom
## Multiple R-squared: 0.001326, Adjusted R-squared: -0.0004764
## F-statistic: 0.7357 on 1 and 554 DF, p-value: 0.3914
## [1] "Predicted Batting Average for a given Age 24 : 0.235"
## [1] 0.259
## [1] "Actual Sample Batting Average for a given Age 24 : 0.259"
## [1] "Residual: -0.02417"
The equation for the regression line is \(AVG = 0.2245924 + 0.0004265*Age\).
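Overall, the fitted slope for players with 50 or more at-bats is positive but very small (about 0.0004 per year of age) and not statistically significant (p = 0.39), so this model also provides no evidence that age affects batting average.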
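Whether either slope is distinguishable from zero can be checked from the estimates and standard errors printed in the two `lm()` summaries. The Python sketch below recomputes each t value as estimate divided by standard error and forms an approximate 95% interval. The 1.96 multiplier is a normal approximation to the t critical value (an assumption that is reasonable with several hundred degrees of freedom), and the labels and variable names are illustrative.

```python
# (estimate, standard error) for the Age slope, from the two summaries.
slopes = {
    "more than 125 at-bats": (-3.178e-05, 5.064e-04),
    "50 or more at-bats": (4.265e-04, 4.973e-04),
}

Z = 1.96  # normal approximation to the t critical value (assumption)

results = {}
for label, (est, se) in slopes.items():
    t = est / se                          # t statistic for H0: slope = 0
    low, high = est - Z * se, est + Z * se  # approximate 95% interval
    results[label] = (t, low, high)
    print(f"{label}: t = {t:.3f}, 95% CI = ({low:.5f}, {high:.5f})")
```

Both intervals contain zero, so neither sample gives evidence of a linear relationship between age and batting average. Lowering the at-bat threshold changes the sign of the estimate but not that conclusion.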
“2021 Major League Baseball Standard Batting.” Baseball-Reference.com, https://www.baseball-reference.com/leagues/majors/2021-standard-batting.shtml.
“2021 MLB Player Stats.” RotoWire, https://www.rotowire.com/baseball/stats.php.