MLB Batting Analysis for 2021

Shane Hylton

12/9/2021

Goals

Summary Histograms

## [1] "The mean batting average in 2021 is:  0.237"
## [1] "The median batting average in 2021 is:  0.243"

Averages By Position

## [1] "Designated Hitter"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.2100  0.2555  0.2640  0.2605  0.2740  0.2860
## [1] "Infield"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1410  0.2245  0.2520  0.2493  0.2710  0.3420
## [1] "Outfield"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1600  0.2218  0.2475  0.2447  0.2645  0.3380
## [1] "Catcher"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1430  0.2020  0.2320  0.2256  0.2470  0.3040
##   Position    med     iqr
## 1       dh  0.264  0.0185
## 2      inf  0.252  0.0465
## 3       of 0.2475 0.04275
## 4        c  0.232   0.045
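
A table like the one above can be produced with a grouped summary. The following is a minimal sketch in base R, not the code used for this report: the `players` data frame and its ten values are made up for illustration, and only the column names `Position`, `med`, and `iqr` are taken from the output above.

```r
# Hypothetical toy data (not the 2021 data set): position group and AVG.
players <- data.frame(
  Position = c("dh", "dh", "inf", "inf", "inf", "of", "of", "of", "c", "c"),
  AVG      = c(0.264, 0.255, 0.252, 0.224, 0.271, 0.248, 0.222, 0.265, 0.232, 0.202),
  stringsAsFactors = FALSE
)

# Median and interquartile range of AVG within each position group.
by_position <- aggregate(AVG ~ Position, data = players,
                         FUN = function(x) c(med = median(x), iqr = IQR(x)))

# aggregate() stores the two statistics in one matrix column; flatten it
# into ordinary columns and give them the names used in the table above.
by_position <- do.call(data.frame, by_position)
names(by_position) <- c("Position", "med", "iqr")

# Sort from highest to lowest median.
by_position <- by_position[order(-by_position$med), ]
print(by_position)
```

Reporting the median with the IQR, rather than the mean alone, keeps a few extreme averages from dominating the comparison between groups of very different sizes.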

Distribution of Averages by Position

Research Question

The goal of this analysis is to determine which position is the best one to rely on for a hit on a team of nine randomly selected players: three outfielders, one designated hitter, one catcher, and four infielders.

Hypothesis

I hypothesize that the designated hitter will prove to be the player to select when a hit is needed and no other information is known.

Example Team

## [1] "OF"
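
One way to draw such an example team is sketched below. This is an illustration under stated assumptions, not the code behind the output above. The player pool is synthetic: 40 players per group, with averages drawn from a normal distribution centered on the group means printed earlier, and an assumed standard deviation of 0.03. The helper `draw_team` is a hypothetical name.

```r
set.seed(2021)  # arbitrary seed so the draw is repeatable

# Synthetic player pool (an assumption, not the 2021 data).
group_means <- c(DH = 0.2605, INF = 0.2493, OF = 0.2447, C = 0.2256)
pool <- do.call(rbind, lapply(names(group_means), function(g) {
  data.frame(Position = g,
             AVG = rnorm(40, mean = group_means[[g]], sd = 0.03),
             stringsAsFactors = FALSE)
}))

# Roster shape from the research question: 3 OF, 1 DH, 1 C, 4 INF.
roster <- c(OF = 3, DH = 1, C = 1, INF = 4)

# Draw the required number of players from each group without replacement.
draw_team <- function(pool, roster) {
  do.call(rbind, lapply(names(roster), function(g) {
    candidates <- pool[pool$Position == g, ]
    candidates[sample(nrow(candidates), roster[[g]]), ]
  }))
}

team <- draw_team(pool, roster)

# Position group of the team's best hitter by AVG.
best_position <- team$Position[which.max(team$AVG)]
print(team)
print(best_position)
```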

Simulation of 1000 Random Teams

Adjusting the Team Selection

Conclusions

The simulation supports the hypothesis: on a randomly assembled team, the designated hitter is the position that gives the team the best chance of recording a hit.

The second iteration, which removed the advantage of having four infielders on the team, shows that selecting an infielder at random does not give the highest probability of success.

The first simulation would be valid only if the manager could see the players’ batting averages after they were assigned to the team. That may be closer to reality, but it falls outside this experiment, in which no other information about the players is known.
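
The two simulations can be sketched as follows. This is one plausible reading of the procedure, not the code used for this report. It assumes the first run tallies which position group holds the best hitter on each nine-player team, and that the adjusted run draws one player per group so that infielders no longer get four chances. The player pool is synthetic (40 players per group, normal with the printed group means and an assumed standard deviation of 0.03), and `best_position` and `tally` are hypothetical helper names.

```r
set.seed(42)  # arbitrary seed for repeatability

# Synthetic player pool (an assumption, not the 2021 data).
group_means <- c(DH = 0.2605, INF = 0.2493, OF = 0.2447, C = 0.2256)
pool <- do.call(rbind, lapply(names(group_means), function(g) {
  data.frame(Position = g,
             AVG = rnorm(40, mean = group_means[[g]], sd = 0.03),
             stringsAsFactors = FALSE)
}))

# Draw one team with the given roster shape and return the position group
# of the player with the highest AVG.
best_position <- function(pool, roster) {
  team <- do.call(rbind, lapply(names(roster), function(g) {
    candidates <- pool[pool$Position == g, ]
    candidates[sample(nrow(candidates), roster[[g]]), ]
  }))
  team$Position[which.max(team$AVG)]
}

# Share of simulated teams on which each position group has the best hitter.
tally <- function(pool, roster, n_teams = 1000) {
  winners <- replicate(n_teams, best_position(pool, roster))
  table(factor(winners, levels = names(roster))) / n_teams
}

full_roster  <- c(OF = 3, DH = 1, C = 1, INF = 4)  # nine-player team
equal_roster <- c(OF = 1, DH = 1, C = 1, INF = 1)  # one player per group

share_full  <- tally(pool, full_roster)
share_equal <- tally(pool, equal_roster)
print(share_full)
print(share_equal)
```

Comparing `share_full` with `share_equal` separates two effects: a group can win often because its players hit better, or simply because it fills more roster spots.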

Is there a relationship between age and batting average?

Regression Analysis: Greater Than or Equal to 125 At-Bats

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'

## 
## Call:
## lm(formula = AVG ~ Age, data = mlb)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.103573 -0.024422  0.003395  0.023475  0.097522 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.454e-01  1.456e-02  16.854   <2e-16 ***
## Age         -3.178e-05  5.064e-04  -0.063     0.95    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03501 on 390 degrees of freedom
## Multiple R-squared:  1.01e-05,   Adjusted R-squared:  -0.002554 
## F-statistic: 0.003938 on 1 and 390 DF,  p-value: 0.95
## [1] "Predicted Batting Average for a given Age  24  :  0.246"
## [1] 0.252
## [1] "Actual Sample Batting Average for a given Age  24  :  0.252"
## [1] "Residual:  -0.00584"
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'

The equation for the regression line is \(AVG = 0.2454 - 0.00003178*Age\).
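
As a check on the arithmetic, the fitted value at age 24 can be recomputed from the printed coefficients. The sketch below uses only numbers that appear in the summary output and defines the residual in the conventional way, as observed minus fitted. The variable and function names are illustrative. The values printed in the output above (0.246 and -0.00584) appear to correspond to a positive slope and to predicted minus observed, so they differ slightly from this calculation.

```r
# Coefficients copied from the printed summary of lm(AVG ~ Age, data = mlb).
intercept <- 2.454e-01
slope     <- -3.178e-05

# Fitted batting average at a given age.
predict_avg <- function(age) intercept + slope * age

age        <- 24
observed   <- 0.252                  # sample average at age 24, from the output
fitted_avg <- predict_avg(age)
resid_avg  <- observed - fitted_avg  # conventional sign: observed minus fitted

cat(sprintf("fitted = %.4f, residual = %.4f\n", fitted_avg, resid_avg))
```

Because the printed intercept is rounded to four significant figures, the result is only accurate to about the fourth decimal place.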

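Overall, age has no measurable effect on batting average for players with at least 125 at-bats: the slope is not statistically distinguishable from zero (p = 0.95), and the model explains essentially none of the variance (R-squared of about 0.00001).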

Greater Than or Equal to 50 At-Bats

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'

Regression Analysis

## 
## Call:
## lm(formula = AVG ~ Age, data = mlb_50)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.158109 -0.026752  0.006391  0.027998  0.118185 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.2245924  0.0143195  15.684   <2e-16 ***
## Age         0.0004265  0.0004973   0.858    0.391    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04212 on 554 degrees of freedom
## Multiple R-squared:  0.001326,   Adjusted R-squared:  -0.0004764 
## F-statistic: 0.7357 on 1 and 554 DF,  p-value: 0.3914
## [1] "Predicted Batting Average for a given Age  24  :  0.235"
## [1] 0.259
## [1] "Actual Sample Batting Average for a given Age  24  :  0.259"
## [1] "Residual:  -0.02417"
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'

The equation for the regression line is \(AVG = 0.2245924 + 0.0004265*Age\).

Overall, the fitted slope for players with 50 or more at-bats is slightly positive, but it is not statistically significant (p = 0.391), so the data do not show that batting average increases with age.
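
A confidence interval makes the size of the age effect easier to judge than the p-value alone. The sketch below builds 95% intervals for both slopes from the estimates, standard errors, and residual degrees of freedom printed in the two summaries. The `fits` data frame and its column names are illustrative.

```r
# Values copied from the two printed regression summaries.
fits <- data.frame(
  model    = c("AB >= 125", "AB >= 50"),
  estimate = c(-3.178e-05, 0.0004265),
  se       = c(5.064e-04, 0.0004973),
  df       = c(390, 554),
  stringsAsFactors = FALSE
)

# Two-sided 95% interval: estimate +/- t quantile * standard error.
t_crit     <- qt(0.975, df = fits$df)
fits$lower <- fits$estimate - t_crit * fits$se
fits$upper <- fits$estimate + t_crit * fits$se

# TRUE when a slope of zero is inside the interval.
fits$covers_zero <- fits$lower < 0 & fits$upper > 0

print(fits)
```

Both intervals include zero. Even at the point estimate for the 50 at-bat model, ten additional years of age correspond to a change of only about 0.004 in batting average.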

References

“2021 Major League Baseball Standard Batting.” Baseball-Reference.com, https://www.baseball-reference.com/leagues/majors/2021-standard-batting.shtml.

“2021 MLB Player Stats.” RotoWire, https://www.rotowire.com/baseball/stats.php.