The output shows that sprint speed has the strongest correlation with Outs Above Average (0.39), and that average arm velocity (0.25) correlates more strongly than max arm velocity (0.21).
This partly answers the central question: average sprint speed offers some predictive signal for shortstop defensive success, but the correlation is too weak to rely on with confidence.
Next, we combine these variables in a linear regression model, hoping to produce a formula that correlates more strongly with Outs Above Average than any of the three single variables already tested.
# Fit a linear regression model to find the formula that best correlates with Outs Above Average, first using max arm velocity
model <- lm(oaa ~ sprint_speed + max_arm_velo, data=data)
summary(model)
##
## Call:
## lm(formula = oaa ~ sprint_speed + max_arm_velo, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.664 -4.744 0.544 4.961 16.078
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -96.0821 40.8690 -2.351 0.0248 *
## sprint_speed 2.8031 1.3285 2.110 0.0425 *
## max_arm_velo 0.2270 0.3584 0.633 0.5310
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.183 on 33 degrees of freedom
## Multiple R-squared: 0.159, Adjusted R-squared: 0.108
## F-statistic: 3.119 on 2 and 33 DF, p-value: 0.05746
# Second using average arm
model <- lm(oaa ~ sprint_speed + avg_arm_velo, data=data)
summary(model)
##
## Call:
## lm(formula = oaa ~ sprint_speed + avg_arm_velo, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.6958 -4.7242 0.0126 5.5391 15.8164
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -111.3275 42.8405 -2.599 0.0139 *
## sprint_speed 2.7936 1.2711 2.198 0.0351 *
## avg_arm_velo 0.4251 0.3750 1.134 0.2650
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.077 on 33 degrees of freedom
## Multiple R-squared: 0.1807, Adjusted R-squared: 0.131
## F-statistic: 3.638 on 2 and 33 DF, p-value: 0.03733
# Third using all three variables
model <- lm(oaa ~ sprint_speed + max_arm_velo + avg_arm_velo, data=data)
summary(model)
##
## Call:
## lm(formula = oaa ~ sprint_speed + max_arm_velo + avg_arm_velo,
## data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.9352 -4.6272 -0.7454 5.9649 15.6115
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -114.3653 43.2630 -2.643 0.0126 *
## sprint_speed 3.1209 1.3445 2.321 0.0268 *
## max_arm_velo -0.6072 0.7720 -0.787 0.4373
## avg_arm_velo 0.9962 0.8182 1.218 0.2323
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.124 on 32 degrees of freedom
## Multiple R-squared: 0.1962, Adjusted R-squared: 0.1209
## F-statistic: 2.604 on 3 and 32 DF, p-value: 0.06893
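The three fits above can also be compared side by side rather than by reading each summary in turn. The following is a minimal sketch under stated assumptions: `sim` is a synthetic stand-in for the shortstop data (only the column names come from this report), and `fits` and `comparison` are hypothetical names introduced for illustration. Adjusted R-squared and AIC penalize extra predictors, so they are a fairer basis for comparison than multiple R-squared alone.

```r
# Synthetic stand-in data (an assumption for illustration; not the real 2025 data)
set.seed(1)
n <- 36
sim <- data.frame(
  sprint_speed = rnorm(n, mean = 28, sd = 1),
  avg_arm_velo = rnorm(n, mean = 85, sd = 3)
)
sim$max_arm_velo <- sim$avg_arm_velo + abs(rnorm(n, mean = 5, sd = 1.5))
sim$oaa <- -110 + 2.8 * sim$sprint_speed + 0.4 * sim$avg_arm_velo +
  rnorm(n, sd = 8)

# Keep each fit under its own name so no model overwrites another
fits <- list(
  sprint_max = lm(oaa ~ sprint_speed + max_arm_velo, data = sim),
  sprint_avg = lm(oaa ~ sprint_speed + avg_arm_velo, data = sim),
  all_three  = lm(oaa ~ sprint_speed + max_arm_velo + avg_arm_velo, data = sim)
)

# Collect the fit statistics into one table
comparison <- data.frame(
  r_squared     = sapply(fits, function(m) summary(m)$r.squared),
  adj_r_squared = sapply(fits, function(m) summary(m)$adj.r.squared),
  aic           = sapply(fits, AIC),
  row.names     = names(fits)
)
print(comparison)

# F-test: does adding max_arm_velo improve on sprint_speed + avg_arm_velo?
print(anova(fits$sprint_avg, fits$all_three))
```

With the real data, the same table would be produced by replacing `sim` with `data`.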
# Use the regression coefficients as weights to build a single score, using only max arm velocity
# (the intercept is omitted; it does not affect the correlation)
data$athletic_score_max <- 2.8031 * data$sprint_speed +
  0.2270 * data$max_arm_velo
cor.test(data$athletic_score_max, data$oaa)
##
## Pearson's product-moment correlation
##
## data: data$athletic_score_max and data$oaa
## t = 2.5351, df = 34, p-value = 0.01601
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.08075342 0.64301751
## sample estimates:
## cor
## 0.3987112
# Use the regression coefficients as weights to build a single score, using only average arm velocity
data$athletic_score_avg <- 2.7936 * data$sprint_speed +
  0.4251 * data$avg_arm_velo
cor.test(data$athletic_score_avg, data$oaa)
##
## Pearson's product-moment correlation
##
## data: data$athletic_score_avg and data$oaa
## t = 2.7381, df = 34, p-value = 0.009762
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1121822 0.6612481
## sample estimates:
## cor
## 0.4250522
# Use the regression coefficients as weights to build a single score, using all three variables
data$athletic_score <- 3.1209 * data$sprint_speed +
  0.9962 * data$avg_arm_velo -
  0.6072 * data$max_arm_velo
cor.test(data$athletic_score, data$oaa)
##
## Pearson's product-moment correlation
##
## data: data$athletic_score and data$oaa
## t = 2.8809, df = 34, p-value = 0.00682
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1339051 0.6734808
## sample estimates:
## cor
## 0.4429542
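Each correlation above is the square root of the corresponding model's multiple R-squared (for example, the square root of 0.1962 is about 0.443), which holds for any least-squares fit with an intercept. The sketch below illustrates this and shows how `fitted()` can build the score without retyping rounded coefficients. It is a sketch under stated assumptions: `sim` is synthetic stand-in data, not the real shortstop data, and `fit`, `r_score`, and `r_from_fit` are hypothetical names.

```r
# Synthetic stand-in data (an assumption for illustration; not the real 2025 data)
set.seed(1)
n <- 36
sim <- data.frame(
  sprint_speed = rnorm(n, mean = 28, sd = 1),
  avg_arm_velo = rnorm(n, mean = 85, sd = 3)
)
sim$max_arm_velo <- sim$avg_arm_velo + abs(rnorm(n, mean = 5, sd = 1.5))
sim$oaa <- -110 + 2.8 * sim$sprint_speed + 0.4 * sim$avg_arm_velo +
  rnorm(n, sd = 8)

fit <- lm(oaa ~ sprint_speed + max_arm_velo + avg_arm_velo, data = sim)

# fitted() applies the full-precision coefficients, intercept included
sim$athletic_score <- fitted(fit)

# Correlation between the score and the outcome
r_score <- cor(sim$athletic_score, sim$oaa)

# Square root of the model's multiple R-squared
r_from_fit <- sqrt(summary(fit)$r.squared)

print(c(r_score = r_score, sqrt_r_squared = r_from_fit))
```

Because the two quantities coincide, the `cor.test()` calls restate the models' R-squared values on the correlation scale and do not provide independent evidence.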
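The regression results show that the model combining average sprint speed and max throwing velocity (r = 0.399) is only slightly more correlated with Outs Above Average than average sprint speed alone (r = 0.39).
The model combining average sprint speed and average throwing velocity (r = 0.425) is more correlated than both sprint speed on its own and the previous model using max throwing velocity.
The final model, which uses all three variables, has the highest correlation with Outs Above Average (r = 0.443), but it still explains only about 20% of the variance (multiple R-squared 0.1962), so it predicts a player's defensive performance with modest confidence at best. Adding a predictor can never lower in-sample R-squared, so this gain should be read cautiously: the adjusted R-squared of the three-variable model (0.1209) is lower than that of the sprint speed and average arm velocity model (0.131).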
Given the small sample, drawn from the 2025 season alone, and a final correlation still below 0.5, I would not recommend using this formula to predict real-world performance. However, I do believe it can serve as a basic guide for positional assignment when choosing a starting shortstop. If two shortstops are competing for the starting job, the one with the more consistently strong arm and more consistent speed is more likely to be the better fielder in the long run.
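One way to gauge how such a formula might behave on players it has not seen is leave-one-out cross-validation: refit the model with each player held out, then predict that player. The following is a minimal sketch, not a result from this analysis. It assumes synthetic stand-in data (`sim`), and `loo_pred`, `loo_rmse`, and `in_sample_rmse` are hypothetical names.

```r
# Synthetic stand-in data (an assumption for illustration; not the real 2025 data)
set.seed(1)
n <- 36
sim <- data.frame(
  sprint_speed = rnorm(n, mean = 28, sd = 1),
  avg_arm_velo = rnorm(n, mean = 85, sd = 3)
)
sim$oaa <- -110 + 2.8 * sim$sprint_speed + 0.4 * sim$avg_arm_velo +
  rnorm(n, sd = 8)

# Predict each player from a model fit on all the other players
loo_pred <- vapply(seq_len(nrow(sim)), function(i) {
  fit_i <- lm(oaa ~ sprint_speed + avg_arm_velo, data = sim[-i, ])
  unname(predict(fit_i, newdata = sim[i, , drop = FALSE]))
}, numeric(1))

# In-sample error from a single fit on all players, for comparison
full_fit <- lm(oaa ~ sprint_speed + avg_arm_velo, data = sim)
in_sample_rmse <- sqrt(mean(residuals(full_fit)^2))

# Held-out error and correlation
loo_rmse <- sqrt(mean((sim$oaa - loo_pred)^2))
loo_r <- cor(loo_pred, sim$oaa)

print(c(in_sample_rmse = in_sample_rmse, loo_rmse = loo_rmse, loo_r = loo_r))
```

The held-out error is never smaller than the in-sample error for a least-squares fit, and with only a few dozen players the gap can be substantial.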