Multiple Linear Regression
Data come from the 2020 baseball season, courtesy of Sports Reference.
Most baseball fans will tell you that runs against is a more important metric than runs for, because good pitching is more reliable than good hitting.
Further, most fans will tell you that home runs are the best runs, because that's how you score against good pitching.
Let's see what multiple regression shows when 2020 wins are modeled on runs for, runs against, home runs for (HR For), and home runs against (HR Against).
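Written out, the model fitted below has the following form. Here i indexes the 30 teams (the summary's 25 residual degrees of freedom plus 5 estimated coefficients), ε_i is the error term, and each β_j is the expected change in wins for a one-unit change in that predictor with the other three held fixed.

```latex
\text{wins}_i = \beta_0 + \beta_1\,\text{runs\_against}_i + \beta_2\,\text{runs\_for}_i + \beta_3\,\text{hr\_for}_i + \beta_4\,\text{hr\_against}_i + \varepsilon_i, \qquad i = 1, \dots, 30
```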
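# 2020 team totals from Sports Reference, saved as a local CSV
team_data <- read.csv(file = "C:\\Users\\arono\\source\\R\\Data605\\teams.csv", header = TRUE)
runs_for     <- team_data$R           # runs scored
runs_against <- team_data$RGAgainst   # runs against (column name suggests runs allowed per game)
wins         <- team_data$W           # wins
hr_for       <- team_data$HR          # home runs hit
hr_against   <- team_data$HRAgainst   # home runs allowed
# Multiple regression of wins on all four predictors
bb.lm <- lm(wins ~ runs_against + runs_for + hr_for + hr_against)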
library(car)     # provides avPlots()
avPlots(bb.lm)   # added-variable (partial regression) plot for each predictor
summary(bb.lm)
##
## Call:
## lm(formula = wins ~ runs_against + runs_for + hr_for + hr_against)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -10.8634  -1.5802   0.4777   1.7363  11.5255
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)   95.59739   15.90605   6.010 2.81e-06 ***
## runs_against -10.30271    2.90606  -3.545  0.00158 **
## runs_for       0.05008    0.01948   2.570  0.01652 *
## hr_for         0.12044    0.04508   2.672  0.01308 *
## hr_against    -0.14414    0.05337  -2.701  0.01224 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.317 on 25 degrees of freedom
## Multiple R-squared: 0.9233, Adjusted R-squared: 0.911
## F-statistic: 75.2 on 4 and 25 DF, p-value: 1.448e-13
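The headline numbers in this summary are tied together by simple formulas. As a sanity check, here is a small sketch that recomputes a few of them from the printed values, assuming n = 30 teams and p = 4 predictors; the variable names are illustrative only.

```r
# Values copied from the summary(bb.lm) output above
n  <- 30       # observations (teams): 25 residual df + 5 coefficients
p  <- 4        # predictors, excluding the intercept
r2 <- 0.9233   # Multiple R-squared as printed (rounded)

# t value = Estimate / Std. Error (runs_against row)
t_runs_against <- -10.30271 / 2.90606

# Two-sided p-value for that t statistic on n - p - 1 residual df
p_runs_against <- 2 * pt(-abs(t_runs_against), df = n - p - 1)

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Overall F statistic: explained vs. unexplained variance per degree of freedom
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))

round(c(t = t_runs_against, p = p_runs_against, adj_r2 = adj_r2, F = f_stat), 4)
```

The results should match the printed t value (-3.545), its p-value (0.00158), and the adjusted R-squared (0.911), and, up to the rounding of R-squared, the F-statistic (75.2).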
Let's look at the correlations individually by fitting a simple regression of wins on each predictor and comparing the R-squared values.
runs_for.lm <- lm(wins~runs_for)
runs_against.lm <- lm(wins~runs_against)
hr_for.lm <- lm(wins~hr_for)
hr_against.lm <- lm(wins~hr_against)
sum_runs_for.lm<-summary(runs_for.lm)
sum_runs_against.lm<-summary(runs_against.lm)
sum_hr_for.lm<-summary(hr_for.lm)
sum_hr_against.lm<-summary(hr_against.lm)
sprintf("Runs For R Squared : %.2f", sum_runs_for.lm$r.squared )
## [1] "Runs For R Squared : 0.59"
sprintf("Runs Against R Squared : %.2f", sum_runs_against.lm$r.squared )
## [1] "Runs Against R Squared : 0.78"
sprintf("HR For R Squared : %.2f", sum_hr_for.lm$r.squared )
## [1] "HR For R Squared : 0.50"
sprintf("HR Against R Squared : %.2f", sum_hr_against.lm$r.squared )
## [1] "HR Against R Squared : 0.55"
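For a simple regression with an intercept, R-squared is just the squared Pearson correlation between the predictor and the response, so the four values above could also be obtained with `cor()`. A minimal sketch on simulated data (teams.csv is not reproduced here, so `x`, `y`, and the `pred_a`/`pred_b` columns are hypothetical stand-ins):

```r
# Simulated stand-ins: x plays the role of a predictor, y the role of wins
set.seed(2020)
x <- rnorm(30, mean = 300, sd = 40)
y <- 30 + 0.1 * x + rnorm(30, sd = 4)

# R-squared from the fitted model vs. squared correlation
fit    <- lm(y ~ x)
r2_lm  <- summary(fit)$r.squared
r2_cor <- cor(x, y)^2
all.equal(r2_lm, r2_cor)   # TRUE up to floating-point error

# The same shortcut for several predictors at once (column names hypothetical)
predictors <- data.frame(pred_a = x, pred_b = x + rnorm(30, sd = 20))
r2_each <- sapply(predictors, function(col) cor(col, y)^2)
r2_each
```

With the variables defined earlier, the equivalent call would be `sapply(list(runs_for = runs_for, runs_against = runs_against, hr_for = hr_for, hr_against = hr_against), function(v) cor(v, wins)^2)`. Squaring drops the sign, so `cor()` on its own is still worth checking to see which relationships are negative.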
par(mfrow=c(1,2))
plot(wins~runs_for, xlab="Runs For", ylab="Wins")
abline(runs_for.lm)
plot(wins~runs_against, xlab="Runs Against", ylab="Wins")
abline(runs_against.lm)
par(mfrow=c(1,2))
plot(wins~hr_for, xlab="HR For", ylab="Wins")
abline(hr_for.lm)
plot(wins~hr_against, xlab="HR Against", ylab="Wins")
abline(hr_against.lm)
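These scatterplots show marginal relationships: each slope ignores the other three predictors. The added-variable plots from `avPlots(bb.lm)` instead show partial relationships. By the Frisch–Waugh–Lovell theorem, the slope in an added-variable plot equals that predictor's coefficient in the full model. To get it, regress the response on the other predictors, regress the predictor on the other predictors, and fit the two sets of residuals against each other. The sketch below checks this on simulated data with two correlated predictors; `x1`, `x2`, and `y` are hypothetical.

```r
# Simulated data with correlated predictors (hypothetical stand-ins)
set.seed(605)
n  <- 30
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n, sd = 0.5)   # x2 is correlated with x1
y  <- 1 + 2 * x1 - 1.5 * x2 + rnorm(n)

full <- lm(y ~ x1 + x2)

# Added-variable construction for x2: residualize y and x2 on the other predictor
e_y  <- resid(lm(y ~ x1))
e_x2 <- resid(lm(x2 ~ x1))
av_slope <- coef(lm(e_y ~ e_x2))[["e_x2"]]

# Marginal slope, as in the simple-regression scatterplots
marginal_slope <- coef(lm(y ~ x2))[["x2"]]

c(partial = coef(full)[["x2"]], av = av_slope, marginal = marginal_slope)
```

The `partial` and `av` values agree exactly. In this construction the expected marginal slope of `y` on `x2` is about +0.4 while the true partial coefficient is -1.5. This is why a predictor's simple-regression plot and its added-variable plot can disagree when predictors (plausibly runs_for and hr_for here) are correlated.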