Multiple Linear Regression






The data come from the 2020 baseball season, courtesy of sports_reference.


Most baseball fans will tell you that runs against is a more important metric than runs for, because good pitching is more reliable than good hitting.


Further, most fans will tell you that the best runs are home runs, because that's how you score against good pitching.


Let's see what multiple regression shows when 2020 wins are modeled as a function of runs for, runs against, home runs for, and home runs against.


team_data = read.csv(file = "C:\\Users\\arono\\source\\R\\Data605\\teams.csv", header = TRUE)

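# Pull the columns used in the model into plain vectors
runs_for     <- team_data$R
runs_against <- team_data$RGAgainst   # runs allowed; the "RG" prefix suggests a per-game rate
wins         <- team_data$W
hr_for       <- team_data$HR
hr_against   <- team_data$HRAgainst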


bb.lm <- lm(wins~runs_against+runs_for+hr_for+hr_against)


library(car)

avPlots(bb.lm)

summary(bb.lm)
## 
## Call:
## lm(formula = wins ~ runs_against + runs_for + hr_for + hr_against)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.8634  -1.5802   0.4777   1.7363  11.5255 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   95.59739   15.90605   6.010 2.81e-06 ***
## runs_against -10.30271    2.90606  -3.545  0.00158 ** 
## runs_for       0.05008    0.01948   2.570  0.01652 *  
## hr_for         0.12044    0.04508   2.672  0.01308 *  
## hr_against    -0.14414    0.05337  -2.701  0.01224 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.317 on 25 degrees of freedom
## Multiple R-squared:  0.9233, Adjusted R-squared:  0.911 
## F-statistic:  75.2 on 4 and 25 DF,  p-value: 1.448e-13
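
The coefficients above are on different scales. The estimate for runs_against is far larger in magnitude than the one for runs_for, which would be consistent with runs_against being a per-game rate rather than a season total. Raw coefficient sizes therefore cannot be compared directly. One common way to put them on the same footing is to refit the model on standardized (z-scored) variables. The sketch below does this on a small simulated data frame; the `sim` data, its column scales, and the generating coefficients are made up for illustration and are not the 2020 team data.

```r
# Simulated stand-in data (hypothetical; NOT the 2020 team data)
set.seed(1)
n <- 30
sim <- data.frame(
  runs_for     = rnorm(n, mean = 280, sd = 30),
  runs_against = rnorm(n, mean = 4.6, sd = 0.6)   # assumed to be a per-game rate
)
sim$wins <- 30 + 0.05 * sim$runs_for - 4 * sim$runs_against + rnorm(n, sd = 2)

# Fit on the raw scale, then on z-scored variables
raw.lm <- lm(wins ~ runs_for + runs_against, data = sim)
std.lm <- lm(wins ~ runs_for + runs_against, data = as.data.frame(scale(sim)))

# Standardized slopes: change in wins (in SDs) per 1-SD change in a predictor
std_coef <- coef(std.lm)[-1]
print(round(std_coef, 3))

# Same values, obtained by rescaling the raw slopes by sd(x) / sd(y)
manual <- coef(raw.lm)[-1] * sapply(sim[names(std_coef)], sd) / sd(sim$wins)
print(round(manual, 3))
```

Applied to the real model, the same idea would be to wrap each vector in `scale()` before calling `lm()`; the predictor with the largest absolute standardized slope is the one most strongly associated with wins, holding the others fixed.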


Let's look at each predictor individually by fitting four single-predictor regressions and comparing their R-squared values.


# One simple regression per predictor
runs_for.lm     <- lm(wins ~ runs_for)
runs_against.lm <- lm(wins ~ runs_against)
hr_for.lm       <- lm(wins ~ hr_for)
hr_against.lm   <- lm(wins ~ hr_against)

# Keep the summaries so R-squared can be extracted below
sum_runs_for.lm     <- summary(runs_for.lm)
sum_runs_against.lm <- summary(runs_against.lm)
sum_hr_for.lm       <- summary(hr_for.lm)
sum_hr_against.lm   <- summary(hr_against.lm)

sprintf("Runs For R Squared : %.2f", sum_runs_for.lm$r.squared )
## [1] "Runs For R Squared : 0.59"
sprintf("Runs Against R Squared : %.2f", sum_runs_against.lm$r.squared )
## [1] "Runs Against R Squared : 0.78"
sprintf("HR For R Squared : %.2f", sum_hr_for.lm$r.squared )
## [1] "HR For R Squared : 0.50"
sprintf("HR Against R Squared : %.2f", sum_hr_against.lm$r.squared )
## [1] "HR Against R Squared : 0.55"
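
For a regression with a single predictor, R-squared is the square of the Pearson correlation between the predictor and the response, so the four values above can be cross-checked with `cor()`. A minimal sketch on simulated data (the vectors `x` and `y` are hypothetical, not the team data):

```r
set.seed(2)
x <- rnorm(30, mean = 280, sd = 30)        # hypothetical predictor
y <- 10 + 0.07 * x + rnorm(30, sd = 3)     # hypothetical response

r2_lm  <- summary(lm(y ~ x))$r.squared     # R-squared from the fitted model
r2_cor <- cor(x, y)^2                      # squared Pearson correlation

# The two agree up to floating-point error
print(all.equal(r2_lm, r2_cor))
```

With the objects in this document, the equivalent check would be `cor(wins, runs_against)^2`. Note that the four individual R-squared values sum to far more than the multiple R-squared of 0.9233, because the predictors overlap in the variation in wins that they explain.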


par(mfrow=c(1,2))


plot(wins~runs_for, xlab="Runs For", ylab="Wins")
abline(runs_for.lm) 


plot(wins~runs_against, xlab="Runs Against", ylab="Wins")
abline(runs_against.lm) 


par(mfrow=c(1,2))


plot(wins~hr_for, xlab="HR For", ylab="Wins")
abline(hr_for.lm) 


plot(wins~hr_against, xlab="HR Against", ylab="Wins")
abline(hr_against.lm)