library('Lahman')
cor.test(Pitching$ERA,Pitching$W, conf.level=0.99)
##
## Pearson's product-moment correlation
##
## data: Pitching$ERA and Pitching$W
## t = -48.578, df = 48303, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 99 percent confidence interval:
## -0.2269676 -0.2046199
## sample estimates:
## cor
## -0.215822
There is a weak negative correlation between ERA and wins (r = -0.216, 99% confidence interval -0.227 to -0.205). As ERA goes up, wins tend to decrease.
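As a rough check on the test output, the t statistic and the share of variance explained can be recomputed by hand from the reported correlation and degrees of freedom. This is only a sketch: the variable names `r`, `df`, `t_stat`, and `r_squared` are made up for illustration, and the numbers are copied from the `cor.test()` output rather than recomputed from the data.

```r
# Values copied from the cor.test() output above
r  <- -0.215822   # sample correlation between ERA and W
df <- 48303       # degrees of freedom (n - 2)

# t statistic for testing H0: true correlation = 0
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Share of the variance in wins that is linearly associated with ERA
r_squared <- r^2

print(round(t_stat, 3))
print(round(r_squared, 5))
```

The result should agree with the reported t of -48.578 up to rounding. The squared correlation is under 5 percent, so the correlation is statistically clear but weak.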
plot(Pitching$ERA,Pitching$W)
pitch<-lm(Pitching$W~Pitching$ERA)
summary(pitch)
##
## Call:
## lm(formula = Pitching$W ~ Pitching$ERA)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.705 -3.844 -2.093 2.194 54.603
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.70491 0.03464 164.70 <2e-16 ***
## Pitching$ERA -0.22296 0.00459 -48.58 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.575 on 48303 degrees of freedom
## (94 observations deleted due to missingness)
## Multiple R-squared: 0.04658, Adjusted R-squared: 0.04656
## F-statistic: 2360 on 1 and 48303 DF, p-value: < 2.2e-16
abline(pitch, col="red", lwd=5)
The regression analysis shows a slope of -0.22296. Each one-run increase in ERA is associated with about 0.22 fewer wins in a season, so an ERA increase of roughly 4.5 corresponds to about one fewer win.
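The arithmetic behind the slope interpretation can be sketched as follows. The coefficients are copied from the `summary(pitch)` output. The helper `predict_wins()` and the other names are hypothetical, introduced only for this illustration.

```r
# Coefficients copied from the summary(pitch) output above
intercept <- 5.70491
slope     <- -0.22296

# ERA increase associated with one fewer win
era_change_per_win <- 1 / abs(slope)

# Change in fitted wins for an ERA increase of 5
wins_change_for_5 <- 5 * slope

# Hypothetical helper: fitted wins at a given ERA under this model
predict_wins <- function(era) intercept + slope * era

print(era_change_per_win)
print(wins_change_for_5)
print(predict_wins(c(2, 4, 6)))
```

Under this model, a pitcher with an ERA of 6 is fitted at less than one win below a pitcher with an ERA of 2, which shows how little the line moves across the typical ERA range.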
summary(Pitching$W)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 0.00 2.00 4.55 7.00 60.00
summary(pitch$residuals)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5.705 -3.844 -2.093 0.000 2.194 54.603
hist(pitch$residuals, breaks=50)
The median number of wins in the data set is 2, and the median residual is -2.093, so at least half of the residuals are larger in magnitude than the median number of wins. Given this, plus a low R-squared value of 0.04658, I am wary of rejecting my null hypothesis (H0) based on this simple linear regression model.
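The comparison between residual size and median wins can be made a little more concrete using only the quartiles reported above. This is a sketch that gives an approximate lower bound, not the exact share. It assumes that about half of the residuals fall at or below the median residual and about a quarter fall at or above the third quartile. All variable names are made up for illustration.

```r
# Values copied from the summary() outputs above
median_wins     <- 2.00
resid_quartiles <- c(q1 = -3.844, median = -2.093, q3 = 2.194)

# Approximate lower bound on the share of residuals whose magnitude
# exceeds the median number of wins
lower_bound <- 0

# Residuals at or below the median residual (about 50%) are all more
# negative than -median_wins when the median residual itself is
if (resid_quartiles[["median"]] < -median_wins) {
  lower_bound <- lower_bound + 0.50
}

# Residuals at or above the third quartile (about 25%) all exceed
# median_wins when the third quartile itself does
if (resid_quartiles[["q3"]] > median_wins) {
  lower_bound <- lower_bound + 0.25
}

print(lower_bound)
```

With the fitted model in memory, the exact share could instead be computed directly, for example with `mean(abs(residuals(pitch)) > median(Pitching$W))`.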