https://www.youtube.com/watch?v=vGOpEpjz2Ks
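library(faraway) # provides the butterfat data set
fullmodel <- lm(Butterfat ~ Breed * Age, data = butterfat)
par(mfrow = c(2, 2)) # show all four diagnostic plots in one 2 x 2 grid
plot(fullmodel)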
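Residuals vs Fitted: as the fitted values increase, the spread of the residuals increases (a funnel shape). So the residual variance is not constant, which violates a linear model assumption. This is bad.
Normal Q-Q: towards the ends of the plot (theoretical quantiles near -2 and 2), the standardized residuals move away from the dotted reference line. This suggests the residuals are not normally distributed in the tails. This is bad.
Scale-Location: we want the red line to be roughly flat. Its slight curve suggests the residual spread still changes with the fitted values, which is not good.
Constant Leverage (Residuals vs Factor Levels): in this balanced design (10 cows per Breed x Age group) every observation has the same leverage, so R plots the standardized residuals against the factor levels instead of against leverage. The residuals look most spread out for Jersey cows, so that breed seems to be the main source of the unequal variance.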
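As a numeric complement to the diagnostic plots, the sketch below compares the residual standard deviation across breeds and runs Bartlett's test for equal variances across the ten Breed x Age groups. It assumes the butterfat data come from the faraway package. The names res_sd and bt are only illustrative.

```r
library(faraway)  # assumption: the butterfat data set is taken from the faraway package

fullmodel <- lm(Butterfat ~ Breed * Age, data = butterfat)

# Residual standard deviation within each breed; one breed standing out
# supports what the diagnostic plots suggest about unequal variance
res_sd <- tapply(residuals(fullmodel), butterfat$Breed, sd)
print(round(res_sd, 3))

# Bartlett's test of equal variances across the 10 Breed x Age groups.
# It is sensitive to non-normality, so treat it as a rough check only.
bt <- bartlett.test(Butterfat ~ interaction(Breed, Age), data = butterfat)
print(bt)
```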
Next we apply a Box-Cox transformation.
The idea is to raise the response variable (Butterfat) to a power lambda, chosen so that the transformed response better meets the assumptions of a linear model (constant variance, roughly normal errors). If it works, the diagnostic plots (Residuals vs Fitted, Normal Q-Q, etc.) should look better.
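For reference, here is the textbook Box-Cox transformation and the profile log-likelihood that boxcox() plots against lambda. The response must be positive (y_1, ..., y_n > 0). RSS(λ) is the residual sum of squares from regressing y^(λ) on the model's predictors (Breed * Age here). The curve MASS draws should match this up to an additive constant, which does not move the maximum. The approximate 95% interval is the set of λ whose log-likelihood lies within half of the 0.95 quantile of χ²₁ (about 1.92) of the maximum.

$$
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda}-1}{\lambda}, & \lambda \neq 0,\\[4pt]
\log y, & \lambda = 0,
\end{cases}
\qquad
\ell(\lambda) = -\frac{n}{2}\log\!\left(\frac{\mathrm{RSS}(\lambda)}{n}\right) + (\lambda-1)\sum_{i=1}^{n}\log y_i,
$$

$$
\mathrm{CI}_{95\%} = \left\{\lambda : \ell(\lambda) \ge \ell(\hat\lambda) - \tfrac{1}{2}\chi^2_{1,\,0.95}\right\}, \qquad \tfrac{1}{2}\chi^2_{1,\,0.95} \approx 1.92 .
$$

Because (y^λ - 1)/λ is a linear rescaling of y^λ, fitting Butterfat^lambda directly gives an equivalent model. The slopes differ by the factor 1/λ, which for negative λ also flips their signs.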
library(MASS)
boxcox(fullmodel) # default grid: lambda from -2 to 2
boxcox(fullmodel, lambda = seq(-3, 3, by = 0.1)) # widen the lambda range (x axis) to see the whole curve and its 95% interval
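Reading off the plot, the 95% confidence interval for lambda runs from about -2.4 to -0.4. It excludes lambda = 1 (no transformation) and lambda = 0 (log transformation), so a transformation is warranted, and the log is not the best choice here.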
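Instead of reading the interval off the plot, it can be computed from the values boxcox() returns. The 95% interval is the set of lambda values whose log-likelihood is within qchisq(0.95, 1)/2 (about 1.92) of the maximum, which is the level of the horizontal 95% line in the plot. This sketch assumes the faraway data source and a single-peaked likelihood curve, so that the range of the qualifying grid points forms one interval. Its precision is limited by the 0.01 grid step.

```r
library(MASS)
library(faraway)  # assumption: source of the butterfat data

fullmodel <- lm(Butterfat ~ Breed * Age, data = butterfat)

# Evaluate the profile log-likelihood on a fine grid without drawing the plot
bc <- boxcox(fullmodel, lambda = seq(-3, 3, by = 0.01), plotit = FALSE)

# Lambda values within qchisq(0.95, 1) / 2 of the maximum form the 95% interval
cutoff <- max(bc$y) - qchisq(0.95, df = 1) / 2
ci <- range(bc$x[bc$y >= cutoff])
print(ci)

# Does the interval contain 1 (no transformation) or 0 (log transformation)?
print(c(contains_1 = ci[1] <= 1 && 1 <= ci[2],
        contains_0 = ci[1] <= 0 && 0 <= ci[2]))
```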
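We can also extract the lambda with the highest log-likelihood. It is only as precise as the grid of lambda values boxcox() evaluates, so it is not an exact optimum.
bc <- boxcox(fullmodel)
bc$x[which.max(bc$y)] # lambda at the peak of the log-likelihood curve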
## [1] -1.393939
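The value above is limited by the grid that boxcox() evaluates. The sketch below maximises the textbook Box-Cox profile log-likelihood directly with optimize(). The helper name profile_loglik is hypothetical, and the butterfat data are again assumed to come from faraway. Both approaches maximise the same curve (up to a constant), so the result should land close to the grid value. optimize() assumes the curve has a single peak in the search interval.

```r
library(faraway)  # assumption: source of the butterfat data

y <- butterfat$Butterfat
X <- model.matrix(~ Breed * Age, data = butterfat)  # same design as fullmodel
n <- length(y)

# Hypothetical helper: Box-Cox profile log-likelihood (up to a constant)
profile_loglik <- function(lambda) {
  z <- if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
  rss <- sum(lm.fit(X, z)$residuals^2)
  -n / 2 * log(rss / n) + (lambda - 1) * sum(log(y))
}

# One-dimensional maximisation over the same range used for the plot
opt <- optimize(profile_loglik, interval = c(-3, 3), maximum = TRUE)
print(opt$maximum)
```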
So we would use lambda = -1.4
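However, in the literature people often round lambda to a simple value such as a whole number. Here lambda = -1 (the reciprocal, 1/Butterfat) also lies inside the 95% interval of about -2.4 to -0.4.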
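To check whether a whole-number choice is defensible, compare its log-likelihood with the maximum. lambda = -1 is inside the 95% interval if its log-likelihood is within qchisq(0.95, 1)/2 of the peak. This sketch assumes faraway as the data source. fullmodel.inv is a hypothetical name for the reciprocal model.

```r
library(MASS)
library(faraway)  # assumption: source of the butterfat data

fullmodel <- lm(Butterfat ~ Breed * Age, data = butterfat)
bc <- boxcox(fullmodel, lambda = seq(-3, 3, by = 0.01), plotit = FALSE)

# Log-likelihood at the grid point closest to lambda = -1
ll_minus1 <- bc$y[which.min(abs(bc$x - (-1)))]
drop_from_max <- max(bc$y) - ll_minus1
inside_ci <- drop_from_max <= qchisq(0.95, df = 1) / 2
print(drop_from_max)
print(inside_ci)

# Reciprocal model: coefficients are on the 1/Butterfat scale
fullmodel.inv <- lm(I(1 / Butterfat) ~ Breed * Age, data = butterfat)
print(summary(fullmodel.inv)$adj.r.squared)
```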
fullmodel.bc <- lm((Butterfat)^-1.4~Breed*Age,
data=butterfat)
plot(fullmodel.bc)
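This sketch compares the two fits directly. It draws both sets of diagnostic plots in one 2 x 4 grid, with the original model on top and the transformed model below. It then reports the ratio of the largest to the smallest within-breed residual SD for each model; a ratio closer to 1 means a more even spread. The faraway data source and the helper spread_ratio are assumptions for illustration.

```r
library(faraway)  # assumption: source of the butterfat data

fullmodel <- lm(Butterfat ~ Breed * Age, data = butterfat)
fullmodel.bc <- lm(I(Butterfat^-1.4) ~ Breed * Age, data = butterfat)

# Top row: original model; bottom row: Box-Cox transformed model
op <- par(mfrow = c(2, 4))
plot(fullmodel)
plot(fullmodel.bc)
par(op)  # restore the previous plotting layout

# Hypothetical helper: largest / smallest residual SD across breeds (scale-free)
spread_ratio <- function(model) {
  sds <- tapply(residuals(model), butterfat$Breed, sd)
  max(sds) / min(sds)
}
print(c(original = spread_ratio(fullmodel),
        transformed = spread_ratio(fullmodel.bc)))
```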
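AIC(fullmodel, fullmodel.bc)
##              df       AIC
## fullmodel    11  119.8703
## fullmodel.bc 11 -551.1828
Note: these two AIC values are not directly comparable, because the models have different response variables (Butterfat vs Butterfat^-1.4). Before the AICs can be compared, the log-likelihood of the transformed model has to be adjusted by the Jacobian of the transformation.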
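One standard way to put the transformed model on the Butterfat scale is to add the log-Jacobian of the transformation to its log-likelihood. For z = y^lambda, |dz/dy| = |lambda| * y^(lambda - 1). The adjusted AIC is therefore AIC(fullmodel.bc) - 2 * sum(log|lambda| + (lambda - 1) * log(y_i)). The sketch below assumes the faraway data source and treats lambda = -1.4 as fixed. Counting lambda as an estimated parameter would add 2 to the adjusted value.

```r
library(faraway)  # assumption: source of the butterfat data

lambda <- -1.4
y <- butterfat$Butterfat

fullmodel <- lm(Butterfat ~ Breed * Age, data = butterfat)
fullmodel.bc <- lm(I(Butterfat^lambda) ~ Breed * Age, data = butterfat)

# Log-Jacobian of z = y^lambda, summed over all observations
log_jacobian <- sum(log(abs(lambda)) + (lambda - 1) * log(y))

# Log-likelihood on the Butterfat scale = log-likelihood on the z scale + log-Jacobian,
# so the AIC changes by -2 * log_jacobian
aic_bc_adjusted <- AIC(fullmodel.bc) - 2 * log_jacobian
print(c(original = AIC(fullmodel), transformed_adjusted = aic_bc_adjusted))
```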
summary(fullmodel.bc)
##
## Call:
## lm(formula = (Butterfat)^-1.4 ~ Breed * Age, data = butterfat)
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.038725 -0.009277  0.000826  0.007989  0.039486
##
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)                      0.1462148  0.0045919  31.842  < 2e-16 ***
## BreedCanadian                   -0.0221464  0.0064939  -3.410 0.000973 ***
## BreedGuernsey                   -0.0358579  0.0064939  -5.522 3.21e-07 ***
## BreedHolstein-Fresian            0.0170183  0.0064939   2.621 0.010301 *
## BreedJersey                     -0.0424620  0.0064939  -6.539 3.66e-09 ***
## AgeMature                       -0.0092666  0.0064939  -1.427 0.157051
## BreedCanadian:AgeMature          0.0119770  0.0091838   1.304 0.195508
## BreedGuernsey:AgeMature          0.0049687  0.0091838   0.541 0.589820
## BreedHolstein-Fresian:AgeMature  0.0092194  0.0091838   1.004 0.318126
## BreedJersey:AgeMature           -0.0002915  0.0091838  -0.032 0.974746
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01452 on 90 degrees of freedom
## Multiple R-squared: 0.741, Adjusted R-squared: 0.7151
## F-statistic: 28.61 on 9 and 90 DF, p-value: < 2.2e-16
Adjusted R-squared = 0.7151 (multiple R-squared = 0.741)
summary(fullmodel)
##
## Call:
## lm(formula = Butterfat ~ Breed * Age, data = butterfat)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1.0190 -0.2720 -0.0430  0.2372  1.3170
##
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)                       3.9660     0.1316  30.143  < 2e-16 ***
## BreedCanadian                     0.5220     0.1861   2.805  0.00616 **
## BreedGuernsey                     0.9330     0.1861   5.014 2.65e-06 ***
## BreedHolstein-Fresian            -0.3030     0.1861  -1.628  0.10693
## BreedJersey                       1.1670     0.1861   6.272 1.22e-08 ***
## AgeMature                         0.1880     0.1861   1.010  0.31503
## BreedCanadian:AgeMature          -0.2870     0.2631  -1.091  0.27834
## BreedGuernsey:AgeMature          -0.0860     0.2631  -0.327  0.74457
## BreedHolstein-Fresian:AgeMature  -0.1750     0.2631  -0.665  0.50773
## BreedJersey:AgeMature             0.1310     0.2631   0.498  0.61982
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4161 on 90 degrees of freedom
## Multiple R-squared: 0.6926, Adjusted R-squared: 0.6619
## F-statistic: 22.53 on 9 and 90 DF, p-value: < 2.2e-16
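Adjusted R-squared = 0.6619 (multiple R-squared = 0.6926)
So the adjusted R-squared rises from 0.6619 to 0.7151 after the transformation. The two values measure variance explained on different response scales (Butterfat vs Butterfat^-1.4), so this is only a rough comparison.
Because lambda is negative, Butterfat^-1.4 gets smaller as Butterfat gets larger. That is why the breed coefficients flip sign between the two models (for example, BreedJersey is 1.1670 in the original model and -0.0424620 in the transformed one).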
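The coefficients of fullmodel.bc are on the Butterfat^-1.4 scale, so it can help to back-transform predictions to the original units. The sketch below predicts each Breed x Age group on the transformed scale, raises the prediction to the power 1/lambda, and sets it beside the raw group mean. The faraway data source is an assumption. Back-transformed values are not exactly arithmetic means, so small differences from the raw means are expected.

```r
library(faraway)  # assumption: source of the butterfat data

lambda <- -1.4
fullmodel.bc <- lm(I(Butterfat^lambda) ~ Breed * Age, data = butterfat)

# One row per Breed x Age group (Breed varies fastest)
cells <- expand.grid(Breed = levels(butterfat$Breed), Age = levels(butterfat$Age))

# Predict on the transformed scale, then undo the power transformation
cells$pred_transformed <- predict(fullmodel.bc, newdata = cells)
cells$pred_butterfat <- cells$pred_transformed^(1 / lambda)

# Raw group means in the same order (tapply gives a Breed x Age matrix, read column-wise)
cells$raw_mean <- as.vector(with(butterfat, tapply(Butterfat, list(Breed, Age), mean)))
print(cells)
```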