https://www.youtube.com/watch?v=vGOpEpjz2Ks

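# assumes `butterfat` is already loaded as a data frame with columns Butterfat, Breed and Age
fullmodel <- lm(Butterfat ~ Breed * Age, data = butterfat)
plot(fullmodel)  # draws the four standard diagnostic plots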

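Residuals vs Fitted: as the fitted values increase, the spread of the residuals increases. This is bad, because a linear model assumes constant variance.

Normal Q-Q: towards theoretical quantiles of -2 and 2, the standardized residuals drift away from the dotted line. This is bad, because it suggests the tails of the residual distribution are not normal.

Scale-Location: we want the red line to be flat. The fact that it curves a little is not good; it is another sign of non-constant variance.

Constant Leverage (Residuals vs Factor Levels): not sure how to interpret this one in detail. It shows the standardized residuals grouped by factor level, and it looks like the Jersey cows are the issue.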
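
To make the plot descriptions above more concrete, here is a minimal sketch of the quantities that each diagnostic plot is built from. It uses simulated stand-in data (the names `sim`, `group` and `y` are made up for illustration and are not the butterfat data), with noise that deliberately grows with the group mean.

```r
# Simulated stand-in data (hypothetical; not the butterfat data)
set.seed(1)
sim <- data.frame(group = factor(rep(c("a", "b", "c", "d"), each = 25)))
# Multiplicative noise makes the spread grow with the group mean
sim$y <- exp(rep(c(1.0, 1.3, 1.6, 1.9), each = 25) + rnorm(100, sd = 0.3))

fit <- lm(y ~ group, data = sim)

# Quantities behind the diagnostic plots
fv  <- fitted(fit)      # x axis of Residuals vs Fitted and Scale-Location
res <- residuals(fit)   # y axis of Residuals vs Fitted
rs  <- rstandard(fit)   # standardized residuals, y axis of Normal Q-Q
sl  <- sqrt(abs(rs))    # y axis of Scale-Location

# Draw the plots one at a time instead of paging through all of them
plot(fit, which = 1)  # Residuals vs Fitted
plot(fit, which = 2)  # Normal Q-Q
plot(fit, which = 3)  # Scale-Location
plot(fit, which = 5)  # Residuals vs Leverage (vs factor levels if leverage is constant)

# Residual spread within each group
tapply(res, sim$group, sd)
```

Looking at the residual standard deviation per group is a quick numeric check of what the Residuals vs Fitted and Scale-Location plots show visually.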

Now we apply a Box-Cox transformation.

The idea is to raise the response variable (Butterfat) to a power lambda, chosen so that the model better meets the assumptions of a linear model. If it works, the diagnostic plots (Residuals vs Fitted, Normal Q-Q, etc.) should look better afterwards.
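
As a rough illustration of the transformation itself, the sketch below writes out the standard Box-Cox family, (y^lambda - 1) / lambda, with log(y) at lambda = 0. The function name `box_cox` and the example values are made up for illustration.

```r
# Box-Cox transform of a positive vector y for a given lambda
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))  # the transform is only defined for positive values
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

y <- c(3.5, 4.0, 4.5, 5.0, 5.5)  # made-up values

box_cox(y, 1)    # y - 1: just a shift, no change in shape
box_cox(y, 0)    # log(y)
box_cox(y, -1)   # 1 - 1/y: still increasing in y
y^(-1)           # plain reciprocal: decreasing in y
```

Note that a plain negative power reverses the ordering of the response, which flips the sign of fitted effects, while the scaled Box-Cox form keeps the ordering. Both give the same fit quality in a linear model, since one is a linear function of the other.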

library(MASS)
boxcox(fullmodel)

# expand the range of the x axis (lambda) to improve viewing
boxcox(fullmodel, lambda = seq(-3, 3, by = 0.1))

Reading off the plot, the 95% confidence interval for lambda runs from about -2.4 to -0.4.
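
Instead of reading the interval off the plot, it can also be computed from the values that `boxcox()` returns. The sketch below assumes the usual likelihood-ratio cutoff (maximum log-likelihood minus half the 95% chi-squared quantile on 1 df), and it uses simulated stand-in data rather than the butterfat data; `sim`, `fit`, `lambda_hat` and `ci` are illustrative names.

```r
library(MASS)

# Simulated stand-in data (hypothetical; not the butterfat data)
set.seed(1)
sim <- data.frame(group = factor(rep(c("a", "b", "c", "d"), each = 25)))
sim$y <- exp(rep(c(1.0, 1.3, 1.6, 1.9), each = 25) + rnorm(100, sd = 0.3))
fit <- lm(y ~ group, data = sim)

# Profile log-likelihood on a fine grid, without drawing the plot
bc <- boxcox(fit, lambda = seq(-3, 3, by = 0.01), plotit = FALSE)

# Lambda with the highest log-likelihood on the grid
lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval: all lambdas within the likelihood-ratio cutoff
cutoff <- max(bc$y) - qchisq(0.95, df = 1) / 2
ci <- range(bc$x[bc$y >= cutoff])

lambda_hat
ci
```

If either end of `ci` sits exactly on the edge of the grid, the lambda range is too narrow and should be widened.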

We can extract the value of lambda that maximizes the log-likelihood (on the grid of lambdas that boxcox evaluated):

bc <- boxcox(fullmodel)

bc$x[which.max(bc$y)]
## [1] -1.393939

So we would use lambda = -1.4.

However, in the literature people often just round lambda to a convenient value inside the confidence interval, such as a whole number.

fullmodel.bc <- lm((Butterfat)^-1.4~Breed*Age,
                   data=butterfat)
plot(fullmodel.bc)

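# Caution: the two responses are on different scales, so these raw AIC values are not directly comparable
AIC(fullmodel, fullmodel.bc)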
##              df       AIC
## fullmodel    11  119.8703
## fullmodel.bc 11 -551.1828
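
Because the transformed model measures the response on a different scale, its likelihood needs a Jacobian correction before its AIC can be set against the untransformed one. The sketch below shows one way to do this for a plain power transform z = y^lambda, where the log-Jacobian is the sum of log|lambda| + (lambda - 1) * log(y). It uses simulated stand-in data and an arbitrary lambda; the helper name `aic_on_y_scale` is made up for illustration.

```r
# Simulated stand-in data (hypothetical; not the butterfat data)
set.seed(1)
sim <- data.frame(group = factor(rep(c("a", "b", "c", "d"), each = 25)))
sim$y <- exp(rep(c(1.0, 1.3, 1.6, 1.9), each = 25) + rnorm(100, sd = 0.3))

lambda <- -0.5  # arbitrary illustrative power, not an estimate

fit_raw <- lm(y ~ group, data = sim)
fit_pow <- lm(I(y^lambda) ~ group, data = sim)

# AIC of a model fitted to y^lambda, re-expressed on the scale of y
aic_on_y_scale <- function(fit, y, lambda) {
  log_jac <- sum(log(abs(lambda)) + (lambda - 1) * log(y))
  AIC(fit) - 2 * log_jac
}

c(raw            = AIC(fit_raw),
  power_unfixed  = AIC(fit_pow),                        # not comparable to raw
  power_adjusted = aic_on_y_scale(fit_pow, sim$y, lambda))  # comparable to raw
```

With lambda = 1 the correction is zero, so the adjusted value equals the ordinary AIC. This sketch does not add a parameter for lambda itself, which would be reasonable if lambda was estimated from the same data.
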
summary(fullmodel.bc)
## 
## Call:
## lm(formula = (Butterfat)^-1.4 ~ Breed * Age, data = butterfat)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.038725 -0.009277  0.000826  0.007989  0.039486 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      0.1462148  0.0045919  31.842  < 2e-16 ***
## BreedCanadian                   -0.0221464  0.0064939  -3.410 0.000973 ***
## BreedGuernsey                   -0.0358579  0.0064939  -5.522 3.21e-07 ***
## BreedHolstein-Fresian            0.0170183  0.0064939   2.621 0.010301 *  
## BreedJersey                     -0.0424620  0.0064939  -6.539 3.66e-09 ***
## AgeMature                       -0.0092666  0.0064939  -1.427 0.157051    
## BreedCanadian:AgeMature          0.0119770  0.0091838   1.304 0.195508    
## BreedGuernsey:AgeMature          0.0049687  0.0091838   0.541 0.589820    
## BreedHolstein-Fresian:AgeMature  0.0092194  0.0091838   1.004 0.318126    
## BreedJersey:AgeMature           -0.0002915  0.0091838  -0.032 0.974746    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01452 on 90 degrees of freedom
## Multiple R-squared:  0.741,  Adjusted R-squared:  0.7151 
## F-statistic: 28.61 on 9 and 90 DF,  p-value: < 2.2e-16

Adjusted R-squared = 0.7151

summary(fullmodel)
## 
## Call:
## lm(formula = Butterfat ~ Breed * Age, data = butterfat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0190 -0.2720 -0.0430  0.2372  1.3170 
## 
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                       3.9660     0.1316  30.143  < 2e-16 ***
## BreedCanadian                     0.5220     0.1861   2.805  0.00616 ** 
## BreedGuernsey                     0.9330     0.1861   5.014 2.65e-06 ***
## BreedHolstein-Fresian            -0.3030     0.1861  -1.628  0.10693    
## BreedJersey                       1.1670     0.1861   6.272 1.22e-08 ***
## AgeMature                         0.1880     0.1861   1.010  0.31503    
## BreedCanadian:AgeMature          -0.2870     0.2631  -1.091  0.27834    
## BreedGuernsey:AgeMature          -0.0860     0.2631  -0.327  0.74457    
## BreedHolstein-Fresian:AgeMature  -0.1750     0.2631  -0.665  0.50773    
## BreedJersey:AgeMature             0.1310     0.2631   0.498  0.61982    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4161 on 90 degrees of freedom
## Multiple R-squared:  0.6926, Adjusted R-squared:  0.6619 
## F-statistic: 22.53 on 9 and 90 DF,  p-value: < 2.2e-16

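Adjusted R-squared = 0.6619, lower than the 0.7151 of the transformed model (though each is measured on its own response scale).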
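
Since the transformed model's coefficients are on the scale of the transformed response, predictions usually need to be converted back before they can be interpreted. A minimal sketch, assuming a plain power transform y^lambda and using simulated stand-in data with an arbitrary lambda (all names here are illustrative):

```r
# Simulated stand-in data (hypothetical; not the butterfat data)
set.seed(1)
sim <- data.frame(group = factor(rep(c("a", "b", "c", "d"), each = 25)))
sim$y <- exp(rep(c(1.0, 1.3, 1.6, 1.9), each = 25) + rnorm(100, sd = 0.3))

lambda <- -0.5  # arbitrary illustrative power, not an estimate
fit_pow <- lm(I(y^lambda) ~ group, data = sim)

# Predict for two groups on the transformed scale
new_groups <- data.frame(group = factor(c("a", "d"), levels = levels(sim$group)))
pred_pow <- predict(fit_pow, newdata = new_groups)

# Invert the power transform to get back to the original units
pred_y <- pred_pow^(1 / lambda)

pred_pow  # with a negative lambda, the larger group has the smaller value
pred_y    # ordering is restored on the original scale
```

The back-transformed value is best read as a typical value for the group rather than its arithmetic mean, because the mean does not pass through a nonlinear transformation unchanged.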