Today we covered model building and residual analysis. This includes diagnostic plots, transformation of variables and outliers.
Diagnostic plots are how we carry out residual analysis. We discussed the four plots that R produces when we call the plot command on our linear model. The most important one to focus on is residuals vs. fitted values. We don’t want to see any trend, and the residuals should be spread evenly around zero. A trend suggests the mean function is incorrect, and uneven spread suggests the variance is not constant.
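As a rough sketch of what this looks like in practice, the code below fits a line to simulated data and draws all four diagnostic plots in one window, then the residuals vs. fitted plot on its own. The data and the names x, y, and fit are made up for illustration and are not from class.

```r
# Simulated data for illustration only (not the class data set)
set.seed(1)
n <- 50
x <- runif(n, 0, 10)
y <- 3 + 2 * x + rnorm(n)

fit <- lm(y ~ x)

# All four diagnostic plots in a 2 x 2 grid
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)  # restore the previous plotting layout

# Only the residuals vs. fitted plot
plot(fit, which = 1)
```

Because the simulated data really are linear with constant variance, the residuals vs. fitted plot here should show no obvious trend or funnel shape.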
Transformation of variables: we use this when the plots show that the residuals follow a trend, or when Cook’s distance shows that some observations have much more influence than others. We discussed 3 ways to transform the variables to reduce heteroscedasticity (square root, log, and inverse, which are tried below).
Outliers: we need to decide what to do when we have them (and we almost always do). In class Dr. Knudson let us know it really depends on the situation. An outlier in the x direction might not affect the fit if it still follows the linear model. An outlier in the y direction once again depends on the case. We shouldn’t remove an outlier unless it is an error or we have another good reason.
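The point about x-direction outliers can be sketched with simulated data. Below, the same extreme x value is added twice: once on the true line and once far below it, and the fitted slopes are compared. All names and numbers here are assumptions chosen for illustration.

```r
# Simulated data for illustration only
set.seed(2)
x <- 1:20
y <- 2 + 3 * x + rnorm(20)
slope_base <- coef(lm(y ~ x))[["x"]]

# Case 1: extreme in x, but on the true line y = 2 + 3x
x_on <- c(x, 60)
y_on <- c(y, 2 + 3 * 60)
slope_on <- coef(lm(y_on ~ x_on))[["x_on"]]

# Case 2: same extreme x, but far below the line
x_off <- c(x, 60)
y_off <- c(y, 50)
slope_off <- coef(lm(y_off ~ x_off))[["x_off"]]

# Compare the three fitted slopes
round(c(base = slope_base, on_line = slope_on, off_line = slope_off), 3)
```

The on-line point should barely move the slope, while the off-line point should pull it well away from 3, which is why the decision to keep or drop an outlier depends on the case.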
library(alr3)
## Warning: package 'alr3' was built under R version 3.3.3
## Loading required package: car
## Warning: package 'car' was built under R version 3.3.3
data(brains)
attach(brains)
mod1 <- lm(BrainWt ~ BodyWt)
plot(mod1)
There are issues with the plots. A few outlying data points are dominating the residuals vs. fitted, normal Q-Q, and scale-location plots.
sqBrainWt <- sqrt(BrainWt)
sqmod <- lm(sqBrainWt ~ BodyWt)
plot(sqmod)
The same issues as above are still present.
logBrainWt <- log(BrainWt)
logmod <- lm(logBrainWt ~ BodyWt)
plot(logmod)
Some data points still show up as outliers, but the log transformation did improve the appearance of the plots by a small amount.
invBrainWt <- 1/(BrainWt)
invmod <- lm(invBrainWt ~ BodyWt)
plot(invmod)
The residuals vs. fitted plot is very concentrated in one spot, and the points in the normal Q-Q plot do not follow the line closely.
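All three attempts above transform only the response. A common alternative, not tried in these notes, is to log-transform the predictor as well. The sketch below uses simulated data with a power-law relationship, where a log-log model gives a straight line. The variable names, the exponent 0.75, and the noise level are assumptions chosen for illustration.

```r
# Simulated power-law data for illustration only (not the brains data)
set.seed(3)
n <- 60
body <- exp(rnorm(n, mean = 1, sd = 2))            # strongly right-skewed
brain <- 10 * body^0.75 * exp(rnorm(n, sd = 0.3))  # multiplicative noise

# Log-transform both the response and the predictor
loglog <- lm(log(brain) ~ log(body))
coef(loglog)  # the slope estimates the exponent used above

# Residuals vs. fitted for the log-log model
plot(loglog, which = 1)
```

With the class data the analogous call would be lm(log(BrainWt) ~ log(BodyWt)). Whether that fixes the plots would still need to be checked with plot().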
brains[33, ]
## BrainWt BodyWt
## African_elephant 5711.86 6654.18
Observation 33 (the African elephant) lies outside the Cook’s distance contour lines, so we can say it has a lot of influence on the fit.
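Cook’s distance can also be checked numerically instead of reading it off the plot. The sketch below is a minimal example on simulated data with one planted influential point. The names x, y, fit, and cd are made up for illustration.

```r
# Simulated data for illustration only
set.seed(4)
x <- c(1:30, 200)                        # observation 31 is far out in x
y <- c(5 + 2 * (1:30) + rnorm(30), 100)  # and far from the line in y

fit <- lm(y ~ x)
cd <- cooks.distance(fit)

which.max(cd)  # index of the most influential observation
cd[cd > 0.5]   # observations with Cook's distance above 0.5

# Residuals vs. leverage plot with Cook's distance contours
plot(fit, which = 5)
```

By default the contours on that plot are drawn at Cook’s distances of 0.5 and 1, so the cutoff of 0.5 above matches the inner contour.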