In our most recent class, we learned about diagnostic plots. After we fit a model, we can create plots to decide whether or not the model is a good fit.
The first plot we look at is a residual plot (residuals vs. fitted values). This helps us decide 1) if the mean function E(Y|X = x) = x'beta is appropriate, 2) if there is homoscedasticity (constant error variance), and 3) if we have outliers.
The second plot we create is a qqnorm plot. This plots the quantiles of the standardized residuals vs. the theoretical quantiles of the normal distribution, so we can check the normality assumption. Standardized residuals rescale each raw residual by its estimated standard deviation, r_i = e_i / (sigma_hat * sqrt(1 - h_ii)); this corrects for the fact that raw residuals at extreme x-values (high leverage h_ii) have smaller variance.
The third plot we create is the scale-location plot, which plots the square root of the absolute standardized residuals vs. the fitted values. Taking the square root reduces the skew of the values and makes trends in the spread of the residuals (i.e., non-constant variance) easier to see.
The last plot we look at uses Cook's distance, which measures each data point's influence on the estimated coefficients beta hat; R draws it as residuals vs. leverage with Cook's distance contours.
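To make these quantities concrete, here is a minimal sketch (my own example using R's built-in cars data, not our class data) of what each plot is actually computing:
fit <- lm(dist ~ speed, data = cars)
e  <- residuals(fit)       # raw residuals (plot 1: residuals vs. fitted values)
r  <- rstandard(fit)       # standardized residuals (plot 2: qqnorm)
sl <- sqrt(abs(r))         # sqrt of |standardized residuals| (plot 3: scale-location)
d  <- cooks.distance(fit)  # Cook's distance (plot 4: influence on beta hat)
head(cbind(fitted = fitted(fit), e, r, sl, d))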
If we want to automatically create all of these plots, we first create a model and then use the plot function on it. In this example we will use a data set containing the body weights and brain weights of a sample of animals.
library(alr3)  # provides the brains data set
## Warning: package 'alr3' was built under R version 3.4.3
## Loading required package: car
## Warning: package 'car' was built under R version 3.4.3
data(brains)    # brain weight and body weight for a sample of animals
attach(brains)  # so we can refer to BrainWt and BodyWt directly
head(brains)
## BrainWt BodyWt
## Arctic_fox 44.500 3.385
## Owl_monkey 15.499 0.480
## Beaver 8.100 1.350
## Cow 423.012 464.983
## Gray_wolf 119.498 36.328
## Goat 114.996 27.660
mymod <- lm(BrainWt ~ BodyWt)  # simple linear regression of brain weight on body weight
plot(mymod)                    # produces the diagnostic plots described above
The plot function spits out all four of the previously mentioned plots in order. In the first plot, we want the red line to stay close to the horizontal gray dotted line at zero. In the second plot, we want our data points to follow the dotted line. In the third plot, we want the red line to be roughly horizontal (a flat line means constant variance). In the fourth plot, we want all of our data points to fall inside the dotted curved lines (the Cook's distance contours).
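By default R draws these plots one at a time; we can also put them all on one screen, or request a single plot with the which argument (this is standard plot.lm behavior, not specific to our data):
par(mfrow = c(2, 2))    # 2 x 2 grid so all four diagnostics show at once
plot(mymod)
par(mfrow = c(1, 1))    # back to one plot per screen
plot(mymod, which = 4)  # Cook's distance by itself; which = 1, 2, 3, 5 pick the other plots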
If we want to try to improve our plots, we can use transformations. A few common transformations are the square root (helps with heteroscedasticity), the log (helps straighten a curved trend), and the inverse, as follows:
tmod <- lm(sqrt(BrainWt) ~ sqrt(BodyWt))  # square-root transformation
plot(tmod)
tmod2 <- lm(log(BrainWt) ~ log(BodyWt))  # log transformation
plot(tmod2)
# note the I() wrapper: "/" has a special (nesting) meaning inside an R formula,
# so without I() the predictor drops out and only an intercept gets fit
tmod3 <- lm(I(1/BrainWt) ~ I(1/BodyWt))  # inverse transformation
plot(tmod3)
Looking at these plots while keeping the desired goals for each in mind, it looks like the log transformation gave us the best results.
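Beyond eyeballing the plots, we can also flag potential outliers and influential points numerically. A minimal sketch using the log model from above (the cutoffs are common rules of thumb, not something we covered in class):
r <- rstandard(tmod2)       # standardized residuals
d <- cooks.distance(tmod2)  # Cook's distances
which(abs(r) > 2)           # points with unusually large residuals
which(d > 4 / length(d))    # points with unusually large influence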
We also began Section 5.4 in class, which covers outliers and influential points. To deal with outliers, we first want to identify them from the plots of our model. We then ask these questions to decide what to do with them: