Let’s view the data we’re working with:
head(trees)
## Girth Height Volume
## 1 8.3 70 10.3
## 2 8.6 65 10.3
## 3 8.8 63 10.2
## 4 10.5 72 16.4
## 5 10.7 81 18.8
## 6 10.8 83 19.7
Let’s check whether there appears to be a linear relationship between girth and height, that is, whether a tree’s girth can predict its height.
The scatter plot suggests a positive relationship, but it is hard to judge how strong it is by eye:
plot(trees$Girth,trees$Height)
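To put a number on that strength, we can compute Pearson’s correlation coefficient with `cor.test()` from base R’s stats package. This step is an addition to the original walkthrough. In a simple linear regression, the squared correlation equals the model’s R-squared, and the correlation test’s p-value matches the slope’s p-value that `summary()` reports later.

```r
# Pearson correlation between girth and height in the built-in trees data
ct <- cor.test(trees$Girth, trees$Height)

r <- unname(ct$estimate)   # correlation coefficient
print(r)

# In simple linear regression, r^2 equals the model's R-squared
print(r^2)

# p-value for the null hypothesis of zero correlation
print(ct$p.value)
```

A correlation of roughly 0.5 points to a moderate positive association rather than a tight linear one.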
Fitting a linear model with the lm() function gives the regression equation:
Height = 62.031 + 1.054 × Girth
# Regress height on girth (attach() is not needed because columns are referenced as trees$...)
trees_lm <- lm(trees$Height ~ trees$Girth)
trees_lm
##
## Call:
## lm(formula = trees$Height ~ trees$Girth)
##
## Coefficients:
## (Intercept) trees$Girth
## 62.031 1.054
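The same model can also be fit with lm()’s `data` argument. This avoids the `trees$` prefixes and the need for `attach()`, gives the slope the plain name `Girth`, and lets `predict()` work with new data. With the `trees$Girth` formula, `predict()` generally does not use `newdata` as intended. The object name `fit` and the example girth of 12 below are illustrative choices, not part of the original analysis.

```r
# Formula interface with a data argument; 'trees' ships with base R
fit <- lm(Height ~ Girth, data = trees)

# Same coefficients as above: intercept ~62.031, slope ~1.054
print(coef(fit))

# Predicted height for a hypothetical tree with Girth = 12
new_tree <- data.frame(Girth = 12)
predicted_height <- predict(fit, newdata = new_tree)
print(predicted_height)
```

The prediction is simply the regression equation evaluated at Girth = 12, roughly 62.031 + 1.054 × 12 ≈ 74.7.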
Let’s add the fitted regression line to the scatter plot:
plot(trees$Girth,trees$Height)
abline(trees_lm)
summary(trees_lm)
##
## Call:
## lm(formula = trees$Height ~ trees$Girth)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.5816 -2.7686 0.3163 2.4728 9.9456
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 62.0313 4.3833 14.152 1.49e-14 ***
## trees$Girth 1.0544 0.3222 3.272 0.00276 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.538 on 29 degrees of freedom
## Multiple R-squared: 0.2697, Adjusted R-squared: 0.2445
## F-statistic: 10.71 on 1 and 29 DF, p-value: 0.002758
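Several numbers in this summary can be reproduced by hand, which helps explain what they mean. The sketch below assumes the `data = trees` form of the model (same fit, cleaner names). It shows that the t value is the estimate divided by its standard error, that R-squared is one minus the ratio of residual to total sum of squares, and that the residual standard error is the square root of the residual sum of squares divided by the 29 residual degrees of freedom.

```r
fit <- lm(Height ~ Girth, data = trees)
coef_table <- summary(fit)$coefficients

# t value = Estimate / Std. Error (about 1.0544 / 0.3222 for Girth)
t_manual <- coef_table["Girth", "Estimate"] / coef_table["Girth", "Std. Error"]

# R-squared = 1 - (residual sum of squares / total sum of squares)
rss <- sum(resid(fit)^2)
tss <- sum((trees$Height - mean(trees$Height))^2)
r2_manual <- 1 - rss / tss

# Residual standard error = sqrt(RSS / residual degrees of freedom)
sigma_manual <- sqrt(rss / df.residual(fit))

print(c(t_manual = t_manual, r2_manual = r2_manual, sigma_manual = sigma_manual))
```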
Now let’s look at the residuals:
A residual is the difference between an observed value and the value the regression line predicts for it (observed minus fitted). A negative residual means the data point lies below the regression line.
The residuals in our plot are fairly consistent, although a few points toward the left-center of the graph are farther off. Girth isn’t a great predictor of height, but it’s not the worst: it explains only about 27% of the variation in height (Multiple R-squared: 0.2697), even though the slope is statistically significant (p = 0.00276). Adding other predictors could produce a better model.
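plot(fitted(trees_lm), resid(trees_lm), xlab = "Fitted height", ylab = "Residual")
abline(h = 0, lty = 2)  # dashed reference line at zero residual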
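The definition of a residual can be checked directly: subtracting the fitted values from the observed heights reproduces what `resid()` returns. This sketch again uses the `data = trees` form of the model; the variable names are illustrative.

```r
fit <- lm(Height ~ Girth, data = trees)

# A residual is observed minus fitted: e_i = y_i - y_hat_i
manual_resid <- trees$Height - fitted(fit)

# Negative residuals mark trees that lie below the regression line
below_line <- sum(manual_resid < 0)
print(below_line)

# Residuals from a least-squares fit with an intercept sum to (numerically) zero
print(sum(manual_resid))
```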
#### QQ Plots
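If the residuals are approximately normally distributed, the points in a normal QQ plot fall close to the reference line, which supports the normality assumption of linear regression (on its own, this does not prove the model is good). Here the points diverge from the line on the left and especially on the right side of the graph, suggesting the residuals are not quite normal. Together with the low R-squared, this indicates that girth alone isn’t a good predictor of height.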
qqnorm(resid(trees_lm))
qqline(resid(trees_lm))
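As a numeric complement to the QQ plot, one option is the Shapiro-Wilk normality test (`shapiro.test()` in base R’s stats package). This test is not part of the original analysis. Its null hypothesis is that the residuals come from a normal distribution. With only 31 trees it has limited power, so it is best read alongside the plot rather than instead of it.

```r
fit <- lm(Height ~ Girth, data = trees)

# Shapiro-Wilk test on the model residuals (null: residuals are normal)
sw <- shapiro.test(resid(fit))
print(sw)

# A small p-value (e.g. below 0.05) is evidence against normality;
# a large one only means the test found no strong evidence against it.
print(sw$p.value)
```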