Let’s view the data we’re working with:
head(trees)
## Girth Height Volume
## 1 8.3 70 10.3
## 2 8.6 65 10.3
## 3 8.8 63 10.2
## 4 10.5 72 16.4
## 5 10.7 81 18.8
## 6 10.8 83 19.7
Let’s check whether there is a linear relationship between girth and height, that is, whether a tree’s girth can predict its height:
* There appear to be linear relationships of varying strength between girth and height, and between volume and height.
pairs(trees)
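The scatterplot matrix shows these relationships only visually. The minimal sketch below uses base R's `cor()` (Pearson correlation by default) to measure how strong each relationship is. `trees_cor` is an illustrative name and is not part of the original analysis.

```r
# Pairwise Pearson correlations between all columns of the built-in trees data
data(trees)
trees_cor <- cor(trees)
round(trees_cor, 3)

# Correlation of each candidate predictor with Height, strongest first
sort(trees_cor[c("Girth", "Volume"), "Height"], decreasing = TRUE)
```

In a simple (one-predictor) linear regression, the squared correlation equals the multiple R-squared. Squaring the Height/Volume correlation should therefore reproduce the 0.3579 reported for the `Height ~ Volume` model later in this section.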
Fitting the model with the lm() function gives the estimated linear function:
Height = 83.2958 - 1.8615 * Girth + 0.5756 * Volume
# Regress Height on Girth and Volume; data = trees makes attach() unnecessary
trees_lm <- lm(Height ~ Girth + Volume, data = trees)
trees_lm
##
## Call:
## lm(formula = Height ~ Girth + Volume, data = trees)
##
## Coefficients:
## (Intercept) Girth Volume
## 83.2958 -1.8615 0.5756
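To show what the fitted equation means for a single tree, the sketch below plugs the first tree's measurements (Girth 8.3, Volume 10.3, from the `head(trees)` output) into the coefficients by hand. It then checks the result against `predict()`. The names `fit`, `b`, and `new_tree` are illustrative.

```r
# Refit the two-predictor model (same formula as above)
fit <- lm(Height ~ Girth + Volume, data = trees)
b <- coef(fit)

# First tree in the data: Girth = 8.3, Volume = 10.3
new_tree <- data.frame(Girth = 8.3, Volume = 10.3)

# Height = intercept + b_Girth * Girth + b_Volume * Volume
by_hand <- b[["(Intercept)"]] + b[["Girth"]] * new_tree$Girth +
  b[["Volume"]] * new_tree$Volume
by_predict <- unname(predict(fit, newdata = new_tree))

c(by_hand = by_hand, by_predict = by_predict)
```

Both values come out at about 73.77. The observed height of that tree is 70, so its residual is roughly -3.77.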
summary(trees_lm)
##
## Call:
## lm(formula = Height ~ Girth + Volume, data = trees)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.7855 -3.3649 0.5683 2.3747 11.6910
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 83.2958 9.0866 9.167 6.33e-10 ***
## Girth -1.8615 1.1567 -1.609 0.1188
## Volume 0.5756 0.2208 2.607 0.0145 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.056 on 28 degrees of freedom
## Multiple R-squared: 0.4123, Adjusted R-squared: 0.3703
## F-statistic: 9.82 on 2 and 28 DF, p-value: 0.0005868
# Drop Girth, which is not significant at the 0.05 level (p = 0.1188)
trees_lm <- update(trees_lm, . ~ . - Girth)
summary(trees_lm)
##
## Call:
## lm(formula = Height ~ Volume, data = trees)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.7777 -2.9722 -0.1515 2.0804 10.6426
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 69.00336 1.97443 34.949 < 2e-16 ***
## Volume 0.23190 0.05768 4.021 0.000378 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.193 on 29 degrees of freedom
## Multiple R-squared: 0.3579, Adjusted R-squared: 0.3358
## F-statistic: 16.16 on 1 and 29 DF, p-value: 0.0003784
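Girth was dropped because its coefficient was not significant at the 0.05 level (p = 0.1188). The sketch below checks the same decision with a nested-model F-test using `anova()`. When two models differ by a single term, the F statistic equals the square of that term's t value, so the p-value should match the 0.1188 above. The names `full`, `reduced`, and `cmp` are illustrative.

```r
# Full and reduced models, refit so the snippet is self-contained
full    <- lm(Height ~ Girth + Volume, data = trees)
reduced <- lm(Height ~ Volume, data = trees)

# F-test: does adding Girth significantly reduce the residual sum of squares?
cmp <- anova(reduced, full)
cmp

# With one extra term, F equals the squared t value of Girth in the full model
t_girth <- summary(full)$coefficients["Girth", "t value"]
c(F = cmp$F[2], t_squared = t_girth^2)
```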
hist(resid(trees_lm))
The residuals are not evenly spread: most of them sit on the left-center side of the graph. Volume alone isn’t a great predictor of height (multiple R-squared of only 0.3579), and additional predictors would likely make a better model.
plot(fitted(trees_lm), resid(trees_lm))
abline(h = 0, col = "red")  # horizontal reference line at zero
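One possible reason the points bunch up on the left of this plot is that Volume is right-skewed. In a one-predictor model the fitted values are a linear rescaling of Volume, so they inherit its distribution, and a skewed Volume gives mostly small fitted values. This explanation is an assumption, not part of the original analysis. The small sketch below checks both parts; `m` is an illustrative name for the `Height ~ Volume` model.

```r
m <- lm(Height ~ Volume, data = trees)

# Fitted values are a linear function of Volume, so their correlation is 1
cor(fitted(m), trees$Volume)

# Right skew: a few large trees pull the mean of Volume above its median
summary(trees$Volume)
```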
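If the residuals are normally distributed, the points in the normal Q-Q plot should fall close to the red reference line. Normal residuals are an assumption of the model, not proof that it is good. Here the points diverge from the line on the left and especially on the right side of the graph, so the residuals do not look normal. This is further evidence that volume isn’t a good predictor of height.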
qqnorm(resid(trees_lm))
qqline(resid(trees_lm),col="red")
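The Q-Q plot is a visual check. The sketch below adds a formal complement using base R's `shapiro.test()` (the Shapiro-Wilk test, which the original analysis does not use). A small p-value, for example below 0.05, is evidence against normally distributed residuals. A large one only means the test could not detect a departure from normality with 31 observations. The names `m` and `sw` are illustrative.

```r
m <- lm(Height ~ Volume, data = trees)

# Shapiro-Wilk test of the null hypothesis that the residuals are normal
sw <- shapiro.test(resid(m))
sw

# W statistic and p-value on their own
c(W = unname(sw$statistic), p_value = sw$p.value)
```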