Tree Girth, Volume and Height Analysis

Let’s view the data we’re working with:

head(trees)
##   Girth Height Volume
## 1   8.3     70   10.3
## 2   8.6     65   10.3
## 3   8.8     63   10.2
## 4  10.5     72   16.4
## 5  10.7     81   18.8
## 6  10.8     83   19.7

Let’s check whether girth is linearly related to height, so that the girth of a tree could be used to predict its height:
* There appear to be linear relationships of varying strength between girth and height and between volume and height

pairs(trees)
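
The visual impression from the scatterplot matrix can be backed up with numbers. The following is a minimal sketch, assuming Pearson correlation is an adequate summary of these relationships; `cor_mat` is a name introduced here for illustration only.

```r
# trees ships with base R (package datasets), so nothing needs to be loaded.
cor_mat <- cor(trees)   # pairwise Pearson correlations between the three columns
round(cor_mat, 3)       # values near +1 or -1 indicate a strong linear relationship
```

Reading along the Height row or column shows how strongly each candidate predictor is linearly related to height.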

Using the lm() function, we see that the fitted linear model is:
Height = 83.2958 - 1.8615 * Girth + 0.5756 * Volume

# data = trees already tells lm() where to find the columns, so attach(trees) is not needed
trees_lm <- lm(Height ~ Girth + Volume, data = trees)
trees_lm
## 
## Call:
## lm(formula = Height ~ Girth + Volume, data = trees)
## 
## Coefficients:
## (Intercept)        Girth       Volume  
##     83.2958      -1.8615       0.5756
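
To make the fitted equation concrete, the sketch below plugs the first tree in the data set into the coefficients by hand and compares the result with predict(). The names `fit`, `b`, `first`, `by_hand` and `from_predict` are introduced here for illustration and are not part of the original analysis.

```r
# Refit the two-predictor model so this example is self-contained.
fit <- lm(Height ~ Girth + Volume, data = trees)
b <- coef(fit)

# First observation: Girth 8.3, Volume 10.3 (see head(trees) above).
first <- trees[1, ]

# Apply the linear equation manually: intercept + slope * predictor, summed.
by_hand <- b[["(Intercept)"]] + b[["Girth"]] * first$Girth + b[["Volume"]] * first$Volume

# predict() should give the same value for the same row.
from_predict <- predict(fit, newdata = first)

all.equal(by_hand, unname(from_predict))
```

The measured height of that tree is 70, so the gap between it and the fitted value is the residual for that observation.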

Summary

summary(trees_lm)
## 
## Call:
## lm(formula = Height ~ Girth + Volume, data = trees)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.7855 -3.3649  0.5683  2.3747 11.6910 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  83.2958     9.0866   9.167 6.33e-10 ***
## Girth        -1.8615     1.1567  -1.609   0.1188    
## Volume        0.5756     0.2208   2.607   0.0145 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.056 on 28 degrees of freedom
## Multiple R-squared:  0.4123, Adjusted R-squared:  0.3703 
## F-statistic:  9.82 on 2 and 28 DF,  p-value: 0.0005868

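Backward Elimination

Girth has the largest p-value (0.1188) and is not significant at the 0.05 level, so we drop it and refit the model: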

trees_lm <- update(trees_lm, .~. - Girth, data = trees)
summary(trees_lm)
## 
## Call:
## lm(formula = Height ~ Volume, data = trees)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.7777  -2.9722  -0.1515   2.0804  10.6426 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 69.00336    1.97443  34.949  < 2e-16 ***
## Volume       0.23190    0.05768   4.021 0.000378 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.193 on 29 degrees of freedom
## Multiple R-squared:  0.3579, Adjusted R-squared:  0.3358 
## F-statistic: 16.16 on 1 and 29 DF,  p-value: 0.0003784
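
The elimination step can also be checked programmatically. The sketch below is one possible approach, not necessarily the one used above: it calls drop1() from base R's stats package on the full model, and the names `full_lm`, `drop_tbl`, `candidates` and `weakest` are hypothetical. For a single-coefficient term the F-test p-value matches the t-test p-value from summary().

```r
# Refit the full model under a separate name, since trees_lm was overwritten above.
full_lm <- lm(Height ~ Girth + Volume, data = trees)

# F-test for removing each term one at a time.
drop_tbl <- drop1(full_lm, test = "F")
drop_tbl

# Ignore the "<none>" row and pick the term with the largest p-value.
candidates <- drop_tbl[-1, ]
weakest <- rownames(candidates)[which.max(candidates[["Pr(>F)"]])]
weakest
```

The term reported as weakest is the candidate for removal in the next backward-elimination step, provided its p-value is above the chosen threshold.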

Now we’ll look at the residuals of the reduced model.

hist(resid(trees_lm))

The residuals are not spread symmetrically around zero. Most of them fall on the left-center side of the histogram. Volume alone isn’t a great predictor of height: it explains only about 36% of the variance (Multiple R-squared of 0.3579). Additional predictors would likely give a better model.

plot(fitted(trees_lm), resid(trees_lm))
abline(h = 0, col = "red")

QQ Plots

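If the residuals are normally distributed, the points on the Q-Q plot fall close to the reference line, which supports the normality assumption behind the model’s t- and F-tests. Here the points diverge from the line on the left and especially on the right side of the graph, which suggests the residuals are not quite normal. Together with the low R-squared, this indicates that volume alone isn’t a good predictor of height.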

qqnorm(resid(trees_lm))
qqline(resid(trees_lm),col="red")
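
A formal test can complement the visual reading of the Q-Q plot. The sketch below uses shapiro.test() from base R's stats package; this test is not part of the original analysis, and `vol_lm` and `sw` are names introduced for illustration. With only 31 observations the test has limited power, so its result should be read alongside the plot rather than instead of it.

```r
# Refit the reduced model so this example is self-contained.
vol_lm <- lm(Height ~ Volume, data = trees)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal.
sw <- shapiro.test(resid(vol_lm))
sw
```

A small p-value would be evidence against normality. A large one does not prove the residuals are normal, only that this test found no clear departure.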