Tree Girth and Height Analysis

Let’s view the data we’re working with:

head(trees)
##   Girth Height Volume
## 1   8.3     70   10.3
## 2   8.6     65   10.3
## 3   8.8     63   10.2
## 4  10.5     72   16.4
## 5  10.7     81   18.8
## 6  10.8     83   19.7

Let’s see whether there appears to be a linear relationship between girth and height, so that the girth of a tree could be used to predict its height.
The scatterplot suggests some positive relationship, but the plot alone doesn’t show how strong it is:

plot(trees$Girth,trees$Height)
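
One way to put a number on the strength of that relationship is the Pearson correlation. This is a minimal sketch using the built-in trees data; the variable names r and ct are only illustrative.

```r
# Pearson correlation between girth and height (built-in trees data)
r <- cor(trees$Girth, trees$Height)
r

# cor.test() adds a confidence interval and a test of zero correlation
ct <- cor.test(trees$Girth, trees$Height)
ct$conf.int
```

A value near 0 would indicate little linear association and a value near 1 a strong positive one.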

Using the lm() function, we see that the fitted regression line is:
Height = 62.031 + 1.054 * Girth

# Fit a simple linear regression of height on girth
trees_lm <- lm(trees$Height ~ trees$Girth)
trees_lm
## 
## Call:
## lm(formula = trees$Height ~ trees$Girth)
## 
## Coefficients:
## (Intercept)  trees$Girth  
##      62.031        1.054
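
To illustrate how the fitted line turns a girth into a predicted height, here is a small sketch. It refits the same model with the data = form (an assumption made here so that predict() can match the Girth column in newdata by name), and the girth value of 10 is a hypothetical input, not an observation from the data.

```r
# Same regression, written with data = so predict() can use newdata
fit <- lm(Height ~ Girth, data = trees)
b <- coef(fit)

new_girth <- 10  # hypothetical girth, chosen only for illustration

# Prediction by hand: intercept + slope * girth
by_hand <- unname(b["(Intercept)"] + b["Girth"] * new_girth)

# The same prediction via predict()
by_predict <- unname(predict(fit, newdata = data.frame(Girth = new_girth)))

by_hand
by_predict
```

Both approaches should agree, since predict() evaluates the same linear equation.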

Plots & Summary

Let’s add this relationship to the plot:

plot(trees$Girth,trees$Height)
abline(trees_lm)

summary(trees_lm)
## 
## Call:
## lm(formula = trees$Height ~ trees$Girth)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.5816  -2.7686   0.3163   2.4728   9.9456 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  62.0313     4.3833  14.152 1.49e-14 ***
## trees$Girth   1.0544     0.3222   3.272  0.00276 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.538 on 29 degrees of freedom
## Multiple R-squared:  0.2697, Adjusted R-squared:  0.2445 
## F-statistic: 10.71 on 1 and 29 DF,  p-value: 0.002758
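
The figures in the summary can also be pulled out programmatically rather than read off the printout. The sketch below assumes the model is refit with the data = form, so the slope row is named Girth; the names s, r2, slope_p and ci are illustrative.

```r
fit <- lm(Height ~ Girth, data = trees)
s <- summary(fit)

# Proportion of variance in height explained by girth
r2 <- s$r.squared

# p-value for the slope, taken from the coefficient table
slope_p <- coef(s)["Girth", "Pr(>|t|)"]

# 95% confidence intervals for the intercept and slope
ci <- confint(fit)

r2
slope_p
ci
```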

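The slope is statistically significant (p = 0.00276), but the R-squared of 0.2697 means girth explains only about 27% of the variation in height.

Now let’s look at the residuals:
A residual is the observed height minus the height predicted by the regression line. A negative residual means the data point lies below the regression line; a positive residual means it lies above it.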
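
As a quick check of that definition, the residuals can be computed by hand and compared with what resid() returns. This is a sketch that refits the model with the data = form; by_hand is an illustrative name.

```r
fit <- lm(Height ~ Girth, data = trees)

# Residual = observed height - fitted height
by_hand <- trees$Height - fitted(fit)

# Should match the residuals stored in the model
all.equal(unname(by_hand), unname(resid(fit)))

# Number of trees that fall below the regression line
sum(by_hand < 0)
```

With an intercept in the model, least-squares residuals sum to zero (up to floating-point error), so points above and below the line balance out.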

The residuals in the plot below are fairly consistent in spread, although a few on the left-center side of the graph are farther from zero. Girth isn’t a great predictor of height, but it’s not the worst. Additional predictors would likely make a better model.

plot(fitted(trees_lm),resid(trees_lm))
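
The remark about additional predictors can be explored with the only other column in the data, Volume. This is purely an illustrative sketch: whether volume is a sensible predictor of height in practice is an assumption, and the model names fit_girth and fit_both are made up here.

```r
fit_girth <- lm(Height ~ Girth, data = trees)
fit_both  <- lm(Height ~ Girth + Volume, data = trees)

# Adjusted R-squared penalizes the extra predictor
c(girth_only  = summary(fit_girth)$adj.r.squared,
  with_volume = summary(fit_both)$adj.r.squared)

# F-test for whether adding Volume improves the fit
anova(fit_girth, fit_both)
```

Plain R-squared can never decrease when a predictor is added, which is why the adjusted version and the F-test are the more informative comparison.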

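QQ Plots

If the residuals are normally distributed, the points fall close to the line, which supports the normal-errors assumption behind the t-tests and p-values above. Here the points diverge from the line on the left and especially on the right side of the graph, so the tails of the residual distribution depart from normality. Together with the low R-squared, this suggests that girth alone isn’t a good predictor.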

qqnorm(resid(trees_lm))
qqline(resid(trees_lm))
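
A formal test can complement the visual check. The sketch below applies the Shapiro-Wilk test from base R to the residuals; the model is refit with the data = form and sw is an illustrative name.

```r
fit <- lm(Height ~ Girth, data = trees)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal
sw <- shapiro.test(resid(fit))
sw
```

A small p-value would be evidence against normality. With only 31 observations the test has limited power, so it is best read alongside the QQ plot rather than instead of it.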