Week 11 Discussion

Tree Girth and Height Analysis

Let’s view the data we’re working with:

head(trees)

##   Girth Height Volume
## 1   8.3     70   10.3
## 2   8.6     65   10.3
## 3   8.8     63   10.2
## 4  10.5     72   16.4
## 5  10.7     81   18.8
## 6  10.8     83   19.7

Let’s see whether there appears to be a linear relationship between girth and height, i.e., whether a tree’s girth can predict its height.
It appears there might be a positive relationship, but it’s hard to tell how strong it is from the plot alone.

plot(trees$Girth,trees$Height)
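One way to put a number on "how strong" is Pearson’s correlation coefficient, computed with base R’s `cor()`. This is a minimal sketch; the name `girth_height_r` is just illustrative. For a regression with a single predictor, squaring r gives the same value as the model’s Multiple R-squared.

```r
# Pearson correlation between girth and height (trees ships with R's datasets package)
girth_height_r <- cor(trees$Girth, trees$Height)
girth_height_r

# With a single predictor, r^2 equals the regression's Multiple R-squared
girth_height_r^2
```

Values near 1 would indicate a strong positive linear relationship. Since the Multiple R-squared reported by summary() is 0.2697, r works out to about 0.52, which is a moderate relationship.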

Using the lm() function, we find that the fitted line is:
Height = 62.031 + 1.054 * Girth

# Fit a simple linear regression of height on girth
trees_lm <- lm(trees$Height ~ trees$Girth)
trees_lm

## 
## Call:
## lm(formula = trees$Height ~ trees$Girth)
## 
## Coefficients:
## (Intercept)  trees$Girth  
##      62.031        1.054
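The two coefficients can be reproduced by hand with the least-squares formulas slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). This is a small sketch, and the names `x`, `y`, `b0` and `b1` are illustrative.

```r
# Ordinary least squares with one predictor, computed directly
x <- trees$Girth
y <- trees$Height

b1 <- cov(x, y) / var(x)      # slope: covariance of x and y over variance of x
b0 <- mean(y) - b1 * mean(x)  # intercept: the line passes through (mean(x), mean(y))

round(c(intercept = b0, slope = b1), 3)

# Should agree with lm()'s coefficients
all.equal(unname(coef(lm(Height ~ Girth, data = trees))), c(b0, b1))
```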

Plots & Summary

Let’s add this fitted line to the plot:

plot(trees$Girth,trees$Height)
abline(trees_lm)
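The fitted line can also be used for prediction with `predict()`. One caveat applies: a formula written as `trees$Height ~ trees$Girth` ignores `newdata`, so predict() returns the 31 fitted values instead of predicting for the new tree. Refitting with a `data =` argument avoids this. This is a sketch; the girth value of 12 and the names `fit` and `new_tree` are arbitrary choices for illustration.

```r
# Refit with a data argument so predict() can find Girth in newdata
fit <- lm(Height ~ Girth, data = trees)

# Point prediction plus a 95% prediction interval for a hypothetical tree with girth 12
new_tree <- data.frame(Girth = 12)
pred <- predict(fit, newdata = new_tree, interval = "prediction", level = 0.95)
pred

# The point prediction is just the fitted line evaluated by hand
coef(fit)[["(Intercept)"]] + coef(fit)[["Girth"]] * 12
```

Because the points scatter widely around the line, expect the prediction interval to be wide.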

summary(trees_lm)

## 
## Call:
## lm(formula = trees$Height ~ trees$Girth)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.5816  -2.7686   0.3163   2.4728   9.9456 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  62.0313     4.3833  14.152 1.49e-14 ***
## trees$Girth   1.0544     0.3222   3.272  0.00276 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.538 on 29 degrees of freedom
## Multiple R-squared:  0.2697, Adjusted R-squared:  0.2445 
## F-statistic: 10.71 on 1 and 29 DF,  p-value: 0.002758

Now we’ll look more into residual analysis.

* The residuals look roughly symmetric around the median, which is close to 0 (0.3163), and the first and third quartiles are similar in size (-2.7686 and 2.4728). This is consistent with normally distributed residuals, which is good.
* To gauge how precisely the slope is estimated, we check whether the standard error is roughly 5 to 10 times smaller than the corresponding coefficient. Here the ratio is 1.0544 / 0.3222 = 3.27 (this is the t value), which is low, so there is considerable uncertainty in the slope estimate a1.
* Finally, we check Pr(>|t|), the p-value for each coefficient. It is the probability of getting a t value at least this extreme if the true coefficient were 0, i.e., if the term contributed nothing. For girth the p-value is 0.00276, well below 0.05, so girth is a statistically significant predictor of height. The p-value for the intercept is even smaller, at 1.49e-14.
* The R-squared value is fairly low. The Multiple R-squared is 0.2697, so the model explains only about 27% of the variability in height (the Adjusted R-squared, which accounts for the number of predictors, is 0.2445). More predictors would likely make this model better.
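Each quantity discussed above can be recomputed from the fitted model:

* the t value is the estimate divided by its standard error;
* the p-value comes from a t distribution with n - 2 = 29 degrees of freedom;
* R-squared compares residual variation to total variation.

This sketch refits the same model so it runs on its own. The intermediate variable names are illustrative.

```r
fit <- lm(Height ~ Girth, data = trees)
n   <- nrow(trees)                     # 31 trees
res <- residuals(fit)

# Residual five-number summary (the "Residuals:" line of summary())
quantile(res)

# t value for the slope = estimate / standard error
coefs   <- coef(summary(fit))
t_slope <- coefs["Girth", "Estimate"] / coefs["Girth", "Std. Error"]

# Two-sided p-value: chance of a |t| at least this large if the true slope were 0
p_slope <- 2 * pt(abs(t_slope), df = n - 2, lower.tail = FALSE)

# R-squared = 1 - SSE / SST; the adjusted version penalizes the extra parameter
sse    <- sum(res^2)
sst    <- sum((trees$Height - mean(trees$Height))^2)
r2     <- 1 - sse / sst
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - 2)

round(c(t = t_slope, p = p_slope, R2 = r2, adjR2 = adj_r2), 4)
```

Squaring the t value (3.272^2 is about 10.71) gives the F-statistic reported at the bottom of summary(), as expected for a model with a single predictor.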

Now let’s look at the residuals:
A residual is how far a data point lies from the regression line: the observed value minus the fitted value. A negative residual means the data point is below the regression line.
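In code, a residual is simply observed height minus fitted height. This short check refits the same model so the snippet runs on its own; `manual_resid` is an illustrative name.

```r
fit <- lm(Height ~ Girth, data = trees)

# Residual = observed value - value predicted by the line
manual_resid <- trees$Height - fitted(fit)

# Should match what residuals() returns
all.equal(unname(manual_resid), unname(residuals(fit)))

# Which trees sit below the line (negative residuals)?
which(manual_resid < 0)

# With an intercept in the model, least-squares residuals sum to (numerically) zero
sum(manual_resid)
```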

Most residuals in our plot fall within a fairly consistent band around zero, although a few points on the left-center side of the graph are farther off. Girth isn’t a great predictor of height, but it isn’t the worst either. Additional predictors would likely make a better model.

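# Residuals vs. fitted values, with a dashed reference line at zero
plot(fitted(trees_lm), resid(trees_lm), xlab = "Fitted height", ylab = "Residual")
abline(h = 0, lty = 2)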

QQ Plots
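If the residuals are normally distributed, the points in a QQ plot fall close to the reference line, which supports the model’s normality assumption. Here the points diverge from the line on the left and especially on the right side of the graph, so the residuals may not be normal. Together with the low R-squared, this suggests girth alone isn’t a good predictor.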

qqnorm(resid(trees_lm))
qqline(resid(trees_lm))
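The QQ plot is a visual check. A formal test such as Shapiro-Wilk (`shapiro.test()` in base R’s stats package) can complement it, but treat it as a supplement only. With just 31 points it has limited power, and a non-significant result does not prove the residuals are normal. This is a sketch, and `sw` is an illustrative name.

```r
fit <- lm(Height ~ Girth, data = trees)

# Shapiro-Wilk test; the null hypothesis is that the residuals are normally distributed
sw <- shapiro.test(residuals(fit))
sw$statistic  # W close to 1 is consistent with normality
sw$p.value    # a small p-value (e.g. < 0.05) is evidence against normality
```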

Devin Teran

11/5/2020