Today we delved deeper into multiple linear regression and can now run hypothesis tests and construct confidence intervals using these models. The trees data set in R has three quantitative variables, which makes it a good way to show how to use these tools.

data(trees)
attach(trees)
summary(trees)
##      Girth           Height       Volume     
##  Min.   : 8.30   Min.   :63   Min.   :10.20  
##  1st Qu.:11.05   1st Qu.:72   1st Qu.:19.40  
##  Median :12.90   Median :76   Median :24.20  
##  Mean   :13.25   Mean   :76   Mean   :30.17  
##  3rd Qu.:15.25   3rd Qu.:80   3rd Qu.:37.30  
##  Max.   :20.60   Max.   :87   Max.   :77.00

Let’s construct a model for the girth of a tree based on the height and volume.

tree.mod <- lm(Girth~Height+Volume)
tree.mod
## 
## Call:
## lm(formula = Girth ~ Height + Volume)
## 
## Coefficients:
## (Intercept)       Height       Volume  
##    10.81637     -0.04548      0.19518
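
To make the printed coefficients concrete, the fitted equation is estimated Girth = 10.81637 - 0.04548 × Height + 0.19518 × Volume. As a minimal sketch, the code below evaluates that equation for one tree and compares the result with predict. The object name `fit` and the example values (height 80, volume 30) are chosen here for illustration only, and the model is refit with `data = trees` so the block does not depend on attach.

```r
# Refit with an explicit data argument, so the sketch does not rely on attach().
fit <- lm(Girth ~ Height + Volume, data = trees)
b <- coef(fit)

# Hypothetical tree used only for illustration.
height <- 80
volume <- 30

# Evaluate the fitted equation term by term.
girth_hat <- b[["(Intercept)"]] + b[["Height"]] * height + b[["Volume"]] * volume
girth_hat

# predict() should return the same fitted value.
predict(fit, data.frame(Height = height, Volume = volume))
```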

Now we can perform a hypothesis test on the coefficient of each of our two predictors, height and volume. In each test, the null hypothesis is that the predictor’s coefficient equals 0, meaning that the predictor has no linear relationship with the response variable once the other predictor is in the model. The alternative hypothesis is that the coefficient is non-zero, meaning that the predictor does have a linear relationship with the response variable. By taking a summary of our model, we can get the test statistic and p-value for the test on each predictor. Under the null hypothesis, each test statistic follows a t-distribution with n - (k+1) degrees of freedom, with k being the number of predictor variables we use; here n = 31 and k = 2, giving 28 degrees of freedom. We’ll also assume an alpha level of .05.

summary(tree.mod)
## 
## Call:
## lm(formula = Girth ~ Height + Volume)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.34288 -0.56696 -0.08628  0.80283  1.11642 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.81637    1.97320   5.482 7.45e-06 ***
## Height      -0.04548    0.02826  -1.609    0.119    
## Volume       0.19518    0.01096  17.816  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7904 on 28 degrees of freedom
## Multiple R-squared:  0.9408, Adjusted R-squared:  0.9366 
## F-statistic: 222.5 on 2 and 28 DF,  p-value: < 2.2e-16
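
The t value and Pr(>|t|) columns in this summary can be reproduced by hand, which shows where the numbers come from: each test statistic is the estimate divided by its standard error, and the p-value is the two-sided tail area of a t-distribution with the residual degrees of freedom. The following is a rough sketch of that calculation; the names `fit`, `coefs`, `t_stat`, and `p_val` are introduced here for illustration.

```r
# Refit the model with an explicit data argument so the block is self-contained.
fit <- lm(Girth ~ Height + Volume, data = trees)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|).
coefs <- coef(summary(fit))

# Residual degrees of freedom: n - (k + 1).
df_resid <- df.residual(fit)

# Test statistic for H0: coefficient = 0 is estimate / standard error.
t_stat <- coefs[, "Estimate"] / coefs[, "Std. Error"]

# Two-sided p-value from the t-distribution.
p_val <- 2 * pt(abs(t_stat), df = df_resid, lower.tail = FALSE)

cbind(t_stat, p_val)
```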

For the height coefficient, we have a test statistic of -1.609, leading to a p-value of 0.119. This means that if the null hypothesis were true, we would get a beta estimate at least as far from 0 as -.04548 about 11.9 percent of the time. Since 0.119 is greater than our alpha of .05, this is not enough evidence to reject the null hypothesis. For the volume coefficient, we have a test statistic of 17.816, leading to a very small p-value (less than 2e-16), so it would be extremely unlikely to obtain an estimate this far from 0 if the null hypothesis were true, and we reject the null hypothesis in favor of the alternative. Another thing we learned is how to create confidence intervals for our beta coefficients. In R this is very simple. We’ll use a confidence level of 95% because I like that number.

confint(tree.mod, level = .95)
##                  2.5 %      97.5 %
## (Intercept)  6.7744625 14.85827852
## Height      -0.1033758  0.01240879
## Volume       0.1727389  0.21762057
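
These limits can also be built by hand as estimate ± t critical value × standard error, using the same 28 residual degrees of freedom as the hypothesis tests. Here is a short sketch under that formula; `fit`, `t_crit`, `lower`, and `upper` are illustrative names, not part of the original analysis.

```r
# Refit the model with an explicit data argument so the block is self-contained.
fit <- lm(Girth ~ Height + Volume, data = trees)
coefs <- coef(summary(fit))

# Critical value for a 95% two-sided interval.
alpha <- 0.05
t_crit <- qt(1 - alpha / 2, df = df.residual(fit))

# Interval limits: estimate -/+ critical value * standard error.
lower <- coefs[, "Estimate"] - t_crit * coefs[, "Std. Error"]
upper <- coefs[, "Estimate"] + t_crit * coefs[, "Std. Error"]

# Should agree with confint(fit, level = .95).
cbind(lower, upper)
```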

The meaning of these intervals is as follows. We are 95% confident that the true beta coefficient for height is between -0.1033758 and 0.01240879 if we hold volume constant. This interval contains 0, which agrees with our failure to reject the null hypothesis for height. Similarly, we are 95% confident that the true beta coefficient for volume is between 0.1727389 and 0.21762057 if we hold height constant. We can also create confidence intervals for the mean value of girth (y), based on given values of our predictors. To do this we create a data frame with specific values of our predictors and then use the predict function. Again I’ll use a confidence level of 95% because I want to.

newdata <- data.frame(Height = 80, Volume = 30)
predict(tree.mod, newdata, interval = "confidence", level = .95)
##        fit      lwr      upr
## 1 13.03308 12.65992 13.40625

What this output means is that we are 95% confident that the mean girth of all black cherry trees with a height of 80 feet and a volume of 30 cubic feet is between 12.65992 and 13.40625 inches.
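
For anyone curious what predict is doing here, the interval for the mean response can be sketched by hand: the fitted value at the chosen predictor values, plus or minus a t critical value times the standard error of that fitted value. The sketch below gets the standard error from the coefficient covariance matrix returned by vcov; the names `fit`, `x0`, `y_hat`, and `se_fit` are illustrative.

```r
# Refit the model with an explicit data argument so the block is self-contained.
fit <- lm(Girth ~ Height + Volume, data = trees)

# Predictor values, with a leading 1 for the intercept: (1, Height, Volume).
x0 <- c(1, 80, 30)

# Fitted mean girth at x0.
y_hat <- sum(x0 * coef(fit))

# Standard error of the fitted mean: sqrt(x0' V x0), V = covariance of the estimates.
se_fit <- sqrt(drop(t(x0) %*% vcov(fit) %*% x0))

# 95% confidence interval for the mean response.
t_crit <- qt(0.975, df = df.residual(fit))
c(fit = y_hat, lwr = y_hat - t_crit * se_fit, upr = y_hat + t_crit * se_fit)
```

The result should line up with the fit, lwr, and upr columns from predict with interval = "confidence".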