Multiple Linear Regression Models

In this document we will create a multiple linear regression (MLR) model. First, we will test the significance of each of our predictors. Then we will construct confidence intervals for our model coefficients, a confidence interval for the mean value of our response, and a prediction interval for an individual value of our response.

For this document we will be using the iris data set. Our response variable will be petal width (Petal.Width), and our predictors will be petal length (Petal.Length) and sepal length (Sepal.Length).

First, let’s call and attach our data.

library(MASS)
data(iris)
attach(iris)

Next, let’s create a multiple linear regression model of our data.

mod.iris <- lm(Petal.Width ~ Petal.Length + Sepal.Length)

Significance of Predictors

Now, we can test the significance of each predictor using two hypotheses. Our null hypothesis states that, after accounting for the other predictor, the predictor of interest does not have a linear relationship with the response. In other words, our predictor of interest has a coefficient of 0. Our alternative hypothesis states that, after accounting for the other predictor, our predictor of interest does have a linear relationship with our response. Equivalently, our predictor of interest has a coefficient not equal to 0. If our test shows that a variable is significant, we reject the null hypothesis in favor of the alternative. We can use the summary function to test the significance of our predictors.

summary(mod.iris)
## 
## Call:
## lm(formula = Petal.Width ~ Petal.Length + Sepal.Length)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.60598 -0.12560 -0.02049  0.11616  0.59404 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -0.008996   0.182097  -0.049   0.9607    
## Petal.Length  0.449376   0.019365  23.205   <2e-16 ***
## Sepal.Length -0.082218   0.041283  -1.992   0.0483 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2044 on 147 degrees of freedom
## Multiple R-squared:  0.929,  Adjusted R-squared:  0.9281 
## F-statistic: 962.1 on 2 and 147 DF,  p-value: < 2.2e-16

Our output tells us a lot about our model, including the estimates for each of our parameters. The estimate for our intercept is -0.008996, but this can’t meaningfully be interpreted because it doesn’t make sense for an iris to ever have a petal length or sepal length of 0. Plus, there are no observations in our data with a petal length or sepal length equal to or near 0. The summary also gives us the estimate for our petal length coefficient, which is 0.449376. This means that if sepal length is held constant, for every 1 centimeter increase in petal length, we would expect to see a 0.449376 centimeter increase in petal width on average. Also, our estimate for our sepal length coefficient is -0.082218. This means that if petal length is held constant, for every 1 centimeter increase in sepal length we would expect to see a 0.082218 centimeter decrease in petal width on average.
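To make the "held constant" interpretation concrete, here is a minimal sketch that compares the model's predictions for two flowers differing only in petal length. The names fit and flowers, and the specific measurements (petal lengths of 4cm. and 5cm., sepal length of 6cm.), are assumptions chosen for illustration; the model is refit with data = iris so the block runs on its own.

```r
# Refit the model from this document; `fit` is a name chosen for this sketch.
fit <- lm(Petal.Width ~ Petal.Length + Sepal.Length, data = iris)

# Two hypothetical flowers that differ only in petal length (by 1 cm).
flowers <- data.frame(Petal.Length = c(4, 5), Sepal.Length = c(6, 6))
pred <- predict(fit, newdata = flowers)

# The gap between the two predictions equals the Petal.Length coefficient.
slope_from_predictions <- unname(diff(pred))
slope_from_model <- unname(coef(fit)["Petal.Length"])
print(c(from_predictions = slope_from_predictions, from_model = slope_from_model))
```

Any other pair of flowers with the same sepal length and petal lengths 1cm. apart would give the same difference, because the model is linear in each predictor.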

Getting back to our hypotheses, the summary also tells us whether or not our predictors are significant. The Pr(>|t|) column gives the p-value for each coefficient's test. We see that our p-value for petal length is <2e-16, which is very small and close to 0. Since we have such a small p-value we can conclude that petal length is a significant predictor of petal width. In other words, with sepal length held constant, petal length does help us estimate petal width. Thus, we can reject the null hypothesis in favor of the alternative for this predictor. Sepal length, on the other hand, is not as significant, but, depending on our significance level, we may still decide to reject the null hypothesis. Our p-value for sepal length is 0.0483. If we chose a significance level of 0.05 (95% confidence) or higher, we would reject our null in favor of the alternative and conclude that sepal length is significant in estimating petal width. However, if we wanted a stricter significance level, such as 0.01 (99% confidence), we would not reject our null; we would conclude that there is not sufficient evidence to reject the null hypothesis.
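The t values and p-values in the summary can be reproduced by hand, which may help show where the decisions come from. The following is a sketch, not part of the original analysis: the names fit, coef_table, t_stat, p_val, and decisions are chosen for illustration, and the two cutoffs are the 0.05 and 0.01 significance levels (95% and 99% confidence).

```r
# Refit the model from this document so the block is self-contained.
fit <- lm(Petal.Width ~ Petal.Length + Sepal.Length, data = iris)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|).
coef_table <- coef(summary(fit))

# Each t statistic is the estimate divided by its standard error.
t_stat <- coef_table[, "Estimate"] / coef_table[, "Std. Error"]

# Two-sided p-value from a t distribution with the residual degrees of freedom.
p_val <- 2 * pt(abs(t_stat), df = df.residual(fit), lower.tail = FALSE)

# Compare each p-value with two candidate significance levels.
decisions <- data.frame(
  t_stat = t_stat,
  p_value = p_val,
  reject_at_0.05 = p_val < 0.05,
  reject_at_0.01 = p_val < 0.01
)
print(decisions)
```

With the p-values shown in the summary above, petal length is rejected at both levels, while sepal length is rejected at 0.05 but not at 0.01.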

Confidence Intervals for Coefficients

Next, let’s construct confidence intervals for each of our coefficients in our MLR model. Let’s choose a confidence level of 95%. We can use the confint command to do so.

confint(mod.iris)
##                   2.5 %        97.5 %
## (Intercept)  -0.3688623  0.3508703167
## Petal.Length  0.4111061  0.4876461035
## Sepal.Length -0.1638030 -0.0006326179

Here our output gives us the lower and upper bounds of a 95% confidence interval for each of our coefficients. For example, our confidence interval for the petal length coefficient is [0.4111061, 0.4876461035]. This means that we are 95% confident that, for any given sepal length, a 1 centimeter increase in petal length is associated with an average increase in petal width of between 0.4111061cm. and 0.4876461035cm. Note that this describes an association in the data, not proof that changing petal length causes petal width to change.
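For a linear model, each of these intervals is the estimate plus or minus a t critical value times the standard error. A minimal sketch of that calculation follows; the names fit, est, se, t_crit, and manual_ci are chosen for illustration, and the model is refit with data = iris so the block runs on its own.

```r
# Refit the model from this document so the block is self-contained.
fit <- lm(Petal.Width ~ Petal.Length + Sepal.Length, data = iris)

est <- coef(fit)              # coefficient estimates
se <- sqrt(diag(vcov(fit)))   # standard errors of the coefficients

# Critical value for a two-sided 95% interval on the residual degrees of freedom.
level <- 0.95
t_crit <- qt(1 - (1 - level) / 2, df = df.residual(fit))

# estimate +/- critical value * standard error
manual_ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(manual_ci)
```

The result should agree with confint(fit) up to rounding. Changing level to another value, such as 0.99, widens the intervals.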

Confidence Interval for the Mean

Now, let’s calculate a 97% confidence interval for the mean petal width, given that petal length is 1.5cm. and the sepal length is 4.5cm. First we need to create a new data frame that holds the predictor values we are interested in: a petal length of 1.5cm. and a sepal length of 4.5cm. Then we can pass it to predict to create a confidence interval for the mean petal width.

new.iris <- data.frame(Petal.Length = 1.5, Sepal.Length = 4.5)
predict(mod.iris, new.iris, interval = "confidence", level = .97)
##        fit       lwr       upr
## 1 0.295088 0.2244786 0.3656975

The output gives us the estimated value of the mean petal width, given that petal length is 1.5cm. and the sepal length is 4.5cm. It also gives us the lower and upper bounds of a 97% confidence interval for the mean petal width. The output tells us that we can be 97% confident that the mean petal width for irises with a petal length of 1.5cm. and a sepal length of 4.5 cm. is between 0.2244786cm. and 0.3656975cm.
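The same interval can be rebuilt from the coefficient estimates and their covariance matrix. This is a sketch under the assumption that the predictor vector x0 is ordered like coef(fit) (intercept, petal length, sepal length); the names fit, x0, fit_mean, se_mean, and ci_mean are chosen for illustration.

```r
# Refit the model from this document so the block is self-contained.
fit <- lm(Petal.Width ~ Petal.Length + Sepal.Length, data = iris)

# Predictor values in the same order as coef(fit):
# intercept term, Petal.Length, Sepal.Length.
x0 <- c(1, 1.5, 4.5)

# Point estimate of the mean response at x0.
fit_mean <- sum(x0 * coef(fit))

# Standard error of the estimated mean: sqrt(x0' V x0), with V = vcov(fit).
se_mean <- sqrt(drop(t(x0) %*% vcov(fit) %*% x0))

# Critical value for a two-sided 97% interval.
t_crit <- qt(1 - (1 - 0.97) / 2, df = df.residual(fit))

ci_mean <- c(fit = fit_mean,
             lwr = fit_mean - t_crit * se_mean,
             upr = fit_mean + t_crit * se_mean)
print(ci_mean)
```

The three values should match the fit, lwr, and upr columns returned by predict with interval = "confidence" and level = .97.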

Prediction Interval for an Individual Value

Lastly, we can create a 97% prediction interval for the petal width of an individual iris, given that petal length is 1.5cm. and the sepal length is 4.5cm. We can use the same new data frame that we created for our confidence interval, but this time we tell R that we want a prediction interval.

predict(mod.iris, new.iris, interval = "prediction", level = .97)
##        fit        lwr       upr
## 1 0.295088 -0.1584564 0.7486324

The output gives us the point estimate for the petal width of an individual iris with petal length of 1.5cm. and sepal length of 4.5cm. (notice that this matches the estimate for the mean petal width calculated with our confidence interval for the mean). The output also gives us the lower and upper bounds of the interval for the petal width of an individual iris. Our prediction interval tells us that we can be 97% confident that if an iris has a petal length of 1.5cm. and a sepal length of 4.5cm., its petal width will be between -0.1584564cm. and 0.7486324cm. Notice that it doesn’t make sense for petal width to ever be negative, so this negative lower bound could be a sign that not all of the linear regression assumptions are met. Also, remember that our prediction interval is wider than our confidence interval because an individual value of our response varies around the mean, so its variance includes the residual variance in addition to the uncertainty in the estimated mean.
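The extra width of the prediction interval can be seen by building it by hand: the standard error for an individual value combines the standard error of the estimated mean with the residual standard error. The sketch below assumes the names fit, new_flower, pred, se_pred, and pi_manual, which are chosen for illustration.

```r
# Refit the model from this document so the block is self-contained.
fit <- lm(Petal.Width ~ Petal.Length + Sepal.Length, data = iris)
new_flower <- data.frame(Petal.Length = 1.5, Sepal.Length = 4.5)

# se.fit = TRUE also returns the standard error of the estimated mean,
# the residual degrees of freedom, and the residual standard error.
pred <- predict(fit, new_flower, se.fit = TRUE)
fit_val <- unname(pred$fit)
se_fit <- unname(pred$se.fit)

# Variance for an individual value = variance of the mean + residual variance.
se_pred <- sqrt(se_fit^2 + pred$residual.scale^2)

# Critical value for a two-sided 97% interval.
t_crit <- qt(1 - (1 - 0.97) / 2, df = pred$df)

pi_manual <- c(fit = fit_val,
               lwr = fit_val - t_crit * se_pred,
               upr = fit_val + t_crit * se_pred)
print(pi_manual)
```

Because se_pred is always larger than se_fit, the prediction interval is always wider than the confidence interval for the mean at the same predictor values and level.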