Multiple Linear Regression Model

I am going to set up a multiple linear regression model for the mussels data set we looked at earlier.

muss <- read.csv(url("http://cknudson.com/data/mussels.csv"))
attach(muss)

We looked at the effect of the average mass and whether or not the mussel group is attached to a rock on the release of ammonia. This creates the following model: \[\text{Average Ammonia}=\beta_0 + \beta_1 mass + \beta_2 I(attached=rock)\] In this case, the average ammonia is the response variable, and the average mass and attachment to a rock are the predictor variables. We can use the lm function to set up this model:

mymod<-lm(AvgAmmonia~AvgMass+attached, muss)

Hypothesis Test

We can conduct a hypothesis test to see whether or not \(\beta_1\) is equal to 0. Here are the hypotheses: \[H_0: \beta_1 = 0 \] \[H_A: \beta_1 \neq 0 \] We can use the summary function to see what the test stat and p-value are. These are going to determine whether or not we reject the null hypothesis.

summary(mymod)
## 
## Call:
## lm(formula = AvgAmmonia ~ AvgMass + attached, data = muss)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -2.019e-03 -5.240e-04 -5.959e-05  3.429e-04  2.526e-03 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.0011398  0.0005533   2.060     0.05 *  
## AvgMass       0.2392793  0.0215863  11.085 3.86e-11 ***
## attachedRock -0.0025629  0.0003931  -6.519 7.91e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.00103 on 25 degrees of freedom
## Multiple R-squared:  0.8574, Adjusted R-squared:  0.846 
## F-statistic: 75.18 on 2 and 25 DF,  p-value: 2.66e-11

Our test stat is 11.085, and our p-value is 3.86e-11. Since our p-value is much smaller than \(\alpha=0.05\), the null hypothesis is rejected in favor of the alternative hypothesis. So, we can say with 95% confidence that \(\beta_1 \neq 0\). Therefore, since it isn’t equal to zero, that means that there is a linear relationship between AvgAmmonia and AvgMass. Next, we can create a confidence interval for \(\beta_1\) using the confint function.

confint(mymod)
##                      2.5 %       97.5 %
## (Intercept)   1.999745e-07  0.002279427
## AvgMass       1.948215e-01  0.283737200
## attachedRock -3.372584e-03 -0.001753235

The 95% confidence interval for \(\beta_1\) is between 0.1948215 and 0.2837372. In other words, I am 95% confident that the value for \(\beta_1\) will fall in between 0.1948215 and 0.2837372 regardless of whether the mussels are attached to a rock. It can also be stated that 95% of the time, the value for \(\beta_1\) will fall within this interval no matter what the value for I(attached=rock) is.

Confidence Interval and Prediction Intervals for certain values of x

To calculate confident intervals and prediction intervals, we need to use a specific value of x. We can take a look at the data set, and find a value of the average mass to analyze.

muss
##    GroupID dry.mass count attached lipid protein carbo  ash Kcal ammonia
## 1        1     0.55    20     Rock  8.14   47.43 21.59 5.51 3.61    0.07
## 2        2     0.45    19     Rock  9.34   53.89 23.41 6.34 4.06    0.07
## 3        3     0.37    20     Rock  9.12   49.01 21.10 5.63 3.74    0.07
## 4        4     0.63    20     Rock 10.32   49.25 16.55 5.41 3.66    0.11
## 5        5     0.57    20     Rock 10.08   50.17 17.51 6.10 3.72    0.11
## 6        6     0.57    22     Rock 10.83   53.84 19.97 6.36 4.04    0.11
## 7        7     0.56    20     Rock  9.37   47.75 28.95 5.20 4.03    0.11
## 8        8     0.54    20     Rock  9.44   62.99 22.08 5.58 4.41    0.10
## 9        9     0.55    21     Rock  9.84   53.89 24.61 5.77 4.15    0.10
## 10      10     0.55    20     Rock 10.18   55.34 29.89 6.09 4.46    0.10
## 11      11     0.43    20     Rock 10.75   49.92 25.06   NA 4.08    0.08
## 12      12     0.51    19     Rock  9.98   54.16 20.52 5.55 4.01    0.09
## 13      13     0.34    20     Rock 10.44   58.43 21.85 5.44 4.29    0.07
## 14      14     0.34    20     Rock 10.24   53.94 20.09 6.37 4.00    0.07
## 15      15     0.30    20     Rock  9.00   71.94 18.52 4.62 4.61    0.05
## 16      16     0.83    34  Amblema 10.33   62.63 14.51 6.03 4.16    0.21
## 17      17     0.49    31  Amblema  7.91   59.62 17.40 5.42 3.94    0.09
## 18      18     0.46    72  Amblema  8.28   56.67 15.76 5.17 3.78    0.18
## 19      19     0.44    23  Amblema  6.82   53.20  9.97 5.39 3.27    0.14
## 20      20     1.16    27  Amblema  9.07   62.99 15.94 6.45 4.12    0.35
## 21      22     1.10    23  Amblema  6.50   59.36 22.95 6.57 4.04    0.25
## 22      23     1.07    42  Amblema  8.13   56.64 18.39 5.16 3.87    0.41
## 23      24     0.36    38  Amblema  8.27   43.49 21.10 4.91 3.43    0.11
## 24      25     0.47    38  Amblema 10.30   54.12 16.81 5.33 3.88    0.13
## 25      27     0.43    26  Amblema  9.53   52.88 15.60 5.55 3.71    0.12
## 26      28     0.89    28  Amblema  8.19   60.14    NA 5.53   NA    0.25
## 27      29     0.70    31  Amblema  9.44   48.96 20.83 5.76 3.75    0.26
## 28      30     0.68    64  Amblema  8.04      NA 17.89   NA   NA    0.23
##      O2  AvgAmmonia   AvgO2  AvgMass
## 1  0.82 0.003500000 0.04100 0.027500
## 2  0.70 0.003684210 0.03684 0.023684
## 3  0.62 0.003500000 0.03100 0.018500
## 4  0.89 0.005500000 0.04450 0.031500
## 5  1.09 0.005500000 0.05450 0.028500
## 6  1.00 0.005000000 0.04545 0.025909
## 7  0.92 0.005500000 0.04600 0.028000
## 8  0.92 0.005000000 0.04600 0.027000
## 9  0.93 0.004761905 0.04429 0.026190
## 10 0.92 0.005000000 0.04600 0.027500
## 11 0.77 0.004000000 0.03850 0.021500
## 12 0.80 0.004736842 0.04211 0.026842
## 13 0.57 0.003500000 0.02850 0.017000
## 14 0.62 0.003500000 0.03100 0.017000
## 15 0.38 0.002500000 0.01900 0.015000
## 16 1.93 0.006176471 0.05676 0.024412
## 17 0.84 0.002903226 0.02710 0.015806
## 18 1.05 0.002500000 0.01458 0.006389
## 19 1.05 0.006086956 0.04565 0.019130
## 20 2.39 0.012962963 0.08852 0.042963
## 21 2.07 0.010869565 0.09000 0.047826
## 22 2.77 0.009761905 0.06595 0.025476
## 23 0.79 0.002894737 0.02079 0.009474
## 24 0.97 0.003421053 0.02553 0.012368
## 25 1.24 0.004615385 0.04769 0.016538
## 26 2.68 0.008928571 0.09571 0.031786
## 27 2.26 0.008387097 0.07290 0.022581
## 28 2.05 0.003593750 0.03203 0.010625

I chose 0.02 as the avg mass to analyze. Any value within the cloud of data is valid. I am also looking at mussels that are attached to a rock

newdata<- data.frame(attached = "Rock", AvgMass = 0.02)

(confy<-predict(mymod, newdata, interval = "confidence"))
##           fit         lwr         upr
## 1 0.003362491 0.002785026 0.003939956
(predy<-predict(mymod, newdata, interval = "prediction"))
##           fit         lwr         upr
## 1 0.003362491 0.001163617 0.005561366

The confidence interval is [0.002785026,0.003939956]. This means that with 95% confidence, for mussels at mass 0.02 and are attached to a rock, the mean AvgAmmonia output is between 0.002785026 and 0.003939956. The prediction interval is [0.001163617,0.005561366]. This means that with 95% confidence, for mussels at a mass of 0.02 and are attached to a rock, the individual AvgAmonnia output is between 0.001163617 and 0.005561366. Overall, the prediction interval is going to be wider than the confidence interval since it is for the individual value of output, as opposed to the mean value, which is going to have a lower variance due to the variance being divided by n observations.