Learning Log #6 Using Mussels Data

First we find and attach our data.

mussels <- read.csv("http://cknudson.com/data/mussels.csv")
attach(mussels)

We want to look at testing the significance of two variables that do not have the same slope. We can test the linear relationship with the response (avg ammonia), after accounting for all other predictors in the model (avg mass and what it is attached to). #Creating the Linear Model Using a summary of the data, we can fill in the coefficients in the formula \(\hat{y_i}= \hat{\beta_0}+\hat{\beta_1} x_1+\hat{\beta_2} x_2\)

MusselsModel = lm(AvgAmmonia ~ AvgMass + attached, mussels)
summary(MusselsModel)

## 
## Call:
## lm(formula = AvgAmmonia ~ AvgMass + attached, data = mussels)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -2.019e-03 -5.240e-04 -5.959e-05  3.429e-04  2.526e-03 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.0011398  0.0005533   2.060     0.05 *  
## AvgMass       0.2392793  0.0215863  11.085 3.86e-11 ***
## attachedRock -0.0025629  0.0003931  -6.519 7.91e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.00103 on 25 degrees of freedom
## Multiple R-squared:  0.8574, Adjusted R-squared:  0.846 
## F-statistic: 75.18 on 2 and 25 DF,  p-value: 2.66e-11

From this summary, we can tell we have a very small p-value. Our \({\beta_0}\) is .0011398, our \({\beta_1}\) is .2392793, and our \({\beta_2}\) -.0025629. In a model with mulitple predictors, you must fix one of the variables to interpret the other’s coefficient. So, when we interpret the coefficients, if the mass is increased by 1, we would expect the average ammonia to be increased by .2392793, ceteris paribus. We can also look at the test statistic to create a hypothesis and decide if we want to reject the null. The general test statistic for testing the significance of independent variables is \({t} = \hat{\beta_i}/(SE)\), which, under \({H_0}\), is a t distribution with n-(k+1) degrees of freedom. We have 25 degrees of freedom here. For a t value of 11.085, our p value is very small (3.86e-11) which means we should reject the null of Beta equal to 0. If we had a pvalue for one of the coefficiencts that was large, we would want to not reject the null of beta equal to zero and remove that from the model. #Confidence and Prediction Intervals Confidence and prediction intervals both tell us the point estimate, then give us lower and upper boundaries. Confidence intervals provide an estimate for a mean value of y, whereas prediction intervals provide boundary estimates for an individual value. The individual boundaries deviate more from the mean than a mean value, so prediction intervals are larger. Let’s confirm that this is true for our data set.

musselsnewdat <-data.frame(AvgMass=.2,attached="Rock")
conf <-predict(MusselsModel, musselsnewdat, interval = "confidence")
conf

##          fit        lwr       upr
## 1 0.04643277 0.03859384 0.0542717

pred <- predict(MusselsModel, musselsnewdat, interval = "prediction")
pred

##          fit        lwr        upr
## 1 0.04643277 0.03831178 0.05455376

They are very close, but we can see that the prediction interval is slightly larger. We can also create confidence intervals for the coefficients to see how much deviation of coefficients there is in our data.

confint(MusselsModel)

##                      2.5 %       97.5 %
## (Intercept)   1.999745e-07  0.002279427
## AvgMass       1.948215e-01  0.283737200
## attachedRock -3.372584e-03 -0.001753235

Learning Log #6 Using Mussels Data

Caitlyn Rome

February 22, 2018