Include a test and CI for a regression coefficient of either the mussels data or another data set. Include the full process (hypotheses, test statistic, p-value, conclusion in context of the problem) for the hypothesis test. Include the CI interpretation. (Good data set: mosaic.)

Learning Log 6

mussels <- read.csv(url("http://cknudson.com/data/mussels.csv"))
head(mussels)
##   GroupID dry.mass count attached lipid protein carbo  ash Kcal ammonia
## 1       1     0.55    20     Rock  8.14   47.43 21.59 5.51 3.61    0.07
## 2       2     0.45    19     Rock  9.34   53.89 23.41 6.34 4.06    0.07
## 3       3     0.37    20     Rock  9.12   49.01 21.10 5.63 3.74    0.07
## 4       4     0.63    20     Rock 10.32   49.25 16.55 5.41 3.66    0.11
## 5       5     0.57    20     Rock 10.08   50.17 17.51 6.10 3.72    0.11
## 6       6     0.57    22     Rock 10.83   53.84 19.97 6.36 4.04    0.11
##     O2 AvgAmmonia   AvgO2  AvgMass
## 1 0.82 0.00350000 0.04100 0.027500
## 2 0.70 0.00368421 0.03684 0.023684
## 3 0.62 0.00350000 0.03100 0.018500
## 4 0.89 0.00550000 0.04450 0.031500
## 5 1.09 0.00550000 0.05450 0.028500
## 6 1.00 0.00500000 0.04545 0.025909
names(mussels)
##  [1] "GroupID"    "dry.mass"   "count"      "attached"   "lipid"     
##  [6] "protein"    "carbo"      "ash"        "Kcal"       "ammonia"   
## [11] "O2"         "AvgAmmonia" "AvgO2"      "AvgMass"
attach(mussels)

The linear model I will use describes the average ammonia (AvgAmmonia) as a function of what the mussel is attached to (attached) and its average mass (AvgMass).

musmod <- lm(AvgAmmonia ~ AvgMass+attached)
musmod
## 
## Call:
## lm(formula = AvgAmmonia ~ AvgMass + attached)
## 
## Coefficients:
##  (Intercept)       AvgMass  attachedRock  
##     0.001140      0.239279     -0.002563

From the linear model we can see the fitted line is: AvgAmmonia = .001140 + .239279 AvgMass - .002563 I(attached = Rock) + ε. Interpreting each variable in turn: for average mass, every one-unit increase in average mass (in whatever unit it is measured) increases the average ammonia by .2393 units, holding all else constant. For attached: if the mussel is attached to a rock, it releases .002563 units less ammonia, on average, than a mussel of the same average mass attached to an Amblema.
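
As a rough check on that interpretation, the fitted line can be evaluated by hand. This is only a sketch: the coefficients are copied from the lm() output above, the helper fitted_ammonia() is a hypothetical name introduced here, and the mass of 0.0275 is just an illustrative value.

```r
# Coefficients copied from the lm() output above.
b0     <- 0.001140    # intercept
b_mass <- 0.239279    # slope for AvgMass
b_rock <- -0.002563   # shift when attached = "Rock"

# Hypothetical helper (not part of the original analysis):
# evaluates the fitted line for a given average mass and attachment.
fitted_ammonia <- function(avg_mass, on_rock) {
  b0 + b_mass * avg_mass + b_rock * as.numeric(on_rock)
}

# Same illustrative mass, two attachment types.
rock_fit    <- fitted_ammonia(0.0275, on_rock = TRUE)
amblema_fit <- fitted_ammonia(0.0275, on_rock = FALSE)

# The gap between the two fits is exactly the attachedRock coefficient.
rock_fit - amblema_fit
```

Holding the mass fixed and switching only the attachment isolates the attachedRock coefficient, which is what "holding all else constant" means here.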

Next, we will conduct a hypothesis test to see whether the average mass of a mussel is a useful predictor of the average ammonia output. The null hypothesis is that beta(1), the coefficient of AvgMass, equals 0. The alternative hypothesis is that beta(1) ≠ 0. To carry out the test we use the summary() function.

summary(musmod)
## 
## Call:
## lm(formula = AvgAmmonia ~ AvgMass + attached)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -2.019e-03 -5.240e-04 -5.959e-05  3.429e-04  2.526e-03 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.0011398  0.0005533   2.060     0.05 *  
## AvgMass       0.2392793  0.0215863  11.085 3.86e-11 ***
## attachedRock -0.0025629  0.0003931  -6.519 7.91e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.00103 on 25 degrees of freedom
## Multiple R-squared:  0.8574, Adjusted R-squared:  0.846 
## F-statistic: 75.18 on 2 and 25 DF,  p-value: 2.66e-11

Because we are looking at AvgMass, the test statistic is 11.085. This is found by dividing the estimate of beta(1) by its standard error. The p-value is 3.86e-11, which is less than our alpha (.05), so we reject the null hypothesis. We have strong evidence of a linear relationship between average ammonia output and average mass, after accounting for what the mussel is attached to.
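
The test statistic and p-value can be reproduced by hand from the summary() table. The sketch below assumes the rounded estimate, standard error, and residual degrees of freedom printed above, so the results may differ from R's in the last digits.

```r
# Values copied from the AvgMass row of summary(musmod).
est <- 0.2392793   # estimate of beta(1)
se  <- 0.0215863   # standard error of that estimate
df  <- 25          # residual degrees of freedom

# t statistic: estimate divided by its standard error.
t_stat <- est / se

# Two-sided p-value from the t distribution with 25 df.
p_val <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

t_stat
p_val
```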

Next, we will look at 90% confidence intervals for the coefficients of our linear model.

confint(musmod, level=.9)
##                        5 %         95 %
## (Intercept)   0.0001946405  0.002084987
## AvgMass       0.2024068478  0.276151813
## attachedRock -0.0032344372 -0.001891381

We can interpret the last 2 rows of this output. For average mass, we are 90% confident that each one-unit increase in average mass is associated with an increase in mean average ammonia of between .2024 and .2762 units, holding what the mussel is attached to constant. For attached, we are 90% confident that a mussel attached to a rock releases, on average, between .0019 and .0032 units less ammonia than a mussel of the same average mass attached to an Amblema.
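
The AvgMass interval can also be rebuilt from the summary() table as estimate plus or minus a t critical value times the standard error. This is a minimal sketch using the rounded values printed earlier; the variable names are my own.

```r
# Values copied from the AvgMass row of summary(musmod).
est <- 0.2392793
se  <- 0.0215863
df  <- 25

# A 90% interval leaves 5% in each tail, so use the 0.95 quantile.
t_crit <- qt(0.95, df = df)

lower <- est - t_crit * se
upper <- est + t_crit * se
c(lower, upper)   # should agree with the AvgMass row of confint() up to rounding
```

This also explains the "5 %" and "95 %" column labels in the confint() output: they are the tail quantiles used for a 90% interval.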

We can also compute confidence and prediction intervals at a specified value. We will choose a specific average mass (one that falls within the range of the data set) to use as our point.

summary(mussels)
##     GroupID         dry.mass          count          attached 
##  Min.   : 1.00   Min.   :0.3000   Min.   :19.00   Amblema:13  
##  1st Qu.: 7.75   1st Qu.:0.4375   1st Qu.:20.00   Rock   :15  
##  Median :14.50   Median :0.5450   Median :21.50               
##  Mean   :14.93   Mean   :0.5836   Mean   :27.79               
##  3rd Qu.:22.25   3rd Qu.:0.6425   3rd Qu.:31.00               
##  Max.   :30.00   Max.   :1.1600   Max.   :72.00               
##                                                               
##      lipid           protein          carbo            ash       
##  Min.   : 6.500   Min.   :43.49   Min.   : 9.97   Min.   :4.620  
##  1st Qu.: 8.250   1st Qu.:50.05   1st Qu.:17.11   1st Qu.:5.395  
##  Median : 9.405   Median :53.94   Median :20.09   Median :5.550  
##  Mean   : 9.210   Mean   :54.91   Mean   :19.96   Mean   :5.663  
##  3rd Qu.:10.195   3rd Qu.:58.90   3rd Qu.:21.96   3rd Qu.:6.075  
##  Max.   :10.830   Max.   :71.94   Max.   :29.89   Max.   :6.570  
##                   NA's   :1       NA's   :1       NA's   :2      
##       Kcal          ammonia             O2           AvgAmmonia      
##  Min.   :3.270   Min.   :0.0500   Min.   :0.3800   Min.   :0.002500  
##  1st Qu.:3.743   1st Qu.:0.0875   1st Qu.:0.7975   1st Qu.:0.003500  
##  Median :4.005   Median :0.1100   Median :0.9250   Median :0.004749  
##  Mean   :3.955   Mean   :0.1443   Mean   :1.2157   Mean   :0.005296  
##  3rd Qu.:4.110   3rd Qu.:0.1875   3rd Qu.:1.4125   3rd Qu.:0.005647  
##  Max.   :4.610   Max.   :0.4100   Max.   :2.7700   Max.   :0.012963  
##  NA's   :2                                                           
##      AvgO2            AvgMass        
##  Min.   :0.01458   Min.   :0.006389  
##  1st Qu.:0.03100   1st Qu.:0.016885  
##  Median :0.04439   Median :0.024048  
##  Mean   :0.04564   Mean   :0.023107  
##  3rd Qu.:0.04939   3rd Qu.:0.027500  
##  Max.   :0.09571   Max.   :0.047826  
## 

We will use the 3rd quartile of AvgMass, .0275, as the value in our new data frame.

newdata <- data.frame(AvgMass = .0275, attached = "Rock")
newdata
##   AvgMass attached
## 1  0.0275     Rock

We can now find prediction and confidence intervals for mussels that are attached to rocks and have an average mass of .0275.

predint <- predict(musmod, newdata, interval = "prediction")
predint
##           fit         lwr         upr
## 1 0.005157086 0.002960629 0.007353544

Our prediction interval tells us that, for a single new observation attached to a rock with an average mass of .0275, we are 95% sure that the average ammonia output will be between .00296 and .00735.

Next we will compute the confidence interval.

confident.int <- predict(musmod, newdata, interval = "confidence")
confident.int
##           fit         lwr         upr
## 1 0.005157086 0.004588894 0.005725278

Our confidence interval shows that, for mussels attached to a rock with an average mass of .0275, we are 95% confident that the mean average ammonia output is between .0046 and .0057.

In conclusion, a lot of the topics covered in the lesson today were also covered in simple linear regression. The main difference is remembering to say that all the other variables are held constant when interpreting each beta. The confidence and prediction intervals are centered at the same fitted value; again, the prediction interval is wider. This can be explained by what the intervals are measuring. The prediction interval accounts for the variability of just one new point (so it can go in a lot more directions), while the confidence interval accounts only for the variability of the estimated mean, which should have a lot less variability.
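
To make the width comparison concrete, here is a small sketch that uses only numbers printed earlier in this log (the two predict() outputs and the residual standard error). It assumes the usual relationship for lm(), where the prediction half-width is a t critical value times sqrt(se_fit^2 + sigma^2); because sigma is taken from rounded output, the match is approximate.

```r
# Interval endpoints copied from the predict() output above.
pred_lwr <- 0.002960629; pred_upr <- 0.007353544   # prediction interval
conf_lwr <- 0.004588894; conf_upr <- 0.005725278   # confidence interval

pred_width <- pred_upr - pred_lwr
conf_width <- conf_upr - conf_lwr
pred_width / conf_width   # how many times wider the prediction interval is

# Recover the standard error of the fitted mean from the confidence interval.
t_crit <- qt(0.975, df = 25)
se_fit <- (conf_width / 2) / t_crit

# Residual standard error from summary(musmod) (rounded in the output).
sigma <- 0.00103

# Prediction half-width adds the variance of a single new observation.
pred_half_rebuilt <- t_crit * sqrt(se_fit^2 + sigma^2)
c(rebuilt = pred_half_rebuilt, reported = pred_width / 2)
```

The extra sigma^2 term is the variability of one new point, which is why the prediction interval comes out several times wider than the confidence interval at the same mass.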