Include a hypothesis test and a confidence interval for a regression coefficient, using either the mussels data or another data set. Include the full process (hypotheses, test statistic, p-value, conclusion in the context of the problem) for the hypothesis test. Include the CI interpretation. A good source of data sets: mosaic.
mussels <- read.csv(url("http://cknudson.com/data/mussels.csv"))
head(mussels)
## GroupID dry.mass count attached lipid protein carbo ash Kcal ammonia
## 1 1 0.55 20 Rock 8.14 47.43 21.59 5.51 3.61 0.07
## 2 2 0.45 19 Rock 9.34 53.89 23.41 6.34 4.06 0.07
## 3 3 0.37 20 Rock 9.12 49.01 21.10 5.63 3.74 0.07
## 4 4 0.63 20 Rock 10.32 49.25 16.55 5.41 3.66 0.11
## 5 5 0.57 20 Rock 10.08 50.17 17.51 6.10 3.72 0.11
## 6 6 0.57 22 Rock 10.83 53.84 19.97 6.36 4.04 0.11
## O2 AvgAmmonia AvgO2 AvgMass
## 1 0.82 0.00350000 0.04100 0.027500
## 2 0.70 0.00368421 0.03684 0.023684
## 3 0.62 0.00350000 0.03100 0.018500
## 4 0.89 0.00550000 0.04450 0.031500
## 5 1.09 0.00550000 0.05450 0.028500
## 6 1.00 0.00500000 0.04545 0.025909
names(mussels)
## [1] "GroupID" "dry.mass" "count" "attached" "lipid"
## [6] "protein" "carbo" "ash" "Kcal" "ammonia"
## [11] "O2" "AvgAmmonia" "AvgO2" "AvgMass"
attach(mussels)
The linear model I will use treats average ammonia (AvgAmmonia) as a function of what the mussel is attached to (attached) and its average mass (AvgMass).
musmod <- lm(AvgAmmonia ~ AvgMass+attached)
musmod
##
## Call:
## lm(formula = AvgAmmonia ~ AvgMass + attached)
##
## Coefficients:
## (Intercept) AvgMass attachedRock
## 0.001140 0.239279 -0.002563
From the linear model we can see the line is: AvgAmmonia = .001140 + .239279 AvgMass - .002563 I(attached = Rock) + ε, where ε is the error term. We interpret this one variable at a time. For average mass: for every one-unit increase in AvgMass (in whatever unit it is measured), the mean average ammonia increases by .2393 units, holding the attachment type constant. For attached: at the same average mass, a mussel attached to a rock releases .002563 units of ammonia less, on average, than a mussel attached to an Amblema.
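To make the two interpretations concrete, here is a small sketch that plugs numbers into the fitted line. The coefficients are copied from the printed output above; the helper `fitted_ammonia` and the masses 0.02 and 0.03 are made up for illustration. Because AvgMass only ranges from about 0.006 to 0.048 in this data set, a change of 0.01 is probably a more realistic step than a full unit.

```r
# Coefficients copied from the printed lm() output above.
b0     <- 0.001140    # intercept (Amblema baseline)
b_mass <- 0.239279    # slope for AvgMass
b_rock <- -0.002563   # shift when attached == "Rock"

# Hypothetical helper: fitted mean AvgAmmonia for a given mass and substrate.
fitted_ammonia <- function(avg_mass, on_rock) {
  b0 + b_mass * avg_mass + b_rock * as.numeric(on_rock)
}

# Same mass, different substrate: the gap equals the attachedRock coefficient.
rock_minus_amblema <- fitted_ammonia(0.02, TRUE) - fitted_ammonia(0.02, FALSE)

# Same substrate, AvgMass raised by 0.01: the gap equals 0.01 * slope.
per_hundredth <- fitted_ammonia(0.03, TRUE) - fitted_ammonia(0.02, TRUE)

print(rock_minus_amblema)
print(per_hundredth)
```

The first difference does not depend on which mass we pick, which is what "holding all else constant" means for this model.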
Next, we will conduct a hypothesis test to see whether the average mass of a mussel is a useful predictor of its average ammonia output. The null hypothesis is that beta(1), the coefficient of AvgMass, equals 0. The alternative hypothesis is that beta(1) ≠ 0. To carry out the test we use the summary() function.
summary(musmod)
##
## Call:
## lm(formula = AvgAmmonia ~ AvgMass + attached)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.019e-03 -5.240e-04 -5.959e-05 3.429e-04 2.526e-03
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0011398 0.0005533 2.060 0.05 *
## AvgMass 0.2392793 0.0215863 11.085 3.86e-11 ***
## attachedRock -0.0025629 0.0003931 -6.519 7.91e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.00103 on 25 degrees of freedom
## Multiple R-squared: 0.8574, Adjusted R-squared: 0.846
## F-statistic: 75.18 on 2 and 25 DF, p-value: 2.66e-11
Because we are testing AvgMass, the test statistic is 11.085. This is found by dividing the estimate of beta(1) by its standard error. The p-value is 3.86e-11, which is less than our alpha (.05), so we reject the null hypothesis. We have strong evidence of a linear relationship between average ammonia output and average mass, after accounting for what the mussel is attached to.
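As a check on where those numbers come from, the test statistic and p-value can be rebuilt from the AvgMass row of the summary table. This is only a sketch: the estimate, standard error, and degrees of freedom are typed in from the output above, and the variable names are our own.

```r
# Values copied from the AvgMass row of summary(musmod).
est <- 0.2392793   # estimate of beta(1)
se  <- 0.0215863   # its standard error
df  <- 25          # residual degrees of freedom

# t statistic for H0: beta(1) = 0.
t_stat <- est / se

# Two-sided p-value from the t distribution with 25 df.
p_val <- 2 * pt(-abs(t_stat), df = df)

print(t_stat)
print(p_val)
print(p_val < 0.05)   # TRUE means we reject H0 at alpha = .05
```

Small differences from the printed table are expected because the inputs here are rounded.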
Next, we will look at 90% confidence intervals for the coefficients of our linear model.
confint(musmod, level=.9)
## 5 % 95 %
## (Intercept) 0.0001946405 0.002084987
## AvgMass 0.2024068478 0.276151813
## attachedRock -0.0032344372 -0.001891381
We can interpret the last 2 rows of this output. For average mass, we are 90% confident that each one-unit increase in AvgMass is associated with an increase in mean average ammonia of between .2024 and .2762 units, holding the attachment type constant. For attachedRock, we are 90% confident that a mussel attached to a rock (instead of an Amblema) releases between .0019 and .0032 units less ammonia on average, holding average mass constant.
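The intervals from confint() can also be reproduced by hand as estimate ± critical value × standard error. The sketch below assumes the estimates and standard errors typed in from the summary table; the object names are our own.

```r
# Estimates and standard errors copied from summary(musmod).
est <- c(AvgMass = 0.2392793, attachedRock = -0.0025629)
se  <- c(AvgMass = 0.0215863, attachedRock = 0.0003931)
df  <- 25     # residual degrees of freedom
level <- 0.90

# Critical value: 5% in each tail for a 90% interval.
t_crit <- qt(1 - (1 - level) / 2, df = df)

# estimate +/- t_crit * SE, one row per coefficient.
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(ci)
```

Neither interval contains 0, which agrees with the small p-values for both coefficients.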
We can also compute confidence and prediction intervals at a specified value. We will choose a specific mass (one that we know is in the data set) to use as our point.
summary(mussels)
## GroupID dry.mass count attached
## Min. : 1.00 Min. :0.3000 Min. :19.00 Amblema:13
## 1st Qu.: 7.75 1st Qu.:0.4375 1st Qu.:20.00 Rock :15
## Median :14.50 Median :0.5450 Median :21.50
## Mean :14.93 Mean :0.5836 Mean :27.79
## 3rd Qu.:22.25 3rd Qu.:0.6425 3rd Qu.:31.00
## Max. :30.00 Max. :1.1600 Max. :72.00
##
## lipid protein carbo ash
## Min. : 6.500 Min. :43.49 Min. : 9.97 Min. :4.620
## 1st Qu.: 8.250 1st Qu.:50.05 1st Qu.:17.11 1st Qu.:5.395
## Median : 9.405 Median :53.94 Median :20.09 Median :5.550
## Mean : 9.210 Mean :54.91 Mean :19.96 Mean :5.663
## 3rd Qu.:10.195 3rd Qu.:58.90 3rd Qu.:21.96 3rd Qu.:6.075
## Max. :10.830 Max. :71.94 Max. :29.89 Max. :6.570
## NA's :1 NA's :1 NA's :2
## Kcal ammonia O2 AvgAmmonia
## Min. :3.270 Min. :0.0500 Min. :0.3800 Min. :0.002500
## 1st Qu.:3.743 1st Qu.:0.0875 1st Qu.:0.7975 1st Qu.:0.003500
## Median :4.005 Median :0.1100 Median :0.9250 Median :0.004749
## Mean :3.955 Mean :0.1443 Mean :1.2157 Mean :0.005296
## 3rd Qu.:4.110 3rd Qu.:0.1875 3rd Qu.:1.4125 3rd Qu.:0.005647
## Max. :4.610 Max. :0.4100 Max. :2.7700 Max. :0.012963
## NA's :2
## AvgO2 AvgMass
## Min. :0.01458 Min. :0.006389
## 1st Qu.:0.03100 1st Qu.:0.016885
## Median :0.04439 Median :0.024048
## Mean :0.04564 Mean :0.023107
## 3rd Qu.:0.04939 3rd Qu.:0.027500
## Max. :0.09571 Max. :0.047826
##
We will use the 3rd quartile of AvgMass, .0275, as the mass in our new data frame.
newdata <- data.frame(AvgMass = .0275, attached = "Rock")
newdata
## AvgMass attached
## 1 0.0275 Rock
We can now find prediction and confidence intervals for mussels that have an average mass of .0275 and are attached to rocks.
predint <- predict(musmod, newdata, interval = "predict")
predint
## fit lwr upr
## 1 0.005157086 0.002960629 0.007353544
Our prediction interval tells us that for a mussel that is attached to a rock and has an average mass of .0275, we are 95% confident that its average ammonia output will be between .00296 and .00735. (The level is 95% because that is the default in predict().)
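The fit column is just the fitted line evaluated at our chosen point. As a rough check, we can write the point as a row of predictor values (1 for the intercept, .0275 for AvgMass, 1 for Rock) and multiply by the coefficients from the summary table. The names `beta` and `x0` are our own.

```r
# Coefficients copied from summary(musmod).
beta <- c(intercept = 0.0011398, AvgMass = 0.2392793, attachedRock = -0.0025629)

# The new point: intercept term, AvgMass = .0275, attached to a rock.
x0 <- c(intercept = 1, AvgMass = 0.0275, attachedRock = 1)

# Fitted value = sum of coefficient * predictor value.
fit_by_hand <- sum(beta * x0)
print(fit_by_hand)
```

This should agree with the fit reported by predict() up to rounding of the coefficients.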
Next we will compute the confidence interval.
confident.int <- predict(musmod, newdata, interval = "confidence")
confident.int
## fit lwr upr
## 1 0.005157086 0.004588894 0.005725278
Our 95% confidence interval shows that, for mussels that are attached to a rock and have an average mass of .0275, we are 95% confident that the mean average ammonia output is between .0046 and .0057.
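The two intervals share the same center but have different widths. The sketch below uses the usual regression result that the prediction standard error combines the standard error of the estimated mean with the residual standard error (0.00103 from the summary). All numbers are copied from the output above and the object names are our own, so this is an approximate check rather than what predict() does internally.

```r
# Intervals copied from the predict() output above.
conf_int <- c(lwr = 0.004588894, upr = 0.005725278)
pred_int <- c(lwr = 0.002960629, upr = 0.007353544)

sigma  <- 0.00103            # residual standard error from summary(musmod)
df     <- 25                 # residual degrees of freedom
t_crit <- qt(0.975, df = df) # critical value for a 95% interval

# Half-widths of the two reported intervals.
conf_half <- unname(conf_int["upr"] - conf_int["lwr"]) / 2
pred_half <- unname(pred_int["upr"] - pred_int["lwr"]) / 2

# Standard error of the estimated mean, recovered from the confidence interval.
se_mean <- conf_half / t_crit

# Prediction standard error adds the scatter of a single observation.
se_new <- sqrt(se_mean^2 + sigma^2)
pred_half_rebuilt <- t_crit * se_new

print(c(reported = pred_half, rebuilt = pred_half_rebuilt))
print(pred_half / conf_half)   # how many times wider the prediction interval is
```

The rebuilt half-width is close to the reported one; the small gap comes from the rounded residual standard error.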
In conclusion, many of the topics covered in the lesson today were also covered in simple linear regression. The main difference is that, when interpreting each coefficient, one has to remember to hold all the other variables constant. The confidence and prediction intervals behave as they did in simple linear regression; again, the prediction interval is the wider of the two. This is explained by what the intervals measure. The prediction interval covers a single new observation, so it includes both the uncertainty in the estimated mean and the scatter of individual points around that mean, while the confidence interval covers only the mean response, which has much less variability.