LL6

For this learning log I will be looking at the iris data in order to determine the effect that species and sepal width have on sepal length.

mod <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)
summary(mod)

## 
## Call:
## lm(formula = Sepal.Length ~ Sepal.Width + Species, data = iris)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.30711 -0.25713 -0.05325  0.19542  1.41253 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         2.2514     0.3698   6.089 9.57e-09 ***
## Sepal.Width         0.8036     0.1063   7.557 4.19e-12 ***
## Speciesversicolor   1.4587     0.1121  13.012  < 2e-16 ***
## Speciesvirginica    1.9468     0.1000  19.465  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.438 on 146 degrees of freedom
## Multiple R-squared:  0.7259, Adjusted R-squared:  0.7203 
## F-statistic: 128.9 on 3 and 146 DF,  p-value: < 2.2e-16

Hypothesis test: H0: β = 0 versus H1: β ≠ 0, where β is the coefficient for sepal width in the model. For this test we need to consider the assumptions of normality, independence, and random sampling. I will set alpha to 0.05 because it is a common choice that is conservative without being overly conservative. Now we are going to calculate the test statistic and p-value to decide whether to reject or fail to reject the null hypothesis.

tstat <- coef(summary(mod))[2,1]/coef(summary(mod))[2,2]
tstat

## [1] 7.556598

2*pt(tstat, 146, lower.tail=FALSE)

## [1] 4.18734e-12

Since this p-value of 4.18734e-12 is far below alpha, we reject the null hypothesis and conclude that there is significant evidence that the coefficient for sepal width is different from zero.
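The test above depends on the assumptions mentioned earlier. Independence and random sampling come from how the data were collected and cannot be checked from the fit, but normality of the residuals can be examined. The following is only a minimal sketch, assuming a Shapiro-Wilk test on the residuals is an acceptable check here; it was not part of the original analysis, and the names res and sw are introduced just for illustration.

```r
# Refit the model from this log so the block runs on its own.
mod <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)

# Residuals of the fitted model, one per flower.
res <- residuals(mod)

# Shapiro-Wilk test (assumed check): H0 is that the residuals are normal,
# so a small p-value would cast doubt on the normality assumption.
sw <- shapiro.test(res)
sw$p.value

# With an intercept in the model, the residuals average out to about zero.
mean(res)
```

A normal Q-Q plot of the residuals, for example plot(mod, which = 2), is a common visual alternative to the formal test.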

Now it's on to confidence intervals!

confint(mod)

##                       2.5 %   97.5 %
## (Intercept)       1.5206309 2.982156
## Sepal.Width       0.5933983 1.013723
## Speciesversicolor 1.2371791 1.680307
## Speciesvirginica  1.7491525 2.144481

From this we can say that we are 95% confident that the true population coefficient for sepal width is between 0.5933983 and 1.013723. This is important because the interval does not contain zero, which agrees with the hypothesis test above: at the 5% level, zero is not a plausible value for this coefficient.
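The interval for sepal width can also be rebuilt by hand as estimate ± t critical value × standard error, in the same spirit as the manual test statistic earlier. This is a sketch under the assumption of a two-sided 95% interval with the residual degrees of freedom of the model; the names coefs, est, se, tcrit and ci_manual are introduced only for this illustration.

```r
# Refit the model from this log so the block runs on its own.
mod <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)

# Estimate and standard error for the Sepal.Width coefficient.
coefs <- coef(summary(mod))
est <- coefs["Sepal.Width", "Estimate"]
se  <- coefs["Sepal.Width", "Std. Error"]

# Two-sided 95% critical value from the t distribution on 146 df.
tcrit <- qt(0.975, df.residual(mod))

# estimate +/- critical value * standard error
ci_manual <- c(lower = est - tcrit * se, upper = est + tcrit * se)
ci_manual
```

The two endpoints agree with the Sepal.Width row of confint(mod), 0.5933983 and 1.013723.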

Next we move on to confidence and prediction intervals for a new observation.

newdata <- data.frame(Sepal.Width=2, Species="virginica")
confint <- predict(mod, newdata, interval = "confidence")
confint

##        fit      lwr      upr
## 1 5.805332 5.566826 6.043838

We are 95% confident that the mean sepal length of a virginica iris with a sepal width of 2 units is between 5.566826 and 6.043838 units.
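To see where the fitted value comes from, the point estimate can be computed directly from the coefficients: the intercept, plus the sepal width slope times 2, plus the virginica offset (setosa is the baseline species, so it has no offset of its own). A small sketch, with b and fit_manual as names introduced here for illustration:

```r
# Refit the model from this log so the block runs on its own.
mod <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)

# Named vector of the four estimated coefficients.
b <- coef(mod)

# Intercept + slope * sepal width + offset for virginica.
fit_manual <- unname(b["(Intercept)"] + b["Sepal.Width"] * 2 + b["Speciesvirginica"])
fit_manual
```

This gives the same value as the fit column above, 5.805332.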

newdata <- data.frame(Sepal.Width=2, Species="virginica")
predint <- predict(mod, newdata, interval = "prediction")
predint

##        fit      lwr      upr
## 1 5.805332 4.907519 6.703145

We are 95% confident that the sepal length of an individual virginica iris with a sepal width of 2 units is between 4.907519 and 6.703145 units. This interval is wider than the confidence interval, which makes sense given the formulas: the prediction interval includes the variability of a single observation around the mean in addition to the uncertainty in the estimated mean itself. It is also important to notice that the point estimate is the same for both.

confint[1] == predint[1]

## [1] TRUE
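The difference in width between the two intervals can be made concrete by building both by hand. This is a sketch, assuming the standard formulas in which the confidence interval uses the standard error of the fitted mean and the prediction interval adds the residual variance to it. The names p, fit, se_mean, sigma, ci_half and pi_half are introduced here for illustration.

```r
# Refit the model from this log so the block runs on its own.
mod <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)
newdata <- data.frame(Sepal.Width = 2, Species = "virginica")

# se.fit = TRUE also returns the standard error of the fitted mean,
# the residual degrees of freedom, and the residual standard error.
p <- predict(mod, newdata, se.fit = TRUE)
fit     <- as.numeric(p$fit)
se_mean <- as.numeric(p$se.fit)
sigma   <- p$residual.scale
tcrit   <- qt(0.975, p$df)

# Confidence interval: uncertainty in the estimated mean only.
ci_half <- tcrit * se_mean
# Prediction interval: the same uncertainty plus one observation's variance.
pi_half <- tcrit * sqrt(se_mean^2 + sigma^2)

rbind(confidence = c(lwr = fit - ci_half, upr = fit + ci_half),
      prediction = c(lwr = fit - pi_half, upr = fit + pi_half))
```

Both rows are centered on the same fitted value, and the prediction row is wider because of the extra sigma^2 term under the square root.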

Cassandra Carter

February 27, 2018