Getting data set up in R and creating initial linear regression model
data(iris)
attach(iris)
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
mod1 <- lm(Petal.Width~Petal.Length)
summary(mod1)
##
## Call:
## lm(formula = Petal.Width ~ Petal.Length)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.56515 -0.12358 -0.01898 0.13288 0.64272
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.363076 0.039762 -9.131 4.7e-16 ***
## Petal.Length 0.415755 0.009582 43.387 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2065 on 148 degrees of freedom
## Multiple R-squared: 0.9271, Adjusted R-squared: 0.9266
## F-statistic: 1882 on 1 and 148 DF, p-value: < 2.2e-16
The high R-squared (0.9271) shows that petal length is a good predictor of petal width.
Testing new correlation and confidence/prediction interval code.
confint(mod1, level=.99)
## 0.5 % 99.5 %
## (Intercept) -0.4668327 -0.2593183
## Petal.Length 0.3907505 0.4407604
cor(Petal.Length, Petal.Width)
## [1] 0.9628654
newdata <- data.frame (Petal.Length=2.5)
predy <- predict(mod1, newdata, interval = "predict")
confy <- predict(mod1, newdata, interval = "confidence")
confy %*% c(0, -1, 1) #conf interval width
## [,1]
## 1 0.08191303
predy %*% c(0, -1, 1) #pred interval width
## [,1]
## 1 0.8201774
confy[1] == predy[1]
## [1] TRUE
There is a high positive correlation between petal length and petal width (r of about 0.963).
cor(Petal.Length, Petal.Width)
## [1] 0.9628654
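In a simple linear regression with one predictor, the R-squared should equal the square of this correlation. A quick check, as a sketch: the names `fit`, `r`, and `r_squared` are my own, and the model is refit with `data = iris` so the snippet does not depend on `attach()`.

```r
# Refit the same model, passing the data frame explicitly
fit <- lm(Petal.Width ~ Petal.Length, data = iris)

# Pearson correlation between the predictor and the response
r <- cor(iris$Petal.Length, iris$Petal.Width)

# R-squared as reported by summary()
r_squared <- summary(fit)$r.squared

# For a one-predictor model these two quantities should agree
all.equal(r^2, r_squared)
```

Squaring 0.9628654 gives roughly 0.9271, which matches the Multiple R-squared in the model summary.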
The 99% confidence interval (confidence level 0.99, so alpha = 0.01) for the slope on petal length (the predictor) runs from 0.39075 to 0.44076.
confint(mod1, level=.99)
## 0.5 % 99.5 %
## (Intercept) -0.4668327 -0.2593183
## Petal.Length 0.3907505 0.4407604
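To see where these limits come from, the slope interval can be rebuilt by hand as estimate plus or minus a t critical value times the standard error. This is only a sketch: `fit`, `est`, `se`, `t_crit`, and `manual_ci` are illustrative names, and the model is refit with `data = iris` so the snippet runs on its own.

```r
# Refit the same model, passing the data frame explicitly
fit <- lm(Petal.Width ~ Petal.Length, data = iris)

# Slope estimate and its standard error from the coefficient table
coefs <- coef(summary(fit))
est <- coefs["Petal.Length", "Estimate"]
se  <- coefs["Petal.Length", "Std. Error"]

# Two-sided critical value for a 99% interval (alpha = 0.01)
alpha  <- 0.01
t_crit <- qt(1 - alpha / 2, df = df.residual(fit))

# Lower and upper limits: estimate -/+ t_crit * se
manual_ci <- est + c(-1, 1) * t_crit * se
manual_ci
```

The result should agree with the `Petal.Length` row that `confint(mod1, level=.99)` reports.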
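After that I made a prediction for a flower with a petal length of 2.5, computing both a confidence interval and a prediction interval for this new data point. The two intervals share the same fitted value, but the prediction interval is about ten times wider (0.820 versus 0.082).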
newdata <- data.frame (Petal.Length=2.5)
predy <- predict(mod1, newdata, interval = "predict")
confy <- predict(mod1, newdata, interval = "confidence")
confy %*% c(0, -1, 1) #conf interval width
## [,1]
## 1 0.08191303
predy %*% c(0, -1, 1) #pred interval width
## [,1]
## 1 0.8201774
confy[1] == predy[1]
## [1] TRUE
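The gap between the two widths comes from the extra residual variance in the prediction interval: the confidence interval only reflects uncertainty in the fitted mean, while the prediction interval also allows for the scatter of an individual flower around that mean. A sketch of that calculation, assuming the default 95% level used by `predict()`; the names `fit`, `new_flower`, `p`, `se_mean`, and `sigma_hat` are illustrative.

```r
# Refit the same model, passing the data frame explicitly
fit <- lm(Petal.Width ~ Petal.Length, data = iris)
new_flower <- data.frame(Petal.Length = 2.5)

# se.fit = TRUE also returns the standard error of the fitted mean,
# the residual degrees of freedom, and the residual standard error
p <- predict(fit, new_flower, se.fit = TRUE)
se_mean   <- unname(p$se.fit)      # uncertainty in the fitted mean
sigma_hat <- p$residual.scale      # residual standard error

# Critical value for the default 95% level
t_crit <- qt(0.975, df = p$df)

# Full interval widths (upper limit minus lower limit)
conf_width <- 2 * t_crit * se_mean
pred_width <- 2 * t_crit * sqrt(se_mean^2 + sigma_hat^2)
c(conf = conf_width, pred = pred_width)
```

These should reproduce the widths computed above with `%*% c(0, -1, 1)`, about 0.082 and 0.820.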
\[\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\]
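As a worked example, plugging the coefficient estimates from the model summary into this equation for a petal length of 2.5 gives the fitted value that both intervals are centered on (rounding is mine):

\[\hat{y} = -0.363076 + 0.415755 \times 2.5 = -0.363076 + 1.0393875 \approx 0.676\]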
While most of this was material we covered in STAT 314, it was good to get a refresher on it and especially to work with some data in R. The biggest area I still need some work in is interpreting things such as p-values and confidence intervals. I think I understand them, but it would be good for me to read about them a little more to make sure I’m really getting it.