Getting data set up in R and creating initial linear regression model

data(iris)
attach(iris)  # put the iris columns on the search path so they can be used by name below
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
mod1 <- lm(Petal.Width ~ Petal.Length)  # simple linear regression of petal width on petal length
summary(mod1)
## 
## Call:
## lm(formula = Petal.Width ~ Petal.Length)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.56515 -0.12358 -0.01898  0.13288  0.64272 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -0.363076   0.039762  -9.131  4.7e-16 ***
## Petal.Length  0.415755   0.009582  43.387  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2065 on 148 degrees of freedom
## Multiple R-squared:  0.9271, Adjusted R-squared:  0.9266 
## F-statistic:  1882 on 1 and 148 DF,  p-value: < 2.2e-16
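As a check on what `lm()` is doing, here is a minimal sketch that recomputes the two coefficients from the closed-form least-squares formulas for simple linear regression. It uses `iris$` columns instead of `attach()` so it runs on its own; the names `x`, `y`, `b0`, `b1` and `fit` are just illustrative.

```r
# Sketch: recover the lm() coefficients from the least-squares formulas
data(iris)
x <- iris$Petal.Length
y <- iris$Petal.Width

# Slope: sample covariance of x and y divided by the sample variance of x
b1 <- cov(x, y) / var(x)
# Intercept: the least-squares line passes through (mean(x), mean(y))
b0 <- mean(y) - b1 * mean(x)

fit <- lm(Petal.Width ~ Petal.Length, data = iris)
print(c(intercept = b0, slope = b1))
print(coef(fit))
```

Both printouts should agree with the `Estimate` column in the summary above (about -0.3631 and 0.4158).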

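The high R-squared (0.9271) shows that petal length is a good predictor of petal width: the fitted line explains about 93% of the variation in petal width. The slope estimate means that each additional centimeter of petal length is associated with roughly 0.42 cm more petal width.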
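To make "explains about 93% of the variation" concrete, this sketch computes R-squared directly as one minus the ratio of the residual sum of squares to the total sum of squares, and compares it with the value `summary()` reports. The variable names are illustrative only.

```r
# Sketch: R-squared as the proportion of variance in petal width explained
data(iris)
fit <- lm(Petal.Width ~ Petal.Length, data = iris)
y <- iris$Petal.Width

rss <- sum(residuals(fit)^2)   # residual sum of squares (unexplained variation)
tss <- sum((y - mean(y))^2)    # total sum of squares around the mean
r2 <- 1 - rss / tss

print(r2)
print(summary(fit)$r.squared)
```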

Next, I tried out code for correlation and for confidence and prediction intervals.


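The correlation between petal length and petal width is high (r ≈ 0.963). Squaring it gives about 0.9271, the Multiple R-squared reported above; in simple linear regression, R-squared always equals the squared correlation.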

cor(Petal.Length, Petal.Width)
## [1] 0.9628654
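The correlation and the regression slope are closely linked. A minimal sketch: compute Pearson's r as the covariance scaled by both standard deviations, then rescale r by the ratio of standard deviations to get back the slope from `lm()`. Names such as `r` and `slope` are illustrative.

```r
# Sketch: Pearson correlation by hand, and its link to the regression slope
data(iris)
x <- iris$Petal.Length
y <- iris$Petal.Width

# Pearson correlation: covariance divided by the product of the SDs
r <- cov(x, y) / (sd(x) * sd(y))

# The least-squares slope is the correlation rescaled by sd(y) / sd(x)
slope <- r * sd(y) / sd(x)

fit <- lm(Petal.Width ~ Petal.Length, data = iris)
print(c(r = r, slope_from_r = slope, slope_from_lm = unname(coef(fit)[2])))
```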

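The 99% confidence interval (an alpha level of 0.01) for the slope on petal length (the predictor) runs from about 0.3908 to 0.4408. Because it excludes zero, the positive association is significant at the 1% level.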

confint(mod1, level=.99)
##                   0.5 %     99.5 %
## (Intercept)  -0.4668327 -0.2593183
## Petal.Length  0.3907505  0.4407604
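The interval from `confint()` is the usual t-based interval: estimate plus or minus a critical t value times the standard error. This sketch rebuilds the 99% interval for the slope by hand, leaving alpha/2 = 0.005 in each tail with n - 2 = 148 residual degrees of freedom. The names `est`, `se`, `t_crit` and `ci` are illustrative.

```r
# Sketch: 99% confidence interval for the slope, computed by hand
data(iris)
fit <- lm(Petal.Width ~ Petal.Length, data = iris)

coefs <- coef(summary(fit))
est <- coefs["Petal.Length", "Estimate"]
se  <- coefs["Petal.Length", "Std. Error"]
df  <- df.residual(fit)          # n - 2 = 148 for this model

# Critical value leaving alpha / 2 = 0.005 in the upper tail
t_crit <- qt(1 - 0.01 / 2, df = df)
ci <- est + c(-1, 1) * t_crit * se

print(ci)
print(confint(fit, "Petal.Length", level = 0.99))
```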

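After that, I predicted the petal width of a flower with petal length 2.5 and built both a confidence interval and a prediction interval for it, each at the default 95% level. The confidence interval covers the mean petal width of all flowers with this petal length, while the prediction interval covers the width of a single new flower, so it also absorbs the residual scatter and comes out roughly ten times wider (0.820 vs. 0.082). Both intervals are centered on the same fitted value, which the final comparison confirms.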

newdata <- data.frame(Petal.Length = 2.5)                     # new flower with petal length 2.5
predy <- predict(mod1, newdata, interval = "prediction")      # 95% prediction interval
confy <- predict(mod1, newdata, interval = "confidence")      # 95% confidence interval
confy %*% c(0, -1, 1)  #conf interval width
##         [,1]
## 1 0.08191303
predy %*% c(0, -1, 1)  #pred interval width
##        [,1]
## 1 0.8201774
confy[1] == predy[1]  # both intervals share the same point estimate (fit)
## [1] TRUE
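The difference in width comes from the standard errors each interval uses. A minimal sketch, assuming the default 95% level: the confidence interval uses only the standard error of the fitted mean, while the prediction interval adds the residual variance of an individual flower. It relies on `predict(..., se.fit = TRUE)`, which returns `fit`, `se.fit`, `df` and `residual.scale`; the other names are illustrative.

```r
# Sketch: confidence vs. prediction interval half-widths by hand
data(iris)
fit <- lm(Petal.Width ~ Petal.Length, data = iris)
newdata <- data.frame(Petal.Length = 2.5)

p <- predict(fit, newdata, se.fit = TRUE)   # list: fit, se.fit, df, residual.scale
t_crit <- qt(0.975, df = p$df)              # two-sided 95% critical value

# Confidence interval: uncertainty in the estimated mean response only
conf_half <- unname(t_crit * p$se.fit)
# Prediction interval: also adds the residual variance of one new flower
pred_half <- unname(t_crit * sqrt(p$se.fit^2 + p$residual.scale^2))

print(c(fit = unname(p$fit), conf_width = 2 * conf_half, pred_width = 2 * pred_half))
```

Because `residual.scale^2` (about 0.2065 squared) is much larger than `se.fit^2` here, the prediction interval is dominated by flower-to-flower scatter, which is why it is so much wider.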

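The point prediction comes from the fitted regression line, where \(x_i\) is petal length and \(\hat{y}_i\) is the predicted petal width:

\[\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\]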
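As a worked check, plugging the rounded coefficients printed by `summary(mod1)` into this equation for a petal length of 2.5 gives the point estimate that both intervals are centered on (the fitted value itself was not printed above, so this is a hand calculation):

\[\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x = -0.363076 + 0.415755 \times 2.5 = 0.6763115 \approx 0.676\]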

While most of this was material we covered in STAT 314, it was good to get a refresher and especially to work with some data in R. The biggest area where I still need work is interpreting things such as p-values and confidence intervals. I think I understand them, but it would be good to read about them a little more to make sure I'm really getting it.