In this report, we study the relationship between the number of days on which the ozone level exceeds 0.20 ppm (Days) and a seasonal meteorological index (Index).
The goal is to determine whether the meteorological index is useful for explaining variation in the number of high-ozone days.
We use the simple linear regression model
\[ Y = \beta_{0} + \beta_{1}X + \epsilon, \]
where \(Y\) is the number of days exceeding the ozone threshold, \(X\) is the meteorological index, and \(\epsilon\) is a random error term.
library(ggplot2)
Days <- c(91, 105, 106, 108, 88, 91, 58, 82, 81, 65, 61, 48, 61, 43, 33, 36)
Index <- c(16.7, 17.1, 18.2, 18.1, 17.2, 18.2, 16, 17.2, 18, 17.2, 16.9, 17.1, 18.2, 17.3, 17.5, 16.6)
df <- data.frame(Index, Days)
knitr::kable(df)
| Index | Days |
|---|---|
| 16.7 | 91 |
| 17.1 | 105 |
| 18.2 | 106 |
| 18.1 | 108 |
| 17.2 | 88 |
| 18.2 | 91 |
| 16.0 | 58 |
| 17.2 | 82 |
| 18.0 | 81 |
| 17.2 | 65 |
| 16.9 | 61 |
| 17.1 | 48 |
| 18.2 | 61 |
| 17.3 | 43 |
| 17.5 | 33 |
| 16.6 | 36 |
model <- lm(Days ~ Index, data = df)
|  | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | -192.98383 | 163.503284 | -1.180306 | 0.2575450 |
| Index | 15.29637 | 9.420975 | 1.623650 | 0.1267446 |

Model fit summary:

| R_squared | Residual_SE |
|---|---|
| 0.1584636 | 23.79496 |
From the fitted model, the estimated regression equation is
\[ \hat{Y} = -192.984 + 15.296X, \]
where \(\hat{Y}\) is the predicted number of days on which the ozone level exceeds 0.20 ppm and \(X\) is the meteorological index.
The estimated slope is 15.296, which means that, on average, a one-unit increase in the meteorological index is associated with an increase of about 15.3 days on which the ozone level exceeds 0.20 ppm.
The estimated intercept is -192.984, which formally represents the expected number of high-ozone days when the meteorological index equals zero. This value has no practical interpretation: a negative count of days is impossible, and an index of zero lies far outside the observed range of 16 to 18.2.
From the regression output, \(R^{2} = 0.158\) indicates that about 15.8% of the variability in the number of high-ozone days is explained by the meteorological index.
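The slope, intercept, and \(R^{2}\) interpreted above can be recomputed from sums of squares without relying on the regression output, which makes explicit where each number comes from. The following is a minimal sketch using the standard least-squares formulas; the names Sxx, Sxy, Syy, b0, b1, r2, and fit are illustrative and do not appear in the original analysis.

```r
# Data as listed in the report
Days  <- c(91, 105, 106, 108, 88, 91, 58, 82, 81, 65, 61, 48, 61, 43, 33, 36)
Index <- c(16.7, 17.1, 18.2, 18.1, 17.2, 18.2, 16, 17.2, 18, 17.2, 16.9,
           17.1, 18.2, 17.3, 17.5, 16.6)

# Centered sums of squares and cross-products
Sxx <- sum((Index - mean(Index))^2)
Sxy <- sum((Index - mean(Index)) * (Days - mean(Days)))
Syy <- sum((Days - mean(Days))^2)

b1 <- Sxy / Sxx                       # least-squares slope
b0 <- mean(Days) - b1 * mean(Index)   # least-squares intercept
r2 <- Sxy^2 / (Sxx * Syy)             # R^2 = squared sample correlation

# Compare with lm(); the model is refit here so the sketch runs on its own
fit <- lm(Days ~ Index)
print(all.equal(unname(coef(fit)), c(b0, b1)))
print(all.equal(summary(fit)$r.squared, r2))
```

Both comparisons should print TRUE, since in simple linear regression \(R^{2}\) is the square of the sample correlation between Index and Days.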
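To assess whether the meteorological index has a linear association with the number of high-ozone days, we test
\[ H_{0}:\beta_{1} = 0 \quad\text{versus}\quad H_{1}:\beta_{1}\neq 0. \]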
The standard error of the slope is 9.421, and the corresponding t-statistic is 1.624 with a p-value of 0.127. Because the p-value exceeds the 5% significance level \((\alpha = 0.05)\), we fail to reject \(H_{0}\): the data provide insufficient evidence that the slope differs from zero. Therefore, the meteorological index is not a statistically significant predictor of the number of high-ozone days in this data set.
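As a worked check of these figures, the test statistic and an approximate 95% confidence interval for the slope follow directly from the coefficient table. This sketch assumes the usual normal-error model, so that the statistic has a \(t\) distribution with \(n - 2 = 14\) degrees of freedom for the \(n = 16\) observations. The critical value \(t_{0.975,\,14} \approx 2.145\) is taken from standard \(t\) tables and is not part of the original output.

\[ t = \frac{\hat{\beta}_{1}}{\operatorname{SE}(\hat{\beta}_{1})} = \frac{15.296}{9.421} \approx 1.624 < 2.145 \approx t_{0.975,\,14} \]
\[ \hat{\beta}_{1} \pm t_{0.975,\,14}\,\operatorname{SE}(\hat{\beta}_{1}) \approx 15.296 \pm 2.145 \times 9.421 \approx (-4.91,\ 35.50) \]

The interval contains zero, which agrees with the failure to reject \(H_{0}\) at the 5% level.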
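Confidence intervals describe the uncertainty in the estimated mean response at a given index value, while prediction intervals describe the uncertainty in a single future observation. Both are computed at the 95% level over the observed range of the index.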
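The distinction can be made concrete with the standard interval formulas for simple linear regression, sketched here under the usual assumption of independent normal errors with constant variance. The symbols \(x_{0}\) (the index value of interest), \(\hat{Y}_{0}\) (the fitted value at \(x_{0}\)), \(s\) (the residual standard error), \(S_{xx} = \sum_{i}(x_{i} - \bar{x})^{2}\), and \(t^{*} = t_{0.975,\,n-2}\) are notation introduced for this sketch.

\[ \text{Confidence interval for the mean response:}\quad \hat{Y}_{0} \pm t^{*}\, s\, \sqrt{\frac{1}{n} + \frac{(x_{0} - \bar{x})^{2}}{S_{xx}}} \]
\[ \text{Prediction interval for a new observation:}\quad \hat{Y}_{0} \pm t^{*}\, s\, \sqrt{1 + \frac{1}{n} + \frac{(x_{0} - \bar{x})^{2}}{S_{xx}}} \]

The extra 1 under the square root accounts for the random error of the new observation itself, so the prediction interval is always wider than the confidence interval at the same \(x_{0}\). Both intervals are narrowest at \(x_{0} = \bar{x}\) and widen as \(x_{0}\) moves away from it.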
min(Index)
## [1] 16
max(Index)
## [1] 18.2
newx <- seq(16, 18.2, 0.2)
conf <- predict(model, data.frame(Index = newx), interval = "confidence")
pred <- predict(model, data.frame(Index = newx), interval = "prediction")
yl <- range(c(Days, pred[, 2], pred[, 3]))
plot(Index, Days, ylim = yl)
abline(model)
lines(newx, conf[, 2], col = "hotpink")
lines(newx, conf[, 3], col = "hotpink")
lines(newx, pred[, 2], col = "purple")
lines(newx, pred[, 3], col = "purple")
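
95% confidence intervals for the mean number of high-ozone days at Index = 16.0, 16.2, ..., 18.2 (conf):

| fit | lwr | upr |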
|---|---|---|
| 51.75801 | 21.75791 | 81.75811 |
| 54.81728 | 28.41869 | 81.21588 |
| 57.87656 | 34.93253 | 80.82058 |
| 60.93583 | 41.22205 | 80.64961 |
| 63.99510 | 47.15763 | 80.83258 |
| 67.05437 | 52.52748 | 81.58127 |
| 70.11365 | 57.02842 | 83.19887 |
| 73.17292 | 60.36362 | 85.98222 |
| 76.23219 | 62.46281 | 90.00157 |
| 79.29147 | 63.55057 | 95.03237 |
| 82.35074 | 63.94915 | 100.75233 |
| 85.41001 | 63.91295 | 106.90708 |

95% prediction intervals for a single future observation at Index = 16.0, 16.2, ..., 18.2 (pred):

| fit | lwr | upr |
|---|---|---|
| 51.75801 | -7.441550 | 110.9576 |
| 54.81728 | -2.641118 | 112.2757 |
| 57.87656 | 1.921126 | 113.8320 |
| 60.93583 | 6.225545 | 115.6461 |
| 63.99510 | 10.254218 | 117.7360 |
| 67.05437 | 13.992029 | 120.1167 |
| 70.11365 | 17.427738 | 122.7996 |
| 73.17292 | 20.554862 | 125.7910 |
| 76.23219 | 23.372211 | 129.0922 |
| 79.29147 | 25.883996 | 132.6989 |
| 82.35074 | 28.099467 | 136.6020 |
| 85.41001 | 30.032167 | 140.7879 |
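
As a cross-check, the first row of each interval table (Index = 16) can be reproduced by hand from the fitted model. The sketch below assumes the default 95% level of predict() and refits the model so that it runs on its own; the names fit, x0, v0, half_conf, half_pred, by_hand, and from_predict are illustrative and not part of the original analysis.

```r
# Data as listed in the report
Days  <- c(91, 105, 106, 108, 88, 91, 58, 82, 81, 65, 61, 48, 61, 43, 33, 36)
Index <- c(16.7, 17.1, 18.2, 18.1, 17.2, 18.2, 16, 17.2, 18, 17.2, 16.9,
           17.1, 18.2, 17.3, 17.5, 16.6)
fit <- lm(Days ~ Index)

n     <- length(Days)
s     <- summary(fit)$sigma             # residual standard error
Sxx   <- sum((Index - mean(Index))^2)   # spread of the predictor
tcrit <- qt(0.975, df = n - 2)          # two-sided 95% critical value

x0   <- 16                              # first grid point in the tables
yhat <- unname(coef(fit)[1] + coef(fit)[2] * x0)

# Variance factor of the fitted mean at x0, then the two half-widths
v0        <- 1 / n + (x0 - mean(Index))^2 / Sxx
half_conf <- tcrit * s * sqrt(v0)       # mean response
half_pred <- tcrit * s * sqrt(1 + v0)   # single new observation

by_hand <- rbind(
  confidence = c(fit = yhat, lwr = yhat - half_conf, upr = yhat + half_conf),
  prediction = c(fit = yhat, lwr = yhat - half_pred, upr = yhat + half_pred)
)

# Same intervals from predict(), for comparison
from_predict <- rbind(
  confidence = predict(fit, data.frame(Index = x0), interval = "confidence")[1, ],
  prediction = predict(fit, data.frame(Index = x0), interval = "prediction")[1, ]
)

print(round(by_hand, 3))
print(all.equal(by_hand, from_predict))
```

The negative lower prediction limits at the smallest index values arise because the linear model places no lower bound on the response, even though a count of days cannot be negative.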
df
plot(Index,Days)
abline(model)
summary(model)
conf
pred