1 Introduction

In this report, we study the relationship between the number of days on which the ozone level exceeds 0.20 ppm (Days) and a seasonal meteorological index (Index).

The goal is to determine whether the meteorological index is useful for explaining variation in the number of high-ozone days.

We use a simple linear regression model

\[ Y = \beta_{0} + \beta_{1}X + \epsilon, \]

where Y is the number of days exceeding the ozone threshold and X is the meteorological index.
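As a rough illustration of what fitting this model involves, the sketch below computes the least-squares estimates of the intercept and slope from their closed-form expressions and compares them with `lm()`. The data are the values listed in the Data section; the names `Sxy`, `Sxx`, `b0`, `b1` and `fit` are introduced here for illustration only and are not part of the original analysis.

```r
# Data as listed in the Data section of this report
Days  <- c(91, 105, 106, 108, 88, 91, 58, 82, 81, 65, 61, 48, 61, 43, 33, 36)
Index <- c(16.7, 17.1, 18.2, 18.1, 17.2, 18.2, 16, 17.2, 18, 17.2, 16.9, 17.1,
           18.2, 17.3, 17.5, 16.6)

# Closed-form least-squares estimates for Y = b0 + b1 * X + error
Sxy <- sum((Index - mean(Index)) * (Days - mean(Days)))  # cross-product sum
Sxx <- sum((Index - mean(Index))^2)                      # spread of X
b1  <- Sxy / Sxx                                         # slope estimate
b0  <- mean(Days) - b1 * mean(Index)                     # intercept estimate

# Cross-check against lm(); the two should agree up to floating-point error
fit <- lm(Days ~ Index)
all.equal(unname(coef(fit)), c(b0, b1))
```

The hand computation is only a check on the mechanics; in practice `lm()` is the more convenient route because it also returns standard errors and residuals.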

2 Data

library(ggplot2)
Days <- c(91,105,106,108,88,91,58,82,81,65,61,48,61,43,33,36)
Index<- c(16.7,17.1,18.2,18.1,17.2,18.2,16,17.2,18,17.2,16.9,17.1,18.2,17.3,17.5,16.6)
df <- data.frame(Index,Days)
knitr::kable(df)
Index Days
16.7 91
17.1 105
18.2 106
18.1 108
17.2 88
18.2 91
16.0 58
17.2 82
18.0 81
17.2 65
16.9 61
17.1 48
18.2 61
17.3 43
17.5 33
16.6 36

2.1 Scatter plot of Days versus Index

2.2 Fitting the regression model

model <- lm(Days~Index,data = df)
            Estimate    Std. Error  t value    Pr(>|t|)
(Intercept) -192.98383  163.503284  -1.180306  0.2575450
Index         15.29637    9.420975   1.623650  0.1267446

R_squared  Residual_SE
0.1584636  23.79496

2.3 Regression coefficients

From the fitted model, the estimated regression equation is

\[ \hat{Y} = -192.984 + 15.296X, \]

where \(\hat{Y}\) is the predicted number of days on which the ozone level exceeds 0.20 ppm and X is the meteorological index.

The estimated slope is 15.296, which means that, on average, an increase of one unit in the meteorological index is associated with an increase of about 15.3 days on which the ozone level exceeds 0.20 ppm.

The estimated intercept is -192.984, which formally represents the expected number of high-ozone days when the meteorological index is equal to zero. Because the observed index values range only from 16 to 18.2, and a negative number of days is impossible, the intercept is an extrapolation with no practical interpretation.
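As a worked example of how the fitted equation is used inside the observed range, take an index value of X = 17 (chosen here purely for illustration) and substitute it into the equation with the rounded coefficients:

\[ \hat{Y} = -192.984 + 15.296 \times 17 = -192.984 + 260.032 \approx 67.05 \]

So the model predicts roughly 67 high-ozone days on average at that index value, which agrees with the fitted value 67.05437 reported for Index = 17.0 in the interval output later in this report.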

2.3.1 R-squared

From the regression output, \(R^{2} = 0.1585\), which indicates that only about 15.8% of the variability in the number of high-ozone days is explained by the meteorological index.
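To make the definition concrete, the following sketch recomputes \(R^{2}\) from the residual and total sums of squares, and also as the squared sample correlation, which is equivalent in simple linear regression. It assumes the data from the Data section; the names `sse`, `sst`, `r2` and `r2_cor` are illustrative.

```r
# Data as listed in the Data section of this report
Days  <- c(91, 105, 106, 108, 88, 91, 58, 82, 81, 65, 61, 48, 61, 43, 33, 36)
Index <- c(16.7, 17.1, 18.2, 18.1, 17.2, 18.2, 16, 17.2, 18, 17.2, 16.9, 17.1,
           18.2, 17.3, 17.5, 16.6)

fit <- lm(Days ~ Index)

sse <- sum(residuals(fit)^2)        # variation left unexplained by the model
sst <- sum((Days - mean(Days))^2)   # total variation in Days
r2  <- 1 - sse / sst                # proportion of variation explained

# In simple linear regression, R^2 equals the squared correlation of X and Y
r2_cor <- cor(Index, Days)^2

c(r2 = r2, r2_cor = r2_cor, r2_lm = summary(fit)$r.squared)
```

All three values should coincide, which is a quick sanity check that the reported R-squared refers to this model.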

2.3.2 Hypothesis test on the slope

\[ H_{0}:\beta _1= 0 \quad\text{versus} \quad H_1:\beta _{1}\neq 0 \]

The standard error of the slope is 9.421, and the corresponding t-statistic is 1.624 on 14 degrees of freedom, with a p-value of 0.127. At the 5% significance level \((\alpha = 0.05)\), the p-value exceeds \(\alpha\), so we fail to reject \(H_{0}\): the data provide insufficient evidence that the slope is different from zero. Therefore, the meteorological index is not a statistically significant predictor of the number of high-ozone days in this data set.
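The test can be reproduced directly from the reported estimate and standard error. The sketch below is one way to do so, assuming n = 16 observations and therefore 16 - 2 = 14 degrees of freedom; the variable names are illustrative.

```r
# Slope estimate and standard error as reported in the regression output
b1    <- 15.29637
se_b1 <- 9.420975
df    <- 16 - 2                       # n - 2 for simple linear regression

t_stat  <- b1 / se_b1                 # test statistic for H0: beta1 = 0
p_value <- 2 * pt(-abs(t_stat), df)   # two-sided p-value

# 95% confidence interval for the slope
t_crit <- qt(0.975, df)
ci     <- b1 + c(-1, 1) * t_crit * se_b1

round(c(t = t_stat, p = p_value, lower = ci[1], upper = ci[2]), 3)
```

The resulting interval runs from roughly -4.9 to 35.5 and therefore contains zero, which is consistent with failing to reject the null hypothesis at the 5% level.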

2.3.3 Fitted model with confidence and prediction intervals

The confidence intervals describe uncertainty in the estimated mean response, while the prediction intervals describe uncertainty in individual future observations.
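The difference between the two kinds of interval can be seen in the standard textbook formulas, sketched here with notation that is not in the original report: \(x_0\) is the index value of interest, \(s\) is the residual standard error, \(\bar{x}\) is the mean of the index, and \(S_{xx} = \sum (x_i - \bar{x})^2\).

\[ \hat{Y}_0 \pm t_{\alpha/2,\,n-2}\; s\sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}} \quad\text{(mean response)} \]

\[ \hat{Y}_0 \pm t_{\alpha/2,\,n-2}\; s\sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}} \quad\text{(new observation)} \]

The extra 1 under the square root accounts for the scatter of a single observation around the mean, so the prediction interval is always the wider of the two. For these data, with \(s = 23.795\), \(n = 16\), \(\bar{x} \approx 17.344\), \(S_{xx} \approx 6.379\) and \(t_{0.025,\,14} \approx 2.145\), the half-widths at \(x_0 = 16\) come to approximately 30.0 and 59.2 respectively.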

min(Index)
## [1] 16
max(Index)
## [1] 18.2
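# Grid of Index values spanning the observed range
newx <- seq(16, 18.2, 0.2)
# Each result is a matrix with columns fit, lwr, upr
conf <- predict(model, data.frame(Index = newx), interval = "confidence")
pred <- predict(model, data.frame(Index = newx), interval = "prediction")
# y-axis limits wide enough for the data and the prediction band
yl <- range(c(Days, pred[, "lwr"], pred[, "upr"]))
plot(Index, Days, ylim = yl)
abline(model)
# Confidence band for the mean response
lines(newx, conf[, "lwr"], col = "hotpink")
lines(newx, conf[, "upr"], col = "hotpink")
# Prediction band for a new observation
lines(newx, pred[, "lwr"], col = "purple")
lines(newx, pred[, "upr"], col = "purple")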

Confidence interval for the mean response
fit lwr upr
51.75801 21.75791 81.75811
54.81728 28.41869 81.21588
57.87656 34.93253 80.82058
60.93583 41.22205 80.64961
63.99510 47.15763 80.83258
67.05437 52.52748 81.58127
70.11365 57.02842 83.19887
73.17292 60.36362 85.98222
76.23219 62.46281 90.00157
79.29147 63.55057 95.03237
82.35074 63.94915 100.75233
85.41001 63.91295 106.90708
Prediction interval for a new observation
fit lwr upr
51.75801 -7.441550 110.9576
54.81728 -2.641118 112.2757
57.87656 1.921126 113.8320
60.93583 6.225545 115.6461
63.99510 10.254218 117.7360
67.05437 13.992029 120.1167
70.11365 17.427738 122.7996
73.17292 20.554862 125.7910
76.23219 23.372211 129.0922
79.29147 25.883996 132.6989
82.35074 28.099467 136.6020
85.41001 30.032167 140.7879

3 Complete R code

df
plot(Index,Days)
abline(model)
summary(model)
conf
pred