This report analyzes the relationship between the number of days per year that ozone levels exceeded 0.20 ppm (response \(Y\)) and a seasonal meteorological index (predictor \(X\)), defined as the seasonal average 850-millibar temperature.
We fit the simple linear regression model: \[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \] where \(\varepsilon_i\) represents random error. The analysis includes exploratory visualization, estimation and interpretation of regression coefficients, hypothesis testing on the slope, and diagnostic checks of model assumptions.
The data consist of annual observations from 1976 through 1991 on the number of days per year in which ozone concentrations exceeded 0.20 ppm and a corresponding seasonal meteorological index, defined as the seasonal average 850-millibar temperature. The response variable, Days, represents the total number of exceedance days in each year, while the predictor variable, Index, summarizes prevailing seasonal meteorological conditions. All observations are treated as independent, and the complete dataset used in the analysis is shown in the table below to ensure full transparency and reproducibility.
ozone <- data.frame(
  Year  = c(1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991),
  Days  = c(91,105,106,108,88,91,58,82,81,65,61,48,61,43,33,36),
  Index = c(16.7,17.1,18.2,18.1,17.2,18.2,16.0,17.2,18.0,17.2,16.9,17.1,18.2,17.3,17.5,16.6)
)
ozone
##    Year Days Index
## 1  1976   91  16.7
## 2  1977  105  17.1
## 3  1978  106  18.2
## 4  1979  108  18.1
## 5  1980   88  17.2
## 6  1981   91  18.2
## 7  1982   58  16.0
## 8  1983   82  17.2
## 9  1984   81  18.0
## 10 1985   65  17.2
## 11 1986   61  16.9
## 12 1987   48  17.1
## 13 1988   61  18.2
## 14 1989   43  17.3
## 15 1990   33  17.5
## 16 1991   36  16.6
An exploratory scatterplot of ozone exceedance days versus the seasonal meteorological index suggests at most a weak positive association: higher index values tend to accompany more exceedance days, but there is considerable scatter around any linear trend. No extreme outliers in the response are evident. The 1982 observation has the lowest index value (16.0), lies farthest from the other index values, and therefore carries the most leverage. There is no strong visual indication of nonlinearity, so simple linear regression is a reasonable initial modeling approach.
plot(
  ozone$Index, ozone$Days,
  pch = 19,
  xlab = "Seasonal Meteorological Index (X)",
  ylab = "Days with Ozone > 0.20 ppm (Y)",
  main = "Scatterplot of Ozone Exceedance Days vs Meteorological Index"
)
grid()
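The visual impression from the scatterplot can be complemented with a numerical summary. The following is a minimal sketch that computes the sample Pearson correlation between the index and the exceedance days; the variable name `r_xy` is illustrative and not part of the original analysis.

```r
# Data as listed in the report
ozone <- data.frame(
  Year  = 1976:1991,
  Days  = c(91,105,106,108,88,91,58,82,81,65,61,48,61,43,33,36),
  Index = c(16.7,17.1,18.2,18.1,17.2,18.2,16.0,17.2,18.0,17.2,16.9,17.1,18.2,17.3,17.5,16.6)
)

# Sample Pearson correlation between predictor and response
r_xy <- cor(ozone$Index, ozone$Days)
r_xy

# Its square equals the R-squared of the simple linear regression of Days on Index
r_xy^2
```

A correlation near 0 indicates little linear association, while values near 1 or -1 indicate a strong one, so this number gives a quick sense of how tightly the points cluster around a line.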
A simple linear regression model was fitted to assess the relationship between ozone exceedance days and the seasonal meteorological index and to quantify the strength and uncertainty of this association.
fit <- lm(Days ~ Index, data = ozone)
summary(fit)
##
## Call:
## lm(formula = Days ~ Index, data = ozone)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -41.70 -21.54   2.12  18.56  36.42
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -192.984    163.503  -1.180    0.258
## Index         15.296      9.421   1.624    0.127
##
## Residual standard error: 23.79 on 14 degrees of freedom
## Multiple R-squared: 0.1585, Adjusted R-squared: 0.09835
## F-statistic: 2.636 on 1 and 14 DF, p-value: 0.1267
# Coefficient estimates
coef(fit)
## (Intercept)       Index
##  -192.98383    15.29637
# 95% confidence intervals for coefficients
confint(fit, level = 0.95)
##                   2.5 %    97.5 %
## (Intercept) -543.663500 157.69583
## Index         -4.909616  35.50235
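The estimated slope represents the expected change in the mean number of ozone exceedance days for a one-unit increase in the meteorological index. The slope estimate is positive (about 15.3 additional exceedance days per unit increase in the index), but its 95% confidence interval, (-4.91, 35.50), includes zero, so the data do not provide statistically significant evidence at the 5% level of a linear relationship. The intercept estimate (about -193) is an extrapolation to an index value of zero, far outside the observed range of 16.0 to 18.2, and has no direct physical interpretation.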
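To make the interval calculation explicit, the sketch below rebuilds the 95% confidence interval for the slope from the estimate, its standard error, and the \(t\) quantile on \(n - 2 = 14\) degrees of freedom, and then checks whether zero lies inside it. The names `est`, `se`, `t_crit`, `ci_manual`, and `contains_zero` are illustrative and not part of the original analysis.

```r
ozone <- data.frame(
  Year  = 1976:1991,
  Days  = c(91,105,106,108,88,91,58,82,81,65,61,48,61,43,33,36),
  Index = c(16.7,17.1,18.2,18.1,17.2,18.2,16.0,17.2,18.0,17.2,16.9,17.1,18.2,17.3,17.5,16.6)
)
fit <- lm(Days ~ Index, data = ozone)

# Slope estimate and standard error from the coefficient table
coef_table <- coef(summary(fit))
est <- coef_table["Index", "Estimate"]
se  <- coef_table["Index", "Std. Error"]

# Two-sided 95% critical value on the residual degrees of freedom
t_crit <- qt(0.975, df = df.residual(fit))

# Interval: estimate +/- critical value * standard error
ci_manual <- c(lower = est - t_crit * se, upper = est + t_crit * se)
ci_manual

# TRUE when the interval covers zero (slope not significant at the 5% level)
contains_zero <- ci_manual[["lower"]] < 0 && ci_manual[["upper"]] > 0
contains_zero
```

The manual interval should agree with the `Index` row of `confint(fit, level = 0.95)`.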
r2 <- summary(fit)$r.squared
r2
## [1] 0.1584636
The coefficient of determination, \(R^2\), indicates the proportion of variability in ozone exceedance days explained by the meteorological index. The observed value, \(R^2 \approx 0.158\), means that the index explains only about 16% of the variation; roughly 84% is left unexplained by the model, so other factors account for most of the year-to-year differences in exceedance days.
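The definition of \(R^2\) as a ratio of sums of squares can be checked directly. The sketch below computes the total and residual sums of squares and forms \(R^2 = 1 - \text{SSE}/\text{SST}\); the names `sst`, `sse`, and `r2_manual` are illustrative and not part of the original analysis.

```r
ozone <- data.frame(
  Year  = 1976:1991,
  Days  = c(91,105,106,108,88,91,58,82,81,65,61,48,61,43,33,36),
  Index = c(16.7,17.1,18.2,18.1,17.2,18.2,16.0,17.2,18.0,17.2,16.9,17.1,18.2,17.3,17.5,16.6)
)
fit <- lm(Days ~ Index, data = ozone)

# Total variation of the response around its mean
sst <- sum((ozone$Days - mean(ozone$Days))^2)

# Variation left over after fitting the regression line
sse <- sum(residuals(fit)^2)

# Proportion of total variation explained by the model
r2_manual <- 1 - sse / sst
r2_manual
```

The result should match `summary(fit)$r.squared`.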
p_value_slope <- summary(fit)$coefficients["Index", "Pr(>|t|)"]
p_value_slope
## [1] 0.1267446
A hypothesis test of \(H_0:\beta_1 = 0 \quad \text{vs.} \quad H_1:\beta_1 \neq 0\) was conducted to assess whether the meteorological index is linearly associated with ozone exceedance days. The resulting p-value, 0.127 (\(t = 1.624\) on 14 degrees of freedom), exceeds the conventional 0.05 significance level, so we fail to reject \(H_0\): the data do not provide statistically significant evidence that the index is linearly associated with exceedance days.
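The test statistic and p-value can be reproduced by hand from the coefficient table. The sketch below forms \(t = \hat\beta_1 / \text{SE}(\hat\beta_1)\) and the two-sided p-value from the \(t\) distribution on the residual degrees of freedom; the names `t_stat`, `resid_df`, and `p_manual` are illustrative and not part of the original analysis.

```r
ozone <- data.frame(
  Year  = 1976:1991,
  Days  = c(91,105,106,108,88,91,58,82,81,65,61,48,61,43,33,36),
  Index = c(16.7,17.1,18.2,18.1,17.2,18.2,16.0,17.2,18.0,17.2,16.9,17.1,18.2,17.3,17.5,16.6)
)
fit <- lm(Days ~ Index, data = ozone)

coef_table <- coef(summary(fit))
resid_df   <- df.residual(fit)   # n - 2 for simple linear regression

# t statistic for H0: beta_1 = 0
t_stat <- coef_table["Index", "Estimate"] / coef_table["Index", "Std. Error"]

# Two-sided p-value: probability of a |t| at least this large under H0
p_manual <- 2 * pt(abs(t_stat), df = resid_df, lower.tail = FALSE)

c(t = t_stat, p = p_manual)
```

In simple linear regression the overall F statistic is the square of this \(t\) statistic, which is why the F-test p-value in the model summary matches the slope p-value.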
par(mfrow = c(2, 2))
plot(fit)
Model adequacy was evaluated using standard regression diagnostic plots
to assess the assumptions of linearity, constant variance, normality of
errors, and the presence of influential observations. The residuals
versus fitted values plot shows no strong systematic pattern, supporting
the assumption of linearity, though a slight increase in residual spread
at higher fitted values suggests some departure from perfectly constant
variance. The normal probability plot of the residuals is approximately
linear, indicating that the normality assumption is reasonably
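satisfied. The residuals versus leverage plot does not reveal any
observations with undue influence on the fitted model. Overall, the
diagnostics show no serious violations of the linearity, constant
variance, or normality assumptions. With only 16 observations, however,
these plots have limited power to detect departures, and they do not
assess the independence of the consecutive annual observations.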
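The graphical diagnostics can be supplemented with a few numerical checks. The sketch below is one possible set, not part of the original analysis: leverage values compared against the common rule-of-thumb cutoff \(2p/n\) (an assumed convention), Cook's distances, a Shapiro-Wilk test of residual normality, and the lag-1 correlation of the residuals in year order as a rough look at independence. All variable names are illustrative.

```r
ozone <- data.frame(
  Year  = 1976:1991,
  Days  = c(91,105,106,108,88,91,58,82,81,65,61,48,61,43,33,36),
  Index = c(16.7,17.1,18.2,18.1,17.2,18.2,16.0,17.2,18.0,17.2,16.9,17.1,18.2,17.3,17.5,16.6)
)
fit <- lm(Days ~ Index, data = ozone)

n <- nrow(ozone)
p <- length(coef(fit))            # number of estimated coefficients (2)

# Leverage: how far each year's Index is from the mean Index
lev        <- hatvalues(fit)
lev_cutoff <- 2 * p / n           # rule-of-thumb cutoff (assumption)
high_lev_years <- ozone$Year[lev > lev_cutoff]
high_lev_years

# Influence: Cook's distance combines leverage and residual size
max_cook <- max(cooks.distance(fit))
max_cook

# Normality of residuals (a small p-value would suggest non-normality)
sw <- shapiro.test(residuals(fit))
sw$p.value

# Independence: correlation between consecutive residuals in year order
res      <- residuals(fit)
lag1_cor <- cor(res[-1], res[-n])
lag1_cor
```

With \(n = 16\) and two coefficients the leverage cutoff is 0.25. A lag-1 correlation far from zero would suggest that the annual observations are not independent, in which case the standard errors reported by `summary(fit)` should be interpreted with caution.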
x_grid <- seq(min(ozone$Index), max(ozone$Index), length.out = 200)
newdat <- data.frame(Index = x_grid)
conf <- predict(fit, newdata = newdat, interval = "confidence", level = 0.95)
pred <- predict(fit, newdata = newdat, interval = "prediction", level = 0.95)
plot(
  ozone$Index, ozone$Days,
  pch = 19,
  xlab = "Seasonal Meteorological Index (X)",
  ylab = "Days with Ozone > 0.20 ppm (Y)",
  main = "Regression Fit with 95% Confidence and Prediction Intervals"
)
grid()
lines(x_grid, pred[, "lwr"], lty = 2)
lines(x_grid, pred[, "upr"], lty = 2)
lines(x_grid, conf[, "lwr"], lty = 3)
lines(x_grid, conf[, "upr"], lty = 3)
lines(x_grid, conf[, "fit"], lwd = 2)
legend(
  "topleft",
  legend = c("Fitted line", "95% Confidence interval", "95% Prediction interval"),
  lty = c(1, 3, 2),
  lwd = c(2, 1, 1),
  bty = "n"
)
The fitted regression line, along with 95% confidence and prediction
intervals, is presented to illustrate both the estimated relationship
between the meteorological index and ozone exceedance days and the
associated uncertainty. The confidence interval describes uncertainty in
the mean number of exceedance days at a given index value, while the
prediction interval accounts for additional year-to-year variability and
therefore provides a wider range for individual future observations. As
expected, uncertainty increases toward the edges of the observed index
range. These intervals provide useful context for interpreting the
reliability of both average trends and individual-year predictions
derived from the model.
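The difference between the two kinds of interval is easiest to see in the formulas. At an index value \(x_0\), the standard error of the estimated mean is \(s\sqrt{1/n + (x_0 - \bar{x})^2 / S_{xx}}\), while the standard error for a new observation adds 1 under the square root. The sketch below evaluates both at a hypothetical value \(x_0 = 17.5\), chosen only for illustration; all variable names are illustrative.

```r
ozone <- data.frame(
  Year  = 1976:1991,
  Days  = c(91,105,106,108,88,91,58,82,81,65,61,48,61,43,33,36),
  Index = c(16.7,17.1,18.2,18.1,17.2,18.2,16.0,17.2,18.0,17.2,16.9,17.1,18.2,17.3,17.5,16.6)
)
fit <- lm(Days ~ Index, data = ozone)

x0   <- 17.5                              # hypothetical index value
n    <- nrow(ozone)
s    <- sigma(fit)                        # residual standard error
xbar <- mean(ozone$Index)
sxx  <- sum((ozone$Index - xbar)^2)
t_crit <- qt(0.975, df = n - 2)

# Point estimate of the mean response at x0
y0 <- sum(coef(fit) * c(1, x0))

# Standard errors: mean response vs. a single new observation
se_mean <- s * sqrt(1 / n + (x0 - xbar)^2 / sxx)
se_pred <- s * sqrt(1 + 1 / n + (x0 - xbar)^2 / sxx)

conf_int <- y0 + c(-1, 1) * t_crit * se_mean
pred_int <- y0 + c(-1, 1) * t_crit * se_pred

rbind(confidence = conf_int, prediction = pred_int)
```

Both standard errors grow with \((x_0 - \bar{x})^2\), which is why the bands widen toward the edges of the observed index range, and the extra 1 in `se_pred` is why the prediction interval is always the wider of the two.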
The analysis does not provide statistically significant evidence of a linear association between the seasonal meteorological index and the number of days on which ozone concentrations exceed 0.20 ppm. The estimated slope is positive (about 15.3 days per unit of index), but its 95% confidence interval, (-4.91, 35.50), includes zero, and the test of the slope gives a p-value of 0.127. The fitted regression model explains only about 16% of the variability in exceedance days, so most of the variation remains unexplained, suggesting that other factors influence ozone exceedance behavior. The diagnostic plots show no serious violations of the modeling assumptions, but with only 16 observations the estimates are imprecise. The model should therefore be treated at most as a supporting tool for seasonal assessment and planning rather than as a standalone predictor, and future analyses should consider incorporating additional meteorological or emissions-related variables to improve predictive performance.