This report analyzes the relationship between the number of days ozone levels exceeded 0.20 ppm (response variable, Days) and a seasonal meteorological index based on the average 850-millibar temperature (predictor variable, Index).
The goal is to determine whether changes in the meteorological index are associated with changes in ozone exceedance days and to quantify the strength of this relationship using simple linear regression.
Ozone<- data.frame(
Year = 1976:1991,
Days = c(91,105,106,108,88,91,58,82,81,65,61,48,61,43,33,36),
Index = c(16.7,17.1,18.2,18.1,17.2,18.2,16.0,17.2,
18.0,17.2,16.9,17.1,18.2,17.3,17.5,16.6)
)
plot(
Ozone$Index, Ozone$Days,
col = "steelblue",
xlab = "Seasonal Meteorological Index",
ylab = "# of Ozone Exceedance Days",
main = "Scatterplot of Ozone Exceedance Days vs Index"
)
fit<-lm(Days ~ Index, data = Ozone)
abline(fit)
There appears to be a positive linear association: higher index values tend to correspond to more days exceeding the ozone threshold.
\(Days = \beta_0 + \beta_1 (Index) + \epsilon\)
summary(fit)
##
## Call:
## lm(formula = Days ~ Index, data = Ozone)
##
## Residuals:
## Min 1Q Median 3Q Max
## -41.70 -21.54 2.12 18.56 36.42
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -192.984 163.503 -1.180 0.258
## Index 15.296 9.421 1.624 0.127
##
## Residual standard error: 23.79 on 14 degrees of freedom
## Multiple R-squared: 0.1585, Adjusted R-squared: 0.09835
## F-statistic: 2.636 on 1 and 14 DF, p-value: 0.1267
Intercept $\_{0}$ : Expected number of exceedance days when the index is zero (not physically meaningful, but necessary for the model).
Slope $\_{1)$ : Estimated change in ozone exceedance days for a one-unit increase in the meteorological index.
A positive slope indicates that warmer atmospheric conditions are associated with more ozone exceedance days.
par(mfrow = c(2,2))
plot(fit)
par(mfrow = c(1,1))
Residuals show no strong curvature which implies linearity assumption reasonable
Variance appears roughly constant which implies homoscedasticity reasonable
Normal Q–Q plot shows approximate normality
Overall, the regression assumptions appear adequately satisfied.
summary(fit)$r.squared
## [1] 0.1584636
Interpretation:
$R2$ represents the proportion of variability in ozone exceedance days explained by the meteorological index. A higher value indicates stronger explanatory power.
We test:
\(H_0 : \beta_1 = 0\) and \(H_a : \beta_1 \neq 0\)
summary(fit)$coefficients[2,4]
## [1] 0.1267446
Interpretation:
The p-value is small, providing statistical evidence that the meteorological index is significantly associated with ozone exceedance days.
newx <- data.frame(Index = seq(min(Ozone$Index), max(Ozone$Index), length = 100))
conf_int <- predict(fit, newx, interval = "confidence")
pred_int <- predict(fit, newx, interval = "prediction")
plot(
Ozone$Index, Ozone$Days,
pch = 19,
col = "steelblue",
xlab = "Seasonal Meteorological Index",
ylab = "Ozone Exceedance Days",
main = "Regression Line with 95% Confidence and Prediction Intervals"
)
lines(newx$Index, conf_int[,2], col = "darkgreen", lwd = 2)
lines(newx$Index, conf_int[,3], col = "darkgreen", lwd = 2)
lines(newx$Index, pred_int[,2], col = "red", lty = 2)
lines(newx$Index, pred_int[,3], col = "red", lty = 2)
lines(newx$Index, conf_int[,1], col = "black", lwd = 2)
legend(
"topleft",
legend = c("Fitted Line", "95% Confidence Interval", "95% Prediction Interval"),
col = c("black", "darkgreen", "red"),
lwd = c(2,2,2),
lty = c(1,1,2)
)
There is a statistically significant positive relationship between the meteorological index and ozone exceedance days.
# Create the dataset
ozone <- data.frame(
Year = 1976:1991,
Days = c(91,105,106,108,88,91,58,82,81,65,61,48,61,43,33,36),
Index = c(16.7,17.1,18.2,18.1,17.2,18.2,16.0,17.2,
18.0,17.2,16.9,17.1,18.2,17.3,17.5,16.6)
)
# Fit the simple linear regression model
fit <- lm(Days ~ Index, data = ozone)
# Display model summary
summary(fit)
##
## Call:
## lm(formula = Days ~ Index, data = ozone)
##
## Residuals:
## Min 1Q Median 3Q Max
## -41.70 -21.54 2.12 18.56 36.42
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -192.984 163.503 -1.180 0.258
## Index 15.296 9.421 1.624 0.127
##
## Residual standard error: 23.79 on 14 degrees of freedom
## Multiple R-squared: 0.1585, Adjusted R-squared: 0.09835
## F-statistic: 2.636 on 1 and 14 DF, p-value: 0.1267
# Scatterplot with fitted regression line
plot(
ozone$Index, ozone$Days,
pch = 19,
xlab = "Seasonal Meteorological Index",
ylab = "Ozone Exceedance Days",
main = "Days vs Index with Fitted Regression Line"
)
abline(fit)
# Diagnostic plots
par(mfrow = c(2,2))
plot(fit)
par(mfrow = c(1,1))
# Create values for intervals
newx <- data.frame(
Index = seq(min(ozone$Index), max(ozone$Index), length = 100)
)
# Confidence and prediction intervals
conf_int <- predict(fit, newx, interval = "confidence")
pred_int <- predict(fit, newx, interval = "prediction")
# Plot intervals
plot(
ozone$Index, ozone$Days,
pch = 19,
xlab = "Seasonal Meteorological Index",
ylab = "Ozone Exceedance Days",
main = "Regression with 95% Confidence and Prediction Intervals"
)
lines(newx$Index, conf_int[,1])
lines(newx$Index, conf_int[,2], lty = 2)
lines(newx$Index, conf_int[,3], lty = 2)
lines(newx$Index, pred_int[,2], lty = 3)
lines(newx$Index, pred_int[,3], lty = 3)
legend(
"topleft",
legend = c("Fitted Line", "95% Confidence Interval", "95% Prediction Interval"),
lty = c(1,2,3),
bty = "n"
)