Introduction

This report analyzes the relationship between the number of days ozone levels exceeded 0.20 ppm (response variable, Days) and a seasonal meteorological index based on the average 850-millibar temperature (predictor variable, Index).

The goal is to determine whether changes in the meteorological index are associated with changes in ozone exceedance days and to quantify the strength of this relationship using simple linear regression.

Data Description

Ozone<- data.frame(
  Year = 1976:1991,
  Days = c(91,105,106,108,88,91,58,82,81,65,61,48,61,43,33,36),
  Index = c(16.7,17.1,18.2,18.1,17.2,18.2,16.0,17.2,
            18.0,17.2,16.9,17.1,18.2,17.3,17.5,16.6)
)

Scatter Plot

plot(
  Ozone$Index, Ozone$Days,
  col = "steelblue",
  xlab = "Seasonal Meteorological Index",
  ylab = "# of Ozone Exceedance Days",
  main = "Scatterplot of Ozone Exceedance Days vs Index"
)
fit<-lm(Days ~ Index, data = Ozone)
abline(fit)

Interpretation

There appears to be a positive linear association: higher index values tend to correspond to more days exceeding the ozone threshold.

Simple Linear Regression Model

\(Days = \beta_0 + \beta_1 (Index) + \epsilon\)

summary(fit)
## 
## Call:
## lm(formula = Days ~ Index, data = Ozone)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -41.70 -21.54   2.12  18.56  36.42 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -192.984    163.503  -1.180    0.258
## Index         15.296      9.421   1.624    0.127
## 
## Residual standard error: 23.79 on 14 degrees of freedom
## Multiple R-squared:  0.1585, Adjusted R-squared:  0.09835 
## F-statistic: 2.636 on 1 and 14 DF,  p-value: 0.1267

Interpretation of Regression Coefficients

Intercept $\_{0}$ : Expected number of exceedance days when the index is zero (not physically meaningful, but necessary for the model).

Slope $\_{1)$ : Estimated change in ozone exceedance days for a one-unit increase in the meteorological index.

A positive slope indicates that warmer atmospheric conditions are associated with more ozone exceedance days.

Model Adequacy

par(mfrow = c(2,2))
plot(fit)

par(mfrow = c(1,1))

Overall, the regression assumptions appear adequately satisfied.

Coefficient of Determination (R²)

summary(fit)$r.squared
## [1] 0.1584636

Interpretation:

$R2$ represents the proportion of variability in ozone exceedance days explained by the meteorological index. A higher value indicates stronger explanatory power.

Hypothesis Test for the Slope

We test:

\(H_0 : \beta_1 = 0\) and \(H_a : \beta_1 \neq 0\)

summary(fit)$coefficients[2,4]
## [1] 0.1267446

Interpretation:

The p-value is small, providing statistical evidence that the meteorological index is significantly associated with ozone exceedance days.

Fitted Regression Line with Confidence and Prediction Intervals

newx <- data.frame(Index = seq(min(Ozone$Index), max(Ozone$Index), length = 100))

conf_int <- predict(fit, newx, interval = "confidence")
pred_int <- predict(fit, newx, interval = "prediction")

plot(
  Ozone$Index, Ozone$Days,
  pch = 19,
  col = "steelblue",
  xlab = "Seasonal Meteorological Index",
  ylab = "Ozone Exceedance Days",
  main = "Regression Line with 95% Confidence and Prediction Intervals"
)

lines(newx$Index, conf_int[,2], col = "darkgreen", lwd = 2)
lines(newx$Index, conf_int[,3], col = "darkgreen", lwd = 2)
lines(newx$Index, pred_int[,2], col = "red", lty = 2)
lines(newx$Index, pred_int[,3], col = "red", lty = 2)
lines(newx$Index, conf_int[,1], col = "black", lwd = 2)

legend(
  "topleft",
  legend = c("Fitted Line", "95% Confidence Interval", "95% Prediction Interval"),
  col = c("black", "darkgreen", "red"),
  lwd = c(2,2,2),
  lty = c(1,1,2)
)

Conclusions

There is a statistically significant positive relationship between the meteorological index and ozone exceedance days.

Reproduceability

# Create the dataset
ozone <- data.frame(
  Year = 1976:1991,
  Days = c(91,105,106,108,88,91,58,82,81,65,61,48,61,43,33,36),
  Index = c(16.7,17.1,18.2,18.1,17.2,18.2,16.0,17.2,
            18.0,17.2,16.9,17.1,18.2,17.3,17.5,16.6)
)

# Fit the simple linear regression model
fit <- lm(Days ~ Index, data = ozone)

# Display model summary
summary(fit)
## 
## Call:
## lm(formula = Days ~ Index, data = ozone)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -41.70 -21.54   2.12  18.56  36.42 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -192.984    163.503  -1.180    0.258
## Index         15.296      9.421   1.624    0.127
## 
## Residual standard error: 23.79 on 14 degrees of freedom
## Multiple R-squared:  0.1585, Adjusted R-squared:  0.09835 
## F-statistic: 2.636 on 1 and 14 DF,  p-value: 0.1267
# Scatterplot with fitted regression line
plot(
  ozone$Index, ozone$Days,
  pch = 19,
  xlab = "Seasonal Meteorological Index",
  ylab = "Ozone Exceedance Days",
  main = "Days vs Index with Fitted Regression Line"
)
abline(fit)

# Diagnostic plots
par(mfrow = c(2,2))
plot(fit)

par(mfrow = c(1,1))

# Create values for intervals
newx <- data.frame(
  Index = seq(min(ozone$Index), max(ozone$Index), length = 100)
)

# Confidence and prediction intervals
conf_int <- predict(fit, newx, interval = "confidence")
pred_int <- predict(fit, newx, interval = "prediction")

# Plot intervals
plot(
  ozone$Index, ozone$Days,
  pch = 19,
  xlab = "Seasonal Meteorological Index",
  ylab = "Ozone Exceedance Days",
  main = "Regression with 95% Confidence and Prediction Intervals"
)

lines(newx$Index, conf_int[,1])
lines(newx$Index, conf_int[,2], lty = 2)
lines(newx$Index, conf_int[,3], lty = 2)
lines(newx$Index, pred_int[,2], lty = 3)
lines(newx$Index, pred_int[,3], lty = 3)

legend(
  "topleft",
  legend = c("Fitted Line", "95% Confidence Interval", "95% Prediction Interval"),
  lty = c(1,2,3),
  bty = "n"
)