1 Introduction

Ozone pollution is an important air-quality problem in California’s South Coast Air Basin. This report looks at whether the number of days with high ozone levels is related to seasonal weather conditions measured by a meteorological index. A simple linear regression model is used to study this relationship, check whether it is statistically significant, and see how well it can explain and predict ozone exceedance days.

2 Data Description

The dataset consists of 16 annual observations, from 1976 to 1991, for California’s South Coast Air Basin. It contains two variables:

  • Days (y): the response variable, the number of days per year on which ozone concentrations exceeded 0.20 ppm.
  • Index (x): the explanatory variable, a seasonal meteorological index based on the average 850-millibar temperature.

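# Annual observations, 1976-1991
year <- 1976:1991
# y: days per year with ozone above 0.20 ppm
y <- c(91,105,106,108,88,91,58,82,81,65,61,48,61,43,33,36)
# x: seasonal meteorological index
x <- c(16.7,17.1,18.2,18.1,17.2,18.2,16.0,17.2,18.0,17.2,16.9,17.1,18.2,17.3,17.5,16.6)
dat <- data.frame(x, y)
dat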
##       x   y
## 1  16.7  91
## 2  17.1 105
## 3  18.2 106
## 4  18.1 108
## 5  17.2  88
## 6  18.2  91
## 7  16.0  58
## 8  17.2  82
## 9  18.0  81
## 10 17.2  65
## 11 16.9  61
## 12 17.1  48
## 13 18.2  61
## 14 17.3  43
## 15 17.5  33
## 16 16.6  36

3 Simple Linear Regression Analysis

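The simple linear regression model assumes that the expected number of exceedance days changes linearly with the meteorological index. With \(\hat{\beta}_0\) and \(\hat{\beta}_1\) denoting the least-squares estimates of the intercept and slope, the fitted line is written as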

\[ \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X \]
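
The estimates \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are the values that minimise the sum of squared residuals. As a minimal sketch, the closed-form least-squares formulas can be applied directly to the data from Section 2. The names `Sxx`, `Sxy`, `b0` and `b1` are introduced here for illustration only and are not part of the original analysis.

```r
# Data from Section 2
y <- c(91,105,106,108,88,91,58,82,81,65,61,48,61,43,33,36)
x <- c(16.7,17.1,18.2,18.1,17.2,18.2,16.0,17.2,18.0,17.2,16.9,17.1,18.2,17.3,17.5,16.6)

# Corrected sums of squares and cross-products
Sxx <- sum((x - mean(x))^2)
Sxy <- sum((x - mean(x)) * (y - mean(y)))

# Least-squares slope and intercept
b1 <- Sxy / Sxx
b0 <- mean(y) - b1 * mean(x)
c(intercept = b0, slope = b1)
```

These values should agree with `coefficients(lm(y ~ x))` up to floating-point error.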

3.1 Fitted Regression Model and Coefficients

plot(x, y,
     col = "deeppink3",
     xlab = "Seasonal Temperature Index",
     ylab = "Number of Ozone Exceedance Days",
     main = "Seasonal Temperature vs Ozone Exceedance Days",
     col.main = "green")
# Fit the simple linear regression and overlay the fitted line
model1 <- lm(y ~ x)
abline(model1, col = "firebrick4", lwd = 2)

  • Figure 1 shows a scatterplot of the number of ozone exceedance days versus the seasonal temperature index, with the fitted regression line overlaid. The points slope upward on average, so higher index values tend to go with more exceedance days, but the scatter around the line is wide. The plot therefore suggests a weak, roughly linear positive association, which a simple linear regression model can quantify.

coefficients(model1)
## (Intercept)           x 
##  -192.98383    15.29637

  • The fitted model is \(\hat{Y} = -192.98 + 15.30X\): each one-unit increase in the index is associated with about 15.3 more exceedance days per year on average. The intercept has no physical interpretation, because an index of 0 lies far outside the observed range (16.0 to 18.2).

3.2 Model Fit and Statistical Inference

summary(model1)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -41.70 -21.54   2.12  18.56  36.42 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -192.984    163.503  -1.180    0.258
## x             15.296      9.421   1.624    0.127
## 
## Residual standard error: 23.79 on 14 degrees of freedom
## Multiple R-squared:  0.1585, Adjusted R-squared:  0.09835 
## F-statistic: 2.636 on 1 and 14 DF,  p-value: 0.1267
r2 <- summary(model1)$r.squared
pval <- summary(model1)$coefficients["x", "Pr(>|t|)"]
r2
## [1] 0.1584636
pval
## [1] 0.1267446
  • The coefficient of determination is \(R^2 = 0.158\), so the meteorological index explains only about 16% of the year-to-year variability in ozone exceedance days; the remaining 84% is left unexplained by the model.

  • The p-value for the slope is 0.127, which is greater than 0.05, suggesting that the relationship is not statistically significant at the 5% level.
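
The quantities reported by `summary()` can be reproduced from their definitions, which makes clear where \(R^2\) and the slope's p-value come from. The following is a sketch under the usual simple-linear-regression assumptions. The names `sse`, `sst`, `se_b1`, `t_b1`, `r2_manual` and `p_manual` are illustrative and do not appear in the original analysis.

```r
# Data from Section 2 and the fitted model
y <- c(91,105,106,108,88,91,58,82,81,65,61,48,61,43,33,36)
x <- c(16.7,17.1,18.2,18.1,17.2,18.2,16.0,17.2,18.0,17.2,16.9,17.1,18.2,17.3,17.5,16.6)
model1 <- lm(y ~ x)

n   <- length(y)
sse <- sum(resid(model1)^2)        # residual sum of squares
sst <- sum((y - mean(y))^2)        # total sum of squares
r2_manual <- 1 - sse / sst         # proportion of variability explained

s     <- sqrt(sse / (n - 2))       # residual standard error
se_b1 <- s / sqrt(sum((x - mean(x))^2))
t_b1  <- unname(coef(model1)["x"]) / se_b1
p_manual <- 2 * pt(-abs(t_b1), df = n - 2)   # two-sided p-value for H0: slope = 0

c(R2 = r2_manual, t = t_b1, p = p_manual)

# 95% confidence interval for the slope
confint(model1, "x", level = 0.95)
```

Because the p-value exceeds 0.05, the 95% confidence interval for the slope includes zero, which is another way of stating the same conclusion.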

3.3 Model Adequacy and Diagnostic Plots

3.3.1 Normality of Residuals

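res <- resid(model1)
n <- length(res)
# Plotting positions (i - 0.5)/n; using i/n would give qnorm(1) = Inf and drop the largest residual
p <- (1:n - 0.5) / n
z <- qnorm(p)
res_sorted <- sort(res)
plot(z, res_sorted,
     xlab = "Theoretical Normal Quantiles",
     ylab = "Ordered Residuals",
     main = "Normal Probability Plot of Residuals",
     col = "orange", lwd = 2)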

  • The normal probability plot shows that the residuals lie approximately along a straight line. This suggests that the normality assumption for the regression errors is reasonable.
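
A formal test can complement the visual check. The sketch below applies the Shapiro-Wilk test from base R (`shapiro.test`) to the residuals. It is an addition to the original analysis, and with only 16 observations the test has little power, so it is best read alongside the plot rather than in place of it.

```r
# Data from Section 2 and the fitted model
y <- c(91,105,106,108,88,91,58,82,81,65,61,48,61,43,33,36)
x <- c(16.7,17.1,18.2,18.1,17.2,18.2,16.0,17.2,18.0,17.2,16.9,17.1,18.2,17.3,17.5,16.6)
model1 <- lm(y ~ x)
res <- resid(model1)

# H0: the residuals come from a normal distribution
sw <- shapiro.test(res)
sw$statistic
sw$p.value
```

A p-value above 0.05 means the test does not reject normality at the 5% level.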

3.3.2 Constant Variance (Homoscedasticity)

y_hat <- fitted(model1)
plot(y_hat, res,
     xlab = "Predicted Values",
     ylab = "Residuals",
     main = "Residuals vs Predicted Values",
     col = "red")
abline(h = 0, col = "black", lwd = 2)

  • The residuals are randomly scattered around zero with no clear pattern. This indicates that the assumption of constant variance is reasonable.

3.3.3 Independence of Residuals

res <- resid(model1)
# Observations are in chronological order (1976-1991), so the index tracks time
i <- 1:length(res)
plot(i, res,
     xlab = "Observation Order (Time)",
     ylab = "Residuals",
     main = "Residuals vs Order",
     col = "red")
abline(h = 0, col = "black", lwd = 2)

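  • The residuals show a clear downward trend over the observation order: they are positive in the early years and negative in the later years. Consecutive residuals are therefore correlated, and the independence assumption is questionable.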
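
One way to put a number on a pattern of this kind is the Durbin-Watson statistic, computed here directly from its definition so that no extra package is needed. This is a sketch added for illustration, not part of the original analysis, and the names `dw` and `r1` are hypothetical. Values near 2 are consistent with uncorrelated residuals, while values well below 2 point to positive serial correlation.

```r
# Data from Section 2 (in chronological order) and the fitted model
y <- c(91,105,106,108,88,91,58,82,81,65,61,48,61,43,33,36)
x <- c(16.7,17.1,18.2,18.1,17.2,18.2,16.0,17.2,18.0,17.2,16.9,17.1,18.2,17.3,17.5,16.6)
model1 <- lm(y ~ x)
res <- resid(model1)

# Durbin-Watson statistic: squared successive differences over the residual sum of squares
dw <- sum(diff(res)^2) / sum(res^2)

# Lag-1 sample autocorrelation of the residuals
r1 <- sum(res[-1] * res[-length(res)]) / sum(res^2)

c(DW = dw, lag1_autocorrelation = r1)
```

For these data the statistic falls well below 2 and the lag-1 autocorrelation is positive, in line with the pattern seen in the plot.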

3.4 Confidence and Prediction Intervals

    min(x)
    ## [1] 16
    max(x)
    ## [1] 18.2
    newx <- seq(min(x),max(x),length.out=100)
    conf <- predict(model1,newdata=data.frame(x=newx),interval="confidence",level=0.95)
    pred<- predict(model1,newdata=data.frame(x=newx),interval="prediction",level=0.95)
    conf
    ##          fit      lwr       upr
    ## 1   51.75801 21.75791  81.75811
    ## 2   52.09793 22.50360  81.69225
    ## 3   52.43785 23.24803  81.62766
    ## 4   52.77777 23.99114  81.56440
    ## 5   53.11769 24.73287  81.50250
    ## 6   53.45761 25.47316  81.44205
    ## 7   53.79752 26.21196  81.38309
    ## 8   54.13744 26.94919  81.32569
    ## 9   54.47736 27.68479  81.26993
    ## 10  54.81728 28.41869  81.21588
    ## 11  55.15720 29.15080  81.16361
    ## 12  55.49712 29.88104  81.11320
    ## 13  55.83704 30.60933  81.06475
    ## 14  56.17696 31.33557  81.01834
    ## 15  56.51688 32.05968  80.97408
    ## 16  56.85680 32.78154  80.93206
    ## 17  57.19672 33.50104  80.89239
    ## 18  57.53664 34.21808  80.85519
    ## 19  57.87656 34.93253  80.82058
    ## 20  58.21647 35.64426  80.78868
    ## 21  58.55639 36.35314  80.75965
    ## 22  58.89631 37.05902  80.73361
    ## 23  59.23623 37.76175  80.71072
    ## 24  59.57615 38.46116  80.69115
    ## 25  59.91607 39.15708  80.67506
    ## 26  60.25599 39.84934  80.66264
    ## 27  60.59591 40.53773  80.65409
    ## 28  60.93583 41.22205  80.64961
    ## 29  61.27575 41.90209  80.64941
    ## 30  61.61567 42.57761  80.65372
    ## 31  61.95559 43.24837  80.66280
    ## 32  62.29551 43.91412  80.67689
    ## 33  62.63542 44.57459  80.69626
    ## 34  62.97534 45.22948  80.72121
    ## 35  63.31526 45.87850  80.75203
    ## 36  63.65518 46.52133  80.78904
    ## 37  63.99510 47.15763  80.83258
    ## 38  64.33502 47.78705  80.88299
    ## 39  64.67494 48.40923  80.94065
    ## 40  65.01486 49.02378  81.00594
    ## 41  65.35478 49.63030  81.07926
    ## 42  65.69470 50.22838  81.16102
    ## 43  66.03462 50.81758  81.25165
    ## 44  66.37454 51.39748  81.35160
    ## 45  66.71446 51.96760  81.46131
    ## 46  67.05437 52.52748  81.58127
    ## 47  67.39429 53.07666  81.71193
    ## 48  67.73421 53.61466  81.85377
    ## 49  68.07413 54.14100  82.00727
    ## 50  68.41405 54.65520  82.17290
    ## 51  68.75397 55.15680  82.35114
    ## 52  69.09389 55.64535  82.54243
    ## 53  69.43381 56.12041  82.74721
    ## 54  69.77373 56.58156  82.96589
    ## 55  70.11365 57.02842  83.19887
    ## 56  70.45357 57.46063  83.44650
    ## 57  70.79349 57.87789  83.70908
    ## 58  71.13341 58.27991  83.98690
    ## 59  71.47332 58.66648  84.28017
    ## 60  71.81324 59.03743  84.58905
    ## 61  72.15316 59.39265  84.91368
    ## 62  72.49308 59.73208  85.25409
    ## 63  72.83300 60.05571  85.61029
    ## 64  73.17292 60.36362  85.98222
    ## 65  73.51284 60.65591  86.36976
    ## 66  73.85276 60.93277  86.77275
    ## 67  74.19268 61.19441  87.19094
    ## 68  74.53260 61.44111  87.62408
    ## 69  74.87252 61.67319  88.07184
    ## 70  75.21244 61.89100  88.53388
    ## 71  75.55236 62.09492  89.00979
    ## 72  75.89227 62.28538  89.49917
    ## 73  76.23219 62.46281  90.00157
    ## 74  76.57211 62.62768  90.51655
    ## 75  76.91203 62.78043  91.04363
    ## 76  77.25195 62.92156  91.58234
    ## 77  77.59187 63.05154  92.13220
    ## 78  77.93179 63.17084  92.69274
    ## 79  78.27171 63.27993  93.26349
    ## 80  78.61163 63.37928  93.84397
    ## 81  78.95155 63.46935  94.43375
    ## 82  79.29147 63.55057  95.03237
    ## 83  79.63139 63.62337  95.63940
    ## 84  79.97131 63.68817  96.25444
    ## 85  80.31122 63.74537  96.87708
    ## 86  80.65114 63.79534  97.50695
    ## 87  80.99106 63.83846  98.14366
    ## 88  81.33098 63.87508  98.78689
    ## 89  81.67090 63.90552  99.43628
    ## 90  82.01082 63.93011 100.09153
    ## 91  82.35074 63.94915 100.75233
    ## 92  82.69066 63.96291 101.41840
    ## 93  83.03058 63.97168 102.08947
    ## 94  83.37050 63.97571 102.76529
    ## 95  83.71042 63.97523 103.44560
    ## 96  84.05034 63.97049 104.13018
    ## 97  84.39026 63.96169 104.81882
    ## 98  84.73017 63.94904 105.51131
    ## 99  85.07009 63.93273 106.20746
    ## 100 85.41001 63.91295 106.90708
    pred
    ##          fit         lwr      upr
    ## 1   51.75801 -7.44155045 110.9576
    ## 2   52.09793 -6.89703741 111.0929
    ## 3   52.43785 -6.35524179 111.2309
    ## 4   52.77777 -5.81619168 111.3717
    ## 5   53.11769 -5.27991517 111.5153
    ## 6   53.45761 -4.74644032 111.6617
    ## 7   53.79752 -4.21579519 111.8108
    ## 8   54.13744 -3.68800776 111.9629
    ## 9   54.47736 -3.16310598 112.1178
    ## 10  54.81728 -2.64111772 112.2757
    ## 11  55.15720 -2.12207077 112.4365
    ## 12  55.49712 -1.60599280 112.6002
    ## 13  55.83704 -1.09291138 112.7670
    ## 14  56.17696 -0.58285393 112.9368
    ## 15  56.51688 -0.07584772 113.1096
    ## 16  56.85680  0.42808014 113.2855
    ## 17  57.19672  0.92890273 113.4645
    ## 18  57.53664  1.42659334 113.6467
    ## 19  57.87656  1.92112548 113.8320
    ## 20  58.21647  2.41247289 114.0205
    ## 21  58.55639  2.90060958 114.2122
    ## 22  58.89631  3.38550982 114.4071
    ## 23  59.23623  3.86714821 114.6053
    ## 24  59.57615  4.34549962 114.8068
    ## 25  59.91607  4.82053928 115.0116
    ## 26  60.25599  5.29224277 115.2197
    ## 27  60.59591  5.76058603 115.4312
    ## 28  60.93583  6.22554540 115.6461
    ## 29  61.27575  6.68709763 115.8644
    ## 30  61.61567  7.14521989 116.0861
    ## 31  61.95559  7.59988981 116.3113
    ## 32  62.29551  8.05108547 116.5399
    ## 33  62.63542  8.49878545 116.7721
    ## 34  62.97534  8.94296884 117.0077
    ## 35  63.31526  9.38361523 117.2469
    ## 36  63.65518  9.82070478 117.4897
    ## 37  63.99510 10.25421819 117.7360
    ## 38  64.33502 10.68413675 117.9859
    ## 39  64.67494 11.11044232 118.2394
    ## 40  65.01486 11.53311742 118.4966
    ## 41  65.35478 11.95214515 118.7574
    ## 42  65.69470 12.36750929 119.0219
    ## 43  66.03462 12.77919427 119.2900
    ## 44  66.37454 13.18718519 119.5619
    ## 45  66.71446 13.59146785 119.8374
    ## 46  67.05437 13.99202876 120.1167
    ## 47  67.39429 14.38885515 120.3997
    ## 48  67.73421 14.78193497 120.6865
    ## 49  68.07413 15.17125693 120.9770
    ## 50  68.41405 15.55681049 121.2713
    ## 51  68.75397 15.93858589 121.5694
    ## 52  69.09389 16.31657413 121.8712
    ## 53  69.43381 16.69076702 122.1769
    ## 54  69.77373 17.06115716 122.4863
    ## 55  70.11365 17.42773794 122.7996
    ## 56  70.45357 17.79050357 123.1166
    ## 57  70.79349 18.14944910 123.4375
    ## 58  71.13341 18.50457038 123.7622
    ## 59  71.47332 18.85586409 124.0908
    ## 60  71.81324 19.20332775 124.4232
    ## 61  72.15316 19.54695971 124.7594
    ## 62  72.49308 19.88675917 125.0994
    ## 63  72.83300 20.22272615 125.4433
    ## 64  73.17292 20.55486151 125.7910
    ## 65  73.51284 20.88316695 126.1425
    ## 66  73.85276 21.20764500 126.4979
    ## 67  74.19268 21.52829904 126.8571
    ## 68  74.53260 21.84513326 127.2201
    ## 69  74.87252 22.15815268 127.5869
    ## 70  75.21244 22.46736313 127.9575
    ## 71  75.55236 22.77277125 128.3319
    ## 72  75.89227 23.07438452 128.7102
    ## 73  76.23219 23.37221117 129.0922
    ## 74  76.57211 23.66626024 129.4780
    ## 75  76.91203 23.95654155 129.8675
    ## 76  77.25195 24.24306569 130.2608
    ## 77  77.59187 24.52584399 130.6579
    ## 78  77.93179 24.80488854 131.0587
    ## 79  78.27171 25.08021217 131.4632
    ## 80  78.61163 25.35182840 131.8714
    ## 81  78.95155 25.61975149 132.2833
    ## 82  79.29147 25.88399637 132.6989
    ## 83  79.63139 26.14457865 133.1182
    ## 84  79.97131 26.40151460 133.5411
    ## 85  80.31122 26.65482115 133.9676
    ## 86  80.65114 26.90451583 134.3978
    ## 87  80.99106 27.15061682 134.8315
    ## 88  81.33098 27.39314285 135.2688
    ## 89  81.67090 27.63211326 135.7097
    ## 90  82.01082 27.86754794 136.1541
    ## 91  82.35074 28.09946731 136.6020
    ## 92  82.69066 28.32789233 137.0534
    ## 93  83.03058 28.55284445 137.5083
    ## 94  83.37050 28.77434561 137.9666
    ## 95  83.71042 28.99241822 138.4284
    ## 96  84.05034 29.20708512 138.8936
    ## 97  84.39026 29.41836961 139.3621
    ## 98  84.73017 29.62629535 139.8341
    ## 99  85.07009 29.83088644 140.3093
    ## 100 85.41001 30.03216732 140.7879
    # Widen the y-axis to cover both bands; otherwise the prediction limits are clipped
    ylim_all <- range(c(y, conf[, 2:3], pred[, 2:3]))
    plot(x, y, ylim = ylim_all)
    abline(model1)
    lines(newx,conf[,2], lwd=2,col="green")
    lines(newx,conf[,3], lwd=2,col="green")
    lines(newx,pred[,2],lwd=2,col="blue")
    lines(newx,pred[,3],lwd=2,col="blue")
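
In the plot, the green lines bound the 95% confidence band for the mean number of exceedance days at a given index value, and the blue lines bound the 95% prediction band for a single new year. The prediction band is wider because it also accounts for the year-to-year scatter around the mean. As a minimal sketch, the standard textbook formulas for both intervals can be evaluated by hand at one index value and compared with `predict()`. The value `x0 = 17` and the helper names (`se_mean`, `se_pred`, `ci_manual`, `pi_manual`) are chosen here purely for illustration.

```r
# Data from Section 2 and the fitted model
y <- c(91,105,106,108,88,91,58,82,81,65,61,48,61,43,33,36)
x <- c(16.7,17.1,18.2,18.1,17.2,18.2,16.0,17.2,18.0,17.2,16.9,17.1,18.2,17.3,17.5,16.6)
model1 <- lm(y ~ x)

n     <- length(y)
s     <- summary(model1)$sigma          # residual standard error
Sxx   <- sum((x - mean(x))^2)
tcrit <- qt(0.975, df = n - 2)          # two-sided 95% critical value

x0  <- 17                               # illustrative index value
fit <- sum(coef(model1) * c(1, x0))     # fitted mean at x0

# Standard errors for the mean response and for a single new observation
se_mean <- s * sqrt(1 / n + (x0 - mean(x))^2 / Sxx)
se_pred <- s * sqrt(1 + 1 / n + (x0 - mean(x))^2 / Sxx)

ci_manual <- fit + c(-1, 1) * tcrit * se_mean
pi_manual <- fit + c(-1, 1) * tcrit * se_pred

# Compare with the built-in intervals
predict(model1, data.frame(x = x0), interval = "confidence", level = 0.95)
predict(model1, data.frame(x = x0), interval = "prediction", level = 0.95)
```

An index of 17 corresponds to row 46 of the `conf` and `pred` tables above, so the hand-computed limits can be checked against those rows.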

4 Conclusion

This analysis examined the relationship between the number of days per year on which ozone levels exceeded 0.20 ppm and the seasonal temperature index in California’s South Coast Air Basin. The scatterplot and fitted regression line suggest a positive association: years with a higher index tend to have more ozone exceedance days.

The estimated regression model explains only a modest portion of the variability in ozone exceedance days (\(R^2 = 0.158\)). The hypothesis test on the slope (p-value 0.127) does not provide significant evidence of a linear relationship at the 5% significance level. Diagnostic plots show that the assumptions of normality and constant variance are reasonably satisfied, but the residuals show a pattern over time, so the independence assumption is questionable.

Overall, the linear regression model provides a first approximation of the relationship between temperature and ozone exceedance days, but its weak fit and the pattern in the residuals over time indicate that additional variables or more advanced modeling approaches are needed to capture the underlying dynamics.