The R datasets library contains data on air quality in New York (airquality). Conduct a hypothesis test to evaluate if ozone levels are a function of month. NOTE: dichotomize month. If that test were significant, what else would be required? Post your hypothesis test and R code here.

library(datasets)
datasets::airquality
##     Ozone Solar.R Wind Temp Month Day
## 1      41     190  7.4   67     5   1
## 2      36     118  8.0   72     5   2
## 3      12     149 12.6   74     5   3
## 4      18     313 11.5   62     5   4
## 5      NA      NA 14.3   56     5   5
## 6      28      NA 14.9   66     5   6
## 7      23     299  8.6   65     5   7
## 8      19      99 13.8   59     5   8
## 9       8      19 20.1   61     5   9
## 10     NA     194  8.6   69     5  10
## 11      7      NA  6.9   74     5  11
## 12     16     256  9.7   69     5  12
## 13     11     290  9.2   66     5  13
## 14     14     274 10.9   68     5  14
## 15     18      65 13.2   58     5  15
## 16     14     334 11.5   64     5  16
## 17     34     307 12.0   66     5  17
## 18      6      78 18.4   57     5  18
## 19     30     322 11.5   68     5  19
## 20     11      44  9.7   62     5  20
## 21      1       8  9.7   59     5  21
## 22     11     320 16.6   73     5  22
## 23      4      25  9.7   61     5  23
## 24     32      92 12.0   61     5  24
## 25     NA      66 16.6   57     5  25
## 26     NA     266 14.9   58     5  26
## 27     NA      NA  8.0   57     5  27
## 28     23      13 12.0   67     5  28
## 29     45     252 14.9   81     5  29
## 30    115     223  5.7   79     5  30
## 31     37     279  7.4   76     5  31
## 32     NA     286  8.6   78     6   1
## 33     NA     287  9.7   74     6   2
## 34     NA     242 16.1   67     6   3
## 35     NA     186  9.2   84     6   4
## 36     NA     220  8.6   85     6   5
## 37     NA     264 14.3   79     6   6
## 38     29     127  9.7   82     6   7
## 39     NA     273  6.9   87     6   8
## 40     71     291 13.8   90     6   9
## 41     39     323 11.5   87     6  10
## 42     NA     259 10.9   93     6  11
## 43     NA     250  9.2   92     6  12
## 44     23     148  8.0   82     6  13
## 45     NA     332 13.8   80     6  14
## 46     NA     322 11.5   79     6  15
## 47     21     191 14.9   77     6  16
## 48     37     284 20.7   72     6  17
## 49     20      37  9.2   65     6  18
## 50     12     120 11.5   73     6  19
## 51     13     137 10.3   76     6  20
## 52     NA     150  6.3   77     6  21
## 53     NA      59  1.7   76     6  22
## 54     NA      91  4.6   76     6  23
## 55     NA     250  6.3   76     6  24
## 56     NA     135  8.0   75     6  25
## 57     NA     127  8.0   78     6  26
## 58     NA      47 10.3   73     6  27
## 59     NA      98 11.5   80     6  28
## 60     NA      31 14.9   77     6  29
## 61     NA     138  8.0   83     6  30
## 62    135     269  4.1   84     7   1
## 63     49     248  9.2   85     7   2
## 64     32     236  9.2   81     7   3
## 65     NA     101 10.9   84     7   4
## 66     64     175  4.6   83     7   5
## 67     40     314 10.9   83     7   6
## 68     77     276  5.1   88     7   7
## 69     97     267  6.3   92     7   8
## 70     97     272  5.7   92     7   9
## 71     85     175  7.4   89     7  10
## 72     NA     139  8.6   82     7  11
## 73     10     264 14.3   73     7  12
## 74     27     175 14.9   81     7  13
## 75     NA     291 14.9   91     7  14
## 76      7      48 14.3   80     7  15
## 77     48     260  6.9   81     7  16
## 78     35     274 10.3   82     7  17
## 79     61     285  6.3   84     7  18
## 80     79     187  5.1   87     7  19
## 81     63     220 11.5   85     7  20
## 82     16       7  6.9   74     7  21
## 83     NA     258  9.7   81     7  22
## 84     NA     295 11.5   82     7  23
## 85     80     294  8.6   86     7  24
## 86    108     223  8.0   85     7  25
## 87     20      81  8.6   82     7  26
## 88     52      82 12.0   86     7  27
## 89     82     213  7.4   88     7  28
## 90     50     275  7.4   86     7  29
## 91     64     253  7.4   83     7  30
## 92     59     254  9.2   81     7  31
## 93     39      83  6.9   81     8   1
## 94      9      24 13.8   81     8   2
## 95     16      77  7.4   82     8   3
## 96     78      NA  6.9   86     8   4
## 97     35      NA  7.4   85     8   5
## 98     66      NA  4.6   87     8   6
## 99    122     255  4.0   89     8   7
## 100    89     229 10.3   90     8   8
## 101   110     207  8.0   90     8   9
## 102    NA     222  8.6   92     8  10
## 103    NA     137 11.5   86     8  11
## 104    44     192 11.5   86     8  12
## 105    28     273 11.5   82     8  13
## 106    65     157  9.7   80     8  14
## 107    NA      64 11.5   79     8  15
## 108    22      71 10.3   77     8  16
## 109    59      51  6.3   79     8  17
## 110    23     115  7.4   76     8  18
## 111    31     244 10.9   78     8  19
## 112    44     190 10.3   78     8  20
## 113    21     259 15.5   77     8  21
## 114     9      36 14.3   72     8  22
## 115    NA     255 12.6   75     8  23
## 116    45     212  9.7   79     8  24
## 117   168     238  3.4   81     8  25
## 118    73     215  8.0   86     8  26
## 119    NA     153  5.7   88     8  27
## 120    76     203  9.7   97     8  28
## 121   118     225  2.3   94     8  29
## 122    84     237  6.3   96     8  30
## 123    85     188  6.3   94     8  31
## 124    96     167  6.9   91     9   1
## 125    78     197  5.1   92     9   2
## 126    73     183  2.8   93     9   3
## 127    91     189  4.6   93     9   4
## 128    47      95  7.4   87     9   5
## 129    32      92 15.5   84     9   6
## 130    20     252 10.9   80     9   7
## 131    23     220 10.3   78     9   8
## 132    21     230 10.9   75     9   9
## 133    24     259  9.7   73     9  10
## 134    44     236 14.9   81     9  11
## 135    21     259 15.5   76     9  12
## 136    28     238  6.3   77     9  13
## 137     9      24 10.9   71     9  14
## 138    13     112 11.5   71     9  15
## 139    46     237  6.9   78     9  16
## 140    18     224 13.8   67     9  17
## 141    13      27 10.3   76     9  18
## 142    24     238 10.3   68     9  19
## 143    16     201  8.0   82     9  20
## 144    13     238 12.6   64     9  21
## 145    23      14  9.2   71     9  22
## 146    36     139 10.3   81     9  23
## 147     7      49 10.3   69     9  24
## 148    14      20 16.6   63     9  25
## 149    30     193  6.9   70     9  26
## 150    NA     145 13.2   77     9  27
## 151    14     191 14.3   75     9  28
## 152    18     131  8.0   76     9  29
## 153    20     223 11.5   68     9  30
summary(airquality)
##      Ozone           Solar.R           Wind             Temp      
##  Min.   :  1.00   Min.   :  7.0   Min.   : 1.700   Min.   :56.00  
##  1st Qu.: 18.00   1st Qu.:115.8   1st Qu.: 7.400   1st Qu.:72.00  
##  Median : 31.50   Median :205.0   Median : 9.700   Median :79.00  
##  Mean   : 42.13   Mean   :185.9   Mean   : 9.958   Mean   :77.88  
##  3rd Qu.: 63.25   3rd Qu.:258.8   3rd Qu.:11.500   3rd Qu.:85.00  
##  Max.   :168.00   Max.   :334.0   Max.   :20.700   Max.   :97.00  
##  NA's   :37       NA's   :7                                       
##      Month            Day      
##  Min.   :5.000   Min.   : 1.0  
##  1st Qu.:6.000   1st Qu.: 8.0  
##  Median :7.000   Median :16.0  
##  Mean   :6.993   Mean   :15.8  
##  3rd Qu.:8.000   3rd Qu.:23.0  
##  Max.   :9.000   Max.   :31.0  
## 

In order to find a possible correlation between ozone levels and month one must develop a hypothesis test with a p-value of 0.05, anything smaller than that will mean that you will be able to reject the null hypothesis.

We will first define each month in order to obtain precise findings.

month5<-subset(airquality,airquality$Month=="5")
month6<-subset(airquality,airquality$Month=="6")
month7<-subset(airquality,airquality$Month=="7")
month8<-subset(airquality,airquality$Month=="8")
month9<-subset(airquality,airquality$Month=="9")
boxplot(airquality$Ozone~airquality$Month)

We are visualizing the levels of ozone per month, I would also want to see if the level of ozone is dependent on temperature which I am assuming it is. The reason why I am looking at temperature is because I am inferring it is the variable that would have the most direct impact on ozone levels per month.

plot(airquality$Ozone~airquality$Temp)

It appears that the higher the temperature is the higher the levels of Ozone. Thus, they appear to have a positive correlation.

We can now see the specific relationship for each month

lmm5<- lm(month5$Ozone~month5$Temp)
summary(lmm5)
## 
## Call:
## lm(formula = month5$Ozone ~ month5$Temp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -30.316  -8.622  -2.411   5.320  68.259 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -102.159     38.750  -2.636  0.01446 * 
## month5$Temp    1.885      0.578   3.261  0.00331 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.88 on 24 degrees of freedom
##   (5 observations deleted due to missingness)
## Multiple R-squared:  0.307,  Adjusted R-squared:  0.2781 
## F-statistic: 10.63 on 1 and 24 DF,  p-value: 0.003315
lmm6<- lm(month6$Ozone~month6$Temp)
summary(lmm6)
## 
## Call:
## lm(formula = month6$Ozone ~ month6$Temp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -12.995  -9.337  -6.309  11.082  23.271 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -91.9910    51.3120  -1.793   0.1161  
## month6$Temp   1.5524     0.6531   2.377   0.0491 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.48 on 7 degrees of freedom
##   (21 observations deleted due to missingness)
## Multiple R-squared:  0.4467, Adjusted R-squared:  0.3676 
## F-statistic: 5.651 on 1 and 7 DF,  p-value: 0.04909
lmm7<- lm(month7$Ozone~month7$Temp)
summary(lmm7)
## 
## Call:
## lm(formula = month7$Ozone ~ month7$Temp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -32.108 -14.522  -1.161   7.582  75.290 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -372.921     84.453  -4.416 0.000184 ***
## month7$Temp    5.150      1.005   5.123 3.05e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.32 on 24 degrees of freedom
##   (5 observations deleted due to missingness)
## Multiple R-squared:  0.5223, Adjusted R-squared:  0.5024 
## F-statistic: 26.24 on 1 and 24 DF,  p-value: 3.048e-05
lmm8<- lm(month8$Ozone~month8$Temp)
summary(lmm8)
## 
## Call:
## lm(formula = month8$Ozone ~ month8$Temp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -40.421 -17.651  -8.067   9.974 118.579 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -238.861     82.023  -2.912  0.00764 **
## month8$Temp    3.559      0.974   3.654  0.00126 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 32.46 on 24 degrees of freedom
##   (5 observations deleted due to missingness)
## Multiple R-squared:  0.3575, Adjusted R-squared:  0.3307 
## F-statistic: 13.35 on 1 and 24 DF,  p-value: 0.001256
lmm9<- lm(month9$Ozone~month9$Temp)
summary(lmm9)
## 
## Call:
## lm(formula = month9$Ozone ~ month9$Temp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -27.447  -8.585  -3.692  11.041  31.392 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -149.3469    23.6876  -6.305 9.51e-07 ***
## month9$Temp    2.3511     0.3062   7.677 2.95e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.78 on 27 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.6858, Adjusted R-squared:  0.6742 
## F-statistic: 58.94 on 1 and 27 DF,  p-value: 2.945e-08

While looking at all of this data one can note that all the p-values are significantly smaller than the previously established significance level of 0.05. Which means that with a 95% confidence level one can reject the null hypothesis and conclude that ozone is indeed a function of month and temperature. It is important to note that with the exception of month 6, if we were to establish a larger confidence level, such as 99% and 98%, we would still arrive to the same conclusion.