Week 5 Discussion

I started by looking at the structure and summary statistics of the built-in airquality data set:

str(airquality)

## 'data.frame':    153 obs. of  6 variables:
##  $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
##  $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
##  $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
##  $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
##  $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...

summary(airquality)

##      Ozone           Solar.R           Wind             Temp      
##  Min.   :  1.00   Min.   :  7.0   Min.   : 1.700   Min.   :56.00  
##  1st Qu.: 18.00   1st Qu.:115.8   1st Qu.: 7.400   1st Qu.:72.00  
##  Median : 31.50   Median :205.0   Median : 9.700   Median :79.00  
##  Mean   : 42.13   Mean   :185.9   Mean   : 9.958   Mean   :77.88  
##  3rd Qu.: 63.25   3rd Qu.:258.8   3rd Qu.:11.500   3rd Qu.:85.00  
##  Max.   :168.00   Max.   :334.0   Max.   :20.700   Max.   :97.00  
##  NA's   :37       NA's   :7                                       
##      Month            Day      
##  Min.   :5.000   Min.   : 1.0  
##  1st Qu.:6.000   1st Qu.: 8.0  
##  Median :7.000   Median :16.0  
##  Mean   :6.993   Mean   :15.8  
##  3rd Qu.:8.000   3rd Qu.:23.0  
##  Max.   :9.000   Max.   :31.0  
##

library(psych)
describe(airquality)

##         vars   n   mean    sd median trimmed   mad  min   max range  skew
## Ozone      1 116  42.13 32.99   31.5   37.80 25.95  1.0 168.0   167  1.21
## Solar.R    2 146 185.93 90.06  205.0  190.34 98.59  7.0 334.0   327 -0.42
## Wind       3 153   9.96  3.52    9.7    9.87  3.41  1.7  20.7    19  0.34
## Temp       4 153  77.88  9.47   79.0   78.28  8.90 56.0  97.0    41 -0.37
## Month      5 153   6.99  1.42    7.0    6.99  1.48  5.0   9.0     4  0.00
## Day        6 153  15.80  8.86   16.0   15.80 11.86  1.0  31.0    30  0.00
##         kurtosis   se
## Ozone       1.11 3.06
## Solar.R    -1.00 7.45
## Wind        0.03 0.28
## Temp       -0.46 0.77
## Month      -1.32 0.11
## Day        -1.22 0.72

Next, I divided the data by month. The summary shows that Month runs from 5 to 9, which I labeled May through September:

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

# Month is an integer column, so compare against numbers and refer to the column by name
May <- filter(airquality, Month == 5)
June <- filter(airquality, Month == 6)
July <- filter(airquality, Month == 7)
August <- filter(airquality, Month == 8)
September <- filter(airquality, Month == 9)
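Before comparing months, it may help to see how many usable ozone readings each month has, since Ozone contains 37 missing values. The following is a minimal base-R sketch of that check; the names `ozone_by_month` and `month_summary` are my own and not part of the original analysis.

```r
# Split the Ozone column into one vector per month (5 = May, ..., 9 = September).
ozone_by_month <- split(airquality$Ozone, airquality$Month)

# For each month: number of non-missing readings, mean, and standard deviation.
month_summary <- data.frame(
  month = names(ozone_by_month),
  n     = sapply(ozone_by_month, function(x) sum(!is.na(x))),
  mean  = sapply(ozone_by_month, mean, na.rm = TRUE),
  sd    = sapply(ozone_by_month, sd, na.rm = TRUE)
)

print(month_summary)
```

The `n` column matters when reading the later t-tests, because a month with few non-missing readings gives a less precise estimate of its mean.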

I thought this would be a good scenario for an ANOVA test, since I want to compare mean ozone levels across months. ANOVA assumes that the variability within each group is about equal, so I drew a boxplot to assess this.

boxplot(Ozone ~ Month, data = airquality, main = "Ozone Level by Month")
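The boxplot gives a visual impression of the spread in each month. A formal test of equal variances could supplement it. The sketch below uses the Fligner-Killeen test from base R's stats package, which I chose here as an assumption because it is less sensitive to skewed data than Bartlett's test; it was not part of the original analysis.

```r
# Fligner-Killeen test of homogeneity of variances across months.
# H0: the variance of Ozone is the same in every month.
# Rows with missing Ozone are dropped by the default na.action.
variance_check <- fligner.test(Ozone ~ Month, data = airquality)

print(variance_check)
```

A small p-value here would support the impression from the boxplot that the equal-variance assumption of a standard ANOVA is doubtful.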

Given that the variability appears to differ a good amount by month, I decided to use a different approach. To test whether ozone levels depend on month, I ran two-sample t-tests comparing the ozone levels in May to those in each other month. R's t.test function runs the Welch version by default, which does not assume equal variances. My hypotheses for each test are the following:

H0: The average ozone level is the same in May and the other month.

HA: The average ozone level differs between May and the other month.

t.test(May$Ozone,June$Ozone)

## 
##  Welch Two Sample t-test
## 
## data:  May$Ozone and June$Ozone
## t = -0.7801, df = 16.938, p-value = 0.4461
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -21.598419   9.940299
## sample estimates:
## mean of x mean of y 
##  23.61538  29.44444

The p-value (0.4461) is greater than the significance level of 0.05, so I fail to reject the null hypothesis: the data do not show a significant difference in mean ozone level between May and June.

t.test(May$Ozone,July$Ozone)

## 
##  Welch Two Sample t-test
## 
## data:  May$Ozone and July$Ozone
## t = -4.682, df = 44.843, p-value = 2.647e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -50.77291 -20.22709
## sample estimates:
## mean of x mean of y 
##  23.61538  59.11538

The p-value (2.647e-05) is less than the significance level of 0.05, so I reject the null hypothesis: the mean ozone level differs significantly between May and July.

t.test(May$Ozone,August$Ozone)

## 
##  Welch Two Sample t-test
## 
## data:  May$Ozone and August$Ozone
## t = -4.0749, df = 39.279, p-value = 0.0002169
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -54.38358 -18.30873
## sample estimates:
## mean of x mean of y 
##  23.61538  59.96154

The p-value (0.0002169) is less than the significance level of 0.05, so I reject the null hypothesis: the mean ozone level differs significantly between May and August.

t.test(May$Ozone,September$Ozone)

## 
##  Welch Two Sample t-test
## 
## data:  May$Ozone and September$Ozone
## t = -1.2527, df = 52.957, p-value = 0.2158
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -20.374201   4.708419
## sample estimates:
## mean of x mean of y 
##  23.61538  31.44828

The p-value (0.2158) is greater than the significance level of 0.05, so I fail to reject the null hypothesis: the data do not show a significant difference in mean ozone level between May and September.
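Because four tests are run at the 0.05 level, the chance of at least one false positive is higher than 0.05. As a hedged sketch, the four comparisons can be repeated in one pass and the p-values adjusted for multiple testing. The Holm method and the names `may`, `other_months`, `p_raw`, and `p_holm` are my own choices, not part of the original analysis.

```r
# Ozone readings for May; t.test drops missing values on its own.
may <- airquality$Ozone[airquality$Month == 5]

# Months to compare against May, keyed by name for readable output.
other_months <- c(June = 6, July = 7, August = 8, September = 9)

# Welch two-sample t-test of May against each other month; keep only the p-value.
p_raw <- sapply(other_months, function(m) {
  t.test(may, airquality$Ozone[airquality$Month == m])$p.value
})

# Holm adjustment controls the family-wise error rate across the four tests.
p_holm <- p.adjust(p_raw, method = "holm")

print(data.frame(p_raw, p_holm, significant = p_holm < 0.05))
```

With the p-values reported above, July and August stay below 0.05 after the Holm adjustment, so the conclusions of the individual tests do not change.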

Greg Adelsberger

2/15/2018

Overall, I would say ozone levels are a function of month, because the mean ozone level in both July and August differed significantly from the mean in May, although June and September did not.
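The pairwise tests only compare May with each other month. One way to test all five months at once without assuming equal variances is Welch's one-way test, available in base R as oneway.test. The sketch below is offered as a possible follow-up under that assumption, not as part of the original analysis; the Kruskal-Wallis test is included as a rank-based alternative because Ozone is right-skewed.

```r
# Welch's one-way test: H0 is that mean Ozone is equal across all months.
# var.equal = FALSE (the default) avoids the equal-variance assumption of classic ANOVA.
welch_anova <- oneway.test(Ozone ~ Month, data = airquality, var.equal = FALSE)
print(welch_anova)

# Rank-based alternative that does not rely on normality of Ozone within months.
kw <- kruskal.test(Ozone ~ Month, data = airquality)
print(kw)
```

If both tests give a small p-value, that would agree with the conclusion that ozone levels vary by month.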