Air Quality

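# Load the packages used below (dplyr for %>%, group_by, summarise; ggplot2 for plots)
library(dplyr)
library(ggplot2)
# Import the data
data("airquality")
# Convert numeric months to month names
airquality$Month <- factor(airquality$Month, labels = month.name[5:9])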

Wind Speed

# Average wind speed by month
avg_wind_month <- airquality %>%
  group_by(Month) %>%
  summarise(avg_wind = mean(Wind, na.rm = TRUE))

# Print the result
avg_wind_month
## # A tibble: 5 × 2
##   Month     avg_wind
##   <fct>        <dbl>
## 1 May          11.6 
## 2 June         10.3 
## 3 July          8.94
## 4 August        8.79
## 5 September    10.2
# Bar plot for average wind speed by month
ggplot(avg_wind_month, aes(x = Month, y = avg_wind, fill = Month)) +
  geom_col() +
  theme_minimal() +
  labs(title = "Average Wind Speed by Month", 
       x = "Month", 
       y = "Average Wind Speed (mph)") +
  scale_fill_brewer(palette = "Set3")

# Linear regression of wind and month
reg_wind_month <- lm(Wind ~ Month, data = airquality)
summary(reg_wind_month)
## 
## Call:
## lm(formula = Wind ~ Month, data = airquality)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.5667 -2.2667 -0.1226  2.1774 10.4333 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     11.6226     0.6127  18.970  < 2e-16 ***
## MonthJune       -1.3559     0.8737  -1.552  0.12280    
## MonthJuly       -2.6806     0.8665  -3.094  0.00236 ** 
## MonthAugust     -2.8290     0.8665  -3.265  0.00136 ** 
## MonthSeptember  -1.4426     0.8737  -1.651  0.10082    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.411 on 148 degrees of freedom
## Multiple R-squared:  0.08707,    Adjusted R-squared:  0.0624 
## F-statistic: 3.529 on 4 and 148 DF,  p-value: 0.00879

Ozone Level

# Average ozone level by month
avg_ozone_month <- airquality %>%
  group_by(Month) %>%
  summarise(avg_ozone = mean(Ozone, na.rm = TRUE))

# Print the result
avg_ozone_month
## # A tibble: 5 × 2
##   Month     avg_ozone
##   <fct>         <dbl>
## 1 May            23.6
## 2 June           29.4
## 3 July           59.1
## 4 August         60.0
## 5 September      31.4
# Bar plot for average ozone level by month
ggplot(avg_ozone_month, aes(x = Month, y = avg_ozone, fill = Month)) +
  geom_col() +
  theme_minimal() +
  labs(title = "Average Ozone Level by Month", 
       x = "Month", 
       y = "Average Ozone Level (ppb)") +
  scale_fill_brewer(palette = "Set3")

# Linear regression of ozone and month
reg_ozone_month <- lm(Ozone ~ Month, data = airquality)
summary(reg_ozone_month)
## 
## Call:
## lm(formula = Ozone ~ Month, data = airquality)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -52.115 -16.823  -7.282  13.125 108.038 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      23.615      5.759   4.101 7.87e-05 ***
## MonthJune         5.829     11.356   0.513    0.609    
## MonthJuly        35.500      8.144   4.359 2.93e-05 ***
## MonthAugust      36.346      8.144   4.463 1.95e-05 ***
## MonthSeptember    7.833      7.931   0.988    0.325    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 29.36 on 111 degrees of freedom
##   (37 observations deleted due to missingness)
## Multiple R-squared:  0.2352, Adjusted R-squared:  0.2077 
## F-statistic: 8.536 on 4 and 111 DF,  p-value: 4.827e-06

Findings

Average wind speed varies modestly across months. May recorded the highest average wind speed at 11.6 mph, while July and August had the lowest, at 8.9 mph and 8.8 mph respectively. In the linear regression, which uses May as the reference month, the July and August coefficients (-2.68 and -2.83 mph) are statistically significant at the 0.01 level, whereas June and September do not differ significantly from May. However, the multiple R-squared of 0.087 indicates that month explains only 8.7% of the variation in wind speed.
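To make the "8.7% of the variation" statement concrete, the multiple R-squared can be rebuilt from the residual and total sums of squares. The following is a minimal sketch, not part of the original analysis; the names `aq`, `fit_wind`, `rss`, `tss`, and `r_squared` are illustrative choices.

```r
# Sketch: reproduce the multiple R-squared of the wind model by hand.
data("airquality")
aq <- airquality  # local copy (illustrative name) so the original object is untouched
aq$Month <- factor(aq$Month, labels = month.name[5:9])

fit_wind <- lm(Wind ~ Month, data = aq)

# Residual sum of squares: variation left over after accounting for month
rss <- sum(residuals(fit_wind)^2)
# Total sum of squares: variation of Wind around its overall mean
tss <- sum((aq$Wind - mean(aq$Wind))^2)

# Share of the total variation that the month factor accounts for
r_squared <- 1 - rss / tss
r_squared
```

The value should agree with `summary(fit_wind)$r.squared`, which is the figure reported as "Multiple R-squared" in the regression output.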

Ozone levels also vary across months. July and August had the highest average ozone levels, at 59.1 ppb and 60.0 ppb respectively, and May had the lowest, at 23.6 ppb. The regression, again with May as the reference month, shows that the July and August increases (35.5 and 36.3 ppb) are statistically significant (p < 0.001), while June and September do not differ significantly from May. The multiple R-squared of 0.2352 indicates that month explains 23.5% of the variation in ozone levels. Note that this model is fitted to 116 observations, because 37 rows with missing ozone values were dropped.
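Because `Month` is a factor with May as its first level, each regression coefficient is a difference from the May mean rather than a monthly mean itself. The sketch below, which assumes R's default treatment contrasts and uses illustrative names (`aq`, `fit_ozone`, `month_means`, `rebuilt`), checks that adding each coefficient to the intercept recovers the monthly averages.

```r
# Sketch: relate the ozone regression coefficients to the monthly means.
data("airquality")
aq <- airquality  # local copy (illustrative name)
aq$Month <- factor(aq$Month, labels = month.name[5:9])

fit_ozone <- lm(Ozone ~ Month, data = aq)

# Monthly ozone means, ignoring missing values (as lm does by dropping those rows)
month_means <- tapply(aq$Ozone, aq$Month, mean, na.rm = TRUE)

# Intercept = May mean; intercept + coefficient = mean of that month
coefs <- coef(fit_ozone)
rebuilt <- c(coefs[1], coefs[1] + coefs[-1])
names(rebuilt) <- levels(aq$Month)

rebuilt
all.equal(unname(rebuilt), as.vector(month_means))
```

Choosing a different reference month (for example with `relevel()`) would change which comparisons the coefficient table tests, but not the fitted monthly means.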

Taken together, among the months May through September, May has the highest average wind speed and the lowest average ozone level, while July and August have the lowest wind speeds and the highest ozone levels. Both wind speed and ozone level differ significantly from May in July and August. These differences could be attributed to the role of wind in dispersing ozone, although the monthly comparisons alone do not establish a causal link.
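One way to probe the suggested link between wind and ozone more directly is to look at the day-level association rather than monthly averages. The following is a rough sketch, not part of the original analysis, using a Pearson correlation test from base R; the name `wind_ozone_test` is illustrative.

```r
# Sketch: day-level association between wind speed and ozone.
data("airquality")

# Pearson correlation test; cor.test() uses only days where both values are present
wind_ozone_test <- cor.test(airquality$Wind, airquality$Ozone)

wind_ozone_test$estimate  # sample correlation coefficient
wind_ozone_test$p.value   # p-value for the null hypothesis of zero correlation
```

A negative estimate would be consistent with the dispersion explanation, but a correlation of this kind cannot by itself rule out other seasonal influences.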