Airquality Assignment

Author

Ameer Adegun

### Load tidyverse
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
### Load the airquality dataset
data("airquality")
# Preview data
head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
# Convert months to names
airquality$Month[airquality$Month == 5] <- "May"
airquality$Month[airquality$Month == 6] <- "June"
airquality$Month[airquality$Month == 7] <- "July"
airquality$Month[airquality$Month == 8] <- "August"
airquality$Month[airquality$Month == 9] <- "September"

airquality$Month <- factor(
  airquality$Month,
  levels = c("May", "June", "July", "August", "September")
)
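
A quick way to confirm the conversion is to count the rows per month. The sketch below is illustrative rather than part of the original analysis: it works on a fresh copy of the built-in data (the name `aq` is a placeholder) and builds the factor in one step with the `levels` and `labels` arguments of `factor()`, which should give the same result as the five assignments above.

```r
# Work on a fresh copy of the built-in dataset; `aq` is a hypothetical name
aq <- datasets::airquality

# Map month numbers 5-9 to month names, kept in calendar order
aq$Month <- factor(
  aq$Month,
  levels = 5:9,
  labels = c("May", "June", "July", "August", "September")
)

# The level order controls the order of legend entries and boxplots
levels(aq$Month)

# Rows per month; an NA column here would mean a month number was not mapped
table(aq$Month, useNA = "ifany")
```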

Plot 1: Histogram using qplot

p1 <- qplot(
  data = airquality,
  Temp,
  fill = Month,
  geom = "histogram",
  bins = 20
)
Warning: `qplot()` was deprecated in ggplot2 3.4.0.
p1
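
Because `qplot()` is deprecated, the same histogram can also be written with `ggplot()`. The following is a minimal sketch, assuming a fresh copy of the data; the names `aq` and `p1_gg` are placeholders that do not appear in the original.

```r
library(ggplot2)

# Fresh copy with Month as a factor so that fill maps to discrete colors
aq <- datasets::airquality
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])

# Same mapping as the qplot() call: Temp on x, bars filled by month, 20 bins
p1_gg <- ggplot(aq, aes(x = Temp, fill = Month)) +
  geom_histogram(bins = 20)
p1_gg
```

Like the `qplot()` version, this stacks the monthly bars, since stacking is the default position for `geom_histogram()`.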

Plot 2: Histogram using ggplot

p2 <- airquality %>%
  ggplot(aes(x = Temp, fill = Month)) +
  geom_histogram(
    position = "identity",
    alpha = 0.5,
    binwidth = 5,
    color = "white"
  ) +
  scale_fill_discrete(
    name = "Month",
    labels = c("May", "June", "July", "August", "September")
  )
p2

Plot 3: Side-by-Side Boxplots

p3 <- airquality %>%
  ggplot(aes(Month, Temp, fill = Month)) +
  ggtitle("Temperatures") +
  xlab("Monthly Temperatures") +
  ylab("Temperature (F)") +
  geom_boxplot() +
  scale_fill_discrete(
    name = "Month",
    labels = c("May", "June", "July", "August", "September")
  )
p3

Plot 4: Boxplots with Grey Scale

p4 <- airquality %>%
  ggplot(aes(Month, Temp, fill = Month)) +
  ggtitle("Monthly Temperature Variations") +
  xlab("Monthly Temperatures") +
  ylab("Temperature (F)") +
  geom_boxplot() +
  scale_fill_grey(
    name = "Month",
    labels = c("May", "June", "July", "August", "September")
  )
p4

Plot 5: Scatter Plot of Temperature and Ozone

p5 <- airquality %>%
  ggplot(aes(x = Ozone, y = Temp)) +
  geom_point(size = 2, alpha = 0.6) +
  geom_smooth(method = "loess", se = FALSE) +
  labs(
    title = "Temperature Changes Across Ozone Levels",
    x = "Ozone (ppb)",
    y = "Temperature (F)",
    caption = "Source: airquality dataset (R)"
  )
p5
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 37 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 37 rows containing missing values or values outside the scale range
(`geom_point()`).
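
The two warnings come from missing values in the data. As a rough check, the sketch below counts the missing values per column and the rows that remain once incomplete ozone and temperature pairs are excluded. The names `aq` and `complete_pairs` are placeholders that do not appear in the original.

```r
# Fresh copy of the built-in dataset; `aq` is a hypothetical name
aq <- datasets::airquality

# Missing values per column; Ozone accounts for the rows dropped from the plot
colSums(is.na(aq))

# Rows where both Ozone and Temp are present, i.e. the points that get drawn
complete_pairs <- aq[!is.na(aq$Ozone) & !is.na(aq$Temp), ]
nrow(complete_pairs)
```

Passing `na.rm = TRUE` to `geom_point()` and `geom_smooth()` silences the warnings without changing the plot.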

The fifth graph is a scatter plot of daily air temperature against ozone level. Whereas the prior graphs show the distribution of temperature, overall and by month, this graph relates temperature to a second variable in the data. The x-axis represents ozone concentration in parts per billion and the y-axis represents air temperature in degrees Fahrenheit. The 37 days with a missing ozone reading are dropped from the plot, as the warnings above report.

The plot suggests that warmer days tend to have higher ozone readings. A good portion of the temperatures are in the mid-70s to mid-80s range, so the plot mostly represents warm summer days, and there are few values at either extreme.

For this plot, I used the ggplot() function with geom_point() and geom_smooth(). The x aesthetic was mapped to ozone and the y aesthetic to temperature. The points are slightly transparent (alpha = 0.6) so that overlapping observations stay visible, and a loess curve without a confidence band (se = FALSE) traces the overall trend. Labels and a caption were added using the labs() function to provide context and the source of the data. This plot helps summarize the dataset by relating temperature to air quality.
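
The statement that most temperatures fall in the mid-70s to mid-80s can be checked numerically. The sketch below is an illustration only: the 75 to 85 degree cutoffs are an assumed reading of "mid-70s to mid-80s", and the names `aq` and `share_75_85` are placeholders that do not appear in the original.

```r
# Fresh copy of the built-in dataset; `aq` is a hypothetical name
aq <- datasets::airquality

# Minimum, quartiles, mean, and maximum of daily temperature (degrees F)
summary(aq$Temp)

# Share of days with temperatures from 75 to 85 F inclusive (assumed cutoffs)
share_75_85 <- mean(aq$Temp >= 75 & aq$Temp <= 85)
share_75_85
```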