Air Quality HW

Author

M Desir

Air Quality Assignment

Smokestack


Load the libraries tidyverse and ggplot2 (tidyverse already attaches ggplot2, so the second call is redundant but harmless).

library(tidyverse)
library(ggplot2)

Load the data into the global environment.

data("airquality")

Get a snapshot of the data.

head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
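
# Replace the numeric month codes (5-9) with month names.
# The first assignment turns the column into character; the later
# comparisons still match because R coerces 6 to "6", and so on.
airquality$Month[airquality$Month == 5] <- "May"
airquality$Month[airquality$Month == 6] <- "June"
airquality$Month[airquality$Month == 7] <- "July"
airquality$Month[airquality$Month == 8] <- "August"
airquality$Month[airquality$Month == 9] <- "September"

# Convert to a factor so the months plot in calendar order, not alphabetically
airquality$Month <- factor(airquality$Month,
                           levels = c("May", "June", "July", "August",
                                      "September"))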
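The five element-wise assignments plus the factor() call can be collapsed into a single step. A minimal sketch, assuming you start from the untouched numeric Month column (codes 5 to 9): it works on a fresh copy named aq (a name chosen here for illustration) and takes the labels from base R's built-in month.name constant.

```r
# Start from a pristine copy so the modified global `airquality` is untouched
aq <- datasets::airquality

# Map codes 5-9 straight to month names in one step:
# `levels` fixes the calendar order, `labels` supplies the display names
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])

levels(aq$Month)
table(aq$Month)  # number of daily observations per month
```

Supplying levels and labels together skips the intermediate character column entirely.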

Create the first plot: a histogram of daily temperatures, filled by month.

p1 <- airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity")+
  scale_fill_discrete(name = "Month", 
                      labels = c("May", "June","July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept", 
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")
p1
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

p2 <- airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity", alpha=0.5, binwidth = 5, color = "white")+
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept", 
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")
p2
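The bins = 30 message after p1 comes from ggplot2 falling back to a default bin count, which p2 overrides with binwidth = 5. To see roughly what a 5-degree bin holds, the sketch below tabulates temperatures into 5-degree intervals per month with base R's cut(). The break points (55 to 100) are an assumption chosen to cover the observed temperatures; ggplot2 positions its own bin edges, so these counts will not line up bar-for-bar with p2.

```r
aq <- datasets::airquality

# Observed temperature range, used to pick break points that cover every value
range(aq$Temp)

# Hypothetical 5-degree bin edges: (55,60], (60,65], ..., (95,100]
breaks <- seq(55, 100, by = 5)

# Cross-tabulate temperature bin against month (rows = bins, columns = months 5-9)
bin_counts <- table(cut(aq$Temp, breaks = breaks), aq$Month)
bin_counts
```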

p3 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) + 
  labs(x = "Months from May through September", y = "Temperatures", 
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September"))

p3
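The boxplots can be backed with numbers. A short sketch, again on a fresh copy named aq (chosen for illustration), computing each month's median (the line inside each box) and interquartile range (the box height). ggplot2's geom_boxplot() derives its hinges from quantile(), which is also what IQR() uses by default, so these values should match the drawn boxes closely.

```r
aq <- datasets::airquality
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])

# Median temperature per month: the middle line of each box
temp_medians <- tapply(aq$Temp, aq$Month, median)

# Interquartile range per month: the height of each box
temp_iqr <- tapply(aq$Temp, aq$Month, IQR)

temp_medians
temp_iqr
```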

p4 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) + 
  labs(x = "Months from May through September", y = "Temperatures", 
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot()+
  scale_fill_grey(name = "Month", labels = c("May", "June","July", "August", "September"))

p4

p5 <- airquality |>
  ggplot(aes(x=Month, y=Ozone,fill = Month)) + 
  labs(x = "Months from May through September", y = "Ozone Levels", 
       title = "Side-by-Side Violin Plot of Monthly Ozone Levels",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_violin(color = "slategrey") +
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September"))



p5
Warning: Removed 37 rows containing non-finite outside the scale range
(`stat_ydensity()`).
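The warning about 37 removed rows refers to missing Ozone readings, which stat_ydensity() cannot use. This sketch counts the missing readings per month so you can see where the gaps fall; it works on a fresh copy named aq (a name chosen for illustration).

```r
aq <- datasets::airquality
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])

# Count missing Ozone readings in each month;
# these are the rows dropped before the violins are drawn
na_by_month <- tapply(is.na(aq$Ozone), aq$Month, sum)
na_by_month

# The total should equal the number reported in the warning
sum(na_by_month)
```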

First, for the information to display, I chose ozone levels for the months of May through September. For my custom plot, I researched several options and settled on a violin plot. I like this plot because it shows the density of each month's distribution very clearly. Boxplots are the usual choice for comparing distributions, but they summarize only the median and quartiles, and I noticed that some months had so many outliers that a boxplot would be less informative. Also, I like violins. I found that I could change the outline color of each violin with the color argument of geom_violin(), so I picked a color I like (slate grey). This assignment gave me a basic understanding of how versatile ggplot2 is for plotting.
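The point about outliers can be checked directly. A minimal sketch, assuming the conventional 1.5 * IQR rule: base R's boxplot.stats() returns the points beyond the whiskers in its out element. Its hinges come from fivenum(), while ggplot2 uses quantile(), so these counts may differ slightly from the outlier points geom_boxplot() would draw. The copy name aq is chosen for illustration.

```r
aq <- datasets::airquality
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])

# Keep only days with an ozone reading, then split the readings by month
has_ozone <- !is.na(aq$Ozone)
ozone_by_month <- split(aq$Ozone[has_ozone], aq$Month[has_ozone])

# Number of readings flagged as outliers (beyond 1.5 * IQR from the hinges)
outlier_counts <- vapply(ozone_by_month,
                         function(x) length(boxplot.stats(x)$out),
                         integer(1))
outlier_counts
```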