Airquality

Author

Allan Maino

AirQuality HW-Allan Maino Vieytes

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(viridis)
Loading required package: viridisLite
data("airquality")
head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
# Relabel the numeric months (5-9) as a factor with month names in calendar order
airquality$Month <- factor(airquality$Month, levels = 5:9,
                           labels = c("May", "June", "July", "August", "September"))
p1 <- airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity")+
  scale_fill_discrete(name = "Month", 
                      labels = c("May", "June","July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept", 
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")  #provide the data source
p1
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

p2 <- airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity", alpha=0.5, binwidth = 5, color = "white")+
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept", 
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")
p2

p3 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) + 
  labs(x = "Months from May through September", y = "Temperatures", 
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September"))
p3 

p4 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) + 
  labs(x = "Months from May through September", y = "Temperatures", 
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot()+
  scale_fill_grey(name = "Month", labels = c("May", "June","July", "August", "September"))
p4

p5 <- airquality %>%
  group_by( Month ) %>% # Group data by month
  mutate( Month.avg.temp = mean( Temp )) %>% # Add each month's average temperature as a new column
  ungroup() %>% # Ungroup so later operations are not applied per month
  ggplot( aes( x = Month, y = Wind, fill = Month.avg.temp)) # Pipe the data into ggplot() and map the aesthetics

p5 + 
  geom_boxplot() + # adding a boxplot layer
  scale_fill_viridis( "Monthly Avg. Temp" ) + # Color palette for the temperature gradient
  theme_classic() + # gets rid of grid
  labs( caption = "Source: New York State Department of Conservation and the National Weather Service", 
        title = "Monthly Wind Measurements, with Monthly Avg. Temperature Layer" ) + # Adds caption and title
  ylab( "Wind (MPH)" ) # changes y axis label

Essay: The fifth plot shows three variables: wind speed, month, and the average temperature for each month. To compute the average temperature, I first grouped the data by month with group_by(), then used mutate() to create the Month.avg.temp variable, and ended the grouping with ungroup(). I passed the result to ggplot() and mapped Month to the x-axis, Wind to the y-axis, and Month.avg.temp to the fill. I added the boxplot layer with geom_boxplot() and used the viridis color palette to depict the temperature gradient. I applied theme_classic() to remove the grid and clean up the visualization. Finally, I added the source caption and title and changed the y-axis label to include MPH.
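
A minimal sketch of why mutate() rather than summarise() fits the grouping step described in the essay. It works on a fresh copy of the built-in dataset, so Month is still numeric (5 to 9) here. The names aq, with_avg, and monthly are illustrative and do not appear in the original code.

```r
library(dplyr)

# Fresh copy of the built-in data, so the relabeled airquality above is left untouched
aq <- datasets::airquality

# mutate() after group_by() keeps every daily row and repeats
# the month's mean temperature on each of that month's rows
with_avg <- aq |>
  group_by(Month) |>
  mutate(Month.avg.temp = mean(Temp)) |>
  ungroup()

# summarise() collapses the data to one row per month instead
monthly <- aq |>
  group_by(Month) |>
  summarise(Month.avg.temp = mean(Temp))

nrow(with_avg)  # 153: one row per day, as the boxplot needs
nrow(monthly)   # 5: one row per month
```

Because the boxplot needs the daily Wind values, the mutate() version is the one that can be passed to ggplot(); the summarise() version would discard them. Since Month.avg.temp is constant within a month, each box gets a single fill color.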