Airquality HW

Author

A Lopez

Load the library

library(tidyverse)

Load the dataset into your global environment

data("airquality")

Set Up Data

airquality$Month[airquality$Month == 5]<- "May"
airquality$Month[airquality$Month == 6]<- "June"
airquality$Month[airquality$Month == 7]<- "July"
airquality$Month[airquality$Month == 8]<- "August"
airquality$Month[airquality$Month == 9]<- "September"
airquality$Month<-factor(airquality$Month, 
                         levels=c("May", "June","July", "August", "September"))
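
The five assignments above can also be collapsed into one call. This is a minimal sketch, assuming the original numeric Month column; the copy name `aq` is illustrative and is used only so that the recoded `airquality` above is not overwritten.

```r
# Work on a fresh copy of the built-in dataset (Month is still numeric 5-9 here).
aq <- datasets::airquality

# Map the month numbers to labelled factor levels in a single step.
aq$Month <- factor(aq$Month,
                   levels = 5:9,
                   labels = c("May", "June", "July", "August", "September"))

# Check the result: one level per month, in calendar order.
levels(aq$Month)
table(aq$Month)
```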

Plot 1

p1 <- airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity")+
  scale_fill_discrete(name = "Month", 
                      labels = c("May", "June","July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept", 
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")  # data source

p1
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
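
The message above asks for an explicit bin width. One common heuristic is the Freedman-Diaconis rule, sketched here in base R. It is offered as one reasonable way to pick a value, not as the method this assignment requires, and the names `temps` and `fd_binwidth` are illustrative.

```r
# Temperatures from the built-in dataset, in degrees Fahrenheit.
temps <- datasets::airquality$Temp

# Freedman-Diaconis rule: bin width = 2 * IQR / n^(1/3).
fd_binwidth <- 2 * IQR(temps) / length(temps)^(1/3)
fd_binwidth

# Number of bins this width implies across the observed range.
ceiling(diff(range(temps)) / fd_binwidth)
```

A rounded version of this value can be passed to geom_histogram() through its binwidth argument.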

Plot 2

p2 <- airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity", alpha=0.5, binwidth = 5, color = "white")+
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept", 
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")
p2

Plot 3

p3 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) + 
  labs(x = "Months from May through September", y = "Temperatures", 
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September"))
p3

Plot 4

p4 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) + 
  labs(x = "Months from May through September", y = "Temperatures", 
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot()+
  scale_fill_grey(name = "Month", labels = c("May", "June","July", "August", "September"))
p4

Plot 5

p5 <- airquality |>
  ggplot(aes(x=Temp, y=Solar.R, color=Month)) + 
  labs(x = "Temperature", y = "Solar Radiation", 
       title = "Temperature in Comparison to Solar Radiation in May - September, 1973",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_path() +
  geom_point(size = 1) +
  scale_color_discrete(name = "Month", labels = c("May", "June","July", "August", "September"))
p5
Warning: Removed 7 rows containing missing values or values outside the scale range
(`geom_point()`).
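
The warning above comes from rows with missing values in the plotted columns. Here is a small base R sketch, assuming the goal is to count the affected rows and drop them explicitly before plotting; `aq`, `n_missing`, and `aq_complete` are illustrative names.

```r
# Fresh copy of the built-in dataset.
aq <- datasets::airquality

# Count the days with no solar radiation reading.
n_missing <- sum(is.na(aq$Solar.R))
n_missing

# Keep only rows where both plotted variables are present.
aq_complete <- aq[!is.na(aq$Solar.R) & !is.na(aq$Temp), ]
nrow(aq_complete)
```

Applying the same filter to the recoded airquality data frame before plotting should produce the plot without the warning.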

My plot shows how temperature relates to solar radiation from May through September 1973. The x-axis is the temperature in degrees Fahrenheit, the y-axis is the solar radiation level, and five colors distinguish the months in the data set. I used the geom_path() function deliberately to draw a separate line for each month, connecting its days in order, so that each month's relationship between temperature and solar radiation can be read on its own. With this illustration, one can see where each month's “cluster” sits along the temperature axis and how much the solar radiation level varies from day to day within that month. I also used geom_point() to mark every day with a point, which makes the day-to-day changes within a month easier to see.
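
To put a number on the relationship described above, the correlation between temperature and solar radiation can be computed for each month. This is a rough sketch in base R, assuming that Pearson correlation on the days with complete readings is an acceptable summary; `aq` and `cor_by_month` are illustrative names.

```r
# Fresh copy of the built-in dataset (Month is numeric here: 5 = May ... 9 = September).
aq <- datasets::airquality

# Split the rows by month, then compute the Pearson correlation
# between temperature and solar radiation within each month,
# skipping days with a missing reading.
cor_by_month <- sapply(split(aq, aq$Month), function(d) {
  cor(d$Temp, d$Solar.R, use = "complete.obs")
})

round(cor_by_month, 2)
```

Values near 0 would suggest that a month's line wanders without a clear trend, while values nearer 1 or -1 would suggest a more consistent slope.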