Air Quality HW
Air Quality Assignment
Load the libraries tidyverse and ggplot2.
library(tidyverse)
library(ggplot2)
Load the data into the global environment.
data("airquality")
Get a snapshot of the data.
head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
# Recode the numeric month codes as month names
airquality$Month[airquality$Month == 5] <- "May"
airquality$Month[airquality$Month == 6] <- "June"
airquality$Month[airquality$Month == 7] <- "July"
airquality$Month[airquality$Month == 8] <- "August"
airquality$Month[airquality$Month == 9] <- "September"
# Store Month as a factor so the months plot in calendar order
airquality$Month <- factor(airquality$Month,
                           levels = c("May", "June", "July", "August",
                                      "September"))
Load the first plot.
# Overlaid histograms of temperature, one per month, with default bins
p1 <- airquality |>
  ggplot(aes(x = Temp, fill = Month)) +
  geom_histogram(position = "identity") +
  scale_fill_discrete(name = "Month",
                      labels = c("May", "June", "July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept",
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")
p1
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
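The `stat_bin()` message above only says that the default of 30 bins is arbitrary. One possible way to choose a `binwidth` is the Freedman-Diaconis rule, sketched here in base R. This rule is an assumption brought in for illustration, not something the assignment prescribes, and `temp` and `fd_binwidth` are names made up for the example.

```r
# Sketch: Freedman-Diaconis binwidth, 2 * IQR / n^(1/3), for the Temp column.
# Works on the untouched built-in data, so it does not depend on earlier steps.
temp <- datasets::airquality$Temp

fd_binwidth <- 2 * IQR(temp) / length(temp)^(1/3)
fd_binwidth
```

The result can be rounded to a convenient value and compared with the `binwidth = 5` used in the next histogram.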
# Same histogram with semi-transparent bars, 5-degree bins, and white outlines
p2 <- airquality |>
  ggplot(aes(x = Temp, fill = Month)) +
  geom_histogram(position = "identity", alpha = 0.5, binwidth = 5, color = "white") +
  scale_fill_discrete(name = "Month",
                      labels = c("May", "June", "July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept",
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")
p2
# Side-by-side boxplots of temperature by month
p3 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) +
  labs(x = "Months from May through September", y = "Temperatures",
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month",
                      labels = c("May", "June", "July", "August", "September"))
p3
# Same boxplots with a greyscale fill
p4 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) +
  labs(x = "Months from May through September", y = "Temperatures",
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_grey(name = "Month",
                  labels = c("May", "June", "July", "August", "September"))
p4
# Violin plots of ozone levels by month
p5 <- airquality |>
  ggplot(aes(x = Month, y = Ozone, fill = Month)) +
  labs(x = "Months from May through September", y = "Ozone Levels",
       title = "Side-by-Side Violin Plot of Monthly Ozone Levels",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_violin(color = "slategrey") +
  scale_fill_discrete(name = "Month",
                      labels = c("May", "June", "July", "August", "September"))
p5
Warning: Removed 37 rows containing non-finite outside the scale range
(`stat_ydensity()`).
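The 37 rows in the warning above are the days with a missing (`NA`) ozone reading, which the violin layer drops before estimating densities. A minimal sketch for checking this, using a fresh copy of the built-in data; `aq`, `n_missing_ozone`, and `aq_complete` are illustrative names, not part of the assignment.

```r
# Sketch: confirm where the dropped rows come from.
# Uses the untouched built-in data, so Month is still coded 5 to 9.
aq <- datasets::airquality

# Total number of missing Ozone readings
n_missing_ozone <- sum(is.na(aq$Ozone))
n_missing_ozone

# Missing Ozone readings per month (5 = May, ..., 9 = September)
tapply(is.na(aq$Ozone), aq$Month, sum)

# One option: drop the incomplete rows explicitly before plotting
aq_complete <- aq[!is.na(aq$Ozone), ]
nrow(aq_complete)
```

Filtering first makes the removal explicit. Passing `na.rm = TRUE` to `geom_violin()` is another way to silence the warning while dropping the same rows.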
For my custom plot, I chose to display ozone levels for the months of May through September. I researched several options and eventually settled on a violin plot. I like this plot because it does a very good job of showing the density of the data at each ozone level. Boxplots are the usual choice for summarizing a distribution, but I noticed that certain months had so many outliers that a boxplot would be less effective. Also, I like violins. I found that I could change the outline color of each violin, so I picked a color I like (slate grey). This assignment helped me gain a basic understanding of ggplot2’s versatility for plotting.
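The point about outliers can be checked numerically. The sketch below counts, for each month, the ozone readings that a boxplot would flag as outliers (beyond 1.5 times the IQR from the hinges), using base R's `boxplot.stats()`. The names `aq` and `outliers_per_month` are illustrative, and because `geom_boxplot()` computes its hinges slightly differently, the counts may not match a ggplot2 boxplot exactly.

```r
# Sketch: count boxplot-style outliers in Ozone for each month.
# Uses the untouched built-in data, so the months are labelled 5 to 9.
aq <- datasets::airquality

outliers_per_month <- sapply(
  split(aq$Ozone, aq$Month),                 # one vector of readings per month
  function(x) length(boxplot.stats(x)$out)   # NA readings are ignored
)
outliers_per_month
```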