Airquality HW
Load the library
library(tidyverse)
Load the dataset into your global environment
data("airquality")
Look at the structure of the data
head(airquality)
Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1
2 36 118 8.0 72 5 2
3 12 149 12.6 74 5 3
4 18 313 11.5 62 5 4
5 NA NA 14.3 56 5 5
6 28 NA 14.9 66 5 6
Rename the months from numbers to names
airquality$Month[airquality$Month == 5] <- "May"
airquality$Month[airquality$Month == 6] <- "June"
airquality$Month[airquality$Month == 7] <- "July"
airquality$Month[airquality$Month == 8] <- "August"
airquality$Month[airquality$Month == 9] <- "September"
Now look at the summary statistics of the dataset
summary(airquality$Month)
Length Class Mode
153 character character
airquality$Month <- factor(airquality$Month,
                           levels = c("May", "June", "July", "August",
                                      "September"))
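The recode-then-factor steps above can also be collapsed into a single call. As a minimal sketch (not necessarily the approach the assignment expects), `factor()` accepts both `levels` and `labels`, so the numeric codes 5 through 9 map straight to month names in calendar order. It works on a fresh copy of the data, here called `aq` (an illustrative name), so the data frame used in the rest of this report is untouched.

```r
# Fresh copy of the built-in data; `aq` is an illustrative name
aq <- datasets::airquality

# Map numeric month codes to names and fix the display order in one step
aq$Month <- factor(aq$Month,
                   levels = 5:9,
                   labels = c("May", "June", "July", "August", "September"))

levels(aq$Month)   # month names in calendar order, not alphabetical
table(aq$Month)    # number of daily observations per month
```

Because the levels are set explicitly, plots and tables list the months in calendar order rather than alphabetically.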
Plot 1 Code
p1 <- airquality |>
  ggplot(aes(x = Temp, fill = Month)) +
  geom_histogram(position = "identity") +
  scale_fill_discrete(name = "Month",
                      labels = c("May", "June", "July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept",
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service") # provide the data source
Plot 1 Output
p1
`stat_bin()` using `bins = 30`. Pick better value `binwidth`.
Plot 2 Code
p2 <- airquality |>
  ggplot(aes(x = Temp, fill = Month)) +
  geom_histogram(position = "identity", alpha = 0.5, binwidth = 5, color = "white") +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept",
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")
p2
Plot 3 Code
p3 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) +
  labs(x = "Months from May through September", y = "Temperatures",
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September"))
p3
Plot 4 Code
p4 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) +
  labs(x = "Monthly Temperatures", y = "Temperatures",
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_grey(name = "Month", labels = c("May", "June", "July", "August", "September"))
p4
Plot 5 Code
p5 <- airquality |>
  ggplot(aes(Month, Ozone, fill = Month)) +
  labs(x = "May - September", y = "Ozone levels",
       title = "Boxplot of Monthly Ozone Levels") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September")) +
  coord_flip()
p5
Warning: Removed 37 rows containing non-finite outside the scale range
(`stat_boxplot()`).
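The warning above comes from missing Ozone readings, which ggplot2 drops before drawing the boxplots. A quick check, sketched here in base R on a fresh copy of the data (`aq`, `n_missing`, and `missing_by_month` are illustrative names), counts the missing values overall and by month:

```r
# Fresh copy of the built-in data; Month is still numeric (5-9) here
aq <- datasets::airquality

# Total missing Ozone readings; compare with the row count in the warning
n_missing <- sum(is.na(aq$Ozone))
n_missing

# Missing Ozone readings per month
missing_by_month <- tapply(is.na(aq$Ozone), aq$Month, sum)
missing_by_month
```

If the total matches the number in the warning, the dropped rows are fully explained by missing data rather than by values falling outside the axis limits.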
The plot I created is a set of boxplots of ozone levels for the months May through September. It shows how the median, upper quartile, and lower quartile of ozone differ from month to month, and it also marks the high outliers for each month. July has the highest median ozone level of any month in the graph, but August is close behind, with a higher upper quartile and the highest maximum value. I also used coord_flip() because I feel it makes the boxplots easier to read. Since the key on the right is vertical, it makes sense for the boxplots to be stacked vertically as well. You can clearly see that August has the largest outlier of all the months, while September, May, and June are more compact and have a smaller range than August and July.
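The medians and quartiles described above can be read off numerically as well as from the plot. The following is a minimal sketch in base R on a fresh copy of the data (`aq` and `ozone_stats` are illustrative names), with missing Ozone values ignored:

```r
# Fresh copy of the built-in data with months as an ordered factor
aq <- datasets::airquality
aq$Month <- factor(aq$Month, levels = 5:9,
                   labels = c("May", "June", "July", "August", "September"))

# Quartiles, median, and maximum of Ozone for each month, ignoring NAs
ozone_stats <- t(sapply(split(aq$Ozone, aq$Month), function(x) {
  c(q1     = unname(quantile(x, 0.25, na.rm = TRUE)),
    median = median(x, na.rm = TRUE),
    q3     = unname(quantile(x, 0.75, na.rm = TRUE)),
    max    = max(x, na.rm = TRUE))
}))
ozone_stats   # one row per month, one column per statistic
```

Comparing the `median` and `max` columns across rows gives a direct check on which month has the highest typical ozone level and which has the most extreme reading.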