Air Quality HW
Load the library
library(tidyverse)
## Load the dataset into your global environment
data("airquality")
## head(airquality)
mean(airquality$Temp)
[1] 77.88235
mean(airquality[,4])
[1] 77.88235
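The two calls above agree because Temp is the fourth column of the data frame, so `airquality$Temp` and `airquality[,4]` select the same values. A minimal check, using a copy named `aq` (a name introduced here only for illustration):

```r
# Fresh copy of the built-in dataset; `aq` is an illustrative name.
aq <- datasets::airquality

# Name of the fourth column.
names(aq)[4]

# Selecting by name and by position gives the same mean.
identical(mean(aq$Temp), mean(aq[, 4]))
```

Selecting by name is usually the safer habit, since a positional index silently points at a different variable if the column order ever changes.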
sd(airquality$Wind)
[1] 3.523001
var(airquality$Wind)
[1] 12.41154
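The standard deviation and the variance printed above are two views of the same spread: the variance is the square of the standard deviation. A short sketch that confirms the relationship (the variable name `wind` is introduced here for illustration):

```r
# Wind speeds from the built-in dataset.
wind <- datasets::airquality$Wind

# all.equal() compares with a small numeric tolerance,
# which is the right tool for floating-point results.
all.equal(sd(wind)^2, var(wind))
```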
airquality$Month[airquality$Month == 5]<- "May"
airquality$Month[airquality$Month == 6]<- "June"
airquality$Month[airquality$Month == 7]<- "July"
airquality$Month[airquality$Month == 8]<- "August"
airquality$Month[airquality$Month == 9]<- "September"
summary(airquality$Month)
Length Class Mode
153 character character
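The summary reports a character column because the month numbers were replaced with strings. Character values sort alphabetically in plots (August would come first), which is why an ordered factor is needed. As a possible alternative, sketched here on a separate copy named `aq` (an illustrative name), the numeric codes can be turned into a labelled factor in a single step:

```r
# Work on a fresh copy so the original data frame is untouched.
aq <- datasets::airquality

# Map the codes 5..9 directly to labelled levels in calendar order.
aq$Month <- factor(aq$Month,
                   levels = 5:9,
                   labels = c("May", "June", "July", "August", "September"))

levels(aq$Month)  # calendar order, not alphabetical
table(aq$Month)   # number of daily observations per month
```

This is only a sketch of a shorter route to the same result; the step-by-step replacement used in this homework works equally well.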
airquality$Month<-factor(airquality$Month,
levels=c("May", "June","July", "August",
"September"))
p1 <- airquality |>
ggplot(aes(x=Temp, fill=Month)) +
geom_histogram(position="identity")+
scale_fill_discrete(name = "Month",
labels = c("May", "June","July", "August", "September")) +
labs(x = "Monthly Temperatures from May - Sept",
y = "Frequency of Temps",
title = "Histogram of Monthly Temperatures from May - Sept, 1973",
caption = "New York State Department of Conservation and the National Weather Service") #provide the data source
p1
`stat_bin()` using `bins = 30`. Pick better value `binwidth`.
p2 <- airquality |>
ggplot(aes(x=Temp, fill=Month)) +
geom_histogram(position="identity", alpha=0.5, binwidth = 5, color = "white")+
scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September")) +
labs(x = "Monthly Temperatures from May - Sept",
y = "Frequency of Temps",
title = "Histogram of Monthly Temperatures from May - Sept, 1973",
caption = "New York State Department of Conservation and the National Weather Service")
p2
p3 <- airquality |>
ggplot(aes(Month, Temp, fill = Month)) +
labs(x = "Months from May through September", y = "Temperatures",
title = "Side-by-Side Boxplot of Monthly Temperatures",
caption = "New York State Department of Conservation and the National Weather Service") +
geom_boxplot() +
scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September"))
p3
p4 <- airquality |>
ggplot(aes(Month, Temp, fill = Month)) +
labs(x = "Months from May through September", y = "Temperatures",
title = "Side-by-Side Boxplot of Monthly Temperatures",
caption = "New York State Department of Conservation and the National Weather Service") +
geom_boxplot()+
scale_fill_grey(name = "Month", labels = c("May", "June","July", "August", "September"))
p4
p5 <- airquality |>
ggplot(aes(Month, Ozone, fill = Month)) +
labs(x = "Month", y = "Ozone Level",
title = "Side-by-Side Boxplot of Monthly Ozone levels",
caption = "New York State Department of Conservation and the National Weather Service") +
geom_boxplot()+
scale_fill_grey(name = "Month", labels = c("May", "June","July", "August", "September"))
p5
Warning: Removed 37 rows containing non-finite outside the scale range
(`stat_boxplot()`).
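The warning above is expected: the Ozone column contains missing values (NA), and the boxplot drops those rows before drawing. A quick base R check of where the 37 rows come from, using a fresh copy named `aq` (an illustrative name) in which Month is still the numeric code 5 to 9:

```r
# Fresh copy of the built-in dataset.
aq <- datasets::airquality

# Total number of missing Ozone readings.
n_missing <- sum(is.na(aq$Ozone))
n_missing

# Missing readings broken down by month code.
missing_by_month <- tapply(is.na(aq$Ozone), aq$Month, sum)
missing_by_month
```

Because the missing days are not spread evenly, some boxes in the plot summarize fewer observations than others.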
SUMMARY: This chunk of code plots the ozone levels recorded in New York from May through September 1973. I stored the plot in an object named p5. I started from the "airquality" dataset, which ships with R in the built-in datasets package, and piped it into ggplot(); it is the source of the Ozone data for the months mentioned above. I mapped Month to the x-axis and Ozone to the y-axis. Ozone is a numeric variable, so putting it on the vertical axis makes rises and drops between months easy to see. Month is categorical rather than numeric, so the x-axis is a good place for it: each month gets its own box. I used the labs() function (short for labels) to label both axes and to add the title above the boxplot and the caption below it. scale_fill_grey() fills the boxes with shades of grey instead of colours. That makes the plot more uniform and easier to read, because fewer colours mean fewer details competing for attention.
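The rises and drops described in the summary can also be checked numerically. The thick line inside each box is the median, so a table of per-month medians should follow the same pattern as the plot. A minimal base R sketch, assuming a fresh copy named `aq` (an illustrative name) and ignoring the missing readings:

```r
# Fresh copy of the built-in dataset; Month is the numeric code 5..9 here.
aq <- datasets::airquality

# Median Ozone per month, skipping NA values.
ozone_median <- tapply(aq$Ozone, aq$Month, median, na.rm = TRUE)

# Replace the codes with month names (month.name is a base R constant).
names(ozone_median) <- month.name[5:9]
ozone_median
```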