Air quality HW
Load the library
library(tidyverse)
Load the dataset into your global environment
data("airquality")
Look at the structure of the data
View the data using the “head” function
head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
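The structure step listed above has no code of its own, so here is a minimal sketch of one way to do it, assuming the built-in airquality dataset from R's datasets package. The variable name aq_dims is hypothetical and only used for illustration.

```r
# Load the built-in dataset into the global environment.
data("airquality")

# str() lists every column with its type and its first few values.
str(airquality)

# dim() returns the number of rows and the number of columns.
aq_dims <- dim(airquality)
aq_dims
```

Running str() before head() makes it easy to spot which columns are numeric and which contain NA values.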
Calculate Summary Statistics
mean(airquality$Temp)
[1] 77.88235
mean(airquality[,4])
[1] 77.88235
Calculate Median, Standard Deviation, and Variance
median(airquality$Temp)
[1] 79
sd(airquality$Wind)
[1] 3.523001
var(airquality$Wind)
[1] 12.41154
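Temp and Wind have no missing values, but the head() output shows NA entries in Ozone and Solar.R. As a hedged sketch, the same summary functions can be applied to those columns by passing na.rm = TRUE. The names ozone_mean_raw, ozone_mean, and missing_counts are hypothetical.

```r
# With missing values present, the plain mean is NA.
ozone_mean_raw <- mean(airquality$Ozone)
ozone_mean_raw

# na.rm = TRUE drops the missing values before averaging.
ozone_mean <- mean(airquality$Ozone, na.rm = TRUE)
ozone_mean

# Count the missing values in each column.
missing_counts <- colSums(is.na(airquality))
missing_counts
```

The same na.rm argument is accepted by median(), sd(), and var().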
Rename the Months from numbers to names
airquality$Month[airquality$Month == 5] <- "May"
airquality$Month[airquality$Month == 6] <- "June"
airquality$Month[airquality$Month == 7] <- "July"
airquality$Month[airquality$Month == 8] <- "August"
airquality$Month[airquality$Month == 9] <- "September"
Now look at the summary statistics of the Month column
summary(airquality$Month)
Length Class Mode
153 character character
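The output above shows that summary() treats a character column as plain text and reports only its length and type. As a hedged sketch, storing the months as a factor makes summary() report a count per month instead. The example works on a separate copy with the hypothetical name aq and recodes the original month numbers in one step using R's built-in month.name vector, which is an alternative to the line-by-line renaming used in this assignment.

```r
# Work on a fresh copy of the built-in data so Month is still numeric (5 to 9).
aq <- datasets::airquality

# levels gives the values found in the data, labels gives their display names.
# month.name[5:9] is "May", "June", "July", "August", "September".
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])

# The levels stay in calendar order rather than alphabetical order.
levels(aq$Month)

# For a factor, summary() returns the number of rows in each level.
summary(aq$Month)
```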
Month is a categorical variable, so store it as a factor; listing the levels in calendar order keeps R from sorting the months alphabetically
airquality$Month <- factor(airquality$Month,
                           levels = c("May", "June", "July", "August",
                                      "September"))
Plot 1: Create a histogram categorized by Month
Plot 1 Code
p1 <- airquality |>
  ggplot(aes(x = Temp, fill = Month)) +
  geom_histogram(position = "identity") +
  scale_fill_discrete(name = "Month",
                      labels = c("May", "June", "July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept",
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service") # provide the data source
Plot 1 Output
p1
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Plot 2: Improve the histogram of Average Temperature by Month
Plot 2 Code
p2 <- airquality |>
  ggplot(aes(x = Temp, fill = Month)) +
  geom_histogram(position = "identity", alpha = 0.5, binwidth = 5, color = "white") +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept",
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")
Plot 2 Output
p2
Plot 3: Create side-by-side boxplots categorized by Month
p3 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) +
  labs(x = "Months from May through September", y = "Temperatures",
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September"))
Plot 3 Output
p3
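The center line and box height of each boxplot correspond to the median and interquartile range of that month's temperatures. As a rough sketch, those numbers can be computed directly with base R. It uses a fresh copy of the data, and the names aq, temp_medians, temp_iqr, and temp_summary are hypothetical.

```r
# Fresh copy of the built-in data with Month recoded as an ordered set of names.
aq <- datasets::airquality
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])

# tapply() applies a function to Temp separately within each month.
temp_medians <- tapply(aq$Temp, aq$Month, median)
temp_iqr <- tapply(aq$Temp, aq$Month, IQR)

# Collect the results in one small table, one row per month.
temp_summary <- data.frame(
  Month = names(temp_medians),
  median = as.vector(temp_medians),
  IQR = as.vector(temp_iqr)
)
temp_summary
```

Comparing this table with the plot is a quick check that the boxes are being read correctly.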
Plot 4: Side by Side Boxplots in Gray Scale
Plot 4 Code
p4 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) +
  labs(x = "Months from May through September", y = "Temperatures",
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_grey(name = "Month", labels = c("May", "June", "July", "August", "September"))
Plot 4 Output
p4
Plot 5: Line graph of daily wind speed in May
df1 <- airquality |>
  filter(Month == "May")
p5 <- df1 |>
  ggplot(aes(x = Day, y = Wind)) +
  geom_line() +
  geom_point() +
  labs(x = "Day", y = "Wind Speed",
       title = "Wind Speed Every Day Throughout May",
       caption = "New York State Department of Conservation and the National Weather Service") +
  theme_minimal()
p5
Brief Essay
I decided to create a line graph showing wind speed over time. Originally I wanted a separate line for each month, but I couldn’t figure out how to do that, so I plotted only one month instead. To do this, I created a separate data frame filtered to the entries from May and graphed that. In addition to the line, I added points to make the individual daily values clearer. The graph shows that most days have relatively low wind speeds, that an extremely windy day occurs every few days, and that almost every windy day is followed immediately by a return to normal wind speeds.
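The multi-month version described in the essay can be sketched by mapping Month to the color aesthetic, which makes ggplot2 draw one line per month without any filtering. This is only one possible approach, not part of the original assignment. It assumes the ggplot2 package is installed and works on a fresh copy of the data. The names aq and p6 are hypothetical.

```r
library(ggplot2)

# Fresh copy of the built-in data with Month recoded as a factor in calendar order.
aq <- datasets::airquality
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])

# Mapping a factor to color groups the data, so each month gets its own line.
p6 <- ggplot(aq, aes(x = Day, y = Wind, color = Month)) +
  geom_line() +
  geom_point(size = 1) +
  labs(x = "Day", y = "Wind Speed", color = "Month",
       title = "Daily Wind Speed by Month, May - Sept",
       caption = "New York State Department of Conservation and the National Weather Service") +
  theme_minimal()
p6
```

If five overlapping lines are hard to read, adding facet_wrap(~ Month) to the plot gives each month its own panel.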