Airquality HW
Load the library
library(tidyverse)
Load the dataset into your global environment
data("airquality")
Temp Chunk
plotX <- airquality |>
  ggplot(aes(x = Ozone, y = Temp)) +
  geom_point()
plotX
Warning: Removed 37 rows containing missing values or values outside the scale range
(`geom_point()`).
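The warning reflects missing Ozone readings rather than a plotting problem. A minimal base-R sketch for checking this on a fresh copy of the built-in data; the names `aq`, `n_missing`, and `complete_rows` are illustrative and do not appear in the original code:

```r
# Work on a fresh copy so later edits to `airquality` do not matter
aq <- datasets::airquality

# Count the missing Ozone readings; these are the rows geom_point() drops
n_missing <- sum(is.na(aq$Ozone))
n_missing

# Keep only rows where both plotted variables are present
complete_rows <- aq[!is.na(aq$Ozone) & !is.na(aq$Temp), ]
nrow(complete_rows)
```

Plotting `complete_rows` instead of the full data frame should draw the same points without the warning.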
View the data using the “head” function
head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
Calculate summary statistics
mean(airquality$Temp)
[1] 77.88235
mean(airquality[, 4])
[1] 77.88235
Calculate median, standard deviation, and variance
median(airquality$Temp)
[1] 79
sd(airquality$Wind)
[1] 3.523001
var(airquality$Wind)
[1] 12.41154
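Temp and Wind have no missing values, so the calls above work as written. Columns such as Ozone do contain NA, and the same functions then return NA unless told otherwise. A short sketch, assuming a fresh copy of the data; the names `aq`, `ozone_mean`, `wind_sd`, and `wind_var` are illustrative:

```r
aq <- datasets::airquality

# Ozone contains NA values, so the default call returns NA
mean(aq$Ozone)

# na.rm = TRUE drops the missing values before averaging
ozone_mean <- mean(aq$Ozone, na.rm = TRUE)
ozone_mean

# Variance is the square of the standard deviation
wind_sd <- sd(aq$Wind)
wind_var <- var(aq$Wind)
all.equal(wind_sd^2, wind_var)
```

The last line is a quick consistency check on the two Wind results reported above.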
Rename the months from numbers to names
airquality$Month[airquality$Month == 5] <- "May"
airquality$Month[airquality$Month == 6] <- "June"
airquality$Month[airquality$Month == 7] <- "July"
airquality$Month[airquality$Month == 8] <- "August"
airquality$Month[airquality$Month == 9] <- "September"
Now look at the summary statistics of the data set
summary(airquality$Month)
   Length     Class      Mode
      153 character character
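Because Month is now a character vector, summary() reports only its length and class. Converting it to a factor fixes that, and the order of the levels matters for the plots. A minimal sketch of why the levels are usually spelled out; `months_chr`, `default_order`, `calendar_order`, and `aq` are illustrative names, and the one-step relabeling at the end is an assumed alternative rather than the approach used in this homework:

```r
# Illustrative character vector of month names, in calendar order
months_chr <- c("May", "June", "July", "August", "September")

# Without `levels`, factor() sorts the labels alphabetically
default_order <- levels(factor(months_chr))
default_order

# Supplying `levels` keeps the calendar order
calendar_order <- levels(factor(months_chr, levels = months_chr))
calendar_order

# Assumed alternative: build the factor straight from the numeric codes 5..9
aq <- datasets::airquality
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])
table(aq$Month)
```

With alphabetical levels, August would appear first on every axis and legend, which is why the calendar order is passed explicitly.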
Month is a categorical variable. R stores categorical variables as factors, and the distinct categories are called levels.
airquality$Month <- factor(airquality$Month, levels = c("May", "June", "July", "August", "September"))
Plot 1: Create a histogram categorized by Month
P1 <- airquality |>
  ggplot(aes(x = Temp, fill = Month)) +
  geom_histogram(position = "identity") +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept", y = "Frequency of Temps", title = "Histogram of Monthly Temperatures from May - Sept, 1973", caption = "New York State Department of Conservation and the National Weather Service")
Plot 1 output
P1
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Plot 2: Improve the histogram of Average Temperature by Month
p2 <- airquality |>
  ggplot(aes(x = Temp, fill = Month)) +
  geom_histogram(position = "identity", alpha = 0.5, binwidth = 5, color = "white") +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept", y = "Frequency of Temps", title = "Histogram of Monthly Temperatures from May - Sept, 1973", caption = "New York State Department of Conservation and the National Weather Service")
Plot 2 output
p2
Plot 3: Create side-by-side boxplots categorized by Month
p3 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) +
  labs(x = "Months from May through September", y = "Temperatures", caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September"))
Plot 3 output
p3
Plot 4: Side by Side Boxplots in Gray Scale
p4 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) +
  labs(x = "Monthly Temperatures", y = "Temperatures",
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_grey(name = "Month", labels = c("May", "June", "July", "August", "September"))
Plot 4 output
p4
Plot 5: Create a scatter plot categorized by Month
p5 <- airquality |>
  ggplot(aes(x = Temp, y = Month)) +
  labs(x = "Monthly Temperatures from May - Sept", y = "Months", title = "Scatter plot of Monthly Temperatures", caption = "New York State Department of Conservation and the National Weather Service") +
  geom_point() +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September"))
Plot 5 output
p5
Essay
For my Plot 5, I created a scatter plot to show the temperatures recorded from May through September. I think this type of plot is a good choice because it clearly shows how temperatures vary across different months and makes it easy to compare them side by side.
From this plot, I noticed that temperatures started lower in May and gradually increased through June and July, with the highest temperatures appearing in July and August. In September, the temperatures began to decrease again as the season shifted toward fall. This matches the expected pattern of summer weather.
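The seasonal pattern described above can also be checked numerically. A minimal sketch using base R on a fresh copy of the data; `aq` and `monthly_mean` are illustrative names, and relabeling the month codes in one factor() call is an assumption for brevity rather than the approach used earlier:

```r
aq <- datasets::airquality

# Label the numeric month codes 5..9 with month names in calendar order
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])

# Mean temperature for each month
monthly_mean <- tapply(aq$Temp, aq$Month, mean)
round(monthly_mean, 1)

# Month with the highest mean temperature
names(which.max(monthly_mean))
```

If the plot was read correctly, the means should rise from May into midsummer and fall again in September.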
For the code, I used geom_point() to create the scatter plot. At first, I set it up with ggplot(aes(x = Month, y = Temp, fill = Month)). However, this caused all the data points for each month to stack vertically, which looked messy and made the plot harder to read. To fix this, I switched the axes so that x = Temp and y = Month. This spread the temperature values across the x-axis and placed the months on the y-axis, creating a much cleaner, easier-to-read scatter plot.
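Another way to deal with stacked points, not used in this homework, is to keep Month on the x-axis and jitter the points sideways. A hedged sketch, assuming ggplot2 is installed; `aq` and `p_jitter` are illustrative names, and the width and alpha values are arbitrary starting points:

```r
library(ggplot2)

aq <- datasets::airquality
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])

# Jitter points horizontally within each month so overlapping
# temperatures stay visible; height = 0 leaves Temp values unchanged
p_jitter <- ggplot(aq, aes(x = Month, y = Temp, color = Month)) +
  geom_jitter(width = 0.2, height = 0, alpha = 0.7) +
  labs(x = "Month", y = "Temperature (degrees F)",
       title = "Jittered scatter plot of temperature by month")
p_jitter
```

Setting `height = 0` matters here: jitter is applied only along the categorical axis, so the plotted temperatures are still the recorded values.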