Airquality HW
Load the library
library(tidyverse)
Load the dataset into your global environment
data("airquality")
Temp Chunk
plotX <- airquality |>
  ggplot(aes(x = Ozone, y = Temp)) +
  geom_point()
plotX
Warning: Removed 37 rows containing missing values or values outside the scale range
(`geom_point()`).
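The warning above comes from missing values (NA) in the data. As a quick check, the sketch below counts the NA values in each column of the built-in dataset. The names aq and na_counts are only illustrative and do not appear in the original homework.

```r
# Work on a fresh copy of the built-in dataset (aq is an illustrative name)
aq <- datasets::airquality

# Count missing values in every column
na_counts <- colSums(is.na(aq))
print(na_counts)

# Ozone is the x variable in the scatter plot, so its NAs are the rows dropped
print(na_counts[["Ozone"]])
```

The Ozone count should match the number of rows reported in the warning, since Temp itself has no missing values.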
View the data using the “head” function
head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
Calculate summary statistics
mean(airquality$Temp)
[1] 77.88235
mean(airquality[,4])
[1] 77.88235
Calculate median, standard deviation, and variance
median(airquality$Temp)
[1] 79
sd(airquality$Wind)
[1] 3.523001
var(airquality$Wind)
[1] 12.41154
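The two Wind results above are related: the variance is the square of the standard deviation. A minimal sketch to confirm this, using the illustrative names wind, wind_sd, and wind_var:

```r
# Wind column from the built-in dataset
wind <- datasets::airquality$Wind

wind_sd  <- sd(wind)   # standard deviation
wind_var <- var(wind)  # variance

# all.equal() compares numbers with a small tolerance for rounding error
print(all.equal(wind_sd^2, wind_var))
```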
Rename the months from numbers to names
airquality$Month[airquality$Month == 5] <- "May"
airquality$Month[airquality$Month == 6] <- "June"
airquality$Month[airquality$Month == 7] <- "July"
airquality$Month[airquality$Month == 8] <- "August"
airquality$Month[airquality$Month == 9] <- "September"
Now look at the summary statistics of the data set
summary(airquality$Month)
Length Class Mode
153 character character
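Month is a categorical variable. R stores categorical variables as factors, and the distinct categories are called levels. Listing the levels explicitly keeps the months in calendar order rather than alphabetical order.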
airquality$Month <- factor(airquality$Month, levels = c("May", "June", "July", "August", "September"))
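As an alternative to renaming each month one line at a time, the sketch below does the renaming and the factor conversion in one step with the built-in month.name constant. It works on a separate copy named aq (an illustrative name) so it does not change the airquality object used in the rest of the homework.

```r
# Fresh copy with the original numeric Month column (5 through 9)
aq <- datasets::airquality

# month.name is a base R vector: "January", "February", ..., "December".
# Indexing it by the month number turns 5 into "May", 6 into "June", and so on.
aq$Month <- factor(month.name[aq$Month], levels = month.name[5:9])

print(levels(aq$Month))
print(table(aq$Month))
```

This is only a shortcut for the same result; the step-by-step version above is easier to follow when first learning how indexing works.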
Plot 1: Create a histogram categorized by Month
P1 <- airquality |>
  ggplot(aes(x = Temp, fill = Month)) +
  geom_histogram(position = "identity") +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept",
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")
Plot 1 output
P1
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
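The message above appears because no bin width was given, so the histogram falls back to 30 bins. A rough sketch of how a bin width relates to the number of bins, assuming a width of 5 degrees (the names temp_range, binwidth, and n_bins are illustrative):

```r
# Temperature values from the built-in dataset
temp <- datasets::airquality$Temp

temp_range <- range(temp)  # smallest and largest temperature
binwidth   <- 5            # assumed width of each bin, in degrees

# Approximate number of bins needed to cover the range
n_bins <- ceiling(diff(temp_range) / binwidth)
print(temp_range)
print(n_bins)
```

The exact count drawn by geom_histogram() can differ by one depending on where the bin edges fall, so this is only an estimate.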
Plot 2: Improve the histogram of Average Temperature by Month
p2 <- airquality |>
  ggplot(aes(x = Temp, fill = Month)) +
  geom_histogram(position = "identity", alpha = 0.5, binwidth = 5, color = "white") +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept",
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")
Plot 2 output
p2
Plot 3: Create side-by-side boxplots categorized by Month
p3 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) +
  labs(x = "Months from May through September",
       y = "Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September"))
Plot 3 output
p3
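The side-by-side boxplots compare the months visually. As a rough numeric companion, the sketch below computes the mean temperature for each month with tapply(). It uses a fresh copy of the dataset, and the names aq and monthly_mean are illustrative.

```r
# Fresh copy with the original numeric Month column (5 through 9)
aq <- datasets::airquality

# Mean temperature within each month
monthly_mean <- tapply(aq$Temp, aq$Month, mean)

# Replace the numeric labels with month names for readability
names(monthly_mean) <- month.name[as.integer(names(monthly_mean))]
print(round(monthly_mean, 1))
```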
Plot 4: Side-by-side boxplots in grayscale
p4 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) +
  labs(x = "Monthly Temperatures", y = "Temperatures",
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_grey(name = "Month", labels = c("May", "June", "July", "August", "September"))
Plot 4 output
p4
Plot 5: Create a scatter plot categorized by Month
p5 <- airquality |>
  ggplot(aes(x = Temp, y = Month)) +
  labs(x = "Monthly Temperatures from May - Sept",
       y = "Months",
       title = "Scatter plot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_point() +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September"))
Plot 5 output
p5
Essay
For my Plot 5, I created a scatter plot to show the temperatures recorded from May through September. I think this type of plot is a good choice because it clearly shows how temperatures vary across different months and makes it easy to compare them side by side.
From this plot, I noticed that temperatures started lower in May and gradually increased through June and July, with the highest temperatures appearing in July and August. In September, the temperatures began to decrease again as the season shifted toward fall. This matches the expected pattern of summer weather.
For the code, I used geom_point() to create the scatter plot. At first, I set it up with ggplot(aes(x = Month, y = Temp, fill = Month)). However, this caused all the data points for each month to stack vertically, which looked messy and made the plot harder to read. To fix this, I switched the axes so that x = Temp and y = Month. This spread the temperature values across the x-axis and placed the months on the y-axis, creating a much cleaner and easier-to-read scatter plot.
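Another possible way to handle stacked points, not used in Plot 5, is to nudge them slightly sideways with geom_jitter(). The sketch below is one way this could look, assuming ggplot2 is installed. The names aq and p_jitter and the width and alpha values are illustrative choices, not part of the original assignment.

```r
library(ggplot2)

# Fresh copy of the dataset with Month as an ordered factor
aq <- datasets::airquality
aq$Month <- factor(month.name[aq$Month], levels = month.name[5:9])

# width spreads points left and right within each month;
# height = 0 leaves the temperature values unchanged
p_jitter <- ggplot(aq, aes(x = Month, y = Temp)) +
  geom_jitter(width = 0.2, height = 0, alpha = 0.6) +
  labs(x = "Months", y = "Temperatures",
       title = "Jittered scatter plot of Monthly Temperatures")

p_jitter
```

Because the jitter is random, the horizontal positions change each time the plot is drawn, but the temperatures on the y-axis stay exact.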