Airquality Assignment
Load in the library
# to access tools (ggplot2 etc.)
library(tidyverse)
Load the dataset into your global environment
#to store data
data("airquality")
Look at the structure of the data
View the data using the “head” function
#to view data
head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
Calculate Summary Statistics
#calculate mean
#1: select the column by name
mean(airquality$Temp)
[1] 77.88235
#2: select the same column by position (Temp is column 4)
mean(airquality[,4])
[1] 77.88235
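Both calls work here because Temp has no missing values. The head() output shows that Ozone and Solar.R do contain NA, and mean() returns NA for such a column unless na.rm = TRUE is set. A minimal sketch, assuming a local copy named aq (a name chosen only for this illustration) so the data frame in the global environment is left untouched:

```r
# Local copy of the built-in dataset; `aq` is an illustrative name
aq <- datasets::airquality

# Ozone contains missing values, so the plain mean is NA
ozone_mean_raw <- mean(aq$Ozone)

# na.rm = TRUE drops the missing values before averaging
ozone_mean <- mean(aq$Ozone, na.rm = TRUE)

print(ozone_mean_raw)
print(ozone_mean)
```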
Calculate Median, Standard Deviation, and Variance
#calculate median
median(airquality$Temp)
[1] 79
#calculate standard deviation
sd(airquality$Wind)
[1] 3.523001
#calculate variance
var(airquality$Wind)
[1] 12.41154
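The two Wind results are linked: the variance is the square of the standard deviation. A short check, again using an illustrative local copy named aq rather than the global data frame:

```r
# Local copy of the built-in dataset; `aq` is an illustrative name
aq <- datasets::airquality

wind_sd  <- sd(aq$Wind)
wind_var <- var(aq$Wind)

# all.equal() compares numbers with a small tolerance,
# which is safer than == for floating-point values
same <- isTRUE(all.equal(wind_sd^2, wind_var))
print(same)
```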
Rename the Months from numbers to names
#airquality$Month
airquality$Month[airquality$Month == 5] <- "May"
airquality$Month[airquality$Month == 6] <- "June"
airquality$Month[airquality$Month == 7] <- "July"
airquality$Month[airquality$Month == 8] <- "August"
airquality$Month[airquality$Month == 9] <- "September"
Now look at the summary statistics of the dataset
#months change from numbers to characters
summary(airquality$Month)
   Length     Class      Mode
      153 character character
Month is a categorical variable, so it should be stored as a factor, whose distinct categories are called levels.
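The order of the levels matters for plotting. When no levels are given, factor() sorts them alphabetically, which would put August first in legends and on axes. A small sketch of the difference, using a hypothetical vector month_names that is not part of the assignment code:

```r
# Illustrative vector of the five month names in calendar order
month_names <- c("May", "June", "July", "August", "September")

# Default: levels are sorted alphabetically
alphabetical <- factor(month_names)

# Explicit levels keep the calendar order
calendar <- factor(month_names, levels = month_names)

print(levels(alphabetical))
print(levels(calendar))
```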
#reorder months
airquality$Month <- factor(airquality$Month,
                           levels = c("May", "June", "July", "August",
                                      "September"))
Plot 1: Create a histogram categorized by Month
Plot 1 Code
p1 <- airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity") +
  scale_fill_discrete(name = "Month",
                      labels = c("May", "June","July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept",
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service") #provide the data source
Plot 1 Output
p1
`stat_bin()` using `bins = 30`. Pick better value `binwidth`.
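The message appears because no bin width was supplied, so the default of 30 bins was used. To get a feel for what a wider bin does, the binning can be sketched in base R with hist(). This is only an approximation: the break points below are an assumption chosen for illustration, and ggplot2 places its bin edges differently.

```r
# Local copy of the built-in dataset; `aq` is an illustrative name
aq <- datasets::airquality

# Assumed bin edges: every 5 degrees, wide enough to cover all temperatures
breaks <- seq(55, 100, by = 5)

# plot = FALSE returns the bin counts without drawing anything
binned <- hist(aq$Temp, breaks = breaks, plot = FALSE)
n_bins <- length(binned$counts)

print(n_bins)
print(binned$counts)
```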
Plot 2: Improve the histogram of Average Temperature by Month
Plot 2 Code
p2 <- airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity", alpha=0.6, binwidth = 5, color = "orange") +
  #binwidth only for histogram
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept",
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")
Plot 2 Output
p2
Plot 3: Create side-by-side boxplots categorized by Month
p3 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) +
  labs(x = "Months from May through September", y = "Temperatures",
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September"))
Plot 3 Output
p3
Plot 4: Side by Side Boxplots in Gray Scale
Plot 4 Code
p4 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) +
  labs(x = "Months from May through September", y = "Temperatures",
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_grey(name = "Month", labels = c("May", "June","July", "August", "September"))
Plot 4 Output
p4
Plot 5: Scatterplot of Wind Speed and Ozone Levels
Plot 5 Code
p5 <- airquality |>
  ggplot(aes(x=Wind, y=Ozone)) +
  geom_point(aes(color=factor(Month)),
             alpha=0.6,
             na.rm=TRUE) +
  geom_smooth(method = "lm") +
  labs(x = "Wind Speed",
       y = "Ozone Level",
       title = "Scatterplot of Wind Speed and Ozone Levels",
       caption = "New York State Department of Conservation and the National Weather Service")
Plot 5 Output
p5
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 37 rows containing non-finite outside the scale range
(`stat_smooth()`).
Essay
In Plot 5, I created a scatterplot to show the relationship between wind speed and ozone levels in New York from May to September 1973. The x-axis is wind speed, and the y-axis is ozone level.
The blue regression line shows a general negative relationship between wind speed and ozone levels. When wind speed is low, ozone levels tend to be higher, and as wind speed increases, ozone levels tend to decrease. This suggests that stronger winds may help disperse air pollutants and lower ozone concentrations. The points are quite spread out, but a downward trend is still visible.
I used geom_point() to display the data points and included na.rm = TRUE so that rows with missing ozone values are dropped without a warning. I also mapped color to factor(Month) so that each month's points get their own color, which makes the plot more visually engaging. I adjusted the transparency with alpha to make overlapping points easier to see. Last but not least, I used geom_smooth(method = "lm") to add the blue linear regression line, which makes the downward trend more visible.
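To put numbers on the downward trend, the correlation and the slope of a straight-line fit can be computed directly. This is a sketch under the assumption that a plain lm(Ozone ~ Wind) matches the line drawn with method = "lm" and formula y ~ x. It uses an illustrative local copy named aq, and it also counts the missing Ozone values, which should agree with the 37 rows reported in the warning.

```r
# Local copy of the built-in dataset; `aq` is an illustrative name
aq <- datasets::airquality

# Number of rows with a missing Ozone value
n_missing <- sum(is.na(aq$Ozone))

# Pearson correlation, using only rows where both values are present
r <- cor(aq$Wind, aq$Ozone, use = "complete.obs")

# Straight-line fit of Ozone on Wind; lm() drops rows with NA by default
fit <- lm(Ozone ~ Wind, data = aq)
slope <- unname(coef(fit)["Wind"])

print(n_missing)
print(r)
print(slope)
```

A negative r and a negative slope both point in the same direction as the blue line: higher wind speeds go with lower ozone readings.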