Airquality Assignment
Load the library
library(tidyverse)
Load the dataset into your global environment
data("airquality")
View the first rows of the data using the head() function
head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
Calculate Summary Statistics
mean(airquality$Temp)
[1] 77.88235
mean(airquality[,4])
[1] 77.88235
Calculate Median, Standard Deviation, and Variance
median(airquality$Temp)
[1] 79
sd(airquality$Wind)
[1] 3.523001
var(airquality$Wind)
[1] 12.41154
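Temp and Wind have no missing values, so the calls above return numbers directly. Ozone and Solar.R do contain NA (rows 5 and 6 of the head() output), and for those columns the same functions return NA unless na.rm = TRUE is set. A minimal sketch, using a fresh copy named aq (an illustrative name, not part of the assignment):

```r
# Use a fresh copy so this sketch does not depend on earlier steps
aq <- datasets::airquality

# Without na.rm, a single missing value makes the result NA
ozone_mean_raw <- mean(aq$Ozone)

# na.rm = TRUE drops the missing values before computing
ozone_mean <- mean(aq$Ozone, na.rm = TRUE)
ozone_sd   <- sd(aq$Ozone, na.rm = TRUE)

print(ozone_mean_raw)
print(ozone_mean)
print(ozone_sd)
```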
Rename the months from numbers to names
airquality$Month[airquality$Month == 5] <- "May"
airquality$Month[airquality$Month == 6] <- "June"
airquality$Month[airquality$Month == 7] <- "July"
airquality$Month[airquality$Month == 8] <- "August"
airquality$Month[airquality$Month == 9] <- "September"
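The five assignments can also be written as a single lookup. This is only an alternative sketch, not the approach the assignment uses: it relies on month.name, a constant built into base R, and works on a copy named aq (an illustrative name) so it does not interfere with the steps above.

```r
# Work on a fresh copy where Month is still numeric (5 to 9)
aq <- datasets::airquality

# month.name is a built-in character vector: "January", ..., "December".
# Indexing it with the month numbers returns the matching names.
aq$Month <- month.name[aq$Month]

print(unique(aq$Month))
```

Either approach turns Month into a character column, which is why summary() reports only its length, class, and mode.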
Now look at the summary statistics of the dataset
summary(airquality$Month)
Length Class Mode
153 character character
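Month is a categorical variable. R stores categorical variables as factors, and the distinct categories are called levels. Convert Month to a factor with its levels in calendar order.
airquality$Month <- factor(airquality$Month,
                           levels = c("May", "June", "July", "August", "September"))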
Plot 1: Create a histogram categorized by Month
p1 <- airquality |>
  ggplot(aes(x = Temp, fill = Month)) +
  geom_histogram(position = "identity") +
  scale_fill_discrete(name = "Month",
                      labels = c("May", "June", "July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept",
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service") # provide the data source
p1
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Plot 2: Improve the histogram of Average Temperature by Month
p2 <- airquality |>
  ggplot(aes(x = Temp, fill = Month)) +
  geom_histogram(position = "identity", alpha = 0.5, binwidth = 5, color = "white") +
  scale_fill_discrete(name = "Month",
                      labels = c("May", "June", "July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept",
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")
p2
Plot 3: Create side-by-side boxplots categorized by Month
p3 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) +
  labs(x = "Months from May through September", y = "Temperatures",
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month",
                      labels = c("May", "June", "July", "August", "September"))
p3
Plot 4: Side by Side Boxplots in Gray Scale
p4 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) +
  labs(x = "Months from May through September", y = "Temperatures",
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_grey(name = "Month",
                  labels = c("May", "June", "July", "August", "September"))
p4
Plot 5: Scatterplot of Ozone Levels vs. Monthly Temperatures from May to September, 1973
p5_scatterplot <- airquality |>
  ggplot(aes(x = Temp, y = Ozone, color = factor(Month))) +
  geom_point(na.rm = TRUE) +
  scale_color_discrete(name = "Month",
                       labels = c("May", "June", "July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept",
       y = "Ozone Levels",
       title = "Scatterplot of Ozone Levels vs. Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")
p5_scatterplot
Brief Essay
For this assignment, I created a scatterplot of the relationship between temperature and ozone levels, titled “Scatterplot of Ozone Levels vs. Monthly Temperatures from May to September, 1973.” Each point is one daily observation: its position shows the temperature and the ozone level recorded that day. I colored the points by month, which makes it easier to see how the readings shift over the season.
The plot shows that ozone levels differ across months: some months have mostly low readings, while others reach much higher ones. Ozone also tends to rise with temperature, although the pattern looks different from month to month. This suggests how air quality may differ between the warmer and cooler parts of the season.
To make this plot, I used two pieces of code worth pointing out. First, I used color = factor(Month) so that each month gets its own color, which makes the months easy to tell apart. Second, I used geom_point(na.rm = TRUE) to draw the points. At first I used geom_point() alone, and R gave a warning because some rows have missing values. Those rows cannot be plotted either way; adding na.rm = TRUE tells ggplot2 to drop them silently, so the warning goes away and the output stays clean.
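As a rough check on what na.rm = TRUE hides in the scatterplot, the sketch below counts the rows that cannot be drawn and computes the correlation on the rows that can. The names aq and plotted are illustrative and not part of the assignment code.

```r
# Fresh copy of the dataset, independent of the earlier steps
aq <- datasets::airquality

# Rows geom_point() can draw: both Temp and Ozone must be present
plotted <- aq[!is.na(aq$Temp) & !is.na(aq$Ozone), ]

n_plotted <- nrow(plotted)
n_dropped <- nrow(aq) - n_plotted

# Correlation between temperature and ozone on the plotted rows
temp_ozone_cor <- cor(plotted$Temp, plotted$Ozone)

print(c(plotted = n_plotted, dropped = n_dropped))
print(temp_ozone_cor)
```

Knowing how many rows were dropped is worth reporting alongside the plot, since the warning that would normally mention it has been silenced.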