Airquality Assignment
Load the library
library(tidyverse)
Load the dataset into your global environment
data("airquality")
Look at the structure of the data
View the data using the “head” function
head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
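head() only shows the first six rows, and those rows already contain NA values. A minimal base-R sketch of a fuller structural check uses str() for the column types and is.na() to count missing values in each column (the name na_counts is just illustrative):

```r
# Load the built-in airquality data frame from the datasets package
data("airquality")

# Column names, types, and dimensions of the data frame
str(airquality)

# Number of missing values in each column
na_counts <- colSums(is.na(airquality))
print(na_counts)
```

Only Ozone (37 missing) and Solar.R (7 missing) contain NA values; Wind, Temp, Month, and Day are complete.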
Calculate Summary Statistics
mean(airquality$Temp)
[1] 77.88235
Calculate the median of Temp and the standard deviation and variance of Wind
median(airquality$Temp)
[1] 79
sd(airquality$Wind)
[1] 3.523001
var(airquality$Wind)
[1] 12.41154
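Temp and Wind have no missing values, so these calls work as written. For Ozone or Solar.R, mean() and sd() return NA unless na.rm = TRUE is passed. The sketch below shows that behavior, checks that the variance is the square of the standard deviation, and uses tapply() to repeat a statistic for each month (the variable names are hypothetical):

```r
data("airquality")

# Ozone contains NAs, so the default call returns NA
ozone_mean_default <- mean(airquality$Ozone)
# Dropping the NAs first gives a numeric result
ozone_mean <- mean(airquality$Ozone, na.rm = TRUE)

# The variance equals the squared standard deviation
wind_var_check <- all.equal(var(airquality$Wind), sd(airquality$Wind)^2)

# Median temperature for each month (Month is still numeric, 5 to 9, at this point)
temp_median_by_month <- tapply(airquality$Temp, airquality$Month, median)

print(ozone_mean_default)
print(ozone_mean)
print(temp_median_by_month)
```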
Rename the months from numbers to names
#airquality$Month
airquality$Month[airquality$Month == 5] <- "May"
airquality$Month[airquality$Month == 6] <- "June"
airquality$Month[airquality$Month == 7] <- "July"
airquality$Month[airquality$Month == 8] <- "August"
airquality$Month[airquality$Month == 9] <- "September"
Now look at the summary statistics of the Month column
summary(airquality$Month)
   Length     Class      Mode 
      153 character character 
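The assignments above turned Month into a character vector. Month is a categorical variable, and R stores categorical variables as factors, whose distinct categories are called levels.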
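One pitfall with this conversion: when factor() receives a character vector without a levels argument, it sorts the levels alphabetically, and any labels are matched to that sorted order rather than to calendar order. A minimal sketch of the problem and one fix, assuming the month names are rebuilt from base R's month.name constant (month_chr, mislabeled, and month_fct are hypothetical names):

```r
data("airquality")

# Month names as a character vector, like the result of the five assignments above
month_chr <- month.name[airquality$Month]   # month.name[5] is "May"

# Pitfall: without `levels`, factor() sorts the levels alphabetically
sorted_levels <- levels(factor(month_chr))
print(sorted_levels)   # "August" "July" "June" "May" "September"

# `labels` are applied to the sorted levels, so the May rows get the label "August"
mislabeled <- factor(month_chr,
                     labels = c("May", "June", "July", "August", "September"))
print(table(mislabeled[airquality$Month == 5]))

# Fix: state the levels explicitly in calendar order
month_fct <- factor(month_chr, levels = month.name[5:9])
print(table(month_fct))
```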
airquality$Month <- factor(airquality$Month,
                           levels = c("May", "June", "July", "August",
                                      "September"))
Plot 1: Create a histogram categorized by Month
p1 <- airquality |>
ggplot(aes(x=Temp, fill=Month)) +
geom_histogram(position="identity")+
scale_fill_discrete(name = "Month",
labels = c("May", "June","July", "August", "September")) +
labs(x = "Monthly Temperatures from May - Sept",
y = "Frequency of Temps",
title = "Histogram of Monthly Temperatures from May - Sept, 1973",
caption = "New York State Department of Conservation and the National Weather Service")
p1
`stat_bin()` using `bins = 30`. Pick better value `binwidth`.
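This message means geom_histogram() fell back to 30 bins. Temp only spans 56 to 97 degrees Fahrenheit, so 30 bins are each roughly 1.4 degrees wide and many are nearly empty. A base-R sketch of how a 5-degree bin width (the value used in Plot 2) groups the same data; the break points from 55 to 100 are an assumption chosen to cover the observed range:

```r
data("airquality")

temp_range <- range(airquality$Temp)        # lowest and highest daily temperature
default_width <- diff(temp_range) / 30      # rough width of 30 equal bins

# Count observations in 5-degree bins covering the data
breaks_5 <- seq(55, 100, by = 5)
counts_5 <- table(cut(airquality$Temp, breaks = breaks_5))

print(temp_range)
print(default_width)
print(counts_5)
```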
Plot 2: Improve the histogram of daily temperatures by Month
p2 <- airquality |>
ggplot(aes(x=Temp, fill=Month)) +
geom_histogram(position="identity", alpha=0.5, binwidth = 5, color = "white")+
scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September")) +
labs(x = "Monthly Temperatures from May - Sept",
y = "Frequency of Temps",
title = "Histogram of Monthly Temperatures from May - Sept, 1973",
caption = "New York State Department of Conservation and the National Weather Service")
p2
Plot 3: Create side-by-side boxplots categorized by Month
p3 <- airquality |>
ggplot(aes(Month, Temp, fill = Month)) +
labs(x = "Months from May through September", y = "Temperatures",
title = "Side-by-Side Boxplot of Monthly Temperatures",
caption = "New York State Department of Conservation and the National Weather Service") +
geom_boxplot() +
scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September"))
p3
Plot 4: Side-by-Side Boxplots in Gray Scale
p4 <- airquality |>
ggplot(aes(Month, Temp, fill = Month)) +
labs(x = "Months from May through September", y = "Temperatures",
title = "Side-by-Side Boxplot of Monthly Temperatures",
caption = "New York State Department of Conservation and the National Weather Service") +
geom_boxplot()+
scale_fill_grey(name = "Month", labels = c("May", "June","July", "August", "September"))
p4
Plot 5: Scatterplot of Solar Radiation and Ozone Levels by Month
p5 <- airquality |>
ggplot(aes(x = Solar.R, y = Ozone, color = Month)) +
geom_point(size = 3) +
scale_color_brewer(palette = "Dark2", name = "Month", labels = c("May", "June","July", "August", "September")) +
labs(x = "Solar Radiation (Langleys)", y = "Ozone (ppb)",
title = "Relationship between Solar Radiation and Ozone Levels",
caption = "New York State Department of Conservation and the National Weather Service")
p5
Warning: Removed 42 rows containing missing values or values outside the scale range
(`geom_point()`).
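The warning is expected: geom_point() drops every row where Solar.R or Ozone is NA. A minimal check with complete.cases() reproduces the count (has_both and n_removed are illustrative names):

```r
data("airquality")

# TRUE for rows where both plotted variables are present
has_both <- complete.cases(airquality[, c("Solar.R", "Ozone")])

# Rows that geom_point() cannot draw
n_removed <- sum(!has_both)
print(n_removed)
```

The result, 42, matches the warning: Ozone has 37 missing values, Solar.R has 7, and 2 rows are missing both.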
Essay
I created a scatterplot to show the relationship between solar radiation and ozone levels. A scatterplot is useful for showing the relationship between two quantitative variables. In this graph, solar radiation is on the x-axis, ozone is on the y-axis, and each color represents one month from May to September. The plot shows that as solar radiation increases, ozone levels also tend to increase, which suggests a positive relationship between the two variables, although the points are widely scattered. The colors also show that some months, especially July and August, tend to have higher ozone values. I used geom_point() to create the scatterplot and scale_color_brewer() to give each month a different color.
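Both claims in the essay can be checked numerically. A short sketch, assuming Pearson correlation on the complete pairs is an adequate summary of the trend, with aggregate() giving the mean ozone for each month (its formula interface drops rows where Ozone is NA):

```r
data("airquality")

# Pearson correlation between solar radiation and ozone, using complete pairs only
r_solar_ozone <- cor(airquality$Solar.R, airquality$Ozone, use = "complete.obs")

# Mean ozone per month; rows with NA Ozone are dropped by the formula interface
ozone_by_month <- aggregate(Ozone ~ Month, data = airquality, FUN = mean)
ozone_by_month$Month <- month.name[ozone_by_month$Month]

print(r_solar_ozone)
print(ozone_by_month)
```

A positive correlation supports the first claim, and July and August having the two highest monthly means supports the second.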