Airquality final
Air quality Assignment
Load the library
library(tidyverse)
Load the dataset into your global environment
data("airquality")
View the data using the "head" function
head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
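The first six rows already show NA entries in Ozone and Solar.R. A minimal sketch for counting the missing values in every column, assuming the built-in dataset as loaded; `na_counts` is an illustrative name that is not part of the assignment:

```r
# Count missing values in each column of the built-in airquality data.
# `na_counts` is an illustrative name, not part of the assignment.
na_counts <- colSums(is.na(airquality))
print(na_counts)

# Names of the columns that contain at least one NA.
names(na_counts[na_counts > 0])
```

Knowing which columns have gaps helps when reading later results, since several functions treat missing values specially.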
Calculate Summary Statistics
mean(airquality$Temp)
[1] 77.88235
mean(airquality[, 4])  # column 4 is Temp, so the result is the same
[1] 77.88235
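Both calls work without extra arguments because Temp has no missing values. A short sketch of what happens with a column that does have them, using Ozone as the example; `ozone_mean_raw` and `ozone_mean` are illustrative names:

```r
# Temp has no missing values, so both ways of selecting it agree.
stopifnot(identical(mean(airquality$Temp), mean(airquality[, 4])))

# Ozone does contain NAs, so the plain mean is NA ...
ozone_mean_raw <- mean(airquality$Ozone)
print(ozone_mean_raw)

# ... unless missing values are dropped first with na.rm = TRUE.
ozone_mean <- mean(airquality$Ozone, na.rm = TRUE)
print(ozone_mean)
```

The same `na.rm = TRUE` argument is accepted by median(), sd(), and var().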
Calculate Median, Standard Deviation, and Variance
median(airquality$Temp)
[1] 79
sd(airquality$Wind)
[1] 3.523001
var(airquality$Wind)
[1] 12.41154
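The two Wind results are related: the variance is the square of the standard deviation. A small sketch that checks this and recomputes the variance from its definition; the variable names are illustrative:

```r
wind_sd  <- sd(airquality$Wind)
wind_var <- var(airquality$Wind)

# The variance is the square of the standard deviation
# (both use the n - 1 denominator), up to floating-point rounding.
isTRUE(all.equal(wind_sd^2, wind_var))

# The same variance computed by hand from its definition.
wind <- airquality$Wind
manual_var <- sum((wind - mean(wind))^2) / (length(wind) - 1)
isTRUE(all.equal(manual_var, wind_var))
```

all.equal() is used instead of == because floating-point arithmetic can introduce tiny differences.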
Rename the Months from numbers to names
airquality$Month[airquality$Month == 5] <- "May"
airquality$Month[airquality$Month == 6] <- "June"
airquality$Month[airquality$Month == 7] <- "July"
airquality$Month[airquality$Month == 8] <- "August"
airquality$Month[airquality$Month == 9] <- "September"
Now look at the summary statistics of the dataset
summary(airquality$Month)
   Length     Class      Mode
      153 character character
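For a character column, summary() reports only the length and type. Counting rows per month with table() is usually more informative. The sketch below works on an untouched copy of the data (via `datasets::airquality`) so that it does not depend on the renaming above; `aq`, `month_chr`, and `month_fct` are illustrative names:

```r
# Work on an untouched copy so this sketch does not depend on the
# renaming above; `aq` and `month_chr` are illustrative names.
aq <- datasets::airquality

# month.name is R's built-in vector of English month names.
month_chr <- month.name[aq$Month]

# table() counts rows per month; a character vector is tabulated
# in alphabetical order.
counts_chr <- table(month_chr)
print(counts_chr)

# Converting to a factor with explicit levels gives calendar order.
month_fct <- factor(month_chr, levels = month.name[5:9])
counts_fct <- table(month_fct)
print(counts_fct)
```

The alphabetical ordering of the character version is the reason the months are converted to a factor with explicit levels before plotting.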
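Month is a categorical variable. R stores categorical variables as factors, and the categories are called levels. Setting the levels explicitly keeps the months in calendar order instead of alphabetical order.
airquality$Month <- factor(airquality$Month,
                           levels = c("May", "June", "July", "August", "September"))
Plot 1: Create a histogram categorized by Month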
p1 <- airquality |>
  ggplot(aes(x = Temp, fill = Month)) +
  geom_histogram(position = "identity") +
  scale_fill_discrete(name = "Month",
                      labels = c("May", "June", "July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept",
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service") # provide the data source
p1
`stat_bin()` using `bins = 30`. Pick better value `binwidth`.
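The message above is a prompt to choose a bin width deliberately. One rough way to reason about it is to compare the spread of the data with the number of bins. This is only a back-of-the-envelope sketch: ggplot2's exact bin placement may differ slightly, and the 5-degree width is an illustrative choice:

```r
# Temp is recorded in whole degrees Fahrenheit.
temp_range <- range(airquality$Temp)
temp_span  <- diff(temp_range)
print(temp_range)

# Spreading that span over the default 30 bins gives bins only a
# little over one degree wide, which tends to look ragged.
approx_default_width <- temp_span / 30

# A 5-degree binwidth (an illustrative choice) needs far fewer bins.
approx_bins_at_5 <- ceiling(temp_span / 5)
print(c(default_width = approx_default_width, bins_at_5 = approx_bins_at_5))
```

A width that is a round number of degrees also makes the bars easier to read against the x-axis.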
Plot 2: Improve the histogram of Average Temperature by Month
p2 <- airquality |>
  ggplot(aes(x = Temp, fill = Month)) +
  geom_histogram(position = "identity", alpha = 0.5, binwidth = 5, color = "white") +
  scale_fill_discrete(name = "Month",
                      labels = c("May", "June", "July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept",
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")
p2
Plot 3: Create side-by-side boxplots categorized by Month
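Before drawing the boxplots, a quick numeric check of each month's quartiles shows roughly what the boxes should look like. This sketch uses an untouched copy of the data, in which Month is still coded 5 to 9; `aq`, `temp_by_month`, and `temp_medians` are illustrative names:

```r
# Untouched copy of the data, with Month still coded 5 to 9;
# `aq` and `temp_by_month` are illustrative names.
aq <- datasets::airquality

# quantile() returns min, quartiles and max for each month's Temp.
temp_by_month <- tapply(aq$Temp, aq$Month, quantile)
print(temp_by_month)

# Medians alone, one per month (the thick line in each box).
temp_medians <- tapply(aq$Temp, aq$Month, median)
print(temp_medians)
```

The quartiles should correspond closely to the edges of each box, and the medians to the line inside it.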
p3 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) +
  labs(x = "Months from May through September", y = "Temperatures",
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month",
                      labels = c("May", "June", "July", "August", "September"))
p3
Plot 4: Side by Side Boxplots in Gray Scale
p4 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) +
  labs(x = "Months from May through September", y = "Temperatures",  # x-axis shows months, not temperatures
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_grey(name = "Month",
                  labels = c("May", "June", "July", "August", "September"))
p4
Plot 5: Scatterplot of Solar.R and Temperature
p5 <- airquality |>
  ggplot(aes(x = Solar.R, y = Temp)) +
  geom_point(aes(color = factor(Month)),
             alpha = 0.6,
             na.rm = TRUE) +
  geom_smooth(method = "lm") +
  labs(x = "Solar.R",
       y = "Temp",
       title = "Scatterplot of Solar.R and Temp",
       caption = "New York State Department of Conservation and the National Weather Service")
p5
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 7 rows containing non-finite outside the scale range
(`stat_smooth()`).
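The warning comes from the smoothing layer, which was not given na.rm = TRUE the way geom_point() was. The seven removed rows are the ones where Solar.R is missing. A minimal base R sketch that confirms the count; `missing_solar` and `plot_data` are illustrative names:

```r
# Rows where Solar.R is missing; Temp itself has no NAs.
missing_solar <- is.na(airquality$Solar.R)
sum(missing_solar)

# Keeping only complete Solar.R rows leaves the data that the
# regression line is actually fitted to. `plot_data` is an
# illustrative name.
plot_data <- airquality[!missing_solar, ]
nrow(plot_data)
```

Plotting from a filtered data frame like this, or adding na.rm = TRUE to geom_smooth() as well, are two possible ways to avoid the warning.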
Essay
In Plot 5, I created a scatterplot to see how solar radiation (Solar.R) relates to temperature (Temp) in the airquality dataset. In the ggplot() call, I mapped Solar.R to the x-axis and Temp to the y-axis inside aes(), so the plot shows how Temp changes as Solar.R increases. I added geom_point() so that each observation appears as a point, and I set alpha = 0.6 to make the points semi-transparent, which makes overlapping points easier to see.
To see whether the relationship differs across the season, I colored the points by month with color = factor(Month) inside geom_point(). Month was already converted to a factor earlier, so the factor() call here simply ensures that each month gets its own discrete color, which makes any seasonal pattern visible. I also added geom_smooth(method = "lm"), which fits a linear regression and draws the fitted line with a confidence band. The upward slope of that line indicates a positive relationship: as solar radiation increases, temperature readings tend to increase as well.
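The slope that the plot shows can also be checked numerically. The following is a sketch, not part of the original assignment, that fits the same kind of straight-line model with lm() and computes the correlation on the rows where both values are present; `fit`, `solar_slope`, and `solar_temp_cor` are illustrative names:

```r
# Fit the same straight line that geom_smooth(method = "lm") draws.
# lm() drops rows with missing Solar.R by default (na.action = na.omit).
fit <- lm(Temp ~ Solar.R, data = airquality)

# Slope: estimated change in Temp per unit increase in Solar.R.
solar_slope <- coef(fit)[["Solar.R"]]
print(solar_slope)

# Number of rows actually used in the fit.
nobs(fit)

# Pearson correlation on the complete pairs only.
solar_temp_cor <- cor(airquality$Solar.R, airquality$Temp, use = "complete.obs")
print(solar_temp_cor)
```

A positive slope supports the reading of the plot, and the size of the correlation indicates how strong that relationship is.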
Finally, I used labs() to add axis labels, a title that describes the plot, and a caption that credits the data source. Overall, the plot shows the positive relationship between solar radiation and temperature while keeping the month-to-month variation visible.