Airquality HW
Load the library
library(tidyverse)
Load the dataset into your global environment
data("airquality")
Look at the structure of the data
View the data using the “head” function
head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
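The heading above asks for the structure of the data, and head() only shows the first six rows. As a minimal sketch, str() and a missing-value count give a fuller picture. The names aq, aq_dim, and na_counts are illustrative and not part of the original assignment.

```r
# Work on a fresh copy so the data frame in the global environment is untouched.
aq <- datasets::airquality

# str() prints one line per column: its name, type, and first few values.
str(aq)

# dim() returns the number of rows and columns.
aq_dim <- dim(aq)

# colSums(is.na()) counts the missing values in each column.
na_counts <- colSums(is.na(aq))

print(aq_dim)
print(na_counts)
```

The NA entries visible in rows 5 and 6 of the head() output are part of a larger set of missing values in the Ozone and Solar.R columns.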
Calculate Summary Statistics
mean(airquality$Temp)
[1] 77.88235
mean(airquality[,4])
[1] 77.88235
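Both calls above return the same value because Temp is the fourth column. The sketch below checks that equivalence and shows what happens when a column contains missing values, assuming the unmodified built-in dataset. The variable names are illustrative.

```r
aq <- datasets::airquality

# Three equivalent ways to select the Temp column.
by_name  <- aq$Temp
by_index <- aq[, 4]
by_label <- aq[["Temp"]]
same_column <- identical(by_name, by_index) && identical(by_name, by_label)

# Ozone contains NA values, so mean() returns NA unless they are dropped.
ozone_mean_raw   <- mean(aq$Ozone)
ozone_mean_clean <- mean(aq$Ozone, na.rm = TRUE)

print(same_column)
print(ozone_mean_raw)
print(ozone_mean_clean)
```

Selecting by name is usually safer than selecting by position, since the code keeps working if the column order changes.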
Calculate Median, Standard Deviation, and Variance
median(airquality$Temp)
[1] 79
sd(airquality$Wind)
[1] 3.523001
var(airquality$Wind)
[1] 12.41154
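The variance and standard deviation above are related: the standard deviation is the square root of the variance. A minimal sketch that recomputes both by hand with the sample (n - 1) formula; manual_var and manual_sd are illustrative names.

```r
wind <- datasets::airquality$Wind
n <- length(wind)

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
manual_var <- sum((wind - mean(wind))^2) / (n - 1)

# Standard deviation is the square root of the variance.
manual_sd <- sqrt(manual_var)

print(c(manual_var = manual_var, builtin_var = var(wind)))
print(c(manual_sd = manual_sd, builtin_sd = sd(wind)))
```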
Rename the Months from numbers to names
airquality$Month[airquality$Month == 5] <- "May"
airquality$Month[airquality$Month == 6] <- "June"
airquality$Month[airquality$Month == 7] <- "July"
airquality$Month[airquality$Month == 8] <- "August"
airquality$Month[airquality$Month == 9] <- "September"
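The five assignments above can also be written as one lookup. This is an alternative sketch, not the approach the assignment prescribes: it uses R's built-in month.name vector and runs on a fresh copy (aq) in which Month is still numeric.

```r
aq <- datasets::airquality

# month.name is a built-in character vector of the twelve month names,
# so month.name[5] is "May" and month.name[9] is "September".
aq$Month <- month.name[aq$Month]

# Count the rows per month to confirm the recoding.
month_counts <- table(aq$Month)
print(month_counts)
```

Either approach turns Month into a character vector, which is why summary() reports only its length, class, and mode.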
Now look at the summary statistics of the dataset
summary(airquality$Month)
   Length     Class      Mode
      153 character character
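The Length/Class/Mode output appears because Month is a character vector at this point. As a sketch on a fresh copy (aq is an illustrative name), the block below shows how a factor behaves differently: summary() reports a count per level, and without explicit levels the order is alphabetical rather than by calendar.

```r
aq <- datasets::airquality
aq$Month <- month.name[aq$Month]

# Without explicit levels, factor() sorts the labels alphabetically.
default_levels <- levels(factor(aq$Month))

# Explicit levels keep the months in calendar order.
calendar_order <- c("May", "June", "July", "August", "September")
month_factor <- factor(aq$Month, levels = calendar_order)

# summary() on a factor returns the number of rows in each level.
month_summary <- summary(month_factor)

print(default_levels)
print(month_summary)
```

The level order matters for the plots, since ggplot2 draws discrete axes and legends in level order.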
Month is a categorical variable, so it should be stored as a factor; its categories are called levels.
airquality$Month <- factor(airquality$Month,
                           levels = c("May", "June", "July", "August",
                                      "September"))
Plot 1: Create a histogram categorized by Month
Plot 1 Code
p1 <- airquality |>
  ggplot(aes(x = Temp, fill = factor(Month))) +
  geom_histogram(position = "identity", binwidth = 1.44, alpha = 1.0) +
  scale_fill_discrete(name = "Month",
                      labels = c("May", "June", "July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept",
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")
Plot 1 Output
p1
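The binwidth argument controls how many bars the histogram has. A rough sketch of that arithmetic, comparing the binwidth of 1.44 used in Plot 1 with the binwidth of 5 used in Plot 2. The helper approx_bins is hypothetical, and ggplot2's exact bin count may differ by one depending on how it aligns the bins.

```r
temp <- datasets::airquality$Temp

# Span of the data: maximum minus minimum temperature.
span <- diff(range(temp))

# Approximate number of bins needed to cover the span at a given binwidth.
approx_bins <- function(binwidth) ceiling(span / binwidth)

bins_plot1 <- approx_bins(1.44)
bins_plot2 <- approx_bins(5)

print(c(span = span, bins_plot1 = bins_plot1, bins_plot2 = bins_plot2))
```

Because the temperatures are whole numbers, bins of width 1.44 hold either one or two distinct values each, which makes neighbouring bars uneven. A wider binwidth gives a smoother picture.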
Plot 2: Improve the histogram of Average Temperature by Month
Plot 2 Code
p2 <- airquality |>
  ggplot(aes(x = Temp, fill = factor(Month))) +
  geom_histogram(position = "identity", alpha = 0.5, binwidth = 5, color = "white") +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept",
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")
Plot 2 Output
p2
Plot 3: Create side-by-side boxplots categorized by Month
Plot 3 Code
p3 <- airquality |>
  ggplot(aes(Month, Temp, fill = factor(Month))) +
  labs(x = "Months from May through September", y = "Temperatures",
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September"))
Plot 3 Output
p3
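Each box in the plot summarizes one month's temperatures by its quartiles. As a sketch, the same numbers can be computed directly with quantile(), which gives the minimum, first quartile, median, third quartile, and maximum per month. The names aq and five_num are illustrative. The boxplot whiskers stop at 1.5 times the interquartile range, so they need not reach the minimum and maximum shown here.

```r
aq <- datasets::airquality
aq$Month <- factor(month.name[aq$Month], levels = month.name[5:9])

# split() groups the temperatures by month; sapply() applies quantile()
# to each group and binds the results into a matrix (one column per month).
five_num <- sapply(split(aq$Temp, aq$Month), quantile,
                   probs = c(0, 0.25, 0.5, 0.75, 1))

print(five_num)
```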
Plot 4: Side by Side Boxplots in Gray Scale
Plot 4 Code
p4 <- airquality |>
  ggplot(aes(Month, Temp, fill = factor(Month))) +
  labs(x = "Months from May through September", y = "Temperatures",
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_grey(name = "Month", labels = c("May", "June", "July", "August", "September"))
Plot 4 Output
p4
Plot 5: Create a Barplot categorized by Month
Plot 5 Code
p5 <- airquality |>
  ggplot(aes(x = factor(Month), fill = factor(Temp))) +
  geom_bar(position = "stack", color = "white") +
  scale_fill_discrete(name = "Temperature Ranges") +
  labs(x = "Months (May - Sept)", y = "Frequency of Temps",
       title = "Stacked Bar Graph of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service")
Plot 5 Output
p5
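Mapping fill to factor(Temp) creates one legend entry for every distinct temperature, not for a range of temperatures. One possible refinement, sketched here under assumptions, is to bin the temperatures with cut() first. The 5-degree breaks, the TempRange column, and the p5_binned object are illustrative choices, and the ggplot2 part is skipped if the package is not installed.

```r
aq <- datasets::airquality
aq$Month <- factor(month.name[aq$Month], levels = month.name[5:9])

# Bin temperatures into 5-degree ranges: (55,60], (60,65], ..., (95,100].
aq$TempRange <- cut(aq$Temp, breaks = seq(55, 100, by = 5))

# Rows are months and columns are ranges: the counts a stacked bar would draw.
range_counts <- table(aq$Month, aq$TempRange)
print(range_counts)

# Optional: the same stacked bar plot with one color per range.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p5_binned <- ggplot2::ggplot(aq, ggplot2::aes(x = Month, fill = TempRange)) +
    ggplot2::geom_bar(position = "stack", color = "white") +
    ggplot2::labs(x = "Months (May - Sept)", y = "Number of Days",
                  fill = "Temperature Ranges")
}
```

With nine ranges instead of dozens of individual values, the legend is short enough to read and the colors are easier to compare across months.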
Brief Essay
For Plot 5, I chose to create a stacked bar plot, shown in the output above. The plot shows the relationship between the months and the frequency of temperatures. The x-axis shows the months from May to September, and each bar is divided into colored segments that are matched to the legend on the right of the plot. Because the fill is mapped to factor(Temp), each color stands for one distinct temperature value, even though the legend is titled “Temperature Ranges”. The height of each segment shows how many days in that month had that temperature, so the total height of a bar is the number of days recorded for the month. The plot also has clear axis labels (“Months (May - Sept)” and “Frequency of Temps”), the title “Stacked Bar Graph of Monthly Temperatures”, and a caption crediting the data source, “New York State Department of Conservation and the National Weather Service”. The code follows the same ggplot pattern as the earlier plots, with a few changes to make it a stacked bar plot. Based on the notes from this assignment and the example bar_chart_with_diamond assignment, I used geom_bar with position = "stack" and a white outline, and kept the same default fill colors as the plots before it.