---
title: "Air Quality Assignment"
author: "Z Griffin"
format: html
editor: visual
---
Air Quality Assignment
Load the library
library(tidyverse)
Load the data set into RStudio
data("airquality")
Look at the Structure of the Data
View the data using the “head” function
Display only the first six rows of data
head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
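head() shows values but not column types. A minimal base R sketch (not part of the original assignment) that prints the structure with str(), the dimensions, and a per-column count of missing values, which matters because NA readings already appear in the first six rows:

```r
# airquality ships with base R's datasets package
data("airquality")

# One line per column: its type and its first few values
str(airquality)

# Number of rows and columns
dim(airquality)

# Number of missing values (NA) in each column
colSums(is.na(airquality))
```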
Calculate Summary Statistics
Two different ways to calculate the mean of the temperature variable (Temp)
mean(airquality$Temp)
[1] 77.88235
mean(airquality[,4])
[1] 77.88235
The second way indexes the data frame as [row, column]: the row index is left blank, so all rows are used, and the column index is 4 (Temp).
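A short base R sketch showing a third equivalent way to select the column (double brackets by name), and what happens when a column has missing values. Temp has none, but Ozone does, so mean() needs na.rm = TRUE there:

```r
data("airquality")

# Three equivalent ways to select the Temp column (column 4)
m1 <- mean(airquality$Temp)       # $ with the column name
m2 <- mean(airquality[, 4])       # [row, column], all rows, column 4
m3 <- mean(airquality[["Temp"]])  # [[ ]] with the column name
c(m1, m2, m3)

# Ozone contains NAs: mean() returns NA unless missing values are dropped
mean(airquality$Ozone)                # NA
mean(airquality$Ozone, na.rm = TRUE)  # mean of the non-missing readings only
```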
Calculate Median, Standard Deviation, and Variance
median(airquality$Temp)
[1] 79
sd(airquality$Wind)
[1] 3.523001
var(airquality$Wind)
[1] 12.41154
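The variance is the square of the standard deviation, which the two Wind results above reflect (3.523001 squared is about 12.41). A small base R sketch that checks this and, as an extra not in the original assignment, computes the same statistics separately for each month with tapply(), while Month is still coded as the numbers 5 to 9:

```r
data("airquality")

w <- airquality$Wind

# sd squared equals var (up to floating-point rounding)
all.equal(sd(w)^2, var(w))

# Median temperature and wind standard deviation for each month code (5-9)
tapply(airquality$Temp, airquality$Month, median)
tapply(airquality$Wind, airquality$Month, sd)
```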
Rename Months from Numbers to Names
airquality$Month[airquality$Month == 5]<- "May"
airquality$Month[airquality$Month == 6]<- "June"
airquality$Month[airquality$Month == 7]<- "July"
airquality$Month[airquality$Month == 8]<- "August"
airquality$Month[airquality$Month == 9]<- "September"
Now look at the summary statistics of the data set
See how Month has changed from a numeric variable to a character variable:
summary(airquality$Month)
   Length     Class      Mode 
      153 character character 
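The five replacements above leave Month as a character vector, as summary() shows. As an alternative sketch (not the method used in this assignment), R's built-in month.name constant can turn the numeric codes 5 to 9 into names and build a factor with chronological levels in a single step. It works on a copy, aq, so the original column is untouched:

```r
data("airquality")
aq <- airquality  # copy, so airquality itself is not modified

# month.name is a built-in vector: "January", "February", ..., "December".
# Indexing it with the codes 5-9 returns the matching names, and
# levels = month.name[5:9] fixes the order from May through September.
aq$Month <- factor(month.name[aq$Month], levels = month.name[5:9])

levels(aq$Month)
table(aq$Month)  # number of daily observations per month
```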
Month is a categorical variable. R stores categorical variables as factors, and the possible values of a factor are called levels.
This is one way to put the months in chronological order so they don't default to alphabetical order.
airquality$Month <- factor(airquality$Month, levels = c("May", "June", "July", "August", "September"))
Plot 1: Create a histogram categorized by month
Histogram of temperature by month.
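fill = Month colors the histogram bars by month.
scale_fill_discrete(name = "Month", labels = ...) sets the legend title and the month names shown in the legend on the right. The labels are applied to the factor levels in their existing order, so the legend is chronological because of the factor() step above; this line does not reorder the months by itself.
labs() sets the labels: the title, the axis titles, and a caption crediting the data source.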
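Why the explicit levels in the factor() call matter: factor() sorts character values alphabetically by default, and in ggplot2 an unnamed labels vector is applied to the levels in the order they already have. A short base R check of the default ordering; months_chr is a hypothetical helper vector:

```r
# Month names in chronological order
months_chr <- c("May", "June", "July", "August", "September")

# Without levels, factor() sorts the values alphabetically
levels(factor(months_chr))
# "August" "July" "June" "May" "September"

# With explicit levels, the chronological order is kept
levels(factor(months_chr, levels = months_chr))
```

With the alphabetical default, a labels vector written in calendar order would put "May" on the August bars, so setting the levels first is what keeps the legend correct.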
Plot 1 Code
p1 <- airquality |>
ggplot(aes(x = Temp, fill = Month)) +
# position = "identity" overlays the monthly histograms instead of stacking them;
# with no bins or binwidth given, ggplot2 uses 30 bins and prints a message
geom_histogram(position = "identity") +
scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September")) +
labs(x = "Monthly Temperatures from May - Sept",
y = "Frequency of Temps",
title = "Histogram of Monthly Temperatures from May - Sept, 1973",
caption = "New York State Department of Conservation and the National Weather Service") # data source!
Plot 1 Output
p1
Plot 2: Improve the histogram of Temp by Month
Outline the bars using color = "white"
Add transparency using alpha
Change the bin width using binwidth
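To see what a 5-degree bin width means for this variable, a sketch using base R's hist() with plot = FALSE, which only counts observations per bin. This is a swapped-in illustration: ggplot2 places its bin edges by its own rule, so the per-bar counts in Plot 2 may differ from these.

```r
data("airquality")

# Bin edges every 5 degrees; 55-100 F covers the full Temp range (56-97 F)
edges <- seq(55, 100, by = 5)

# plot = FALSE returns the bin counts without drawing anything
h <- hist(airquality$Temp, breaks = edges, plot = FALSE)

# One row per 5-degree bin
data.frame(bin = paste(head(edges, -1), tail(edges, -1), sep = "-"),
           count = h$counts)
```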
p2 <- airquality |>
ggplot(aes(x=Temp, fill=Month)) +
geom_histogram(position="identity", alpha=0.5, binwidth = 5, color = "white")+
scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September"))+
labs(x = "Monthly Temperatures from May - Sept",
y = "Frequency of Temps",
title = "Histogram of Monthly Temperatures from May - Sept, 1973",
caption = "New York State Department of Conservation and the National Weather Service")
Plot 2 Output
p2
Plot 3: Create side-by-side boxplots categorized by Month
p3 <- airquality |>
ggplot(aes(Month, Temp, fill = Month)) +
labs(x = "Months from May through September", y = "Temperature",
title = "Side-by-Side Boxplots of Monthly Temperatures",
caption = "New York State Dept of Conservation and the National Weather Service") +
geom_boxplot() +
scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September"))
Plot 3 Output
p3
Plot 4: Side-by-Side Boxplots in Grayscale
Uses the same code as Plot 3, except with scale_fill_grey instead of scale_fill_discrete
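For comparison outside ggplot2, a sketch of the same grayscale idea in base R graphics. This is an assumption-labeled alternative, not how Plot 4 was made: gray.colors() supplies the gray fills, and boxplot() also returns the statistics it drew, such as the medians.

```r
data("airquality")
aq <- airquality
aq$Month <- factor(month.name[aq$Month], levels = month.name[5:9])

# Five gray shades, from dark to light
greys <- gray.colors(5)

# Side-by-side boxplots with one gray fill per month
b <- boxplot(Temp ~ Month, data = aq, col = greys,
             xlab = "Months from May through September", ylab = "Temperature",
             main = "Side-by-Side Boxplots of Monthly Temperatures")

# Row 3 of b$stats holds the median of each box
setNames(b$stats[3, ], b$names)
```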
Plot 4 Code
p4 <- airquality |>
ggplot(aes(Month, Temp, fill = Month)) +
labs(x = "Months from May through September", y = "Temperature",
title = "Side-by-Side Boxplots of Monthly Temperatures",
caption = "New York State Dept of Conservation and the National Weather Service") +
geom_boxplot() +
scale_fill_grey(name = "Month", labels = c("May", "June", "July", "August", "September"))
Plot 4 Output
p4
Plot 5 Code
p5 <- airquality |>
ggplot(aes(Solar.R, Ozone, color = Month)) +
geom_point(size = 2.5, alpha = 0.7) +
scale_x_continuous(breaks = seq(0, 350, by = 50)) +
labs(x= "Solar Radiation (Langleys*)", y = "Ozone, parts per billion",
title = "Scatterplot of Ozone vs Solar Radiation",
caption = "New York State Dept of Conservation and the National Weather Service
*Solar Radiation in Langleys in the frequency band 4000-7700 Angstroms")
Plot 5 Output
Note that 42 observations in the data set are missing an ozone reading, a solar radiation reading, or both, so ggplot automatically omitted them from the scatterplot (and reports this in a warning).
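A minimal base R sketch for checking how many rows the scatterplot drops: a point can only be drawn when both Ozone and Solar.R are present, and complete.cases() flags exactly those rows. The count of dropped rows should match the 42 mentioned above.

```r
data("airquality")

# TRUE for rows where both Ozone and Solar.R are present
usable <- complete.cases(airquality[, c("Ozone", "Solar.R")])

n_dropped <- sum(!usable)  # rows omitted from the scatterplot
n_points  <- sum(usable)   # rows drawn as points
c(dropped = n_dropped, plotted = n_points)
```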
p5
Write up
I made a scatterplot of ozone vs solar radiation, with each observation colored by the month it was recorded in. There is likely little to no relationship between ozone and solar radiation, as most of the observations are on the lower end of the ozone scale no matter how much radiation was measured. However, because the observations are colored by month, it can be seen that ozone is higher in July and August. I learned how to change the size of the scatterplot points using the size argument inside geom_point. I also learned how to change how often the x-axis is marked (the tick breaks) with scale_x_continuous. I tested using geom_path to connect the observations in the order they appear in the data table, but it was an absolute mess and did not help in interpreting the graph at all.
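To put numbers on the observations in the write-up, a short base R sketch (not part of the original assignment) that averages ozone by month and computes the Pearson correlation between solar radiation and ozone, using only rows where both are present:

```r
data("airquality")
aq <- airquality
aq$Month <- factor(month.name[aq$Month], levels = month.name[5:9])

# Mean ozone (ppb) per month; na.rm = TRUE is passed through to mean()
ozone_by_month <- tapply(aq$Ozone, aq$Month, mean, na.rm = TRUE)
round(ozone_by_month, 1)

# Pearson correlation, restricted to rows with both readings
r <- cor(aq$Solar.R, aq$Ozone, use = "complete.obs")
r
```

A correlation near 0 would support "little to no relationship", while a value well away from 0 would suggest a relationship that the month coloring may be partly hiding.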