Airquality Assignment

Author

Phoebe Lam

Air Quality Tutorial and Homework Assignment

Source: https://www.airnow.gov/?city=Germantown&state=MD&country=USA

Load in the library

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Load dataset into global environment

data("airquality")

View data with “head” function

head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6

Calculate summary statistics

mean(airquality$Temp)
[1] 77.88235

or

mean(airquality[,4])
[1] 77.88235

Calculate Median, Stdev, and Variance

Median for Temp

median(airquality$Temp)
[1] 79

Standard deviation for Temp

sd(airquality$Temp)
[1] 9.46527

Variance for Temp

var(airquality$Temp)
[1] 89.59133
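
The four statistics above can also be collected in one table. The following is a minimal sketch using summarise() from dplyr; the object name temp_stats and its column names are illustrative choices, not part of the original assignment.

```r
library(dplyr)

# Read the built-in dataset directly so nothing in the global
# environment is modified.
temp_stats <- datasets::airquality |>
  summarise(
    temp_mean     = mean(Temp),
    temp_median   = median(Temp),
    temp_sd       = sd(Temp),
    temp_variance = var(Temp)
  )

temp_stats
```

The values should match the individual mean(), median(), sd(), and var() calls shown above.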

Rename Months from numbers to names

Numbers 5-9 to May through September

airquality$Month[airquality$Month == 5] <- "May"
airquality$Month[airquality$Month == 6] <- "June"
airquality$Month[airquality$Month == 7] <- "July"
airquality$Month[airquality$Month == 8] <- "August"
airquality$Month[airquality$Month == 9] <- "September"

Summary of the Month column after renaming

summary(airquality$Month)
   Length     Class      Mode 
      153 character character 
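
Because Month is now a character vector, summary() reports only its length, class, and mode. To see how many days fall in each month, table() is one option. This is a minimal sketch that works on a hypothetical copy named aq and uses the built-in month.name constant as a shortcut for the renaming shown above.

```r
# `aq` is an illustrative copy, so `airquality` itself is left untouched.
aq <- datasets::airquality

# month.name is a built-in constant: month.name[5] is "May", and so on.
aq$Month <- month.name[aq$Month]

# Count the number of rows (days) recorded for each month.
month_counts <- table(aq$Month)
month_counts
```

The counts print in alphabetical order (August first), which is why the months need an explicit order before plotting.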

Convert Month to a factor so the months sort in calendar order rather than alphabetically

airquality$Month <- factor(airquality$Month, levels = c("May", "June", "July", "August", "September"))
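
The renaming and ordering steps can also be combined. The sketch below assumes the Month column still holds the original numbers 5 through 9, so it starts from a fresh copy of the dataset (the name aq is hypothetical) and passes both levels and labels to factor().

```r
# Start from the untouched built-in data, where Month is still 5..9.
aq <- datasets::airquality

# `levels` gives the existing values in the desired order;
# `labels` gives the name each value should be shown as.
aq$Month <- factor(
  aq$Month,
  levels = 5:9,
  labels = c("May", "June", "July", "August", "September")
)

levels(aq$Month)
```

This avoids the intermediate character column, at the cost of being a little less explicit than the step-by-step version.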

Create histogram categorized by month

p1 <- airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity") +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September")) + 
  labs(x = "Monthly Temperatures from May - Sept", y = "Frequency of Temps", title = "Histogram of Monthly Temperatures from May - Sept, 1973", caption = "New York State Department of Conservation and the National Weather Service") #provide the data source
p1
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

p2 <- airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity", alpha=0.5, binwidth = 5, color = "white") +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September")) + 
  labs(x = "Monthly Temperatures from May - Sept", y = "Frequency of Temps", title = "Histogram of Monthly Temperatures from May - Sept, 1973", caption = "New York State Department of Conservation and the National Weather Service") #provide the data source
p2

Create side-by-side boxplots categorized by month

p3 <- airquality |> 
  ggplot(aes(Month, Temp, fill = Month)) + 
  labs(x = "Months from May through September", y = "Temperatures", title = "Side-by-Side Boxplot of Monthly Temperatures", caption = "New York State Department of Conservation and the National Weather Service") + 
  geom_boxplot(alpha=0.5, color = "dimgrey") + 
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September"))
p3

Side-by-Side boxplot in grey-scale

p4 <- airquality |> 
  ggplot(aes(Month, Temp, fill = Month)) + 
  labs(x = "Months from May through September", y = "Temperatures", title = "Side-by-Side Boxplot of Monthly Temperatures", caption = "New York State Department of Conservation and the National Weather Service") + 
  geom_boxplot() + 
  scale_fill_grey(name = "Month", labels = c("May", "June", "July", "August", "September"))
p4

Histogram of Solar Radiation from May to September, 1973

p5 <- airquality |>
  ggplot(aes(x=Solar.R, fill=Month)) +
  geom_histogram(position="identity", alpha=0.5, binwidth = 15, color = "white") +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September")) + 
  labs(x = "Monthly Solar Radiation in Langleys (Ly) from May - Sept", y = "Frequency of Solar.R", title = "Histogram of Monthly Solar Radiation from May - Sept, 1973", caption = "New York State Department of Conservation and the National Weather Service") 
p5
Warning: Removed 7 rows containing non-finite outside the scale range
(`stat_bin()`).
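
The warning above appears because some Solar.R readings are missing (NA), and ggplot2 drops those rows before binning. A small sketch of how one might confirm the count and, optionally, remove the rows up front; the names n_missing and solar_complete are illustrative.

```r
library(dplyr)

# Count the missing solar radiation readings in the built-in dataset.
n_missing <- sum(is.na(datasets::airquality$Solar.R))
n_missing

# Keep only rows with a recorded Solar.R value.
solar_complete <- datasets::airquality |>
  filter(!is.na(Solar.R))

nrow(solar_complete)
```

Plotting from a filtered data frame like solar_complete should produce the same histogram without the warning.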

I chose to show solar radiation for May through September of 1973 as a histogram. The x-axis is solar radiation in Langleys, and the y-axis is the number of days in each month whose readings fall within each bin. I set the binwidth to 15 to better display the monthly frequencies of solar radiation.
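
One possible variation on the histogram above is to give each month its own panel with facet_wrap(), so the bars no longer overlap. This is only a sketch: it rebuilds the month factor on a hypothetical copy named aq, and the plot name p5_facet and the axis wording are illustrative.

```r
library(ggplot2)

# Illustrative copy with Month as an ordered set of named levels.
aq <- datasets::airquality
aq$Month <- factor(
  aq$Month,
  levels = 5:9,
  labels = c("May", "June", "July", "August", "September")
)

p5_facet <- aq |>
  subset(!is.na(Solar.R)) |>            # drop missing readings up front
  ggplot(aes(x = Solar.R, fill = Month)) +
  geom_histogram(binwidth = 15, color = "white", show.legend = FALSE) +
  facet_wrap(~ Month, ncol = 1) +       # one stacked panel per month
  labs(
    x = "Solar Radiation in Langleys (Ly)",
    y = "Number of days",
    title = "Histogram of Solar Radiation by Month, May - Sept, 1973",
    caption = "New York State Department of Conservation and the National Weather Service"
  )
p5_facet
```

Stacking the panels in one column keeps the x-axis aligned, which makes it easier to compare where each month's readings cluster.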

Scatterplot of Monthly Solar Radiation

p6 <- airquality |> 
  ggplot(aes(x=Day, y=Solar.R, color=Month)) +
  geom_point(size=2) +
  geom_density_2d() +
  scale_color_discrete(name = "Month", labels = c("May", "June", "July", "August", "September")) + 
  labs(y = "Solar Radiation in Langleys (Ly)", x = "Days of each Month", title = "Scatterplot of Solar Radiation per day from May - Sept, 1973", caption = "New York State Department of Conservation and the National Weather Service") 
p6
Warning: Removed 7 rows containing non-finite outside the scale range
(`stat_density2d()`).
Warning: Removed 7 rows containing missing values or values outside the scale range
(`geom_point()`).
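
In the scatterplot above, color = Month is set in the top-level aes(), so geom_density_2d() inherits it and draws a separate set of contours for each month. A sketch of one way to get a single density across all months is to map color only inside geom_point(). The names aq and p6_combined are hypothetical, and the contour color is an arbitrary choice.

```r
library(ggplot2)

# Illustrative copy with Month as a factor in calendar order.
aq <- datasets::airquality
aq$Month <- factor(
  aq$Month,
  levels = 5:9,
  labels = c("May", "June", "July", "August", "September")
)

p6_combined <- aq |>
  subset(!is.na(Solar.R)) |>
  ggplot(aes(x = Day, y = Solar.R)) +
  geom_point(aes(color = Month), size = 2) +  # color applies to points only
  geom_density_2d(color = "grey40") +         # one density for all months
  scale_color_discrete(name = "Month") +
  labs(
    x = "Days of each Month",
    y = "Solar Radiation in Langleys (Ly)",
    title = "Scatterplot of Solar Radiation per day from May - Sept, 1973",
    caption = "New York State Department of Conservation and the National Weather Service"
  )
p6_combined
```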

The histogram was comprehensive and pleasant to look at, but the overlapping colors made it hard for me to tell the monthly solar radiation frequencies apart. So I tried a scatterplot for fun, with the day of the month on the x-axis and Solar.R in Langleys on the y-axis. It turned out even messier, especially after adding geom_density_2d(). I expected it to show the density of all the months combined, but because color is mapped to Month it draws separate contours for each month. A boxplot probably would have displayed this data best.
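
As a follow-up to that last thought, here is a minimal sketch of what a solar radiation boxplot could look like, modeled on the temperature boxplots earlier in the assignment. The names aq and p7 are hypothetical, and the labels are suggestions only.

```r
library(ggplot2)

# Illustrative copy with Month as a factor in calendar order.
aq <- datasets::airquality
aq$Month <- factor(
  aq$Month,
  levels = 5:9,
  labels = c("May", "June", "July", "August", "September")
)

p7 <- aq |>
  subset(!is.na(Solar.R)) |>              # drop the missing readings
  ggplot(aes(Month, Solar.R, fill = Month)) +
  geom_boxplot(alpha = 0.5, color = "dimgrey", show.legend = FALSE) +
  labs(
    x = "Months from May through September",
    y = "Solar Radiation in Langleys (Ly)",
    title = "Side-by-Side Boxplot of Monthly Solar Radiation",
    caption = "New York State Department of Conservation and the National Weather Service"
  )
p7
```

The legend is hidden here because the x-axis already names each month.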