Air Quality Homework

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.1     ✔ tibble    3.3.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
data("airquality")
head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
mean(airquality$Temp)
[1] 77.88235
median(airquality$Temp) 
[1] 79
sd(airquality$Wind)
[1] 3.523001
var(airquality$Wind)
[1] 12.41154
airquality$Month[airquality$Month == 5] <- "May"
airquality$Month[airquality$Month == 6] <- "June"
airquality$Month[airquality$Month == 7] <- "July"
airquality$Month[airquality$Month == 8] <- "August"
airquality$Month[airquality$Month == 9] <- "September"
summary(airquality$Month)
   Length     Class      Mode 
      153 character character 
airquality$Month <- factor(airquality$Month,
                           levels = c("May", "June", "July", "August", "September"))
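
The five assignments and the `factor()` call above can also be collapsed into a single step. This is only a sketch of an alternative, not the method the assignment asked for. It works on a fresh copy named `aq` (a name introduced just for this example) so it does not depend on the recoding already applied to `airquality`, and it uses R's built-in `month.name` vector for the labels.

```r
# Fresh copy of the built-in data set; Month is still numeric (5 to 9) here.
aq <- datasets::airquality

# Map the month numbers straight to ordered month names in one call.
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])

# For a factor, summary() reports the number of days per month.
summary(aq$Month)
```

Because the result is a factor from the start, `summary()` gives per-month counts rather than the length, class, and mode shown for the character version.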
p1 <- airquality |>
  ggplot(aes(x=Temp, fill = Month)) +
  geom_histogram(position="identity") +
  scale_fill_discrete(name ="Month",
                        labels = c("May", "June", "July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept",
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")

print(p1) # The graph did not display until I added print(); a search suggested it, since assigning the plot to p1 only stores it.
`stat_bin()` using `bins = 30`. Pick better value `binwidth`.

p2 <- airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position = "identity", alpha = 0.5, binwidth = 5, color = "white") +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept", 
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")
print(p2)

p3 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) + 
  labs(x = "Months from May through September", y = "Temperatures", 
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September"))
print(p3)

p4 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) +
  labs(x = "Months from May through September", y = "Temperatures",
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_grey(name = "Month", labels = c("May", "June", "July", "August", "September"))

print(p4)

p5 <- airquality |>
  ggplot(aes(x=Temp, y=Ozone, colour = Month)) +
  geom_point() +
  scale_colour_discrete(name = "Month",
                        labels = c("May", "June", "July", "August", "September")) +
  labs(x = "Temperature",
       y = "Ozone Level",
       title = "Scatterplot of Ozone Levels at Daily Temperature in May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")

print(p5)
Warning: Removed 37 rows containing missing values or values outside the scale range
(`geom_point()`).
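
The warning above reports 37 dropped rows. A quick way to see where they come from is to count the missing values in the two plotted columns. The sketch below uses a fresh copy named `aq` and two variable names, `n_missing_ozone` and `n_missing_temp`, that are introduced only for this check.

```r
# Fresh copy of the built-in data set.
aq <- datasets::airquality

# Count the NA entries in each column used by the scatterplot.
n_missing_ozone <- sum(is.na(aq$Ozone))
n_missing_temp <- sum(is.na(aq$Temp))

print(c(Ozone = n_missing_ozone, Temp = n_missing_temp))
```

If the Ozone count matches the number in the warning and the Temp count is zero, the dropped points are simply days with no ozone reading, not values outside the axis limits.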

## Short Essay Explanation

I created a scatter plot of ozone levels against temperature for May through September 1973. The x-axis is the temperature and the y-axis is the ozone level. The graph suggests a positive relationship between the two, at least during that period: higher temperatures generally coincide with higher ozone levels. I colored the points by month to show where each observation falls in the season and in our data set, which also helps account for a second variable.
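
The relationship described above can also be put into a single number. As a rough sketch, and not part of the original assignment, the Pearson correlation between temperature and ozone can be computed with base R's `cor()`. The names `aq` and `r` are introduced only for this example, and `use = "complete.obs"` restricts the calculation to days where both values were recorded.

```r
# Fresh copy of the built-in data set; Temp and Ozone are both numeric.
aq <- datasets::airquality

# Pearson correlation, skipping days with a missing Temp or Ozone value.
r <- cor(aq$Temp, aq$Ozone, use = "complete.obs")

print(round(r, 2))
```

A value well above zero would support the upward trend visible in the scatter plot, though a correlation alone does not show that temperature causes the change in ozone.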

It took a little trial and error to figure out how to color the points by month. I initially thought `fill =` would work, but it turned out I needed `colour =`. I used `geom_point()` to create the scatter plot. Earlier on, I also had trouble getting the code to produce any graphs. After some trial and error, looking through my notes, and a few Google searches, I found that I needed to add `print()`.
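
Both points can be seen in a small sketch, assuming ggplot2 is installed. The default point shape is a solid dot with no separate interior, so `fill` has nothing to act on and `colour` is the aesthetic that takes effect. Assigning a plot to a name only stores it, so it has to be displayed with `print()` or by typing the name on its own line. The names `aq` and `p_demo` are hypothetical and used only here.

```r
library(ggplot2)

# Fresh copy so the example does not depend on the earlier recoding.
aq <- datasets::airquality
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])

# Assignment stores the plot object; nothing is drawn at this point.
p_demo <- ggplot(aq, aes(x = Temp, y = Ozone, colour = Month)) +
  geom_point(na.rm = TRUE) +              # na.rm = TRUE drops missing rows without a warning
  scale_colour_discrete(name = "Month")   # the scale must match the mapped aesthetic

# Draw the stored plot explicitly.
print(p_demo)
```

Using a colour scale here keeps the legend settings attached to the aesthetic that is actually mapped. A fill scale added to this plot would have no effect.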