AirQuality HW Assignment

Author

Walter Hinkley

Air Quality Assignment

Reasons and Effects of Air Pollution

Load the Tidyverse Library

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Load the Data into the Global Environment

data("airquality")

Display the first 6 rows of the data set

head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
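The NA entries in rows 5 and 6 are missing readings. A quick base R sketch for counting the missing values in each column, and the number of rows with no missing values at all (the name na_counts is only illustrative):

```r
# Load the built-in data set (it ships with R in the datasets package)
data("airquality")

# is.na() gives a logical matrix; colSums() counts the TRUE values per column
# (na_counts is an illustrative name)
na_counts <- colSums(is.na(airquality))
na_counts

# Number of rows with no missing value in any column
sum(complete.cases(airquality))
```

Only Ozone (37 missing) and Solar.R (7 missing) contain NAs. The 7 missing Solar.R readings are the rows dropped with a warning by the Solar.R boxplot (p5).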

Show the Mean Temperature

mean(airquality$Temp)
[1] 77.88235
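mean() works directly on Temp because that column has no missing values. On a column that does contain NA, such as Ozone, mean() returns NA unless the missing values are dropped with na.rm = TRUE. A minimal sketch (ozone_mean is an illustrative name):

```r
data("airquality")

# Temp has no NAs, so the plain mean is defined
mean(airquality$Temp)

# Ozone contains NAs: without na.rm the result is NA
mean(airquality$Ozone)

# Drop the missing readings before averaging (ozone_mean is illustrative)
ozone_mean <- mean(airquality$Ozone, na.rm = TRUE)
ozone_mean
```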

Show the Mean of Column 4 (Temp)

mean(airquality[,4]) 
[1] 77.88235
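airquality[, 4] selects the fourth column by position, which is Temp, so it returns the same mean as airquality$Temp. Selecting by name is usually safer because it keeps working if the column order changes. A quick check (by_position and by_name are illustrative names):

```r
data("airquality")

# Column 4 selected by position and by name (illustrative variable names)
by_position <- airquality[, 4]
by_name <- airquality[["Temp"]]

# The fourth column is Temp, and all three selections are the same vector
names(airquality)[4]
identical(by_position, airquality$Temp)
identical(by_name, airquality$Temp)
```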

Show the Median of the Temperatures

median(airquality$Temp)
[1] 79
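summary() reports the median together with the mean, minimum, maximum, and quartiles in one call, which makes it easy to compare the median (79) with the mean (77.88). A short sketch (temp_summary is an illustrative name):

```r
data("airquality")

# Six-number summary of Temp (temp_summary is illustrative)
temp_summary <- summary(airquality$Temp)
temp_summary

# Individual statistics can be pulled out by name
temp_summary[["Median"]]
```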

Display the Standard Deviation of wind speeds

sd(airquality$Wind)
[1] 3.523001

Calculate the Variance of wind speeds

var(airquality$Wind)
[1] 12.41154
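The standard deviation is the square root of the variance, and var() computes the sample variance: the sum of squared deviations from the mean divided by n - 1. The two results above can be checked against each other and against that formula (wind, n, and var_manual are illustrative names):

```r
data("airquality")
wind <- airquality$Wind
n <- length(wind)

# Sample variance by hand: squared deviations from the mean, divided by n - 1
var_manual <- sum((wind - mean(wind))^2) / (n - 1)
var_manual
sqrt(var_manual)

# Both comparisons should be TRUE up to floating-point tolerance
all.equal(var_manual, var(wind))
all.equal(sqrt(var(wind)), sd(wind))
```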

Rename the Month Numbers to Names

airquality$Month[airquality$Month == 5]<- "May"
airquality$Month[airquality$Month == 6]<- "June"
airquality$Month[airquality$Month == 7]<- "July"
airquality$Month[airquality$Month == 8]<- "August"
airquality$Month[airquality$Month == 9]<- "September"
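These five lines work because the first assignment turns Month into a character column, after which R compares values such as "6" with 6 as text. An alternative that avoids the repeated lines is to index R's built-in month.name vector with the month numbers. This sketch works on a fresh copy of the data (aq is an illustrative name), so it does not interfere with the steps above:

```r
# Fresh copy of the built-in data; Month is still numeric (5 to 9)
# (aq is an illustrative name)
aq <- datasets::airquality

# month.name is a built-in vector: "January", "February", ...
# Indexing it with 5, 6, ... returns the matching month names
aq$Month <- month.name[aq$Month]

unique(aq$Month)
```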

Show a summary of Month, which is now stored as character

summary(airquality$Month)
   Length     Class      Mode 
      153 character character 

Convert Month to a factor so the months appear in calendar order instead of alphabetical order

airquality$Month<-factor(airquality$Month, 
                         levels=c("May", "June","July", "August",
                                  "September"))

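Once Month is a factor, summary() returns a count per month instead of just the length and class, and the levels follow the order given in levels=. This self-contained check rebuilds the factor on a fresh copy using month.name, which is assumed here to be equivalent to the renaming steps above (aq is an illustrative name):

```r
# Fresh copy, renamed with month.name and converted to an ordered-level factor
# (aq is an illustrative name)
aq <- datasets::airquality
aq$Month <- factor(month.name[aq$Month],
                   levels = c("May", "June", "July", "August", "September"))

levels(aq$Month)

# Days per month, listed in calendar order rather than alphabetically
summary(aq$Month)
```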
p1 Histogram

p1 <- airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity")+
  scale_fill_discrete(name = "Month", 
                      labels = c("May", "June","July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept", 
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")  #provide the data source
p1
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

p2 Histogram

p2 <- airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity", alpha=0.5, binwidth = 5, color = "white")+
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept", 
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")
p2
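With position = "identity" the five months are drawn on top of each other, and alpha = 0.5 only partly helps. One alternative, sketched here and not part of the original assignment, is to give each month its own panel with facet_wrap(). The names aq and p2_facet are illustrative:

```r
library(ggplot2)

# Fresh copy with month names in calendar order (aq is illustrative)
aq <- datasets::airquality
aq$Month <- factor(month.name[aq$Month],
                   levels = c("May", "June", "July", "August", "September"))

# One panel per month; the shared x axis keeps temperatures comparable
p2_facet <- ggplot(aq, aes(x = Temp, fill = Month)) +
  geom_histogram(binwidth = 5, color = "white") +
  facet_wrap(~ Month, ncol = 1) +
  labs(x = "Temperature (degrees F)",
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973")
p2_facet
```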

p3 Boxplot

p3 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) + 
  labs(x = "Months from May through September", y = "Temperatures", 
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September"))
p3
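Each box shows the median and quartiles of Temp for one month. The same numbers can be computed directly with dplyr, which makes them easier to quote in a write-up. A sketch using group_by() and summarise() (aq and temp_by_month are illustrative names):

```r
library(dplyr)

# Fresh copy with month names in calendar order (aq is illustrative)
aq <- datasets::airquality
aq$Month <- factor(month.name[aq$Month],
                   levels = c("May", "June", "July", "August", "September"))

# Quartiles and median per month: the values drawn as each box
temp_by_month <- aq |>
  group_by(Month) |>
  summarise(
    q1     = quantile(Temp, 0.25),
    median = median(Temp),
    q3     = quantile(Temp, 0.75),
    n      = n()
  )
temp_by_month
```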

p4 Boxplot

p4 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) + 
  labs(x = "Months from May through September", y = "Temperatures", 
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot()+
  scale_fill_grey(name = "Month", labels = c("May", "June","July", "August", "September"))
p4

p5 Boxplot (Solar.R and Month)

p5 <- airquality |>
  ggplot(aes(Month, Solar.R, fill = Month)) +
  labs(x = "Months from May through September", y = "Solar.R", 
       title = "Side-by-Side Boxplot of Monthly Solar.R",
       caption = "New York State Department of Conservation and the National Weather Service") + 
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September"))
p5
Warning: Removed 7 rows containing non-finite outside the scale range
(`stat_boxplot()`).
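The warning refers to the 7 days with a missing Solar.R reading, which ggplot drops before drawing the boxes. Removing those rows explicitly first gives the same boxes without the warning. A minimal sketch (aq, aq_solar, and p5_clean are illustrative names):

```r
library(dplyr)
library(ggplot2)

# Fresh copy with month names in calendar order (aq is illustrative)
aq <- datasets::airquality
aq$Month <- factor(month.name[aq$Month],
                   levels = c("May", "June", "July", "August", "September"))

# Keep only days where Solar.R was recorded
aq_solar <- aq |> filter(!is.na(Solar.R))
nrow(aq) - nrow(aq_solar)

p5_clean <- ggplot(aq_solar, aes(Month, Solar.R, fill = Month)) +
  geom_boxplot()
p5_clean
```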

Summary

I tried to find a correlation between variables other than Month and Temp, but it was not easy to see one or to tell how each variable affected the others. I was curious whether the level of Solar.R differed at different times of the year, so I created a side-by-side boxplot to see whether it changed. From month to month it did not change much, except in July: July has a clearly higher median and a narrower spread between its high and low readings. The median in the other months is fairly consistent, but the range in May is much wider than in the rest of the months. With more data, I would be curious whether the July readings reflect an error or an event that caused them to differ. To create the boxplot, I followed p3 but changed the titles and the y variable in the ggplot aesthetics.
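The month-to-month comparison above can be backed with numbers: the median and interquartile range of Solar.R per month show directly whether July stands out, and cor() gives a first look at correlations between the numeric variables. A sketch, assuming days with missing readings are simply dropped (aq, solar_by_month, and cor_mat are illustrative names):

```r
library(dplyr)

# Fresh copy; Month stays numeric here (5 = May ... 9 = September)
aq <- datasets::airquality

# Median and interquartile range of Solar.R per month, ignoring missing days
solar_by_month <- aq |>
  filter(!is.na(Solar.R)) |>
  group_by(Month) |>
  summarise(median_solar = median(Solar.R),
            iqr_solar    = IQR(Solar.R))
solar_by_month

# Pairwise correlations; "complete.obs" drops any row with a missing value first
cor_mat <- cor(aq[, c("Ozone", "Solar.R", "Wind", "Temp")], use = "complete.obs")
round(cor_mat, 2)
```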