Airquality Homework Assignment

Author

N Diker

Air quality Assignment

Air pollution - Taiwan Skyline

Load library(tidyverse)

The tidyverse packages help keep our data “tidy” and include ggplot2, which is used for the plots below.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Load the data in the global environment

This data set contains daily air quality measurements taken in New York from May to September 1973, sourced from the New York State Department of Conservation and the National Weather Service.

data("airquality")

The head() function displays the first six rows of the data set.

head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
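
The first six rows already contain NA values, so it can help to count the missing values before computing statistics. The following is a minimal sketch; it works on a fresh copy named aq (a name introduced here for illustration) so the result does not depend on later changes to airquality.

```r
# Fresh copy of the built-in data set (aq is an illustrative name)
aq <- datasets::airquality

# Number of missing values in each column
na_counts <- colSums(is.na(aq))
na_counts

# A single NA makes mean() return NA unless na.rm = TRUE is set
mean(aq$Ozone)
mean(aq$Ozone, na.rm = TRUE)
```

Temp and Wind have no missing values, which is why the calls to mean(), median(), sd(), and var() below work without na.rm.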

Either of these two calls gives the mean temperature in the data set: the first selects the Temp column by name, the second by position (column 4).

mean(airquality$Temp)
[1] 77.88235
mean(airquality[,4]) 
[1] 77.88235

This function gives us the median temperature.

median(airquality$Temp)
[1] 79

This function gives us the standard deviation for wind.

sd(airquality$Wind)
[1] 3.523001

This function gives us the variance for wind.

var(airquality$Wind)
[1] 12.41154
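
The two results above are related: the variance is the square of the standard deviation. A short sketch to confirm this for the wind data (the variable names are introduced here for illustration):

```r
# Wind speed column from a fresh copy of the built-in data set
wind <- datasets::airquality$Wind

wind_sd  <- sd(wind)
wind_var <- var(wind)

# all.equal() compares with a small numerical tolerance
all.equal(wind_sd^2, wind_var)
```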

These assignments replace the month numbers 5-9 with the corresponding month names. The first assignment turns the Month column from numeric into character.

airquality$Month[airquality$Month == 5]<- "May"
airquality$Month[airquality$Month == 6]<- "June"
airquality$Month[airquality$Month == 7]<- "July"
airquality$Month[airquality$Month == 8]<- "August"
airquality$Month[airquality$Month == 9]<- "September"
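
As an alternative sketch, the five assignments can be replaced by a single lookup in R's built-in month.name vector. This version also stores the result as a factor whose levels are in calendar order. It works on a fresh copy named aq (an illustrative name), because the lookup needs the original numeric Month column.

```r
# Fresh copy with Month still stored as the numbers 5-9
aq <- datasets::airquality

# month.name[5] is "May", month.name[9] is "September"
aq$Month <- factor(month.name[aq$Month], levels = month.name[5:9])

levels(aq$Month)  # calendar order, not alphabetical
table(aq$Month)   # number of daily observations per month
```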

Because Month is now a character column, summary() reports only its length, class, and mode rather than numeric summary statistics.

summary(airquality$Month)
   Length     Class      Mode 
      153 character character 

Converting Month to a factor with explicit levels keeps the months in calendar order (May-September) instead of the default alphabetical order.

airquality$Month<-factor(airquality$Month, 
                         levels=c("May", "June","July", "August",
                                  "September"))

This piece of code creates our first plot, a histogram of temperature with the bars filled by Month.

p1 <- airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity")+
  scale_fill_discrete(name = "Month", 
                      labels = c("May", "June","July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept", 
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")  #provide the data source

Output for plot 1

p1
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

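This piece of code creates our second plot, which improves on the first histogram by making the bars semi-transparent (alpha = 0.5) so overlapping months stay visible, setting the bin width to 5 degrees, and outlining the bars in white.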

p2 <- airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity", alpha=0.5, binwidth = 5, color = "white")+
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept", 
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")

Output for plot 2.

p2

This piece of code creates our third plot which is side-by-side boxplots categorized by Month.

p3 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) + 
  labs(x = "Months from May through September", y = "Temperatures", 
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September"))

Output for plot 3.

p3
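
The numbers behind the boxplots can be checked with a grouped summary. This is a minimal sketch using dplyr on a fresh copy named aq; the names aq and temp_by_month are introduced here for illustration. The median and quartiles should correspond to the middle line and the edges of each box.

```r
library(dplyr)

# Fresh copy with Month as an ordered-by-calendar factor
aq <- datasets::airquality
aq$Month <- factor(month.name[aq$Month], levels = month.name[5:9])

# One row per month: count, median, and quartiles of temperature
temp_by_month <- aq |>
  group_by(Month) |>
  summarise(
    n           = n(),
    q1          = quantile(Temp, 0.25),
    median_temp = median(Temp),
    q3          = quantile(Temp, 0.75)
  )

temp_by_month
```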

This piece of code creates our fourth plot which is side-by-side box plots in gray scale.

p4 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) + 
  labs(x = "Months from May through September", y = "Temperatures", 
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot()+
  scale_fill_grey(name = "Month", labels = c("May", "June","July", "August", "September"))

Output for plot 4.

p4

This piece of code creates our fifth plot which is a scatter plot of how solar radiation levels affect the temperature.

p5 <- airquality |>
  ggplot(aes(Solar.R, Temp)) +
  geom_point(color = "green", alpha = 0.9) +
  labs(x = "Solar Radiation", y = "Temperatures", 
       title = "Scatterplot Comparing How Solar Radiation Affects Temperature",
       caption = "New York State Department of Conservation and the National Weather Service")

Output for plot 5.

p5
Warning: Removed 7 rows containing missing values or values outside the scale range
(`geom_point()`).

For plot 5, I created a scatter plot that visualizes the relationship between solar radiation and temperature. The plot does not show a strong relationship between the two variables, although temperatures tend to be somewhat higher when solar radiation is higher. As the warning indicates, seven rows with missing Solar.R values were dropped from the plot. I used geom_point() to draw the points and to set their color (green) and transparency (alpha = 0.9).
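
One way to put a number on the strength of this relationship is the Pearson correlation coefficient. The sketch below is an addition to the assignment rather than part of it; it assumes a linear relationship is a reasonable summary and uses use = "complete.obs" to skip the rows where Solar.R is missing.

```r
# Fresh copy of the built-in data set (aq is an illustrative name)
aq <- datasets::airquality

# Pearson correlation, ignoring rows with a missing Solar.R value
solar_temp_cor <- cor(aq$Solar.R, aq$Temp, use = "complete.obs")
solar_temp_cor
```

A value close to 0 indicates a weak linear relationship and a value close to 1 a strong positive one, so a small positive result would be consistent with what the scatter plot shows.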