Airquality Homework Assignment

Author

R Hernandez

Airquality Assignment

Air Quality Index

Load the library tidyverse

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Load the data into the global environment

data(airquality)

View the data

The head function displays only the first 6 rows of the dataset. Notice in the Global Environment pane to the right that there are 153 observations (rows).

head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
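
The row count mentioned above can also be checked in code rather than read from the Environment pane. This is a minimal sketch that assumes a fresh copy of the built-in dataset; the name aq is hypothetical and is used so the airquality object in this report is left untouched.

```r
# Fresh copy of the built-in dataset (aq is a hypothetical name for this sketch).
aq <- datasets::airquality

n_rows <- nrow(aq)  # number of observations (rows)
n_cols <- ncol(aq)  # number of variables (columns)
dim(aq)             # rows and columns reported together

# head() takes an n argument, so more than the default 6 rows can be shown.
first_ten <- head(aq, n = 10)
first_ten
```
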
mean(airquality$Temp)
[1] 77.88235
mean(airquality[,4])
[1] 77.88235
median(airquality$Temp)
[1] 79
sd(airquality$Wind)
[1] 3.523001
var(airquality$Wind)
[1] 12.41154
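
The summary statistics above are related to each other: the standard deviation is the square root of the variance. The sketch below checks that relationship for Wind, and also shows what happens with a column that contains missing values, such as Ozone. It assumes a fresh copy of the dataset under the hypothetical name aq.

```r
# Fresh copy of the built-in dataset (aq is a hypothetical name for this sketch).
aq <- datasets::airquality

wind_sd  <- sd(aq$Wind)
wind_var <- var(aq$Wind)

# The standard deviation is the square root of the variance.
all.equal(wind_sd, sqrt(wind_var))

# Ozone contains NA values, so mean() returns NA unless they are dropped.
ozone_mean_raw <- mean(aq$Ozone)                # NA because of missing values
ozone_mean     <- mean(aq$Ozone, na.rm = TRUE)  # mean of the observed values only
ozone_mean
```
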
airquality$Month[airquality$Month == 5]<- "May"
airquality$Month[airquality$Month == 6]<- "June"
airquality$Month[airquality$Month == 7]<- "July"
airquality$Month[airquality$Month == 8]<- "August"
airquality$Month[airquality$Month == 9]<- "September"
summary(airquality$Month)
   Length     Class      Mode 
      153 character character 
airquality$Month<-factor(airquality$Month,
                         levels = c("May", "June","July", "August","September"))
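
The summary output above lists only Length, Class, and Mode because Month is a character vector at that point; once it is a factor, summary or table report counts per month. As a possible alternative to the five separate assignments, the recoding and the level ordering can be done in a single factor call. This is only a sketch on a fresh copy of the data (aq is a hypothetical name), not a replacement for the steps used in this assignment.

```r
# Fresh copy of the built-in dataset, where Month is still numeric (5 to 9).
aq <- datasets::airquality

month_names <- c("May", "June", "July", "August", "September")

# levels gives the original codes in order; labels gives the names to show.
aq$Month <- factor(aq$Month, levels = 5:9, labels = month_names)

levels(aq$Month)  # month names in calendar order
table(aq$Month)   # number of observations per month
```
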
airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity") +
  scale_fill_discrete(name="Month",
                      labels = c("May", "June","July","August","September")) +
  labs(x = "Monthly Temperatures from May - Sept",
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service") #provide the data source
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity", alpha=0.5, binwidth = 5, color = "white") +
  scale_fill_discrete(name= "Month", labels = c("May", "June","July","August","September")) +
  labs(x = "Monthly Temperatures from May - Sept",
       y = "Frequency of Temps", 
       title = "Histogram of Monthly Temperatures from May - Sept, 1973", 
       caption = "New York State Department of Conservation and the National Weather Service")

airquality |>
  ggplot(aes(Month, Temp, fill = Month)) + 
  labs(x = "Months from May through September", y = "Temperatures", 
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September"))

airquality |>
  ggplot(aes(Month, Temp, fill = Month)) + 
  labs(x = "Months from May through September", y = "Temperatures", 
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot()+
  scale_fill_grey(name = "Month", labels = c("May", "June","July", "August", "September"))

airquality |>
  ggplot(aes(Month,Ozone, fill=Month)) +
  labs(x = "Months from May through September", y = "Concentration of Ozone", 
       title = "Side-by-Side Boxplot of Monthly Ozones from May to September, 1973",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_viridis_d(name = "Month", labels = c("May", "June","July", "August", "September"))
Warning: Removed 37 rows containing non-finite outside the scale range
(`stat_boxplot()`).
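
The warning above appears because Ozone has missing values, which the boxplot drops before drawing. The sketch below counts them on a fresh copy of the data (aq and aq_complete are hypothetical names) and shows one way to filter them out beforehand so the plot is built only from observed values.

```r
# Fresh copy of the built-in dataset (aq is a hypothetical name for this sketch).
aq <- datasets::airquality

# Count the rows where Ozone is missing.
n_missing <- sum(is.na(aq$Ozone))
n_missing

# Keep only the rows with an observed Ozone value.
aq_complete <- aq[!is.na(aq$Ozone), ]
nrow(aq_complete)
```
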

Brief Essay

The plot that I created is a boxplot, similar to practice plots 3-4 in the RPubs tutorial. I used it to compare the distributions of ozone concentrations from May to September. The variables are Ozone and Month, from the New York State Department of Conservation and National Weather Service data provided by the professor. A boxplot shows the IQR (interquartile range) as the box, which makes it easy to see the median (the line inside the box) along with the first quartile (Q1) and the third quartile (Q3). This helps in understanding how ozone concentrations differ from month to month. The boxplot shows ozone concentrations rising from May into July and August and then falling in September. August displays the highest outlier, and July and August have the highest medians. I didn't use any special code for this plot because I am new to R and my knowledge is still minimal, although I did explore the scale_fill functions to change the colors of the plot.
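
The quartiles that a boxplot draws can also be computed directly, which is one way to check a reading of the plot against the numbers. This is a minimal sketch on a fresh copy of the data (aq, ozone_quartiles, and ozone_iqr are hypothetical names); the columns of the result are labeled with the original month numbers 5 to 9.

```r
# Fresh copy of the built-in dataset, where Month is still numeric (5 to 9).
aq <- datasets::airquality

# Q1, median, and Q3 of Ozone for each month, ignoring missing values.
ozone_quartiles <- sapply(
  split(aq$Ozone, aq$Month),
  quantile,
  probs = c(0.25, 0.5, 0.75),
  na.rm = TRUE
)
ozone_quartiles

# The IQR is the height of each box: Q3 minus Q1.
ozone_iqr <- ozone_quartiles["75%", ] - ozone_quartiles["25%", ]
ozone_iqr
```

Note that quantile uses a slightly different default rule than the hinges drawn by geom_boxplot, so the values can differ a little from the box edges in the plot.
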