Airquality

Author

Ava Haghighi

Airquality Tutorial and homework assignment

 library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
 data("airquality")

 head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6

Mean

[1] 77.88235

Median

[1] 79

Standard Deviation

[1] 3.523001

Variance

[1] 12.41154

summary

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  5.000   6.000   7.000   6.993   8.000   9.000 
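
The outputs above appear without the calls that produced them. The following is a sketch of base R calls whose results are consistent with the printed values. The choice of columns (Temp for the mean and median, Wind for the standard deviation and variance, Month for the summary) is an assumption inferred from the numbers, not something stated in the text, and `aq` is a name introduced here for illustration.

```r
# Work on a fresh copy so that later recoding of Month does not affect these numbers
aq <- datasets::airquality

temp_mean     <- mean(aq$Temp)      # assumed source of the "Mean" output
temp_median   <- median(aq$Temp)    # assumed source of the "Median" output
wind_sd       <- sd(aq$Wind)        # assumed source of the "Standard Deviation" output
wind_var      <- var(aq$Wind)       # assumed source of the "Variance" output
month_summary <- summary(aq$Month)  # Month is still numeric (5 to 9) at this point

print(temp_mean)
print(temp_median)
print(wind_sd)
print(wind_var)
print(month_summary)
```

Note that the summary of Month is only meaningful while the column is numeric, so it has to run before the months are recoded to names.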

Histogram categorized by Month

airquality$Month[airquality$Month == 5]<- "May"
airquality$Month[airquality$Month == 6]<- "June"
airquality$Month[airquality$Month == 7]<- "July"
airquality$Month[airquality$Month == 8]<- "August"
airquality$Month[airquality$Month == 9]<- "September"

# Convert to a factor only after recoding: applying these levels to the
# numeric codes 5 to 9 would turn every Month value into NA
airquality$Month<-factor(airquality$Month, levels=c("May", "June","July", "August", "September"))

p2 <- airquality |>
  ggplot(aes(x=Temp, fill= Month)) +
  geom_histogram(position="identity", alpha=0.5, binwidth = 5, color = "white")+
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept", 
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")
p2
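
The recoding and the factor conversion used for these plots can also be combined into one step. A minimal sketch, assuming a fresh copy of the data named `aq` (a name introduced here for illustration): base R's `factor()` accepts `levels` and `labels` together, so the numeric codes 5 to 9 map directly onto month names in calendar order.

```r
# Fresh copy of the built-in dataset; Month holds the integers 5 to 9
aq <- datasets::airquality

month_names <- c("May", "June", "July", "August", "September")

# levels gives the existing codes, labels gives the names they should display as
aq$Month <- factor(aq$Month, levels = 5:9, labels = month_names)

print(levels(aq$Month))
print(table(aq$Month))
```

With Month stored this way, mapping `fill = Month` in `aes()` should produce a legend and axis in calendar order without any separate relabeling.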

Side-by-Side Boxplots Categorized by Month

p3 <- airquality |>
  # fill must be the unquoted column name; the string "Month" is a single constant value
  ggplot(aes(Month, Temp, fill = Month)) + 
  labs(x = "Months from May through September", y = "Temperatures", 
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September"))
p3 

Side-by-Side Boxplots in Gray Scale

p4 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) + 
  labs(x = "Months from May through September", y = "Temperatures", 
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot()+
  scale_fill_grey(name = "Month", labels = c("May", "June","July", "August", "September"))
p4

Graph of Ozone Level of each Month

# Day of the month on the x-axis
x <- c(1:31)
# Values entered by hand; the name must be lowercase y to match the plot() call
y <- c(83,24,77,NA,NA,NA,255,229,110,NA,NA,44,28,65,NA,22,59,23,31,44,21,9,NA,45,168,73,NA,76,118,84,85)
plot(x, y, xlab = "Day of Month", ylab = "Ozone Level", main = "Ozone Status, Month of August", type = "o", pch = 20, lwd = 2, col = "brown")
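
Rather than typing the values by hand, one month can be pulled straight from the data frame, which avoids transcription errors. A minimal sketch, assuming a fresh copy of the dataset named `aq` in which Month is still numeric (both `aq` and `aug` are names introduced here for illustration):

```r
# Fresh copy of the built-in dataset; Month holds the integers 5 to 9
aq <- datasets::airquality

# Keep only the August rows (Month code 8)
aug <- subset(aq, Month == 8)

# Same style of plot as above, with Day and Ozone taken from the data
plot(aug$Day, aug$Ozone,
     xlab = "Day of August", ylab = "Ozone Level",
     main = "Ozone Levels in August",
     type = "o", pch = 20, lwd = 2, col = "brown")
```

If Month has already been recoded to month names, the condition would instead be `Month == "August"`.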

I was unsuccessful in completing the first three charts.
Overview: The project covers four variables (wind, temperature, solar radiation, and ozone levels) recorded from May to September. According to the project tutorial, the goal is to create box plots and a histogram to represent the data. Unfortunately, despite numerous attempts and extensive searching for help, the resulting box plots and histogram were inadequate and did not display the months on the x-axis.

For plot five, my first attempt used geom_line to connect the data points, aiming to create a line chart that would show the different variables in the data (such as temperature, wind, and ozone levels) for each month. However, I could not arrange my variables correctly for this function. I therefore built the chart from two numerical variables, the day of the month and the corresponding ozone level, which means a separate chart for each of the five months. This is not the representation I wanted; I wished to create a single chart with five lines showing the levels throughout each month. Unfortunately, I could not find a way to select a specific month from the dataset.
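
One possible way to get the single chart with five lines described above is to map Month to the line colour, so that ggplot2 draws one line per month without selecting months by hand. This is a sketch under stated assumptions, not the tutorial's method: `aq` and `p5` are names introduced here, and the title text is illustrative.

```r
library(ggplot2)

# Fresh copy of the built-in dataset; Month holds the integers 5 to 9
aq <- datasets::airquality

# Month must be a factor so that each month becomes its own group and colour
aq$Month <- factor(aq$Month, levels = 5:9,
                   labels = c("May", "June", "July", "August", "September"))

p5 <- ggplot(aq, aes(x = Day, y = Ozone, colour = Month)) +
  geom_line(na.rm = TRUE) +   # na.rm = TRUE silences the warning about missing Ozone values
  labs(x = "Day of Month",
       y = "Ozone Level",
       title = "Daily Ozone Levels by Month, May - Sept, 1973")
p5
```

The same pattern should work for Temp or Wind by changing the `y` mapping.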