Air quality

Author

Daniel Ekane

Airquality

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Load dataset into global environment

data("airquality")

View data using the head function

head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6

There are two ways to calculate the mean of Temp. The first uses the dollar sign ($) to select the Temp column by name.

mean(airquality$Temp)
[1] 77.88235

The other way is to index the dataset (airquality) by column position; Temp is the fourth column.

mean(airquality[,4])
[1] 77.88235
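Position-based indexing depends on the column order, so it can silently pick the wrong column if the dataset is rearranged. As a small sketch (not part of the original analysis), the column can also be selected by name inside the brackets; the variable names below are only illustrative.

```r
# Work on a fresh copy of the built-in dataset
aq <- datasets::airquality

# Select the Temp column by name instead of by position
mean_by_name   <- mean(aq[, "Temp"])
mean_by_double <- mean(aq[["Temp"]])

# Both agree with the dollar-sign and position-based versions
identical(mean_by_name, mean(aq$Temp))
identical(mean_by_double, mean(aq[, 4]))
```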

Calculate the median of Temp, and the standard deviation and variance of Wind

median(airquality$Temp)
[1] 79
sd(airquality$Wind)
[1] 3.523001
var(airquality$Wind)
[1] 12.41154
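The head() output above shows missing values (NA) in Ozone and Solar.R. Temp and Wind are complete, so the calls above work as written, but the same functions return NA for a column with missing values unless na.rm = TRUE is set. The following is a minimal sketch of that behaviour; the variable names are illustrative.

```r
# Work on a fresh copy of the built-in dataset
aq <- datasets::airquality

# Ozone contains missing values, so the plain mean is NA
mean(aq$Ozone)

# na.rm = TRUE drops the missing values before computing each statistic
ozone_mean   <- mean(aq$Ozone, na.rm = TRUE)
ozone_median <- median(aq$Ozone, na.rm = TRUE)
ozone_sd     <- sd(aq$Ozone, na.rm = TRUE)

# The variance is the square of the standard deviation
all.equal(var(aq$Wind), sd(aq$Wind)^2)
```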

Rename the months from numbers to names

Change the numbers 5 through 9 to the names May through September

airquality$Month[airquality$Month == 5] <- "May"
airquality$Month[airquality$Month == 6] <- "June"
airquality$Month[airquality$Month == 7] <- "July"
airquality$Month[airquality$Month == 8] <- "August"
airquality$Month[airquality$Month == 9] <- "September"

Month now holds character values instead of numbers

summary(airquality$Month)
   Length     Class      Mode 
      153 character character 

This is one way to reorder the Months so they do not default to alphabetical

airquality$Month <- factor(airquality$Month, levels = c("May", "June", "July", "August", "September"))
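As an alternative sketch, the renaming and the reordering can be done in a single step by passing both levels and labels to factor(). This is not the approach used above; it starts from a fresh copy of the data (here called aq, an illustrative name) and relies on R's built-in month.name vector.

```r
# Start from the original numeric Month column
aq <- datasets::airquality

# levels gives the existing values in the desired order,
# labels gives the name to show for each level
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])

# The levels are in calendar order rather than alphabetical order
levels(aq$Month)

# Number of observations per month
table(aq$Month)
```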

Plot 1: Create a histogram categorized by Month

p1 <- airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity")+
  scale_fill_discrete(name = "Month", 
                      labels = c("May", "June","July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept", 
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service") 
p1
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Plot 2: Improve the Histogram using ggplot

Histogram of temperatures by month, with semi-transparent bars (alpha = 0.5), a bin width of 5 degrees, and white bar outlines

p2<- airquality |>
  ggplot(aes(x=Temp, fill= Month)) +
  geom_histogram(position= "identity", alpha=0.5, binwidth = 5, color="white") +
  scale_fill_discrete(name= "Month", labels= c("May", "June","July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept", 
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")
p2

Plot 3: Create side-by-side boxplots categorized by Month

Based on the boxplots, August reaches the highest temperatures.

p3<- airquality |>
  ggplot(aes(Month, Temp, fill= Month)) +
  labs(x = "Months from May through September", y= "Temperatures", 
       title= "Side-by-Side Boxplot of Monthly temperatures",
       caption= "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September"))
p3
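To check a reading of the boxplots against the numbers, the median, mean, and maximum temperature can be computed for each month. The sketch below uses base R on a fresh copy of the data with the original numeric months (5 through 9); the object and column names are illustrative.

```r
# Fresh copy with the original numeric Month column
aq <- datasets::airquality

# One row per month; tapply() returns results in increasing month order
monthly <- data.frame(
  month       = sort(unique(aq$Month)),
  median_temp = as.vector(tapply(aq$Temp, aq$Month, median)),
  mean_temp   = as.vector(tapply(aq$Temp, aq$Month, mean)),
  max_temp    = as.vector(tapply(aq$Temp, aq$Month, max))
)
monthly

# Month with the single highest recorded temperature
monthly$month[which.max(monthly$max_temp)]
```

Comparing the median and maximum columns shows whether the month with the highest box (the median line) is also the month with the highest individual readings.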

Plot 4: Make the same side-by-side boxplots, but in grey scale

p4 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) + 
  labs(x = "Months from May through September", y = "Temperatures", 
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot()+
  scale_fill_grey(name = "Month", labels = c("May", "June","July", "August", "September"))
p4

Plot 5: Create a scatter plot of temperatures by Month

p5<- airquality |>
  ggplot(aes(Month, Temp, fill= Month))+
  labs(x = "Months from May through September", y= "Temperatures", 
       title= "Scatter Plot of Monthly temperatures by days",
       caption= "New York State Department of Conservation and the National Weather Service") +
  geom_point(col="blue")+
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September"))
p5

As the title states, this graph shows the daily temperatures for each month. I used a scatter plot, created by adding the geom_point() layer. As the graph shows, June, July, August, and September each had many days with a temperature at or above 80 degrees, while May had only one day on which the temperature was over 80 degrees.
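The counts behind this reading of the scatter plot can be checked directly. The following is a minimal sketch in base R, using a fresh copy of the data with the original numeric months; the variable names are illustrative.

```r
# Fresh copy with the original numeric Month column
aq <- datasets::airquality

# Summing a logical vector counts the TRUE values within each month
days_at_or_above_80 <- tapply(aq$Temp >= 80, aq$Month, sum)
days_above_80       <- tapply(aq$Temp > 80, aq$Month, sum)

# Total days recorded per month, to put the counts in proportion
days_in_month <- tapply(aq$Temp, aq$Month, length)
share <- round(days_at_or_above_80 / days_in_month, 2)

days_at_or_above_80
days_above_80
share
```

The share vector gives the fraction of days at or above 80 degrees in each month, which is easier to compare across months than the raw counts.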