Air Quality assignment

Author

Charlene Stephia

##Load the library and the data

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
data("airquality")

##View the data using the “head” function

head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6

##Calculate Summary Statistics

mean(airquality$Temp)
[1] 77.88235

##Calculating the mean using the column index

mean(airquality[,4]) 
[1] 77.88235

##Calculate Median, Standard Deviation, and Variance

median(airquality$Temp)
[1] 79

##Calculating the Standard Deviation

sd(airquality$Wind)
[1] 3.523001

##Calculating the Variance

var(airquality$Wind)
[1] 12.41154
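
Temp and Wind have no missing values, so the calls above work as written. Ozone and Solar.R do contain NA (see rows 5 and 6 of the head() output), and for those columns mean(), median(), sd(), and var() return NA unless na.rm = TRUE is passed. A minimal sketch, assuming a fresh copy of the built-in dataset stored under the illustrative name aq:

```r
# Work on a fresh copy so the airquality object used elsewhere is untouched.
aq <- datasets::airquality

# Without na.rm, a single missing value makes the result NA.
mean(aq$Ozone)

# Drop missing values before computing the statistic.
ozone_mean <- mean(aq$Ozone, na.rm = TRUE)
ozone_sd   <- sd(aq$Ozone, na.rm = TRUE)

# The variance is the square of the standard deviation.
wind_var_from_sd <- sd(aq$Wind)^2
all.equal(wind_var_from_sd, var(aq$Wind))
```

The names ozone_mean, ozone_sd, and wind_var_from_sd are only illustrative; the same na.rm argument works for median() and var().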

##Renaming the months from numbers to names

airquality$Month[airquality$Month == 5] <- "May"
airquality$Month[airquality$Month == 6] <- "June"
airquality$Month[airquality$Month == 7] <- "July"
airquality$Month[airquality$Month == 8] <- "August"
airquality$Month[airquality$Month == 9] <- "September"
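
The five assignments above turn Month into a character column. As an alternative, base R's factor() can rename the values and fix their order in one step. A minimal sketch, assuming a fresh copy of the dataset under the illustrative name aq (this is one possible approach, not the one used in this assignment):

```r
# Fresh copy of the built-in data; Month is still numeric (5 to 9) here.
aq <- datasets::airquality

# levels gives the values found in the data, labels gives the names to show.
# month.name is a base R constant holding the twelve English month names.
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])

levels(aq$Month)
table(aq$Month)
```

Because the levels are listed in calendar order, plots and tables built from aq$Month keep May through September in sequence instead of sorting the names alphabetically.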

##Summary of the Month column

summary(airquality$Month)
   Length     Class      Mode 
      153 character character 

##Month is a categorical variable with different levels, called factors.

airquality$Month<-factor(airquality$Month, 
                         levels=c("May", "June","July", "August",
                                  "September"))

##Plot 1: Create a histogram categorized by Month

p1 <- airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity")+
  scale_fill_discrete(name = "Month", 
                      labels = c("5", "6","7", "8", "9")) +
  labs(x = "Monthly Temperatures from May - Sept", 
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")  #provide the data source
print(p1)
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

##Plot 2: Improve the histogram of Average Temperature by Month

p2 <- airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity", alpha=0.5, binwidth = 5, color = "white")+
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept", 
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")
p2

##Plot 3: Create side-by-side boxplots categorized by Month

p3 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) + 
  labs(x = "Months from May through September", y = "Temperatures", 
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September"))
p3

##Plot 4: Side-by-side boxplots in grayscale

p4 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) +
  labs(x = "Months from May through September", y = "Temperatures",
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot()+
  scale_fill_grey(name = "Month", labels = c("May", "June","July", "August", "September"))
p4

##Plot 5: Side-by-side boxplots of wind speed by Month

p5 <- airquality |>
  ggplot(aes(x = factor(Month), y = Wind, fill = factor(Month))) + 
  geom_boxplot() + 
  labs(x = "Month", 
       y = "Wind speed", 
       title = "Wind speed by Month",
       caption = "Data Source: New York State Department of Conservation and the National Weather Service") +
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September"))
p5

##Brief essay

##Describe the plot type you have created

The plot I created is a side-by-side boxplot. A boxplot shows how data are spread out. This one displays the distribution of daily temperatures in New York for each month from May through September.

This boxplot compares temperature (Temp) across different months (Month) using grey shades to fill each month’s box.

##Any insights that the plot shows

The x-axis shows the month and the y-axis shows the temperature. Temperatures differ across months: some months have higher or lower temperatures than others.
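
The month-to-month differences can also be checked numerically. A minimal sketch using base R's tapply(), assuming a fresh copy of the dataset under the illustrative name aq; temp_median and temp_iqr are likewise illustrative names:

```r
# Fresh copy with Month converted to an ordered set of named levels.
aq <- datasets::airquality
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])

# Median temperature for each month: the line inside each box.
temp_median <- tapply(aq$Temp, aq$Month, median)

# Interquartile range for each month: the height of each box.
temp_iqr <- tapply(aq$Temp, aq$Month, IQR)

temp_median
temp_iqr
```

Comparing the medians shows which months run warmer, and comparing the interquartile ranges shows which months vary more from day to day.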

##Describe any special code you used to make this plot

One special piece of code I used is ggplot(aes(Month, Temp, fill = Month)). This tells ggplot to plot Month on the x-axis and Temp on the y-axis, and to fill each month’s box with a different shade. I also used scale_fill_grey(), which changes the fill of the boxes to shades of grey instead of colors. In addition, I used labs() to add the x-axis label, y-axis label, title, and caption.
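
scale_fill_grey() also accepts start and end arguments that set the darkest and lightest grey used. A minimal sketch, assuming the ggplot2 package is installed; the object names aq and p_grey and the values 0.4 and 0.9 are illustrative choices, not part of the original plot:

```r
library(ggplot2)

# Fresh copy with Month converted to an ordered set of named levels.
aq <- datasets::airquality
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])

# start and end are grey levels between 0 (black) and 1 (white).
# A lighter start value may make the median line easier to see.
p_grey <- ggplot(aq, aes(Month, Temp, fill = Month)) +
  geom_boxplot() +
  scale_fill_grey(name = "Month", start = 0.4, end = 0.9) +
  labs(x = "Month", y = "Temperature")

p_grey
```

Leaving start and end out uses the ggplot2 defaults, which is what the grayscale plot in this assignment does.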