Air Quality Assignment

Author

E Jude

Air Quality Assignment

Load the Library

library(tidyverse)

##Load the dataset into your global environment

data("airquality")

Look at the structure of the data

View the data using the “head” function

head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6

Calculate Summary Statistics

mean(airquality$Temp)
[1] 77.88235
mean(airquality[,4]) 
[1] 77.88235

Calculate Median, Standard Deviation, and Variance

median(airquality$Temp)
[1] 79
sd(airquality$Wind)
[1] 3.523001
var(airquality$Wind)
[1] 12.41154

Rename the Months from numbers to names

airquality$Month[airquality$Month == 5]<- "May"
airquality$Month[airquality$Month == 6]<- "June"
airquality$Month[airquality$Month == 7]<- "July"
airquality$Month[airquality$Month == 8]<- "August"
airquality$Month[airquality$Month == 9]<- "September"

Now look at the summary statistics of the dataset

summary(airquality$Month)
   Length     Class      Mode 
      153 character character 

Month is a categorical variable with different levels, called factors.

airquality$Month<-factor(airquality$Month, 
                         levels=c("May", "June","July", "August",
                                  "September"))

Plot 1: Create a histogram categorized by Month

Plot 1 Code

p1 <- airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity")+
  scale_fill_discrete(name = "Month", 
                      labels = c("May", "June","July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept", 
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")  #provide the data source

Plot 1 Output

p1
`stat_bin()` using `bins = 30`. Pick better value `binwidth`.

Plot 2: Improve the histogram of Average Temperature by Month

Plot 2 Code

p2 <- airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity", alpha=0.5, binwidth = 5, color = "white")+
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept", 
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")

Plot 2 Output

p2

Plot 3: Create side-by-side boxplots categorized by Month

p3 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) + 
  labs(x = "Months from May through September", y = "Temperatures", 
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September"))

Plot 3 Output

p3

Plot 4: Side by Side Boxplots in Gray Scale

Plot 4 Code

p4 <- airquality |>
ggplot(aes(Month, Temp, fill = Month)) + 
  labs(x = "Monthly Temperatures", y = "Temperatures", 
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot()+
  scale_fill_grey(name = "Month", labels = c("May", "June","July", "August", "September"))

Plot 4 Output

p4

Plot 5

p5 <- airquality |>
  ggplot(aes(x = Temp, fill = factor(Month))) +
  geom_density(alpha=0.5) +
  labs(x = "Temperature (°F)", y = "Density", 
       title = "Density Plot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") + 
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September"))
p5

Essay

This is a density plot of the air quality data. It provides viewers with key information about how dense the weather was depending on the temperatures in the month. It’s a see-through plot that overlaps the data or information to compare and contrast what is being displayed. There are multiple changes you have to make to get a density plot, one of the key ones being putting month in factor’s parenthesis (factor(month)) to create separate colors for each month. Another is changing geom to geom_density to get the plot. One thing I really liked or found intriguing is how you can adjust how transparent the graph is by changing the numbers of the alpha. I would have liked to do more with the graph, but I couldn’t figure out different forms for scale_fill_discrete.