Airquality HW

Author

J Amaya

Load the library

library(tidyverse)

Load the dataset into your global environment

data("airquality")

Look at the structure of the data

View the data using the “head” function

head(airquality)

  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
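The heading above mentions the structure of the data, but head() only shows the first six rows. A minimal sketch using base R's str(), which lists every column with its type, and dim(), which returns the number of rows and columns:

```r
# Load a fresh copy of the built-in dataset (Month is still numeric here)
data("airquality")

# str() prints the class, the dimensions, and the type of each column
str(airquality)

# dim() returns c(rows, columns)
dim(airquality)
```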

Calculate Summary Statistics

mean(airquality$Temp)

[1] 77.88235

mean(airquality[, 4])  # column 4 is Temp, so this matches mean(airquality$Temp)

[1] 77.88235
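mean() returns NA for any column that contains missing values, which is the case for Ozone and Solar.R. A short sketch that computes the mean of every column at once with sapply(), forwarding na.rm = TRUE to mean() so the missing values are dropped (col_means is just an illustrative name):

```r
data("airquality")

# Apply mean() to every column; na.rm = TRUE is passed through to mean()
col_means <- sapply(airquality, mean, na.rm = TRUE)
round(col_means, 2)

# Without na.rm, a column with missing values gives NA
mean(airquality$Ozone)
```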

Calculate Median, Standard Deviation, and Variance

median(airquality$Temp)

[1] 79

sd(airquality$Wind)

[1] 3.523001

var(airquality$Wind)

[1] 12.41154
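The two Wind results are linked: the standard deviation is the square root of the variance, so sqrt(12.41154) is about 3.523. A quick check in R:

```r
data("airquality")

wind_var <- var(airquality$Wind)
wind_sd  <- sd(airquality$Wind)

# sd() is the square root of var(); all.equal() tolerates floating-point rounding
all.equal(wind_sd, sqrt(wind_var))
```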

Rename the months from numbers to names

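# The first assignment turns the whole Month column into character,
# so the later comparisons match the text "6", "7", and so on
airquality$Month[airquality$Month == 5] <- "May"
airquality$Month[airquality$Month == 6] <- "June"
airquality$Month[airquality$Month == 7] <- "July"
airquality$Month[airquality$Month == 8] <- "August"
airquality$Month[airquality$Month == 9] <- "September"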
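A shorter alternative, sketched here on a separate copy of the data (aq is an illustrative name), is to index R's built-in month.name vector with the month numbers and build the factor in one step:

```r
# Separate copy of the data so Month is still numeric (5 to 9)
aq <- datasets::airquality

# month.name is built into R: "January", "February", ..., "December"
# Indexing it with the month numbers maps 5 -> "May", 6 -> "June", and so on
aq$Month <- factor(month.name[aq$Month], levels = month.name[5:9])

levels(aq$Month)
```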

Now look at the summary of the Month column (for a character column, summary() only reports its length and type)

summary(airquality$Month)

   Length     Class      Mode 
      153 character character

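Month is a categorical variable, which R stores as a factor; the categories of a factor are called levels. Setting the levels explicitly puts the months in calendar order instead of the default alphabetical order.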

# Convert to a factor; the levels argument sets the chronological order
airquality$Month <- factor(airquality$Month,
                           levels = c("May", "June", "July", "August",
                                      "September"))

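Once Month is a factor, summary() counts the observations in each level instead of printing Length, Class, and Mode. A self-contained sketch (it rebuilds the factor with R's built-in month.name vector so it runs on its own):

```r
data("airquality")
airquality$Month <- factor(month.name[airquality$Month],
                           levels = month.name[5:9])

# For a factor, summary() returns the number of rows in each level, in level order
summary(airquality$Month)
```

The counts add up to 153 because the data cover every day from May 1 to September 30, 1973.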
Plot 1: Create a histogram categorized by Month

Plot 1 Code

p1 <- airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity")+
  scale_fill_discrete(name = "Month", 
                      labels = c("May", "June","July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept", 
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")  #provide the data source

Plot 1 Output

p1

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
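The message means ggplot2 fell back to 30 bins. One way to choose a binwidth is to check the range of the data first: Temp (daily maximum temperature in degrees Fahrenheit) runs from 56 to 97, so a binwidth of 5, the value used in Plot 2, gives roughly nine bars. The exact count can differ by one depending on where ggplot2 places the first bin edge.

```r
data("airquality")

temp_range <- range(airquality$Temp)
temp_range

# Approximate number of bins for a chosen binwidth
binwidth <- 5
ceiling(diff(temp_range) / binwidth)
```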

Plot 2: Improve the histogram of daily temperatures by Month

Plot 2 Code

p2 <- airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity", alpha=0.5, binwidth = 5, color = "white")+
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept", 
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")

Plot 2 Output

p2
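With position = "identity" the five histograms are drawn on top of each other, and alpha = 0.5 only partly helps. An alternative that is not part of the original assignment is to give each month its own panel with facet_wrap(), which removes the overlap entirely. A sketch, with p2_faceted as an illustrative name:

```r
library(ggplot2)
data("airquality")
airquality$Month <- factor(month.name[airquality$Month],
                           levels = month.name[5:9])

p2_faceted <- ggplot(airquality, aes(x = Temp, fill = Month)) +
  geom_histogram(binwidth = 5, color = "white") +
  # One panel per month, stacked in a single column so the x-axes line up
  facet_wrap(~ Month, ncol = 1) +
  labs(x = "Daily Maximum Temperature (degrees F)",
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973")

p2_faceted
```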

Plot 3: Create side-by-side boxplots categorized by Month

Plot 3 Code

p3 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) + 
  labs(x = "Months from May through September", y = "Temperatures", 
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September"))

Plot 3 Output

p3
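The thick line inside each box is that month's median temperature, and the box edges are the first and third quartiles. A small sketch that computes the same numbers with dplyr (loaded as part of the tidyverse), assuming the unweighted case, where geom_boxplot() uses R's default quantile() method; temp_by_month is an illustrative name:

```r
library(dplyr)
data("airquality")
airquality$Month <- factor(month.name[airquality$Month],
                           levels = month.name[5:9])

temp_by_month <- airquality |>
  group_by(Month) |>
  summarise(
    q1          = quantile(Temp, 0.25),  # lower edge of the box
    median_temp = median(Temp),          # thick line inside the box
    q3          = quantile(Temp, 0.75)   # upper edge of the box
  )

temp_by_month
```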

Plot 4: Side-by-Side Boxplots in Gray Scale

Plot 4 Code

p4 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) + 
  labs(x = "Months from May through September", y = "Temperatures", 
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_grey(name = "Month", labels = c("May", "June", "July", "August", "September"))

Plot 4 Output

p4
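scale_fill_grey() spreads the months between two grey levels set by its start and end arguments, on a scale from 0 (black) to 1 (white); the defaults are 0.2 and 0.8. A sketch that lightens the range so the darkest box stays readable against the black median line; the values 0.4 and 0.9 and the name p4_light are only examples:

```r
library(ggplot2)
data("airquality")
airquality$Month <- factor(month.name[airquality$Month],
                           levels = month.name[5:9])

p4_light <- ggplot(airquality, aes(Month, Temp, fill = Month)) +
  geom_boxplot() +
  # start/end are grey levels on a 0 (black) to 1 (white) scale
  scale_fill_grey(start = 0.4, end = 0.9) +
  labs(x = "Months from May through September", y = "Temperatures",
       title = "Side-by-Side Boxplot of Monthly Temperatures")

p4_light
```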

Plot 5: Histogram of Ozone Levels by Month with a Mean Line

Plot 5 Code

mean_ozone <- mean(airquality$Ozone, na.rm = TRUE)

p5 <- airquality |>
  ggplot(aes(x=Ozone, fill=Month)) +
  geom_histogram(position="identity", alpha=.3, binwidth = 10, color = "white")+
  geom_vline(aes(xintercept = mean(Ozone, na.rm = TRUE)), color = "darkred", linetype = "solid", linewidth = 1)+
  scale_fill_brewer(name = "Month", labels = c("May", "June","July", "August", "September"), palette = "RdPu", direction = 1) +
  labs(x = "Ozone Levels from May - Sept", 
       y = "Frequency of Ozone",
       title = "Ozone Levels in May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")

Plot 5 Output

p5

Warning: Removed 37 rows containing non-finite outside the scale range
(`stat_bin()`).
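The warning is expected: Ozone has missing values, and geom_histogram() drops them before binning. Counting the NA values confirms that they account for the 37 removed rows:

```r
data("airquality")

# Number of missing Ozone readings (the rows stat_bin() removes)
sum(is.na(airquality$Ozone))

# Missing Ozone readings per month (5 = May, ..., 9 = September)
tapply(is.na(airquality$Ozone), airquality$Month, sum)
```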

Write a brief essay here

Plot 5 shows the distribution of daily ozone levels from May to September 1973. I started from the Plot 2 code; apart from switching the variable, the two main differences are a different color palette and an added line marking the mean.

First, I changed the variable from Temp to Ozone in the line “ggplot(aes(x=Ozone, fill=Month))”. Then I adjusted the titles and axis labels to fit the new data.

For the color palette, I studied the “scale_fill_discrete” code from the tutorial and found a related function that worked, “scale_fill_brewer”. Instead of the default hues, “scale_fill_brewer” fills the bars with a named ColorBrewer palette. This led me to look through R’s ColorBrewer palettes, where I found one (RdPu) that I personally like and that suits the data. I also adjusted the bin width (binwidth = 10) and the opacity (alpha = .3) of the histogram bars so they are easier to read.
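The ColorBrewer palettes that scale_fill_brewer() draws from are also available through the RColorBrewer package, which is usually installed along with ggplot2. A quick sketch, assuming RColorBrewer is installed, that previews the five RdPu colors (rdpu is an illustrative name):

```r
library(RColorBrewer)

# Five colors from the sequential RdPu palette, lightest first
rdpu <- brewer.pal(5, "RdPu")
rdpu

# Draw the palette as a row of color swatches
display.brewer.pal(5, "RdPu")
```

In scale_fill_brewer(), direction = 1 (the default) keeps this light-to-dark order, so May gets the lightest shade; direction = -1 reverses it.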

Code for the color palette:

scale_fill_brewer(name = "Month", labels = c("May", "June", "July", "August", "September"), palette = "RdPu", direction = 1)

I added the mean line with “geom_vline”, which I found while analyzing the previous plot’s code. Hovering over the function showed how to use it, and I then customized the line color, width, and style so it stands out from the bars. I did that with this code:

geom_vline(aes(xintercept = mean(Ozone, na.rm = TRUE)), color = "darkred", linetype = "solid", linewidth = 1)
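Because the mean is mapped inside aes(), ggplot2 evaluates it against the whole data frame and repeats the result for every row, so the same line is drawn many times over. The mean_ozone value computed before Plot 5 can be passed as a plain argument instead, which draws a single line. A sketch comparing the two versions (p_base, p_aes, and p_const are illustrative names):

```r
library(ggplot2)
data("airquality")
airquality$Month <- factor(month.name[airquality$Month],
                           levels = month.name[5:9])

mean_ozone <- mean(airquality$Ozone, na.rm = TRUE)

# na.rm = TRUE drops the missing Ozone values without the warning
p_base <- ggplot(airquality, aes(x = Ozone, fill = Month)) +
  geom_histogram(position = "identity", alpha = 0.3, binwidth = 10,
                 color = "white", na.rm = TRUE)

# Mean inside aes(): evaluated against the data and repeated for every row
p_aes <- p_base +
  geom_vline(aes(xintercept = mean(Ozone, na.rm = TRUE)),
             color = "darkred", linewidth = 1)

# Mean as a constant: ggplot2 builds a one-row data frame for the line
p_const <- p_base +
  geom_vline(xintercept = mean_ozone, color = "darkred", linewidth = 1)

# Number of rows in the vline layer (layer 2) of each plot
nrow(layer_data(p_aes, 2))
nrow(layer_data(p_const, 2))
```

Both versions look the same on screen; the constant version is simply cleaner and puts the otherwise unused mean_ozone variable to work.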

Overall, the dark red line makes it easy to see the mean ozone level for May to September, which helps when spotting unusually high or low readings. The lighter shades in the new palette also make it easier to see where the monthly histograms overlap.