Airquality HW

Author

Micaela T

Load the library

library(tidyverse)

Load the dataset into your global environment

data("airquality")

Look at the structure of the data
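
No call is shown for this step. A minimal sketch, assuming base R's `str()` is the intended tool; `aq` is a hypothetical copy used only for illustration:

```r
# Assumption: str() stands in for the structure check; the report itself shows only head().
aq <- datasets::airquality   # fresh copy, so the global `airquality` is untouched

str(aq)             # compact overview of every column and its type
dim(aq)             # number of rows and columns
sapply(aq, class)   # class of each column
```

`str()` also makes it easy to see that `Month` starts out as an integer column, which matters for the renaming step later on.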

View the data using the “head” function

head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6

Calculate Summary Statistics

mean(airquality$Temp)
[1] 77.88235
mean(airquality[,4]) 
[1] 77.88235
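
Temp has no missing values, so `mean()` works directly. The `head()` output above shows `NA` entries in Ozone and Solar.R, however, and for those columns `mean()` returns `NA` unless the missing values are dropped. A short sketch using the standard `na.rm` argument (`aq` is a hypothetical copy of the data):

```r
aq <- datasets::airquality

mean(aq$Ozone)                 # NA, because Ozone contains missing values
mean(aq$Ozone, na.rm = TRUE)   # mean of the non-missing values only
sum(is.na(aq$Ozone))           # how many observations were dropped
```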

Calculate Median, Standard Deviation, and Variance

median(airquality$Temp)
[1] 79
sd(airquality$Wind)
[1] 3.523001
var(airquality$Wind)
[1] 12.41154
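
The two Wind results above are related: the variance is the square of the standard deviation. A quick check, with `wind` as an illustrative variable name:

```r
wind <- datasets::airquality$Wind

all.equal(var(wind), sd(wind)^2)   # TRUE: variance is the squared standard deviation
sqrt(var(wind))                    # same value as sd(wind)
```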

Rename the months from numbers to names

airquality$Month[airquality$Month == 5] <- "May"
airquality$Month[airquality$Month == 6] <- "June"
airquality$Month[airquality$Month == 7] <- "July"
airquality$Month[airquality$Month == 8] <- "August"
airquality$Month[airquality$Month == 9] <- "September"
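
The five assignments above also change Month from integer to character as a side effect. As an alternative sketch (not the approach used in this report), `factor()` can map the codes 5 to 9 to month names in one step. `aq` is a hypothetical copy, so the `airquality` object used in the rest of the report is unchanged:

```r
aq <- datasets::airquality

# Map the integer codes 5..9 to month names and fix their order in one call.
aq$Month <- factor(aq$Month,
                   levels = 5:9,
                   labels = c("May", "June", "July", "August", "September"))

levels(aq$Month)    # months in calendar order
table(aq$Month)     # number of daily observations per month
summary(aq$Month)   # for a factor, summary() also reports counts per level
```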

Now look at the summary statistics of the Month column

summary(airquality$Month)
   Length     Class      Mode 
      153 character character 

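Month is a categorical variable. In R, categorical variables are stored as factors, and their categories are called levels. Setting the levels explicitly keeps the months in calendar order rather than alphabetical order.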

airquality$Month <- factor(airquality$Month,
                           levels = c("May", "June", "July", "August",
                                      "September"))

Plot 1: Create a histogram categorized by Month

Plot 1 Code

p1 <- airquality |>
  ggplot(aes(x = Temp, fill = factor(Month))) +
  geom_histogram(position = "identity", binwidth = 1.44, alpha = 1.0) + 
  scale_fill_discrete(name = "Month", 
                      labels = c("May", "June", "July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept", 
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")

Plot 1 Output

p1

Plot 2: Improve the histogram of Temperature by Month

Plot 2 Code

p2 <- airquality |>
  ggplot(aes(x = Temp, fill = factor(Month))) +
  geom_histogram(position = "identity", alpha = 0.5, binwidth = 5, color = "white") +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept", 
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")

Plot 2 Output

p2

Plot 3: Create side-by-side boxplots categorized by Month

Plot 3 Code

p3 <- airquality |>
  ggplot(aes(Month, Temp, fill = factor(Month))) + 
  labs(x = "Months from May through September", y = "Temperatures", 
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September"))

Plot 3 Output

p3
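
Each box in Plot 3 is drawn from that month's quartiles. As a numeric companion to the plot, a minimal sketch computes them with base R. The names `aq` and `quartiles` are illustrative, and the data is reloaded so the block runs on its own:

```r
aq <- datasets::airquality
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])

# One column per month; rows are the 25%, 50% (median), and 75% quantiles of Temp.
quartiles <- sapply(split(aq$Temp, aq$Month), quantile,
                    probs = c(0.25, 0.5, 0.75))
quartiles

# Interquartile range per month, i.e. the height of each box.
quartiles["75%", ] - quartiles["25%", ]
```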

Plot 4: Side by Side Boxplots in Gray Scale

Plot 4 Code

p4 <- airquality |>
  ggplot(aes(Month, Temp, fill = factor(Month))) +
  labs(x = "Months from May through September", y = "Temperatures",
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_grey(name = "Month", labels = c("May", "June", "July", "August", "September"))

Plot 4 Output

p4

Plot 5: Create a Barplot categorized by Month

Plot 5 Code

p5 <- airquality |>
  ggplot(aes(x = factor(Month), fill = factor(Temp))) + 
  geom_bar(position = "stack", color = "white") + 
  scale_fill_discrete(name = "Temperature Ranges") + 
  labs(x = "Months (May - Sept)", y = "Frequency of Temps",
    title = "Stacked Bar Graph of Monthly Temperatures",
    caption = "New York State Department of Conservation and the National Weather Service")

Plot 5 Output

p5
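
In Plot 5, `fill = factor(Temp)` assigns one color to every distinct temperature value, while the legend is titled "Temperature Ranges". A hedged variant that bins the temperatures into actual ranges with `cut()` is sketched below. The 10-degree bins, the column name `TempRange`, the y-axis label, and the object names `aq` and `p5_binned` are illustrative assumptions, not part of the original assignment:

```r
library(ggplot2)

aq <- datasets::airquality
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])

# Assumption: 10-degree bins from 50 to 100 cover every recorded Temp value.
aq$TempRange <- cut(aq$Temp, breaks = seq(50, 100, by = 10))

p5_binned <- ggplot(aq, aes(x = Month, fill = TempRange)) +
  geom_bar(position = "stack", color = "white") +
  scale_fill_discrete(name = "Temperature Ranges") +
  labs(x = "Months (May - Sept)", y = "Number of Days",
       title = "Stacked Bar Graph of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service")

p5_binned
```

With five ranges instead of dozens of individual values, the legend stays short and each segment's height reads as the number of days in that range.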

Brief Essay

For Plot 5, I chose to create a stacked bar plot, shown above as the output. The plot shows the relationship between the months and the frequency of temperatures. Each bar segment is colored by temperature, and the colors are matched in the legend to the right of the plot. The plot also has clear axis labels (“Months (May - Sept)” and “Frequency of Temps”), the title “Stacked Bar Graph of Monthly Temperatures”, and a caption crediting the data source, “New York State Department of Conservation and the National Weather Service”.

The code is similar to the code for the earlier plots: it starts from the same ggplot call, with a few changes needed for a stacked bar plot. Based on the notes from this assignment and the example bar_chart_with_diamond assignment, I used geom_bar with position = "stack" and a white outline color, and kept the same default fill colors as the plots before it. The colors represent the temperatures recorded in each month from May to September, and the height of each colored segment represents how often that temperature occurred.