Data 110 - Air Quality Assignment

Author

Shadeja Fuentes

Load Air Quality data set into the global environment

airquality <- airquality
str(airquality)
'data.frame':   153 obs. of  6 variables:
 $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
 $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
 $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
 $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
 $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
 $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...

Calculate summary statistics

mean(airquality$Temp)
[1] 77.88235
median(airquality$Temp)
[1] 79
sd(airquality$Wind)
[1] 3.523001
var(airquality$Wind)
[1] 12.41154
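
Temp and Wind have no missing values, so the calls above work directly. Ozone and Solar.R do contain NA entries (visible in the str() output), and mean() returns NA for such columns unless na.rm = TRUE is set. The sketch below illustrates this on a fresh copy of the data; the name aq is introduced here for illustration only.

```r
# Fresh copy of the built-in data set; `aq` is an illustrative name
aq <- datasets::airquality

# Without na.rm, any NA in the column makes the result NA
ozone_mean_raw <- mean(aq$Ozone)

# Drop missing values before averaging
ozone_mean <- mean(aq$Ozone, na.rm = TRUE)

# The variance is the square of the standard deviation
wind_sd <- sd(aq$Wind)
wind_var <- var(aq$Wind)
all.equal(wind_sd^2, wind_var)
```

The same na.rm argument is accepted by median(), sd(), and var().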

Convert numbered months to categorical labels

airquality$Month[airquality$Month == 5]<- "May"
airquality$Month[airquality$Month == 6]<- "June"
airquality$Month[airquality$Month == 7]<- "July"
airquality$Month[airquality$Month == 8]<- "August"
airquality$Month[airquality$Month == 9]<- "September"
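
The five assignments above could also be written as a single factor() call, which maps the numeric codes to labels and fixes the calendar order in one step. This is an alternative sketch, not the method used in this assignment; it works on a fresh copy named aq (an illustrative name) so the data frame above is left untouched.

```r
# Fresh copy of the built-in data set; `aq` is an illustrative name
aq <- datasets::airquality

# Labels listed in calendar order, matching month codes 5 through 9
month_labels <- c("May", "June", "July", "August", "September")

# Map each numeric code to its label and keep the calendar order as the level order
aq$Month <- factor(aq$Month, levels = 5:9, labels = month_labels)

# Count the observations recorded in each month
table(aq$Month)
```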

Inspect the structure and summary again. Month is now a character column, so summary() reports its length and class instead of numeric statistics

str(airquality)
'data.frame':   153 obs. of  6 variables:
 $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
 $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
 $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
 $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
 $ Month  : chr  "May" "May" "May" "May" ...
 $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...
summary(airquality)
     Ozone           Solar.R           Wind             Temp      
 Min.   :  1.00   Min.   :  7.0   Min.   : 1.700   Min.   :56.00  
 1st Qu.: 18.00   1st Qu.:115.8   1st Qu.: 7.400   1st Qu.:72.00  
 Median : 31.50   Median :205.0   Median : 9.700   Median :79.00  
 Mean   : 42.13   Mean   :185.9   Mean   : 9.958   Mean   :77.88  
 3rd Qu.: 63.25   3rd Qu.:258.8   3rd Qu.:11.500   3rd Qu.:85.00  
 Max.   :168.00   Max.   :334.0   Max.   :20.700   Max.   :97.00  
 NA's   :37       NA's   :7                                       
    Month                Day      
 Length:153         Min.   : 1.0  
 Class :character   1st Qu.: 8.0  
 Mode  :character   Median :16.0  
                    Mean   :15.8  
                    3rd Qu.:23.0  
                    Max.   :31.0  
                                  

Convert Month to a factor so the months appear in calendar order

library("ggplot2")
airquality$Month<-factor(airquality$Month, levels=c("May", "June","July", "August", "September"))
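
The explicit levels argument matters because factor() otherwise sorts the labels alphabetically, and ggplot2 draws categories in level order. A small sketch of the difference, using an illustrative vector named month_names:

```r
# Month labels in calendar order; `month_names` is an illustrative name
month_names <- c("May", "June", "July", "August", "September")

# Default behaviour: levels are the sorted unique values (alphabetical)
default_levels <- levels(factor(month_names))

# Explicit levels: the order given is the order kept
ordered_levels <- levels(factor(month_names, levels = month_names))

default_levels
ordered_levels
```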

Histogram of Air Temperatures by Month

p1 <- ggplot(airquality, aes(x = Temp, fill = Month)) +
  geom_histogram(bins = 20)
p1

Overlaid Histogram of Temperatures by Month

library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
p2 <- airquality %>%
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity", alpha=0.5, binwidth = 5, color = "white")+
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September"))
p2

Boxplot of Temperatures by Month

p3 <- airquality %>%
  ggplot(aes(Month, Temp, fill = Month)) + 
  ggtitle("Temperatures") +
  xlab("Month") +
  ylab("Temperature (°F)") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September"))
p3 

Grayscale Boxplot display

p4 <- airquality %>%
  ggplot(aes(Month, Temp, fill = Month)) + 
  ggtitle("Monthly Temperature Variations") +
  xlab("Month") +
  ylab("Temperature (°F)") +
  geom_boxplot()+
  scale_fill_grey(name = "Month", labels = c("May", "June","July", "August", "September"))
p4

Boxplot of Wind Speed Variations

Below is a box plot of the wind speeds recorded in each month. The line inside each box marks that month's median wind speed. Points drawn beyond the whiskers are outliers: observations that lie more than 1.5 times the interquartile range beyond the edges of that month's box.

ggplot(airquality, aes(Month, Wind, fill = Month)) + 
  ggtitle("Wind") +
  xlab("Month") +
  ylab("Wind Speed (mph)") +
  geom_boxplot() +scale_fill_brewer()
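
The medians and outliers shown in a box plot like this one can also be computed directly. The sketch below uses base R on a fresh copy named aq (an illustrative name), with the original numeric month codes 5 to 9. Note that boxplot.stats() defines the box edges slightly differently from geom_boxplot(), so borderline points may not match the plot exactly.

```r
# Fresh copy of the built-in data set; `aq` is an illustrative name
aq <- datasets::airquality

# Median wind speed for each month (names are the month codes "5" to "9")
wind_median <- tapply(aq$Wind, aq$Month, median)

# Values beyond 1.5 * IQR from the box edges, per month
wind_outliers <- tapply(aq$Wind, aq$Month,
                        function(x) grDevices::boxplot.stats(x)$out)

wind_median
wind_outliers
```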

Histogram of Wind Speeds by Month

Below is a histogram that illustrates the distribution of wind speeds observed in each month.

p5 <- airquality %>%
  ggplot(aes(x=Wind, fill=Month)) +
  geom_histogram(position="identity", alpha=0.7, binwidth = 5, color = "black")+
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September"))
p5
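
A histogram shows how the wind speeds are distributed rather than their averages. If the mean wind speed per month is wanted as a number, it can be computed separately. A minimal sketch with base R, again on a fresh copy named aq (an illustrative name):

```r
# Fresh copy of the built-in data set; `aq` is an illustrative name
aq <- datasets::airquality

# Label the months in calendar order
aq$Month <- factor(aq$Month, levels = 5:9,
                   labels = c("May", "June", "July", "August", "September"))

# One row per month with the mean wind speed (mph)
wind_means <- aggregate(Wind ~ Month, data = aq, FUN = mean)
wind_means
```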