Air Quality Homework Assignment

Author

M. Tariq

Air Quality Assignment

View the data using the “head” function

The head function displays only the first 6 rows of the data set. Notice that the global environment pane to the right lists 153 observations (rows) in total.

head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
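
The row count can also be confirmed in code rather than read from the environment pane. A minimal sketch, working on a copy named aq (a name introduced here for illustration) so the original data frame is left untouched:

```r
# Work on a copy so the built-in data set is left unchanged
aq <- datasets::airquality

dim(aq)           # number of rows, then number of columns
nrow(aq)          # number of rows only
head(aq, n = 10)  # head() accepts n to show more than the default 6 rows
```
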

Calculate Summary Statistics

mean(airquality$Temp)
[1] 77.88235
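
Temp has no missing values, so mean() works on it directly. Columns such as Ozone do contain NA (see row 5 in the head output), and mean() then returns NA unless the missing values are dropped. A short sketch, using a copy named aq (an illustrative name):

```r
aq <- datasets::airquality

mean(aq$Ozone)                # NA, because some Ozone readings are missing
sum(is.na(aq$Ozone))          # how many readings are missing
mean(aq$Ozone, na.rm = TRUE)  # mean of the readings that are present
```

The same na.rm argument is available in median(), sd(), and var().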

Calculate Median, Standard Deviation, and Variance

median(airquality$Temp)
[1] 79
sd(airquality$Wind)
[1] 3.523001
var(airquality$Wind)
[1] 12.41154
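
The two Wind results are related: the variance is the square of the standard deviation. A quick check, using a copy named aq and the illustrative variable names wind_sd and wind_var:

```r
aq <- datasets::airquality

wind_sd  <- sd(aq$Wind)
wind_var <- var(aq$Wind)

# all.equal() compares numbers with a small tolerance for rounding error
all.equal(wind_sd^2, wind_var)
```

The standard deviation is usually easier to interpret because it is in the same units as the data, while the variance is in squared units.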

Rename the months from numbers to names

airquality$Month[airquality$Month == 5]<- "May"
airquality$Month[airquality$Month == 6]<- "June"
airquality$Month[airquality$Month == 7]<- "July"
airquality$Month[airquality$Month == 8]<- "August"
airquality$Month[airquality$Month == 9]<- "September"

Now look at the summary statistics of the dataset

summary(airquality$Month)
   Length     Class      Mode 
      153 character character 
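
Because Month now holds text, summary() can only report its length and class. To see how many days fall in each month, table() is one option. A sketch that repeats the renaming on a copy named aq (an illustrative name), using base R's built-in month.name constant instead of five separate assignments:

```r
aq <- datasets::airquality

# month.name is a built-in vector of English month names ("January", ...),
# so indexing it with the numeric month gives the matching name
aq$Month <- month.name[aq$Month]

# Count the observations per month (text values are sorted alphabetically)
month_counts <- table(aq$Month)
month_counts
```
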

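Month is a categorical variable. In R, a categorical variable is stored as a factor, and its categories are called levels. Listing the levels explicitly keeps the months in calendar order instead of alphabetical order.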

airquality$Month<-factor(airquality$Month, 
                         levels=c("May", "June","July", "August",
                                  "September"))

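The renaming and the factor conversion can also be done in one step. A sketch, assuming the original numeric Month column (5 through 9) and a copy named aq (an illustrative name):

```r
aq <- datasets::airquality

# Map the numeric codes 5..9 directly to month labels in calendar order
aq$Month <- factor(aq$Month,
                   levels = 5:9,
                   labels = c("May", "June", "July", "August", "September"))

levels(aq$Month)   # calendar order, not alphabetical
summary(aq$Month)  # for a factor, summary() reports the count per level
```
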
Plot 1: Create a histogram categorized by Month

p1 <- airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity")+
  scale_fill_discrete(name = "Month", 
                      labels = c("May", "June","July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept", 
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")  #provide the data source
p1
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
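
The message appears because no bin width was given, so ggplot2 fell back to 30 bins. To get a feel for what a bin width means, the counts can be tabulated by hand. This sketch uses base R's hist() with 5-degree bins; the bin edges (55 to 100) are an assumption chosen to span the data and will not necessarily line up with the edges ggplot2 picks:

```r
aq <- datasets::airquality

# Bin edges every 5 degrees, wide enough to cover every temperature
breaks <- seq(55, 100, by = 5)
bins <- hist(aq$Temp, breaks = breaks, plot = FALSE)

# One row per bin: its lower edge, upper edge, and number of days
data.frame(lower = head(breaks, -1),
           upper = breaks[-1],
           count = bins$counts)
```
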

Plot 2: Improve the histogram of Temperature by Month

p2 <- airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity", alpha=0.5, binwidth = 5, color = "white")+
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept", 
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")
p2
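
Overlapping bars can still be hard to read even with transparency. One alternative, sketched here as an option rather than a replacement for the plot above, is to give each month its own panel with facet_wrap(). The names aq and p_facets are illustrative:

```r
library(ggplot2)

aq <- datasets::airquality
aq$Month <- factor(aq$Month, levels = 5:9,
                   labels = c("May", "June", "July", "August", "September"))

# One panel per month instead of overlapping bars
p_facets <- ggplot(aq, aes(x = Temp)) +
  geom_histogram(binwidth = 5, color = "white") +
  facet_wrap(~ Month, ncol = 1) +
  labs(x = "Temperature (degrees F)", y = "Number of days")
p_facets
```

Stacking the panels in a single column keeps the temperature axis aligned, which makes the shift between months easier to see.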

Plot 3: Create side-by-side boxplots categorized by Month

p3 <- airquality |>
  ggplot(aes(x = factor(Month), y = Temp, fill = factor(Month))) + 
  labs(
    x = "Months from May through September", 
    y = "Temperatures", 
    title = "Side-by-Side Boxplot of Monthly Temperatures",
    caption = "New York State Department of Conservation and the National Weather Service"
  ) +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September"))
  
print(p3)
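
The numbers behind each box can be computed directly, which helps when reading the plot. A sketch using base R on a copy named aq (month_quartiles is an illustrative name):

```r
aq <- datasets::airquality
aq$Month <- factor(aq$Month, levels = 5:9,
                   labels = c("May", "June", "July", "August", "September"))

# First quartile, median, and third quartile of Temp for each month.
# The median is the line inside each box; the quartiles bound the box.
month_quartiles <- sapply(split(aq$Temp, aq$Month),
                          quantile, probs = c(0.25, 0.5, 0.75))
month_quartiles
```
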

Plot 4: Side-by-Side Boxplots in Gray Scale

p4 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) + 
  labs(x = "Months from May through September", y = "Temperatures", 
       title = "Side-by-Side Boxplot of Monthly Temperatures",
       caption = "New York State Department of Conservation and the National Weather Service") +
  geom_boxplot()+
  scale_fill_grey(name = "Month", labels = c("May", "June","July", "August", "September"))
p4

Plot 5: Scatterplot of Ozone vs. Solar Radiation

p5 <- airquality |>
  ggplot(aes(x = Solar.R, y = Ozone)) +
  geom_point(aes(color = Month), alpha = 0.7) +
  labs(
    x = "Solar Radiation (langley)", 
    y = "Ozone Concentration (ppb)", 
    title = "Scatterplot of Ozone vs. Solar Radiation",
    caption = "Data source: New York State Department of Conservation and the National Weather Service"
  ) +
  scale_color_discrete(name = "Month")  # Month is mapped to color, not fill
p5
Warning: Removed 42 rows containing missing values or values outside the scale range
(`geom_point()`).
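
The warning is expected: a point can only be drawn when both Ozone and Solar.R are present. The count can be checked with complete.cases(). A sketch on a copy named aq (incomplete and aq_complete are illustrative names):

```r
aq <- datasets::airquality

# TRUE for rows where Ozone or Solar.R (or both) is missing
incomplete <- !complete.cases(aq[, c("Ozone", "Solar.R")])
sum(incomplete)    # 42, matching the warning

# Optionally drop those rows first so the plot call raises no warning
aq_complete <- aq[!incomplete, ]
nrow(aq_complete)  # 111
```
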

Brief Essay

I decided to create a scatterplot to show the correlation between solar radiation and ozone concentration. The plot gives a visual summary of that relationship and shows how it differs by month. One insight it provides is the seasonal variation in the relationship: where the points for a month form a distinct pattern, we can see how the relationship behaves and changes at different times of the year. This could help us anticipate and prepare for certain kinds of weather, such as heat waves. One notable feature of this plot is that I finally learned how to use themes. I’m still not sure I did it correctly, but since it seems to be running properly I’m not going to touch it, because I worked on it for about two hours. I picked this theme specifically because of the range of blues in it: it’s not so overbearing that your eyes are overloaded with colors, nor are the shades so close together that it’s difficult to differentiate data points.
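
The correlation described above can also be put into numbers. A sketch using base R's cor() on a copy named aq (overall_r and monthly_r are illustrative names); use = "complete.obs" restricts the calculation to days where both readings are present:

```r
aq <- datasets::airquality

# Pearson correlation between Ozone and Solar.R across all months
overall_r <- cor(aq$Ozone, aq$Solar.R, use = "complete.obs")
overall_r

# The same statistic for each month separately (5 = May ... 9 = September)
monthly_r <- sapply(split(aq, aq$Month), function(d) {
  cor(d$Ozone, d$Solar.R, use = "complete.obs")
})
monthly_r
```

Some months have few complete readings, so the per-month values are rough estimates and should be read with caution.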

Load the tidyverse library

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ lubridate 1.9.3     ✔ tibble    3.2.1
✔ purrr     1.0.2     ✔ tidyr     1.3.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(googledrive)

Load the data into the global environment

data("airquality")