---
title: "Air Quality Assignment"
author: "Z Griffin"
format: html
editor: visual
---

Air Quality Assignment

Load the library

library(tidyverse)

Load the data set into RStudio

data("airquality")

Look at the Structure of the Data

View the data using the “head” function

Display only the first six rows of data

head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
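Beyond head, a few other base R functions give a quick picture of the structure. This is a minimal sketch; the copy named aq is an assumption used here so the example leaves airquality itself untouched.

```r
# Work on a fresh copy so the example does not modify `airquality`.
aq <- datasets::airquality

dim(aq)          # number of rows and columns
str(aq)          # column names, types, and the first few values
head(aq, n = 3)  # first three rows instead of the default six
tail(aq, n = 3)  # last three rows
```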

Calculate Summary Statistics

Two different ways to calculate the mean of the Temp (temperature) variable

mean(airquality$Temp)
[1] 77.88235
mean(airquality[,4])
[1] 77.88235

The second way uses [row, column] indexing: leaving the row index blank selects all rows, and 4 selects the fourth column (Temp).
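As a quick check that these indexing styles pick out the same column, the sketch below compares them directly. The copy aq and the variable names are assumptions for illustration; selecting by name is a third option that does not depend on the column position.

```r
aq <- datasets::airquality

by_dollar   <- aq$Temp        # column by name with $
by_position <- aq[, 4]        # all rows, fourth column
by_name     <- aq[, "Temp"]   # all rows, column selected by name
by_double   <- aq[["Temp"]]   # list-style extraction of one column

identical(by_dollar, by_position)  # TRUE
identical(by_dollar, by_name)      # TRUE
identical(by_dollar, by_double)    # TRUE

mean(by_name)
```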

Calculate Median, Standard Deviation, and Variance

median(airquality$Temp)
[1] 79
sd(airquality$Wind)
[1] 3.523001
var(airquality$Wind)
[1] 12.41154
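Two things are worth checking when computing these statistics: the variance is the square of the standard deviation, and columns with missing values (such as Ozone) return NA unless na.rm = TRUE is set. A minimal sketch, using a copy named aq as an assumption:

```r
aq <- datasets::airquality

# Variance is the standard deviation squared.
all.equal(sd(aq$Wind)^2, var(aq$Wind))

# Ozone contains missing values, so the plain mean is NA.
mean(aq$Ozone)

# Dropping the missing values first gives a usable result.
mean(aq$Ozone, na.rm = TRUE)
median(aq$Ozone, na.rm = TRUE)
```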

Rename Months from Numbers to Names

airquality$Month[airquality$Month == 5]<- "May"
airquality$Month[airquality$Month == 6]<- "June"
airquality$Month[airquality$Month == 7]<- "July"
airquality$Month[airquality$Month == 8]<- "August"
airquality$Month[airquality$Month == 9]<- "September"

Now look at the summary statistics of the data set

See how Month has changed from numbers to character strings.

summary(airquality$Month)
   Length     Class      Mode 
      153 character character 

Month is a categorical variable. In R, such variables are stored as factors, and their categories are called levels.

This is one way to reorder the Months so they don’t default to alphabetical.

airquality$Month<- factor(airquality$Month, levels=c("May", "June", "July", "August", "September"))
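To see why the explicit levels matter, the sketch below compares the default (alphabetical) level order with the calendar order. It also shows a possible one-step alternative that converts the original month numbers straight to an ordered set of names. The copy aq and the use of the built-in month.name constant are assumptions for illustration, not part of the assignment code.

```r
aq <- datasets::airquality

# Look up month names from the month numbers: 5 -> "May", 6 -> "June", ...
month_names <- month.name[aq$Month]

# Without explicit levels, factor() sorts the levels alphabetically.
levels(factor(month_names))
# "August" "July" "June" "May" "September"

# Renaming and ordering in one step, starting from the numeric column.
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])
levels(aq$Month)
# "May" "June" "July" "August" "September"

table(aq$Month)  # observations per month, in calendar order
```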

Plot 1: Create a histogram categorized by month

Histogram of temperature by month.

  • fill = Month colors the histogram bars by month

  • scale_fill_discrete(name = "Month", …) puts the month names on the right as a legend. The labels are applied in the order of the factor levels set above, which is what keeps the legend chronological.

  • labs adds labels: the title, the axis labels, and a caption for the data source

Plot 1 Code

p1 <- airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity")+
  scale_fill_discrete(name = "Month", labels =c("May", "June", "July", "August", "September")) +
  labs(x = "Monthly Temperatures from May - Sept",
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973", 
       caption = "New York State Department of Conservation and the National Weather Service") #data source!

Plot 1 Output

p1

Plot 2: Improve the histogram of Temp by Month

  • Outline the bars using color = "white"

  • Add transparency using alpha

  • Change the bin width using binwidth

p2 <- airquality |>
  ggplot(aes(x=Temp, fill=Month)) +
  geom_histogram(position="identity", alpha=0.5, binwidth = 5, color = "white")+
  scale_fill_discrete(name = "Month", labels = c("May", "June", "July", "August", "September"))+
  labs(x = "Monthly Temperatures from May - Sept",
       y = "Frequency of Temps",
       title = "Histogram of Monthly Temperatures from May - Sept, 1973",
       caption = "New York State Department of Conservation and the National Weather Service")

Plot 2 Output

p2
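To get a feel for what a bin width of 5 means, the sketch below counts the temperatures in 5-degree bins with base R. The bin edges (55 to 100) are an assumption chosen to cover the data; ggplot2 may align its bins differently, so the counts will not necessarily match the bars in the plot.

```r
aq <- datasets::airquality

range(aq$Temp)  # lowest and highest temperature in the data

# Hypothetical bin edges every 5 degrees.
edges <- seq(55, 100, by = 5)
bins <- hist(aq$Temp, breaks = edges, plot = FALSE)

# One row per bin: lower edge, upper edge, number of observations.
data.frame(from = head(edges, -1), to = tail(edges, -1), count = bins$counts)
```

A smaller bin width gives more, narrower bars; a larger one smooths the shape but hides detail.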

Plot 3: Create side-by-side boxplots categorized by Month

p3 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) +
  labs(x = "Months from May through September", y = "Temperature", 
       title = "Side-by-Side Boxplots of Monthly Temperatures",
       caption = "New York State Dept of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_discrete(name = "Month", labels = c("May", "June","July", "August", "September"))

Plot 3 Output

p3
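Each box spans the first to third quartile of Temp for that month, with a line at the median. The numbers behind the boxes can be sketched as follows; the copy aq and the one-step factor conversion are assumptions for illustration.

```r
aq <- datasets::airquality
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])

# Quartiles of temperature for each month: one column per month.
quartiles <- sapply(split(aq$Temp, aq$Month), quantile,
                    probs = c(0.25, 0.5, 0.75))
quartiles
```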

Plot 4: Side by Side Boxplots in Gray Scale

This uses the same code as Plot 3, except with scale_fill_grey instead of scale_fill_discrete.

Plot 4 Code

p4 <- airquality |>
  ggplot(aes(Month, Temp, fill = Month)) +
  labs(x = "Months from May through September", y = "Temperature", 
       title = "Side-by-Side Boxplots of Monthly Temperatures",
       caption = "New York State Dept of Conservation and the National Weather Service") +
  geom_boxplot() +
  scale_fill_grey(name = "Month", labels = c("May", "June","July", "August", "September"))

Plot 4 Output

p4

Plot 5 Code

p5 <- airquality |>
  ggplot(aes(Solar.R, Ozone, color = Month)) +
  geom_point(size = 2.5, alpha = 0.7) + 
  scale_x_continuous(breaks = seq(0, 350, by = 50)) +
  labs(x= "Solar Radiation (Langleys*)", y = "Ozone, parts per billion", 
       title = "Scatterplot of Ozone vs Solar Radiation",
       caption = "New York State Dept of Conservation and the National Weather Service\n*Solar Radiation in Langleys in the frequency band 4000-7700 Angstroms")

Plot 5 Output

Note that 42 observations in the data set were missing an ozone or a solar radiation reading and were therefore automatically omitted from the scatterplot.

p5
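The count of omitted observations can be checked without the plot. This is a minimal sketch using a copy named aq (an assumption); it counts the rows where Ozone, Solar.R, or both are missing.

```r
aq <- datasets::airquality

# Missing values in each of the two plotted columns.
colSums(is.na(aq[, c("Ozone", "Solar.R")]))

# Rows missing at least one of the two values.
incomplete <- !complete.cases(aq[, c("Ozone", "Solar.R")])
sum(incomplete)
```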

Write up

I made a scatterplot of ozone vs solar radiation, with each observation colored by the month it came from. There is likely little to no relationship between ozone and solar radiation, as most of the observations sit at the lower end of the ozone scale no matter how much radiation was measured. However, because I colored the observations by month, it can be seen that ozone is higher in July and August. I learned how to change the size of the scatterplot points using the 'size' argument inside geom_point. I also learned how to change how often the x-axis is marked using scale_x_continuous. I tested geom_path to connect the observations in the order they appear in the data table, but the result was an absolute mess and did not help with interpreting the graph at all.
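One way to put numbers next to these visual impressions is to compute the correlation between the two variables and the mean ozone per month. This is only a sketch: the copy aq, the variable names, and the choice of Pearson correlation are assumptions, and a correlation coefficient summarizes only the linear part of a relationship.

```r
aq <- datasets::airquality
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])

# Pearson correlation, using only rows where both values are present.
r <- cor(aq$Ozone, aq$Solar.R, use = "complete.obs")
r

# Mean ozone per month; rows with missing Ozone are dropped by default.
monthly_ozone <- aggregate(Ozone ~ Month, data = aq, FUN = mean)
monthly_ozone
```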