The data set has 1461 rows (observations) and 10 columns (variables). It deals with air pollution in the U.S., which has been well documented by the U.S. EPA, although downloading all the data and arranging them in a format useful to data scientists is tedious. That documentation covers four major pollutants (nitrogen dioxide, sulphur dioxide, carbon monoxide and ozone) from 2000 to 2016. The data used here come from the National Morbidity and Mortality Air Pollution Study (NMMAPS). To keep the plots manageable, the data are limited to Chicago and the years 1997–2000.
library(ggplot2)

# Load the Chicago NMMAPS data (requires the readr package)
chic <- readr::read_csv("https://raw.githubusercontent.com/Z3tt/R-Tutorials/master/ggplot2/chicago-nmmaps.csv")

# Scatter plot of daily temperature over time
ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "darkgreen") +
  labs(x = "Year", y = "Temperature (°F)",
       title = "Temperatures in Chicago between 1997 and 2001 in Degrees Fahrenheit",
       caption = "Data: NMMAPS") +
  theme(axis.text = element_text(color = "darkblue", size = 12),
        axis.text.x = element_text(face = "italic"))
The temperature follows the same seasonal pattern every year. Winters are extremely cold, with temperatures falling to 0 degrees Fahrenheit. While temperature has long been known to make pollutants more readily airborne, it is unclear how rising temperatures affect air pollution during heatwaves. A natural next step is to analyze how temperature affects ozone levels: studies have found that ozone levels increase by more than 50% as temperature rises.
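The suggested follow-up on temperature and ozone could be sketched roughly as below. This is only an illustrative sketch, not part of the original analysis: the helper name plot_temp_ozone is hypothetical, and it assumes the data frame has numeric columns named temp and o3 (the ozone column name is an assumption about the CSV). The ozone axis is left without units because the text does not state them.

```r
library(ggplot2)

# Hypothetical helper: scatter plot of ozone against temperature with a
# smoothed trend line. Assumes `df` has numeric columns `temp` and `o3`
# (the column name `o3` is an assumption, not confirmed by the text).
plot_temp_ozone <- function(df) {
  # Fail early if the assumed columns are missing
  stopifnot(all(c("temp", "o3") %in% names(df)))

  ggplot(df, aes(x = temp, y = o3)) +
    geom_point(color = "darkgreen", alpha = 0.5) +
    # Loess smoother to show the overall trend without assuming linearity
    geom_smooth(method = "loess", formula = y ~ x, se = FALSE,
                color = "darkblue") +
    labs(x = "Temperature (°F)", y = "Ozone", caption = "Data: NMMAPS")
}

# Usage, once `chic` has been loaded as above:
# plot_temp_ozone(chic)
# cor(chic$temp, chic$o3, use = "complete.obs")  # rough strength of association
```

A plot like this only shows association within the Chicago sample; it would not by itself confirm the size of the increase reported in the studies.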