Introduction

This project concerns the preliminary analysis and illustration of data related to fine particulate matter (PM2.5), an ambient air pollutant that is harmful to human health.

In the United States, the Environmental Protection Agency (EPA) is tasked with setting national ambient air quality standards for fine PM and with tracking emissions of this pollutant into the atmosphere. Approximately every three years, the EPA releases its database on PM2.5 emissions, known as the National Emissions Inventory (NEI).

For each year and for each type of PM source, the NEI records how many tons of PM2.5 were emitted from that source over the course of the entire year.
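To make the record layout concrete, here is a minimal sketch of a toy data frame shaped like the inventory. The column names `fips`, `type`, `year`, and `Emissions` match those used in the code later in this document; the rows and values are invented for illustration and are not NEI figures.

```r
# Toy records shaped like the NEI table; all values are invented.
toy <- data.frame(
  fips      = c("00001", "00001", "00001", "00002", "00002", "00002"),
  type      = c("POINT", "NONPOINT", "POINT", "POINT", "ON-ROAD", "ON-ROAD"),
  year      = c(1999, 1999, 2002, 1999, 2002, 2002),
  Emissions = c(10.5, 2.0, 8.0, 4.5, 1.0, 0.5),
  stringsAsFactors = FALSE
)

# Total tons per year, summed over every source and location.
toy_totals <- aggregate(Emissions ~ year, data = toy, FUN = sum)
print(toy_totals)
```

The formula interface to `aggregate()` keeps the original column names (`year`, `Emissions`) in the result, which can make later plotting code easier to read.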

Loading and Preparing the Data for Exploratory Analysis

This analysis covers four inventory years: 1999, 2002, 2005, and 2008. To begin, we download and unzip the data. Loading the data frames may take a few seconds because the main file is quite large.

require(downloader)
## Loading required package: downloader
dataset_url <- "https://d396qusza40orc.cloudfront.net/exdata%2Fdata%2FNEI_data.zip"
download(dataset_url, destfile = "data.zip", mode = "wb")
unzip("data.zip", exdir = "./")
NEI <- readRDS("summarySCC_PM25.rds")
SCC <- readRDS("Source_Classification_Code.rds")
findata <- with(NEI, aggregate(Emissions, by = list(year), sum))
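Because the file is large, it may be worth skipping the download when the data are already on disk. The following is a minimal sketch of that idea, not part of the original analysis: `read_rds_cached()` and `fake_fetch()` are hypothetical names, and a temporary file stands in for the real download so the example runs without network access.

```r
# Hypothetical helper: run `fetch` only when `path` is missing, then read it.
read_rds_cached <- function(path, fetch) {
  if (!file.exists(path)) {
    fetch()  # expected to create `path` as a side effect
  }
  readRDS(path)
}

# Self-contained demonstration; a temporary file stands in for the download.
demo_path   <- tempfile(fileext = ".rds")
fetch_calls <- 0
fake_fetch  <- function() {
  fetch_calls <<- fetch_calls + 1
  saveRDS(data.frame(year = c(1999, 2002), Emissions = c(3, 2)), demo_path)
}

first  <- read_rds_cached(demo_path, fake_fetch)  # file missing, fetch runs
second <- read_rds_cached(demo_path, fake_fetch)  # file present, fetch skipped
print(fetch_calls)
```

In this analysis, `fetch` could wrap the `download()` and `unzip()` calls, with `path` set to `"summarySCC_PM25.rds"`.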

Examining the Decrease in PM2.5 Emissions: 1999-2008

Total PM2.5 emissions decreased at each successive inventory year from 1999 to 2008.

plot(findata, type = "o", main = "Total PM2.5 Emissions", xlab = "Year", ylab = "PM2.5 Emissions (tons)", pch = 19, col = "darkblue", lty = 6)
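A claim of steady decline can also be checked numerically rather than read off the plot. The sketch below uses invented yearly totals (not NEI figures) and the hypothetical names `totals`, `changes`, and `always_decreasing`.

```r
# Hypothetical yearly totals (invented values, not NEI figures).
totals <- c("1999" = 100, "2002" = 80, "2005" = 75, "2008" = 50)

# diff() gives the change between consecutive inventory years.
changes <- diff(totals)

# TRUE only if every consecutive change is negative.
always_decreasing <- all(changes < 0)
print(changes)
print(always_decreasing)
```

With the real data, the same check could be run on `findata$x`, since `aggregate()` called with `by =` on a bare vector names the summed column `x`.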

Looking at Changes in PM2.5 Emissions in Baltimore City: 1999-2008

Emissions in Baltimore City, Maryland (FIPS code 24510) also fell over the period as a whole, except for an increase in 2005.

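# FIPS code 24510 identifies Baltimore City, Maryland
sub1 <- subset(NEI, fips == "24510")
balt <- tapply(sub1$Emissions, sub1$year, sum)
# Plot against the actual years rather than the index of the tapply() result
plot(as.integer(names(balt)), balt, type = "o", main = "Total PM2.5 Emissions in Baltimore City", xlab = "Year", ylab = "PM2.5 Emissions (tons)", pch = 18, col = "darkgreen", lty = 5)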

Examining Fluctuations in PM2.5 Emissions in Baltimore City By Source: 1999-2008

In the next plot, we investigate the 2005 anomaly by examining emissions from each source type (point, nonpoint, onroad, nonroad) in Baltimore City for these years.

library(ggplot2)
sub2 <- subset(NEI, fips == "24510")
balt.sources <- aggregate(sub2[c("Emissions")], list(type = sub2$type, year = sub2$year), sum)
# qplot() is deprecated in recent ggplot2 releases; ggplot() draws the same lines
ggplot(balt.sources, aes(x = year, y = Emissions, color = type)) + geom_line() +
  ggtitle("Total PM2.5 Emissions in Baltimore City by Source Type") + xlab("Year") + ylab("PM2.5 Emissions (tons)")
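The by-source comparison can also be checked numerically by reshaping the per-type totals into a year-by-type matrix and differencing along the years. This is a minimal sketch on invented data; the type labels `A` and `B` and the names `toy_sources`, `wide`, `step_changes`, and `rises` are hypothetical.

```r
# Toy per-type totals in long form; all values are invented.
toy_sources <- data.frame(
  type      = rep(c("A", "B"), each = 3),
  year      = rep(c(1999, 2002, 2005), times = 2),
  Emissions = c(10, 8, 6, 5, 4, 9)
)

# Wide matrix: one row per year, one column per source type.
wide <- tapply(toy_sources$Emissions,
               list(toy_sources$year, toy_sources$type), sum)

# Change between consecutive years, computed column by column.
step_changes <- diff(wide)

# Row and column positions in step_changes where emissions rose.
rises <- which(step_changes > 0, arr.ind = TRUE)
print(step_changes)
print(rises)
```

Applied to `balt.sources`, the same steps would point to the source types whose emissions rose between consecutive inventory years.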