Synopsis

Fine particulate matter (PM2.5) is an ambient air pollutant, and there is strong evidence that it is harmful to human health. In the United States, the Environmental Protection Agency (EPA) is tasked with setting national ambient air quality standards for fine PM and with tracking emissions of this pollutant into the atmosphere. Approximately every 3 years, the EPA releases its database on emissions of PM2.5, known as the National Emissions Inventory (NEI). More information about the NEI is available at the EPA National Emissions Inventory web site.

For each year and for each type of PM source, the NEI records how many tons of PM2.5 were emitted from that source over the course of the entire year. The data used here cover the years 1999, 2002, 2005, and 2008.

Analysis

Data Processing

Once the data is downloaded and placed in the working directory, it is loaded into R. The packages “dplyr”, “ggplot2”, “knitr”, “lemon”, “kableExtra”, and “tidyr” were used to organize and present the data.

# ggplot2 is needed for the plots by type and by city further down
library(ggplot2)

# This assumes both .rds files are already in the working directory
NEI <- readRDS("summarySCC_PM25.rds")
SCC <- readRDS("Source_Classification_Code.rds")

head(NEI,2)
##    fips      SCC Pollutant Emissions  type year
## 4 09001 10100401  PM25-PRI    15.714 POINT 1999
## 8 09001 10100404  PM25-PRI   234.178 POINT 1999

Total U.S. PM2.5 Emissions

Once the main data is loaded, we can start breaking it down and analyzing its parts. Let's start by plotting total emissions in the U.S. to see if there are any discernible trends.

tot_by_year <- tapply(NEI$Emissions, NEI$year, sum)

options(scipen=999)
# names() returns the years as character strings, so convert them for the x axis
plot(as.numeric(names(tot_by_year)), tot_by_year/1000, type = "l", lwd = 5,
     xlab = "Year", ylab = expression(PM[2.5] ~ "Emissions (Thousands of Tons)"),
     main = expression("Total U.S." ~ PM[2.5] ~ "Emissions by Year"))

As the plot shows, total U.S. emissions decreased substantially between 1999 and 2008.
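To make the grouping step concrete, here is a minimal sketch of the same tapply() pattern on a small made-up data frame. The name `toy` and its values are assumptions for illustration only; they are not NEI figures.

```r
# Toy data (made-up values, not NEI figures)
toy <- data.frame(
  year      = c(1999, 1999, 2002, 2002, 2005),
  Emissions = c(10, 5, 8, 4, 6)
)

# Sum Emissions within each year; the result is a named one-dimensional array
toy_by_year <- tapply(toy$Emissions, toy$year, sum)

# The names are character strings, so convert them before treating them as numbers
years <- as.numeric(names(toy_by_year))

print(toy_by_year)
```

Because the years come back as names rather than values, it is worth converting them explicitly whenever they are used as a numeric axis.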

Total Baltimore City PM2.5 Emissions

# Subset the records for Baltimore City, identified by the given fips code "24510"
Baltimore <- subset(NEI, fips == "24510")

# Sum emissions by year, as for the first plot
bal_by_year <- tapply(Baltimore$Emissions, Baltimore$year, sum)

options(scipen=999)
# names() returns the years as character strings, so convert them for the x axis
plot(as.numeric(names(bal_by_year)), bal_by_year/1000, type = "l", lwd = 5,
     xlab = "Year", ylab = expression(PM[2.5] ~ "Emissions (Thousands of Tons)"),
     main = expression("Total Baltimore" ~ PM[2.5] ~ "Emissions by Year"))

Baltimore shows an overall decreasing trend, but emissions rose between 2002 and 2005.
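A statement like "overall decreasing" can be backed by a number. The sketch below computes the percent change from the first to the last year of a vector of yearly totals. The helper `pct_change` and the totals in `toy_totals` are hypothetical and made up for illustration; the same call could be applied to a vector such as bal_by_year.

```r
# Hypothetical helper: percent change from the first to the last element
pct_change <- function(x) {
  x <- as.vector(x)  # drop names and array attributes
  100 * (x[length(x)] - x[1]) / x[1]
}

# Made-up yearly totals (not NEI figures)
toy_totals <- c("1999" = 200, "2002" = 150, "2005" = 180, "2008" = 100)

pct_change(toy_totals)  # -50
```

This only compares the endpoints, so an intermediate rise like the one in the toy data does not affect the result.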

Total Baltimore City PM2.5 Emissions by Type

The type variable distinguishes four kinds of source (point, nonpoint, onroad, nonroad). We can compare how emissions from each type changed in Baltimore City from 1999 to 2008.

Baltype <- subset(NEI, fips =="24510")

bal_by_type <- aggregate(Baltype$Emissions, list(Baltype$type, Baltype$year),sum)
names(bal_by_type) <- c("Type","Year","Emissions")

# Store the plot as p rather than c, so the base function c() is not masked
p <- ggplot(bal_by_type, aes(Year, Emissions/1000, color = Type)) +
  geom_point() + geom_line(lwd = 1.5) +
  ggtitle(expression("Baltimore" ~ PM[2.5] ~ "Emissions by Type and Year")) +
  xlab("Year") + ylab(expression(PM[2.5] ~ "Emissions (Thousands of Tons)")) +
  theme(plot.title = element_text(hjust = .5))

print(p)

The plot shows a decreasing trend for every source type except “Point” in Baltimore City.
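The per-type comparison can also be checked numerically instead of being read off the plot. As a minimal sketch, assuming a long table with the same three columns as bal_by_type, xtabs() reshapes it into a Type by Year table so that the first and last years can be subtracted. The data frame `toy` and its values are made up for illustration.

```r
# Made-up long-format data with the same column names as bal_by_type
toy <- data.frame(
  Type      = c("POINT", "POINT", "NONPOINT", "NONPOINT"),
  Year      = c(1999, 2008, 1999, 2008),
  Emissions = c(100, 120, 300, 150)
)

# Cross-tabulate: one row per Type, one column per Year
wide <- xtabs(Emissions ~ Type + Year, data = toy)

# Change from the first to the last year; negative means a decrease
change <- wide[, "2008"] - wide[, "1999"]
print(change)
```

In the toy data only POINT has a positive change, which is the kind of exception the plot is meant to reveal.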

Total Coal Combustion PM2.5 Emissions

# Across the United States, how have emissions from coal combustion-related sources changed from 1999-2008?
# Find the SCC codes whose short name mentions coal
scc_id <- as.character(SCC[grep("Coal", SCC$Short.Name, ignore.case = TRUE), 1])

# Keep only the NEI records with those codes
coaldata <- subset(NEI, NEI$SCC %in% scc_id)

# Sum the coal emissions by year
coal_by_year <- tapply(coaldata$Emissions, coaldata$year, sum)

options(scipen=999)
# Plot the coal totals (coal_by_year), not the national totals
plot(as.numeric(names(coal_by_year)), coal_by_year/1000, type = "l", lwd = 5,
     xlab = "Year", ylab = expression(PM[2.5] ~ "Emissions (Thousands of Tons)"),
     main = expression("Total Coal Combustion" ~ PM[2.5] ~ "Emissions by Year"))
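The coal subset relies on a case-insensitive match of "Coal" against Short.Name. One thing worth checking with this kind of filter is that a bare substring also matches words that merely contain it, such as "charcoal". The sketch below shows the difference on made-up short names; the strings in `short_names` are illustrative assumptions, and whether this affects the real SCC table has not been verified here.

```r
# Made-up short names for illustration (not taken from the SCC table)
short_names <- c("Ext Comb /Electric Gen /Bituminous Coal",
                 "Charcoal Grilling - Residential",
                 "Highway Vehicles - Gasoline")

# Plain substring match: also picks up "Charcoal"
loose <- grepl("coal", short_names, ignore.case = TRUE)

# Word-boundary match: "coal" must stand as its own word
strict <- grepl("\\bcoal\\b", short_names, ignore.case = TRUE, perl = TRUE)

print(data.frame(short_names, loose, strict))
```

Comparing the two logical vectors on the real table would show which rows, if any, are matched only by the looser pattern.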

Total Auto PM2.5 Emissions in Baltimore

# Find the SCC codes that belong to an on-road (motor vehicle) sector
scc_id2 <- as.character(SCC[grep("On-Road", SCC$EI.Sector), 1])

# Keep the on-road records for Baltimore City (fips "24510")
Baltimore2 <- subset(NEI, NEI$SCC %in% scc_id2 & NEI$fips == "24510")

# Sum the on-road emissions by year
bal_by_year2 <- tapply(Baltimore2$Emissions, Baltimore2$year, sum)

options(scipen=999)
plot(as.numeric(names(bal_by_year2)), bal_by_year2, type = "l", lwd = 5,
     xlab = "Year", ylab = expression(PM[2.5] ~ "Emissions (Tons)"),
     main = expression("Total Baltimore" ~ PM[2.5] ~ "Auto Emissions by Year"))

Total Auto PM2.5 Emissions in Baltimore vs. Los Angeles

# Find the SCC codes that belong to an on-road (motor vehicle) sector
scc_id3 <- as.character(SCC[grep("On-Road", SCC$EI.Sector), 1])

# Keep the on-road records for Baltimore City ("24510") and Los Angeles County ("06037")
BaltLA <- subset(NEI, NEI$SCC %in% scc_id3 & NEI$fips %in% c("24510","06037"))

# Sum emissions by location and year, then replace the fips codes with readable names
tot_by_year3 <- aggregate(Emissions ~ fips + year, BaltLA, sum)
tot_by_year3$fips <- ifelse(tot_by_year3$fips == "06037", "Los Angeles", "Baltimore")
names(tot_by_year3) <- c("City","Year","Emissions")

options(scipen=999)
g <- ggplot(tot_by_year3, aes(Year, Emissions/1000, color = City)) +
  geom_point() + geom_line(lwd = 1.5) +
  ggtitle(expression("Baltimore and LA" ~ PM[2.5] ~ "Auto Emissions by Year")) +
  xlab("Year") + ylab(expression(PM[2.5] ~ "Emissions (Thousands of Tons)")) +
  theme(plot.title = element_text(hjust = .5))

print(g)

Los Angeles has much higher motor vehicle emissions than Baltimore City and also shows a slight increasing trend. More recent data would show whether these trends have continued.
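When two locations differ this much in absolute emissions, the smaller one's trend is hard to read on a shared axis. One option, sketched below under stated assumptions, is to index each city to its own first year (first year = 100) so that relative changes are directly comparable. The data frame `toy` and its values are made up, and the sketch assumes rows are sorted by year within each city.

```r
# Made-up totals (not NEI figures), sorted by year within each city
toy <- data.frame(
  City      = rep(c("Baltimore", "Los Angeles"), each = 2),
  Year      = rep(c(1999, 2008), times = 2),
  Emissions = c(400, 100, 4000, 4200)
)

# ave() applies the function within each City and returns a vector
# aligned with the original rows; here it repeats each city's first value
base <- ave(toy$Emissions, toy$City, FUN = function(x) x[1])

# Index relative to the first year of each city (first year = 100)
toy$Index <- 100 * toy$Emissions / base
print(toy)
```

Plotting Index instead of Emissions would put both cities on the same relative scale, at the cost of hiding the difference in absolute size.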

Conclusion

Almost all of our results indicate that emissions decreased during the period 1999-2008. With more recent data, further investigation could identify the problem areas, but that is outside the scope of this analysis.