Fine particulate matter (PM2.5) is an ambient air pollutant that strong evidence shows is harmful to human health. In the United States, the Environmental Protection Agency (EPA) is tasked with setting national ambient air quality standards for fine PM and with tracking emissions of this pollutant into the atmosphere. Approximately every 3 years, the EPA releases its database on PM2.5 emissions, known as the National Emissions Inventory (NEI). More information about the NEI is available at the EPA National Emissions Inventory web site.
For each year and for each type of PM source, the NEI records how many tons of PM2.5 were emitted from that source over the course of the entire year. The data used here cover the years 1999, 2002, 2005, and 2008.
Once the data is downloaded and placed in the proper directory, it is loaded into the program. Packages “dplyr”, “ggplot2”, “knitr”, “lemon”, “kableExtra”, and “tidyr” were used to organize and present the data.
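# ggplot2 provides the plotting functions used for the grouped plots further down
library(ggplot2)
# This assumes both .rds files are already in the working directory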
NEI <- readRDS("summarySCC_PM25.rds")
SCC <- readRDS("Source_Classification_Code.rds")
head(NEI,2)
## fips SCC Pollutant Emissions type year
## 4 09001 10100401 PM25-PRI 15.714 POINT 1999
## 8 09001 10100404 PM25-PRI 234.178 POINT 1999
Once the main data is loaded, we can start analyzing it piece by piece. Let's start by plotting the total emissions in the U.S. to see if there are any discernible trends.
tot_by_year <- tapply(NEI$Emissions, NEI$year, sum)
options(scipen=999)
# names(tot_by_year) holds the years as strings, so convert them for the x-axis
plot(as.numeric(names(tot_by_year)), tot_by_year/1000, type = "l", xlab = "Year", ylab = expression(PM[2.5] ~ "Emissions (Thousands of Tons)"), main = expression("Total U.S." ~ PM[2.5] ~ "Emissions by Year"), lwd = 5)
As the plot shows, total emissions decreased substantially between 1999 and 2008.
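The yearly totals above come from `tapply()`, which splits one vector by the levels of another and applies a function to each group. The following is a minimal sketch of that step on a toy data frame; the values are hypothetical and are not taken from the NEI.

```r
# Toy data (hypothetical values, not taken from the NEI)
toy <- data.frame(
  year      = c(1999, 1999, 2002, 2002, 2002),
  Emissions = c(10, 5, 4, 3, 1)
)

# tapply() splits Emissions by year and applies sum() to each group
toy_by_year <- tapply(toy$Emissions, toy$year, sum)
print(toy_by_year)

# The names of the result are the years stored as character strings,
# so convert them before using them as a numeric x-axis
years <- as.numeric(names(toy_by_year))
print(years)
```

Here the 1999 group sums to 15 and the 2002 group sums to 8, which is the same kind of per-year total plotted for the full data set.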
# Subset the records for Baltimore City, identified by fips == "24510" (given)
Baltimore <- subset(NEI, fips == "24510")
# Total the emissions by year, as in the first plot
bal_by_year <- tapply(Baltimore$Emissions, Baltimore$year, sum)
options(scipen=999)
plot(as.numeric(names(bal_by_year)), bal_by_year/1000, type = "l", xlab = "Year", ylab = expression(PM[2.5] ~ "Emissions (Thousands of Tons)"), main = expression("Total Baltimore" ~ PM[2.5] ~ "Emissions by Year"), lwd = 5)
Baltimore shows an overall decreasing trend, although emissions rose between 2002 and 2005 before falling again.
The type variable distinguishes four kinds of sources (point, nonpoint, onroad, nonroad). We can compare how emissions from each type changed in Baltimore City from 1999 to 2008.
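One way to put a number on a trend like this is to compute the percent change between the first and last years, along with the year-to-year differences. The sketch below uses a hypothetical helper, `pct_change()`, and made-up totals; neither is part of the original analysis.

```r
# Hypothetical helper: percent change between the first and last element
pct_change <- function(totals) {
  first <- totals[[1]]
  last  <- totals[[length(totals)]]
  100 * (last - first) / first
}

# Illustrative yearly totals (made-up numbers, not NEI results)
example_totals <- c("1999" = 200, "2002" = 150, "2005" = 180, "2008" = 100)

# Overall change from the first to the last year, in percent
overall <- pct_change(example_totals)
print(overall)

# Differences between consecutive years; a positive entry marks an increase
steps <- diff(example_totals)
print(steps)
```

With these made-up totals the overall change is -50 percent, while the positive middle difference flags a temporary increase. Calling `pct_change(bal_by_year)` would apply the same calculation to the Baltimore totals.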
Baltype <- subset(NEI, fips =="24510")
bal_by_type <- aggregate(Baltype$Emissions, list(Baltype$type, Baltype$year),sum)
names(bal_by_type) <- c("Type","Year","Emissions")
# Named p rather than c to avoid masking base::c(); points plus lines, one color per type
p <- ggplot(bal_by_type, aes(Year, Emissions/1000, color = Type)) + geom_point() + geom_line(lwd=1.5) + ggtitle(expression("Baltimore" ~ PM[2.5] ~ "Emissions by Type and Year")) +
xlab("Year") + ylab(expression(PM[2.5] ~ "Emissions (Thousands of Tons)")) + theme(plot.title = element_text(hjust = .5))
print(p)
The plot shows that every source type except “Point” has followed a decreasing trend in Baltimore City.
# Across the United States, how have emissions from coal combustion-related sources changed from 1999-2008?
# Collect the SCC codes whose short name mentions coal
scc_id <- as.character(SCC[grep("Coal",SCC$Short.Name, ignore.case = TRUE),1])
# Keep only the NEI records with those codes
coaldata <- subset(NEI, NEI$SCC %in% scc_id)
# Total the coal emissions by year
coal_by_year <- tapply(coaldata$Emissions, coaldata$year, sum)
options(scipen=999)
# Plot the coal totals (coal_by_year), not the national totals (tot_by_year)
plot(as.numeric(names(coal_by_year)), coal_by_year/1000, type = "l", xlab = "Year", ylab = expression(PM[2.5] ~ "Emissions (Thousands of Tons)"), main = expression("Total Coal Combustion" ~ PM[2.5] ~ "Emissions by Year"), lwd = 5)
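The coal subset depends entirely on which rows of the SCC table the pattern matches, so it is worth checking what a case-insensitive substring search actually picks up. The sketch below uses a toy lookup table with invented codes and names. The stricter rule (whole word "coal" plus a "comb" marker) is only one possible assumption about how to define "coal combustion-related", not the rule used in this analysis.

```r
# Toy lookup table (invented codes and names, not rows from the real SCC file)
toy_scc <- data.frame(
  SCC        = c("A1", "A2", "A3", "A4"),
  Short.Name = c("Ext Comb /Electric Gen /Bituminous Coal",
                 "Charcoal Grilling",
                 "Highway Vehicles - Gasoline",
                 "Coal Mining"),
  stringsAsFactors = FALSE
)

# Loose rule: any name containing "coal", ignoring case.
# This also matches "Charcoal" because it is a plain substring search.
loose <- toy_scc$SCC[grep("coal", toy_scc$Short.Name, ignore.case = TRUE)]
print(loose)

# Stricter rule (an assumption): "coal" as a whole word AND a "comb" marker
is_coal <- grepl("\\bcoal\\b", toy_scc$Short.Name, ignore.case = TRUE, perl = TRUE)
is_comb <- grepl("comb", toy_scc$Short.Name, ignore.case = TRUE)
strict  <- toy_scc$SCC[is_coal & is_comb]
print(strict)
```

On the toy table the loose rule returns three codes and the stricter rule returns one. Inspecting the matched `Short.Name` values in the real SCC table before summing would show whether the distinction matters for the NEI data.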
# Motor vehicle sources: SCC codes whose EI sector mentions "On-Road"
scc_id2 <- as.character(SCC[grep("On-Road",SCC$EI.Sector),1])
# Keep only Baltimore City records from those sources
Baltimore2 <- subset(NEI, NEI$SCC %in% scc_id2 & NEI$fips == "24510")
bal_by_year2 <- tapply(Baltimore2$Emissions, Baltimore2$year, sum)
options(scipen=999)
plot(as.numeric(names(bal_by_year2)), bal_by_year2, type = "l", xlab = "Year", ylab = expression(PM[2.5] ~ "Emissions (Tons)"), main = expression("Total Baltimore" ~ PM[2.5] ~ "Auto Emissions by Year"), lwd = 5)
# Same motor vehicle codes, now for Baltimore City (24510) and Los Angeles County (06037)
scc_id3 <- as.character(SCC[grep("On-Road",SCC$EI.Sector),1])
BaltLA <- subset(NEI, NEI$SCC %in% scc_id3 & NEI$fips %in% c("24510","06037"))
# Total the emissions for each city and year
tot_by_year3 <- aggregate(Emissions ~ fips+year,BaltLA,sum)
# Replace the fips codes with readable city names
tot_by_year3$fips <- ifelse(tot_by_year3$fips == "06037", "Los Angeles","Baltimore")
names(tot_by_year3) <- c("City","Year","Emissions")
options(scipen=999)
g <- ggplot(tot_by_year3, aes(Year, Emissions/1000, color = City)) + geom_point() + geom_line(lwd=1.5) + ggtitle(expression("Baltimore and LA" ~ PM[2.5] ~ "Auto Emissions by Year")) +
xlab("Year") + ylab(expression(PM[2.5] ~ "Emissions (Thousands of Tons)")) + theme(plot.title = element_text(hjust = .5))
print(g)
Los Angeles has much higher motor vehicle emissions than Baltimore City and also shows a slightly increasing trend. It would be interesting to see more recent data to identify additional trends.
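Because the two cities differ so much in absolute emissions, their trends can be easier to compare when each series is scaled to its own first year. The sketch below does this with `ave()` on a toy data frame; the city labels and values are hypothetical, and the `Relative` column is not part of the original analysis.

```r
# Toy data (hypothetical cities and values, not NEI results)
toy_city <- data.frame(
  City      = rep(c("City A", "City B"), each = 3),
  Year      = rep(c(1999, 2002, 2005), times = 2),
  Emissions = c(100, 80, 50, 4000, 4400, 4200)
)

# ave() applies the function within each city and returns a vector aligned
# with the original rows. Rows are in year order within each city, so x[1]
# is that city's first-year value.
toy_city$Relative <- ave(toy_city$Emissions, toy_city$City,
                         FUN = function(x) x / x[1])
print(toy_city)
```

A `Relative` value below 1 means emissions fell compared with the first year, and a value above 1 means they rose. Since `tot_by_year3` has the same `City`, `Year`, and `Emissions` columns, the same `ave()` call could be applied to it, provided its rows are in year order within each city.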
Almost all of our results indicate that emissions decreased during the period 1999-2008. With more recent data, further investigation could identify the problem areas, but that is outside the scope of this analysis.