Introduction

Fine particulate matter (PM2.5) is an ambient air pollutant for which there is strong evidence of harm to human health. In the United States, the Environmental Protection Agency (EPA) is tasked with setting national ambient air quality standards for fine PM and with tracking emissions of this pollutant into the atmosphere. Approximately every 3 years, the EPA releases its database on emissions of PM2.5. This database is known as the National Emissions Inventory (NEI). More about this data can be found at this link.

Reading Data

NEI <- readRDS("summarySCC_PM25.rds")
SCC <- readRDS("Source_Classification_Code.rds")

Required libraries

library(ggplot2)

Have total emissions from PM2.5 decreased in the United States from 1999 to 2008?

To get the total emissions for each year from 1999 to 2008, sum the emissions by year.

data_plot <- aggregate(Emissions ~ year,NEI, sum)
qplot(year,Emissions,data = data_plot, 
      geom = c("point","line"),
      main = "Total Emission over the years")

Total emissions have decreased from each recorded year to the next, which is good news!
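
The trend read off the plot can also be checked numerically. The following is a minimal sketch on a small made-up data frame (the `toy` values are hypothetical and are not taken from the NEI file): it sums emissions by year with `aggregate()`, using the same formula as above, and then uses `diff()` to test whether every total is lower than the one before it.

```r
# Hypothetical toy data standing in for NEI; the values are invented
toy <- data.frame(
  year      = c(1999, 1999, 2002, 2002, 2005, 2008),
  Emissions = c(10, 5, 8, 4, 9, 3)
)

# Sum emissions within each year (same formula interface as used above)
toy_totals <- aggregate(Emissions ~ year, toy, sum)

# Change between consecutive totals; all negative means a steady decrease
changes <- diff(toy_totals$Emissions)
always_decreasing <- all(changes < 0)
```

Applying the same two lines to `data_plot` would give the year-to-year changes for the real totals.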

Have total emissions from PM2.5 decreased in Baltimore City from 1999 to 2008?

# Subsetting data only for Baltimore City
subdat <- subset(NEI,NEI$fips == "24510")

# Aggregating the data based on years
data_plot <- aggregate(Emissions ~ year,subdat, sum)

# Plot the data
qplot(year,Emissions,data = data_plot,
      geom = c("point","line"),
      main = "Total Emission over the years in Baltimore City")

The total emissions in Baltimore City show a varying pattern rather than a steady decline, in contrast to the overall graph. Interesting!

How have PM2.5 emissions varied for the different sources in Baltimore City from 1999 to 2008?

# Converting the type as factor
subdat$type <- as.factor(subdat$type)

# Aggregate data based on years
data_plot <- aggregate(Emissions ~ type+year,subdat, sum)

# Plot the data
qplot(year,Emissions,data = data_plot,
      geom = c("point","path"),
      col = type, 
      main = "Total Emission from different sources over the years in Baltimore City")

Here is another interesting pattern! The highest-emitting PM2.5 source type is nonpoint and the lowest-emitting source type is on-road vehicles.
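
To read off the highest and lowest source type without relying on the plot, the two-way aggregate can be reshaped into a year-by-type table. This is a minimal sketch with invented numbers (the `toy` data frame is hypothetical); it uses only base R, with `xtabs()` building the table and `which.max()` / `which.min()` picking the extreme column in each row.

```r
# Hypothetical toy data; the emission values are invented for illustration
toy <- data.frame(
  type      = c("POINT", "NONPOINT", "ON-ROAD", "POINT", "NONPOINT", "ON-ROAD"),
  year      = c(1999, 1999, 1999, 2008, 2008, 2008),
  Emissions = c(3, 20, 2, 4, 12, 1)
)

# Same two-way aggregation as above: one row per type and year
by_type_year <- aggregate(Emissions ~ type + year, toy, sum)

# Reshape to a table with years as rows and source types as columns
wide <- xtabs(Emissions ~ year + type, by_type_year)

# Name of the highest and lowest emitting type in each year
top_source    <- colnames(wide)[apply(wide, 1, which.max)]
bottom_source <- colnames(wide)[apply(wide, 1, which.min)]
```

Printing `wide` gives a compact view that makes it easy to see whether the ranking of source types holds in every year or only in some.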

How have emissions from motor vehicle sources changed from 1999-2008 in Baltimore City?

# Taking SCC IDs for motor vehicle (On-Road) sources
scc_id <- as.character(SCC[grep("On-Road",SCC$EI.Sector),1])

# Subsetting the data only for Motor Vehicles & Baltimore city
subdat <- subset(NEI, NEI$SCC %in% scc_id & NEI$fips == "24510")

# Aggregate data based on years
data_plot <- aggregate(Emissions ~ year,subdat, sum)

# Plot the data
qplot(year,Emissions,data = data_plot,
      geom = c("point", "path"),
      main = "Emission From Motor Vehicle Sources")

Again, the emissions from motor vehicles have also decreased over the years.
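
The selection step above works in two stages: match a pattern against the sector names to collect SCC IDs, then keep only the emission records whose SCC is in that set. A minimal sketch on toy tables follows. The column names `SCC` and `EI.Sector` follow the data used above, but the rows, the IDs `A1` to `A3`, and the emission values are hypothetical.

```r
# Hypothetical lookup table standing in for SCC; rows are illustrative only
toy_scc <- data.frame(
  SCC       = c("A1", "A2", "A3"),
  EI.Sector = c("Mobile - On-Road Gasoline Light Duty Vehicles",
                "Fuel Comb - Electric Generation - Coal",
                "Mobile - On-Road Diesel Heavy Duty Vehicles"),
  stringsAsFactors = FALSE
)

# Hypothetical emission records standing in for NEI
toy_nei <- data.frame(
  SCC       = c("A1", "A2", "A3", "A1"),
  Emissions = c(1.5, 7, 2.5, 1),
  year      = c(1999, 1999, 2008, 2008)
)

# Stage 1: IDs whose sector name contains the literal text "On-Road"
ids <- toy_scc$SCC[grepl("On-Road", toy_scc$EI.Sector, fixed = TRUE)]

# Stage 2: keep only the records with a matching ID
vehicle_rows <- toy_nei[toy_nei$SCC %in% ids, ]
```

Here `fixed = TRUE` treats the pattern as a plain string rather than a regular expression, which is a design choice for this sketch and not something the original code does.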

How do emissions from motor vehicle sources in Baltimore City compare with emissions from motor vehicle sources in Los Angeles County?

# Taking SCC IDs for motor vehicle (On-Road) sources
scc_id <- as.character(SCC[grep("On-Road",SCC$EI.Sector),1])

# Subsetting the data only for Motor Vehicles in Baltimore City and Los Angeles County
subdat <- subset(NEI, NEI$SCC %in% scc_id & NEI$fips %in% c("24510","06037"))

# Aggregate data based on years and city
data_plot <- aggregate(Emissions ~ fips+year,subdat, sum)
data_plot$fips <- gsub("06037","Los Angeles County",data_plot$fips)
data_plot$fips <- gsub("24510","Baltimore City",data_plot$fips)

# Plot the data
qplot(year,Emissions,data = data_plot,
      geom = c("point", "path"), 
      col = fips,
      main = "Emissions from Baltimore City vs. Los Angeles")

Surprisingly, Baltimore has higher pollution levels than Los Angeles!
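
As an alternative to the two `gsub()` calls used for relabeling, the FIPS codes can be mapped to names through a named lookup vector, which keeps the code-to-name pairs in one place. This is a minimal sketch: the `toy` data frame and its emission values are invented, and `county_names` is a hypothetical helper name; only the two FIPS codes and their labels come from the analysis above.

```r
# Hypothetical toy data; emission values are invented for illustration
toy <- data.frame(
  fips      = c("24510", "06037", "24510", "06037"),
  year      = c(1999, 1999, 2008, 2008),
  Emissions = c(2, 5, 1, 6),
  stringsAsFactors = FALSE
)

# Lookup vector: names are FIPS codes, values are display labels
county_names <- c("24510" = "Baltimore City",
                  "06037" = "Los Angeles County")

# Index the lookup by code; unname() drops the codes from the result
toy$county <- unname(county_names[toy$fips])
```

Any code missing from the lookup would come back as `NA`, which makes an unexpected FIPS value easy to spot.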