Fine particulate matter (PM2.5) is an ambient air pollutant, and there is strong evidence that it is harmful to human health. In the United States, the Environmental Protection Agency (EPA) is tasked with setting national ambient air quality standards for fine PM and with tracking emissions of this pollutant into the atmosphere. Approximately every 3 years, the EPA releases its database on emissions of PM2.5, known as the National Emissions Inventory (NEI). More information about these data is available on the EPA National Emissions Inventory website.
# Load the emissions data and the source classification table
NEI <- readRDS("summarySCC_PM25.rds")
SCC <- readRDS("Source_Classification_Code.rds")
library(ggplot2)
To get the total emissions for each year from 1999 to 2008, sum the emissions from all sources by year.
# Sum emissions from all sources for each year
data_plot <- aggregate(Emissions ~ year, data = NEI, FUN = sum)
# Plot the yearly totals
qplot(year, Emissions, data = data_plot,
      geom = c("point", "line"),
      main = "Total Emissions over the Years")
Total emissions have decreased from one inventory year to the next, which is good news!
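A trend read off a plot can also be checked numerically. The following is a minimal sketch using made-up totals (the data frame `toy_totals` and its values are assumptions for illustration, not NEI results); the same `diff()` call could be applied to the `Emissions` column of `data_plot`.

```r
# Made-up yearly totals, for illustration only (not the NEI results)
toy_totals <- data.frame(
  year = c(1999, 2002, 2005, 2008),
  Emissions = c(100, 80, 75, 50)
)

# Sort by year, then take the change between consecutive rows
toy_totals <- toy_totals[order(toy_totals$year), ]
changes <- diff(toy_totals$Emissions)

# TRUE only if every consecutive change is a decrease
steady_decline <- all(changes < 0)

print(changes)
print(steady_decline)
```

A negative value in every position of `changes` means the total fell at each step; a mix of signs would indicate a varying pattern instead.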
# Subset the data for Baltimore City (fips 24510)
subdat <- subset(NEI, NEI$fips == "24510")
# Aggregate emissions by year
data_plot <- aggregate(Emissions ~ year, data = subdat, FUN = sum)
# Plot the data
qplot(year, Emissions, data = data_plot,
      geom = c("point", "line"),
      main = "Total Emissions over the Years in Baltimore City")
The total emissions in Baltimore City show a varying pattern rather than a steady decline, in contrast to the overall totals. Interesting!
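The plots in this report use `qplot()`, which recent ggplot2 releases (3.4.0 and later, to the best of my knowledge) mark as deprecated. As a hedged sketch, the same point-and-line plot can be built with `ggplot()` and explicit layers. The data frame `toy_plot_data` and its values are made up for illustration.

```r
library(ggplot2)

# Made-up yearly totals, for illustration only (not the NEI results)
toy_plot_data <- data.frame(
  year = c(1999, 2002, 2005, 2008),
  Emissions = c(30, 24, 31, 18)
)

# Points plus a connecting line, with the title set through ggtitle()
p <- ggplot(toy_plot_data, aes(x = year, y = Emissions)) +
  geom_point() +
  geom_line() +
  ggtitle("Total Emissions over the Years (toy data)")

print(p)
```

Swapping `toy_plot_data` for `data_plot` should give a plot equivalent to the `qplot()` call, without the deprecation warning.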
# Convert the source type to a factor
subdat$type <- as.factor(subdat$type)
# Aggregate emissions by source type and year
data_plot <- aggregate(Emissions ~ type + year, data = subdat, FUN = sum)
# Plot the data, one colored line per source type
qplot(year, Emissions, data = data_plot,
      geom = c("point", "path"),
      col = type,
      main = "Total Emissions from Different Sources over the Years in Baltimore City")
Here is another interesting pattern! In Baltimore City, the highest PM2.5 emission source is nonpoint and the lowest emitting source is on-road vehicles.
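The highest and lowest sources can also be ranked directly instead of being read off the plot. This is a minimal sketch with made-up data: `toy_by_type` and the type labels "A", "B" and "C" are assumptions for illustration, not the NEI source types.

```r
# Made-up emissions by source type and year, for illustration only
toy_by_type <- data.frame(
  type = rep(c("A", "B", "C"), times = 2),
  year = rep(c(1999, 2008), each = 3),
  Emissions = c(10, 50, 5, 8, 40, 2)
)

# Total over all years for each source type
type_totals <- aggregate(Emissions ~ type, data = toy_by_type, FUN = sum)

# Order from the largest total to the smallest
type_totals <- type_totals[order(-type_totals$Emissions), ]
print(type_totals)

# First row is the highest emitter, last row is the lowest
highest <- as.character(type_totals$type[1])
lowest <- as.character(type_totals$type[nrow(type_totals)])
```

Note that this ranks types by their total over all years; a ranking for a single year could differ.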
# Take the SCC ids for on-road (motor vehicle) sources
scc_id <- as.character(SCC[grep("On-Road", SCC$EI.Sector), 1])
# Subset the data for motor vehicles in Baltimore City
subdat <- subset(NEI, NEI$SCC %in% scc_id & NEI$fips == "24510")
# Aggregate emissions by year
data_plot <- aggregate(Emissions ~ year, data = subdat, FUN = sum)
# Plot the data
qplot(year, Emissions, data = data_plot,
      geom = c("point", "path"),
      main = "Emissions from Motor Vehicle Sources in Baltimore City")
Emissions from motor vehicles in Baltimore City have also decreased over the years.
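The motor vehicle subset depends on two steps: matching sector names against a pattern, then filtering the emissions rows by the matching codes. The sketch below walks through both on miniature tables. `toy_scc` and `toy_nei`, including their codes and values, are hypothetical stand-ins that only mimic the column names used above.

```r
# Hypothetical miniature versions of the two tables, for illustration only
toy_scc <- data.frame(
  SCC = c("001", "002", "003"),
  EI.Sector = c("Mobile - On-Road Gasoline Light Duty Vehicles",
                "Fuel Comb - Electric Generation - Coal",
                "Mobile - On-Road Diesel Heavy Duty Vehicles"),
  stringsAsFactors = FALSE
)
toy_nei <- data.frame(
  SCC = c("001", "002", "003", "001"),
  fips = c("24510", "24510", "24510", "06037"),
  Emissions = c(1.5, 9.0, 2.5, 4.0),
  stringsAsFactors = FALSE
)

# grepl() is TRUE for every sector name that contains "On-Road"
is_on_road <- grepl("On-Road", toy_scc$EI.Sector, fixed = TRUE)
on_road_ids <- toy_scc$SCC[is_on_road]

# Keep rows whose code is in the on-road set and whose county matches
keep <- toy_nei$SCC %in% on_road_ids & toy_nei$fips == "24510"
toy_subset <- toy_nei[keep, ]
print(toy_subset)
```

Printing `on_road_ids` before filtering is a cheap way to confirm that the pattern selects the intended sectors and nothing else.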
# Take the SCC ids for on-road (motor vehicle) sources
scc_id <- as.character(SCC[grep("On-Road", SCC$EI.Sector), 1])
# Subset the data for motor vehicles in Baltimore City and Los Angeles County
subdat <- subset(NEI, NEI$SCC %in% scc_id & NEI$fips %in% c("24510", "06037"))
# Aggregate emissions by county and year
data_plot <- aggregate(Emissions ~ fips + year, data = subdat, FUN = sum)
# Replace the fips codes with readable names
data_plot$fips <- gsub("06037", "Los Angeles County", data_plot$fips)
data_plot$fips <- gsub("24510", "Baltimore City", data_plot$fips)
# Plot the data, one colored line per county
qplot(year, Emissions, data = data_plot,
      geom = c("point", "path"),
      col = fips,
      main = "Motor Vehicle Emissions: Baltimore City vs. Los Angeles County")
Surprisingly, Baltimore has higher pollution levels than Los Angeles!
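A comparison between two places involves two separate questions: which one emits more in absolute terms, and which one changed more over time. As a minimal sketch, both can be tabulated side by side. The data frame `toy_compare`, its region names and its values are made up for illustration and are not NEI values.

```r
# Made-up totals for two regions, for illustration only (not NEI values)
toy_compare <- data.frame(
  region = rep(c("Region A", "Region B"), each = 2),
  year = rep(c(1999, 2008), times = 2),
  Emissions = c(200, 100, 4000, 4400)
)

# One row per year, one column per region
wide <- with(toy_compare, tapply(Emissions, list(year, region), sum))
print(wide)

# Percent change from the first to the last year for each region
pct_change <- 100 * (wide["2008", ] - wide["1999", ]) / wide["1999", ]
print(pct_change)
```

Applying the same steps to `data_plot`, with `fips` in place of `region`, would give the yearly totals for both counties in one table, which makes the absolute levels easier to compare than two lines on a shared axis.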