A dashboard for analyzing US air pollutant emissions

E. Bertrand
27th June 2016

Project assignment for “Developing Data Products” course

Why this solution?

  • The US Environmental Protection Agency (EPA) publishes a yearly, cumulative file with the emissions of seven key air pollutants (carbon monoxide, ammonia, oxides of nitrogen, etc.), organized by State and by Tier 1 source category (highway vehicles, fuel comb. industrial, wildfires, etc.).

  • To analyze and compare basic trends in these pollutants visually, an interactive dashboard has been developed with shiny:

    • It lets you dynamically select the pollutant, the State, and the range of years. Optionally, you can compare two different States.
    • It also lets you split the data by source, to help understand the reasons behind a trend.
  • But before the EPA file can be used in the dashboard, it must be cleaned and reorganized (see next slide).
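
The selections described above can be sketched outside shiny as a plain filter-and-sum over tidy rows. This is only an illustration in Python, not the dashboard's actual R code: the function name `select_emissions` and its parameters are hypothetical, and the two rows are the sample rows of the tidy dataset, reduced to the PM25 column.

```python
# Two rows in the tidy layout (state, year, source, one column per pollutant).
# Only the PM25 column is kept here to keep the sketch short.
TIDY_ROWS = [
    {"state": "Alabama", "year": 1996, "source": "1. Vehicles", "PM25": 9.8120},
    {"state": "Alaska", "year": 2000, "source": "2. Fuel comb.", "PM25": 3.0814},
]


def select_emissions(rows, pollutant, states, first_year, last_year, by_source=False):
    """Sum one pollutant per state and year, optionally split by source.

    Hypothetical helper: it mirrors the dashboard inputs (pollutant, one or
    two States, a range of years, and the "split by source" option).
    """
    totals = {}
    for row in rows:
        if row["state"] not in states:
            continue
        if not first_year <= row["year"] <= last_year:
            continue
        key = (row["state"], row["year"])
        if by_source:
            key += (row["source"],)
        totals[key] = totals.get(key, 0.0) + row[pollutant]
    return totals


# Compare two States over a range of years, then split the same selection by source.
both = select_emissions(TIDY_ROWS, "PM25", {"Alabama", "Alaska"}, 1996, 2000)
split = select_emissions(TIDY_ROWS, "PM25", {"Alabama", "Alaska"}, 1996, 2000, by_source=True)
print(both)
print(split)
```

In the real application these choices would come from shiny input widgets, and the resulting totals would feed the plots.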

Cleaning and tidying the original data

  • The original file has an untidy and complex structure: pollutants are in rows and years are in columns; the source categories are too detailed for practical use in a dashboard (Tier 1 has more than 15 levels); State names are abbreviated, etc. Here is a partial view of it (2 rows and the first 9 columns):
    STATE_FIPS  STATE_ABBR  tier1_code  tier1_description       pollutant_code  emissions90  emissions96  emissions97  emissions98
    01          AL          01          FUEL COMB. ELEC. UTIL.  CO              6.869        8.069        8.047        8.122
    01          AL          05          METALS PROCESSING       NH3             0.154        0.098        0.103        0.101
  • As part of the shiny application, there is an R script (based on the tidyr, dplyr and stringr packages) that solves these problems, producing a tidy dataset with a simpler structure of sources. Here is a partial view of it (2 rows):
    state    year  source         CO         NH3     NOX       PM10     PM25    SO2      VOC
    Alabama  1996  1. Vehicles    2038.5439  5.3757  256.1197  11.4961  9.8120  13.3785  185.7999
    Alaska   2000  2. Fuel comb.  44.6603    0.0001  14.7646   5.1407   3.0814  5.5500   6.3981
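
The reshaping idea can be illustrated with a small standard-library Python sketch; it is not the author's R script, which is not reproduced here. It rests on several assumptions: the two sample rows are written as comma-separated text for convenience, two-digit year suffixes of 90 and above are read as 19xx and the rest as 20xx, and the helper names `full_year` and `tidy` are hypothetical. The simplification of source categories is left out because the mapping is not given in these slides.

```python
import csv
import io
import re

# The two sample rows of the original file, written as CSV for illustration
# (assumption: the real file format may differ).
RAW = """STATE_FIPS,STATE_ABBR,tier1_code,tier1_description,pollutant_code,emissions90,emissions96,emissions97,emissions98
01,AL,01,FUEL COMB. ELEC. UTIL.,CO,6.869,8.069,8.047,8.122
01,AL,05,METALS PROCESSING,NH3,0.154,0.098,0.103,0.101
"""

# Partial lookup from abbreviation to full State name.
STATE_NAMES = {"AL": "Alabama", "AK": "Alaska"}


def full_year(two_digits):
    """Expand a two-digit suffix (assumption: 90-99 -> 19xx, otherwise 20xx)."""
    yy = int(two_digits)
    return 1900 + yy if yy >= 90 else 2000 + yy


def tidy(raw_text):
    """Move years from columns to rows and pollutants from rows to columns."""
    table = {}
    for row in csv.DictReader(io.StringIO(raw_text)):
        state = STATE_NAMES.get(row["STATE_ABBR"], row["STATE_ABBR"])
        source = row["tier1_description"]
        pollutant = row["pollutant_code"]
        for column, value in row.items():
            match = re.fullmatch(r"emissions(\d{2})", column)
            if match is None or value == "":
                continue  # not a year column, or no value reported
            key = (state, full_year(match.group(1)), source)
            # One entry per (state, year, source); one field per pollutant.
            table.setdefault(key, {})[pollutant] = float(value)
    return table


table = tidy(RAW)
for (state, year, source), pollutants in sorted(table.items()):
    print(state, year, source, pollutants)
```

Each key of `table` corresponds to one row of the tidy dataset, and each pollutant code becomes a column. With tidyr, the same two moves would typically be a gather step followed by a spread step.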

Dashboard value: answering key questions

This plot can be produced by the dashboard. It shows the evolution of PM25 emissions in California since 1996.

Why did it suddenly grow in 2008-2010?



We can see the answer with one of the plots the dashboard can create.

After splitting emissions by source, it is clear that the wave of forest fires in those years explains the abrupt increase.

[Plot: PM25 emissions in California since 1996]

[Plot: PM25 emissions in California since 1996, split by source]

References and links [*]

[*] It has been reported that some external links from RPubs do not work properly (e.g. links to GitHub). You can work around this problem by right-clicking the link and choosing the “Open the link in a new window” option.