PM2.5 is fine particulate matter suspended in the air. It is an ambient air pollutant, and there is strong evidence that it is harmful to human health. The National Emissions Inventory database records PM2.5 emissions data for the United States every three years.
The overall goal of this assignment is to explore the data from 1999 to 2008 and answer the questions below. 1
- Have total emissions from PM2.5 decreased in the United States from 1999 to 2008?
Comments:
The original plot is enough to answer the question, but it does not tell the story efficiently.
A good visualization answers the question directly. The optimized plot clearly shows how PM2.5 emissions changed over the years.
Key changes from the original bar plot to the optimized plot:
Add a trend line and an arrow to show the downtrend. This is exactly what the question asks about.
Put the years inside the bars instead of on the x-axis, which makes the time trend easier to see.
Change the y-axis labels from scientific notation (e.g. “6e+06”) to plain numbers. This is clearer, and some readers may not be data professionals.
Change the background color to lightblue. This is the color of a clear sky and may evoke the air-pollution context of PM2.5.
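The two data-side steps behind this plot, totaling emissions per year and labeling values in plain numbers rather than scientific notation, can be sketched as follows. This is only a minimal illustration in plain Python, not the code used for the plots: the records are made-up toy values (not real NEI data), and the names `total_by_year` and `plain_label` are hypothetical helpers introduced here.

```python
from collections import defaultdict

# Hypothetical toy records (year, emissions in tons); NOT real NEI values.
records = [
    (1999, 1200000.0), (1999, 800000.0),
    (2002, 1500000.0),
    (2005, 900000.0), (2005, 400000.0),
    (2008, 1000000.0),
]


def total_by_year(rows):
    """Sum emissions over all records of each year, sorted by year."""
    totals = defaultdict(float)
    for year, emissions in rows:
        totals[year] += emissions
    return dict(sorted(totals.items()))


def plain_label(value):
    """Format a number with thousands separators, avoiding forms like 6e+06."""
    return f"{value:,.0f}"


totals = total_by_year(records)

# Labels that could be placed on the y-axis or on top of the bars.
labels = {year: plain_label(total) for year, total in totals.items()}

# Compare the last year with the first year to answer the question directly.
decreased = totals[max(totals)] < totals[min(totals)]

for year, label in labels.items():
    print(year, label)
print("Decreased from first to last year:", decreased)
```

Comparing only the first and last year mirrors the wording of the question; the trend line in the plot additionally shows what happens in between.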
- Of the four types of sources indicated by the type (point, nonpoint, onroad, nonroad) variable, which of these four sources have seen decreases in emissions from 1999–2008? Which have seen increases in emissions from 1999–2008?
Comments:
Looking carefully at each group in the original plot, we can answer the question. However, the plot is not clear enough to give the answer at first glance.
It also carries extra information that is irrelevant to the question. For example, at first look we may get distracted by the big bars in the NONPOINT group, while the question is not about which group contributes the most; it asks about the change within each group.
The optimized plots show the change in each group clearly and answer the question directly.
The most significant change from the original bar plot to the optimized plots is replacing one bar plot containing all groups with four small plots, one per group, drawn with lines and points.
With this change, we can see which groups go down without having to think about each group's share of total emissions.
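The idea of looking at each source type on its own can be sketched in a few lines of plain Python. Again, this is only an illustration under stated assumptions: the records are made-up toy values (not real NEI data, so the printed directions say nothing about the real answer), and `series_by_type` and `direction` are hypothetical helper names. Each per-type series corresponds to one of the four small line plots.

```python
from collections import defaultdict

# Hypothetical toy records (year, source type, emissions); NOT real NEI values.
records = [
    (1999, "point", 100.0), (2002, "point", 90.0), (2008, "point", 120.0),
    (1999, "nonpoint", 5000.0), (2002, "nonpoint", 4000.0), (2008, "nonpoint", 3000.0),
    (1999, "onroad", 50.0), (2008, "onroad", 20.0),
    (1999, "nonroad", 80.0), (2008, "nonroad", 40.0),
]


def series_by_type(rows):
    """Build one (year, total) series per source type, sorted by year."""
    totals = defaultdict(lambda: defaultdict(float))
    for year, source_type, emissions in rows:
        totals[source_type][year] += emissions
    return {t: sorted(years.items()) for t, years in totals.items()}


def direction(series):
    """Compare the last year's total with the first year's total."""
    first, last = series[0][1], series[-1][1]
    if last < first:
        return "decrease"
    if last > first:
        return "increase"
    return "no change"


# Each type is judged only against itself, never against the other types,
# so a large group cannot visually dominate a small one.
changes = {t: direction(s) for t, s in series_by_type(records).items()}

for source_type, change in changes.items():
    print(source_type, change)
```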
1. The data and questions come from a homework assignment in the course “Exploratory Data Analysis”, part of the “Data Science Specialization” offered by Johns Hopkins University on Coursera. The project only requires answering the questions by exploring the data set with plots of any kind. After gaining a better understanding of data science, I feel it is crucial for a plot to speak for itself, so I reworked the plots to communicate more efficiently. Some of the optimization ideas were inspired by the book “Storytelling with Data” by Cole Nussbaumer Knaflic.