Air Pollution Project Presentation

Yadu
05/01/2016

Objective

  • In the State of New York, identify which pollutant had the strongest link to each respiratory disease in the year 2015.

Proposed Methodologies

  • Average the levels of each pollutant separately by county.
  • Render scatterplots displaying the correlation of pollutant concentration with the percentage of the population with a particular respiratory disease.
  • For each county, identify which of the 7 pollutants has the highest average concentration.
  • Check the conditions for ANOVA (analysis of variance), which relates the average percentage of each disease to the pollutant with the maximum average concentration.
  • Conduct ANOVA 4 times (once each for the percentage of kids with asthma, adults with asthma, population with COPD, and population with CV disease) to examine the differences in means between the groups of counties.
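
The first step above, averaging each pollutant by county, can be sketched as follows. This is a minimal illustration in Python rather than the code used for the project; the `readings` records and the `county_averages` helper are hypothetical names, and the values are made up.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical readings: (county, pollutant, concentration in ppm).
# These values are invented for illustration and are NOT project data.
readings = [
    ("County A", "Ozone", 0.04),
    ("County A", "Ozone", 0.06),
    ("County A", "Carbon monoxide", 0.20),
    ("County B", "Ozone", 0.03),
    ("County B", "Carbon monoxide", 0.30),
    ("County B", "Carbon monoxide", 0.50),
]


def county_averages(records):
    """Average each pollutant separately within each county."""
    buckets = defaultdict(list)
    for county, pollutant, value in records:
        buckets[(county, pollutant)].append(value)

    averages = defaultdict(dict)
    for (county, pollutant), values in buckets.items():
        averages[county][pollutant] = mean(values)
    return dict(averages)


averages = county_averages(readings)
for county, levels in sorted(averages.items()):
    print(county, levels)
```

A pollutant that was never measured in a county is simply absent from that county's dictionary, which keeps missing data distinct from a zero reading.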

Correlation Plots of Percentage Diseased vs. Pollutant Concentration (Parts Per Million)

  • Negative correlation in most of the plots.

[Figure: scatterplots of percentage diseased vs. pollutant concentration]
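
The direction of each relationship in the scatterplots can be summarized with a Pearson correlation coefficient. The sketch below is one way to compute it by hand; the function name `pearson_r` and the sample values are assumptions for illustration, not taken from the project data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical county-level values (NOT project data).
ozone_ppm = [0.030, 0.035, 0.040, 0.045]
copd_pct = [6.0, 5.5, 5.2, 4.8]

r = pearson_r(ozone_ppm, copd_pct)
print(f"r = {r:.3f}")  # a negative r means the disease percentage falls as concentration rises
```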

Maximum Pollutant Concentration in Each County

  • From the results, we can deduce that in most of the counties the pollutant with the highest concentration is either carbon monoxide, ozone, or fine particulate matter.
  • Therefore, ANOVA hypothesis testing can be done using 3 groups of counties: those where carbon monoxide has the maximum concentration, those where ozone does, and those where fine particulate matter does.
Pollutant             Number of Counties
Carbon monoxide       7
Ozone                 19
Particulate matter    3
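
A table like the one above can be produced by picking, for each county, the pollutant with the highest average concentration and then counting counties per pollutant. The following is a sketch under the assumption that all averages are on the same scale (the slides use parts per million); `county_avg` and its values are hypothetical.

```python
from collections import Counter

# Hypothetical county averages (NOT project data).
# A pollutant that was not measured in a county is simply absent.
county_avg = {
    "County A": {"Ozone": 0.041, "Carbon monoxide": 0.250},
    "County B": {"Ozone": 0.038},
    "County C": {"Ozone": 0.040, "Particulate matter": 0.009},
}


def dominant_pollutant(levels):
    """Name of the pollutant with the highest average concentration."""
    return max(levels, key=levels.get)


# Map each county to its dominant pollutant, then count counties per pollutant.
dominant = {county: dominant_pollutant(levels) for county, levels in county_avg.items()}
tally = Counter(dominant.values())

for pollutant, n_counties in tally.most_common():
    print(f"{pollutant}: {n_counties}")
```

The `dominant` mapping is also what defines the groups of counties compared in the ANOVA.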

Boxplots for Groups of Population Affected by Air Pollutants

  • Variability appears approximately constant across groups, despite a few prominent outliers.
  • Each group appears approximately normally distributed: the mean and median for each group are almost equal, which indicates a symmetric distribution consistent with normality.
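
The two conditions above can also be checked numerically rather than only by eye. The sketch below compares group standard deviations and the gap between mean and median in each group. The threshold of 2 for the ratio of standard deviations is a common rule of thumb, assumed here and not stated in the project; the group values are hypothetical.

```python
from statistics import mean, median, stdev

# Hypothetical disease percentages per group of counties (NOT project data).
groups = {
    "Carbon monoxide": [9.1, 10.4, 9.8, 10.9, 9.6],
    "Ozone": [10.2, 9.5, 10.8, 9.9, 10.1],
    "Particulate matter": [9.7, 10.6, 10.0],
}


def sd_ratio(samples):
    """Largest group standard deviation divided by the smallest."""
    sds = [stdev(values) for values in samples.values()]
    return max(sds) / min(sds)


def mean_median_gap(values):
    """Absolute difference between mean and median; small gaps suggest symmetry."""
    return abs(mean(values) - median(values))


ratio = sd_ratio(groups)
gaps = {name: mean_median_gap(values) for name, values in groups.items()}

# Assumed rule of thumb: equal-variance condition is plausible if ratio < 2.
print(f"SD ratio: {ratio:.2f}")
for name, gap in gaps.items():
    print(f"{name}: |mean - median| = {gap:.2f}")
```

A small mean-median gap only indicates symmetry, so a normal quantile plot per group would be a stronger check of normality.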

[Figure: boxplots of the percentage of the population affected, by pollutant group]

ANOVA Hypothesis Testing Results

  • In all of the ANOVA tests we fail to reject the null hypothesis that the average percentage of kids with asthma, adults with asthma, population with COPD, and population with CV disease is the same for each group of counties.
  • Therefore we found no evidence that any one of these air pollutants has a stronger impact on respiratory diseases than the others.
Disease            p-value
Asthma (Kids)      0.7897398
Asthma (Adults)    0.6184384
COPD               0.5578648
CV Disease         0.5404911
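
For reference, the F statistic behind a one-way ANOVA can be computed by hand as the ratio of between-group to within-group mean squares. The sketch below is an illustration in Python, not the code that produced the p-values above; `one_way_anova_f` is a hypothetical helper, and the 0.05 significance level is an assumption, since the slides do not state one. Turning F into a p-value needs the F distribution, which a statistics package such as SciPy (`scipy.stats.f_oneway`) provides.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 groups and more observations than groups")

    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data to show the mechanics (NOT project data).
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")

# p-values as reported in the table above, compared with an ASSUMED alpha of 0.05.
ALPHA = 0.05
reported_p = {
    "Asthma (Kids)": 0.7897398,
    "Asthma (Adults)": 0.6184384,
    "COPD": 0.5578648,
    "CV Disease": 0.5404911,
}
significant = [disease for disease, p in reported_p.items() if p < ALPHA]
print("Significant at alpha = 0.05:", significant or "none")
```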

Challenge

  • Unequal sample sizes across pollutants, because some counties reported data for only 6 or fewer of the 7 pollutants. The number of counties with data for each pollutant:
Pollutant                   Number of Counties
Carbon monoxide             7
Nitric oxide (NO)           7
Nitrogen dioxide (NO2)      5
Oxides of nitrogen (NOx)    5
Ozone                       26
Particulate matter          18
Sulfur dioxide              16
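
Coverage counts like these can be derived by recording which pollutants each county reports and tallying per pollutant. The sketch below also lists counties with complete coverage; the `measured` mapping is hypothetical, and only the 7 pollutant names come from the table above.

```python
from collections import Counter

POLLUTANTS = [
    "Carbon monoxide",
    "Nitric oxide (NO)",
    "Nitrogen dioxide (NO2)",
    "Oxides of nitrogen (NOx)",
    "Ozone",
    "Particulate matter",
    "Sulfur dioxide",
]

# Hypothetical record of which pollutants each county reports (NOT project data).
measured = {
    "County A": set(POLLUTANTS),
    "County B": {"Ozone", "Particulate matter"},
    "County C": {"Ozone", "Sulfur dioxide"},
}

# Number of counties reporting each pollutant.
coverage = Counter(p for pollutants in measured.values() for p in pollutants)

# Counties that report all 7 pollutants.
complete = sorted(c for c, pollutants in measured.items() if set(POLLUTANTS) <= pollutants)

for pollutant in POLLUTANTS:
    print(f"{pollutant}: {coverage[pollutant]}")
print("Complete coverage:", complete)
```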

Extensions Beyond The Scope of the Project

  • Collect data for all 7 major pollutants in each county. With complete data we could assess more reliably which pollutant has the strongest impact on disease, since the group means obtained might then differ significantly.
  • Examine correlation with the proportion of people who smoke and the number of vehicles used in the area. Both variables affect the amount of pollution in the atmosphere, so they could be used to predict the concentration of each pollutant, which in turn could be used to predict the percentage of the population with respiratory diseases.
  • Examine why the percentage of the population with COPD and the percentage with CV disease vary inversely with air pollutant concentration.