This analysis seeks to investigate the results of efforts by the EPA to reduce emissions of fine particulates. Numerous studies have linked exposure to these particulates resulting in adverse health consequences. This analysis narrows it’s examination to emissions in the State of Texas.
“EPA issued the fine particle standards in 1997 after evaluating hundreds of health studies and conducting an extensive peer review process. The 1997 annual standard was established as a level of 15 micrograms per cubic meter (µg/m3), based on the 3-year average of annual mean PM2.5 concentrations. The 1997 24-hour standard was established as a level of 65 µg/m3, determined by the 3-year average of the annual 98th percentile concentrations. On September 21, 2006, EPA strengthened the 24-hour fine particle standard from the 1997 level of 65 µg/m3 to 35µg/m3, and retained the annual fine particle standard at 15µg/m3.”1
The EPA AirData website gives you access to air quality data collected at outdoor monitors across the United States, Puerto Rico, and the U. S. Virgin Islands.
The data comes primarily from the AQS (Air Quality System) database.
This particular analysis will look at PM2.5 FRM Mass (88101) data for the period when colection began in 1999, up until 2014. Complete data for 2014 obviously will not be available until 2015.
“Health studies have shown a significant association between exposure to fine particles and premature death from heart or lung disease. Fine particles can aggravate heart and lung diseases and have been linked to effects such as: cardiovascular symptoms; cardiac arrhythmias; heart attacks; respiratory symptoms; asthma attacks; and bronchitis. These effects can result in increased hospital admissions, emergency room visits, absences from school or work, and restricted activity days. Individuals that may be particularly sensitive to fine particle exposure include people with heart or lung disease, older adults, and children.”2
Data processing steps, including:
# library('tidyr')
library('dplyr')
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library('ggplot2')
# library(sqldf)
dfile <- vector()
durl <- vector()
# range of years to analyze
airYears <- seq(from = 1999, to = 2014, by = 1)
for (i in seq_along(airYears) ) {
dfile[i] <- paste('daily_88101_', airYears[i], '.zip', sep ='')
durl[i] <- paste('http://aqsdr1.epa.gov/aqsweb/aqstmp/airdata/', dfile[i], sep = '')
destf <- paste('data/', dfile[i], sep = '')
# check to see if file exists prior to download
if (!file.exists(destf) ) {
download.file(durl[i], destf)
cat('downloaded: ') ; cat(dfile[i]) ; cat(' from: ') ; print(durl[i])
}
}
# list of all daily_88101 files
daily <- list.files('data/', pattern = '^daily_88101.*zip$', full.names = TRUE)
pm <- data.frame()
# load data
for (i in seq_along(daily)) {
zipf <- unzip(daily[i], list=TRUE)[1]
# read from the compressed data file
pm <- read.table(unz(daily[i], zipf), header = TRUE, sep = ',')
# State.Code 48 is Texas
if (i == 1) {
# pm-all <- pm[i]
pm0 <- pm[pm$State.Code == 48, ]
} else {
# pm-all <- rbind(pm-all, pm[i])
pm0 <- rbind(pm0, pm[pm$State.Code == 48, ])
}
}
dim(pm0)
## [1] 79626 28
str(pm0)
## 'data.frame': 79626 obs. of 28 variables:
## $ State.Code : chr "48" "48" "48" "48" ...
## $ County.Code : int 29 29 29 29 29 29 29 29 29 29 ...
## $ Site.Num : int 34 34 34 34 34 34 34 34 34 34 ...
## $ Parameter.Code : int 88101 88101 88101 88101 88101 88101 88101 88101 88101 88101 ...
## $ POC : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Latitude : num 29.4 29.4 29.4 29.4 29.4 ...
## $ Longitude : num -98.5 -98.5 -98.5 -98.5 -98.5 ...
## $ Datum : Factor w/ 4 levels "NAD27","NAD83",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Parameter.Name : Factor w/ 1 level "PM2.5 - Local Conditions": 1 1 1 1 1 1 1 1 1 1 ...
## $ Sample.Duration : Factor w/ 3 levels "24 HOUR","1 HOUR",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Pollutant.Standard : Factor w/ 2 levels "PM25 24-hour 2006",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Date.Local : Factor w/ 5630 levels "1999-01-01","1999-01-02",..: 91 92 94 95 98 99 110 111 112 113 ...
## $ Units.of.Measure : Factor w/ 1 level "Micrograms/cubic meter (LC)": 1 1 1 1 1 1 1 1 1 1 ...
## $ Event.Type : Factor w/ 3 levels "Included","None",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ Observation.Count : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Observation.Percent: num 100 100 100 100 100 100 100 100 100 100 ...
## $ Arithmetic.Mean : num 17 17.7 15.5 8.5 17.3 16.9 9.4 11 18.3 22.8 ...
## $ X1st.Max.Value : num 17 17.7 15.5 8.5 17.3 16.9 9.4 11 18.3 22.8 ...
## $ X1st.Max.Hour : int 0 0 0 0 0 0 0 0 0 0 ...
## $ AQI : int 61 63 58 35 62 61 39 46 64 74 ...
## $ Method.Name : Factor w/ 22 levels "Andersen RAAS2.5-100 PM2.5 SAM w/WINS - GRAVIMETRIC",..: 5 5 5 5 5 5 5 5 5 5 ...
## $ Local.Site.Name : Factor w/ 1524 levels ""," Davidsonville",..: 489 489 489 489 489 489 489 489 489 489 ...
## $ Address : Factor w/ 1826 levels "'G' STREET AND KANSAS AVE.",..: 526 526 526 526 526 526 526 526 526 526 ...
## $ State.Name : Factor w/ 54 levels "Alabama","Alaska",..: 45 45 45 45 45 45 45 45 45 45 ...
## $ County.Name : Factor w/ 684 levels "Ada","Adams",..: 34 34 34 34 34 34 34 34 34 34 ...
## $ City.Name : Factor w/ 969 levels "Akron","Albany",..: 520 520 520 520 520 520 520 520 520 520 ...
## $ CBSA.Name : Factor w/ 492 levels "","Aguadilla-Isabela-San Sebastian, PR",..: 310 310 310 310 310 310 310 310 310 310 ...
## $ Date.of.Last.Change: Factor w/ 51 levels "2013-06-29","2013-06-30",..: 1 1 1 1 1 1 1 1 1 1 ...
summary(pm0)
## State.Code County.Code Site.Num Parameter.Code
## Length:79626 Min. : 29 Min. : 1 Min. :88101
## Class :character 1st Qu.:113 1st Qu.: 32 1st Qu.:88101
## Mode :character Median :201 Median : 57 Median :88101
## Mean :224 Mean : 453 Mean :88101
## 3rd Qu.:355 3rd Qu.:1002 3rd Qu.:88101
## Max. :479 Max. :3010 Max. :88101
##
## POC Latitude Longitude Datum
## Min. :1.00 Min. :26.1 Min. :-106.6 NAD27 : 2116
## 1st Qu.:1.00 1st Qu.:29.7 1st Qu.: -97.9 NAD83 :66033
## Median :1.00 Median :31.8 Median : -96.9 UNKNOWN: 1218
## Mean :1.92 Mean :31.1 Mean : -97.9 WGS84 :10259
## 3rd Qu.:2.00 3rd Qu.:32.8 3rd Qu.: -95.3
## Max. :8.00 Max. :35.2 Max. : -93.8
##
## Parameter.Name Sample.Duration
## PM2.5 - Local Conditions:79626 24 HOUR :73915
## 1 HOUR : 3055
## 24-HR BLK AVG: 2656
##
##
##
##
## Pollutant.Standard Date.Local
## PM25 24-hour 2006:76571 2004-07-14: 74
## : 3055 2004-06-20: 72
## 2002-06-25: 70
## 2002-05-14: 69
## 2002-05-02: 68
## 2002-10-05: 68
## (Other) :79205
## Units.of.Measure Event.Type Observation.Count
## Micrograms/cubic meter (LC):79626 Included: 867 Min. : 1.00
## None :78759 1st Qu.: 1.00
## Excluded: 0 Median : 1.00
## Mean : 1.78
## 3rd Qu.: 1.00
## Max. :24.00
##
## Observation.Percent Arithmetic.Mean X1st.Max.Value X1st.Max.Hour
## Min. : 4.0 Min. : -0.4 Min. : 0.0 Min. : 0.000
## 1st Qu.:100.0 1st Qu.: 7.1 1st Qu.: 7.2 1st Qu.: 0.000
## Median :100.0 Median : 10.0 Median : 10.2 Median : 0.000
## Mean : 99.6 Mean : 11.2 Mean : 11.8 Mean : 0.433
## 3rd Qu.:100.0 3rd Qu.: 14.0 3rd Qu.: 14.5 3rd Qu.: 0.000
## Max. :100.0 Max. :131.0 Max. :716.0 Max. :23.000
##
## AQI
## Min. : 0.0
## 1st Qu.: 30.0
## Median : 42.0
## Mean : 43.1
## 3rd Qu.: 55.0
## Max. :190.0
## NA's :3055
## Method.Name
## R & P Model 2025 PM2.5 Sequential w/WINS - GRAVIMETRIC :42307
## R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC - Gravimetric:31608
## Met One BAM-1020 Mass Monitor w/VSCC - Beta Attenuation : 3055
## - : 2656
## Andersen RAAS2.5-100 PM2.5 SAM w/WINS - GRAVIMETRIC : 0
## Andersen RAAS2.5-300 PM2.5 SEQ w/WINS - GRAVIMETRIC : 0
## (Other) : 0
## Local.Site.Name Address
## DALLAS HINTON : 7882 1415 HINTON STREET : 7882
## Clinton : 5826 9525 1/2 Clinton Dr : 5826
## Fort Worth Northwest : 4219 3317 Ross Ave : 4219
## El Paso Chamizal : 4073 800 S San Marcial Street: 4073
## Haws Athletic Center : 3139 600 1/2 Congress St : 3139
## Corpus Christi Huisache: 2947 3810 Huisache Street : 2947
## (Other) :51540 (Other) :51540
## State.Name County.Name City.Name
## Texas :79626 Dallas :14844 Dallas :14844
## Alabama : 0 Harris :13235 Fort Worth :10664
## Alaska : 0 Tarrant:10814 El Paso : 9187
## Arizona : 0 El Paso: 9620 Not in a city : 8198
## Arkansas : 0 Nueces : 3998 Houston : 7811
## California: 0 Travis : 3379 Corpus Christi: 3998
## (Other) : 0 (Other):23736 (Other) :24924
## CBSA.Name Date.of.Last.Change
## Dallas-Fort Worth-Arlington, TX:26953 2013-06-29:64032
## Houston-Sugar Land-Baytown, TX :15705 2014-06-07:15594
## El Paso, TX : 9620 2013-06-30: 0
## Beaumont-Port Arthur, TX : 5374 2013-07-10: 0
## Corpus Christi, TX : 3998 2013-07-11: 0
## Austin-Round Rock, TX : 3782 2013-07-15: 0
## (Other) :14194 (Other) : 0
names(pm0)
## [1] "State.Code" "County.Code" "Site.Num"
## [4] "Parameter.Code" "POC" "Latitude"
## [7] "Longitude" "Datum" "Parameter.Name"
## [10] "Sample.Duration" "Pollutant.Standard" "Date.Local"
## [13] "Units.of.Measure" "Event.Type" "Observation.Count"
## [16] "Observation.Percent" "Arithmetic.Mean" "X1st.Max.Value"
## [19] "X1st.Max.Hour" "AQI" "Method.Name"
## [22] "Local.Site.Name" "Address" "State.Name"
## [25] "County.Name" "City.Name" "CBSA.Name"
## [28] "Date.of.Last.Change"
save(pm0, file = 'pm0.Rdata')
grep('Date', names(pm0), value = T)
## [1] "Date.Local" "Date.of.Last.Change"
pm0$Date.Local <- as.Date(as.character(pm0$Date.Local), '%Y-%m-%d')
summary(pm0$Date.Local)
## Min. 1st Qu. Median Mean 3rd Qu.
## "1999-01-01" "2001-11-18" "2004-02-02" "2005-04-24" "2008-05-01"
## Max.
## "2014-03-31"
pm0$Date.of.Last.Change <- as.Date(as.character(pm0$Date.of.Last.Change), '%Y-%m-%d')
summary(pm0$Date.of.Last.Change)
## Min. 1st Qu. Median Mean 3rd Qu.
## "2013-06-29" "2013-06-29" "2013-06-29" "2013-09-04" "2013-06-29"
## Max.
## "2014-06-07"
save(pm0, file = 'pm0.Rdata')
td0 <- pm0 %>%
mutate(
Station = paste(County.Code, Site.Num),
Date = as.Date(as.character(Date.Local), '%Y-%m-%d'),
Year = factor(format(Date, '%Y')),
PM25 = Arithmetic.Mean,
AQI = AQI,
) %>%
select(
Year, Date, Station, PM25, AQI
)
summary(td0)
## Year Date Station PM25
## 2002 :10896 Min. :1999-01-01 Length:79626 Min. : -0.4
## 2001 :10481 1st Qu.:2001-11-18 Class :character 1st Qu.: 7.1
## 2000 : 8541 Median :2004-02-02 Mode :character Median : 10.0
## 2003 : 7145 Mean :2005-04-24 Mean : 11.2
## 2004 : 6175 3rd Qu.:2008-05-01 3rd Qu.: 14.0
## 2005 : 5226 Max. :2014-03-31 Max. :131.0
## (Other):31162
## AQI
## Min. : 0.0
## 1st Qu.: 30.0
## Median : 42.0
## Mean : 43.1
## 3rd Qu.: 55.0
## Max. :190.0
## NA's :3055
Values of interest:
Air Quality Index (AQI)
a numerical index used for reporting severity of air pollution levels to the public. The AQI incorporates five criteria pollutants – ozone, particulate matter, carbon monoxide, sulfur dioxide and nitrogen dioxide – into a single index. AQI levels range from 0 (Good air quality) to 500 (Hazardous air quality). The higher the index, the higher the level of pollutants and the greater the likelihood of health effects. The AQI incorporates an additional index category – unhealthy for sensitive groups such as children and people with asthma – that ranges from 101 to 150. In addition, the AQI comes with detailed health cautions.
PM2.5 (Arithmetic.Mean)
a value measure of PM2.5 - Particulate Matter smaller than 2.5 micrometers. Arithmetic mean of 24-hour values, in micrograms per cubic meter. (Concentrations are rounded to the nearest 0.1 microgram for comparison with the 24-hour standard.) “Fine particle pollution or PM2.5 describes particulate matter that is 2.5 micrometers in diameter and smaller - 1/30th the diameter of a human hair.” 3
aiq <- td0$AQI # Air Quality index
mean(is.na(aiq), na.rm = TRUE) # percentage of missing values?
## [1] 0.03837
pm25 <- td0$PM25
summary(aiq) # summary air quality index
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 30.0 42.0 43.1 55.0 190.0 3055
summary(pm25) # summary PM2.5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.4 7.1 10.0 11.2 14.0 131.0
mean(pm25, na.rm = TRUE)
## [1] 11.22
Both the Mean 11.2224and the Median 10 of the PM2.5 measures are below the recommended value of 15µg/m3 for the period from 1999 to 2013.
# compare years 1999 to 2013 for changees in levels
td1999 <- td0[td0$Year == '1999', ]
td2013 <- td0[td0$Year == '2013', ]
summary(td1999)
## Year Date Station PM25
## 1999 :2203 Min. :1999-01-01 Length:2203 Min. : 2.5
## 2000 : 0 1st Qu.:1999-05-29 Class :character 1st Qu.: 7.4
## 2001 : 0 Median :1999-10-06 Mode :character Median :10.8
## 2002 : 0 Mean :1999-08-31 Mean :12.2
## 2003 : 0 3rd Qu.:1999-11-18 3rd Qu.:15.8
## 2004 : 0 Max. :1999-12-31 Max. :39.8
## (Other): 0
## AQI
## Min. : 10.0
## 1st Qu.: 31.0
## Median : 45.0
## Mean : 45.8
## 3rd Qu.: 59.0
## Max. :112.0
##
summary(td2013)
## Year Date Station PM25
## 2013 :4496 Min. :2013-01-01 Length:4496 Min. :-0.4
## 1999 : 0 1st Qu.:2013-04-03 Class :character 1st Qu.: 6.3
## 2000 : 0 Median :2013-07-01 Mode :character Median : 9.1
## 2001 : 0 Mean :2013-07-01 Mean :10.4
## 2002 : 0 3rd Qu.:2013-09-30 3rd Qu.:12.8
## 2003 : 0 Max. :2013-12-31 Max. :78.7
## (Other): 0
## AQI
## Min. : 2.0
## 1st Qu.: 26.0
## Median : 37.0
## Mean : 39.8
## 3rd Qu.: 52.0
## Max. :163.0
## NA's :1081
# WIP
# set labels
title <- expression(PM[2.5] * ' Readings for Texas from 1999 thru 2014')
ylabel <- expression(PM[2.5] * ' in Micrograms per ' * m^3)
xlabel <- 'Years'
# build plot
g <- ggplot(td0, aes(x = Date, y = PM25, fill = Year, color = Year) )
g <- g + ggtitle(title) + ylab(ylabel) + xlab(xlabel) + theme_bw()
g <- g + geom_boxplot(stat = 'boxplot', outlier.colour = NULL, position = 'dodge',
notch = TRUE, notchwidth = 0.5)
g <- g + geom_smooth(method = 'lm', se = FALSE, color = 'darkgreen', aes(group = 1) )
g <- g + geom_hline(aes(yintercept = 15), show_guide = T, color = 'red')
g <- g + geom_text(aes(x = td0$Date[1], y = 15, label = 'EPA Max 15µg/m3'), color = 'blue',
vjust = -0.5, hjust = 0.25, size = 3)
g
## Warning: position_dodge requires constant width: output may be incorrect
# set labels
title <- expression('AQI Readings for Texas from 1999 thru 2014')
ylabel <- expression('Air Quality Index')
xlabel <- 'Years'
# build plot
g <- ggplot(td0, aes(x = Date, y = AQI, fill = Year, color = Year) )
g <- g + ggtitle(title) + ylab(ylabel) + xlab(xlabel) + theme_bw()
g <- g + geom_boxplot(stat = 'boxplot', outlier.colour = NULL, position = 'dodge',
notch = TRUE, notchwidth = 0.5)
g <- g + geom_smooth(method = 'lm', se = FALSE, color = 'darkgreen', aes(group = 1) )
g
## Warning: Removed 3055 rows containing non-finite values (stat_boxplot).
## Warning: Removed 3055 rows containing missing values (stat_smooth).
## Warning: position_dodge requires constant width: output may be incorrect
It appears that the State of Texas, on average, is safely inside the EPA guidelines for PM 2.5 particulates. The data suggests that air quality in general is improving across the state.