Analysis of public health and economic impact of severe weather events in U.S.A

By Sentil Pillai

August 2014

Synopsis

This analysis explores the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database, which tracks characteristics of extreme weather events in the United States in the period 1950 to 2011. It includes when and where they occur, as well as estimates of any fatalities, injuries, and property and crop damage.

The top three types of extreme weather events that are most harmful with respect to population health (fatalities and injuries) are tornado, heat and thunderstorm.

The top three types of extreme weather events that have the greatest economic (property and crop damage) consequences are flood, hurricane and tornado.

Data Processing

Initialize, load libraries, session information

library(tcltk)
library(scales)
library(ggplot2)
library(sqldf)
## Loading required package: gsubfn
## Loading required package: proto
## Loading required package: RSQLite
## Loading required package: DBI
## Loading required package: RSQLite.extfuns
sessionInfo()
## R version 3.0.3 (2014-03-06)
## Platform: x86_64-apple-darwin10.8.0 (64-bit)
## 
## locale:
## [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
## 
## attached base packages:
## [1] tcltk     stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] sqldf_0.4-7.1         RSQLite.extfuns_0.0.1 RSQLite_0.11.4       
## [4] DBI_0.2-7             gsubfn_0.6-5          proto_0.3-10         
## [7] ggplot2_1.0.0         scales_0.2.4          knitr_1.6            
## 
## loaded via a namespace (and not attached):
##  [1] chron_2.3-45     colorspace_1.2-4 digest_0.6.4     evaluate_0.5.5  
##  [5] formatR_0.10     grid_3.0.3       gtable_0.1.2     MASS_7.3-33     
##  [9] munsell_0.4.2    plyr_1.8.1       Rcpp_0.11.1      reshape2_1.4    
## [13] stringr_0.6.2    tools_3.0.3

Download the data file and the load the data

## Generated temporary file name and location
tmpFile <- tempfile(fileext = ".bz2")

## source data URL
fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

## download the data set and to the temporary file 
download.file(fileURL,  destfile = tmpFile, method = "curl")

## unzip the temporary file, read the file
rawData <- read.csv(bzfile(tmpFile),header=TRUE)

dim(rawData)
## [1] 902297     37

The raw NOAA dataset shows a large number of observations ( ~900 thousand rows with 37 columns )

Subset the data required for the analysis

For the analysis only the seven columns (event type, injuries, fatalities, property damage, crop damage and exponents) and rows with value greater than zero are needed.

## subset the columns and the data rows
stormData <- sqldf('select EVTYPE, FATALITIES, INJURIES,
                           PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP
                      from rawData
                     where FATALITIES > 0 or INJURIES >0 
                           or CROPDMG > 0 or PROPDMG > 0')

str(stormData)
## 'data.frame':    254633 obs. of  7 variables:
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...

Transform the event type data for analysis

There are 985 distinct weather event types in the raw dataset. There are data entry mistakes and several variations in terminology. The following cleans the EVTYPE data and consolidates / categories / groups them into 20 new events types.

eventData <- sqldf('select EVTYPE, EVTYPE as newEventType, EventCount  
                      from (select EVTYPE, count(*) as EventCount
                              from stormData
                          group by EVTYPE)')

## remove all non alphabaltic characters
eventData <- sqldf(c('update eventData 
                     set newEventType = UPPER(TRIM(
                     REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
                     REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
                     REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
                     REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
                     REPLACE(REPLACE(REPLACE( newEventType
                     ,"0","")  ,"1","")  ,"2","")   ,"3","")   ,"4","") 
                     ,"5","")  ,"6","")  ,"7","")   ,"8","")   ,"9","") 
                     ,"(","")  ,")","")  ,"."," ")  ,"/"," ")  ,"&"," ")
                     ,"\"," ") ,"-"," ") ,"/"," ")  ,","," ")  ,"?"," ")
                     ,":"," ") ,","," ") ,"  "," ") )) '
                    , 'select * from main.eventData' ))

## create new event type TORNADO
eventData <- sqldf(c('update eventData set newEventType = "TORNADO" 
                       where newEventType like "%TORNADO%"'
                  , 'select * from main.eventData' ))
## create new event type WATERSPOUT 
eventData <- sqldf(c('update eventData set newEventType = "WATERSPOUT" 
                       where newEventType like "%SPOUT%"'
                  , 'select * from main.eventData' ))
## create new event type FLOOD
eventData <- sqldf(c('update eventData set newEventType = "FLOOD"
                       where newEventType like "%FLOOD%"
                         or  newEventType like "%FLD%"'
                  , 'select * from main.eventData' ))
## create new event type THUNDERSTORM
eventData <- sqldf(c('update eventData set newEventType = "THUNDERSTORM" 
                       where newEventType like "%THUNDERSTORM%"
                          or newEventType like "%TSTM%"
                          or newEventType like "%TROPICAL DEPRESSION%"
                          or newEventType like "%GUSTNADO%"'                     
                  , 'select * from main.eventData' ))
## create new event type HURRICANE
eventData <- sqldf(c('update eventData set newEventType = "HURRICANE" 
                       where newEventType like "%HURRICANE%"
                          or newEventType like "%TYPHOON%"'
                  , 'select * from main.eventData' ))
## create new event type HAIL
eventData <- sqldf(c('update eventData set newEventType = "HAIL" 
                       where newEventType like "%HAIL%"'
                  , 'select * from main.eventData' ))
## create new event type WIND
eventData <- sqldf(c('update eventData set newEventType = "WIND" 
                       where newEventType like "%WIND%"
                          or newEventType like "%DUST%"'
                  , 'select * from main.eventData' ))
## create new event type FIRE
eventData <- sqldf(c('update eventData set newEventType = "FIRE" 
                       where newEventType like "%FIRE%"'
                  , 'select * from main.eventData' ))
## create new event type RAIN
eventData <- sqldf(c('update eventData set newEventType = "RAIN" 
                       where newEventType like "%RAIN%"'
                  , 'select * from main.eventData' ))
## create new event type HEAT
eventData <- sqldf(c('update eventData set newEventType = "HEAT" 
                       where newEventType like "%HEAT%"
                          or newEventType like "%HOT%"
                          or newEventType like "%WARM%"'
                  , 'select * from main.eventData' ))
## create new event type WINTER
eventData <- sqldf(c('update eventData set newEventType = "WINTER"
                       where newEventType like "%COLD%" 
                          or newEventType like "%FREEZ%" 
                          or newEventType like "%WINT%"
                          or newEventType like "%FROST%"
                          or newEventType like "%GLAZE%"
                          or newEventType like "%SNOW%"
                          or newEventType like "%SLEET%"
                          or newEventType like "%ICY%"
                          or newEventType like "%ICE%"
                          or newEventType like "%BLIZZARD%"'
                  , 'select * from main.eventData' ))
## create new event type STORM
eventData <- sqldf(c('update eventData set newEventType = "STORM" 
                       where newEventType like "%STORM%" 
                         and newEventType != "THUNDERSTORM"'
                  , 'select * from main.eventData' ))
## create new event type COASTAL
eventData <- sqldf(c('update eventData set newEventType = "COASTAL"
                       where newEventType like "%CURRENT%"
                          or newEventType like "%SURF%"
                          or newEventType like "%COASTAL%"
                          or newEventType like "%SEICHE%"
                          or newEventType like "% SEA%"
                          or newEventType like "%SWELL%"
                          or newEventType like "%TIDE%"
                          or newEventType like "%WAVE%"
                          or newEventType like "%BEACH%"
                          or newEventType like "%TSUNAMI%"'
                  , 'select * from main.eventData' ))
## create new event type LANDSLIDE
eventData <- sqldf(c('update eventData set newEventType = "LANDSLIDE" 
                       where newEventType like "%MUD%" 
                          or newEventType like "%LAND%"'
                  , 'select * from main.eventData' ))
## create new event type FOG
eventData <- sqldf(c('update eventData set newEventType = "FOG" 
                       where newEventType like "%FOG%"'
                  , 'select * from main.eventData' ))
## create new event type FUNNEL CLOUD
eventData <- sqldf(c('update eventData set newEventType = "FUNNEL CLOUD" 
                       where newEventType like "%FUNNEL%"'
                  , 'select * from main.eventData' ))
## create new event type LIGHTNING
eventData <- sqldf(c('update eventData set newEventType = "LIGHTNING" 
                       where newEventType like "%LIGHTNING%"'
                  , 'select * from main.eventData' ))
## create new event type AVALANCHE
eventData <- sqldf(c('update eventData set newEventType = "AVALANCHE" 
                       where newEventType like "%AVALAN%"'
                  , 'select * from main.eventData' ))
## create new event type DROUGHT
eventData <- sqldf(c('update eventData set newEventType = "DROUGHT" 
                       where newEventType like "%DROUGHT%" 
                          or newEventType like "%DRY%"
                          or newEventType like "%DUST%"'
                  , 'select * from main.eventData' ))
## create new event type WATER
eventData <- sqldf(c('update eventData set newEventType = "WATER" 
                       where (newEventType like "%WATER%"
                          or  newEventType like "%WET%"
                          or  newEventType like "%PRECIP%")
                          and newEventType != "WATERSPOUT"'
                  , 'select * from main.eventData' ))
## create new event type OTHER, include all un categorized events
eventData <- sqldf(c('update eventData set newEventType = "OTHER"
                       where newEventType not in (
                              "TORNADO", "WATERSPOUT", "FLOOD"
                             ,"THUNDERSTORM","HURRICANE", "HAIL"
                             ,"WIND", "FIRE", "RAIN", "HEAT" 
                             ,"WINTER", "STORM","COASTAL", "LANDSLIDE"
                             ,"FOG",  "FUNNEL CLOUD", "LIGHTNING"
                             , "AVALANCHE", "DROUGHT", "WATER" )'
                  , 'select * from main.eventData' ))

## Examine the new event types
sqldf('select newEventType, count(EVTYPE) as countOfEVTYPE
      from eventData
      group by newEventType')
##    newEventType countOfEVTYPE
## 1     AVALANCHE             2
## 2       COASTAL            26
## 3       DROUGHT             2
## 4          FIRE             9
## 5         FLOOD            64
## 6           FOG             2
## 7  FUNNEL CLOUD             1
## 8          HAIL            19
## 9          HEAT            13
## 10    HURRICANE            11
## 11    LANDSLIDE             9
## 12    LIGHTNING             4
## 13        OTHER            29
## 14         RAIN            26
## 15        STORM            10
## 16 THUNDERSTORM            71
## 17      TORNADO            12
## 18        WATER             9
## 19   WATERSPOUT             4
## 20         WIND            68
## 21       WINTER            97

Transform the data for analysis of impact on human health.

Sum the number of human fatalities and injuries by grouping it by the new event type.

causalityData <- sqldf('select e.newEventType, 
                              sum(s.FATALITIES) TotalFATALITIES,
                              sum(s.INJURIES) TotalINJURIES,
                              sum(s.FATALITIES + s.INJURIES) totalCausality
                         from stormData s, eventData e
                        where (FATALITIES > 0 or INJURIES >0) 
                          and e.EVTYPE = s.EVTYPE
                     group by e.newEventType')

dim(causalityData)
## [1] 21  4
## Order by the total causality and subset the data greater than one thousand persons. 
causalityDataResult <- sqldf('select newEventType, 
                                  totalCausality 
                             from causalityData
                            where totalCausality > 999
                         order by totalCausality Desc')

## create factors for ploting newEventType
causalityDataResult$newEventType <- factor(causalityDataResult$newEventType, levels=causalityDataResult$newEventType)

dim(causalityDataResult)
## [1] 12  2

Transform the data for analysis of economic cost calculation.

The crop and property damage dollar amount have a exponent component. The exponent is de-coded and multiplied to the respective amount.

damageDollarData  <- sqldf('select CROPDMG, 
                                   CROPDMGEXP, 
                                   case upper(CROPDMGEXP) 
                                        when "B" then (CROPDMG * 1000000000 )
                                        when "M" then (CROPDMG * 1000000 )
                                        when "K" then (CROPDMG * 1000 )
                                        when "H" then (CROPDMG * 100 ) 
                                        when "2" then (CROPDMG * 100 ) 
                                        when "3" then (CROPDMG * 1000 )
                                        when "4" then (CROPDMG * 10000 )
                                        when "5" then (CROPDMG * 100000 )
                                        when "6" then (CROPDMG * 1000000 ) 
                                        when "7" then (CROPDMG * 10000000 ) 
                                                 else (CROPDMG * 1) 
                                   end as CROPtotal,
                                   PROPDMG,
                                   PROPDMGEXP,
                                   case upper(PROPDMGEXP) 
                                        when "B" then (PROPDMG * 1000000000 )
                                        when "M" then (PROPDMG * 1000000 )
                                        when "K" then (PROPDMG * 1000 )
                                        when "H" then (PROPDMG * 100 ) 
                                        when "2" then (PROPDMG * 100 ) 
                                        when "3" then (PROPDMG * 1000 )
                                        when "4" then (PROPDMG * 10000 )
                                        when "5" then (PROPDMG * 100000 )
                                        when "6" then (PROPDMG * 1000000 ) 
                                        when "7" then (PROPDMG * 10000000 ) 
                                                 else (PROPDMG * 1) 
                                   end as PROPtotal,
                                   EVTYPE
                              from stormData 
                             where CROPDMG > 0 or PROPDMG > 0')
dim(damageDollarData)
## [1] 245031      7

Sum the crop damage and property damage by grouping it by the new event type.

damageData   <- sqldf('select e.newEventType, 
                              sum(d.CROPtotal) TotalCrop,
                              sum(d.PROPtotal) TotalProperty,
                              sum(d.CROPtotal + d.PROPtotal) TotalDollar
                         from damageDollarData d, eventData e
                        where e.EVTYPE = d.EVTYPE
                     group by e.newEventType')
dim(damageData)
## [1] 21  4
## Order the economic damage data and subset it over one billion dollars. 
damageDataResult <- sqldf('select newEventType, 
                                (TotalDollar /  1000000000) as inBillionDollar
                           from damageData                            
                          where TotalDollar > 999999999
                       order by inBillionDollar Desc')

## create factors for ploting newEventType
damageDataResult$newEventType <- factor(damageDataResult$newEventType, levels = damageDataResult$newEventType)

dim(damageDataResult)
## [1] 11  2

Results

The top most extreme weather events which are most harmful to human health.

sqldf('select newEventType, totalCausality from causalityDataResult order by totalCausality desc')
##    newEventType totalCausality
## 1       TORNADO          97068
## 2          HEAT          12421
## 3  THUNDERSTORM          10273
## 4         FLOOD          10236
## 5        WINTER           7445
## 6     LIGHTNING           6048
## 7          WIND           3197
## 8       COASTAL           1705
## 9          FIRE           1698
## 10    HURRICANE           1468
## 11         HAIL           1386
## 12          FOG           1156

The above list displays the top most extreme weather events; accounting for more than 1,000 recorded fatalities and injuries to human population USA in the period of 1950 to 2011 recorded by NOAA. The following plots the data.

require(scales)

ggplot(causalityDataResult, aes(newEventType, totalCausality)) +
      labs(x="Weather event", y="Number of persons health effected"
           , title="Most harmfull weather events to US population during (1950-2011)") +
      theme(axis.text.x = element_text( angle = 30, hjust = 1)) +
      geom_bar(stat = "identity" , aes(fill = newEventType))  +
      scale_y_continuous(labels = comma )

plot of chunk causalityDataResult

The top most extreme weather events which have the greatest economic (property and crop damage) consequences.

sqldf('select newEventType, inBillionDollar  from damageDataResult order by inBillionDollar desc')
##    newEventType inBillionDollar
## 1         FLOOD         180.659
## 2     HURRICANE          90.873
## 3       TORNADO          59.021
## 4         STORM          56.375
## 5        WINTER          21.216
## 6          HAIL          19.024
## 7       DROUGHT          15.025
## 8  THUNDERSTORM          12.458
## 9          FIRE           8.905
## 10         WIND           7.058
## 11         RAIN           4.053

The above list displays the top most extreme weather events; which have the greatest economic (property and crop damage) cost above a billion dollars to USA in the period of 1950 to 2011 recorded by NOAA. The dollar amount is not adjusted for inflation over the 60 year period. The economic cost does not account other factors. The following plots the data.

require(scales)

ggplot(damageDataResult, aes(newEventType, inBillionDollar)) +
     labs(x="Weather event", y="Economic cost (crop and property) in billion dollars"
          , title="Most costly weather events to US ecomony during (1950-2011)") +
     theme(axis.text.x = element_text( angle = 30, hjust = 1)) +
     geom_bar(aes(fill = newEventType), stat = "identity" ) +
     scale_y_continuous(labels = dollar )

plot of chunk damageDataResult