Analysis of Severe Weather Event Health and Economic Impact in the United States

Synopsis

The goal of this report is to expose the results of the analysis of the NOAA Storm Database to answer the following questions about severe weather events:

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?

We found that events categories with the most impact for the population health, are not in general the same ones having the biggest economic consecuences.

Loading and Processing the Raw Data

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The file can be downloaded from the course web site:

Storm Data [47Mb]

if( !file.exists("repdata-data-StormData.csv") ) {
    library(R.utils)
    if( !file.exists("repdata-data-StormData.csv.bz2") ) {
        download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                      "repdata-data-StormData.csv.bz2") 
        }
    bunzip2("repdata-data-StormData.csv.bz2")
}

rawdata <- read.csv("repdata-data-StormData.csv")

For the current analysis we need the following information:

Event type
Date and time of the event
Number of Fatalities
Number of Injuries
Property Damage Cost in US Dollars
Crop Damage Cost in US Dollars

getcost <- function(x,KMB) {
    factor_kmb <- rep( 1, length(x))
    factor_kmb[KMB=="K"] <- 1000
    factor_kmb[KMB=="M"] <- 1000000
    factor_kmb[KMB=="B"] <- 1000000000 
    x*factor_kmb
}

dates = sapply( rawdata$BGN_DATE, 
                function(x){strsplit(strsplit(as.character(x),"/")[[1]][3]," ")[[1]][1]} )

cleandata <- data.frame( type = rawdata$EVTYPE,
                         years = as.factor(dates),
                         time  = strptime( rawdata$BGN_DATE,"%m/%d/%Y %H:%M:%S" ),
                         fatalities = rawdata$FATALITIES,
                         injuries = rawdata$INJURIES, 
                         propcost = getcost(rawdata$PROPDMG, rawdata$PROPDMGEXP),
                         cropcost = getcost(rawdata$CROPDMG, rawdata$CROPDMGEXP) )

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete. This can be easily shown in the following graph:

hist(as.numeric(as.character(cleandata$years)), 
     col = "red",
     main = "Number of Weather Events per Year",
     xlab = "Number of Weather Events per Year" )

So, in order to have an anlysis independent of the recording effectiveness, we will focus in the last 15 complete years going from 1995 to 2010.

filter <- as.factor(1995:2010)
cleandata <- cleandata[cleandata$years %in% filter,]
sorttime <- order(cleandata$time, decreasing = TRUE)
cleandata <- cleandata[sorttime,]
hist(as.numeric(as.character(cleandata$years)), 
     col = "red",
     main = "Number of Weather Events per Year",
     xlab = "Number of Weather Events per Year")

Event Type Analysis and Processing

Looking at the following documentation:

National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ

We can see that the event types should comply to the following classification:

Astronomical Low Tide
Avalanche
Blizzard
Coastal Flood
Cold/Wind Chill
Debris Flow
Dense Fog
Dense Smoke
Drought
Dust Devil
Dust Storm
Excessive Heat
Extreme Cold/Wind Chill
Flash Flood
Flood
Frost/Freeze
Funnel Cloud
Freezing Fog
Hail
Heat
Heavy Rain
Heavy Snow
High Surf
High Wind
Hurricane (Typhoon)
Ice Storm
Lake-Effect Snow
Lakeshore Flood
Lightning
Marine Hail
Marine High Wind
Marine Strong Wind
Marine Thunderstorm Wind
Rip Current
Seiche
Sleet
Storm Surge/Tide
Strong Wind
Thunderstorm Wind
Tornado
Tropical Depression
Tropical Storm
Tsunami
Volcanic Ash
Waterspout
Wildfire
Winter Storm
Winter Weather

originaldbtypes <- length(levels(cleandata$type))

But on the other hand, the data extracted from the database shows that we have in reality 985 different event types in the dataset. In order to classify the raw data into the categories listed above, the following rules have been defined:

cleandata$type <- tolower(cleandata$type)
cleandata <- cleandata[- grep("summary", cleandata$type, perl=T),]
cleandata <- cleandata[- grep("monthly", cleandata$type, perl=T),]

# Astronomical Low Tide
cleandata$type[grep("low.*tide", cleandata$type, perl=T)]      <- "Astronomical Low Tide"
# Avalanche
cleandata$type[grep("avalan", cleandata$type, perl=T)]         <- "Avalanche"
# Blizzard
cleandata$type[grep("(blizzard|wintry mix)", 
                    cleandata$type, perl=T)]                   <- "Blizzard"
# Coastal Flood
cleandata$type[grep("(coastal|cstl|tidal|beach).*(flood|eros)",
                    cleandata$type, perl=T)]                   <- "Coastal Flood"
# Extreme Cold/Wind Chill
cleandata$type[grep("ex.*(cold|cool|chill)", 
                    cleandata$type, perl=T)]                   <- "Extreme Cold/Wind Chill"
# Cold/Wind Chill
cleandata$type[grep("(cold|cool|chill|hypothermia|low temp)", 
                    cleandata$type, perl=T)]                   <- "Cold/Wind Chill"
# Debris Flow
cleandata$type[grep("(debris|lands|mud|rock)", 
                    cleandata$type, perl=T)]                   <- "Debris Flow"
# Freezing Fog
cleandata$type[grep("freez.*fog", cleandata$type, perl=T)]     <- "Freezing Fog"
# Dense Fog
cleandata$type[grep("fog", cleandata$type, perl=T)]            <- "Dense Fog"
# Dense Smoke
cleandata$type[grep("smoke", cleandata$type, perl=T)]          <- "Dense Smoke"
# Drought
cleandata$type[grep("(drought|dry|driest)", 
                    cleandata$type, perl=T)]                   <- "Drought"
cleandata$type[grep("below normal precipitation", 
                    cleandata$type, perl=T)]                   <- "Drought"
# Dust Devil
cleandata$type[grep("devil", cleandata$type, perl=T)]          <- "Dust Devil"
# Dust Storm
cleandata$type[grep("dust", cleandata$type, perl=T)]           <- "Dust Storm"
# Excessive Heat
cleandata$type[grep("(ex|abnor).*(heat|warm)", 
                    cleandata$type, perl=T)]                   <- "Excessive Heat"
# Lake-Effect Snow
cleandata$type[grep("lake.*snow", cleandata$type, perl=T)]     <- "Lake-Effect Snow"
# Lakeshore Flood
cleandata$type[grep("lake.*flood", cleandata$type, perl=T)]    <- "Lakeshore Flood"
# Flash Flood
cleandata$type[grep("flash", cleandata$type, perl=T)]          <- "Flash Flood"
# Flood
cleandata$type[grep("(flood|dam|drown|high water|rising water|high sea|stream)",
                    cleandata$type, perl=T)]                   <- "Flood"
# Frost/Freeze
cleandata$type[grep("(frost|freeze)", cleandata$type, perl=T)] <- "Frost/Freeze"
# Funnel Cloud
cleandata$type[grep("(funnel|cloud)", cleandata$type, perl=T)] <- "Funnel"
# Marine Hail
cleandata$type[grep("marine.*hail", cleandata$type, perl=T)]   <- "Marine Hail"
# Marine High Wind
cleandata$type[grep("marine.*high.*wind", 
                    cleandata$type, perl=T)]                   <- "Marine High Wind"
# Marine Thunderstorm Wind
cleandata$type[grep("marine.*thunder.*wind", 
                    cleandata$type, perl=T)]                   <- "Marine Thunderstorm Wind"
# Marine Strong Wind
cleandata$type[grep("marine.*wind", cleandata$type, perl=T)]   <- "Marine Strong Wind"
# Hail
cleandata$type[grep("hail", cleandata$type, perl=T)]           <- "Hail"
# Heat
cleandata$type[grep("(heat|hot|high temperature|hyperthermia|warm|wet)",
                    cleandata$type, perl=T)]                   <- "Heat"
# High Surf
cleandata$type[grep("(surf|swells|wave)", 
                    cleandata$type, perl=T)]                   <- "High Surf"
# Hurricane (Typhoon)
cleandata$type[grep("(hurricane|typhoon)", 
                    cleandata$type, perl=T)]                   <- "Hurricane (Typhoon)"
# Ice Storm
cleandata$type[grep("(ice|drizzle|spray)", 
                    cleandata$type, perl=T)]                   <- "Ice Storm"
# Lightning
cleandata$type[grep("(lightning|ligntning)", 
                    cleandata$type, perl=T)]                   <- "Lightning"
# Rip Current
cleandata$type[grep("rip", cleandata$type, perl=T)]            <- "Rip Current"
# Seiche
cleandata$type[grep("seiche", cleandata$type, perl=T)]         <- "Seiche"
# Sleet
cleandata$type[grep("(sleet|glaze|icy)", 
                    cleandata$type, perl=T)]                   <- "Sleet"
# Storm Surge/Tide
cleandata$type[grep("(surge|tide)", cleandata$type, perl=T)]   <- "Storm Surge/Tide"
# Thunderstorm Wind
cleandata$type[grep("(thunder|tunder|tstm|downburst)", 
                    cleandata$type, perl=T)]                   <- "Thunderstorm Wind"
# Tornado
cleandata$type[grep("(tornado|gustnado)", 
                    cleandata$type, perl=T)]                   <- "Tornado"
# Tropical Depression
cleandata$type[grep("depression", cleandata$type, perl=T)]     <- "Tropical Depression"
# Tropical Storm
cleandata$type[grep("tropical", cleandata$type, perl=T)]       <- "Tropical Storm"
# Tsunami
cleandata$type[grep("tsunami", cleandata$type, perl=T)]        <- "Tsunami"
# Volcanic Ash
cleandata$type[grep("volcan", cleandata$type, perl=T)]         <- "Volcanic Ash"
# Waterspout
cleandata$type[grep("waterspout", cleandata$type, perl=T)]     <- "Waterspout"
# Wildfire
cleandata$type[grep("fire", cleandata$type, perl=T)]           <- "Wildfire"
# Winter Storm
cleandata$type[grep("winter.*storm", cleandata$type, perl=T)]  <- "Winter Storm"
# Winter Weather
cleandata$type[grep("winter", cleandata$type, perl=T)]         <- "Winter Weather"
# Heavy Rain
cleandata$type[grep("(rain|shower|precipitation)", 
                    cleandata$type, perl=T)]                   <- "Heavy Rain"
# Heavy Snow
cleandata$type[grep("snow", cleandata$type, perl=T)]           <- "Heavy Snow"
# Strong Wind
cleandata$type[grep("strong.*wind", cleandata$type, perl=T)]   <- "Strong Wind"
# High Wind
cleandata$type[grep("(wind|wnd)", cleandata$type, perl=T)]     <- "High Wind"

# Other/unclassified
cleandata$type[grep("^[a-z]", cleandata$type, perl=T)]         <- "Other/Unclassified"

cleandata$type <- as.factor(cleandata$type)

The new dataset categories are listed below:

levels(cleandata$type)

##  [1] "Astronomical Low Tide"    "Avalanche"               
##  [3] "Blizzard"                 "Coastal Flood"           
##  [5] "Cold/Wind Chill"          "Debris Flow"             
##  [7] "Dense Fog"                "Dense Smoke"             
##  [9] "Drought"                  "Dust Devil"              
## [11] "Dust Storm"               "Excessive Heat"          
## [13] "Extreme Cold/Wind Chill"  "Flash Flood"             
## [15] "Flood"                    "Freezing Fog"            
## [17] "Frost/Freeze"             "Funnel"                  
## [19] "Hail"                     "Heat"                    
## [21] "Heavy Rain"               "Heavy Snow"              
## [23] "High Surf"                "High Wind"               
## [25] "Hurricane (Typhoon)"      "Ice Storm"               
## [27] "Lake-Effect Snow"         "Lakeshore Flood"         
## [29] "Lightning"                "Marine Hail"             
## [31] "Marine High Wind"         "Marine Strong Wind"      
## [33] "Marine Thunderstorm Wind" "Other/Unclassified"      
## [35] "Rip Current"              "Seiche"                  
## [37] "Sleet"                    "Storm Surge/Tide"        
## [39] "Strong Wind"              "Thunderstorm Wind"       
## [41] "Tornado"                  "Tropical Depression"     
## [43] "Tropical Storm"           "Tsunami"                 
## [45] "Volcanic Ash"             "Waterspout"              
## [47] "Wildfire"                 "Winter Storm"            
## [49] "Winter Weather"

Results: Severe Weather Event Public Health Impact Analysis

Giving the number of event types to compare, the proposed strategy is to look first at fatalities and injuries from three different angles:

Total number of fatalities/injuries per event type in the entire time period.
Maximum number of fatalities/injuries per single event per type of event.
Number of events with non zero fatalities/injuries per type of event.

The top event types for each comparison will then be compared together in a single chart.

Total number of fatalities per event type in the entire time period

healthdata <- aggregate(cbind(fatalities, injuries) ~ type, data=cleandata, sum)
sortfat <- order(healthdata$fatalities, decreasing = TRUE)
head(healthdata[sortfat,])

##              type fatalities injuries
## 12 Excessive Heat       1958     6542
## 20           Heat       1036     1816
## 41        Tornado        958    15624
## 14    Flash Flood        883     1709
## 29      Lightning        704     4439
## 35    Rip Current        535      497

sortinj <- order(healthdata$injuries, decreasing = TRUE)
head(healthdata[sortinj,])

##                 type fatalities injuries
## 41           Tornado        958    15624
## 15             Flood        407     6855
## 12    Excessive Heat       1958     6542
## 40 Thunderstorm Wind        362     5186
## 29         Lightning        704     4439
## 20              Heat       1036     1816

If we consider the Total Number of Fatalities, Heat is the main event affecting health in the United State. Now looking at the Total Number of Injuries, Tornadoes and Floods need to be taken into consideration as well.

Maximum number of fatalities/injuries per single event per type of event

healthdata <- aggregate(cbind(fatalities, injuries) ~ type, data=cleandata, max)
sortfat <- order(healthdata$fatalities, decreasing = TRUE)
head(healthdata[sortfat,])

##              type fatalities injuries
## 20           Heat        583      230
## 12 Excessive Heat         99      519
## 41        Tornado         32      293
## 44        Tsunami         32      129
## 9         Drought         29       15
## 43 Tropical Storm         22      200

sortinj <- order(healthdata$injuries, decreasing = TRUE)
head(healthdata[sortinj,])

##                   type fatalities injuries
## 15               Flood         11      800
## 25 Hurricane (Typhoon)         15      780
## 12      Excessive Heat         99      519
## 41             Tornado         32      293
## 20                Heat        583      230
## 43      Tropical Storm         22      200

A similar picture is seen when looking at the maximum number of people affected per single event with the exemption of having Hurricanes as the second most impacting for injuries.

Number of events with non zero fatalities/injuries per type of event.

nonzero <- function(x) sum(x != 0)
healthdata <- aggregate(cbind(fatalities, injuries) ~ type, data=cleandata, nonzero)
sortfat <- order(healthdata$fatalities, healthdata$injuries, decreasing = TRUE)
head(healthdata[sortfat,])

##                 type fatalities injuries
## 29         Lightning        656     2383
## 14       Flash Flood        575      349
## 12    Excessive Heat        561      160
## 35       Rip Current        474      186
## 41           Tornado        379     1785
## 40 Thunderstorm Wind        309     2079

sortinj <- order(healthdata$injuries, healthdata$fatalities, decreasing = TRUE)
head(healthdata[sortinj,])

##                 type fatalities injuries
## 29         Lightning        656     2383
## 40 Thunderstorm Wind        309     2079
## 41           Tornado        379     1785
## 24         High Wind        210      470
## 14       Flash Flood        575      349
## 47          Wildfire         33      268

If we look now at the number of events with at least one fatality/injury, the results are significantly different: Lightning, Flash Flood and Thunderstorm Wind are the top event types for this type of analysis.

library(lattice)
filter <- cleandata$type %in% c( "Excessive Heat",
                                 "Flash Flood",
                                 "Flood",
                                 "Heat",
                                 "Hurricane (Typhoon)",
                                 "Lightning",
                                 "Thunderstorm Wind",
                                 "Tornado" )
# Plot data
xyplot( fatalities ~ time | type, 
        data = cleandata[filter,], 
        scales = list(y = list(log = 10)),
        xlab = "Years", 
        ylab = "Number of Fatalities",
        layout = c(4,2),
        panel = function(x, y, ...) {
            panel.xyplot(x, y, ...) ## First call the default panel function for 'xyplot'
            panel.abline(h = 10^0.5, lty = 2, col = "black") ## Add a horizontal line for reference
        } )

plot of chunk plothealth

xyplot( injuries ~ time | type, 
        data = cleandata[filter,], 
        scales = list(y = list(log = 10)),
        xlab = "Years", 
        ylab = "Number of Injuries",
        layout = c(4,2),
        panel = function(x, y, ...) {
            panel.xyplot(x, y, ...) ## First call the default panel function for 'xyplot'
            panel.abline(h = 10^1.5, lty = 2, col = "black") ## Add a horizontal line for reference
        } )

plot of chunk plothealth

First we need to note that the y axis is logarithmic. The main conclusions from this first section would be:

Heat and Excessive Heat have together the strongest impact in fatalities.
Tornadoes and Floods are the biggest contributors to injuries
Lightning and Thunderstorm Winds are common events causing fatalities/injuries but the number of people affected is significantly lower.

Results: Severe Weather Event Economic Impact Analysis

Following a similar approach to the one above, the proposed strategy is to look first at property and crop damage cost from three different angles:

Total cost of property/crop damage per event type in the entire time period.
Maximum cost of property/crop damage per single event per type of event.
Number of events with non zero cost of property/crop damage per type of event.

And then again the top event types for each comparison will be compared together in a single chart.

costdata <- aggregate(cbind(propcost, cropcost) ~ type, data=cleandata, sum)
sortprop <- order(costdata$propcost, decreasing = TRUE)
head(costdata[sortprop,])

##                   type  propcost  cropcost
## 15               Flood 1.366e+11 5.733e+09
## 25 Hurricane (Typhoon) 8.522e+10 5.495e+09
## 38    Storm Surge/Tide 4.780e+10 8.550e+05
## 41             Tornado 1.510e+10 2.652e+08
## 19                Hail 1.488e+10 2.617e+09
## 14         Flash Flood 1.415e+10 1.361e+09

sortcrop <- order(costdata$cropcost, decreasing = TRUE)
head(costdata[sortcrop,])

##                   type  propcost  cropcost
## 9              Drought 1.048e+09 1.389e+10
## 15               Flood 1.366e+11 5.733e+09
## 25 Hurricane (Typhoon) 8.522e+10 5.495e+09
## 19                Hail 1.488e+10 2.617e+09
## 17        Frost/Freeze 5.155e+06 1.576e+09
## 14         Flash Flood 1.415e+10 1.361e+09

If we consider the Total Property Damage Cost, Floods is the main impacting weather event in the United State followed by Hurricanes. Now looking the Total Crop Damage Cost is of course heavily impacted by Drought and Floods.

costdata <- aggregate(cbind(propcost, cropcost) ~ type, data=cleandata, max)
sortprop <- order(costdata$propcost, decreasing = TRUE)
head(costdata[sortprop,])

##                   type  propcost cropcost
## 15               Flood 1.150e+11 5.00e+08
## 38    Storm Surge/Tide 3.130e+10 7.50e+05
## 25 Hurricane (Typhoon) 1.693e+10 1.51e+09
## 43      Tropical Storm 5.150e+09 2.00e+08
## 21          Heavy Rain 2.500e+09 2.00e+08
## 19                Hail 1.800e+09 7.00e+07

sortcrop <- order(costdata$cropcost, decreasing = TRUE)
head(costdata[sortcrop,])

##                       type  propcost  cropcost
## 25     Hurricane (Typhoon) 1.693e+10 1.510e+09
## 9                  Drought 6.451e+08 1.000e+09
## 13 Extreme Cold/Wind Chill 2.500e+07 5.960e+08
## 15                   Flood 1.150e+11 5.000e+08
## 12          Excessive Heat 3.300e+06 4.924e+08
## 20                    Heat 3.800e+06 4.000e+08

A similar picture is seen when looking at the maximum damage cost per single event with the exemption of having Storm Surge/Tide as the second most impacting for property damages.

nonzero <- function(x) sum(x != 0)
costdata <- aggregate(cbind(propcost, cropcost) ~ type, data=cleandata, nonzero)
sortprop <- order(costdata$propcost, costdata$cropcost, decreasing = TRUE)
head(costdata[sortprop,])

##                 type propcost cropcost
## 40 Thunderstorm Wind    96559     4375
## 19              Hail    19927     8186
## 14       Flash Flood    18384     1840
## 41           Tornado    11047     1211
## 15             Flood     8745     1562
## 29         Lightning     8676       94

sortcrop <- order(costdata$cropcost, costdata$propcost, decreasing = TRUE)
head(costdata[sortcrop,])

##                 type propcost cropcost
## 19              Hail    19927     8186
## 40 Thunderstorm Wind    96559     4375
## 14       Flash Flood    18384     1840
## 15             Flood     8745     1562
## 41           Tornado    11047     1211
## 9            Drought      116      243

If we look now at the number of events with any economic impact, the results are significantly different: Hail, Flash Flood and Thunderstorm Wind are the top event types for this type of analysis.

library(lattice)
filter <- cleandata$type %in% c( "Drought",
                                 "Flood",
                                 "Hail",
                                 "Hurricane (Typhoon)",
                                 "Storm Surge/Tide",
                                 "Thunderstorm Wind",
                                 "Extreme Cold/Wind Chill",
                                 "Flash Flood" )
# Plot data
xyplot( propcost ~ time | type, 
        data = cleandata[filter,], 
        scales = list(y = list(log = 10)),
        xlab = "Years", 
        ylab = "Property Damage Cost in $",
        layout = c(4,2),
        panel = function(x, y, ...) {
            panel.xyplot(x, y, ...) ## First call the default panel function for 'xyplot'
            panel.abline(h = 10^6, lty = 2, col = "black") ## Add a horizontal line for reference
        } )

plot of chunk plotcost

xyplot( cropcost ~ time | type, 
        data = cleandata[filter,], 
        scales = list(y = list(log = 10)),
        xlab = "Years", 
        ylab = "Crop Damage Cost in $",
        layout = c(4,2),
        panel = function(x, y, ...) {
            panel.xyplot(x, y, ...) ## First call the default panel function for 'xyplot'
            panel.abline(h = 10^6, lty = 2, col = "black") ## Add a horizontal line for reference
        } )

plot of chunk plotcost

First we need to note that the y axis is logarithmic. The main conclusions from this second section would be:

Floods and Hurricanes have the strongest impact in Property Damage Cost.
Drought and Floods are the biggest contributors to Crop Damage Cost.
Hail and Thunderstorm Winds are common events causing Property and Crop Damage Costs but the impact amount is significantly lower.

Conclusions

We found that most harmful events for the population health are in general not the ones with the biggest economic consecuences. In particular, Heat and Excessive Heat fatalities are the most impacting in public health and have no significant economical impact in property and crop damages. A similar conclusion would apply to Tornadoes being the main event on total number of injuries in the time period.

On the other hand, Flood is the weather event with highest impact in Property and Crop Damage Cost overall. Of course Drought is the main contributor to Crop Damage cost and Hurricanes are the second biggest driver for Property Damage Cost.

Hail and Lightnings are common events with lower impact causing Property and Crop Damage Costs and Health impact respectively.

Finally, Thunderstom Wind and Flash Flood are common events with lower impact but affecting both health and economics.

Storm

Pablo Rojo

Tuesday, July 22, 2014