Storms and Other Severe Weather Events: Their Health and Economic Consequences

Synopsis

In this report we use data from the NOAA Storm Database to evaluate the health and economic consequences of different weather phenomena. The data set includes information about storms and other severe weather events, such as location, duration, injuries and fatalities, as well as damage to crops and property. The record starts in 1950 and ends in November 2011. We first describe the process used to load and clean the data. Next, a simple analysis is performed to determine which event types had the greatest health and economic consequences between 1950 and 2011.

Loading and Processing the Raw Data

The data set is contained in a file compressed with the bzip2 algorithm to reduce its size. We begin by decompressing the file and reading it into the variable data. We use the fread function from the data.table package because it is faster than read.csv. You will need to install the R.utils package, which fread relies on to decompress the file.

library(data.table)
data<-fread("repdata-data-StormData.csv.bz2")
sdata<-dim(data)
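
For readers who do not have data.table or R.utils installed, the same loading step can be sketched with base R alone. The example below is only an illustration: it writes a small toy file (not the real storm data) to a temporary path and reads it back through a bzfile connection. The toy columns and the temporary path are assumptions made for the example.

```r
# Toy stand-in for the storm file: a small bzip2-compressed CSV in a temp path
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(3, 0))
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Base R alternative to fread: read through a bzfile connection
loaded <- read.csv(bzfile(path), stringsAsFactors = FALSE)
size <- dim(loaded)  # number of rows, number of columns
print(size)
```

This route is slower than fread on a file of this size, so it is mainly useful as a fallback.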

By examining data we can see that there are 902297 observations with 37 variables each. The names of the variables are:

names(data)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

We will not be using all variables in this analysis, so we subset data to remove the unnecessary columns. We also add a YEAR column extracted from BGN_DATE, which is used later to compute yearly totals.

library(stringr)
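## Keep only the columns used in the analysis
data <- data[, -c(1, 5:7, 9:20, 30:37)]
## Extract the four-digit year from BGN_DATE (e.g. "4/18/1950 0:00:00")
data$YEAR <- str_sub(data$BGN_DATE, -12, -9)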
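
As an alternative to extracting the year by character position, the date can be parsed and then formatted. The sketch below uses base R and two sample strings; it assumes every BGN_DATE value follows the month/day/year layout of these samples.

```r
# Two sample values in the assumed BGN_DATE layout
bgn_date <- c("4/18/1950 0:00:00", "11/28/2011 0:00:00")

# Parse the date part; characters after the matched format are ignored
parsed <- as.Date(bgn_date, format = "%m/%d/%Y")

# Format back to a four-digit year string
year <- format(parsed, "%Y")
print(year)
```

Parsing does not depend on the length of the time suffix, so it is less fragile than fixed offsets if the layout ever varies.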

NOAA’s documentation lists 48 valid event types. However, the data set contains 985 distinct event types, which means that many entries contain typos or invalid names. We use the list of valid event types to correct as many of them as possible through approximate string matching.

library(stringdist)
## List of valid event types
evtype <- toupper(c("Astronomical Low Tide", "Avalanche", "Blizzard", "Coastal Flood",
                    "Cold/Wind Chill", "Debris Flow", "Dense Fog", "Dense Smoke",
                    "Drought", "Dust Devil", "Dust Storm", "Excessive Heat",
                    "Extreme Cold/Wind Chill", "Flash Flood", "Flood", "Frost/Freeze",
                    "Funnel Cloud", "Freezing Fog", "Hail", "Heat", "Heavy Rain",
                    "Heavy Snow", "High Surf", "High Wind", "Hurricane (Typhoon)",
                    "Ice Storm", "Lake-Effect Snow", "Lakeshore Flood", "Lightning",
                    "Marine Hail", "Marine High Wind", "Marine Strong Wind",
                    "Marine Thunderstorm Wind", "Rip Current", "Seiche", "Sleet",
                    "Storm Surge/Tide", "Strong Wind", "Thunderstorm Wind", "Tornado",
                    "Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash",
                    "Waterspout", "Wildfire", "Winter Storm", "Winter Weather"))
## We use amatch to correct typos
matched <- amatch(x = data$EVTYPE,table = evtype,maxDist = 9)
## Replace invalid names with valid ones
data$EVTYPE<- evtype[matched]
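
To illustrate how the distance threshold affects this kind of matching, the sketch below reimplements the idea with base R's adist instead of stringdist. This is not the amatch algorithm itself: adist computes a generalized Levenshtein distance, the helper match_evtype is a hypothetical name introduced for the example, and the raw labels are made up.

```r
# Hypothetical helper: nearest valid label within max_dist edits, otherwise NA
match_evtype <- function(raw, valid, max_dist) {
  d <- adist(raw, valid)                     # distance matrix (rows: raw, cols: valid)
  nearest <- apply(d, 1, which.min)          # index of the closest valid label
  best <- d[cbind(seq_along(raw), nearest)]  # distance to that closest label
  ifelse(best <= max_dist, valid[nearest], NA_character_)
}

valid <- c("THUNDERSTORM WIND", "TORNADO", "FLASH FLOOD", "HAIL")
raw <- c("TORNDAO", "FLASH FLOODS", "HAIL", "OTHER")

strict <- match_evtype(raw, valid, max_dist = 2)  # only near misses are corrected
loose <- match_evtype(raw, valid, max_dist = 9)   # even unrelated labels get a match
print(data.frame(raw, strict, loose))
```

With a small threshold the unrelated label stays NA, while a threshold of 9 assigns it to some valid type. A permissive threshold therefore reduces the number of unmatched entries at the cost of possible misclassification.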

After correction, we end up with 49 event types. The additional type is NA, assigned to entries for which no match was found. These represent only about 0.57% of the entries (0.5668865%), so we have chosen to ignore them in this analysis.

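Next, we convert the property damage exponent (PROPDMGEXP) and crop damage exponent (CROPDMGEXP) from a letter or symbol to its corresponding numeric multiplier. The damage in dollars is then the reported amount (PROPDMG or CROPDMG) times that multiplier. We have based our conversion on the work presented here.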

data$PROPDMGEXP<- as.factor(data$PROPDMGEXP)
## We substitute the exponent letter for its numeric equivalent
levels(data$PROPDMGEXP) <- list(levels(data$PROPDMGEXP), "0" = c("","-","?"),"1" = c("+"), "10" = "0":"8", "100" = c("h","H"), "1000" = c("K"), "1000000" = c("M","m"), "1000000000" = c("B"))
data$PROPDMGEXP <- as.numeric(as.character(data$PROPDMGEXP))
## Calculate the actual damage
data$PROPDMGTOT <- data$PROPDMG*data$PROPDMGEXP

data$CROPDMGEXP<- as.factor(data$CROPDMGEXP)
## We substitute the exponent letter for its numeric equivalent
levels(data$CROPDMGEXP) <- list(levels(data$CROPDMGEXP), "0" = c("","-","?"),"1" = c("+"), "10" = "0":"8", "100" = c("h","H"), "1000" = c("K"), "1000000" = c("M","m"), "1000000000" = c("B"))
data$CROPDMGEXP <- as.numeric(as.character(data$CROPDMGEXP))
## Calculate the actual damage
data$CROPDMGTOT <- data$CROPDMG*data$CROPDMGEXP
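
The same conversion can be sketched as a small lookup function, which avoids repeating the mapping for each column. The function name exp_to_multiplier is hypothetical, and folding lower-case codes to upper case is an assumption that goes slightly beyond the mapping listed above. Codes outside the mapping return NA.

```r
# Hypothetical helper: map an exponent code to its numeric multiplier
exp_to_multiplier <- function(code) {
  code <- toupper(code)                      # assumption: treat "k" like "K", etc.
  out <- rep(NA_real_, length(code))         # unknown codes stay NA
  out[code %in% c("", "-", "?")] <- 0
  out[code %in% "+"] <- 1
  out[code %in% as.character(0:8)] <- 10
  out[code %in% "H"] <- 1e2
  out[code %in% "K"] <- 1e3
  out[code %in% "M"] <- 1e6
  out[code %in% "B"] <- 1e9
  out
}

# Toy records: reported amount and exponent code
amount <- c(25, 2.5, 10, 3, 7)
code <- c("K", "m", "", "5", "B")
multiplier <- exp_to_multiplier(code)
total <- amount * multiplier                 # damage in dollars
print(data.frame(amount, code, multiplier, total))
```

A single function applied to both PROPDMGEXP and CROPDMGEXP keeps the two conversions consistent.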

Results

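Once the data has been cleaned and processed, we can proceed to evaluate the health and economic consequences of storms and other severe weather events in the USA between 1950 and 2011. There are several options to rank the impact of each event type. We have chosen to calculate the average effect per year and to rank the event types accordingly. Years in which an event type has no records are left out of its average, so each figure is an average over the years in which that event type was recorded.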

Economic consequences

There are two variables related to the economic consequences: crop damage and property damage. We first estimate the yearly average damage to crops and property for each event type.

## We first estimate the total impact of every event type for every available year
crop<-tapply(data$CROPDMGTOT, list(data$EVTYPE,data$YEAR), sum, na.rm=TRUE)
property<-tapply(data$PROPDMGTOT, list(data$EVTYPE,data$YEAR), sum, na.rm=TRUE)
## Then we calculate the yearly average impact on crops and property separately
avg_crop<-data.frame(avg_crop=rowMeans(crop, na.rm = TRUE))
avg_crop$evtype<- row.names(avg_crop)
avg_property<-data.frame(avg_property=rowMeans(property, na.rm = TRUE))
avg_property$evtype<- row.names(avg_property)
## Now we calculate the total impact (crop+property)
avg_totaldmg <- data.frame(evtype=avg_crop$evtype,total=avg_crop$avg_crop+avg_property$avg_property)
## We sort the data frame in descending order
avg_crop<-avg_crop[order(avg_crop$avg_crop, decreasing = TRUE),]
avg_property<-avg_property[order(avg_property$avg_property, decreasing = TRUE),]
avg_totaldmg<-avg_totaldmg[order(avg_totaldmg$total, decreasing = TRUE),]
## We remove the row names for aesthetic purposes
rownames(avg_crop) <- NULL
rownames(avg_property) <- NULL
rownames(avg_totaldmg) <- NULL
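
The following toy example shows how the tapply and rowMeans combination behaves when an event type has no records in a given year. The data frame and its values are made up for illustration.

```r
# Toy data: TORNADO has no records in 2000
toy <- data.frame(evtype = c("FLOOD", "FLOOD", "FLOOD", "TORNADO"),
                  year = c("2000", "2000", "2001", "2001"),
                  dmg = c(10, 20, 30, 100))

# Yearly totals per event type; empty type/year combinations become NA
totals <- tapply(toy$dmg, list(toy$evtype, toy$year), sum, na.rm = TRUE)

# Average over the years with records (NA cells are skipped)
avg_recorded <- rowMeans(totals, na.rm = TRUE)

# Alternative: count empty years as zero damage
totals_zero <- totals
totals_zero[is.na(totals_zero)] <- 0
avg_all_years <- rowMeans(totals_zero)

print(rbind(avg_recorded, avg_all_years))
```

In this toy case TORNADO averages 100 over its recorded years but 50 over all years, which shows how much the choice can matter for event types that are recorded in only a few years.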

The events can be ranked based on their impact on crops, property, or a combination of both. Tables 1 and 2 show the top ten event types based on their average yearly economic impact on crops and property, respectively.

library(scales)
library(knitr)
library(kableExtra)
## We create the table for the crop damage
kable(head(avg_crop[, c(2, 1)], n = 10), format = "html",
      caption = "Table 1. Top 10 Event types by crop damage",
      col.names = c("Event Type", "Average crop damage per year ($)"),
      align = rep("c", 2)) %>%
  kable_styling("striped", full_width = FALSE)

Table 1. Top 10 Event types by crop damage

Event Type             Average crop damage per year ($)
DROUGHT                735399000
HURRICANE (TYPHOON)    460563800
FLASH FLOOD            345417955
FLOOD                  302579682
ICE STORM              264321763
SEICHE                 152328378
FROST/FREEZE           91325733
FUNNEL CLOUD           68051211
HAIL                   53094345
HEAVY RAIN             42310674

## We create the table for the property damage
kable(head(avg_property[, c(2, 1)], n = 10), format = "html",
      caption = "Table 2. Top 10 Event types by property damage",
      col.names = c("Event Type", "Average property damage per year ($)"),
      align = rep("c", 2)) %>%
  kable_styling("striped", full_width = FALSE)

Table 2. Top 10 Event types by property damage

Event Type             Average property damage per year ($)
HURRICANE (TYPHOON)    12122964333
FLOOD                  7653316232
STORM SURGE/TIDE       2664718000
FLASH FLOOD            1166560149
TORNADO                918419947
SEICHE                 660198251
WILDFIRE               464068239
TROPICAL STORM         406020555
WINTER STORM           352079329
THUNDERSTORM WIND      348313689

We now look at the ranking based on the combined effect (crop + property), which we consider a more adequate way of ranking the economic consequences. Table 3 shows the top ten event types based on their total yearly average economic impact, and Fig. 1 shows the same information as a bar plot for the top five. The highest economic impact is produced by HURRICANE (TYPHOON), which causes on average $12,583,528,133 in losses per year.

## We create the table for the combined damage
kable(head(avg_totaldmg, n = 10), format = "html",
      caption = "Table 3. Top 10 Event types by combined (crop+property) damage",
      col.names = c("Event Type", "Average economic damage per year ($)"),
      align = rep("c", 2)) %>%
  kable_styling("striped", full_width = FALSE)

Table 3. Top 10 Event types by combined (crop+property) damage

Event Type             Average economic damage per year ($)
HURRICANE (TYPHOON)    12583528133
FLOOD                  7955895914
STORM SURGE/TIDE       2664765500
FLASH FLOOD            1511978104
TORNADO                925112923
SEICHE                 812526628
DROUGHT                790814821
WILDFIRE               486323431
ICE STORM              484742106
TROPICAL STORM         442594029

barplot(height = avg_totaldmg$total[1:5],
        names.arg = avg_totaldmg$evtype[1:5],
        main = "Fig. 1 Top Five Events for Economic Consequences",
        xlab = "Event Type",
        ylab = "AVERAGE ECONOMIC CONSEQUENCES PER YEAR ($)",
        cex.names = 0.6)
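
As a quick consistency check, the combined figures in Table 3 can be reproduced by adding the crop and property averages from Tables 1 and 2. The sketch below does this for the two event types that appear in all three tables. The values are copied from the tables, and the vector names are chosen for the example.

```r
# Yearly averages copied from Table 1 (crops) and Table 2 (property)
crop <- c("HURRICANE (TYPHOON)" = 460563800, "FLOOD" = 302579682)
property <- c("HURRICANE (TYPHOON)" = 12122964333, "FLOOD" = 7653316232)

# Combined damage is the element-wise sum
combined <- crop + property

# Print with thousands separators for readability
print(format(combined, big.mark = ","))
```

Both sums agree with the corresponding rows of Table 3.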

Health consequences

The health consequences can also be evaluated based on two variables: fatalities and injuries. We begin by calculating the average number of deaths or injuries per year for each event type.

## We estimate the total number of fatalities and injuries for each year and event type
fatalities<-tapply(data$FATALITIES, list(data$EVTYPE,data$YEAR), sum, na.rm=TRUE)
injuries<-tapply(data$INJURIES, list(data$EVTYPE,data$YEAR), sum, na.rm=TRUE)
## Next, we calculate the yearly average
avg_fat<-data.frame(avg_fat=rowMeans(fatalities, na.rm = TRUE))
avg_fat$evtype<- row.names(avg_fat)
avg_inj<-data.frame(avg_inj=rowMeans(injuries, na.rm = TRUE))
avg_inj$evtype<- row.names(avg_inj)
## Then we calculate the combined effect (fatalities+injuries)
avg_total <- data.frame(evtype=avg_fat$evtype,total=avg_fat$avg_fat+avg_inj$avg_inj)
## We order the list in decreasing order
avg_fat<-avg_fat[order(avg_fat$avg_fat, decreasing = TRUE),]
avg_inj<-avg_inj[order(avg_inj$avg_inj, decreasing = TRUE),]
avg_total<-avg_total[order(avg_total$total, decreasing = TRUE),]
## We remove the row names for aesthetic purposes
rownames(avg_fat) <- NULL
rownames(avg_inj) <- NULL
rownames(avg_total) <- NULL

We now look at the top ten events based on the average number of fatalities per year (Table 4) and the average number of injuries (Table 5). We can see that the ranking changes depending on the variable used.

## We create the table for the fatalities
kable(head(avg_fat[, c(2, 1)], n = 10), format = "html",
      caption = "Table 4. Top 10 Event types by fatalities",
      col.names = c("Event Type", "Average fatalities per year"),
      align = rep("c", 2)) %>%
  kable_styling("striped", full_width = FALSE)

Table 4. Top 10 Event types by fatalities

Event Type             Average fatalities per year
EXCESSIVE HEAT         112.22222
TORNADO                90.85484
HEAT                   79.71429
FLASH FLOOD            54.78947
LIGHTNING              43.05263
FLOOD                  32.10526
RIP CURRENT            31.77778
HIGH WIND              14.15789
THUNDERSTORM WIND      13.40000
HURRICANE (TYPHOON)    12.00000

## We create the table for the injuries
kable(head(avg_inj[, c(2, 1)], n = 10), format = "html",
      caption = "Table 5. Top 10 Event types by injuries",
      col.names = c("Event Type", "Average injuries per year"),
      align = rep("c", 2)) %>%
  kable_styling("striped", full_width = FALSE)

Table 5. Top 10 Event types by injuries

Event Type             Average injuries per year
TORNADO                1473.61290
FLOOD                  416.47368
EXCESSIVE HEAT         372.38889
LIGHTNING              275.42105
HURRICANE (TYPHOON)    212.83333
HEAT                   175.92857
THUNDERSTORM WIND      163.46667
HIGH WIND              149.92982
ICE STORM              106.42105
FLASH FLOOD            94.84211
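
To make the difference between the two rankings concrete, the sketch below compares the orderings of Tables 4 and 5. The event names are copied from the tables. The variable names and the notion of a rank shift are introduced only for this example.

```r
# Top ten event types in table order
by_fatalities <- c("EXCESSIVE HEAT", "TORNADO", "HEAT", "FLASH FLOOD", "LIGHTNING",
                   "FLOOD", "RIP CURRENT", "HIGH WIND", "THUNDERSTORM WIND",
                   "HURRICANE (TYPHOON)")
by_injuries <- c("TORNADO", "FLOOD", "EXCESSIVE HEAT", "LIGHTNING",
                 "HURRICANE (TYPHOON)", "HEAT", "THUNDERSTORM WIND", "HIGH WIND",
                 "ICE STORM", "FLASH FLOOD")

# Event types that appear in only one of the two top-ten lists
only_fatalities <- setdiff(by_fatalities, by_injuries)
only_injuries <- setdiff(by_injuries, by_fatalities)

# For shared event types: injury rank minus fatality rank
both <- intersect(by_fatalities, by_injuries)
rank_shift <- match(both, by_injuries) - match(both, by_fatalities)
names(rank_shift) <- both
print(rank_shift)
```

A negative shift means the event type ranks higher for injuries than for fatalities. For example, TORNADO moves from second to first place, while FLASH FLOOD drops from fourth to tenth.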

However, we consider that the health consequences should be evaluated based on the combined number of injuries and fatalities. That ranking is presented in Table 6 and Fig. 2. Based on the combined ranking, the weather event with the highest impact on human health is TORNADO, which causes on average 1564 injuries and fatalities per year.

## We create the table for the combined effect
kable(head(avg_total, n = 10), format = "html",
      caption = "Table 6. Top 10 Event types by fatalities and injuries combined",
      col.names = c("Event Type", "Average fatalities and/or injuries per year"),
      align = rep("c", 2)) %>%
  kable_styling("striped", full_width = FALSE)

Table 6. Top 10 Event types by fatalities and injuries combined

Event Type             Average fatalities and/or injuries per year
TORNADO                1564.4677
EXCESSIVE HEAT         484.6111
FLOOD                  448.5789
LIGHTNING              318.4737
HEAT                   255.6429
HURRICANE (TYPHOON)    224.8333
THUNDERSTORM WIND      176.8667
HIGH WIND              164.0877
FLASH FLOOD            149.6316
ICE STORM              111.4737

barplot(height = avg_total$total[1:5],
        names.arg = avg_total$evtype[1:5],
        main = "Fig. 2 Top Five Events for Health Consequences",
        xlab = "Event Type",
        ylab = "AVERAGE INJURIES AND/OR FATALITIES PER YEAR",
        cex.names = 0.6)