Identification of the Hydrometeorological Events that Caused the Most Harmful Effects and Had the Greatest Economic Consequences

Synopsis

We analyzed the U.S. National Oceanic and Atmospheric Administration's storm database, which covers roughly the past 60 years. Our aim was to identify the hydrometeorological events that were most harmful to human health (injuries and fatalities) and that had the greatest economic consequences in terms of property damage. We found that the most harmful meteorological event was the tornado, which caused over 90,000 direct injuries in the last 60 years. Tornadoes were also the deadliest events, with about 5,600 deaths during the evaluated period. Finally, flooding had the greatest economic consequences, with over 150 billion dollars in property damage.


Data Processing

In this section we describe (in words and code) how the data from the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database were loaded into R and processed for analysis. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The events in the database start in the year 1950 and end in November 2011. Data processing and analysis were done using R version 3.1.0 (R Foundation for Statistical Computing, Vienna, Austria).

The NOAA database was downloaded from here. This database came in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The file was saved as StormData.csv.bz2 in the working directory.

Two more files, which describe how the variables in the dataset are constructed and defined, were also downloaded.

We then decompressed the StormData.csv.bz2 dataset using the bunzip2 function from the R.utils R package and saved the decompressed dataset to a file named StormData.csv in the working directory. We then loaded the dataset into a data.frame, storm.data, containing all 902,297 observations of 37 variables.

storm.data <- read.csv("StormData.csv", na.strings = "")
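As an aside, base R can also read a bzip2-compressed CSV without a separate decompression step, because read.csv accepts a bzfile connection. The following is only a minimal sketch on a small made-up file (the file name ToyStormData.csv.bz2 and its three rows are illustrative, not taken from the NOAA data); it also shows how na.strings = "" turns empty fields into NA:

```r
# Toy data standing in for the storm database (values are made up).
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL", "FLOOD"),
                  PROPDMGEXP = c("K", "", "M"),
                  stringsAsFactors = FALSE)

# Write the toy data to a bzip2-compressed CSV in a temporary directory.
# quote = FALSE makes the empty string an empty field in the file.
toy.path <- file.path(tempdir(), "ToyStormData.csv.bz2")
con <- bzfile(toy.path, open = "w")
write.csv(toy, con, row.names = FALSE, quote = FALSE)
close(con)

# Read it back directly from the compressed file;
# empty fields become NA because of na.strings = "".
toy.read <- read.csv(bzfile(toy.path), na.strings = "",
                     stringsAsFactors = FALSE)
print(toy.read)
```

Reading from the compressed file avoids keeping a second, much larger copy of the data on disk, at the cost of decompressing on every read.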

We decided to read all empty fields as NAs (hence na.strings = ""). After viewing the data.frame we noted that the variable EVTYPE contained some backslashes, so we converted these backslashes into forward slashes to avoid errors while recoding levels.

storm.data$EVTYPE <- gsub("\\", "/", storm.data$EVTYPE, fixed = TRUE)
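The substitution above depends on fixed = TRUE, which makes gsub treat the pattern as a literal backslash rather than as the start of a regular-expression escape. A small sketch on made-up labels (they are illustrative, not actual EVTYPE values):

```r
# Illustrative labels; "\\" in R source code is one literal backslash.
labels <- c("HEAVY SNOW\\SQUALLS", "WIND\\HAIL", "TORNADO")

# fixed = TRUE matches the pattern literally, so no regex escaping is needed.
cleaned <- gsub("\\", "/", labels, fixed = TRUE)
print(cleaned)
```

Without fixed = TRUE, the pattern "\\" would be an incomplete regular expression and gsub would stop with an error.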

The major challenge for the analysis was the inconsistent reporting of events in the EVTYPE variable. Indeed, the NWS Manual specifies 48 permitted events (page 6), while we found 985 levels in EVTYPE. So, our first task was to recode these 985 levels of EVTYPE into the 48 permitted levels. When it was not possible to assign a permitted level to a particular level, we coded that level as NA.

We also noted similar inconsistencies in the PROPDMGEXP variable, which has only 3 permitted levels (NWS Manual, page 12) but 18 levels in the dataset. So, our second task was to recode PROPDMGEXP to include only the permitted levels. When it was not possible to assign a permitted level to a particular level, we coded that level as NA.

To increase the readability of this document we placed all of this recoding in a separate R script:

source("RecodingNWS.R")
## Loading required package: plyr

This script is published here on GitHub, along with the R script used as the basis of this report.
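The contents of RecodingNWS.R are not reproduced in this report. Purely as an illustration of the kind of mapping described above, and not the script's actual rules, raw labels could be matched against a few regular-expression patterns, with anything unmatched set to NA. The helper recode_evtype, its patterns, and the sample labels below are all hypothetical:

```r
# Hypothetical helper: map raw EVTYPE labels onto permitted event names.
# The patterns are illustrative only and cover just three permitted events.
recode_evtype <- function(x) {
  # Normalise case and strip leading/trailing white space before matching.
  x <- toupper(gsub("^\\s+|\\s+$", "", x))
  out <- rep(NA_character_, length(x))
  out[grepl("^TORNADO", x)] <- "Tornado"
  out[grepl("^(TSTM|THUNDERSTORM) WIND", x)] <- "Thunderstorm Wind"
  out[grepl("^FLASH FLOOD", x)] <- "Flash Flood"
  out  # labels that matched no pattern stay NA
}

# Made-up raw labels showing typical variations in spelling and case.
raw <- c("TORNADO F2", " tstm wind", "THUNDERSTORM WINDS",
         "Flash Flooding", "SUMMARY OF MAY 10")
print(recode_evtype(raw))
```

Starting from a vector of NAs means that any label without an explicit rule is excluded by default, which mirrors the decision to code unassignable levels as NA.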

After fixing the levels of EVTYPE and PROPDMGEXP we recoded these variables as factor and numeric variables, respectively:

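# EVTYPE becomes a factor whose levels are the permitted event names.
storm.data$EVTYPE <- as.factor(storm.data$EVTYPE)
# Convert the recoded values to numbers (works for factor or character input).
storm.data$PROPDMGEXP <- as.numeric(as.character(storm.data$PROPDMGEXP))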

To estimate the total economic damage we combined PROPDMG and PROPDMGEXP, creating a new variable PROPDMGTOTAL:

storm.data$PROPDMGTOTAL <- storm.data$PROPDMG * storm.data$PROPDMGEXP
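To make the arithmetic concrete, assume that the three permitted PROPDMGEXP codes are K, M and B, standing for thousands, millions and billions of dollars (the mapping actually applied lives in RecodingNWS.R and is not shown in this report). Under that assumption, the multiplier and the total could be computed as in the sketch below; the lookup table multiplier and the toy values are hypothetical:

```r
# Assumed multipliers for the permitted codes (hypothetical lookup table).
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Toy records: damage figure and magnitude code
# (NA = code missing or not permitted).
toy <- data.frame(PROPDMG = c(25, 2.5, 1.2, 10),
                  PROPDMGEXP = c("K", "M", "B", NA),
                  stringsAsFactors = FALSE)

# Look up each code by name; an NA code gives an NA multiplier,
# and therefore an NA total.
toy$PROPDMGTOTAL <- toy$PROPDMG * unname(multiplier[toy$PROPDMGEXP])
print(toy)
```

The last toy row shows why NAs appear in PROPDMGTOTAL whenever the magnitude code is missing, even if PROPDMG itself is reported.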

Finally, we created a new data frame with the variables that were used for estimating the harmful effects and economic consequences of the hydrometeorological events included in the NOAA dataset:

dataset <- storm.data[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMGTOTAL")]
dataset$PROPDMGTOTAL[is.na(dataset$PROPDMGTOTAL)] <- 0

As shown above, we recoded the NAs in PROPDMGTOTAL as 0s: these cells were originally empty, so we assumed that the reported event caused no property damage. We then removed the original data.frame:

rm(storm.data)

The dataset data.frame was used for analyzing the data and reporting the Results.


Results

We first summarized the total numbers of harmful outcomes (i.e., injuries and fatalities) and the property damage estimates by event type. For this, we created a new R object named harmful using the ddply function from the plyr R package:

require(plyr)
harmful <- ddply(dataset, "EVTYPE", summarize, ALL.INJURIES = sum(INJURIES), 
    ALL.FATALITIES = sum(FATALITIES), ALL.PROPDMG = sum(PROPDMGTOTAL))
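For readers without plyr, roughly the same per-event totals can be obtained with aggregate from base R. This is a sketch on a made-up four-row stand-in for dataset, not the code used for this report:

```r
# Made-up stand-in for `dataset` (values are illustrative).
toy <- data.frame(EVTYPE = c("Tornado", "Flood", "Tornado", "Flood"),
                  FATALITIES = c(2, 0, 1, 3),
                  INJURIES = c(10, 1, 5, 0),
                  PROPDMGTOTAL = c(1e6, 5e5, 2e6, 0))

# Sum each numeric column within every event type.
toy.sums <- aggregate(cbind(INJURIES, FATALITIES, PROPDMGTOTAL) ~ EVTYPE,
                      data = toy, FUN = sum)

# Rename the columns to match the names used in `harmful`.
names(toy.sums) <- c("EVTYPE", "ALL.INJURIES", "ALL.FATALITIES", "ALL.PROPDMG")
print(toy.sums)
```

One difference to keep in mind: the formula interface of aggregate drops rows whose EVTYPE is NA by default, so the unclassified events would not appear as a separate row in its output.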

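The harmful object includes the following variables: EVTYPE (event type), ALL.INJURIES (total number of injuries), ALL.FATALITIES (total number of fatalities), and ALL.PROPDMG (total property damage in dollars).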

The full table is shown here:

format(harmful, big.mark = ",", scientific = FALSE)
##                      EVTYPE ALL.INJURIES ALL.FATALITIES     ALL.PROPDMG
## 1     Astronomical Low Tide            0              0         320,000
## 2                 Avalanche          171            225       8,721,800
## 3                  Blizzard          805            101     659,913,950
## 4             Coastal Flood            7              6     459,107,060
## 5           Cold/Wind Chill           60            165       2,544,000
## 6               Debris Flow           55             44     327,258,100
## 7                 Dense Fog          342             18       9,674,000
## 8               Dense Smoke            0              0         100,000
## 9                   Drought           48             10   1,053,038,600
## 10               Dust Devil           43              2         719,130
## 11               Dust Storm          440             22       5,619,000
## 12           Excessive Heat        6,749          2,059       7,869,200
## 13  Extreme Cold/Wind Chill          260            316     133,290,400
## 14              Flash Flood        1,880          1,065  16,991,233,460
## 15                    Flood        6,794            482 150,129,365,500
## 16             Freezing Fog          735             63      15,337,500
## 17             Frost/Freeze          234             26      51,246,700
## 18             Funnel Cloud            3              0         199,600
## 19                     Hail        1,372             15  15,975,650,720
## 20                     Heat        2,479          1,114      12,257,050
## 21               Heavy Rain          280            101   3,238,397,690
## 22               Heavy Snow        1,162            149     979,442,740
## 23                High Surf          251            179     102,000,000
## 24                High Wind        1,476            295   5,992,380,960
## 25      Hurricane (Typhoon)        1,333            135  85,356,410,010
## 26                Ice Storm        2,208             96   3,950,832,310
## 27         Lake-Effect Snow            0              0      40,682,000
## 28          Lakeshore Flood            0              0       7,570,000
## 29                Lightning        5,231            817     933,732,280
## 30              Marine Hail            0              0           4,000
## 31         Marine High Wind            9              9       1,312,510
## 32       Marine Strong Wind           22             14         418,330
## 33 Marine Thunderstorm Wind           34             19       5,857,400
## 34              Rip Current          529            572         163,000
## 35                   Seiche            0              0         980,000
## 36                    Sleet            0              2       1,901,000
## 37         Storm Surge/Tide           45             28  47,965,274,000
## 38              Strong Wind          395            135     188,106,740
## 39        Thunderstorm Wind        9,510            714  10,970,557,630
## 40                  Tornado       91,407          5,661  58,593,098,230
## 41      Tropical Depression            0              0       1,737,000
## 42           Tropical Storm          383             66   7,714,390,550
## 43                  Tsunami          129             33     144,062,000
## 44             Volcanic Ash            0              0         500,000
## 45               Waterspout           29              3       9,564,200
## 46                 Wildfire        1,608             90   8,496,628,500
## 47             Winter Storm        1,353            217   6,749,497,250
## 48           Winter Weather          615             62      27,310,500
## 49                       NA           42             15       2,365,500

We then focused on answering the 2 main questions of this study.

1) Across the United States, which types of events are most harmful with respect to population health?

To answer this question we identified the 5 events with the highest total number of people injured and the 5 events with the highest total number of people who died as a direct consequence of the event. For the total number of injured people we created a Q1 R object containing the row positions of the 5 most harmful events:

Q1 <- order(harmful$ALL.INJURIES, decreasing = TRUE)[1:5]

The 5 most harmful events that caused injuries are shown in this table:

format(harmful[Q1, c(1, 2)], big.mark = ",")
##               EVTYPE ALL.INJURIES
## 40           Tornado       91,407
## 39 Thunderstorm Wind        9,510
## 15             Flood        6,794
## 12    Excessive Heat        6,749
## 29         Lightning        5,231
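The idiom order(..., decreasing = TRUE)[1:5] returns row positions rather than values, which is why Q1 can be used to index harmful. A short sketch using four injury totals copied from the full table in this report (the object names toy, top2 and top10 are illustrative):

```r
# Four rows copied from the summary table in this report.
toy <- data.frame(EVTYPE = c("Flood", "Hail", "Lightning", "Tornado"),
                  ALL.INJURIES = c(6794, 1372, 5231, 91407),
                  stringsAsFactors = FALSE)

# order() returns row positions sorted by the key; keep the first two.
top2 <- order(toy$ALL.INJURIES, decreasing = TRUE)[1:2]
print(toy[top2, ])

# head() avoids NA positions when fewer rows exist than requested.
top10 <- head(order(toy$ALL.INJURIES, decreasing = TRUE), 10)
print(length(top10))
```

Indexing with [1:n] pads the result with NA when the table has fewer than n rows, so head may be the safer choice for small tables.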

The results are shown in this plot:

barplot(harmful[Q1, 2], xlab = "Event Type", ylab = "Total No. Injuries", cex.lab = 1.5, 
    names.arg = harmful$EVTYPE[Q1])

[Figure: bar plot of Total No. Injuries by Event Type for the 5 event types with the most injuries]

For the total number of people who died we created a Q2 R object in which we selected the 5 most deadly events:

Q2 <- order(harmful$ALL.FATALITIES, decreasing = TRUE)[1:5]

The 5 most deadly events are shown in this table:

format(harmful[Q2, c(1, 3)], big.mark = ",")
##            EVTYPE ALL.FATALITIES
## 40        Tornado          5,661
## 12 Excessive Heat          2,059
## 20           Heat          1,114
## 14    Flash Flood          1,065
## 29      Lightning            817

These results are shown in this plot:

barplot(harmful[Q2, 3], xlab = "Event Type", ylab = "Total No. Fatalities", 
    cex.lab = 1.5, names.arg = harmful$EVTYPE[Q2])

[Figure: bar plot of Total No. Fatalities by Event Type for the 5 event types with the most fatalities]

2) Across the United States, which types of events have the greatest economic consequences?

To answer this question we identified the 5 events that caused the highest total property damage. For the total property damage we created a Q3 R object containing the row positions of these 5 events:

Q3 <- order(harmful$ALL.PROPDMG, decreasing = TRUE)[1:5]

The events that caused the highest property damage are shown in this table:

format(harmful[Q3, c(1, 4)], big.mark = ",", scientific = FALSE)
##                 EVTYPE     ALL.PROPDMG
## 15               Flood 150,129,365,500
## 25 Hurricane (Typhoon)  85,356,410,010
## 40             Tornado  58,593,098,230
## 37    Storm Surge/Tide  47,965,274,000
## 14         Flash Flood  16,991,233,460

These results are shown in this plot:

barplot(harmful[Q3, 4], xlab = "Event Type", ylab = "Total Property Damage", 
    cex.lab = 1.5, names.arg = harmful$EVTYPE[Q3])
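Because the damage totals run into the hundreds of billions, the axis may be easier to read if the values are rescaled. One possible variant is sketched below, using the five totals from the table above; the rescaling to billions and the las and cex.names settings are suggestions, not the settings used for the original figure:

```r
# Top-five property damage totals, copied from the table above (dollars).
damage <- c("Flood" = 150129365500,
            "Hurricane (Typhoon)" = 85356410010,
            "Tornado" = 58593098230,
            "Storm Surge/Tide" = 47965274000,
            "Flash Flood" = 16991233460)

# Express the totals in billions of dollars for a more readable axis.
damage.bn <- damage / 1e9

# Smaller bar labels (cex.names) help long event names fit under the bars;
# las = 1 draws the axis tick labels horizontally.
barplot(damage.bn, xlab = "Event Type",
        ylab = "Total Property Damage (billion dollars)",
        cex.names = 0.7, las = 1)
```

Passing a named vector lets barplot take the bar labels from the names, so a separate names.arg is not needed.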

[Figure: bar plot of Total Property Damage by Event Type for the 5 event types with the highest property damage]


Conclusions

By far, the most harmful meteorological event has been the tornado, which has caused over 90,000 injuries in the last 60 years. Thunderstorm wind, flood, excessive heat, and lightning were the next most harmful events after tornadoes. Tornadoes were also the deadliest events, with about 5,600 deaths in the past 60 years. Excessive heat, heat, flash flood, and lightning were the next deadliest events after tornadoes. Finally, flooding had the greatest economic consequences, with over 150 billion dollars in property damage. Other hydrometeorological events that caused great property damage were hurricanes, tornadoes, storm surges/tides, and flash flooding.