Synopsis

We analyzed the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which covers roughly the past 60 years (1950 to November 2011). Our aim was to identify the hydrometeorological events that were most harmful to human health (injuries and fatalities) and those with the greatest economic consequences in terms of property damage. We found that the most harmful meteorological event was the tornado, which caused over 90,000 direct injuries over this period. Tornadoes were also the deadliest events, with about 5,600 deaths. Finally, flooding had the greatest economic consequences, with over 150 billion dollars in property damage.


Data Processing

In this section we describe (in words and code) how the data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database were loaded into R and processed for analysis. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The events in the database start in the year 1950 and end in November 2011. Data processing and analysis were done using R version 3.2.0 “Full of Ingredients” (R Foundation for Statistical Computing, Vienna, Austria).

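library(knitr)
# Suppress package messages in the rendered report and widen the figures
opts_chunk$set(message = FALSE, fig.width = 9)
library(R.utils)   # bunzip2()
library(readr)     # read_csv()
library(dplyr)     # select(), filter(), mutate(), group_by(), summarize()
library(stringr)   # string helpers, presumably used by Fix_EVTYPE.R
library(ggvis)     # bar plots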

We first downloaded the NOAA dataset from the link provided in the Johns Hopkins Reproducible Research course on Coursera’s website. The data came as a CSV file compressed with the bzip2 algorithm to reduce its size. Along with the dataset, we downloaded two documents describing how the variables are defined: the National Weather Service Storm Data Documentation (referred to here as the NWS Manual) and the National Climatic Data Center Storm Events FAQ. We then decompressed Storm_Data.bz2, saved the result as Storm_Data.csv in the Files folder of the working directory, loaded it into a data frame named Data, and kept only the variables needed for the analysis.

# Create the download folder if it does not exist yet
dir.create("Files", showWarnings = FALSE)
# Data file (method = "wget" requires the wget utility on the system)
URL_data <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(URL_data, "Files/Storm_Data.bz2", method = "wget")
# Storm data documentation (NWS Manual)
URL_Manual <- "https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf"
download.file(URL_Manual, "Files/NMS_Manual.pdf", method = "wget")
# FAQ
URL_FAQ <- "https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2FNCDC%20Storm%20Events-FAQ%20Page.pdf"
download.file(URL_FAQ, "Files/FAQ.pdf", method = "wget")
# Decompressing the dataset
bunzip2("Files/Storm_Data.bz2", "Files/Storm_Data.csv")
# Loading the dataset
Data <- read_csv("Files/Storm_Data.csv")
# Keeping only the event type, damage, and casualty variables
Data <- Data %>%
  select(EVTYPE, PROPDMGEXP, PROPDMG, FATALITIES, INJURIES)
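As an aside, base R can read a bzip2-compressed CSV without a separate decompression step, because read.csv() opens its file through file(), which recognizes bzip2 compression when reading in text mode. The sketch below is an alternative to the bunzip2() step, not the method used in this analysis; to stay self-contained it writes a tiny file of made-up rows (an assumption for illustration) to a temporary location instead of using the real download.

```r
# Sketch: read a bzip2-compressed CSV directly, without decompressing it first.
# The temporary file and its rows are made up; they stand in for the real dataset.
tmp <- tempfile(fileext = ".csv.bz2")

# Write a few illustrative rows through a bzip2-compressing connection
con <- bzfile(tmp, open = "wt")
writeLines(c("EVTYPE,PROPDMG,PROPDMGEXP,FATALITIES,INJURIES",
             "TORNADO,25,K,0,15",
             "TSTM WIND,2.5,K,0,0",
             "FLASH FLOOD,1,M,1,0"), con)
close(con)

# read.csv() calls file(path, "rt"), which detects the bzip2 compression on read
storm <- read.csv(tmp, stringsAsFactors = FALSE)
str(storm)

# Remove the temporary file
unlink(tmp)
```

Applied to the real download, the same call on the .bz2 file would avoid writing a decompressed copy to disk.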

The main challenge in the analysis was the inconsistent recording of event types in the EVTYPE variable. The NWS Manual defines 48 event types (page 6), whereas EVTYPE contains 985 distinct levels. Our first task was therefore to recode these 985 levels into the 48 predefined events, using a combination of string replacements and regular expressions. Levels that could not be assigned to any predefined event were coded as NA, and rows with an NA event type were then removed.

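# Recode the raw EVTYPE levels into the predefined NWS event types (unmatched levels become NA)
source("Files/Fix_EVTYPE.R")
# Drop rows whose event type could not be assigned (covers both NA and the string "NA")
Data <- Data %>%
  filter(!is.na(EVTYPE), EVTYPE != "NA")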

The final R script used for recoding (Files/Fix_EVTYPE.R, sourced in the previous R chunk) is available here.
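The recoding idea can be sketched in base R as an ordered list of regular-expression rules in which the first matching rule wins and unmatched levels become NA. The raw strings, the patterns, and the recode_evtype() helper are hypothetical illustrations; they are not taken from Fix_EVTYPE.R, which handles many more cases.

```r
# Hypothetical raw EVTYPE strings (illustrative, not copied from the dataset)
raw <- c("TORNADO", "TORNADOES, TSTM WIND, HAIL", "TSTM WIND", "THUNDERSTORM WINDS",
         "FLASH FLOODING", "FLOOD/FLASH FLOOD", "RIVER FLOOD", "Summary of May 9-10")

# Hypothetical rules: regex pattern -> NWS event name; earlier rules take precedence
rules <- c("^TORNADO"                  = "Tornado",
           "^(TSTM|THUNDERSTORM) WIND" = "Thunderstorm Wind",
           "^FLASH FLOOD"              = "Flash Flood",
           "FLOOD"                     = "Flood")

# Hypothetical helper: assign each string the event of the first rule it matches
recode_evtype <- function(x, rules) {
  out <- rep(NA_character_, length(x))
  x <- toupper(trimws(x))
  for (pattern in names(rules)) {
    # only fill entries that no earlier rule has matched
    hit <- is.na(out) & grepl(pattern, x)
    out[hit] <- rules[[pattern]]
  }
  out
}

evtype <- recode_evtype(raw, rules)
print(data.frame(raw, evtype))

# Levels with no matching rule stay NA and are dropped before the analysis
evtype <- evtype[!is.na(evtype)]
```

Rule order matters here: "^FLASH FLOOD" must come before the broader "FLOOD" pattern, otherwise flash floods would be counted as floods.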

We noted similar inconsistencies in the PROPDMGEXP variable: the NWS Manual defines only 3 exponent codes (page 12), K for thousands, M for millions, and B for billions, whereas the downloaded dataset contains 18 levels. Our second task was therefore to filter and recode PROPDMGEXP. We kept only the levels that match the predefined codes, i.e., K, B, and M in either case (m or M), and replaced each code with its numeric multiplier (1e+3, 1e+6, or 1e+9). Finally, to estimate the total economic damage, we multiplied PROPDMG by the recoded PROPDMGEXP, creating a new variable, PROPDMGTOTAL, expressed in US dollars:

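# Numeric multiplier for each exponent code defined in the NWS Manual
exp_multiplier <- c(m = 1e+6, M = 1e+6, K = 1e+3, B = 1e+9)
Data <- Data %>%
  # keep only rows with a recognized exponent code
  filter(PROPDMGEXP %in% names(exp_multiplier)) %>%
  # replace the code by its multiplier, then compute damage in US dollars
  mutate(PROPDMGEXP = unname(exp_multiplier[PROPDMGEXP]),
         PROPDMGTOTAL = PROPDMG * PROPDMGEXP)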

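This filtered dataset was used for all subsequent analyses. Note that the PROPDMGEXP filter removes entire rows, so events whose PROPDMGEXP is missing or uses another code, which can include events that caused injuries or deaths, are also excluded from the injury and fatality totals reported below. All datasets, scripts, and associated files are available here.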
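A small base-R sketch shows how the exponent codes become multipliers, and how the row filter also discards the casualty counts of rows with other codes. The rows, including the "" and "+" codes, are made-up examples rather than records from the dataset.

```r
# Made-up rows illustrating valid and invalid exponent codes
storm <- data.frame(
  EVTYPE     = c("Flood", "Tornado", "Hail", "Tornado", "Lightning"),
  PROPDMG    = c(2.5, 250, 10, 0, 0),
  PROPDMGEXP = c("B", "K", "m", "", "+"),
  INJURIES   = c(0, 3, 0, 12, 1),
  stringsAsFactors = FALSE
)

# Multiplier for each exponent code kept by the analysis
multiplier <- c(m = 1e6, M = 1e6, K = 1e3, B = 1e9)

# Keep recognized codes only, then convert the damage to US dollars
kept <- storm[storm$PROPDMGEXP %in% names(multiplier), ]
kept$PROPDMGTOTAL <- kept$PROPDMG * unname(multiplier[kept$PROPDMGEXP])
print(kept)

# The filter drops whole rows, so their injuries disappear from any later totals
sum(storm$INJURIES)  # 16 injuries before filtering
sum(kept$INJURIES)   # 3 injuries after filtering
```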


Results

We first summed the injuries, fatalities, and property damage (in US dollars) by event type, producing the following summary table:

Harmful <- Data %>%
  group_by(EVTYPE) %>%
  summarize(ALL_INJURIES = sum(INJURIES),
            ALL_FATALITIES = sum(FATALITIES),
            ALL_PROPDMG = sum(PROPDMGTOTAL))
Harmful %>%
  mutate(ALL_PROPDMG = format(ALL_PROPDMG, big.mark = ",")) %>%
  kable(align = c("l", "c", "c", "c"))
| EVTYPE | ALL_INJURIES | ALL_FATALITIES | ALL_PROPDMG |
|:-------|:------------:|:--------------:|:-----------:|
| Astronomical Low Tide | 0 | 0 | 9,745,000 |
| Avalanche | 71 | 103 | 3,721,800 |
| Blizzard | 779 | 70 | 664,913,950 |
| Coastal Flood | 6 | 6 | 449,682,060 |
| Cold/Wind Chill | 177 | 188 | 245,579,450 |
| Debris Flow | 49 | 29 | 327,408,100 |
| Dense Fog | 709 | 51 | 22,829,500 |
| Dense Smoke | 0 | 0 | 100,000 |
| Drought | 28 | 4 | 1,053,038,600 |
| Dust Devil | 31 | 1 | 719,130 |
| Dust Storm | 184 | 5 | 5,619,000 |
| Extreme Cold/Wind Chill | 0 | 0 | 755,000 |
| Flash Flood | 1608 | 776 | 16,991,195,460 |
| Flood | 6757 | 410 | 150,113,968,500 |
| Frost/Freeze | 2055 | 62 | 3,999,037,010 |
| Funnel Cloud | 2 | 0 | 194,600 |
| Hail | 705 | 32 | 17,619,950,720 |
| Heat | 2208 | 435 | 20,125,750 |
| Heavy Rain | 137 | 50 | 3,253,891,190 |
| Heavy Snow | 876 | 75 | 1,027,749,740 |
| High Surf | 178 | 81 | 101,510,500 |
| High Wind | 1167 | 195 | 5,881,880,960 |
| Hurricane (Typhoon) | 1328 | 106 | 85,356,410,010 |
| Lakeshore Flood | 0 | 0 | 7,570,000 |
| Lightning | 3825 | 410 | 7,365,530,370 |
| Marine High/Strong/Thunderstorm Wind | 26 | 16 | 7,186,340 |
| Rip Current | 157 | 216 | 163,000 |
| Seiche | 0 | 0 | 980,000 |
| Sleet | 0 | 0 | 0 |
| Storm Surge/Tide | 13 | 24 | 47,965,474,000 |
| Strong Wind | 265 | 84 | 188,401,740 |
| Thunderstorm Wind | 2746 | 169 | 4,494,356,940 |
| Tornado | 90447 | 5588 | 56,941,932,180 |
| Tropical Depression | 0 | 0 | 1,737,000 |
| Tropical Storm | 380 | 56 | 7,714,390,550 |
| Tsunami | 129 | 33 | 144,062,000 |
| Volcanic Ash | 0 | 0 | 500,000 |
| Waterspout | 71 | 5 | 61,235,200 |
| Wildfire | 1320 | 79 | 8,496,628,500 |
| Winter Weather | 1438 | 144 | 6,776,307,750 |
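The same grouped summation can also be written in base R with aggregate(). This is an equivalent sketch on made-up rows, not the code that produced the table above.

```r
# Made-up rows after recoding and damage conversion
storm <- data.frame(
  EVTYPE       = c("Tornado", "Flood", "Tornado", "Hail", "Flood"),
  INJURIES     = c(15, 0, 2, 1, 4),
  FATALITIES   = c(1, 0, 0, 0, 2),
  PROPDMGTOTAL = c(2.5e5, 1e6, 5e4, 1e4, 3e6),
  stringsAsFactors = FALSE
)

# Sum each numeric column within each event type (one output row per EVTYPE)
totals <- aggregate(cbind(INJURIES, FATALITIES, PROPDMGTOTAL) ~ EVTYPE,
                    data = storm, FUN = sum)
print(totals)
```

One design difference: the formula method of aggregate() drops rows containing NA by default (na.action = na.omit), whereas sum() inside summarize() returns NA for any group that contains a missing value.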

We then focused on answering the 2 main questions of this study.


1) Across the United States, which types of events are most harmful with respect to population health?

To answer this question, we identified the 5 event types with the highest total number of injuries and the 5 with the highest total number of deaths directly caused by the event.

Events that injured people the most

The 5 event types that caused the most injuries are shown in this table:

Most_Injuries <- Harmful %>%
  select(EVTYPE, ALL_INJURIES) %>%
  arrange(desc(ALL_INJURIES)) %>%
  head(5)
Most_Injuries %>%
  mutate(ALL_INJURIES = format(ALL_INJURIES, big.mark = ",")) %>%
  kable(align = c("l", "c"))
| EVTYPE | ALL_INJURIES |
|:-------|:------------:|
| Tornado | 90,447 |
| Flood | 6,757 |
| Lightning | 3,825 |
| Thunderstorm Wind | 2,746 |
| Heat | 2,208 |

The following plot shows the previous results:

Most_Injuries %>%
  ggvis(~EVTYPE, ~ALL_INJURIES) %>%
  layer_bars(fill = ~EVTYPE) %>%
  add_axis("y", title_offset = 60)

Events that killed people the most

The 5 event types that caused the most fatalities are shown in this table:

Most_Fatalities <- Harmful %>%
  select(EVTYPE, ALL_FATALITIES) %>%
  arrange(desc(ALL_FATALITIES)) %>%
  head(5)
Most_Fatalities %>%
  mutate(ALL_FATALITIES = format(ALL_FATALITIES, big.mark = ",")) %>%
  kable(align = c("l", "c"))
| EVTYPE | ALL_FATALITIES |
|:-------|:--------------:|
| Tornado | 5,588 |
| Flash Flood | 776 |
| Heat | 435 |
| Flood | 410 |
| Lightning | 410 |
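Selecting the top 5 amounts to sorting the totals in decreasing order and keeping the first five. In this base-R sketch the fatality totals are copied from the summary table for a subset of event types, listed in an arbitrary order chosen to show how ties such as Flood and Lightning (410 each) are handled.

```r
# Fatality totals copied from the summary table (subset, arbitrary input order)
fatalities <- c("Lightning" = 410, "Tornado" = 5588, "Rip Current" = 216,
                "Flood" = 410, "Heat" = 435, "Flash Flood" = 776, "High Wind" = 195)

# Sort by decreasing total; order() is stable, so ties keep their input order
top5 <- head(fatalities[order(-fatalities)], 5)
print(top5)

# rank() with ties.method = "min" gives tied events the same position
print(rank(-fatalities, ties.method = "min")[names(top5)])
```

Because the sort is stable, the relative order of tied events simply reflects their input order, while rank() makes the tie explicit.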

The following plot shows the previous results:

Most_Fatalities %>%
  ggvis(~EVTYPE, ~ALL_FATALITIES) %>%
  layer_bars(fill = ~EVTYPE) %>%
  add_axis("y", title_offset = 60)


2) Across the United States, which types of events have the greatest economic consequences?

To answer this question, we identified the 5 event types with the highest total property damage (in US dollars), as shown in this table:

Most_Damage <- Harmful %>%
  select(EVTYPE, ALL_PROPDMG) %>%
  arrange(desc(ALL_PROPDMG)) %>%
  head(5)
Most_Damage %>%
  mutate(ALL_PROPDMG = format(ALL_PROPDMG, big.mark = ",")) %>%
  kable(align = c("l", "c"))
| EVTYPE | ALL_PROPDMG |
|:-------|:-----------:|
| Flood | 150,113,968,500 |
| Hurricane (Typhoon) | 85,356,410,010 |
| Tornado | 56,941,932,180 |
| Storm Surge/Tide | 47,965,474,000 |
| Hail | 17,619,950,720 |
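To relate this table to the figures quoted in billions of dollars, the top-5 totals can be rescaled in R. The values are copied from the table; the share column (percentage of the combined top-5 damage) is an additional illustration, not part of the original analysis.

```r
# Top-5 property damage totals copied from the table (US dollars)
damage <- c("Flood"               = 150113968500,
            "Hurricane (Typhoon)" = 85356410010,
            "Tornado"             = 56941932180,
            "Storm Surge/Tide"    = 47965474000,
            "Hail"                = 17619950720)

# Billions of dollars, and percentage of the combined top-5 damage
billions <- round(damage / 1e9, 1)
share    <- round(100 * damage / sum(damage), 1)
print(data.frame(billions, share))
```

With these figures, flooding alone accounts for about 42% of the combined property damage of the five event types.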

The following plot shows the previous results:

Most_Damage %>%
  ggvis(~EVTYPE, ~ALL_PROPDMG) %>%
  layer_bars(fill = ~EVTYPE) %>%
  add_axis("y", title_offset = 120)


Conclusions

By far, the most harmful meteorological event has been the tornado, which has caused over 90,000 injuries in the last 60 years. Floods, lightning, thunderstorm wind, and heat followed as the next most injurious events. Tornadoes were also the deadliest events, with about 5,600 deaths in the past 60 years, followed by flash floods, heat, floods, and lightning. Finally, flooding had the greatest economic consequences, with over 150 billion dollars in property damage. Other hydrometeorological events that caused substantial property damage were hurricanes (typhoons), tornadoes, storm surges/tides, and hail.