1. Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

The objective of this research is to answer the following questions:

  1. Across the United States, which types of events are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

The results show that tornadoes are the most harmful events with respect to population health, and that floods are the most economically damaging events.

2. Data Processing

The data analysed comes from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property and crop damage. The numbers presented here are therefore estimates made according to NOAA instructions.

Not all columns in the data are relevant to this study. After the data is loaded, only the following columns are kept: EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP.

2.1 Load the required libraries

library(memisc)
library(dplyr)
library(stringr)
library(ggplot2)

2.2 Download the data file and create the folder structure

# Initializes variables:
raw.data.dir <- "data/raw"
tidy.data.dir <- "data/tidy"
data.file <- "data/raw/StormData.csv.bz2"
url.file <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

# Removes any existing data directory and its files:
if (file.exists("data")) {
  unlink("data", recursive = TRUE)
}

# Creates new and empty data directories:
dir.create(raw.data.dir, recursive = TRUE)
dir.create(tidy.data.dir, recursive = TRUE)

# Downloads the storm data (mode = "wb" keeps the compressed file intact on Windows):
download.file(url = url.file, destfile = data.file, mode = "wb")
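
Because the chunk above erases the data directory and downloads the file again on every run, a cached variant may be convenient while developing the analysis. The following is only a minimal sketch and not the procedure used in this report: the helper name `fetch.if.missing` is hypothetical, while `url.file` and `data.file` are the variables defined above.

```r
# Hypothetical helper (not part of the original analysis): downloads `url`
# to `dest` only when `dest` does not exist yet.
fetch.if.missing <- function(url, dest) {
  if (file.exists(dest)) {
    return(invisible(FALSE))   # Nothing downloaded; the cached copy is reused
  }
  # Creates the parent directory if needed:
  dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
  # mode = "wb" keeps the compressed file intact on Windows:
  download.file(url = url, destfile = dest, mode = "wb")
  invisible(TRUE)              # A fresh copy was downloaded
}

# Intended usage with the variables defined above (requires network access):
# fetch.if.missing(url.file, data.file)
```

The function returns `FALSE` when the cached file is reused and `TRUE` when a download took place, so a calling script can report which path was taken.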

2.3 Clean the raw data

# Loads the raw data file:
if (file.exists(data.file)) {
  data <- read.csv(data.file, stringsAsFactors = FALSE, strip.white = TRUE)
} else {
  stop("Data file not found.")
}

# Creates a function to convert PROPDMGEXP and CROPDMGEXP to numeric exponents:
convert.to.exp <- function(expn = "character") {
  cases(
    (expn == "B" | expn == "b") -> 9,
    (expn == "M" | expn == "m") -> 6,
    (expn == "K" | expn == "k") -> 3,
    (expn == "H" | expn == "h") -> 2,
    (!is.na(as.numeric(expn)))  -> as.numeric(expn),
    (is.na(as.numeric(expn)))   -> 0
    )
}

# Removes unused variables and transforms variables, using a dplyr chain:
data <- tbl_df(data) %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, 
         CROPDMGEXP) %>%  # Select variables
  mutate(EVTYPE = toupper(EVTYPE)) %>%  # Transforms values of EVTYPE
  mutate(EVTYPE = str_trim(EVTYPE)) %>% # Removes spaces
  mutate(PROPDMGEXP2 = convert.to.exp(PROPDMGEXP)) %>%  # Inserts temp variables
  mutate(CROPDMGEXP2 = convert.to.exp(CROPDMGEXP)) %>%
  mutate(prop.damage = PROPDMG * 10 ^ PROPDMGEXP2) %>%
  mutate(crop.damage = CROPDMG * 10 ^ CROPDMGEXP2) %>%
  select(-PROPDMG, -PROPDMGEXP, -PROPDMGEXP2, -CROPDMG, -CROPDMGEXP, 
         -CROPDMGEXP2) %>% # Removes variables not needed for the analysis
  rename(event.type = EVTYPE) %>%   # Renames variables
  rename(fatalities = FATALITIES) %>%
  rename(injuries = INJURIES)

# Writes tidy data in a file:
write.csv(data, file = "data/tidy/severe-wheather-events.csv", row.names = FALSE)

The cleaned data is shown below:

# Prints the tidy data, ready for analysis:
print(data)
## Source: local data frame [902,297 x 5]
## 
##    event.type fatalities injuries prop.damage crop.damage
## 1     TORNADO          0       15       25000           0
## 2     TORNADO          0        0        2500           0
## 3     TORNADO          0        2       25000           0
## 4     TORNADO          0        2        2500           0
## 5     TORNADO          0        2        2500           0
## 6     TORNADO          0        6        2500           0
## 7     TORNADO          0        1        2500           0
## 8     TORNADO          0        0        2500           0
## 9     TORNADO          1       14       25000           0
## 10    TORNADO          0        0       25000           0
## ..        ...        ...      ...         ...         ...
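
The `prop.damage` and `crop.damage` columns above result from multiplying the reported amount by a power of ten encoded in the `PROPDMGEXP` and `CROPDMGEXP` columns. The arithmetic can be sketched in base R as follows, assuming the same letter-to-exponent mapping as `convert.to.exp`. The names `exp.lookup` and `damage.value` are illustrative and not part of the original code, and unlike `convert.to.exp` this sketch only treats single digits as numeric exponents.

```r
# Illustrative lookup table: magnitude letter -> power of ten
exp.lookup <- c(H = 2, K = 3, M = 6, B = 9)

damage.value <- function(amount, code) {
  code <- toupper(code)
  # Letter codes come from the lookup table (NA when not a known letter):
  exponent <- unname(exp.lookup[code])
  # Single-digit codes are used as the exponent itself:
  is.digit <- grepl("^[0-9]$", code)
  exponent[is.digit] <- as.numeric(code[is.digit])
  # Anything else (blank, "+", "-", "?") is treated as exponent 0:
  exponent[is.na(exponent)] <- 0
  amount * 10 ^ exponent
}

# An amount of 25 with code "K" gives 25000; a blank code leaves the amount as is:
damage.value(c(25, 2.5, 1.5, 3), c("K", "k", "M", ""))
```

A lookup vector avoids the coercion warnings that `as.numeric()` raises on non-numeric codes, at the cost of handling only the codes listed explicitly.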

3. Data Analysis

# Summarizes data by event.type, creating the data set for analysis:
analisys.data <- data %>%
  group_by(event.type) %>%
  summarize(total.fatalities = sum(fatalities),
            total.injuries = sum(injuries),
            total.prop.damage = sum(prop.damage),
            total.crop.damage = sum(crop.damage)) %>%
  mutate(total.economic.damage = total.prop.damage + total.crop.damage)

# Creates a dataset of the 10 most fatal events:
most.fatal <- analisys.data %>%
  select(event.type, total.fatalities) %>%
  arrange(desc(total.fatalities)) %>%
  slice(1:10) %>%
  # Creates a new variable event.type as a factor, ordering by total.fatalities:
  mutate(event.type2 = reorder(event.type, total.fatalities)) 

# Creates a dataset of the 10 most injurious events:
most.injurious <- analisys.data %>%
  select(event.type, total.injuries) %>%
  arrange(desc(total.injuries)) %>%
  slice(1:10) %>%
  # Creates a new variable event.type2 as a factor, ordering by total.injuries:
  mutate(event.type2 = reorder(event.type, total.injuries)) 

# Creates a dataset of the 10 most property damaging events:
most.prop.damaging <- analisys.data %>%
  select(event.type, total.prop.damage) %>%
  arrange(desc(total.prop.damage)) %>%
  slice(1:10) %>%
  # Converts currency to billions of dollars:
  mutate(total.prop.damage = total.prop.damage / 10 ^ 9)  %>%
  # Creates a new variable event.type2 as a factor, ordering by total.prop.damage:
  mutate(event.type2 = reorder(event.type, total.prop.damage))

# Creates a dataset of the 10 most crop damaging events:
most.crop.damaging <- analisys.data %>%
  select(event.type, total.crop.damage) %>%
  arrange(desc(total.crop.damage)) %>%
  slice(1:10) %>%
  # Converts currency to billions of dollars:
  mutate(total.crop.damage = total.crop.damage / 10 ^ 9) %>%
  # Creates a new variable event.type2 as a factor, ordering by total.crop.damage:
  mutate(event.type2 = reorder(event.type, total.crop.damage))

# Creates a dataset of the 10 most economically damaging events:
most.economic.damaging <- analisys.data %>%
  select(event.type, total.economic.damage) %>%
  arrange(desc(total.economic.damage)) %>%
  slice(1:10) %>%
  # Converts currency to billions of dollars:
  mutate(total.economic.damage = total.economic.damage / 10 ^ 9)  %>%
  # Creates a new variable event.type2 as a factor, ordering by total.economic.damage:
  mutate(event.type2 = reorder(event.type, total.economic.damage))
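
The five chunks above repeat the same pattern: select a column, sort it in descending order, keep the first ten rows, and build an ordered factor for plotting. One possible refactoring is sketched below in base R so that it runs without additional packages. The function name `top.events` and the toy data frame are hypothetical and only illustrate the pattern; they are not part of the original analysis.

```r
# Hypothetical helper: returns the `n` rows of `df` with the largest values
# in `column`, adding `event.type2` as a factor ordered by that column
# (the same role event.type2 plays in the chunks above).
top.events <- function(df, column, n = 10) {
  ranked <- df[order(df[[column]], decreasing = TRUE), c("event.type", column)]
  ranked <- head(ranked, n)
  ranked$event.type2 <- reorder(ranked$event.type, ranked[[column]])
  rownames(ranked) <- NULL
  ranked
}

# Toy data, invented for illustration only (not taken from the storm database):
toy <- data.frame(event.type = c("A", "B", "C"),
                  total.fatalities = c(5, 20, 10),
                  stringsAsFactors = FALSE)
top.events(toy, "total.fatalities", n = 2)
```

With such a helper, each of the five data sets would reduce to a single call, for example `top.events(analisys.data, "total.injuries")`, followed by the unit conversion where needed.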

4. Results

The results are divided into five subsections:

4.1 The 10 most fatal weather events

According to the results found, the 10 most fatal weather events are:

print(most.fatal[, c("event.type", "total.fatalities")])
## Source: local data frame [10 x 2]
## 
##        event.type total.fatalities
## 1         TORNADO             5633
## 2  EXCESSIVE HEAT             1903
## 3     FLASH FLOOD              978
## 4            HEAT              937
## 5       LIGHTNING              816
## 6       TSTM WIND              504
## 7           FLOOD              470
## 8     RIP CURRENT              368
## 9       HIGH WIND              248
## 10      AVALANCHE              224

The bar graph of the 10 most fatal events is presented below:

# Creates and prints the graph of 10 most fatal events:
graph.most.fatal <- ggplot(most.fatal, aes(x = event.type2, 
                                           y = total.fatalities)) +
  geom_bar(stat = "identity", colour = "black", fill = "red") +
  xlab("Events") + ylab("Estimated number of fatalities") + ylim(0, 6000) +
  ggtitle("10 Most Fatal Weather Events") + coord_flip()
print(graph.most.fatal)

4.2 The 10 most injurious weather events

According to the results found, the 10 most injurious weather events are:

print(most.injurious[, c("event.type", "total.injuries")])
## Source: local data frame [10 x 2]
## 
##           event.type total.injuries
## 1            TORNADO          91346
## 2          TSTM WIND           6957
## 3              FLOOD           6789
## 4     EXCESSIVE HEAT           6525
## 5          LIGHTNING           5230
## 6               HEAT           2100
## 7          ICE STORM           1975
## 8        FLASH FLOOD           1777
## 9  THUNDERSTORM WIND           1488
## 10              HAIL           1361

The bar graph of the 10 most injurious events is presented below:

# Creates and prints the graph of the 10 most injurious events:
graph.most.injurious <- ggplot(most.injurious, aes(x = event.type2,
                                                   y = total.injuries)) +
  geom_bar(stat = "identity", colour = "black", fill = "orange") +
  xlab("Events") + ylab("Estimated number of injuries") +
  ggtitle("10 Most Injurious Weather Events") + coord_flip()
print(graph.most.injurious)

4.3 The 10 most property damaging weather events

According to the results found, the 10 most property damaging weather events are:

print(most.prop.damaging[, c("event.type", "total.prop.damage")])
## Source: local data frame [10 x 2]
## 
##           event.type total.prop.damage
## 1              FLOOD        144.657710
## 2  HURRICANE/TYPHOON         69.305840
## 3            TORNADO         56.947381
## 4        STORM SURGE         43.323536
## 5        FLASH FLOOD         16.822724
## 6               HAIL         15.735268
## 7          HURRICANE         11.868319
## 8     TROPICAL STORM          7.703891
## 9       WINTER STORM          6.688497
## 10         HIGH WIND          5.270046

Values are presented in billions of US Dollars.

4.4 The 10 most crop damaging weather events

According to the results found, the 10 most crop damaging weather events are:

print(most.crop.damaging[, c("event.type", "total.crop.damage")])
## Source: local data frame [10 x 2]
## 
##           event.type total.crop.damage
## 1            DROUGHT         13.972566
## 2              FLOOD          5.661968
## 3        RIVER FLOOD          5.029459
## 4          ICE STORM          5.022113
## 5               HAIL          3.025954
## 6          HURRICANE          2.741910
## 7  HURRICANE/TYPHOON          2.607873
## 8        FLASH FLOOD          1.421317
## 9       EXTREME COLD          1.312973
## 10      FROST/FREEZE          1.094186

Values are presented in billions of US Dollars.

4.5 The 10 most economically damaging weather events

According to the results found, the 10 most economically damaging weather events are:

print(most.economic.damaging[, c("event.type", "total.economic.damage")])
## Source: local data frame [10 x 2]
## 
##           event.type total.economic.damage
## 1              FLOOD            150.319678
## 2  HURRICANE/TYPHOON             71.913713
## 3            TORNADO             57.362334
## 4        STORM SURGE             43.323541
## 5               HAIL             18.761222
## 6        FLASH FLOOD             18.244041
## 7            DROUGHT             15.018672
## 8          HURRICANE             14.610229
## 9        RIVER FLOOD             10.148404
## 10         ICE STORM              8.967041

Values are presented in billions of US Dollars. Note that economic damage values are the sum of property and crop damages.
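
As a quick consistency check on the statement above, the totals for two events that appear in all three tables can be recomputed from the printed values. This is only a sketch that re-enters the figures from sections 4.3 to 4.5 by hand; the object and column names are illustrative.

```r
# Values copied from the tables in sections 4.3, 4.4 and 4.5 (billions of USD):
check <- data.frame(
  event.type = c("FLOOD", "HURRICANE/TYPHOON"),
  prop       = c(144.657710, 69.305840),
  crop       = c(5.661968, 2.607873),
  reported   = c(150.319678, 71.913713)
)
check$recomputed <- check$prop + check$crop
# Differences should be zero up to the rounding of the printed values:
check$difference <- round(check$recomputed - check$reported, 6)
print(check)
```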

The bar graph of the 10 most economically damaging events is presented below:

# Creates and prints the graph of the 10 most economically damaging events:
graph.most.economic.damaging <- ggplot(most.economic.damaging, 
                                       aes(x = event.type2,
                                           y = total.economic.damage)) +
  geom_bar(stat = "identity", colour = "black", fill = "yellow2") +
  xlab("Events") + 
  ylab("Estimated economic damage (in billions of US Dollars)") +
  ggtitle("10 Most Economically Damaging Weather Events") + coord_flip()
print(graph.most.economic.damaging)

5. Limitations of this research

The accuracy of the data imposes limitations on the results:

  1. Even though the event types were standardized by NOAA, several event names don’t follow the standard. For example, the event “HEAT” appears in fourteen ways in the database:

heat.names <- grep("HEAT", analisys.data$event.type, value = TRUE)
print(heat.names)
##  [1] "DROUGHT/EXCESSIVE HEAT" "EXCESSIVE HEAT"        
##  [3] "EXCESSIVE HEAT/DROUGHT" "EXTREME HEAT"          
##  [5] "HEAT"                   "HEAT DROUGHT"          
##  [7] "HEAT WAVE"              "HEAT WAVE DROUGHT"     
##  [9] "HEAT WAVES"             "HEAT/DROUGHT"          
## [11] "HEATBURST"              "RECORD HEAT"           
## [13] "RECORD HEAT WAVE"       "RECORD/EXCESSIVE HEAT"

The same occurs in other weather events, like “WIND” and “THUNDERSTORM”.

  2. Some events are categorized with two or more event types, like “WINTER STORM/HIGH WINDS” and “HEAVY SNOW/HIGH WINDS/FREEZING”.

  3. The estimates have been recorded since 1950, possibly using different procedures over time.

  4. The database doesn’t adjust the economic damage estimates for inflation.
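
One way to reduce the impact of the first limitation would be to collapse name variants into a canonical label before summarizing. The sketch below is an assumption about how this could be done, not part of the original analysis: the function `normalize.event` and its two rules are hypothetical and far from exhaustive, and deciding which variants truly describe the same event type remains a judgment call.

```r
# Hypothetical normalizer for event names (illustrative rules only):
normalize.event <- function(x) {
  x <- toupper(trimws(x))
  # Treats the abbreviation TSTM as THUNDERSTORM:
  x <- gsub("\\bTSTM\\b", "THUNDERSTORM", x, perl = TRUE)
  # Maps "HEAT WAVE" and "HEAT WAVES" to "HEAT"; compound names such as
  # "HEAT/DROUGHT" are deliberately left unchanged:
  x[grepl("^HEAT( WAVES?)?$", x)] <- "HEAT"
  x
}

normalize.event(c("TSTM WIND", "Heat Wave", "HEAT WAVES", "HEAT/DROUGHT"))
```

Applied before the `group_by(event.type)` step, a rule like the first one would merge the separate “TSTM WIND” and “THUNDERSTORM WIND” rows that appear in the injuries table.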