Synopsis

This document seeks to identify the climatic events that had the greatest impact on the economy and health of the American population between 1996 and 2011, by analysing the U.S. National Oceanic and Atmospheric Administration’s (NOAA) database.

The exploratory analysis of the data suggests that tornadoes and excessive heat had the greatest impact on health, while floods and hurricanes/typhoons had the greatest impact on the economy.

Data Processing

In this section we import, tidy and explore the U.S. National Oceanic and Atmospheric Administration’s (NOAA) database.

First, we load the R libraries that will be used throughout the process.

library(dplyr, verbose = FALSE, quietly = TRUE)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(lubridate, verbose = FALSE, quietly = TRUE)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
library(knitr, verbose = FALSE, quietly = TRUE)
library(tidyr, verbose = FALSE, quietly = TRUE)
library(ggplot2, verbose = FALSE, quietly = TRUE)

Load the data

We download the U.S. National Oceanic and Atmospheric Administration’s (NOAA) database from the URL used in the code below, unless a local copy already exists.

More information about this database can be found in the National Weather Service Storm Data documentation and in the FAQ.

if (!file.exists("storm.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "storm.csv.bz2")
}

data <- read.csv("storm.csv.bz2", stringsAsFactors = FALSE)

Clean the data

We select from the database the variables that will be used during the analysis:

  • EVTYPE: Climatic event type.
  • BGN_DATE: Start date of the climatic event.
  • FATALITIES: Number of deaths caused by the climatic event.
  • INJURIES: Number of injuries caused by the climatic event.
  • PROPDMG: Property damage caused by the climatic event.
  • PROPDMGEXP: Multiplier of property damage.
  • CROPDMG: Crop damage caused by the climatic event.
  • CROPDMGEXP: Multiplier of crop damage.
sel <- select(data, EVTYPE, BGN_DATE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

We parse the start date of each climatic event and keep only the events that occurred from 1996 onwards. Although data are available from 1950, we only consider 1996 onwards because records of climatic events are incomplete before that year.

sel$BGN_DATE <- mdy_hms(sel$BGN_DATE)
sel <- filter(sel, BGN_DATE >= ymd("1996-01-01"))
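
As a minimal sketch of what the two lines above do, the following applies the same parsing and cutoff to three sample strings. The strings are hypothetical and assume the "month/day/year hour:minute:second" layout implied by the use of mdy_hms(). The cutoff is built here as a POSIXct value so that both sides of the comparison have the same class; that is a choice made for this sketch, not a change to the analysis.

```r
library(lubridate)

# Hypothetical sample values, invented for illustration only
raw_dates <- c("12/31/1995 0:00:00", "1/1/1996 0:00:00", "4/27/2011 0:00:00")

# mdy_hms() parses the strings into POSIXct date-times (UTC by default)
parsed <- mdy_hms(raw_dates)

# Cutoff expressed as POSIXct, so the comparison is between like types
cutoff <- as.POSIXct("1996-01-01", tz = "UTC")

# TRUE for events that started on or after 1 January 1996
keep <- parsed >= cutoff
```

Only the first sample date falls before the cutoff and would be dropped.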

Property damage and crop damage are each expressed through a pair of variables. The first holds a value and the second a multiplier, given by the codes “K”, “M” and “B”, which represent thousands, millions and billions, respectively. Values with any other code are left unscaled. Through the following transformations, the variables PROPDMG and CROPDMG will contain the full amount of economic damage caused by each climatic event, expressed in billions of dollars.

sel$PROPDMG <- ifelse(sel$PROPDMGEXP == "K", sel$PROPDMG * 1000, sel$PROPDMG)
sel$PROPDMG <- ifelse(sel$PROPDMGEXP == "M", sel$PROPDMG * 1000000, sel$PROPDMG)
sel$PROPDMG <- ifelse(sel$PROPDMGEXP == "B", sel$PROPDMG * 1000000000, sel$PROPDMG)
sel$PROPDMG <- sel$PROPDMG / 1000000000

sel$CROPDMG <- ifelse(sel$CROPDMGEXP == "K", sel$CROPDMG * 1000, sel$CROPDMG)
sel$CROPDMG <- ifelse(sel$CROPDMGEXP == "M", sel$CROPDMG * 1000000, sel$CROPDMG)
sel$CROPDMG <- ifelse(sel$CROPDMGEXP == "B", sel$CROPDMG * 1000000000, sel$CROPDMG)
sel$CROPDMG <- sel$CROPDMG / 1000000000
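
The same conversion could also be written once and reused for both damage variables. The sketch below is one possible way to do that with a named lookup vector; the helper name to_billions and its arguments are hypothetical and not part of the original analysis. It assumes, like the code above, that any code other than "K", "M" or "B" leaves the value unscaled.

```r
# Hypothetical helper: convert a damage value and its multiplier code
# to billions of dollars.
to_billions <- function(value, exp_code) {
  multiplier <- c(K = 1e3, M = 1e6, B = 1e9)
  # Look up each code by name; unknown or empty codes yield NA
  mult <- unname(multiplier[exp_code])
  # Treat unknown codes as a multiplier of 1 (value left unscaled)
  mult[is.na(mult)] <- 1
  value * mult / 1e9
}

# Illustrative inputs: 25 thousand, 2.5 million, 1.5 billion, and 10 with no code
example <- to_billions(c(25, 2.5, 1.5, 10), c("K", "M", "B", ""))
```

Under these assumptions, sel$PROPDMG <- to_billions(sel$PROPDMG, sel$PROPDMGEXP) would replace the four PROPDMG lines, and likewise for CROPDMG.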

Exploratory data analysis

To determine the impact of the different climatic events on health and the economy, we sum the damage caused by all events of each type that occurred between 1996 and 2011 in the US. The resulting data frame, results, contains the total injuries and fatalities per event type. We also add these two variables together as health, which summarises the total number of people affected by each event type. Economic damage is given by the variables property and crop, added together as economic.

results <- sel %>%
  group_by(EVTYPE) %>%
  summarise(injuries = sum(INJURIES), fatalities = sum(FATALITIES), property = sum(PROPDMG), crop = sum(CROPDMG)) %>%
  mutate(health = injuries + fatalities, economic = property + crop)
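
To make the aggregation concrete, here is a small sketch that runs the same group_by(), summarise() and mutate() steps on a toy data frame. The three records are hypothetical and exist only to show the arithmetic.

```r
library(dplyr)

# Hypothetical toy records, invented for illustration only
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "TORNADO"),
  INJURIES   = c(2, 3, 10),
  FATALITIES = c(0, 1, 4),
  stringsAsFactors = FALSE
)

# One row per event type: totals first, then the combined health measure
toy_results <- toy %>%
  group_by(EVTYPE) %>%
  summarise(injuries = sum(INJURIES), fatalities = sum(FATALITIES)) %>%
  mutate(health = injuries + fatalities)
```

The two FLOOD records collapse into a single row with 5 injuries, 1 fatality and a health total of 6, while TORNADO keeps its single record with a health total of 14.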

From this new data frame, the climatic events with the greatest impact on population health emerge.

table <- results %>%
  select(EVTYPE, injuries, fatalities, health) %>%
  transform(injuries = round(injuries, 2), fatalities = round(fatalities, 2), health = round(health, 2)) %>%
  arrange(desc(health)) %>%
  top_n(10, health)

kable(table, col.names = c("Weather event", "Injuries", "Fatalities", "Total health damage"))
|Weather event     | Injuries| Fatalities| Total health damage|
|:-----------------|--------:|----------:|-------------------:|
|TORNADO           |    20667|       1511|               22178|
|EXCESSIVE HEAT    |     6391|       1797|                8188|
|FLOOD             |     6758|        414|                7172|
|LIGHTNING         |     4141|        651|                4792|
|TSTM WIND         |     3629|        241|                3870|
|FLASH FLOOD       |     1674|        887|                2561|
|THUNDERSTORM WIND |     1400|        130|                1530|
|WINTER STORM      |     1292|        191|                1483|
|HEAT              |     1222|        237|                1459|
|HURRICANE/TYPHOON |     1275|         64|                1339|

The same information is shown in the following plot:

table <- table %>%
  select(EVTYPE, injuries, fatalities) %>%
  gather(-EVTYPE, key = "var",  value = "value") %>%
  arrange(var)
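
For readers on a recent version of tidyr, the reshaping step above can also be expressed with pivot_longer(), which the tidyr documentation recommends over gather() for new code. The sketch below is an assumed equivalent, not the code used to produce this report; it uses two rows taken from the table above.

```r
library(dplyr)
library(tidyr)

# Two rows copied from the health table, in wide form
wide <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD"),
  injuries   = c(20667, 6758),
  fatalities = c(1511, 414),
  stringsAsFactors = FALSE
)

# One row per (event type, measure) pair, ordered by measure
long <- wide %>%
  pivot_longer(cols = -EVTYPE, names_to = "var", values_to = "value") %>%
  arrange(var)
```

The result has four rows, one per event type and measure, which is the long layout that geom_bar() needs in order to stack fatalities and injuries in a single bar.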

plot <- ggplot(table, aes(x = reorder(EVTYPE, value), y = value)) + 
  geom_bar(aes(fill = var), stat = "identity", position = "stack") +
  coord_flip() +
  labs(title = "Health impact of weather events", x = "weather event", y = "people affected") +
  scale_fill_manual(name = "", values=c("black", "gray"), labels = c("Fatalities", "Injuries"))

plot

Following the same procedure, the climatic events with the greatest impact on the economy emerge:

table2 <- results %>%
  select(EVTYPE, property, crop, economic) %>%
  transform(property = round(property, 2), crop = round(crop, 2), economic = round(economic, 2)) %>%
  arrange(desc(economic)) %>%
  top_n(10, economic)

kable(table2, col.names = c("Weather event", "Property damage", "Crop damage", "Total economic damage"))
|Weather event     | Property damage| Crop damage| Total economic damage|
|:-----------------|---------------:|-----------:|---------------------:|
|FLOOD             |          143.94|        4.97|                148.92|
|HURRICANE/TYPHOON |           69.31|        2.61|                 71.91|
|STORM SURGE       |           43.19|        0.00|                 43.19|
|TORNADO           |           24.62|        0.28|                 24.90|
|HAIL              |           14.60|        2.48|                 17.07|
|FLASH FLOOD       |           15.22|        1.33|                 16.56|
|HURRICANE         |           11.81|        2.74|                 14.55|
|DROUGHT           |            1.05|       13.37|                 14.41|
|TROPICAL STORM    |            7.64|        0.68|                  8.32|
|HIGH WIND         |            5.25|        0.63|                  5.88|

The same information is shown in the following plot:

table2 <- table2 %>%
  select(EVTYPE, property, crop) %>%
  gather(-EVTYPE, key = "var",  value = "value") %>%
  arrange(var)

plot <- ggplot(table2, aes(x = reorder(EVTYPE, value), y = value)) + 
  geom_bar(aes(fill = var), stat = "identity", position = "stack") +
  coord_flip() +
  labs(title = "Economic impact of weather events", x = "weather event", y = "billions of dollars in damage") +
  scale_fill_manual(name = "", values=c("black", "gray"), labels = c("Crop", "Property"))

plot

Results

As a result of the exploratory analysis, it can be concluded that tornadoes were the type of climatic event with the greatest impact on the health of the American population between 1996 and 2011. The impact of tornadoes (22,178 people affected) was considerably higher than that of the second-ranked event type, excessive heat (8,188). Although tornadoes caused the most injuries, the largest number of fatalities was caused by excessive heat.

Regarding economic damage, floods were the type of climatic event with the greatest impact, followed by hurricanes/typhoons. Floods also caused the most property damage, while crops were more affected by droughts than by any other type of climatic event.