Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

We will investigate which types of weather events in the United States had the greatest consequences for population health and for the economy between 1950 and 2011. Tornadoes turn out to be the most harmful to population health, and floods have the greatest economic consequences.

Data Processing

The raw data for the analysis were obtained from the URL below.

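url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# mode = "wb" downloads in binary mode so the compressed file is not corrupted on Windows
download.file(url, destfile = "project2_data.bz2", mode = "wb")
# read.csv() decompresses .bz2 files transparently
data <- read.csv(file = "project2_data.bz2")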
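Because the file is large, re-running the analysis is cheaper if the download only happens when the file is not already on disk. The following is a minimal sketch of that idea. It assumes a hypothetical helper named `fetch_storm_data()`, which is not part of the original analysis.

```r
# Hypothetical helper: download the file only if it is not already present.
fetch_storm_data <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # Binary mode keeps the compressed file intact on all platforms
    download.file(url, destfile = destfile, mode = "wb")
  }
  # Return the local path so the call can be passed straight to read.csv()
  destfile
}
```

It can then be used as `data <- read.csv(fetch_storm_data(url, "project2_data.bz2"))`. A second run skips the download entirely.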

This is a large dataset, and we only need a few of its columns to answer our questions, so let’s subset the data frame.

library(dplyr)
colnames(data)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
data <- data %>% 
  select(REFNUM, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
head(data)
##   REFNUM  EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1      1 TORNADO          0       15    25.0          K       0           
## 2      2 TORNADO          0        0     2.5          K       0           
## 3      3 TORNADO          0        2    25.0          K       0           
## 4      4 TORNADO          0        2     2.5          K       0           
## 5      5 TORNADO          0        2     2.5          K       0           
## 6      6 TORNADO          0        6     2.5          K       0

Conversion of DMG & EXP columns to actual numbers

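For both property and crop damage, the dataset provides a number (PROPDMG, CROPDMG) together with an exponent code (PROPDMGEXP, CROPDMGEXP) that acts as a multiplier for that number, e.g. “H” for hundreds, “K” for thousands, “M” for millions and “B” for billions.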
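The idea of applying an exponent code can be sketched with a named vector used as a lookup table. This is a minimal illustration that covers only the letter codes. The names `multiplier` and `to_dollars()` are hypothetical, and digits and symbols such as "+" or "?" are deliberately left unmapped, so they return NA.

```r
# Hypothetical lookup covering only the letter codes; digits and symbols are not handled here
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Convert a damage value plus its exponent code into US dollars.
# toupper() makes the lookup case-insensitive; unknown or empty codes give NA.
to_dollars <- function(value, code) {
  value * unname(multiplier[toupper(as.character(code))])
}

to_dollars(c(25, 2.5, 1.5), c("K", "m", "B"))
```

Here the call returns 25,000, 2,500,000 and 1,500,000,000. Indexing by an empty string never matches a name in R, so an empty code yields NA rather than a silent multiplier of 1.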

unique(data$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
unique(data$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M

Some of the exponent codes are given in lower case, so let’s convert them all to upper case for consistency.

data$PROPDMGEXP <- toupper(data$PROPDMGEXP)
data$CROPDMGEXP <- toupper(data$CROPDMGEXP)
unique(data$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "+" "0" "5" "6" "?" "4" "2" "3" "H" "7" "-" "1" "8"
unique(data$CROPDMGEXP)
## [1] ""  "M" "K" "B" "?" "0" "2"

Next, we convert the property exponent codes to numeric multipliers and create a new column holding the property damage, in USD, for each event.

# Map property damage exponent codes to numeric multipliers.
# "" (no code), "-" and "+" count as 1; digits are powers of ten; H/K/M/B are 10^2/10^3/10^6/10^9.
PROPDMGEXP <- c("", "-", "+", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "H", "K", "M", "B")
prop_numeric <- c(10^0, 10^0, 10^0, 10^0, 10^1, 10^2, 10^3, 10^4, 10^5, 10^6, 10^7, 10^8, 10^9, 10^2, 10^3, 10^6, 10^9)
prop_dmg <- data.frame(PROPDMGEXP, prop_numeric, stringsAsFactors = FALSE)
# left_join keeps each event exactly once; codes missing from the lookup (e.g. "?") get NA
data <- left_join(data, prop_dmg, by = "PROPDMGEXP") %>% 
  mutate(Property_Damage = PROPDMG * prop_numeric)
head(data) #sanity check
##   REFNUM  EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1      1 TORNADO          0       15    25.0          K       0           
## 2      2 TORNADO          0        0     2.5          K       0           
## 3      3 TORNADO          0        2    25.0          K       0           
## 4      4 TORNADO          0        2     2.5          K       0           
## 5      5 TORNADO          0        2     2.5          K       0           
## 6      6 TORNADO          0        6     2.5          K       0           
##   prop_numeric Property_Damage
## 1         1000           25000
## 2         1000            2500
## 3         1000           25000
## 4         1000            2500
## 5         1000            2500
## 6         1000            2500
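The same lookup-table join can be sketched in base R with `merge()` on a few made-up events. In this version, `all.x = TRUE` makes it a left join, so every event appears exactly once, and an exponent code that is missing from the lookup (here "?") produces NA. The data frames `events` and `lookup` are hypothetical toy inputs, not rows from the NOAA data.

```r
# Toy events (hypothetical values, not taken from the NOAA data)
events <- data.frame(
  REFNUM = 1:4,
  PROPDMG = c(25, 2.5, 10, 5),
  PROPDMGEXP = c("K", "M", "", "?"),
  stringsAsFactors = FALSE
)
# Lookup table: exponent code -> multiplier
lookup <- data.frame(
  PROPDMGEXP = c("", "K", "M"),
  prop_numeric = c(1, 1e3, 1e6),
  stringsAsFactors = FALSE
)
# all.x = TRUE keeps every event, even when its code has no match
joined <- merge(events, lookup, by = "PROPDMGEXP", all.x = TRUE)
joined <- joined[order(joined$REFNUM), ]  # merge() sorts by key; restore event order
joined$Property_Damage <- joined$PROPDMG * joined$prop_numeric
joined
```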

Now let’s do the same to calculate the damage to crops from each event and convert any NAs to zero so we can calculate the total economic damage caused by each event later.

# Map crop damage exponent codes to numeric multipliers
CROPDMGEXP <- c("", "?", "0", "2", "K", "M", "B")
crop_numeric <- c(10^0, 10^0, 10^0, 10^2, 10^3, 10^6, 10^9)
crop_dmg <- data.frame(CROPDMGEXP, crop_numeric, stringsAsFactors = FALSE)
data <- left_join(data, crop_dmg, by = "CROPDMGEXP") %>% 
  mutate(Crop_Damage = CROPDMG * crop_numeric)
# Replace NAs (from unmapped exponent codes) with 0 in the damage columns only
data$Property_Damage[is.na(data$Property_Damage)] <- 0
data$Crop_Damage[is.na(data$Crop_Damage)] <- 0
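Instead of overwriting NAs in place, you could treat them as zero only at the point where the two damage columns are added together. The following is a minimal sketch on made-up values. The vectors `prop` and `crop` are hypothetical stand-ins for the Property_Damage and Crop_Damage columns.

```r
# Toy damage values in USD (hypothetical, not from the NOAA data); NA marks an unmapped code
prop <- c(25000, NA, 10)
crop <- c(0, 5e5, NA)

# rowSums(na.rm = TRUE) treats a missing component as zero for that event
total <- rowSums(cbind(prop, crop), na.rm = TRUE)
total
```

An event with both components missing gets a total of 0, which matches the effect of replacing the NAs with zero beforehand.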

Results

Across the United States, which types of events are most harmful with respect to population health?

library(dplyr); library(ggplot2)
A <- data %>% 
  group_by(EVTYPE) %>% 
  summarise(Total_Fatalities = sum(FATALITIES)) %>% 
  arrange(desc(Total_Fatalities)) %>% 
  slice(1:20) %>% 
  ggplot(aes(x=reorder(EVTYPE, Total_Fatalities), y=Total_Fatalities)) +
  geom_bar(stat="identity", fill = "red") +
  coord_flip() + theme_minimal() +
  ggtitle("Top 20 event types causing fatalities across the United States between 1950 and 2011") +
  xlab("Event type") + ylab("Number of fatalities")
B <- data %>% 
  group_by(EVTYPE) %>% 
  summarise(Total_Injuries = sum(INJURIES)) %>% 
  arrange(desc(Total_Injuries)) %>% 
  slice(1:20) %>% 
  ggplot(aes(x=reorder(EVTYPE, Total_Injuries), y=Total_Injuries)) +
  geom_bar(stat="identity", fill = "orange") +
  coord_flip() + theme_minimal() +
  ggtitle("Top 20 event types causing injuries across the United States between 1950 and 2011") +
  xlab("Event type") + ylab("Number of injuries")
ggpubr::ggarrange(A,B, nrow = 2, labels = c("A", "B"))

Across the United States, tornadoes have caused both the most fatalities (A) and the most injuries (B). The EXCESSIVE HEAT and HEAT event types both rank highly as causes of fatalities and injuries, as do floods and flash floods. Lightning and TSTM (thunderstorm) wind are also harmful to population health.
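Both panels rely on the same ranking step: total a column by event type, sort in descending order, and keep the top entries. The group_by/summarise/arrange/slice pipeline above expresses this in dplyr. The following base-R sketch shows the same logic. The helper `top_n_totals()` and the `toy` data are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: total a column by event type and keep the n largest totals
top_n_totals <- function(df, value_col, n = 20) {
  totals <- vapply(split(df[[value_col]], df$EVTYPE), sum, numeric(1))
  head(sort(totals, decreasing = TRUE), n)
}

# Toy records (made-up counts, not the NOAA figures)
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(5, 1, 3, 2, 0),
  stringsAsFactors = FALSE
)
top_n_totals(toy, "FATALITIES", n = 2)
```

With these toy rows, TORNADO (8) and HEAT (2) are returned, in that order.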

Across the United States, which types of events have the greatest economic consequences?

library(dplyr); library(ggplot2)
data %>% 
  mutate(Economic_damage = Property_Damage + Crop_Damage) %>% #add together costs incurred to property and crops
  group_by(EVTYPE) %>% 
  summarise(Total_damage = sum(Economic_damage)) %>% #get total cost for each event type
  arrange(desc(Total_damage)) %>% #sort by Total damage per event type
  slice(1:20) %>% #take the top 20
  ggplot(aes(x=reorder(EVTYPE, Total_damage), y=Total_damage)) + #do a bar plot
  geom_bar(stat="identity", fill = "green") +
  coord_flip() + theme_minimal() +
  ggtitle("Economic consequences of the top 20 event types across the US (1950 to 2011)") +
  xlab("Event type") + ylab("Damage in USD")

Floods have the greatest economic consequences when property and crop damage are combined. Their total cost is more than twice that of the next most expensive event types, hurricane/typhoon and tornado.
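You can check a comparison such as "more than twice the next event type" directly by dividing the largest combined total by the runner-up. The following is a minimal sketch on made-up figures. The `damage` data frame is hypothetical, and its numbers are not the NOAA totals.

```r
# Hypothetical toy damage records in USD (not the NOAA figures)
damage <- data.frame(
  EVTYPE = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "FLOOD"),
  Property_Damage = c(100, 60, 50, 40),
  Crop_Damage = c(10, 5, 0, 0),
  stringsAsFactors = FALSE
)

# Combined property + crop damage per event type, largest first
totals <- vapply(
  split(damage$Property_Damage + damage$Crop_Damage, damage$EVTYPE),
  sum, numeric(1)
)
totals <- sort(totals, decreasing = TRUE)

# How many times larger the top event type is than the second
ratio <- unname(totals[1] / totals[2])
ratio
```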

Summary

Tornadoes pose the greatest risk to public health of any weather event, causing both the most injuries and the most fatalities. However, tornadoes rank only third for economic consequences, dwarfed by the cost of the damage that flooding causes to property and crops.