Executive Summary

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This analysis leverages the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, a dataset which tracks the characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. However, more recent years should be considered more complete.

This analysis will focus on the following questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

Data Processing

Load the dataset

# First, clear any objects left in the workspace from earlier work.
rm(list=ls(all=TRUE))

# Load the raw dataset
df <- read.csv("repdata-data-StormData.csv")

View the dataset

# Let's look at a few rows of the df...
head(df)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6
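
The Executive Summary notes that the earlier years contain fewer records. One way to check this is to count records per year from the BGN_DATE column. The sketch below is only an illustration on a hypothetical toy vector (`bgn_date`), not the full dataset, and it assumes the date text follows the month/day/year layout visible in the output above.

```r
# Toy stand-in for the BGN_DATE column (hypothetical values, same text layout as above)
bgn_date <- c("4/18/1950 0:00:00", "4/18/1950 0:00:00",
              "2/20/1951 0:00:00", "11/15/1951 0:00:00",
              "6/8/1951 0:00:00")

# Parse the date part (the trailing time text is ignored) and extract the year
event_year <- as.integer(format(as.Date(bgn_date, format = "%m/%d/%Y"), "%Y"))

# Count how many records fall in each year
events_per_year <- table(event_year)
print(events_per_year)
```

On the real data, the same steps could be applied to `df$BGN_DATE` in place of the toy vector.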

Question 1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

We begin by adding the values in the ‘FATALITIES’ and ‘INJURIES’ columns to get the ‘CASUALTIES’ for each event.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# Add numbers in 'FATALITIES' and 'INJURIES' columns
df$CASUALTIES <- df$FATALITIES+df$INJURIES

# Let's sort the df by 'CASUALTIES' descending
df <- df %>% arrange(-CASUALTIES)

We then aggregate the casualties by event type and sort the result by total casualties per event type in descending order. To keep the plot manageable, we keep only event types with at least 1,000 total casualties.

# Aggregate the casualties by event type
casualties_per_event <- aggregate(df$CASUALTIES, by=list(df$EVTYPE), sum)
names(casualties_per_event) <- c("EVTYPE", "TOTAL_CASUALTIES")
casualties_per_event <- casualties_per_event %>% arrange(-TOTAL_CASUALTIES)

# Keep event types where the total casualties are at least 1000
casualties_per_event <- casualties_per_event[casualties_per_event$TOTAL_CASUALTIES >= 1000,]
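
To make the aggregate, sort, and filter steps concrete, here is a minimal sketch on a hypothetical five-row data frame (`toy`) using base R only. The values and the threshold of 10 are made up for illustration and are not taken from the storm data.

```r
# Hypothetical toy data with the same column names as the storm data
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT", "FLOOD"),
  FATALITIES = c(1, 0, 2, 5, 0),
  INJURIES   = c(10, 3, 0, 20, 1)
)

# Casualties per record = fatalities + injuries
toy$CASUALTIES <- toy$FATALITIES + toy$INJURIES

# Sum casualties within each event type
totals <- aggregate(CASUALTIES ~ EVTYPE, data = toy, FUN = sum)

# Sort in descending order of total casualties
totals <- totals[order(-totals$CASUALTIES), ]

# Keep event types at or above an illustrative threshold of 10
top <- totals[totals$CASUALTIES >= 10, ]
print(top)
```

Here HEAT (25) and TORNADO (14) pass the threshold, while FLOOD (3) is dropped, which mirrors how the 1,000-casualty cutoff trims the real list.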

Results

The column chart below shows that tornadoes cause by far the most human casualties, followed by excessive heat and thunderstorm winds.

# Plot a column chart of Total Casualties vs. Event Type

library(ggplot2)
library(scales)

# png("total_casualties_by_event_type.png")

ggplot(data = casualties_per_event,
       aes(x = reorder(EVTYPE, -TOTAL_CASUALTIES), y = TOTAL_CASUALTIES)) +
  geom_bar(position = "dodge", stat = "identity", color = "blue") +
  geom_text(aes(label = TOTAL_CASUALTIES), position = position_dodge(width = 0.5), size = 5, vjust = -2) +
  scale_y_continuous(labels = comma) +
  xlab("Event Type") +
  ylab("Number of Casualties") +
  ggtitle("Total Casualties by Event Type") +
  theme(axis.text.x = element_text(angle = 270, size = 15, vjust = 0.5, color = "blue"),
        axis.text.y = element_text(size = 15, color = "blue"),
        plot.title = element_text(size = 25, face = "bold", hjust = 0.5),
        axis.title = element_text(size = 15, face = "bold"))

# dev.off()

Question 2: Across the United States, which types of events have the greatest economic consequences?

We continue with the data frame ‘df’ loaded earlier.

Data Processing

The first task is to recode the multiplier codes in the property damage (PROPDMGEXP) and crop damage (CROPDMGEXP) columns. For example, a “K” in one of these columns indicates that the number in the corresponding PROPDMG or CROPDMG column should be multiplied by 1000. Other values include “B” for billions and “H” for hundreds.

library(dplyr)

# Recode all multipliers for the 'PROPDMG' column

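df <- df %>% mutate(PROPDMGMULT = case_when(
  PROPDMGEXP %in% c("H", "h") ~ 1E2,  # hundreds, as described in the text
  PROPDMGEXP %in% c("K", "k") ~ 1E3,  # thousands
  PROPDMGEXP %in% c("M", "m") ~ 1E6,  # millions
  PROPDMGEXP %in% c("B", "b") ~ 1E9,  # billions
  PROPDMGEXP == "" ~ 1,               # no code: use the number as-is
  TRUE ~ 10))                         # any other code falls back to 10

# Recode all multipliers for the 'CROPDMG' column

df <- df %>% mutate(CROPDMGMULT = case_when(
  CROPDMGEXP %in% c("H", "h") ~ 1E2,  # hundreds, as described in the text
  CROPDMGEXP %in% c("K", "k") ~ 1E3,  # thousands
  CROPDMGEXP %in% c("M", "m") ~ 1E6,  # millions
  CROPDMGEXP %in% c("B", "b") ~ 1E9,  # billions
  CROPDMGEXP == "" ~ 1,               # no code: use the number as-is
  TRUE ~ 10))                         # any other code falls back to 10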
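
The same recoding can also be written as a lookup table, which may be easier to audit. The sketch below is an illustration only: `exp_to_multiplier` is a hypothetical helper name, it assumes “H” means hundreds as the text describes, and it assumes the same fallback of 10 for any unrecognized code. It uses base R and toy inputs rather than the real columns.

```r
# Hypothetical helper: map a damage exponent code to a numeric multiplier
exp_to_multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  code <- toupper(as.character(code))  # treat "k" and "K" alike
  mult <- unname(lookup[code])         # NA for any code not in the lookup
  mult[code %in% ""] <- 1              # empty code: use the number as-is
  mult[is.na(mult)] <- 10              # assumed fallback for all other codes
  mult
}

# Toy damage figures and codes (made-up values)
dmg  <- c(25.0, 2.5, 3.0, 1.2)
code <- c("K", "m", "", "B")

# Damage in dollars = reported number times the multiplier for its code
dollars <- dmg * exp_to_multiplier(code)
print(dollars)
```

For instance, a PROPDMG of 25.0 with code “K” works out to 25.0 × 1,000 = 25,000 dollars.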

Calculate the total damage.

# Calculate amount of total damage

df$TOTAL_DAMAGE <- df$PROPDMG*df$PROPDMGMULT + df$CROPDMG*df$CROPDMGMULT

We then aggregate the total damage by event type and sort the result by total damage per event type in descending order. To keep the plot manageable, we divide the damage numbers by 1E9 and keep only event types with at least $5B in total damages.

# Aggregate the total damage by event type
total_damage_per_event <- aggregate(df$TOTAL_DAMAGE, by=list(df$EVTYPE), sum)
names(total_damage_per_event) <- c("EVTYPE", "TOTAL_DAMAGE")

# The total damage numbers are big, so divide by 1E9 to get numbers in $B.
total_damage_per_event$TOTAL_DAMAGE <- round(total_damage_per_event$TOTAL_DAMAGE/1E9,1)

total_damage_per_event <- total_damage_per_event %>% arrange(-TOTAL_DAMAGE)

# Keep event types where the total damage is at least $5B
total_damage_per_event <- total_damage_per_event[total_damage_per_event$TOTAL_DAMAGE >= 5,]

Results

The column chart below shows that floods cause by far the most property and crop damage, followed by hurricanes/typhoons and tornadoes.

# Plot a column chart of Total Damages vs. Event Type

library(ggplot2)

# png("total_damages_by_event_type.png")

ggplot(data = total_damage_per_event,
       aes(x = reorder(EVTYPE, -TOTAL_DAMAGE), y = TOTAL_DAMAGE)) +
  geom_bar(position = "dodge", stat = "identity", color = "blue") +
  geom_text(aes(label = TOTAL_DAMAGE), position = position_dodge(width = 0.5), size = 5, vjust = -2) +
  scale_y_continuous(labels = comma) +
  xlab("Event Type") +
  ylab("Total Damages ($B)") +
  ggtitle("Total Damages by Event Type") +
  theme(axis.text.x = element_text(angle = 270, size = 15, vjust = 0.5, color = "blue"),
        axis.text.y = element_text(size = 15, color = "blue"),
        plot.title = element_text(size = 25, face = "bold", hjust = 0.5),
        axis.title = element_text(size = 15, face = "bold"))

# dev.off()

Summary

Our analysis of this dataset from the U.S. National Oceanic and Atmospheric Administration (NOAA) shows that tornadoes cause the most human casualties and that floods have the largest economic impact, accounting for the greatest combined property and crop damage.