The following anlysis aims to determine which extreme weather events are the most impactful in terms of both public health and economics. Using data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, wich contains storm data for the years 1950-2011. We measure public health impact as the total number of injuries and fatalities resulting from a weather event, and the economic impact as the total estimated cost of crop and property damage. From these data, we find that tornadoes have the largest impact on public health, while flooding has the largest economic impact.

Data Processing

We first download the data from the link provided to the /data folder in the working directory, and then read the data in to R. We then want to determine how many records contain missing data.

if(!dir.exists("./data")){
    dir.create("./data")
}

download.file('https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2', "./data/NOAA.bz2")

data <- read.csv("./data/NOAA.bz2")

pct_na <- mean(is.na(data))

5.23% of observations contain missing values. This is a relatively low percentage, so we will ignore missing values for this analysis.

Our first step in cleaning the data is to address the PROPDMGEXP and CROPDMGEXP variables. These fields are multiplied by the cost estimate for the respective damage to arrive at the total cost estimate. This field is currently a character representing the multipliler (M for million, B for billion, etc). We will standardize this field by setting all entries to upper case, then create a new field that represents the numeric value of the character field.

## convert all 'EXP' variables to upper case
data$PROPDMGEXP <- toupper(data$PROPDMGEXP)
data$CROPDMGEXP <- toupper(data$CROPDMGEXP)


data$PROPDMGMULT <- with(data, ifelse(PROPDMGEXP == 'K', 1000, ifelse(PROPDMGEXP == 'M', 1000000, ifelse(PROPDMGEXP == 'B', 1000000000,ifelse(PROPDMGEXP == 'H', 100, NA)))))

data$CROPDMGMULT <- with(data, ifelse(CROPDMGEXP == 'K', 1000, ifelse(CROPDMGEXP == 'M', 1000000, ifelse(CROPDMGEXP == 'B', 1000000000,ifelse(CROPDMGEXP == 'H', 100, NA)))))

Next we will create a summary table that calculates the total injuries, fatalities, property damage cost, and crop damage cost for each weather event type. This table will be used as the basis for all analysis.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
data_sum <- summarize(group_by(data, EVTYPE), injuries = sum(INJURIES), fatalities = sum(FATALITIES), property_damage = sum(PROPDMGMULT * PROPDMG, na.rm = TRUE), crop_damage = sum(CROPDMGMULT * CROPDMG, na.rm = TRUE))

head(data_sum[order(-data_sum$injuries), ], 20)
## # A tibble: 20 x 5
##                EVTYPE injuries fatalities property_damage crop_damage
##                <fctr>    <dbl>      <dbl>           <dbl>       <dbl>
##  1            TORNADO    91346       5633     56937160480   414953110
##  2          TSTM WIND     6957        504      4484928440   554007350
##  3              FLOOD     6789        470    144657709800  5661968450
##  4     EXCESSIVE HEAT     6525       1903         7753700   492402000
##  5          LIGHTNING     5230        816       928659280    12092090
##  6               HEAT     2100        937         1797000   401461500
##  7          ICE STORM     1975         89      3944927810  5022113500
##  8        FLASH FLOOD     1777        978     16140811510  1421317100
##  9  THUNDERSTORM WIND     1488        133      3483121140   414843050
## 10               HAIL     1361         15     15732267220  3025954450
## 11       WINTER STORM     1321        206      6688497250    26944000
## 12  HURRICANE/TYPHOON     1275         64     69305840000  2607872800
## 13          HIGH WIND     1137        248      5270046260   638571300
## 14         HEAVY SNOW     1021        127       932589140   134653100
## 15           WILDFIRE      911         75      4765114000   295472800
## 16 THUNDERSTORM WINDS      908         64      1735954850   190654700
## 17           BLIZZARD      805        101       659213950   112060000
## 18                FOG      734         62        13155500           0
## 19   WILD/FOREST FIRE      545         12      3001829500   106796830
## 20         DUST STORM      440         22         5549000     3100000

We see several similar weather events when we view the top 20 events by injury count. For example, TSTM WIND and Thunderstorm Wind refer to the same event. There are also several heat related events. For this analysis, the following events will be combined:
* TSTM WIND, THUNDERSTORM WIND
* HEAT, EXCESSIVE HEAT, HEAT WAVE, EXTREME HEAT
* BLIZZARD, HEAVY SNOW
* FLOOD, FLASH FLOOD
* HIGH WIND, HIGH WINDS
* FOG, DENSE FOG

## combine EVTYPE as described
data$EVTYPE[data$EVTYPE == "THUNDERSTORM WIND"] <- "TSTM WIND"

data$EVTYPE[data$EVTYPE == "HEAT"] <- "EXTREME HEAT"
data$EVTYPE[data$EVTYPE == "EXCESSIVE HEAT"] <- "EXTREME HEAT"
data$EVTYPE[data$EVTYPE == "HEAT WAVE"] <- "EXTREME HEAT"

data$EVTYPE[data$EVTYPE == "BLIZZARD"] <- "HEAVY SNOW"

data$EVTYPE[data$EVTYPE == "FLASH FLOOD"] <- "FLOOD"

data$EVTYPE[data$EVTYPE == "DENSE FOG"] <- "FOG"

## re-create data_sum table with revised EVTYPE
data_sum <- summarize(group_by(data, EVTYPE), injuries = sum(INJURIES), fatalities = sum(FATALITIES), property_damage = sum(PROPDMGMULT * PROPDMG, na.rm = TRUE), crop_damage = sum(CROPDMGMULT * CROPDMG, na.rm = TRUE))

Viewing the top 20 events by property damage reveals additional event types that can be combined

head(data_sum[order(-data_sum$property_damage), ], 20)
## # A tibble: 20 x 5
##                        EVTYPE injuries fatalities property_damage
##                        <fctr>    <dbl>      <dbl>           <dbl>
##  1                      FLOOD     8566       1448    160798521310
##  2          HURRICANE/TYPHOON     1275         64     69305840000
##  3                    TORNADO    91346       5633     56937160480
##  4                STORM SURGE       38         13     43323536000
##  5                       HAIL     1361         15     15732267220
##  6                  HURRICANE       46         61     11868319010
##  7                  TSTM WIND     8445        637      7968049580
##  8             TROPICAL STORM      340         58      7703890550
##  9               WINTER STORM     1321        206      6688497250
## 10                  HIGH WIND     1137        248      5270046260
## 11                RIVER FLOOD        2          2      5118945500
## 12                   WILDFIRE      911         75      4765114000
## 13           STORM SURGE/TIDE        5         11      4641188000
## 14                  ICE STORM     1975         89      3944927810
## 15             HURRICANE OPAL        1          1      3172846000
## 16           WILD/FOREST FIRE      545         12      3001829500
## 17  HEAVY RAIN/SEVERE WEATHER        0          0      2500000000
## 18         THUNDERSTORM WINDS      908         64      1735954850
## 19 TORNADOES, TSTM WIND, HAIL        0         25      1600000000
## 20                 HEAVY SNOW     1826        228      1591803090
## # ... with 1 more variables: crop_damage <dbl>

The following events will be combined using the same process as before:
* RIVER FLOOD, FLOOD, FLASH FLOOD/FLOOD, FLASH FLOODING
* HURRICANE OPAL, HURRICANE
* THUNDERSTORM WINDS, TSTM WIND, SEVERE THUNDERSTORM

## combine EVTYPE as described
data$EVTYPE[data$EVTYPE == "RIVER FLOOD"] <- "FLOOD"
data$EVTYPE[data$EVTYPE == "FLASH FLOOD/FLOOD"] <- "FLOOD"
data$EVTYPE[data$EVTYPE == "FLASH FLOODING"] <- "FLOOD"

data$EVTYPE[data$EVTYPE == "HURRICANE OPAL"] <- "HURRICANE/TYPHOON"
data$EVTYPE[data$EVTYPE == "HURRICANE"] <- "HURRICANE/TYPHOON"
data$EVTYPE[data$EVTYPE == "TYPHOON"] <- "HURRICANE/TYPHOON"

data$EVTYPE[data$EVTYPE == "THUNDERSTORM WINDS"] <- "TSTM WIND"
data$EVTYPE[data$EVTYPE == "SEVERE THUNDERSTORM"] <- "TSTM WIND"

## re-create data_sum table with revised EVTYPE
data_sum <- summarize(group_by(data, EVTYPE), injuries = sum(INJURIES), fatalities = sum(FATALITIES), property_damage = sum(PROPDMGMULT * PROPDMG, na.rm = TRUE), crop_damage = sum(CROPDMGMULT * CROPDMG, na.rm = TRUE))

The data is now sufficiently cleaned for the purposes of this analysis.

Results

Public Health Impacts

In assessing the public health impacts of extreme weather events, we consider the total injuries and fatalities for each event type. A new variable is created that combines the total injury and fatality count for each weather event to represent the total health impact of the event. We then plot the top five weather events based on this new variable

## load required packages
library(reshape2)
library(ggplot2)

## filter and sort data by health impact
health_total <- with(data_sum, data.frame(EVTYPE, injuries, fatalities, total_impact = injuries + fatalities))
health_total <- head(health_total[order(-health_total$total_impact), ], 5)

## format data for plotting
health_melted <- melt(health_total, id = c("EVTYPE", "total_impact"))

## plot by total health impact
ggplot(health_melted, aes(EVTYPE, value, fill = variable)) + geom_col(position = position_stack(reverse = TRUE)) + aes(reorder(EVTYPE, -total_impact)) + xlab("Weather Event Type") + ggtitle("Public Health Impact of Extreme Weather Events", subtitle = '1950 - 2011') + theme(legend.title = element_blank()) + ylab("Total Injuries and Fatalities")

The above chart shows that tornadoes cause significantly more injuries than any other extreme weather event. Tornadoes have caused 96979 injuries and fatalities since 1950. Extreme Heat has caused the second most, at only 12197. Floods, Thunderstorms, and Lightning round out the top five. It should be noted that while Extreme heat has a much lower total incident count that tornadoes, it appears to have the highest percentage of incidents result in a fatality compared to the other events in the top five.

Economic Impacts

Next we will analyze the economic impact of extreme weather. The overall economic impact of a weather event will be considered the sum of the estimated cost of crop damages and property damages resulting from the event. We will create a column chart following the same process used to analyze public health.

## filter and sort data by economic impact
econ_total <- with(data_sum, data.frame(EVTYPE, property_damage, crop_damage, total_damage = property_damage + crop_damage))
econ_total <- head(econ_total[order(-econ_total$total_damage), ], 5)

## format data for plotting
melted <- melt(econ_total, id = c("EVTYPE", "total_damage"))

## plot total economic impact
ggplot(melted, aes(EVTYPE, value/1000000000, fill = variable)) + geom_col(position = position_stack(reverse = TRUE)) + aes(reorder(EVTYPE, -total_damage)) + xlab("Weather Event Type") + ggtitle("Economic Impact of Extreme Weather Events", subtitle = '1950 - 2011') + theme(legend.title = element_blank(), axis.text.x = element_text(size = 7)) + ylab("Estimated Cost (in billions)")

The distribution of economic impacts are more evenly spread than what was seen with public health impacts. Flooding is responsible for the highest estimated cost at $178.6 billion. Of the remaining event types in the top five, neither Tornado or Storm Surge reported any economic losses from crop damages. Overall, a majority of the economic impact of extreme weather is due to property damage

Conclusion

Based on the above analysis, it appears that tornadoes and flooding are the most impactful extreme weather events in terms of both public health and economic impacts. These events not only hold the top ranking in health and economic impact, repsectively, they are also the only two events that rank in the top five in both health and economic impacts.