Measuring the Economic and Human Impact of Weather Events in the United States, 1996-2011

Synopsis

The U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Injuries and Fatalities

Based on total amounts from 1996 to 2011, (and by weighting each injury as 10 percent of one fatality,) tornadoes are the deadliest event type in the United States. However, if you order the event types by average population harm per event, it turns out you are nearly 15 times as likely to be harmed if you find yourself dealing with a tsunami than with a tornado. But, tsunamis occur far less often, which is why they do not show up in the top ten most dangerous types in the first analysis.

Economic Impact

Based on total property and crop damage amounts from 1996 to 2011, flooding is the costliest event type in the United States. However, if you order the event types by average damage per event, it turns out each hurricane is more than 130 times as costly as each flooding incident. Although they occur far less often, that disparity in terms of average economic impact means that hurricanes are still the second-costliest event type overall.

Data Processing

Reading in the Data

Check for the file in your working directory. If it does not already exist, download the noaa.csv.bz2 file. Use read.csv to read in the data.

destfile <- "noaa.csv.bz2"
fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

if(!file.exists(destfile)) {
        download.file(fileURL, destfile, method = "curl")
}

noaa <- read.csv(destfile)

Selecting Appropriate Variables

In order to answer the two research questions, you only need to keep the BGN_DATE EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP variables. Use the dplyr package to select only those columns.

library(dplyr)
noaa <- tbl_df(noaa)
noaa <- select(noaa, BGN_DATE, EVTYPE, FATALITIES:CROPDMGEXP)

Cleaning Event Type Variable

Next, clean up the EVTYPE column. You’ll notice that there are 985 different factors, but according to the National Weather Service, there are actually just 48 different types of recognized storms.

n_distinct(noaa$EVTYPE)
## [1] 985

Note that these 48 event types were standardized after 1996. So filter by all dates on or after 1996-01-01.

noaa$BGN_DATE <- as.POSIXct(strptime(noaa$BGN_DATE, "%m/%d/%Y %T"))
noaa <- filter(noaa, BGN_DATE >= "1996-01-01")

Before filtering by the list of 48 event types mentioned above, first clean some of the data entry errors. Use the toupper function to standardize the variable.

noaa$EVTYPE <- toupper(noaa$EVTYPE)

Then, clean up some of the abbreviations and extraneous notations.

library(stringr)
noaa$EVTYPE <- str_trim(noaa$EVTYPE)
noaa$EVTYPE <- gsub("TSTM", "THUNDERSTORM", noaa$EVTYPE)
noaa$EVTYPE <- gsub("G[0-9][0-9]", "", noaa$EVTYPE)
noaa$EVTYPE <- gsub("/MIX", "", noaa$EVTYPE)
noaa$EVTYPE <- gsub("WINDS", "WIND", noaa$EVTYPE)
noaa$EVTYPE <- gsub("FLD", "FLOOD", noaa$EVTYPE)
noaa$EVTYPE <- gsub("FLOODING", "FLOOD", noaa$EVTYPE)
noaa$EVTYPE <- gsub("/FOREST ", "", noaa$EVTYPE)
noaa$EVTYPE <- gsub("/TYPHOON", " (TYPHOON)", noaa$EVTYPE)
noaa$EVTYPE <- gsub("/HAIL", "", noaa$EVTYPE)

Using the PDF linked above, copy the event types, and save them to a list vector labeled types. Then filter by types. Check to see that you have ~48 distinct types.

types <- toupper(c("Astronomical Low Tide", "Avalanche", "Blizzard", "Coastal Flood", "Cold/Wind Chill", "Debris Flow", "Dense Fog", "Dense Smoke", "Drought", "Dust Devil", "Dust Storm", "Excessive Heat", "Extreme Cold/Wind Chill", "Flash Flood", "Flood", "Frost/Freeze", "Funnel Cloud", "Freezing Fog", "Hail", "Heat", "Heavy Rain", "Heavy Snow", "High Surf", "High Wind", "Hurricane (Typhoon)", "Ice Storm", "Lake-Effect Snow", "Lakeshore Flood", "Lightning", "Marine Hail", "Marine High Wind", "Marine Strong Wind", "Marine Thunderstorm Wind", "Rip Current", "Seiche", "Sleet", "Storm Surge/Tide", "Strong Wind", "Thunderstorm Wind", "Tornado", "Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout", "Wildfire", "Winter Storm", "Winter Weather"))

noaa <- filter(noaa, EVTYPE %in% types)
noaa$EVTYPE <- as.factor(noaa$EVTYPE)
n_distinct(noaa$EVTYPE)
## [1] 47

Calculating Harm to Population Health

Create a new table called noaaPopHealth. Summarize by event type, and add two variables. The first, total.harm, adds the number of fatalities and injuries (weighted by 10 percent). The second, avg.harm, takes total.harm and divides it by the number of instances of each event to calculate the average impact.

noaaPopHealth <- noaa %>%
        group_by(EVTYPE) %>%
        summarise(total.harm = sum(FATALITIES, INJURIES*.1), avg.harm = total.harm / length(EVTYPE)) %>%
        rename(event.type = EVTYPE)

Calculating Economic Impact

Prepare to adjust the property damage and crop damage variables by their exponentials. Do so, first, by replacing letters with numbers in the exponential columns.

noaa$PROPDMGEXP <- ifelse(noaa$PROPDMGEXP == "K", 1000, ifelse(noaa$PROPDMGEXP == "M", 1000000, ifelse(noaa$PROPDMGEXP == "B", 1000000000, 1)))
noaa$CROPDMGEXP <- ifelse(noaa$CROPDMGEXP == "K", 1000, ifelse(noaa$CROPDMGEXP == "M", 1000000, ifelse(noaa$CROPDMGEXP == "B", 1000000000, 1)))

Create a new table called noaaEcon. Summarize by event type, and add two variables. The first, total.econdamage, adds the total amounts of property damage and crop damage. The second, avg.econdamage, takes total.econdamage and divides it by the number of instances of each event to calculate the average amount of damage.

noaaEcon <- noaa %>%
        group_by(EVTYPE) %>%
        summarise(prop.econdamage = sum(PROPDMG*PROPDMGEXP), avg.propdamage = prop.econdamage / length(EVTYPE), crop.econdamage = sum(CROPDMG*CROPDMGEXP), avg.cropdamage = crop.econdamage / length(EVTYPE), total.econdamage = sum(prop.econdamage, crop.econdamage), avg.econdamage = total.econdamage / length(EVTYPE)) %>%
        rename(event.type = EVTYPE)

Results

Which types of events are most harmful with respect to population health?

In order to save space, isolate the top ten event types by total and average harm.

topPopHealth <- noaaPopHealth %>%
        arrange(desc(total.harm)) %>%
        top_n(10, total.harm)
event.type total.harm avg.harm
TORNADO 3578 0.1545
EXCESSIVE HEAT 2436 1.471
FLOOD 1090 0.04494
LIGHTNING 1065 0.08066
FLASH FLOOD 1054 0.02067
THUNDERSTORM WIND 888.4 0.004208
RIP CURRENT 360.9 0.8354
HEAT 359.2 0.5017
HIGH WIND 343.3 0.01724
WINTER STORM 320.2 0.02829
topavgPopHealth <- noaaPopHealth %>%
        arrange(desc(avg.harm)) %>%
        top_n(10, avg.harm)
event.type total.harm avg.harm
TSUNAMI 45.9 2.295
HURRICANE (TYPHOON) 191.5 2.176
EXCESSIVE HEAT 2436 1.471
RIP CURRENT 360.9 0.8354
AVALANCHE 238.6 0.6312
HEAT 359.2 0.5017
MARINE STRONG WIND 16.2 0.3375
COLD/WIND CHILL 96.2 0.1785
TORNADO 3578 0.1545
HIGH SURF 105 0.1446

Then, using the ggplot2 package, plot both of them together, one on top of the other.

library(ggplot2)
library(gridExtra)

ggtotalharm <- ggplot(topPopHealth, aes(x=reorder(event.type, total.harm), y=total.harm)) + geom_bar(stat="identity") + coord_flip() + xlab("") + ylab("Total Population Harm (Fatalities + Injuries*.1)")

ggavgharm <- ggplot(topavgPopHealth, aes(x=reorder(event.type, avg.harm), y=avg.harm)) + geom_bar(stat="identity") + coord_flip() + xlab("") + ylab("Average Population Harm (Fatalities + Injuries*.1) per Event")

grid.arrange(ggtotalharm, ggavgharm, nrow=2, top="Top Ten Harmful Event Types, by Total and Average Population Harm")

Which types of events have the greatest economic consequences?

In order to save space, isolate the top ten event types by total and average economic impact.

topEcon <- noaaEcon %>%
        arrange(desc(total.econdamage)) %>%
        top_n(10, total.econdamage)
event.type total.econdamage avg.econdamage
FLOOD 1.489e+11 6141521
HURRICANE (TYPHOON) 7.191e+10 817201282
TORNADO 2.49e+10 1075424
HAIL 1.707e+10 82186
FLASH FLOOD 1.656e+10 324644
DROUGHT 1.441e+10 5924236
THUNDERSTORM WIND 8.93e+09 42303
TROPICAL STORM 8.32e+09 12199687
WILDFIRE 8.163e+09 1955139
HIGH WIND 5.882e+09 295425
topavgEcon <- noaaEcon %>%
        arrange(desc(avg.econdamage)) %>%
        top_n(10, avg.econdamage)
event.type total.econdamage avg.econdamage
HURRICANE (TYPHOON) 7.191e+10 817201282
STORM SURGE/TIDE 4.642e+09 31365122
TROPICAL STORM 8.32e+09 12199687
TSUNAMI 144082000 7204100
FLOOD 1.489e+11 6141521
DROUGHT 1.441e+10 5924236
WILDFIRE 8.163e+09 1955139
ICE STORM 3.658e+09 1946732
TORNADO 2.49e+10 1075424
FROST/FREEZE 1.105e+09 822536

To best plot these, use the reshape2 package to melt the data frames. This allows you to create a stacked bar chart, showing the impact each storm has on properties and crops.

library(reshape2)
topEconMelt <- topEcon[,c(1:2, 4)]
topEconMelt <- melt(topEconMelt, id = "event.type")

topavgEconMelt <- topavgEcon[,c(1, 3, 5)]
topavgEconMelt <- melt(topavgEconMelt, id = "event.type")

Using the ggplot2 package, first plot total economic impact.

ggplot(topEconMelt, aes(x=reorder(event.type, value), y=value/1000000)) + 
        geom_bar(stat="identity", aes(fill = variable)) + 
        xlab("") + 
        ylab("Total Economic Impact (Millions USD)") + 
        scale_fill_discrete(labels = c("Property", "Crops"), name = "Damage") +
        theme(axis.text.x = element_text(angle = 45, size=8, hjust = 1, vjust = 1)) + 
        ggtitle("Top Ten Costly Event Types, by Total Damage Amounts")

Next, plot average economic impact.

ggplot(topavgEconMelt, aes(x=reorder(event.type, value), y=value/1000000)) + 
        geom_bar(stat="identity", aes(fill = variable)) + 
        xlab("") + 
        ylab("Average Economic Impact (Millions USD)") + 
        scale_fill_discrete(labels = c("Property", "Crops"), name = "Damage") +
        theme(axis.text.x = element_text(angle = 45, size=8, hjust = 1, vjust = 1)) + 
        ggtitle("Top Ten Costly Event Types, by Average Damage Amounts")