This analysis determines which types of storm events caused the largest health costs (injuries and fatalities) and financial costs (property damage and farm crop damage). The project starts with a version of the US National Weather Service (NWS) severe storm database covering January 1950 through November 2011, but records prior to January 2000 are removed from the working dataset. This time frame was chosen because, although the NWS adopted the current 48 event types in 1996, it did not enforce them through a drop-down selector until 2000. Despite this standardization, the dataset required substantial processing to convert over 160 event types into the 48 NWS standard event types. The corrected dataset reveals that tornadoes and excessive heat are the most harmful to human health. Regarding financial costs, floods and hurricanes/typhoons cause the most property damage, and drought and floods cause the most crop damage.
## Always display code
knitr::opts_chunk$set(echo = TRUE)
## Disable scientific notation display
options(scipen = 999)
## Load libraries for working with dates, data frames, and plotting
library(knitr)
library(lubridate)
library(dplyr)
library(ggplot2)
library(gridExtra)
library(scales)
First, the US National Weather Service (NWS) storm event data is downloaded and loaded into the working environment.
## Download the compressed CSV file, if it's not already stored locally
if(!file.exists("./data")) {dir.create("./data")}
ZipUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if(!file.exists("./data/stormData.csv.bz2")) {
  ## mode = "wb" keeps the compressed file intact on Windows
  download.file(ZipUrl, destfile = "./data/stormData.csv.bz2", mode = "wb")
}
## Only read in the CSV if data is not currently loaded
if (!"storms" %in% ls()) {
storms <- read.csv("data/stormData.csv.bz2")
}
The database contains a large number of columns describing the location of the storms, their physical scale, and other fields which are not necessary to determine which events cause the most injuries and deaths, and the most property and farm crop damage. Therefore, we can remove these columns to free up some memory.
## Eliminate most of the columns describing the size, latitude, longitude,
## county, etc. for the storms
columnsToKeep <- c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES",
"PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
storms <- storms[, columnsToKeep]
According to NOAA’s Storm Event FAQ, the National Weather Service (NWS) did not standardize on the current 48 event types until January 1996. In addition, early storm event records are less complete than newer records. Due to these issues, including data from before 1996 would likely bias the results. Although the NWS standardized on these event types in 1996, the event type field still allowed manual data entry until January 2000, when it was converted into a selection box in an effort to reduce inconsistencies (also from the Storm Event FAQ). Therefore, this analysis only uses data from January 2000 onward.
Also, the event type field contains a mix of uppercase and lowercase letters, which makes it difficult to aggregate health and economic data by event type. To make aggregation easier and more accurate, the event type field was converted to all uppercase letters.
## In order to select only certain years of data, we need to extract the year
## from beginning date field
storms$BGN_DATE <- as.Date(storms$BGN_DATE, "%m/%d/%Y %H:%M:%S")
storms$YEAR <- year(storms$BGN_DATE)
## Subset the data to only include events from January 2000 through November
## 2011
stormsY2K <- storms[storms$YEAR > 1999, ]
## Convert Event Type to a standard format
stormsY2K$EVTYPE <- as.character(stormsY2K$EVTYPE)
stormsY2K$EVTYPE <- toupper(stormsY2K$EVTYPE)
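To illustrate why the case conversion matters, the following minimal sketch counts distinct labels before and after normalization. The labels are made up for illustration and are not taken from the dataset. The `trimws()` call is an assumption added here; the analysis above only applies `toupper()`.

```r
## Hypothetical event labels, invented for illustration only
rawTypes <- c("Heavy Rain", "HEAVY RAIN", " heavy rain ", "Tornado")

## Each spelling variant counts as its own category before cleaning
distinctBefore <- length(unique(rawTypes))

## Uppercase the labels; trimws() (an extra step, not used in the analysis
## above) also removes leading and trailing spaces
cleanTypes <- toupper(trimws(rawTypes))
distinctAfter <- length(unique(cleanTypes))

print(c(before = distinctBefore, after = distinctAfter))
```

Whether trimming whitespace changes anything for the post-2000 records would need to be checked against the actual data.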
Despite the event type standardization efforts, there are still many more than 48 event types in the dataset. Using the event type descriptions in the NWS Directive 10-1605, we converted as many erroneous event types as possible into the 48 standard types.
## Convert non-standard event types into the standard types
stormsY2K$EVTYPE <- sub("^FOG", "DENSE FOG", stormsY2K$EVTYPE)
## Handle the non-thunderstorm case first; the general TSTM pattern on the
## next line would otherwise convert it to THUNDERSTORM WIND
stormsY2K$EVTYPE <- sub("^NON TSTM WIND", "STRONG WIND", stormsY2K$EVTYPE)
stormsY2K$EVTYPE <- sub(".*TSTM.*", "THUNDERSTORM WIND", stormsY2K$EVTYPE)
stormsY2K$EVTYPE <- sub("LAKE EFFECT", "LAKE-EFFECT", stormsY2K$EVTYPE)
stormsY2K$EVTYPE <- sub("^WIND.*", "HIGH WIND", stormsY2K$EVTYPE)
stormsY2K$EVTYPE <- sub("^NON-SEVERE WIND DAMAGE","HIGH WIND",stormsY2K$EVTYPE)
stormsY2K$EVTYPE <- sub(".*WATERSPOUT.*", "WATERSPOUT", stormsY2K$EVTYPE)
stormsY2K$EVTYPE[grep(pattern = "BLOWING SNOW", stormsY2K$EVTYPE)] <- "BLIZZARD"
stormsY2K$EVTYPE[grep(pattern = "COASTAL FLOODING|ASTRONOMICAL HIGH TIDE|TIDAL FLOODING|EROSION", stormsY2K$EVTYPE)] <- "COASTAL FLOOD"
stormsY2K$EVTYPE[grep(pattern = "LANDSLIDE|MUDSLIDE|MUD SLIDE|LANDSLUMP", stormsY2K$EVTYPE)] <- "DEBRIS FLOW"
stormsY2K$EVTYPE[grep(pattern = "DUST DEV.*|WHIRLWIND", stormsY2K$EVTYPE)] <- "DUST DEVIL"
stormsY2K$EVTYPE[grep(pattern = "BLOWING DUST", stormsY2K$EVTYPE)] <- "DUST STORM"
stormsY2K$EVTYPE[grep(pattern = "^THUNDERSTORM|MICROBURST|SEVERE THUNDERSTORMS|GUSTY THUNDERSTORM.*", stormsY2K$EVTYPE)] <- "THUNDERSTORM WIND"
stormsY2K$EVTYPE[grep(pattern = "DROUGHT|DRY|DRIEST|RECORD LOW RAINFALL", stormsY2K$EVTYPE)] <- "DROUGHT"
stormsY2K$EVTYPE[grep(pattern = "EXTREME HEAT|EXCESSIVE HEAT|RECORD HEAT|UNSEASONABLY HOT", stormsY2K$EVTYPE)] <- "EXCESSIVE HEAT"
stormsY2K$EVTYPE[grep(pattern = "^COLD.*|EXTREME COLD|RECORD COLD|PROLONG COLD|WINDCHILL|COLD WIND CHILL|UNSEASONABLY COLD|UNUSUALLY COLD|UNSEASONAL LOW TEMP", stormsY2K$EVTYPE)] <- "EXTREME COLD/WIND CHILL"
stormsY2K$EVTYPE[grep(pattern = "FLASH FLOOD|DROWNING|URBAN/SML STREAM FLD", stormsY2K$EVTYPE)] <- "FLASH FLOOD"
stormsY2K$EVTYPE[grep(pattern = "^FROST|^FREEZE|HARD FREEZE", stormsY2K$EVTYPE)] <- "FROST/FREEZE"
stormsY2K$EVTYPE[grep(pattern = "FUNNEL CLOUD.*", stormsY2K$EVTYPE)] <- "FUNNEL CLOUD"
stormsY2K$EVTYPE[grep(pattern = "SMALL HAIL|NON SEVERE HAIL", stormsY2K$EVTYPE)] <- "HAIL"
stormsY2K$EVTYPE[grep(pattern = "PROLONG WARMTH|RECORD WARMTH|UNSEASONABLY WARM.*|UNUSUALLY WARM|WARM WEATHER|VERY WARM", stormsY2K$EVTYPE)] <- "HEAT"
stormsY2K$EVTYPE[grep(pattern = "HEAVY RAIN.*|RECORD RAINFALL|^RAIN|EXTREMELY WET|ABNORMALLY WET", stormsY2K$EVTYPE)] <- "HEAVY RAIN"
stormsY2K$EVTYPE[grep(pattern = "^SNOW", stormsY2K$EVTYPE)] <- "HEAVY SNOW"
stormsY2K$EVTYPE[grep(pattern = "SURF|HIGH SEAS|HIGH WATER|ROUGH SEAS", stormsY2K$EVTYPE)] <- "HIGH SURF"
stormsY2K$EVTYPE[grep(pattern = "HURRICANE|TYPHOON", stormsY2K$EVTYPE)] <- "HURRICANE (TYPHOON)"
stormsY2K$EVTYPE[grep(pattern = "^RIP CURRENTS", stormsY2K$EVTYPE)] <- "RIP CURRENT"
stormsY2K$EVTYPE[grep(pattern = "SLEET STORM", stormsY2K$EVTYPE)] <- "SLEET"
stormsY2K$EVTYPE[grep(pattern = "^STORM SURGE", stormsY2K$EVTYPE)] <- "STORM SURGE/TIDE"
stormsY2K$EVTYPE[grep(pattern = "^STRONG WINDS|GUSTY WIND.*|WND", stormsY2K$EVTYPE)] <- "STRONG WIND"
stormsY2K$EVTYPE[grep(pattern = "TORN.*", stormsY2K$EVTYPE)] <- "TORNADO"
stormsY2K$EVTYPE[grep(pattern = "VOLC.*", stormsY2K$EVTYPE)] <- "VOLCANIC ASH"
stormsY2K$EVTYPE[grep(pattern = "WILD.*|SMOKE|BRUSH FIRE", stormsY2K$EVTYPE)] <- "WILDFIRE"
stormsY2K$EVTYPE[grep(pattern = "RECORD SNOW.*", stormsY2K$EVTYPE)] <- "WINTER STORM"
stormsY2K$EVTYPE[grep(pattern = "WINTER WEATHER.*|WINTRY MIX|GLAZE|FREEZING RAIN|FREEZING DRIZZLE|BLACK ICE|ICE ON ROAD|ICY ROADS|ICE/SNOW|LIGHT SNOW|MODERATE SNOW|PATCHY ICE|MIXED PRECIPITATION", stormsY2K$EVTYPE)] <- "WINTER WEATHER"
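One way to see how much of the data ends up in a standard category after the conversions above is to compare the event types against the official list. The sketch below is only an illustration. `standardTypes` is a deliberately shortened subset (the full 48 names would have to be copied from NWS Directive 10-1605), and `evtype` is a toy stand-in for `stormsY2K$EVTYPE`.

```r
## Illustrative subset only; the full list of 48 types should be copied
## from NWS Directive 10-1605
standardTypes <- c("TORNADO", "FLOOD", "FLASH FLOOD", "HAIL",
                   "THUNDERSTORM WIND")

## Toy stand-in for stormsY2K$EVTYPE (hypothetical values)
evtype <- c("TORNADO", "FLOOD", "HAIL", "SUMMARY OF MAY 10", "FLOOD", "OTHER")

## Share of records whose type is already in the standard list
shareStandard <- mean(evtype %in% standardTypes)

## Remaining non-standard labels and how often each occurs
unmapped <- evtype[!evtype %in% standardTypes]
leftover <- sort(table(unmapped), decreasing = TRUE)

print(shareStandard)
print(leftover)
```

Labels that remain in `leftover` with large counts would be candidates for additional conversion rules.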
After standardizing the event types, we investigated which storm events cause the most human fatalities and injuries. Some storms can cause property/crop damage without any injuries or deaths. In order to analyze the health impacts, we only need the storm events that caused a fatality or injury.
stormsY2K$totalTrauma <- stormsY2K$FATALITIES + stormsY2K$INJURIES
health <- stormsY2K %>%
filter(totalTrauma!=0) %>%
select(-(PROPDMG:CROPDMGEXP))
Now, we can calculate the total number of fatalities and injuries for each storm event type. Since some events, such as Hail, are more likely to cause injuries than deaths, the fatalities and injuries were calculated separately. The fatality and injury totals were then sorted in descending order to determine which were most harmful to human health. To keep the results accessible, only the 10 most harmful storm event types are presented here.
## Calculate Fatalities by Event Type
fatalities <- aggregate(FATALITIES ~ EVTYPE, health, FUN = sum)
fatalities <- arrange(fatalities, desc(FATALITIES))
## Calculate 10 Most Deadly Events
leadingDeaths <- slice(fatalities, 1:10)
leadingDeaths <- arrange(leadingDeaths, desc(FATALITIES))
leadingDeaths$EVTYPE <- factor(leadingDeaths$EVTYPE,
levels = leadingDeaths$EVTYPE[order(leadingDeaths$FATALITIES)])
## Display the 10 Most Lethal Storm Event Types in a Table
knitr::kable(leadingDeaths, col.names = c("Event Types", "Total Fatalities"),
align = c("c", "c"),
caption = "Table 1: Storm events causing the most deaths in the U.S.",
format.args = list(big.mark = ','))
| Event Types | Total Fatalities |
|---|---|
| TORNADO | 1,193 |
| EXCESSIVE HEAT | 1,013 |
| FLASH FLOOD | 613 |
| LIGHTNING | 466 |
| RIP CURRENT | 462 |
| EXTREME COLD/WIND CHILL | 266 |
| FLOOD | 266 |
| THUNDERSTORM WIND | 261 |
| HEAT | 231 |
| AVALANCHE | 179 |
## Calculate Injuries by Event Type
injuries <- aggregate(INJURIES ~ EVTYPE, health, FUN = sum)
injuries <- arrange(injuries, desc(INJURIES))
## Calculate 10 Most Injurious Events
leadingInjuries <- slice(injuries, 1:10)
leadingInjuries <- arrange(leadingInjuries, desc(INJURIES))
leadingInjuries$EVTYPE <- factor(leadingInjuries$EVTYPE,
levels = leadingInjuries$EVTYPE[order(leadingInjuries$INJURIES)])
## Display the 10 Most Injurious Storm Event Types in a Table
knitr::kable(leadingInjuries, col.names = c("Event Types", "Total Injuries"),
align = c("c", "c"),
caption = "Table 2: Storm events causing the most injuries in the U.S.",
format.args = list(big.mark = ','))
| Event Types | Total Injuries |
|---|---|
| TORNADO | 15,213 |
| EXCESSIVE HEAT | 3,708 |
| THUNDERSTORM WIND | 3,218 |
| LIGHTNING | 2,993 |
| HURRICANE (TYPHOON) | 1,291 |
| HEAT | 1,225 |
| WILDFIRE | 1,199 |
| FLASH FLOOD | 840 |
| HIGH WIND | 738 |
| HAIL | 545 |
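The fatality and injury rankings above follow the same aggregate, sort, and slice pattern. As a rough sketch, that pattern could be wrapped in one function. `topEvents` is a hypothetical helper that is not part of the original analysis, and the data frame below is invented toy data. The sketch uses base R only so that it runs without extra packages.

```r
## Hypothetical helper: total one numeric column by event type and return
## the n largest totals in descending order
topEvents <- function(data, valueCol, n = 10) {
  totals <- aggregate(data[[valueCol]],
                      by = list(EVTYPE = data$EVTYPE), FUN = sum)
  names(totals)[2] <- valueCol
  totals <- totals[order(-totals[[valueCol]]), ]
  rownames(totals) <- NULL
  head(totals, n)
}

## Toy data, invented for illustration only
toy <- data.frame(EVTYPE = c("TORNADO", "HEAT", "TORNADO", "FLOOD"),
                  FATALITIES = c(3, 5, 4, 1))

topTwo <- topEvents(toy, "FATALITIES", n = 2)
print(topTwo)
```

The same function could be called with `"INJURIES"` or with a damage column, since the column name is passed as a string.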
Starting with the same standardized results as the health analysis, we evaluated which storm events are the most costly in terms of property and farm crop damage. Some storms can cause injuries or deaths with very little damage to property or crops. In order to analyze the economic impacts, we only need the storm events that caused property or crop damage.
stormsY2K$totalDamage <- stormsY2K$PROPDMG + stormsY2K$CROPDMG
econCosts <- stormsY2K %>%
  ## Refer to the column directly rather than through stormsY2K$
  filter(totalDamage != 0) %>%
  select(-(FATALITIES:INJURIES))
To determine the dollar amount for each event, we need to multiply two columns: one ending in “DMG” which contains a numeric value, and one ending in “DMGEXP” which contains a letter designating a multiplier for the value in “DMG”. We need to do this separately for property damage and crop damage. The multipliers are:
K equals one thousand (1x10^3)
M equals one million (1x10^6)
B equals one billion (1x10^9)
As an example, PROPDMG = “15” multiplied by PROPDMGEXP = “B” means that event caused $15 billion in property damages.
## Convert property damage to US dollar values
econCosts$PROPERTYDMGDOLLAR <- with(econCosts,
ifelse(PROPDMGEXP == "K", PROPDMG * 1000,
ifelse(PROPDMGEXP =="M", PROPDMG * 1e+06,
ifelse(PROPDMGEXP == "B", PROPDMG * 1e+09,
PROPDMG * 1 ))))
## Convert farm crop damage to US dollar values
econCosts$CROPDMGDOLLAR <- with(econCosts,
ifelse(CROPDMGEXP == "K", CROPDMG * 1000,
ifelse(CROPDMGEXP =="M", CROPDMG * 1e+06,
ifelse(CROPDMGEXP == "B", CROPDMG * 1e+09,
CROPDMG * 1 ))))
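As an alternative to the nested `ifelse()` calls above, the multipliers could be stored in a named lookup vector. This is a minimal sketch rather than the method used in this analysis. `toDollars` is a hypothetical helper, and the input values are made up. Like the code above, it treats any exponent other than K, M, or B as a multiplier of 1.

```r
## Hypothetical helper (not part of the original analysis): convert a damage
## amount and its exponent letter into US dollars
toDollars <- function(amount, exponent) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  ## Look up each letter; unknown or empty codes come back as NA
  mult <- unname(multipliers[as.character(exponent)])
  ## Fall back to a multiplier of 1, matching the ifelse() default above
  mult[is.na(mult)] <- 1
  amount * mult
}

## Made-up example values: 15 B, 2.5 M, 10 K, and 7 with no exponent
exampleDollars <- toDollars(c(15, 2.5, 10, 7), c("B", "M", "K", ""))
print(exampleDollars)
```

The first example value reproduces the case described earlier: 15 with exponent B corresponds to $15 billion.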
Since the property and crop damage costs are now all in dollars, we can calculate the total dollar value of property damage and crop damage for each storm event type. Because some events, such as droughts, cause mostly crop damage rather than property damage, the property damage and crop damage amounts were calculated separately. The property and crop damage totals were then sorted in descending order to determine which event types were the most expensive. To keep the results readable, only the 10 most costly storm event types are presented here.
## Calculate Property Damage by Event Type
propCost <- aggregate(PROPERTYDMGDOLLAR ~ EVTYPE, econCosts, FUN = sum)
propCost <- arrange(propCost, desc(PROPERTYDMGDOLLAR))
## Calculate 10 Most Expensive Events (in Terms of Property Damage)
mostPropDamage <- slice(propCost, 1:10)
mostPropDamage <- arrange(mostPropDamage, desc(PROPERTYDMGDOLLAR))
mostPropDamage$EVTYPE <- factor(mostPropDamage$EVTYPE,
levels = mostPropDamage$EVTYPE[order(mostPropDamage$PROPERTYDMGDOLLAR)])
## Create Table of 10 Event Types Resulting in the Most Property Damage
knitr::kable(mostPropDamage, col.names = c("Event Types", "Property Damage Costs (USD)"),
align = c("c", "c"),
caption = "Table 3: Storm events causing the most property damage in the U.S.",
format.args = list(big.mark = ','))
| Event Types | Property Damage Costs (USD) |
|---|---|
| FLOOD | 134,691,074,080 |
| HURRICANE (TYPHOON) | 72,349,646,010 |
| STORM SURGE/TIDE | 47,812,123,000 |
| TORNADO | 19,460,679,560 |
| HAIL | 11,988,605,920 |
| FLASH FLOOD | 11,883,390,310 |
| TROPICAL STORM | 7,194,930,550 |
| WILDFIRE | 7,114,062,000 |
| THUNDERSTORM WIND | 5,562,267,020 |
| HIGH WIND | 4,947,560,420 |
## Calculate Crop Damage by Event Type
farmCost <- aggregate(CROPDMGDOLLAR ~ EVTYPE, econCosts, FUN = sum)
farmCost <- arrange(farmCost, desc(CROPDMGDOLLAR))
## Calculate 10 Most Expensive Events (in Terms of Crop Damage)
mostFarmDamage <- slice(farmCost, 1:10)
mostFarmDamage <- arrange(mostFarmDamage, desc(CROPDMGDOLLAR))
mostFarmDamage$EVTYPE <- factor(mostFarmDamage$EVTYPE,
levels = mostFarmDamage$EVTYPE[order(mostFarmDamage$CROPDMGDOLLAR)])
## Create Table of 10 Event Types Resulting in the Most Farm Crop Damage
knitr::kable(mostFarmDamage, col.names = c("Event Types", "Farm Crop Damage Costs (USD)"),
align = c("c", "c"),
caption = "Table 4: Storm events causing the most farm crop damage in the U.S.",
format.args = list(big.mark = ','))
| Event Types | Farm Crop Damage Costs (USD) |
|---|---|
| DROUGHT | 9,135,585,000 |
| FLOOD | 4,221,934,400 |
| HURRICANE (TYPHOON) | 3,056,852,800 |
| HAIL | 1,789,986,200 |
| FROST/FREEZE | 1,147,436,000 |
| FLASH FLOOD | 904,860,500 |
| THUNDERSTORM WIND | 697,778,600 |
| HIGH WIND | 498,363,000 |
| EXCESSIVE HEAT | 492,402,000 |
| TROPICAL STORM | 412,311,000 |
The four tables above separately show the 10 storm types that caused the most deaths, injuries, property damage, and farm crop damage. These results are summarized in the two figures below.
The Health Impact Analysis above generated two data frames, leadingDeaths and leadingInjuries, which contain the 10 event types leading to the most deaths and the most injuries, respectively. The plot below shows that Tornadoes and Excessive Heat cause the most fatalities and the most injuries. Four other event types (Flash Floods, Lightning, Thunderstorm Winds, and Heat) also appear in both top-10 lists, making them major causes of both deaths and injuries.
## Create plots of fatalities and injuries
## Event Type Descriptions are Too Long to display horizontally, so we
## need to arrange them at an angle
fatalityPlot <- ggplot(data = leadingDeaths, aes(x = reorder(EVTYPE, -FATALITIES), y = FATALITIES)) +
  geom_bar(stat = "identity", color = "red", fill = "red", width = 0.5) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  xlab("Severe Weather Type") + scale_y_continuous("Number of Fatalities", labels = comma) +
  ggtitle("Total Fatalities Caused\n by Severe Weather\n Events in the U.S.\n from 2000 - 2011")
injuryPlot <- ggplot(data = leadingInjuries, aes(x = reorder(EVTYPE, -INJURIES), y = INJURIES)) +
  geom_bar(stat = "identity", color = "blue", fill = "blue", width = 0.5) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  xlab("Severe Weather Type") + scale_y_continuous("Number of Injuries", labels = comma) +
  ggtitle("Total Injuries Caused\n by Severe Weather\n Events in the U.S.\n from 2000 - 2011")
grid.arrange(fatalityPlot, injuryPlot, ncol = 2)
The Financial Impact Analysis above generated two data frames, mostPropDamage and mostFarmDamage, which contain the 10 event types leading to the greatest property damage and the most farm crop damage, respectively. The plot below shows that Floods and Hurricanes (Typhoons) cause the most property damage, whereas Droughts and Floods cause the most damage to farm crops. Similar to the health impacts above, several event types (Hail, Thunderstorm Winds, Flash Floods, and High Winds) are major causes of both property and crop losses. It’s interesting to note that the property damage from Floods, the leading cause of property damage, is over 14 times the crop damage from Droughts, the leading cause of farm crop damage.
## Create plots of property damage and farm crop damage
## Event Type Descriptions are Too Long to display horizontally, so we
## need to arrange them at an angle
propertyPlot <- ggplot(data = mostPropDamage, aes(x = reorder(EVTYPE, -PROPERTYDMGDOLLAR), y = PROPERTYDMGDOLLAR/1e6)) +
  geom_bar(stat = "identity", color = "red", fill = "red", width = 0.5) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  xlab("Severe Weather Type") + scale_y_continuous("Cost of Property Damage\n (Million USD)", labels = comma) +
  ggtitle("Total Property Costs Caused\n by Severe Weather\n Events in the U.S.\n from 2000 - 2011")
farmPlot <- ggplot(data = mostFarmDamage, aes(x = reorder(EVTYPE, -CROPDMGDOLLAR), y = CROPDMGDOLLAR/1e6)) +
  geom_bar(stat = "identity", color = "green", fill = "green", width = 0.5) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  xlab("Severe Weather Type") + scale_y_continuous("Cost of Farm Crop Damage\n (Million USD)", labels = comma) +
  ggtitle("Total Farm Crop Costs Caused\n by Severe Weather\n Events in the U.S.\n from 2000 - 2011")
grid.arrange(propertyPlot, farmPlot, ncol = 2)
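The comparison between Flood property damage and Drought crop damage can be checked directly from the totals reported in Tables 3 and 4. The short sketch below hard-codes those two table values; the variable names are chosen here for illustration.

```r
## Totals copied from Table 3 (FLOOD) and Table 4 (DROUGHT)
floodPropertyDamage <- 134691074080
droughtCropDamage <- 9135585000

## How many times larger the flood property total is
ratio <- floodPropertyDamage / droughtCropDamage
print(round(ratio, 1))
```

The ratio falls between 14 and 15, consistent with the "over 14 times" figure in the text.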
This analysis investigated the costs, both in terms of human health and financial costs, resulting from different types of severe storm events in the US. Based on National Weather Service (NWS) data from January 2000 through November 2011, Tornadoes and Excessive Heat caused the largest health impact (most fatalities and injuries). Using this same data, Floods and Hurricanes (Typhoons) caused the most property damage, while Droughts and Floods caused the most farm crop damage.
These results seem reasonable, and offer possible areas for future research. The following are some suggestions for later research:
Are the leading causes of human health and financial damage highly dependent on the formula used to correct the event types to fit within the standard 48 event types used by the NWS?
Do the most expensive events occur every year during one or two months and in a small number of states?
Have the most damaging event types changed in each state over time?
Have the most severe events shifted earlier/later in the year in each state possibly due to climate change?
Although this analysis provides some basic insight regarding the costs of severe storms in the US, the conclusions we can draw from it are limited. The human costs and financial costs cannot be compared without assigning dollar values to injuries and deaths. This would require very involved data collection on the costs of first aid, hospital care, funeral services, etc., as well as possibly estimating a statistical value of human life. Also, this analysis is fairly uninformative for policymakers or insurance companies because it is aggregated across the entire country and over the entire time period from 2000 to 2011. Despite these limitations, the analysis still provides a solid starting point for continuing research and analysis.