Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. This paper analyses estimates of fatalities, injuries, and property damage from major storms and weather events in the United States using the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database (see also the database documentation and the National Climatic Data Center Storm Events FAQ). The analysis that follows indicates that, across the US, the weather events with the largest annual impact on population health (injuries + fatalities) are floods, ice storms and blizzards. Those with the greatest economic consequences through property and crop damage are floods, hurricanes and ice storms.
The data for this analysis were loaded into R from the original bzip2-compressed (.bz2) comma-separated-value (.csv) file, Storm Data. The R code provided assumes that this file has been downloaded to the current working directory and that the following R packages are installed: R.utils, lubridate and ggplot2.
library(R.utils)
# Assuming the datafile is in the current working directory. Load it from either
# the previously generated R data object 'StormData.Rda' or from the compressed
# .bz2 file if the former does not exist:
if ("StormData.Rda" %in% dir()) {
d <- readRDS("StormData.Rda")
} else {
bunzip2("repdata-data-StormData.csv.bz2", "StormData.csv", overwrite = TRUE, remove = FALSE)
d <- read.csv("StormData.csv", sep = ",", strip.white = TRUE, nrows = 902300)
saveRDS(d, "StormData.Rda")
# Get rid of the large, redundant .csv file:
file.remove("StormData.csv")
}
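As an aside, base R can read a bzip2-compressed CSV through a `bzfile()` connection, without first writing a decompressed copy to disk. The following is a minimal sketch on a small made-up file; the file name `toy.csv.bz2` and its contents are illustrative assumptions, not part of the storm data.

```r
# Write a tiny illustrative CSV through a bzip2 connection (toy data, not NOAA's).
toy <- data.frame(EVTYPE = c("FLOOD", "HAIL"), PROPDMG = c(2.5, 10))
path <- file.path(tempdir(), "toy.csv.bz2")
con <- bzfile(path, "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the unopened connection, reads from it, and closes it again.
toyBack <- read.csv(bzfile(path), stringsAsFactors = FALSE)
print(toyBack)
```

Whether this is faster than decompressing first and caching the result as an R data object, as done above, would need to be checked on the full file.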
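There were two main data quality issues that needed to be resolved before analysis could proceed: invalid entries in the damage exponent columns (PROPDMGEXP and CROPDMGEXP), and event type labels (EVTYPE) that did not match the 48 standard event types listed in the documentation.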
library(lubridate)
# Remove rows with invalid entries in the EXP ('exponent') columns. Assume ''
# means the value is given as-is, with no multiplier.
d <- d[!d$PROPDMGEXP %in% c("+", "-", "?", "H", "h", 0:9), ]
d <- d[!d$CROPDMGEXP %in% c("?", 0:9), ]
# Keep only events reporting both non-zero property damage and non-zero crop damage:
d <- d[d$PROPDMG != 0 & d$CROPDMG != 0, ]
# Convert EVTYPE values to standard list of 48 provided in the documentation:
d$EVTYPE <- toupper(d$EVTYPE)
evtypes <- toupper(as.character(c("Astronomical Low Tide", "Avalanche", "Blizzard",
"Coastal Flood", "Cold/Wind Chill", "Debris Flow", "Dense Fog", "Dense Smoke",
"Drought", "Dust Devil", "Dust Storm", "Excessive Heat", "Extreme Cold/Wind Chill",
"Flash Flood", "Flood", "Frost/Freeze", "Funnel Cloud", "Freezing Fog", "Hail",
"Heat", "Heavy Rain", "Heavy Snow", "High Surf", "High Wind", "Hurricane (Typhoon)",
"Ice Storm", "Lake-Effect Snow", "Lakeshore Flood", "Lightning", "Marine Hail",
"Marine High Wind", "Marine Strong Wind", "Marine Thunderstorm Wind", "Rip Current",
"Seiche", "Sleet", "Storm Surge/Tide", "Strong Wind", "Thunderstorm Wind", "Tornado",
"Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout",
"Wildfire", "Winter Storm", "Winter Weather")))
evTable <- as.data.frame(matrix(c("BLIZZARD", "BLIZZARD", "COASTAL FLOODING", "COASTAL FLOOD",
"COLD AIR TORNADO", "TORNADO", "DROUGHT", "DROUGHT", "DRY MICROBURST", "THUNDERSTORM WIND",
"DUST STORM", "DUST STORM", "DUST STORM/HIGH WINDS", "DUST STORM", "EXCESSIVE HEAT",
"EXCESSIVE HEAT", "EXTREME COLD", "EXTREME COLD/WIND CHILL", "FLASH FLOOD", "FLASH FLOOD",
"FLASH FLOOD/FLOOD", "FLASH FLOOD", "FLASH FLOODING", "FLASH FLOOD", "FLASH FLOODING/FLOOD",
"FLASH FLOOD", "FLOOD", "FLOOD", "FLOOD/FLASH FLOOD", "FLASH FLOOD", "FLOODING",
"FLOOD", "FLOODS", "FLOOD", "FOREST FIRES", "EXCESSIVE HEAT", "FREEZE", "FROST/FREEZE",
"FROST/FREEZE", "FROST/FREEZE", "GLAZE ICE", "ICE STORM", "GUSTNADO", "THUNDERSTORM WIND",
"GUSTY WINDS", "HIGH WIND", "HAIL", "HAIL", "HAIL 100", "HAIL", "HAIL/WIND",
"HAIL", "HAIL/WINDS", "HAIL", "HEAT", "EXCESSIVE HEAT", "HEAT WAVE", "EXCESSIVE HEAT",
"HEAT WAVE DROUGHT", "DROUGHT", "HEAVY RAIN", "HEAVY RAIN", "HEAVY RAIN/HIGH SURF",
"HIGH SURF", "HEAVY RAINS", "HEAVY RAIN", "HEAVY RAINS/FLOODING", "FLOOD", "HEAVY SNOW",
"HEAVY SNOW", "HEAVY SNOW/HIGH WINDS & FLOOD", "HEAVY SNOW", "HIGH WIND", "HIGH WIND",
"HIGH WINDS", "HIGH WIND", "HIGH WINDS HEAVY RAINS", "HEAVY RAIN", "HIGH WINDS/COLD",
"EXTREME COLD/WIND CHILL", "HURRICANE", "HURRICANE (TYPHOON)", "HURRICANE ERIN",
"HURRICANE (TYPHOON)", "HURRICANE FELIX", "HURRICANE (TYPHOON)", "HURRICANE OPAL",
"HURRICANE (TYPHOON)", "HURRICANE OPAL/HIGH WINDS", "HURRICANE (TYPHOON)", "HURRICANE/TYPHOON",
"HURRICANE (TYPHOON)", "ICE JAM FLOODING", "FLOOD", "ICE STORM", "ICE STORM",
"LANDSLIDE", "HEAVY RAIN", "LIGHTNING", "LIGHTNING", "RIVER FLOOD", "FLOOD",
"RIVER FLOODING", "FLOOD", "SEVERE THUNDERSTORM WINDS", "THUNDERSTORM WIND",
"SEVERE THUNDERSTORMS", "THUNDERSTORM WIND", "SMALL HAIL", "HAIL", "SNOW", "HEAVY SNOW",
"STORM SURGE", "STORM SURGE/TIDE", "STORM SURGE/TIDE", "STORM SURGE/TIDE", "STRONG WIND",
"STRONG WIND", "THUDERSTORM WINDS", "THUNDERSTORM WIND", "THUNDERSTORM HAIL",
"THUNDERSTORM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WINDS",
"THUNDERSTORM WIND", "THUNDERSTORM WINDS HAIL", "HAIL", "THUNDERSTORM WINDS LIGHTNING",
"LIGHTNING", "THUNDERSTORM WINDS/ FLOOD", "FLOOD", "THUNDERSTORM WINDS/HAIL",
"HAIL", "THUNDERSTORM WINDSS", "THUNDERSTORM WIND", "THUNDERSTORMS", "THUNDERSTORM WIND",
"THUNDERSTORMS WIND", "THUNDERSTORM WIND", "THUNDERSTORMS WINDS", "THUNDERSTORM WIND",
"TORNADO", "TORNADO", "TORNADO F0", "TORNADO", "TORNADOES", "TORNADO", "HAIL",
"TORNADO", "TROPICAL STORM", "TROPICAL STORM", "TROPICAL STORM DEAN", "TROPICAL STORM",
"TROPICAL STORM GORDON", "TROPICAL STORM", "TROPICAL STORM JERRY", "TROPICAL STORM",
"TSTM WIND", "THUNDERSTORM WIND", "TSTM WIND/HAIL", "HAIL", "TSUNAMI", "TSUNAMI",
"TYPHOON", "HURRICANE (TYPHOON)", "URBAN FLOOD", "FLOOD", "URBAN FLOODING", "FLOOD",
"URBAN/SML STREAM FLD", "FLOOD", "WILD/FOREST FIRE", "EXCESSIVE HEAT", "WILD/FOREST FIRES",
"EXCESSIVE HEAT", "WILDFIRE", "EXCESSIVE HEAT", "WILDFIRES", "EXCESSIVE HEAT",
"WIND DAMAGE", "HIGH WIND", "WINDS", "HIGH WIND", "WINTER STORM", "WINTER STORM",
"WINTER STORM HIGH WINDS", "WINTER STORM", "WINTER STORMS", "WINTER STORM"),
dimnames = list(NULL, c("key", "result")), ncol = 2, byrow = T))
d$EVTYPE <- evTable[match(d$EVTYPE, evTable$key), "result"]
# Parse the event start date-time; the calendar year is extracted later with year():
d$year <- mdy_hms(d$BGN_DATE)
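The recoding step relies on `match()` against a two-column lookup table. The sketch below repeats the idea on a small hypothetical lookup (the three rows and the input labels are made up for illustration) to show one consequence worth keeping in mind: any label that is absent from the lookup is recoded to `NA`.

```r
# Illustrative lookup: raw label ("key") and standard label ("result").
lookup <- data.frame(
  key    = c("TSTM WIND", "FLOODING", "HAIL"),
  result = c("THUNDERSTORM WIND", "FLOOD", "HAIL"),
  stringsAsFactors = FALSE
)
raw <- toupper(c("tstm wind", "Flooding", "hail", "Some Unknown Event"))

# match() returns the row of the first matching key, or NA when there is none.
recoded <- lookup$result[match(raw, lookup$key)]
print(recoded)

# Labels absent from the lookup become NA, so it is worth counting how many are lost.
print(sum(is.na(recoded)))
```

Because `match()` uses the first matching key, a key that appears twice in the lookup is always resolved by its first entry.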
Total economic damage was calculated as the sum of crop and property damages and total health impact was calculated as the sum of injuries and fatalities for each event.
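In the storm database, each damage figure is stored as a number (PROPDMG or CROPDMG) together with a magnitude code (PROPDMGEXP or CROPDMGEXP). A minimal sketch of the conversion to dollars on toy values follows; the data frame `toy` is made up for illustration, and treating an empty code as a multiplier of 1 is the same assumption made in this analysis.

```r
# Magnitude codes and their multipliers; "" is assumed to mean "no multiplier".
codes <- c("", "K", "k", "M", "m", "B", "b")
mult  <- c(1, 1e3, 1e3, 1e6, 1e6, 1e9, 1e9)

# Toy records (illustrative values only).
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 300),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Look up each row's multiplier by position and scale the reported figure.
toy$propertyDamages <- toy$PROPDMG * mult[match(toy$PROPDMGEXP, codes)]
print(toy)
```

A positional lookup with `match()` is used rather than indexing a named vector, because R does not match the empty string when subsetting by name.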
expTable <- as.data.frame(matrix(c("", 1, "K", 1000, "k", 1000, "m", 1e+06, "M",
1e+06, "b", 1e+09, "B", 1e+09), ncol = 2, byrow = T))
expTable[, 2] <- as.numeric(as.character(expTable[, 2]))
d$propertyDamages <- d$PROPDMG * expTable[match(d$PROPDMGEXP, expTable[, 1]), 2]
d$cropDamages <- d$CROPDMG * expTable[match(d$CROPDMGEXP, expTable[, 1]), 2]
d$totalDamages <- d$propertyDamages + d$cropDamages
d$healthImpact <- d$FATALITIES + d$INJURIES
# Sum damages and health impact for each event type within each calendar year:
totals <- aggregate(cbind(totalDamages, healthImpact) ~ EVTYPE + year(year), data = d,
FUN = sum)
Figure 1 shows the ranking of weather event types by total health impact (injuries + fatalities, log-transformed) across the US. Event types are ranked in decreasing order (from top to bottom) by their mean annual total health impact. Each box spans the interquartile range of the annual totals, the whiskers extend to the most extreme values within 1.5 times that range (the ggplot2 default), the vertical line is the median for that event type, and dots are outliers.
library(ggplot2)
totalByHealth <- with(totals, reorder(EVTYPE, healthImpact, mean))
# Event types sorted by decreasing mean annual health impact (top three used in the text):
healthRank <- names(sort(with(totals, tapply(healthImpact, EVTYPE, mean)), decreasing = TRUE))
health1 <- tolower(healthRank[1])
health2 <- tolower(healthRank[2])
health3 <- tolower(healthRank[3])
ggplot(totals, aes(x = totalByHealth, y = healthImpact)) + geom_boxplot() + scale_y_log10() +
coord_flip() + ylab("Injuries + fatalities (log10)") + xlab("Weather event") +
ggtitle("Annual US health impacts by weather event")
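The "mean annual total" used for ranking involves two steps: summing within each event type and year, then averaging those annual totals per event type. The sketch below walks through both steps and the resulting ranking on toy values; the data frame `toy` and its column `yr` are hypothetical and not taken from the NOAA data.

```r
# Toy event records (illustrative values only).
toy <- data.frame(
  EVTYPE = c("FLOOD", "FLOOD", "FLOOD", "HAIL", "HAIL", "TORNADO", "TORNADO"),
  yr = c(2001, 2001, 2002, 2001, 2002, 2001, 2002),
  healthImpact = c(3, 1, 10, 0, 2, 50, 30)
)

# Step 1: total per event type and year.
annual <- aggregate(healthImpact ~ EVTYPE + yr, data = toy, FUN = sum)

# Step 2: mean of the annual totals per event type, sorted from highest to lowest.
meanAnnual <- sort(tapply(annual$healthImpact, annual$EVTYPE, mean), decreasing = TRUE)
print(meanAnnual)

# Step 3: the same ordering expressed as factor levels, as used for a plot axis.
ranked <- reorder(annual$EVTYPE, annual$healthImpact, mean)
print(rev(levels(ranked)))
```

`reorder()` sorts levels by increasing mean, which is why the levels are reversed to list the highest-impact event type first.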
Figure 2 shows the ranking of weather event types by total economic impact (crop + property damage, log-transformed) across the US.
totalByDamages <- with(totals, reorder(EVTYPE, totalDamages, mean))
# Event types sorted by decreasing mean annual economic damage (top three used in the text):
damageRank <- names(sort(with(totals, tapply(totalDamages, EVTYPE, mean)), decreasing = TRUE))
damage1 <- tolower(damageRank[1])
damage2 <- tolower(damageRank[2])
damage3 <- tolower(damageRank[3])
ggplot(totals, aes(x = totalByDamages, y = totalDamages)) + geom_boxplot() + scale_y_log10() +
coord_flip() + ylab("Damages in US dollars (log10)") + xlab("Weather event") +
ggtitle("Annual US economic damages by weather event")