According to storm data collected from the National Climatic Data Center (NCDC), tornadoes are both the most deadly weather event for people and one of the top 10 most economically damaging storm types. After tornadoes, the next most dangerous type of weather for public health is excessive heat. In terms of property and crop damage potential, floods are responsible for almost 1/3 of all weather-related economic losses.

Data Processing

First, the National Climatic Data Center (NCDC) Storm Data must be downloaded from the provided URL:

download.file(url = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "rawdata.bz2")

For the purposes of our analysis here, the source file is renamed “rawdata.bz2”. R’s read.csv function can directly extract and read this compressed file type:

rd <- read.csv(file = "rawdata.bz2")

Next, we can use the sqldf package to perform a SQL query to get total fatalities by storm type (as classified by the EVTYPE column in the original data set):

# Load required package
library(sqldf)
## Loading required package: gsubfn
## Loading required package: proto
## Loading required package: RSQLite
## Loading required package: DBI
# Perform SQL queries
fatal <- sqldf("select distinct(EVTYPE), sum(FATALITIES)
                from rd
                group by EVTYPE")
## Loading required package: tcltk
# Change column names
names(fatal) <- c("type", "deaths")

# Sort by number of deaths
fatal <- fatal[order(fatal$deaths, decreasing = TRUE), ]

# Calculate what fraction of all storm-related deaths are due to each unique EVTYPE
fatal$frac <- with(fatal, deaths/sum(deaths))

# Get cumulative fractions
fatal$cfrac <- cumsum(fatal$frac)

# Only keep the top 10 EVTYPES
topf <- fatal[1:10, ]

# Add "other" row
topf <- rbind(topf,
              data.frame(type   = "OTHER",
                         deaths = sum(rd$FATALITIES) - sum(topf$deaths),
                         frac   = 1 - sum(topf$frac),
                         cfrac  = 1))

# Round off frac and cfrac columns
topf$frac  <- round(topf$frac,  2)
topf$cfrac <- round(topf$cfrac, 2)

Determining economic damage is more complicated because rather than using the actual raw number, the original data set uses two columns indicating (1) the numeric value of the damage (columns PROPDMG and CROPDMG) and (2) the order of magnitude of the damage, expressed in 10^3 increments (e.g. “K” for thousands, “M” for millions, “B” for billions, etc.). Consequently, damages must first be translated into the correct total numeric value, which is accomplished in this analysis using the function numD defined below:

numD <- function(d, e) {
  
  # d - numeric damage value vector
  # e - character string exponent vector (e.g. "K", "M", etc.)
  
  # Predefine result
  result <- rep(0, length(d))
  
  # Multiply damages by matching exponents for K, M, and B (all other types of
  # exponents are negligible)
  result[e == "K"] <- d[e == "K"] * 1e3
  result[e == "M"] <- d[e == "M"] * 1e6
  result[e == "B"] <- d[e == "B"] * 1e9
  
  # Return result
  return(result)
}

Having defined the numD function, we can now (1) apply it to all of the property and crop damage vectors in the rd data frame, and (2) use the same kind of SQL query applied to fatalities to get the top 10 most economically damaging storm types:

rd$pDam <- numD(d = rd$PROPDMG, e = rd$PROPDMGEXP) +
           numD(d = rd$CROPDMG, e = rd$CROPDMGEXP)

# Perform SQL queries
pdam <- sqldf("select distinct(EVTYPE), sum(pDam)
                from rd
                group by EVTYPE")

# Change column names
names(pdam) <- c("type", "cost")

# Sort by cost
pdam <- pdam[order(pdam$cost, decreasing = TRUE), ]

# Calculate what fraction of costs due to each unique EVTYPE
pdam$frac <- with(pdam, cost/sum(cost))

# Get cumulative fractions
pdam$cfrac <- cumsum(pdam$frac)

# Only keep the top 10 EVTYPES
topd <- pdam[1:10, ]

# Add "other" row
topd <- rbind(topd,
              data.frame(type  = "OTHER",
                         cost  = sum(rd$pDam) - sum(topd$cost),
                         frac  = 1 - sum(topd$frac),
                         cfrac = 1))

# Round off frac and cfrac columns
topd$frac  <- round(topd$frac,  2)
topd$cfrac <- round(topd$cfrac, 2)

# Finally, convert to billions of dollars
topd$cost <- topd$cost / 1e9

Results

(1) What weather events are most harmful with respect to population health?

The top 10 most harmful storm types with respect to population health (as measured by the number of fatalities reported in the NCDC Storm Data) are shown in Figure 1 and Table 1. The most significant single source of weather related deaths is tornadoes, followed by excessive heat. All other sources of weather-related deaths are < 6% of the total reported fatalities, but taken together they’re a substantial (20%) portion of the total.

# Load ggplot2
library(ggplot2)

# Construct plot
ggplot(topf, aes(type, deaths)) + geom_bar(stat = "identity") + labs(x = "EVTYPE", y = "Deaths", title = "Figure 1: Fatalities by Storm Type") + theme(axis.text.x = element_text(angle = 30, hjust = 1))

knitr::kable(topf, caption = "Table 1: Ten most harmful storm types as measured by number of fatalities. 'frac' is the fraction of all weather-related deaths attributed to the given EVTYPE value, and 'cfrac' is the cumulative summation of the 'frac' value.")
Table 1: Ten most harmful storm types as measured by number of fatalities. ‘frac’ is the fraction of all weather-related deaths attributed to the given EVTYPE value, and ‘cfrac’ is the cumulative summation of the ‘frac’ value.
type deaths frac cfrac
826 TORNADO 5633 0.37 0.37
124 EXCESSIVE HEAT 1903 0.13 0.50
151 FLASH FLOOD 978 0.06 0.56
271 HEAT 937 0.06 0.62
453 LIGHTNING 816 0.05 0.68
846 TSTM WIND 504 0.03 0.71
167 FLOOD 470 0.03 0.74
572 RIP CURRENT 368 0.02 0.77
343 HIGH WIND 248 0.02 0.78
19 AVALANCHE 224 0.01 0.80
1 OTHER 3064 0.20 1.00

(2) What weather events are most harmful with respect to economic losses?

The top 10 most harmful storm types with respect to economic damages (as measured by the combined property and crop damages reported in the NCDC Storm Data) are shown in Figure 2 and Table 2. The most significant single source of economic damages is flooding, followed by hurricanes and tornadoes. Combined, these three storm types account for approximately 60% of all economic losses from weather-related events. Like with fatalities, there is a substantial tail of damages that fall into the “other” category.

# Load ggplot2
library(ggplot2)

# Construct plot
ggplot(topd, aes(type, cost)) + geom_bar(stat = "identity") + labs(x = "EVTYPE", y = "Cost (billions of dollars)", title = "Figure 2: Property and Crop Damages by Storm Type") + theme(axis.text.x = element_text(angle = 30, hjust = 1))

knitr::kable(topd, caption = "Table 2: Ten most harmful storm types as measured by property and crop damages. 'frac' is the fraction of all weather-related costs attributed to the given EVTYPE value, and 'cfrac' is the cumulative summation of the 'frac' value.")
Table 2: Ten most harmful storm types as measured by property and crop damages. ‘frac’ is the fraction of all weather-related costs attributed to the given EVTYPE value, and ‘cfrac’ is the cumulative summation of the ‘frac’ value.
type cost frac cfrac
167 FLOOD 150.319678 0.32 0.32
393 HURRICANE/TYPHOON 71.913713 0.15 0.47
826 TORNADO 57.340614 0.12 0.59
656 STORM SURGE 43.323541 0.09 0.68
241 HAIL 18.752904 0.04 0.72
151 FLASH FLOOD 17.562129 0.04 0.75
91 DROUGHT 15.018672 0.03 0.79
385 HURRICANE 14.610229 0.03 0.82
577 RIVER FLOOD 10.148404 0.02 0.84
422 ICE STORM 8.967041 0.02 0.86
1 OTHER 68.416573 0.14 1.00