According to storm data collected from the National Climatic Data Center (NCDC), tornadoes are both the most deadly weather event for people and one of the top 10 most economically damaging storm types. After tornadoes, the next most dangerous type of weather for public health is excessive heat. In terms of property and crop damage potential, floods are responsible for almost 1/3 of all weather-related economic losses.
First, the National Climatic Data Center (NCDC) Storm Data must be downloaded from the provided URL:
download.file(url = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "rawdata.bz2")
For the purposes of our analysis here, the source file is renamed “rawdata.bz2”. R’s read.csv function can directly extract and read this compressed file type:
rd <- read.csv(file = "rawdata.bz2")
Next, we can use the sqldf package to perform a SQL query to get total fatalities by storm type (as classified by the EVTYPE column in the original data set):
# Load required package
library(sqldf)
## Loading required package: gsubfn
## Loading required package: proto
## Loading required package: RSQLite
## Loading required package: DBI
# Perform SQL queries
fatal <- sqldf("select distinct(EVTYPE), sum(FATALITIES)
from rd
group by EVTYPE")
## Loading required package: tcltk
# Change column names
names(fatal) <- c("type", "deaths")
# Sort by number of deaths
fatal <- fatal[order(fatal$deaths, decreasing = TRUE), ]
# Calculate what fraction of all storm-related deaths are due to each unique EVTYPE
fatal$frac <- with(fatal, deaths/sum(deaths))
# Get cumulative fractions
fatal$cfrac <- cumsum(fatal$frac)
# Only keep the top 10 EVTYPES
topf <- fatal[1:10, ]
# Add "other" row
topf <- rbind(topf,
data.frame(type = "OTHER",
deaths = sum(rd$FATALITIES) - sum(topf$deaths),
frac = 1 - sum(topf$frac),
cfrac = 1))
# Round off frac and cfrac columns
topf$frac <- round(topf$frac, 2)
topf$cfrac <- round(topf$cfrac, 2)
Determining economic damage is more complicated because rather than using the actual raw number, the original data set uses two columns indicating (1) the numeric value of the damage (columns PROPDMG and CROPDMG) and (2) the order of magnitude of the damage, expressed in 10^3 increments (e.g. “K” for thousands, “M” for millions, “B” for billions, etc.). Consequently, damages must first be translated into the correct total numeric value, which is accomplished in this analysis using the function numD defined below:
numD <- function(d, e) {
# d - numeric damage value vector
# e - character string exponent vector (e.g. "K", "M", etc.)
# Predefine result
result <- rep(0, length(d))
# Multiply damages by matching exponents for K, M, and B (all other types of
# exponents are negligible)
result[e == "K"] <- d[e == "K"] * 1e3
result[e == "M"] <- d[e == "M"] * 1e6
result[e == "B"] <- d[e == "B"] * 1e9
# Return result
return(result)
}
Having defined the numD function, we can now (1) apply it to all of the property and crop damage vectors in the rd data frame, and (2) use the same kind of SQL query applied to fatalities to get the top 10 most economically damaging storm types:
rd$pDam <- numD(d = rd$PROPDMG, e = rd$PROPDMGEXP) +
numD(d = rd$CROPDMG, e = rd$CROPDMGEXP)
# Perform SQL queries
pdam <- sqldf("select distinct(EVTYPE), sum(pDam)
from rd
group by EVTYPE")
# Change column names
names(pdam) <- c("type", "cost")
# Sort by cost
pdam <- pdam[order(pdam$cost, decreasing = TRUE), ]
# Calculate what fraction of costs due to each unique EVTYPE
pdam$frac <- with(pdam, cost/sum(cost))
# Get cumulative fractions
pdam$cfrac <- cumsum(pdam$frac)
# Only keep the top 10 EVTYPES
topd <- pdam[1:10, ]
# Add "other" row
topd <- rbind(topd,
data.frame(type = "OTHER",
cost = sum(rd$pDam) - sum(topd$cost),
frac = 1 - sum(topd$frac),
cfrac = 1))
# Round off frac and cfrac columns
topd$frac <- round(topd$frac, 2)
topd$cfrac <- round(topd$cfrac, 2)
# Finally, convert to billions of dollars
topd$cost <- topd$cost / 1e9
The top 10 most harmful storm types with respect to population health (as measured by the number of fatalities reported in the NCDC Storm Data) are shown in Figure 1 and Table 1. The most significant single source of weather related deaths is tornadoes, followed by excessive heat. All other sources of weather-related deaths are < 6% of the total reported fatalities, but taken together they’re a substantial (20%) portion of the total.
# Load ggplot2
library(ggplot2)
# Construct plot
ggplot(topf, aes(type, deaths)) + geom_bar(stat = "identity") + labs(x = "EVTYPE", y = "Deaths", title = "Figure 1: Fatalities by Storm Type") + theme(axis.text.x = element_text(angle = 30, hjust = 1))
knitr::kable(topf, caption = "Table 1: Ten most harmful storm types as measured by number of fatalities. 'frac' is the fraction of all weather-related deaths attributed to the given EVTYPE value, and 'cfrac' is the cumulative summation of the 'frac' value.")
| type | deaths | frac | cfrac | |
|---|---|---|---|---|
| 826 | TORNADO | 5633 | 0.37 | 0.37 |
| 124 | EXCESSIVE HEAT | 1903 | 0.13 | 0.50 |
| 151 | FLASH FLOOD | 978 | 0.06 | 0.56 |
| 271 | HEAT | 937 | 0.06 | 0.62 |
| 453 | LIGHTNING | 816 | 0.05 | 0.68 |
| 846 | TSTM WIND | 504 | 0.03 | 0.71 |
| 167 | FLOOD | 470 | 0.03 | 0.74 |
| 572 | RIP CURRENT | 368 | 0.02 | 0.77 |
| 343 | HIGH WIND | 248 | 0.02 | 0.78 |
| 19 | AVALANCHE | 224 | 0.01 | 0.80 |
| 1 | OTHER | 3064 | 0.20 | 1.00 |
The top 10 most harmful storm types with respect to economic damages (as measured by the combined property and crop damages reported in the NCDC Storm Data) are shown in Figure 2 and Table 2. The most significant single source of economic damages is flooding, followed by hurricanes and tornadoes. Combined, these three storm types account for approximately 60% of all economic losses from weather-related events. Like with fatalities, there is a substantial tail of damages that fall into the “other” category.
# Load ggplot2
library(ggplot2)
# Construct plot
ggplot(topd, aes(type, cost)) + geom_bar(stat = "identity") + labs(x = "EVTYPE", y = "Cost (billions of dollars)", title = "Figure 2: Property and Crop Damages by Storm Type") + theme(axis.text.x = element_text(angle = 30, hjust = 1))
knitr::kable(topd, caption = "Table 2: Ten most harmful storm types as measured by property and crop damages. 'frac' is the fraction of all weather-related costs attributed to the given EVTYPE value, and 'cfrac' is the cumulative summation of the 'frac' value.")
| type | cost | frac | cfrac | |
|---|---|---|---|---|
| 167 | FLOOD | 150.319678 | 0.32 | 0.32 |
| 393 | HURRICANE/TYPHOON | 71.913713 | 0.15 | 0.47 |
| 826 | TORNADO | 57.340614 | 0.12 | 0.59 |
| 656 | STORM SURGE | 43.323541 | 0.09 | 0.68 |
| 241 | HAIL | 18.752904 | 0.04 | 0.72 |
| 151 | FLASH FLOOD | 17.562129 | 0.04 | 0.75 |
| 91 | DROUGHT | 15.018672 | 0.03 | 0.79 |
| 385 | HURRICANE | 14.610229 | 0.03 | 0.82 |
| 577 | RIVER FLOOD | 10.148404 | 0.02 | 0.84 |
| 422 | ICE STORM | 8.967041 | 0.02 | 0.86 |
| 1 | OTHER | 68.416573 | 0.14 | 1.00 |