Synopsis

The National Oceanic and Atmospheric Administration (NOAA) maintains records of weather events in the United States and their estimated impact in terms of human casualties and economic costs. This analysis looks at those records over the 10-year period from 2002 through the end of 2011 and answers two questions: which type of weather event caused the greatest human casualties, and which events had the highest economic costs?

The analysis shows that tornadoes were responsible for the highest casualties, in terms of both injuries and fatalities. From an economic perspective, the most costly type of weather event was flooding, which caused significant property damage as well as some crop damage. However, the most significant crop damage was caused by droughts.

Data Processing

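The NOAA weather event data was provided as a bzip2-compressed CSV file. It contains many fields that describe each weather event, but for the purposes of this analysis, only the following fields were considered: BGN_DATE (event start date), EVTYPE (event type), FATALITIES, INJURIES, PROPDMG and PROPDMGEXP (property damage amount and its magnitude code), and CROPDMG and CROPDMGEXP (crop damage amount and its magnitude code).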

After downloading the file from the specified website, it was loaded into an R data frame. The following preliminary processing of the data was then performed:

library(dplyr, quietly = TRUE)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(RColorBrewer)
library(knitr)
# Read in the NOAA Storm Data
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
bz2 <- "NOAA_Storm_Data.csv.bz2"

# Download once; mode = "wb" keeps the compressed file intact on Windows
if (!file.exists(bz2)) download.file(url, bz2, mode = "wb")

df <- read.csv(bz2, stringsAsFactors = FALSE)  # read.csv decompresses bz2 files transparently

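# Extract the four-digit year from BGN_DATE: keep the date part before the
# first space, then take its last four characters
df$BGN_YEAR <- substring(df$BGN_DATE,1,regexpr(" ", df$BGN_DATE)-1)
df$BGN_YEAR <- substring(df$BGN_YEAR,nchar(df$BGN_YEAR)-3,10)
df$EVTYPE <- as.factor(df$EVTYPE)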

# Table of multipliers for the damage exponent codes (PROPDMGEXP / CROPDMGEXP):
# k/K = thousands, m/M = millions, b/B = billions, h/H = hundreds;
# digits 0-8 are treated as a factor of 10, "+" as 1, and "?", "-" and blank as 0

exptbl <- data.frame(explabel = c("k","K","m","M","b","B","0","1","2","3","4","5","6","7","8","+","?","h","H","-",""),
                   propmultiplier = c(1e+03,1e+03,1e+06,1e+06,1e+09,1e+09,10,10,10,10,10,10,10,10,10,1,0,100,100,0,0),
                   stringsAsFactors = FALSE)

# Add a column with the numeric value of the property damage multiplier
df <- df %>% 
    inner_join(exptbl, by = c("PROPDMGEXP" = "explabel"))

names(exptbl) <- c("explabel","cropmultiplier")

# Now do the same for the crop damage multiplier

df <- df %>% 
    inner_join(exptbl, by = c("CROPDMGEXP" = "explabel"))

# Create new columns that have the crop damage and property damage costs
df$cropdmgcost <- df$CROPDMG*df$cropmultiplier
df$propdmgcost <- df$PROPDMG*df$propmultiplier

# Now write out the dataframe

#write.csv(df,"dfdata.csv")
#df <- read.csv("dfdata.csv", stringsAsFactors = F)

# Ready to build a summary table from which we will gather all needed statistics
evtsum <- df %>% 
    filter(BGN_YEAR > "2001") %>%
    group_by(EVTYPE) %>% 
    summarize(occurrences = length(EVTYPE),
              casualties = sum(FATALITIES+INJURIES), 
              fatalities = sum(FATALITIES), 
              injuries = sum(INJURIES),
              casperevt = sum(FATALITIES+INJURIES)/length(EVTYPE),
              fatperevt = sum(FATALITIES)/length(EVTYPE),
              cost = sum(cropdmgcost) + sum(propdmgcost),
              propcost = sum(propdmgcost),
              cropcost = sum(cropdmgcost),
              costperevt = (sum(cropdmgcost) + sum(propdmgcost))/length(EVTYPE))

totcas <- sum(evtsum$casualties)
totinj <- sum(evtsum$injuries)
totfat <- sum(evtsum$fatalities)
totcost <- sum(evtsum$cost)
totpropcost <- sum(evtsum$propcost)
totcropcost <- sum(evtsum$cropcost)
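
The exponent-to-multiplier step above can be illustrated on a small scale. The following is a minimal sketch in base R, not the code used for the analysis: the records in `toy` are invented, `lookup` is a cut-down stand-in for `exptbl`, and `match()` is swapped in for `inner_join()`. One difference is worth noting: `match()` leaves an unrecognised label as `NA`, whereas `inner_join()` drops that row altogether.

```r
# Invented records for illustration only; "Z" is a label the lookup does not know.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 10, 5, 7),
  PROPDMGEXP = c("K", "M", "B", "", "Z"),
  stringsAsFactors = FALSE
)

# Cut-down version of the multiplier table (assumed subset of exptbl).
lookup <- data.frame(
  explabel   = c("K", "M", "B", ""),
  multiplier = c(1e3, 1e6, 1e9, 0),
  stringsAsFactors = FALSE
)

# Find each record's row in the lookup, then scale the damage figure.
idx <- match(toy$PROPDMGEXP, lookup$explabel)
toy$propdmgcost <- toy$PROPDMG * lookup$multiplier[idx]

print(toy)

# Rows with an unknown label come back as NA instead of being dropped.
sum(is.na(toy$propdmgcost))
```

Counting the `NA` values, as in the last line, is one way to see how many records a join on the exponent code would silently discard.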

Results

In total, weather events resulted in 29,609 injuries and 5,048 fatalities between 2002 and 2011. During the same period, weather events caused property damage totaling $315B and crop damage of $18.5B.

Events Resulting In Greatest Numbers of Casualties

The chart below summarizes the weather events that resulted in the greatest human casualties over the 10-year period from 2002 to 2011.

# Build table of top causes of casualties
maxcas <- evtsum %>%
    select(EVTYPE, injuries, fatalities, casualties) %>%
    arrange(desc(casualties)) 

maxcas$pct <- maxcas$casualties/totcas
# Cumulative share of casualties, in descending order of casualties
maxcas$cumpct <- cumsum(maxcas$pct)

# Keep events up to a cumulative 90% of all casualties, and always at least the top 10
maxcas <- maxcas %>%
    filter(cumpct <= 0.90 | row_number(cumpct) < 11)

topevt <- maxcas$EVTYPE[1]
topinj <- maxcas$injuries[1]
topfat <- maxcas$fatalities[1]

par(oma = c(6,1,1,1))
mcbp <- barplot(t(maxcas[,2:3]),
                col = brewer.pal(3, "Paired"),
                main = "Weather Events Accounting for Top 90% of Casualties, 2002-2011",
                ylab = "Number of Casualties",
                legend = colnames(maxcas[,2:3]),
                xaxt = "n",
                yaxt = "n")
axis(1,at = mcbp, labels = maxcas$EVTYPE, las = 3, cex.axis = .9) 
axis(2,at = axTicks(2), labels = formatC(axTicks(2), format = "d", big.mark = ","))

As indicated in the chart, tornadoes were the largest cause of casualties, resulting in 13,588 injuries (46% of total) and 1,112 deaths (22%) over the 10-year span.
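
The cutoff rule behind the chart (keep events up to a cumulative 90% of casualties, but never fewer than a minimum number of rows) can be sketched on made-up counts. The event names `A` to `E`, their counts, and `min_rows` are assumptions for illustration; the analysis itself uses a minimum of 10 rows.

```r
# Hypothetical casualty counts, already sorted in descending order.
cas <- c(A = 50, B = 30, C = 12, D = 5, E = 3)

pct    <- cas / sum(cas)   # share of all casualties per event
cumpct <- cumsum(pct)      # running share, top event first

# Keep rows within the 90% band, and always keep the first min_rows rows.
min_rows <- 3
keep <- cumpct <= 0.90 | seq_along(cumpct) <= min_rows

top <- cas[keep]
print(data.frame(event = names(top), casualties = top, cumpct = cumpct[keep]))
```

Here only `A` and `B` fall within the 90% band, and `C` is retained purely because of the minimum-rows condition.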

The table below gives statistics for the top events.

maxcasd <- maxcas
maxcasd$injuries <- formatC(maxcas$injuries, format = "d", big.mark = ",")
maxcasd$fatalities <- formatC(maxcas$fatalities, format = "d", big.mark = ",")
maxcasd$casualties <- formatC(maxcas$casualties, format = "d", big.mark = ",")
maxcasd$pct = paste(formatC(maxcas$pct*100, format = "f", digits = 1),"%", sep = "")
maxcasd$cumpct = paste(formatC(maxcas$cumpct*100, format = "f", digits = 1),"%", sep = "")
kable(maxcasd, 
      table.attr='class="myTable"',
      col.names = c("Event","Injuries","Fatalities","Casualties", "% Casualties", "Cum % Casualties"),
      align = c("l","r","r","r","c","c"), 
      format = "html",
      row.names = FALSE,
      caption = "Events with highest casualties (injuries + fatalities), in sorted order")
Events with highest casualties (injuries + fatalities), in sorted order

Event             Injuries  Fatalities  Casualties  % Casualties  Cum % Casualties
TORNADO             13,588       1,112      14,700         42.4%             42.4%
EXCESSIVE HEAT       2,797         691       3,488         10.1%             52.5%
LIGHTNING            2,250         370       2,620          7.6%             60.0%
THUNDERSTORM WIND    1,400         130       1,530          4.4%             64.5%
HEAT                 1,222         229       1,451          4.2%             68.6%
HURRICANE/TYPHOON    1,275          64       1,339          3.9%             72.5%
TSTM WIND            1,146          77       1,223          3.5%             76.0%
FLASH FLOOD            517         539       1,056          3.0%             79.1%
WILDFIRE               911          75         986          2.8%             81.9%
HIGH WIND              486          97         583          1.7%             83.6%
FLOOD                  301         247         548          1.6%             85.2%
RIP CURRENT            208         340         548          1.6%             86.8%
HAIL                   456           3         459          1.3%             88.1%
WINTER WEATHER         343          33         376          1.1%             89.2%

Events Resulting In Highest Damage Costs

The chart below shows the weather events that had the greatest economic impact over the 10-year period.

# Build table of costs

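# select() only picks and renames columns; an expression such as
# cost/occurrences is not evaluated as a ratio of the two columns there,
# so use the per-event cost already computed in evtsum (costperevt)
maxcost <- evtsum %>%
    select(EVTYPE, 
           Property_Damage = propcost, 
           Crop_Damage = cropcost, 
           cost,
           Cost_Per_Event = costperevt) %>%
    arrange(desc(cost)) 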

maxcost$pct <- maxcost$cost/totcost
# Cumulative share of costs, in descending order of cost
maxcost$cumpct <- cumsum(maxcost$pct)

# Keep events up to a cumulative 90% of all costs, and always at least the top 10
maxcost <- maxcost %>%
    filter(cumpct <= 0.90 | row_number(cumpct) < 11)


costchartcumpct <- max(maxcost$cumpct)
topevt <- maxcost$EVTYPE[1]
topcost <- maxcost$cost[1]
topprop <- maxcost$Property_Damage[1]
topproppct <- topprop/topcost
topcropcost <- max(maxcost$Crop_Damage)
topcostperocc <- max(maxcost$Cost_Per_Event)
topcropevt <- maxcost[maxcost$Crop_Damage == topcropcost, "EVTYPE"][1]
topcostperoccevt <- maxcost[maxcost$Cost_Per_Event == topcostperocc, "EVTYPE"][1]

par(oma = c(6,1,1,1))
mcbp <- barplot(t(maxcost[,2:3]),
                col = brewer.pal(3, "Paired"),
                main = paste("Weather Events Accounting for Top ",
                             formatC(round(costchartcumpct*100), format = "d"),
                             "% of Damage Costs, 2002-2011",
                             sep=""),
                ylab = "Cost of Damage ($ Billions)",
                legend = colnames(maxcost[,2:3]),
                xaxt = "n",
                yaxt = "n")
axis(1,at = mcbp, labels = maxcost$EVTYPE, las = 3, cex.axis = .9) 
axis(2,at = axTicks(2), labels = formatC(axTicks(2)/1000000000, format = "f", big.mark = ",", digits = 0))

The top 10 weather events accounted for 94% of all costs. Floods represented 41% of all weather event-related damages during the period, with total costs of $137.0B, of which 97% was property damage. The largest source of crop damage costs was the event labeled “DROUGHT”, representing 29% of all such costs, for a total value of $5.4B.

The table below summarizes the cost statistics for the top causes of damage.

maxcostd <- maxcost
maxcostd$Property_Damage <- paste("$",formatC(maxcostd$Property_Damage/1000000000, 
                                              format = "f", 
                                              digits = 1,
                                              big.mark = ","),
                                  "B", 
                                  sep = "")
maxcostd$Crop_Damage <- paste("$",formatC(maxcostd$Crop_Damage/1000000000, 
                                          format = "f", 
                                          digits = 1, 
                                          big.mark = ","),
                              "B", 
                              sep = "")
maxcostd$cost <- paste("$",formatC(maxcostd$cost/1000000000, 
                                   format = "f", 
                                   digits = 1, 
                                   big.mark = ","),
                       "B", 
                       sep = "")
maxcostd$Cost_Per_Event <- paste("$",formatC(maxcostd$Cost_Per_Event, 
                                   format = "f", 
                                   digits = 0, 
                                   big.mark = ","),
                                 sep = "")
maxcostd$pct = paste(formatC(maxcostd$pct*100, format = "f", digits = 1),"%", sep = "")
maxcostd$cumpct = paste(formatC(maxcostd$cumpct*100, format = "f", digits = 1),"%", sep = "")
kable(maxcostd, 
      table.attr='class="myTable"',
      col.names = c("Event",
                    "Property Damage",
                    "Crop Damage",
                    "Total Damage", 
                    "Cost per Occurrence", 
                    "% Damage", 
                    "Cum % Damage"),
      align = c("l","r","r","r","r","c","c"), 
      format = "html",
      row.names = FALSE,
      caption = "Events with highest economic cost (property damage + crop damage), in sorted order")
Events with highest economic cost (property damage + crop damage), in sorted order

Event             Property Damage  Crop Damage  Total Damage  Cost per Occurrence  % Damage  Cum % Damage
FLOOD                     $133.4B        $3.6B       $137.0B                 $247     41.1%         41.1%
HURRICANE/TYPHOON          $69.3B        $2.6B        $71.9B                  $64     21.6%         62.6%
STORM SURGE                $43.2B        $0.0B        $43.2B                   $0     12.9%         75.6%
TORNADO                    $18.4B        $0.2B        $18.6B               $1,112      5.6%         81.1%
FLASH FLOOD                $10.7B        $0.8B        $11.5B                 $539      3.5%         84.6%
HAIL                        $9.2B        $1.4B        $10.6B                   $3      3.2%         87.8%
DROUGHT                     $0.8B        $5.4B         $6.3B                   $0      1.9%         89.6%
HIGH WIND                   $4.8B        $0.5B         $5.3B                  $97      1.6%         91.2%
WILDFIRE                    $4.8B        $0.3B         $5.1B                  $75      1.5%         92.8%
STORM SURGE/TIDE            $4.6B        $0.0B         $4.6B                  $11      1.4%         94.1%
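
The dollar formatting for this table repeats the same `paste()` and `formatC()` pattern for each column. As a sketch only, it could be factored into a small helper; `fmt_billions` is a hypothetical name, not part of the original analysis, and the sample amounts are taken from or modelled on the table.

```r
# Hypothetical helper: format raw dollar amounts as "$<billions>B".
fmt_billions <- function(x, digits = 1) {
  paste0("$",
         formatC(x / 1e9, format = "f", digits = digits, big.mark = ","),
         "B")
}

# Two amounts from the table plus one invented value above a trillion.
fmt_billions(c(133.4e9, 3.6e9, 1234.5e9))
```

Using one helper for the property, crop, and total columns keeps the unit suffix consistent across all three.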

The Cost per Occurrence values shown in the table are not reliable. They coincide exactly with the fatality counts in the casualty table (1,112 for TORNADO, 539 for FLASH FLOOD, 247 for FLOOD), which indicates that the column was not computed as total cost divided by the number of occurrences. No ranking of events by cost per occurrence is therefore drawn from this table.
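
Cost per occurrence is the total cost for an event type divided by the number of times that event type was recorded. A minimal base R sketch on invented records follows; the `events` data frame and its values are assumptions for illustration, not NOAA figures.

```r
# Invented per-event records: one row per recorded occurrence.
events <- data.frame(
  EVTYPE = c("FLOOD", "FLOOD", "FLOOD", "TORNADO", "TORNADO"),
  cost   = c(100, 200, 300, 500, 700),
  stringsAsFactors = FALSE
)

# Total cost and number of occurrences per event type.
total <- tapply(events$cost, events$EVTYPE, sum)
n     <- tapply(events$cost, events$EVTYPE, length)

per_event <- data.frame(
  EVTYPE      = names(total),
  cost        = as.numeric(total),
  occurrences = as.integer(n),
  stringsAsFactors = FALSE
)

# The ratio is computed explicitly as a new column.
per_event$cost_per_event <- per_event$cost / per_event$occurrences
print(per_event)
```

Computing the ratio as its own column, rather than inside a column-selection step, makes it easy to check against the totals and counts beside it.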

Conclusions

Tornadoes are by far the most dangerous type of weather event, in terms of both injuries and fatalities. A plausible explanation is that they strike quickly, with little warning, and with immense energy, which makes escape difficult.

In terms of economic damage, on the other hand, floods were more costly than any other single event type, accounting for 41% of all damage costs, almost all of it property damage. Perhaps not surprisingly, the greatest crop damage was due to droughts.
