Analysis of Injury and Damage from Storms

Synopsis

The impact of storms, hurricanes, tornadoes and other weather events is a significant costs across the United States. This impact can come in both human terms in the form of fatalities and injuries, and financial terms with both property and agricultural (crop) damage. This analysis quantifies this impact. Weather events are categorized into a number of different types, including rain, snow, ice, heat, cold, thunderstorms, droughts, and flooding, just to mention a few. One of the challenges of this analysis is the need to consolidate the self-reported event type information into usable data. The original NOAA data includes 985 different event type, and these are reduced to 23 for the analysis. The end result of this analysis is to identify the five top causes of injuries and fatalities, and of property and crop damage.

Data Processing

In this section, the data are downloaded from the source location. The data are downloaded from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2. Additional documentation is available from https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf. Clean up variable names to make all lowercase and eliminate underscore characters. Convert EVTYPE to be title case.

One challenge of this analysis is the inconsistent nature of event coding. Each reporting center uses its own discretion in labeling the nature of the event. As a result, for example, there are at least 120 different event codes used for events related to Thunderstorm damage. In all, the data file contains 985 unique event type. As a result, it seemed appropriate to attempt to collapse the number of events into a more manageable number by collapsing similar events into a larger category.

To this end, the researcher has used his best judgement to recode the 985 events into 23 unique categories for this analysis. These are the event codes used in all subsequent analysis.

## Load data from URL into data.frame n1
URL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(URL, "NOAA.csv.bz2")
n0 <- read.csv("NOAA.csv.bz2", stringsAsFactors = FALSE)
n1 <- n0  ##  [sample(nrow(n0), size = 10000, replace = FALSE),]
## Transform the column names to be lower case legal names, transfer the event types to be title case.
names(n1) <- tolower(make.names(names(n1), unique = T, allow_ = F))
n1[,8] <- stri_trans_totitle(n1[,8])

## read the CSV file created by the researcher and which mapes the 985 event types 
## from the raw data into 23 major event types for the analysis.
eventRecodeLabels <- read.csv(file = "EventRecodeLabels.csv")

## This loop proceeses all of the records in the raw data file and creates the 
## recoded event codes. Note - this takes approx 2 hours to process.
for (i in 1:nrow(n1)) {
      n1[i,38] <- names(which(sapply(eventRecodeLabels, function(x) any(x == n1[i,8]))))
}
colnames(n1)[38] <- "eventRecode"

Analysis

The goal of the analysis is to determine across the United States: 1) Which types of events tracked in the NOAA database are the most harmful with respect to human health, and 2) Which types of events have the greatest economic consequences?

This section performs the supporting computations and analysis to determine the answers to these two questions.

## Aggregate Data for Fatalities and Injuries by event type

## Calculate the total sum of fatalities, injuries, property and crop damage. 
## Convert dollar values to millions
sumofFat <- sum(n1$fatalities)
sumofInj <- sum(n1$injuries)
sumofPrp <- sum(n1$propdmg)/1000
sumofCrp <- sum(n1$cropdmg)/1000

## aggregate data by eventRecode types
n1.fatal <- aggregate(n1$fatalities ~ n1$eventRecode, data = n1, sum)
n1.injur <- aggregate(n1$injuries ~ n1$eventRecode, data = n1, sum)
n1.prop.damg <- aggregate(n1$propdmg ~ n1$eventRecode, data = n1, sum)
n1.crop.damg <- aggregate(n1$cropdmg ~ n1$eventRecode, data = n1, sum)
n1.crop.damg[,2]  <- n1.crop.damg[,2]/1000
n1.prop.damg[,2]  <- n1.prop.damg[,2]/1000

n1.human <- merge(n1.fatal, n1.injur)
n1.human$total.impact <- n1.human$`n1$fatalities` + n1.human$`n1$injuries`

n1.money <- merge(n1.prop.damg, n1.crop.damg)
n1.money$total.impact <- n1.money$`n1$propdmg` + n1.money$`n1$cropdmg`

## Get the top 5 for each grouping.
n1.human.inj <- slice(n1.human[order(-n1.human$`n1$injuries`),], 1:5)
n1.human.fat <- slice(n1.human[order(-n1.human$`n1$fatalities`),], 1:5)
n1.human.tot <- slice(n1.human[order(-n1.human$total.impact),], 1:5)

## get the total sum for each of the topic 5.
topfive.inj <- sum(n1.human.inj$`n1$injuries`)
topfive.fat <- sum(n1.human.fat$`n1$fatalities`)
topfive.fatinj <- sum(n1.human.tot$total.impact)


n1.money.prp <- slice(n1.money[order(-n1.money$`n1$propdmg`),], 1:5)
n1.money.crp <- slice(n1.money[order(-n1.money$`n1$cropdmg`),], 1:5)
n1.money.tot <- slice(n1.money[order(-n1.money$total.impact),], 1:5)

topfive.prp <- sum(n1.money.prp$`n1$propdmg`)
topfive.crp <- sum(n1.money.crp$`n1$cropdmg`)
topfive.prpcrp <- sum(n1.money.tot$total.impact)

top5pct.inj <- topfive.inj/sumofInj
top5pct.fat <- topfive.fat/sumofFat
top5pct.prp <- topfive.prp/sumofPrp
top5pct.crp <- topfive.crp/sumofCrp

Results

The goal of this analysis is to identify the most harmful and most costly meteorological events in the United States.

Harmful events can be defined as those causing either death or injury. The most costly events are those that cause property damage or crop damage.

Injuries and Fatalities

Looking first a harm to humans, statistics are collected for both fatalities and injuries. Some events tend to result in more injuries, while others are more like to result in fatalities. The chart below shows the five leading causes of combined deaths and injuries, fatalities only, and injuries only, by event type.

par(mfrow = c(3,1), oma = c(2,2,2,2), mar = c(2,4,4,2))

barplot(n1.human.tot$total.impact,names.arg = n1.human.tot$`n1$eventRecode`, axes = TRUE, axis.lty = 1, col = "lightpink1")
mtext("Top 5 Causes of Fatalities and Injuries", outer = T, cex = 1.25)
title(main = "Fatalities and Injuries Combined by Event Type", 
      ylab = "Number of Fatalities and Injuries (actual)", outer = F)
              box()

barplot(n1.human.fat$`n1$fatalities`,names.arg = n1.human.fat$`n1$eventRecode`, axes = TRUE, axis.lty = 1, col = "lightblue")
        title(main = "Fatalities by Event Type",
              ylab = "Number of Fatalities (actual)", 
              outer = F)
        box()

barplot(n1.human.inj$`n1$injuries`,names.arg = n1.human.inj$`n1$eventRecode`, axes = TRUE, axis.lty = 1,  col = "lightgreen")
        title(main = "Injuries by Event Type",
              ylab = "Number of Injuries (actual)", 
              outer = F)
        box()

When it comes to fatalities, the leading cause is Tornado (5,665 deaths), followed by Heat.Related (3,177 deaths), Floods (1,582 deaths), Thuderstorms (867 deaths), and Lightning (817 deaths), for a total of 12,108 deaths, which is 79.9% of the total of 15,145 deaths.

For injuries, the leading cause is Tornado (91,439 injuries), followed by Thuderstorms (9,931 injuries), Heat.Related (9,243 injuries), Floods (8,802 injuries), and Lightning (5,232 injuries), for a total of 124,647 injuries, which is 88.7% of the total of 140,528 deaths.

Damage to Property and Crops

Turning now to financial damage, statistics are collected for both property damage (residential and commercial) and crop damage. There is more diversity of the types of events that cause property damage when compared to crop damage. The chart below shows the five leading causes of combined damage (property and crop), property damage only, and crop damage only, by event type.

par(mfrow = c(3,1), oma = c(2,2,2,2), mar = c(2,4,4,2))

barplot(n1.money.tot$total.impact,names.arg = n1.money.tot$`n1$eventRecode`, axes = TRUE, axis.lty = 1, col = "lightpink1")
mtext("Top 5 Causes of All Financial Damages", outer = T, cex = 1.25)
title(main = "Property and Crop Damage Combined by Event Type",
              ylab = "Dollar Value of Damage ($Millions)", 
              outer = F)
              box()

barplot(n1.money.prp$`n1$propdmg`,names.arg = n1.money.prp$`n1$eventRecode`, axes = TRUE, axis.lty = 1, col = "lightblue")
        title(main = "Propery Damage by Event Type",
              ylab = "Dollar Value of Damage ($Millions)", 
              outer = F)
        box()

barplot(n1.money.crp$`n1$cropdmg`,names.arg = n1.money.crp$`n1$eventRecode`, axes = TRUE, axis.lty = 1,  col = "lightgreen")
        title(main = "Crop Damage by Event Type",
              ylab = "Dollar Value of Damage ($Millions)", 
              outer = F)
        box()

When it comes to financial damages to property, the leading cause is Tornado ($3,225.53 million), followed by Thuderstorms ($2,745.66 million), Floods ($2,443.91 million), Hail ($689.81 million), and Lightning ($603.44 million), for a total of $9,708.35 million in property damage, which is 89.2% of the total of $10,884.50 million in property damage.

When it comes to financial damages to crops, the leading cause is Hail ($581.47 million), followed by Floods ($366.86 million), Thuderstorms ($201.38 million), Tornado ($100.03 million), and Drought.Related ($33.90 million), for a total of $1,283.64 million in crop damage, which is 93.2% of the total of $1,377.83 million in crop damage. .

Summary

In summary, it appears that Tornadoes present the greatest risk in terms of human harm (injury and death) and property damage. The major exceptions is the impact of hail on crop damage. As is often the case, the top five sources of damage and industry account for anywhere from 80% to 93% of the total impact.