Financial and Public Health Costs of Weather Events in the U.S.

SYNOPSIS

In this analysis I investigate various types of weather events and their consequences on: human health in terms of injury and fatality, and their economic toll measured through property and crop damage. My main hypothesis is that the events that cause the most economic damage are also the types of natural disasters that lead to the most injuries and fatalities. To test this assertion, I used data from the U.S. National Oceanic and Atmospheric Administration's storm database. These data record statistics on the characteristics of major storms and weather events. My findings indicate that the leading causes of fatalities and injuries are not the events that lead to the most financial damage. Most injuries and fatalities are the result of tornados, whereas, most of the financial damage across the United States is caused by flooding.

Data Processing

The data was downloaded from the class website and is a 'csv' file that comes as a 'bzip2' file. Asumming the zip file is in your working directory, the following code will read in the data.

data <- read.csv(bzfile("repdata-data-StormData.csv.bz2"), header = TRUE)

Results

The most harmful events in the US with respect to health.

First, subset the relevant variables and only the observations that include fatalities or injuries.

subData <- subset(data, data$FATALITIES != 0 & data$INJURIES != 0, select = c("EVTYPE", 
    "FATALITIES", "INJURIES"))

The following code makes a data frame with the number of fatalities per disaster type then creates a vector(“fatalOrd”) with the types in a descending order which will be passed to the plot function.

fatalFrame <- na.omit(as.data.frame(tapply(subData$FATALITIES, subData$EVTYPE, 
    sum)))
names(fatalFrame)[1] <- "Number.Deaths"
fatalOrd <- fatalFrame[order(fatalFrame$Number.Deaths, decreasing = TRUE), ]

Show the top ten causes of fatality.

head(fatalOrd, 10)
##        TORNADO EXCESSIVE HEAT      LIGHTNING      TSTM WIND    FLASH FLOOD 
##           5227            402            283            199            171 
##          FLOOD      HIGH WIND   WINTER STORM           HEAT       WILDFIRE 
##            104            102             85             73             55

Do the same as above but for the number of injuries per disaster type.

injurFrame <- na.omit(as.data.frame(tapply(subData$INJURIES, subData$EVTYPE, 
    sum)))
names(injurFrame)[1] <- "Number.Injury"
injurOrd <- injurFrame[order(injurFrame$Number.Injury, decreasing = TRUE), ]

Show the top ten causes of injury.

head(injurOrd, 10)
##           TORNADO    EXCESSIVE HEAT             FLOOD         ICE STORM 
##             60187              4791              2679              1720 
##              HEAT HURRICANE/TYPHOON          BLIZZARD         LIGHTNING 
##              1420              1219               718               649 
##         TSTM WIND       FLASH FLOOD 
##               646               641

Plot the results.

par(mfrow = c(2, 1), font.lab = 4, cex.axis = 0.75, mar = c(4.5, 4.5, 2, 1))
barplot(fatalOrd[1:5], col = "lightblue", xlab = "", ylab = "Number of Fatalities", 
    main = "Most Harmful Events Across U.S.")
text(c(0.75, 2, 3, 4.3, 5.5), c(4500, 1000, 1000, 1000, 1000), labels = c("5227", 
    "402", "283", "199", "171"))
barplot(injurOrd[1:5], col = "red", xlab = "Event Types", ylab = "Number of Injuries")
text(c(0.75, 2, 3, 4.3, 5.5), c(50000, 10000, 10000, 10000, 10000), labels = c("60187", 
    "4791", "2679", "1720", "1420"))

plot of chunk unnamed-chunk-7

From the plots, we can see that tornados account for most fatalities and injuries across the United States and the difference between tornados and other event types is visibly large.

Economic consequences of event types across the US.

First we subset the data by event type(“EVTYPE”),and by the variables that account for damage(“PROPDMG”,“PROPDMGEXP”,“CROPDMG”,“CROPDMGEXP”).

damageData <- subset(data, data$PROPDMG != 0 & data$CROPDMG != 0, select = c("EVTYPE", 
    "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP"))

Here's what the data look like.

str(damageData)
## 'data.frame':    16242 obs. of  5 variables:
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 409 786 405 408 408 786 786 834 834 812 ...
##  $ PROPDMG   : num  0.1 5 25 48 20 50 500 500 500 5 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 14 19 19 19 18 17 17 17 17 17 ...
##  $ CROPDMG   : num  10 500 1 4 10 50 50 5 50 15 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 9 7 9 9 8 7 7 7 7 7 ...

Inspecting the “PROPDMGEXP” and “CROPDMGEXP” variables, we can see that they are multi-leveled.

levels(damageData$PROPDMGEXP)
##  [1] ""  "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"
levels(damageData$CROPDMGEXP)
## [1] ""  "?" "0" "2" "B" "k" "K" "m" "M"

Page twelve of the 'Storm Data Documentation' which can be found at https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf, explains that these letters refer to amount classifications for the corresponding observation linking this value to the appropriate damage variables. For instance, if 'PROPDMG' equals 2.1 and “PROPDMGEXP” is 'B', then the property damage for that observation is 2.1 billion. “B” stands for 'billions', “K” for 'thousands' and “M” for 'millions'. “H” is unaccounted for but if the pattern holds, then it should stand for 'hundreds'. The other symbols don't have any classifications that were mentioned in the documentation.

Checking for the proportion of observations with known classifications, shows that these known categories make up most of the observations.

nrow(damageData[damageData$PROPDMGEXP %in% c("B", "h", "H", "K", "m", "M") & 
    damageData$CROPDMGEXP %in% c("B", "k", "K", "m", "M"), ])/nrow(damageData)
## [1] 0.9984

Given that the unknown classifications make up a small fraction of the observations, they will be left out of the analysis.

Next, we will subset the observations that have defined classifications.

subDamDat <- damageData[damageData$PROPDMGEXP %in% c("B", "h", "H", "K", "m", 
    "M") & damageData$CROPDMGEXP %in% c("B", "k", "K", "m", "M"), ]

We will convert each letter to upper case then map these to their numerical value for later computation.

subDamDat$PROPDMGEXP <- toupper(subDamDat$PROPDMGEXP)
subDamDat$CROPDMGEXP <- toupper(subDamDat$CROPDMGEXP)

We'll use the 'mapvalues' function from the 'plyr' package to change the factor levels to their numerical equivalent.

library(plyr)
subDamDat$PROPDMGEXP <- mapvalues(subDamDat$PROPDMGEXP, from = c("B", "M", "K", 
    "H"), to = c(10^9, 10^6, 10^3, 10^2))
## The following `from` values were not present in `x`: H
subDamDat$CROPDMGEXP <- mapvalues(subDamDat$CROPDMGEXP, from = c("B", "M", "K"), 
    to = c(10^9, 10^6, 10^3))
subDamDat$PROPDMGEXP <- as.numeric(subDamDat$PROPDMGEXP)
subDamDat$CROPDMGEXP <- as.numeric(subDamDat$CROPDMGEXP)

Now we will create a new variable that will be the total sum of damages to both property and crops.

subDamDat <- transform(subDamDat, totalDamage = (PROPDMG * PROPDMGEXP) + (CROPDMG * 
    CROPDMGEXP))

Here's how the current data looks.

str(subDamDat)
## 'data.frame':    16216 obs. of  6 variables:
##  $ EVTYPE     : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 409 786 405 408 408 786 786 834 834 812 ...
##  $ PROPDMG    : num  0.1 5 25 48 20 50 500 500 500 5 ...
##  $ PROPDMGEXP : num  1e+09 1e+06 1e+06 1e+06 1e+06 1e+03 1e+03 1e+03 1e+03 1e+03 ...
##  $ CROPDMG    : num  10 500 1 4 10 50 50 5 50 15 ...
##  $ CROPDMGEXP : num  1e+06 1e+03 1e+06 1e+06 1e+06 1e+03 1e+03 1e+03 1e+03 1e+03 ...
##  $ totalDamage: num  1.10e+08 5.50e+06 2.60e+07 5.20e+07 3.00e+07 1.00e+05 5.50e+05 5.05e+05 5.50e+05 2.00e+04 ...
head(subDamDat)
##                           EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 187566 HURRICANE OPAL/HIGH WINDS     0.1      1e+09      10      1e+06
## 187571        THUNDERSTORM WINDS     5.0      1e+06     500      1e+03
## 187581            HURRICANE ERIN    25.0      1e+06       1      1e+06
## 187583            HURRICANE OPAL    48.0      1e+06       4      1e+06
## 187584            HURRICANE OPAL    20.0      1e+06      10      1e+06
## 187653        THUNDERSTORM WINDS    50.0      1e+03      50      1e+03
##        totalDamage
## 187566     1.1e+08
## 187571     5.5e+06
## 187581     2.6e+07
## 187583     5.2e+07
## 187584     3.0e+07
## 187653     1.0e+05

Make a data frame that has the sum of damages separated by levels of the event type, and order it in descending order.

damageFrame <- na.omit(as.data.frame(tapply(subDamDat$totalDamage, subDamDat$EVTYPE, 
    sum)))
names(damageFrame)[1] <- "Sum.Damage"
damageOrd <- damageFrame[order(damageFrame$Sum.Damage, decreasing = TRUE), ]

A call to 'head' gives you the top ten event types that cause the most damage.

head(damageOrd, 10)
##             FLOOD HURRICANE/TYPHOON         HURRICANE       RIVER FLOOD 
##         1.260e+11         2.935e+10         1.050e+10         1.011e+10 
##         ICE STORM       FLASH FLOOD              HAIL           TORNADO 
##         5.109e+09         4.309e+09         3.814e+09         2.336e+09 
##    HURRICANE OPAL         HIGH WIND 
##         2.187e+09         1.919e+09

Finally, plot the results.

par(cex.axis = 0.7, mar = c(7.5, 4, 3, 1))
barplot(damageOrd[1:5], main = "Most Economically Damaging Events(US)", xlab = "", 
    ylab = "Damage (in Billions of $)", col = "orange", yaxt = "n", xaxt = "n")
axis(side = 2, at = c(0, 4e+10, 8e+10, 1.2e+11), labels = c(0, 40, 80, 120), 
    las = 2)
axis(side = 1, at = c(0.6, 2, 3.2, 4.4, 5.7), labels = c("FLOOD", "HURRICANE/TYPHOON", 
    "HURRICANE", "RIVER FLOOD", "ICE STORM"), las = 2)

plot of chunk unnamed-chunk-19

This plot displays the top five most damaging events with 'FLOOD' being more than three times more damaging than the other categories.