Synopsis

This study examines the Storm Data dataset to determine the worst natural disasters with respect to both health and economic effects in the US. We determined that tornadoes caused the most injuries and dominated the list of worst health disasters, but heat-related disasters also caused a significant number of fatalities. In terms of economic impact, flooding was the worst disaster by a wide margin; hurricanes/typhoons were also very expensive. We describe the details of our analysis below.

Data Processing

The data file was read using read.csv after setting the appropriate working directory and loading the required libraries.

### load the required libraries
library(plyr)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.3
library(knitr)
## Warning: package 'knitr' was built under R version 3.1.3
library(qdap)
## Warning: package 'qdap' was built under R version 3.1.3
## Loading required package: qdapDictionaries
## Warning: package 'qdapDictionaries' was built under R version 3.1.3
## Loading required package: qdapRegex
## Warning: package 'qdapRegex' was built under R version 3.1.3
## 
## Attaching package: 'qdapRegex'
## 
## The following object is masked from 'package:ggplot2':
## 
##     %+%
## 
## Loading required package: qdapTools
## Warning: package 'qdapTools' was built under R version 3.1.3
## 
## Attaching package: 'qdapTools'
## 
## The following object is masked from 'package:plyr':
## 
##     id
## 
## Loading required package: RColorBrewer
## Warning: package 'RColorBrewer' was built under R version 3.1.3
## 
## Attaching package: 'qdap'
## 
## The following object is masked from 'package:base':
## 
##     Filter
library(reshape2)
### Set the working directory
setwd("C:/Users/Rachel/Documents/R Programming/reproddata")

### read in the data
storm_data <- read.csv("repdata-data-StormData.csv.bz2")

Then, I subset the data to include only the columns relevant for analysis. Since I am interested in only the most damaging events, I also removed the records where a storm was recorded but no bodily harm or monetary cost occurred.

### include only health or property-related columns
storm_data2 <- subset(storm_data, select=c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG",
    "PROPDMGEXP", "CROPDMG", "CROPDMGEXP"))

### drop the entries where nothing bad happened
storm_data3 <- subset(storm_data2, FATALITIES != 0 | INJURIES != 0 | PROPDMG !=0 | 
  CROPDMG !=0)

The monetary values are coded by letter: ‘k’ for thousands of dollars, ‘m’ for millions of dollars, and ‘b’ for billions of dollars. For the purpose of this analysis, I kept only the records in which both the property and the crop damage exponents use one of these codes. This also drops any record where either exponent is blank or irregular.

## convert factor in exponent to characters and make all lowercase
storm_data3$PROPDMGEXP <- tolower(as.character(storm_data3$PROPDMGEXP))
storm_data3$CROPDMGEXP <- tolower(as.character(storm_data3$CROPDMGEXP))

## drop the entries with irregular EXP values (only include k, m, or b)
storm_data4 <- subset(storm_data3, (PROPDMGEXP == 'k'|PROPDMGEXP =='m'|PROPDMGEXP =='b')
    &(CROPDMGEXP == 'k'|CROPDMGEXP == 'm'|CROPDMGEXP == 'b'))
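Before applying a filter like the one above, it can help to see which exponent codes occur and how many records the filter would drop. The following is a minimal sketch on a hypothetical toy data frame (the values are made up for illustration and are not taken from the Storm Data file); only the column names match the analysis.

```r
# Toy data: hypothetical values, NOT taken from the Storm Data file
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL", "HEAT"),
  PROPDMGEXP = c("K", "m", "B", ""),
  CROPDMGEXP = c("",  "k", "M", "k"),
  stringsAsFactors = FALSE
)

# Normalize case, as in the analysis
toy$PROPDMGEXP <- tolower(toy$PROPDMGEXP)
toy$CROPDMGEXP <- tolower(toy$CROPDMGEXP)

# Tabulate the codes that occur in each column
print(table(toy$PROPDMGEXP, useNA = "ifany"))
print(table(toy$CROPDMGEXP, useNA = "ifany"))

# Keep a row only if BOTH exponents are one of k, m, b
valid   <- c("k", "m", "b")
keep    <- toy$PROPDMGEXP %in% valid & toy$CROPDMGEXP %in% valid
kept    <- toy[keep, ]
dropped <- toy[!keep, ]

print(kept$EVTYPE)     # rows with a valid code in both columns
print(dropped$EVTYPE)  # rows lost because one exponent is blank
```

In this toy example, the tornado and heat rows are dropped even though each has one valid exponent, because the other exponent is blank.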

I then converted the exponent codes into numeric multipliers, then multiplied each damage value by its multiplier to get the monetary damage in US dollars.

## Convert exponents to real numbers

# do a multiple gsub to replace with characters
storm_data4$PROPDMGEXP <- mgsub(c('k','m', 'b'), c('1000','1000000','1000000000'),
  storm_data4$PROPDMGEXP)
storm_data4$CROPDMGEXP <- mgsub(c('k','m', 'b'), c('1000','1000000','1000000000'),
  storm_data4$CROPDMGEXP)

# convert characters to numeric
storm_data4$PROPDMGEXP <- as.numeric(storm_data4$PROPDMGEXP)
storm_data4$CROPDMGEXP <- as.numeric(storm_data4$CROPDMGEXP)

# Multiply property and crop damage by exponent, create new columns called CROP_CASH 
# and PROP_CASH
storm_data4$PROP_CASH <- storm_data4$PROPDMG * storm_data4$PROPDMGEXP
storm_data4$CROP_CASH <- storm_data4$CROPDMG * storm_data4$CROPDMGEXP
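The same conversion can also be sketched in base R with a named lookup vector instead of a string substitution. This is an illustrative alternative, not the method used above; the `multiplier` vector and the toy values are assumptions introduced for the example.

```r
# Named lookup vector: exponent code -> numeric multiplier (hypothetical helper)
multiplier <- c(k = 1e3, m = 1e6, b = 1e9)

# Toy data: hypothetical values, NOT taken from the Storm Data file
toy <- data.frame(
  PROPDMG    = c(2.5, 10, 1.5),
  PROPDMGEXP = c("K", "m", "B"),
  stringsAsFactors = FALSE
)

# Index the lookup vector by the lowercased code, then multiply.
# Any code that is not k, m, or b would yield NA here.
toy$PROP_CASH <- toy$PROPDMG * unname(multiplier[tolower(toy$PROPDMGEXP)])

print(toy)
```

Because the lookup returns numbers directly, no separate `as.numeric` step is needed.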

Results

What’s the worst type of event, health-wise?

To answer this question, I looked at the number of fatalities and injuries caused by different storm events. There are many possible ways to evaluate health severity based on these two criteria, but I chose to rank severity by total number of fatalities, rather than the total number of both fatalities and injuries. For example, the table below lists an excessive heat event with 46 fatalities and 18 injuries and a tornado with 44 fatalities and 800 injuries; the heat event ranks higher. I considered events to be worse if they caused more fatalities, even if fewer individuals were harmed overall.

Below, I plot the 25 deadliest events, and show the distribution (median and quartiles) of both fatalities and injuries for each event type in boxplot format. The numbers of injuries and fatalities are plotted on a logarithmic scale for clarity. Injuries are shown in blue, whereas fatalities are shown in red. The single deadliest event was a tornado (158 fatalities), followed by an excessive heat event (46 fatalities), and tornadoes make up the majority of the 25 worst events for health. I would say that tsunamis are similarly severe, followed by floods, wildfires, and hurricane-like events.

## create new column that highlights total bodily harm
storm_data4$TOTALHARM <- storm_data4$INJURIES + storm_data4$FATALITIES

## Rank health severity by fatalities first, total harm second
## because dying is worse than being injured, obviously
storm_health <- arrange(storm_data4, FATALITIES, TOTALHARM, decreasing=TRUE)

## Just look at the worst 25 events
storm_health_bad <- storm_health[1:25,]

## Reshape the data in order to plot by injuries/fatalities
storm_health_melt <- melt(storm_health_bad)
## Using EVTYPE as id variables
storm_health_melt2 <- subset(storm_health_melt, 
  variable %in% c('FATALITIES', 'INJURIES'))
## plot the data
## (fig.width, fig.height and dpi are knitr chunk options, not qplot arguments,
## so they are set in the chunk header rather than passed here)
hp <- qplot(EVTYPE, value, data=storm_health_melt2, geom='boxplot', 
            main = "Most Harmful Storm Types", 
            ylab = "Total Harm", xlab = "Storm Event Type", color=variable)
hp+scale_y_log10()+theme_bw()+theme(plot.title = element_text(size=18, face="bold"),
            axis.text=element_text(size=14),axis.title=element_text(size=16,face="bold"),
            axis.text.x = element_text(angle = 90, hjust = 1))
## Warning: Removed 6 rows containing non-finite values (stat_boxplot).
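The warning above comes from the logarithmic axis: the base-10 logarithm of zero is negative infinity, so records with zero injuries cannot be drawn. As a small check, the sketch below takes the INJURIES values of the 25 events (copied from the table in this report) and counts the non-finite values after the transform. The variable names are introduced only for this example.

```r
# INJURIES for the 25 deadliest events, copied from the table in this report
injuries <- c(1150, 18, 800, 258, 129, 12, 0, 0, 0, 150, 700, 225, 24,
              100, 37, 0, 104, 2, 200, 90, 0, 44, 30, 9, 0)

# log10(0) is -Inf, which a log-scaled axis cannot display
log_injuries <- log10(injuries)

# Count the values that would be removed from the plot
n_dropped <- sum(!is.finite(log_injuries))
print(n_dropped)
```

The count matches the six rows reported in the warning; none of the 25 events has zero fatalities, so only injury values are affected.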

The table below compares the number of injuries, fatalities, and total bodily harm for all of the 25 worst health disasters.

storm_health_disp <- subset(storm_health_bad,select=c(EVTYPE, FATALITIES, INJURIES,
      TOTALHARM))
print(storm_health_disp)
##                        EVTYPE FATALITIES INJURIES TOTALHARM
## 1                     TORNADO        158     1150      1308
## 2              EXCESSIVE HEAT         46       18        64
## 3                     TORNADO         44      800       844
## 4                     TORNADO         32      258       290
## 5                     TSUNAMI         32      129       161
## 6                     TORNADO         27       12        39
## 7                     TORNADO         27        0        27
## 8  TORNADOES, TSTM WIND, HAIL         25        0        25
## 9                     TORNADO         23        0        23
## 10                    TORNADO         22      150       172
## 11                    TORNADO         20      700       720
## 12                       HEAT         20      225       245
## 13                FLASH FLOOD         20       24        44
## 14                    TORNADO         18      100       118
## 15                    TORNADO         16       37        53
## 16                       HEAT         16        0        16
## 17          HURRICANE/TYPHOON         15      104       119
## 18                      FLOOD         15        2        17
## 19                    TORNADO         14      200       214
## 20                   WILDFIRE         14       90       104
## 21                    TORNADO         14        0        14
## 22                    TORNADO         13       44        57
## 23                    TORNADO         13       30        43
## 24                    TORNADO         13        9        22
## 25                  HURRICANE         13        0        13
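The ranking rule (fatalities first, total harm as the tie-breaker) can be illustrated with base R's `order()` on four rows copied from the table above. This is a sketch of the rule only, using base R rather than the `arrange` call in the analysis; the data frame name `ev` is introduced for the example.

```r
# Four rows copied from the table above, deliberately out of order
ev <- data.frame(
  EVTYPE     = c("TORNADO", "TSUNAMI", "TORNADO", "TORNADO"),
  FATALITIES = c(27, 32, 32, 27),
  INJURIES   = c(0, 129, 258, 12),
  stringsAsFactors = FALSE
)
ev$TOTALHARM <- ev$FATALITIES + ev$INJURIES

# Sort by fatalities, breaking ties by total harm, both descending
ranked <- ev[order(ev$FATALITIES, ev$TOTALHARM, decreasing = TRUE), ]
rownames(ranked) <- NULL

print(ranked)
```

The two events with 32 fatalities are separated by total harm (290 versus 161), as are the two events with 27 fatalities (39 versus 27), which reproduces the order of rows 4 through 7 in the table.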

What’s the worst type of event, money-wise?

To answer this question, I considered crop damage and property damage to be equal in economic severity, and simply added them together to calculate the total monetary cost of these events.

Below, I plot the ten most expensive storm events (also on a logarithmic scale, for clarity). Floods (general) are by far the most expensive disasters. Unlike tornadoes, which tear a relatively narrow path of destruction through a town, floods are delocalized and have a much larger effective “radius,” which may explain their high relative cost. River floods are likewise expensive. In decreasing order of cost, the other most expensive storms are hurricane/typhoon, ice storm, storm surge/tide, hurricane, and tornado.

## create new column that highlights total cost
storm_data4$TOTALCOST <- storm_data4$CROP_CASH + storm_data4$PROP_CASH
storm_money <- arrange(storm_data4, TOTALCOST, decreasing=TRUE)

## Just look at the 10 most expensive events
storm_money_bad <- storm_money[1:10,]
## Plot the data
mp <- qplot(EVTYPE, TOTALCOST, data=storm_money_bad, geom="boxplot", 
            main = "Most Expensive Storm Types", ylab = "Total Cost (USD)", 
            xlab = "Storm Event Type", color=EVTYPE)
mp+scale_y_log10()+theme_bw()+theme(plot.title = element_text(size=18, face="bold"),
            axis.text=element_text(size=14), axis.title=element_text(size=16,face="bold"),
            axis.text.x = element_text(angle = 90, hjust = 1))

The table below compares the cost (in USD) of property and crop damage, and the total cost of the most expensive disasters.

storm_money_disp <- subset(storm_money_bad,select=c(EVTYPE, CROP_CASH, PROP_CASH, TOTALCOST))
print(storm_money_disp)
##               EVTYPE CROP_CASH PROP_CASH    TOTALCOST
## 1              FLOOD  3.25e+07  1.15e+11 115032500000
## 2        RIVER FLOOD  5.00e+09  5.00e+09  10000000000
## 3  HURRICANE/TYPHOON  1.51e+09  5.88e+09   7390000000
## 4  HURRICANE/TYPHOON  2.85e+08  5.42e+09   5705000000
## 5          ICE STORM  5.00e+09  5.00e+05   5000500000
## 6  HURRICANE/TYPHOON  9.32e+07  4.83e+09   4923200000
## 7  HURRICANE/TYPHOON  2.50e+07  4.00e+09   4025000000
## 8   STORM SURGE/TIDE  0.00e+00  4.00e+09   4000000000
## 9          HURRICANE  5.00e+08  3.00e+09   3500000000
## 10           TORNADO  0.00e+00  2.80e+09   2800000000
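Since several of the ten records share an event type, it can be useful to sum the cost per type. The sketch below does this with base R's `aggregate()` on the TOTALCOST values copied from the table above. It covers only these ten records, not the full dataset, so the totals are illustrative rather than a complete per-type ranking; the names `costs` and `by_type` are introduced for the example.

```r
# EVTYPE and TOTALCOST for the ten records, copied from the table above
costs <- data.frame(
  EVTYPE = c("FLOOD", "RIVER FLOOD", "HURRICANE/TYPHOON", "HURRICANE/TYPHOON",
             "ICE STORM", "HURRICANE/TYPHOON", "HURRICANE/TYPHOON",
             "STORM SURGE/TIDE", "HURRICANE", "TORNADO"),
  TOTALCOST = c(115032500000, 10000000000, 7390000000, 5705000000,
                5000500000, 4923200000, 4025000000, 4000000000,
                3500000000, 2800000000),
  stringsAsFactors = FALSE
)

# Sum the cost within each event type, then sort from most to least expensive
by_type <- aggregate(TOTALCOST ~ EVTYPE, data = costs, FUN = sum)
by_type <- by_type[order(by_type$TOTALCOST, decreasing = TRUE), ]
rownames(by_type) <- NULL

print(by_type)
```

Within these ten records, the four hurricane/typhoon entries together come second after the single flood record.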