This study examines the Storm Data dataset to determine the worst natural disasters with respect to both health and economic effects in the US. We determined that tornadoes caused the most injuries and dominated the list of worst health disasters, but heat-related disasters also caused a significant number of fatalities. In terms of economic impact, flooding was the worst disaster by a wide margin; hurricanes/typhoons were also very expensive. We describe the details of our analysis below.
The data file was read using read.csv after setting the appropriate working directory and loading the required libraries.
### load the required libraries
library(plyr)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.3
library(knitr)
## Warning: package 'knitr' was built under R version 3.1.3
library(qdap)
## Warning: package 'qdap' was built under R version 3.1.3
## Loading required package: qdapDictionaries
## Warning: package 'qdapDictionaries' was built under R version 3.1.3
## Loading required package: qdapRegex
## Warning: package 'qdapRegex' was built under R version 3.1.3
##
## Attaching package: 'qdapRegex'
##
## The following object is masked from 'package:ggplot2':
##
## %+%
##
## Loading required package: qdapTools
## Warning: package 'qdapTools' was built under R version 3.1.3
##
## Attaching package: 'qdapTools'
##
## The following object is masked from 'package:plyr':
##
## id
##
## Loading required package: RColorBrewer
## Warning: package 'RColorBrewer' was built under R version 3.1.3
##
## Attaching package: 'qdap'
##
## The following object is masked from 'package:base':
##
## Filter
library(reshape2)
### Set the working directory
setwd("C:/Users/Rachel/Documents/R Programming/reproddata")
### read in the data
storm_data <- read.csv("repdata-data-StormData.csv.bz2")
Then, I subset the data to include only the columns relevant for analysis. Since we are interested in only the most damaging events, I also removed the records where a storm was recorded but caused no fatalities, injuries, property damage, or crop damage.
### include only health or property-related columns
storm_data2 <- subset(storm_data, select=c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG",
"PROPDMGEXP", "CROPDMG", "CROPDMGEXP"))
### drop the entries where nothing bad happened
storm_data3 <- subset(storm_data2, FATALITIES != 0 | INJURIES != 0 | PROPDMG !=0 |
CROPDMG !=0)
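The monetary values are coded by letter: ‘k’ for thousands of dollars, ‘m’ for millions of dollars, and ‘b’ for billions of dollars. For the purpose of this analysis, I kept only the records in which both the property and the crop damage exponents use one of these three codes (in either case); records with any other exponent value in either column were removed.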
## convert factor in exponent to characters and make all lowercase
storm_data3$PROPDMGEXP <- tolower(as.character(storm_data3$PROPDMGEXP))
storm_data3$CROPDMGEXP <- tolower(as.character(storm_data3$CROPDMGEXP))
## drop the entries with irregular EXP values (only include k, m, or b)
storm_data4 <- subset(storm_data3, (PROPDMGEXP == 'k'|PROPDMGEXP =='m'|PROPDMGEXP =='b')
&(CROPDMGEXP == 'k'|CROPDMGEXP == 'm'|CROPDMGEXP == 'b'))
I then converted the exponent codes into numeric multipliers (1,000, 1,000,000, and 1,000,000,000) and multiplied each damage coefficient by its multiplier to get the monetary damage in US dollars.
## Convert exponents to real numbers
# do a multiple gsub to replace with characters
storm_data4$PROPDMGEXP <- mgsub(c('k','m', 'b'), c('1000','1000000','1000000000'),
storm_data4$PROPDMGEXP)
storm_data4$CROPDMGEXP <- mgsub(c('k','m', 'b'), c('1000','1000000','1000000000'),
storm_data4$CROPDMGEXP)
# convert characters to numeric
storm_data4$PROPDMGEXP <- as.numeric(storm_data4$PROPDMGEXP)
storm_data4$CROPDMGEXP <- as.numeric(storm_data4$CROPDMGEXP)
# Multiply property and crop damage by exponent, create new columns called CROP_CASH
# and PROP_CASH
storm_data4$PROP_CASH <- storm_data4$PROPDMG * storm_data4$PROPDMGEXP
storm_data4$CROP_CASH <- storm_data4$CROPDMG * storm_data4$CROPDMGEXP
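The exponent conversion above can also be sketched without `mgsub`, using a named lookup vector in base R. This is a minimal illustration on a hypothetical toy data frame (the values are made up; only the column names mirror the storm data), not the code used to produce the results in this report.

```r
# Hypothetical toy data; column names mirror the storm data, values are invented
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2),
  PROPDMGEXP = c("K", "m", "B"),
  stringsAsFactors = FALSE
)

# Named lookup vector: exponent letter -> numeric multiplier
exp_lookup <- c(k = 1e3, m = 1e6, b = 1e9)

# Lower-case the letter, index the lookup by name, then multiply
toy$PROP_CASH <- toy$PROPDMG * unname(exp_lookup[tolower(toy$PROPDMGEXP)])
print(toy)
```

Any exponent code that is not a name in `exp_lookup` yields `NA`, which makes irregular codes easy to spot before deciding whether to drop them.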
To determine which events are most harmful to health, I looked at the number of fatalities and injuries caused by different storm events. There are many possible ways to evaluate health severity based on these two criteria, but I chose to rank severity by the number of fatalities rather than the combined number of fatalities and injuries. For example, one tornado caused slightly fewer fatalities than the deadliest excessive heat event (44 versus 46), but drastically more injuries (800 versus 18). I considered events to be worse if they caused more fatalities, even if fewer individuals were harmed overall.
Below, I plot the 25 deadliest individual records, grouped by event type, as boxplots of both fatalities and injuries (a boxplot shows the median and quartiles rather than the mean and standard deviation). The number of injuries and fatalities are plotted on a logarithmic scale for clarity. Injuries are shown in blue, whereas fatalities are shown in red. The single deadliest record is a tornado (158 fatalities), followed by an excessive heat event (46 fatalities), and tornadoes make up the majority of the 25 worst records for health. A tsunami is similarly severe (32 fatalities), followed by heat, floods, wildfires, and hurricane-like events.
## create new column that highlights total bodily harm
storm_data4$TOTALHARM <- storm_data4$INJURIES + storm_data4$FATALITIES
## Rank health severity by fatalities first, total harm second
## because dying is worse than being injured, obviously
storm_health <- arrange(storm_data4, FATALITIES, TOTALHARM, decreasing=TRUE)
## Just look at the worst 25 events
storm_health_bad <- storm_health[1:25,]
## Reshape the data in order to plot by injuries/fatalities
storm_health_melt <- melt(storm_health_bad)
## Using EVTYPE as id variables
storm_health_melt2 <- subset(storm_health_melt,
variable %in% c('FATALITIES', 'INJURIES'))
## plot the data
## (fig.width, fig.height, and dpi are knitr chunk options, not qplot
## arguments, so they belong in the chunk header rather than in this call)
hp <- qplot(EVTYPE, value, data=storm_health_melt2, geom='boxplot',
main = "Most Harmful Storm Types",
ylab = "Total Harm", xlab = "Storm Event Type", color=variable)
hp+scale_y_log10()+theme_bw()+theme(plot.title = element_text(size=18, face="bold"),
axis.text=element_text(size=14),axis.title=element_text(size=16,face="bold"),
axis.text.x = element_text(angle = 90, hjust = 1))
## Warning: Removed 6 rows containing non-finite values (stat_boxplot).
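The warning above is consistent with the logarithmic scale: `log10(0)` is `-Inf`, so records with zero injuries cannot be drawn. As a quick check, the sketch below takes the INJURIES values of the 25 selected records (copied by hand from the table that follows) and counts how many become non-finite after a log transform.

```r
# INJURIES for the 25 selected records, copied from the table in this report
injuries <- c(1150, 18, 800, 258, 129, 12, 0, 0, 0, 150, 700, 225, 24,
              100, 37, 0, 104, 2, 200, 90, 0, 44, 30, 9, 0)

# log10(0) is -Inf, which a log-scaled axis cannot display
log_injuries <- log10(injuries)

# Count the values that would be dropped from the plot
n_dropped <- sum(!is.finite(log_injuries))
print(n_dropped)
```

The count matches the six rows reported as removed, which suggests that all of the dropped points are zero-injury records rather than missing data.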
The table below compares the number of injuries, fatalities, and total bodily harm for all of the 25 worst health disasters.
storm_health_disp <- subset(storm_health_bad,select=c(EVTYPE, FATALITIES, INJURIES,
TOTALHARM))
print(storm_health_disp)
## EVTYPE FATALITIES INJURIES TOTALHARM
## 1 TORNADO 158 1150 1308
## 2 EXCESSIVE HEAT 46 18 64
## 3 TORNADO 44 800 844
## 4 TORNADO 32 258 290
## 5 TSUNAMI 32 129 161
## 6 TORNADO 27 12 39
## 7 TORNADO 27 0 27
## 8 TORNADOES, TSTM WIND, HAIL 25 0 25
## 9 TORNADO 23 0 23
## 10 TORNADO 22 150 172
## 11 TORNADO 20 700 720
## 12 HEAT 20 225 245
## 13 FLASH FLOOD 20 24 44
## 14 TORNADO 18 100 118
## 15 TORNADO 16 37 53
## 16 HEAT 16 0 16
## 17 HURRICANE/TYPHOON 15 104 119
## 18 FLOOD 15 2 17
## 19 TORNADO 14 200 214
## 20 WILDFIRE 14 90 104
## 21 TORNADO 14 0 14
## 22 TORNADO 13 44 57
## 23 TORNADO 13 30 43
## 24 TORNADO 13 9 22
## 25 HURRICANE 13 0 13
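The ranking rule used here (fatalities first, total harm as the tiebreaker) can be illustrated with base R's `order()` on a few rows taken from the table above. This is a sketch of the sorting logic only; the data frame name `toy` is hypothetical.

```r
# Four records copied from the table above, deliberately out of order
toy <- data.frame(
  EVTYPE     = c("TORNADO", "EXCESSIVE HEAT", "TORNADO", "TORNADO"),
  FATALITIES = c(27, 46, 44, 27),
  INJURIES   = c(0, 18, 800, 12),
  stringsAsFactors = FALSE
)
toy$TOTALHARM <- toy$FATALITIES + toy$INJURIES

# Sort by fatalities (descending), breaking ties by total harm (descending)
ranked <- toy[order(-toy$FATALITIES, -toy$TOTALHARM), ]
rownames(ranked) <- NULL
print(ranked)
```

The excessive heat record ranks above the tornado with 800 injuries because it has two more fatalities, and the two tornadoes tied at 27 fatalities are separated by total harm.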
To determine which events have the greatest economic consequences, I considered crop damage and property damage to be equal in economic severity, and simply added them together to calculate the total monetary cost of each event.
Below, I plot the ten most expensive storm records (also on a logarithmic scale, for clarity). A single general flood record, at roughly 115 billion USD, is by far the most expensive disaster, more than ten times the cost of the next record. Unlike tornadoes, which tear a relatively narrow path of destruction through a town, floods are delocalized and have a much larger effective “radius,” which may explain their high relative cost. River floods are likewise expensive. In decreasing severity, the other most expensive storms are hurricane/typhoon, ice storm, storm surge/tide, hurricane, and tornado.
## create new column that highlights total cost
storm_data4$TOTALCOST <- storm_data4$CROP_CASH + storm_data4$PROP_CASH
storm_money <- arrange(storm_data4, TOTALCOST, decreasing=TRUE)
## Just look at the 10 most expensive events
storm_money_bad <- storm_money[1:10,]
## Plot the data
mp <- qplot(EVTYPE, TOTALCOST, data=storm_money_bad, geom="boxplot",
main = "Most Expensive Storm Types", ylab = "Total Cost (USD)",
xlab = "Storm Event Type", color=EVTYPE)
mp+scale_y_log10()+theme_bw()+theme(plot.title = element_text(size=18, face="bold"),
axis.text=element_text(size=14), axis.title=element_text(size=16,face="bold"),
axis.text.x = element_text(angle = 90, hjust = 1))
The table below compares the cost (in USD) of property and crop damage, and the total cost of the most expensive disasters.
storm_money_disp <- subset(storm_money_bad,select=c(EVTYPE, CROP_CASH, PROP_CASH, TOTALCOST))
print(storm_money_disp)
## EVTYPE CROP_CASH PROP_CASH TOTALCOST
## 1 FLOOD 3.25e+07 1.15e+11 115032500000
## 2 RIVER FLOOD 5.00e+09 5.00e+09 10000000000
## 3 HURRICANE/TYPHOON 1.51e+09 5.88e+09 7390000000
## 4 HURRICANE/TYPHOON 2.85e+08 5.42e+09 5705000000
## 5 ICE STORM 5.00e+09 5.00e+05 5000500000
## 6 HURRICANE/TYPHOON 9.32e+07 4.83e+09 4923200000
## 7 HURRICANE/TYPHOON 2.50e+07 4.00e+09 4025000000
## 8 STORM SURGE/TIDE 0.00e+00 4.00e+09 4000000000
## 9 HURRICANE 5.00e+08 3.00e+09 3500000000
## 10 TORNADO 0.00e+00 2.80e+09 2800000000
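As a quick arithmetic check of the table above, the sketch below recomputes the total cost for the first three rows and expresses it in billions of USD. The inputs are the rounded values printed in the table, so this is an approximation of the underlying data rather than a re-run of the analysis.

```r
# First three rows of the cost table, using the rounded printed values
costs <- data.frame(
  EVTYPE    = c("FLOOD", "RIVER FLOOD", "HURRICANE/TYPHOON"),
  CROP_CASH = c(3.25e7, 5.00e9, 1.51e9),
  PROP_CASH = c(1.15e11, 5.00e9, 5.88e9),
  stringsAsFactors = FALSE
)

# Total cost is the sum of crop and property damage, as in the analysis
costs$TOTALCOST <- costs$CROP_CASH + costs$PROP_CASH

# Express the totals in billions of USD for easier comparison
costs$TOTAL_BILLIONS <- costs$TOTALCOST / 1e9
print(costs[, c("EVTYPE", "TOTAL_BILLIONS")])
```

Expressed this way, the gap between the flood record and everything else is easier to see than in the raw dollar figures.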