This study identifies the most severe health and economic damage caused by natural disasters that occurred in the USA between 1950 and 2011. The analysis is based on the storm database of the U.S. National Oceanic and Atmospheric Administration (NOAA), for which complete documentation is available. The database records the time, location, type, duration, intensity, and damage of atmospheric calamities. Based on these data, the analysis focuses on which kinds of calamities are most harmful in terms of health damage, both fatalities and injuries, and economic damage as estimated by an insurance company.
Results show that, in terms of health, tornados and ice storms are the most harmful calamities, with up to about 1,700 and 1,570 injuries in a single event, respectively. Notably, heat caused up to 583 deaths in a single event, making it the deadliest calamity. In economic terms, floods are the most expensive calamities, with damage of up to about 115 billion dollars in a single event.
In conclusion, this simple analysis points out the kinds of atmospheric events that have been most dangerous in the past and that should be monitored in the future to limit both health and economic damage.
The analysis was performed with RStudio (R version 3.4.4) on an x86_64-pc-linux-gnu (64-bit) platform running Ubuntu 18.04.1 LTS. The data, downloaded and processed in January 2019, consist of the NOAA database in the form of a comma-separated-value file compressed with the bzip2 algorithm. The data are loaded into R with the read.csv() function, which automatically decompresses bzip2 files, as shown below.
fileUrl<-"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
filePath<-"repdata_data_StormData.csv.bz2"
if(!file.exists(filePath)){download.file(fileUrl,filePath,"libcurl")}
data<-read.csv(filePath,stringsAsFactors = FALSE);
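The transparent decompression described above can be checked on a small scale. The following is a minimal sketch that uses a made-up three-row table and a temporary file (both hypothetical, not part of the NOAA data): it writes a bzip2-compressed CSV through bzfile() and reads it back by passing only the path to read.csv().

```r
# Toy data frame (hypothetical values, not taken from the NOAA file)
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD", "HEAT"),
                  FATALITIES = c(1, 0, 2),
                  stringsAsFactors = FALSE)

# Write it as a bzip2-compressed CSV in a temporary location
tmpPath <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmpPath, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() receives the path only; the compression is detected when the file is opened
toyBack <- read.csv(tmpPath, stringsAsFactors = FALSE)
print(toyBack)
```

If the round trip works, toyBack contains the same three rows as toy, which is the behaviour the analysis relies on when reading the full database.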
The resulting data frame has 902297 observations and 37 variables, occupying 4.869625 × 10^8 bytes (about 487 MB). Using the library dplyr, the original data are subset to include only the variables of interest, i.e. the ones related to the location, the type, the health damages, and the economic damages. Then, the subset data are processed as follows: the variable for the type of atmospheric event (EVTYPE) is converted to a factor, a new variable (healthDamage) is created by adding the numbers of fatalities and injuries, and another new variable (economicDamage) is created by summing the dollar amounts estimated for property and crop damages. For this last step, the property and crop damages are first brought to the same unit, thousands of dollars: the exponent code is translated into a power of 10 (for example, M for million becomes 10^6), the damage value is multiplied by it, and the result is divided by 1000 because of the very large amounts present in the original database.
library(dplyr)
var<-c("STATE","EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")
dataSub<-select(data,var)
head(dataSub,20)
## STATE EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 AL TORNADO 0 15 25.0 K 0
## 2 AL TORNADO 0 0 2.5 K 0
## 3 AL TORNADO 0 2 25.0 K 0
## 4 AL TORNADO 0 2 2.5 K 0
## 5 AL TORNADO 0 2 2.5 K 0
## 6 AL TORNADO 0 6 2.5 K 0
## 7 AL TORNADO 0 1 2.5 K 0
## 8 AL TORNADO 0 0 2.5 K 0
## 9 AL TORNADO 1 14 25.0 K 0
## 10 AL TORNADO 0 0 25.0 K 0
## 11 AL TORNADO 0 3 2.5 M 0
## 12 AL TORNADO 0 3 2.5 M 0
## 13 AL TORNADO 1 26 250.0 K 0
## 14 AL TORNADO 0 12 0.0 K 0
## 15 AL TORNADO 0 6 25.0 K 0
## 16 AL TORNADO 4 50 25.0 K 0
## 17 AL TORNADO 0 2 25.0 K 0
## 18 AL TORNADO 0 0 25.0 K 0
## 19 AL TORNADO 0 0 25.0 K 0
## 20 AL TORNADO 0 0 25.0 K 0
dataSub <- mutate(dataSub,EVTYPE=factor(EVTYPE)) # factorize the event type
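dataSub <- mutate(dataSub,healthDamage=FATALITIES + INJURIES) # create a variable for health damages
# create a variable for the economic damage in thousands of $
exponentK<-function(x){
x<-sub("[Bb]","9",x); # billions
x<-sub("[Mm]","6",x); # millions
x<-sub("[Kk]","3",x); # thousands
x<-sub("[Hh]","2",x); # hundreds
x<-sub("?|+|-|","0",x); # intended to map "?", "+", "-" and empty codes to exponent 0
x<-(10^as.numeric(x))/1000 # codes that are still non-numeric become NA (see the warnings below)
x # return the multiplier in thousands of $
}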
dataSub<-mutate(dataSub,propdmgK=PROPDMG*exponentK(PROPDMGEXP),cropdmgK=CROPDMG*exponentK(CROPDMGEXP))
## Warning in exponentK(PROPDMGEXP): NAs introduced by coercion
## Warning in exponentK(CROPDMGEXP): NAs introduced by coercion
dataSub <- mutate(dataSub,economicDamage=propdmgK + cropdmgK)
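The coercion warnings above indicate that some exponent codes are not converted to numbers. As an illustration only, the conversion could also be written with an explicit lookup instead of regular expressions. The sketch below is not the code that produced the results in this report; the function name exponentLookupK is hypothetical, and it assumes that the codes "", "?", "+" and "-" mean "no exponent" (10^0) and that single digits are literal exponents.

```r
# Map a damage-exponent code to a multiplier expressed in thousands of dollars.
# Assumptions: "", "?", "+" and "-" mean exponent 0; single digits are literal exponents.
exponentLookupK <- function(code) {
  code <- toupper(code)
  lettersToExp <- c(H = 2, K = 3, M = 6, B = 9)
  expo <- rep(0, length(code))                 # default exponent for "", "?", "+", "-"
  isLetter <- code %in% names(lettersToExp)
  expo[isLetter] <- lettersToExp[code[isLetter]]
  isDigit <- grepl("^[0-9]$", code)
  expo[isDigit] <- as.numeric(code[isDigit])
  (10^expo) / 1000                             # multiplier in thousands of dollars
}

codes <- c("K", "m", "B", "h", "", "?", "+", "-", "5")
print(data.frame(code = codes, multiplierK = exponentLookupK(codes)))
```

With this mapping, a damage of 2.5 with code "M" corresponds to 2.5 * 1000 = 2500 thousands of dollars, and no code produces an NA. Whether the symbol codes should really count as exponent 0 is a modelling choice that the database documentation would have to confirm.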
Next, two data frames are built, one for the health damages and one for the economic damages. First, the subset data frame is grouped by event type. Then, using the function summarise(), the largest health damage recorded for a single event is selected for each event type, together with the numbers of fatalities and injuries of that event. For plotting purposes, i.e. to distinguish between fatalities and injuries, an auxiliary data frame in long format is built with the function melt() of the library reshape2, which stacks the fatality and injury counts into a single column.
dataEvtype<-dplyr::group_by(dataSub,EVTYPE)
dataResults1<-summarise(dataEvtype,health=max(healthDamage),fatality=FATALITIES[which.max(healthDamage)],injury=INJURIES[which.max(healthDamage)]);
dataResults1<-arrange(dataResults1,desc(health));head(dataResults1,10)
## # A tibble: 10 x 4
## EVTYPE health fatality injury
## <fct> <dbl> <dbl> <dbl>
## 1 TORNADO 1742 42 1700
## 2 ICE STORM 1569 1 1568
## 3 FLOOD 802 2 800
## 4 HURRICANE/TYPHOON 787 7 780
## 5 HEAT 583 583 0
## 6 EXCESSIVE HEAT 521 2 519
## 7 BLIZZARD 390 5 385
## 8 HEAT WAVE 202 2 200
## 9 TROPICAL STORM 201 1 200
## 10 HEAVY SNOW 185 0 185
library(reshape2)
dfHealth<-melt(dataResults1,id.vars="EVTYPE",value.name="count",variable.name="damage",measure.vars=c("fatality","injury"));
head(dfHealth)
## EVTYPE damage count
## 1 TORNADO fatality 42
## 2 ICE STORM fatality 1
## 3 FLOOD fatality 2
## 4 HURRICANE/TYPHOON fatality 7
## 5 HEAT fatality 583
## 6 EXCESSIVE HEAT fatality 2
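The two steps above, selecting the worst single event per type and reshaping to long format, can be followed on a toy table. The sketch below uses base R only and made-up numbers (the data frame toy and its values are hypothetical); it mirrors the logic of summarise() with which.max() and of melt(), not their implementation.

```r
# Four hypothetical event records for two event types
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "HEAT"),
  FATALITIES = c(1, 4, 10, 2),
  INJURIES   = c(26, 50, 0, 3),
  stringsAsFactors = FALSE
)
toy$healthDamage <- toy$FATALITIES + toy$INJURIES

# For each event type, keep the record of the single worst event
worst <- do.call(rbind, lapply(split(toy, toy$EVTYPE), function(d) {
  d[which.max(d$healthDamage), ]
}))

# Wide to long: one row per (event type, kind of damage)
long <- data.frame(
  EVTYPE = rep(worst$EVTYPE, times = 2),
  damage = rep(c("fatality", "injury"), each = nrow(worst)),
  count  = c(worst$FATALITIES, worst$INJURIES),
  stringsAsFactors = FALSE
)
print(long)
```

Note that the fatality and injury counts reported for each type belong to the same record, the one with the largest combined count, so they are not the separate maxima of each column.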
For the economic damages, summarise() is used to build a data frame containing, for each event type, the largest economic damage recorded for a single event. In this case, the distinction between property and crop damages is not considered significant, hence it is not investigated further.
dataResults2<-summarise(dataEvtype,economic=max(economicDamage,na.rm=TRUE));
dataResults2<-arrange(dataResults2,desc(economic));head(dataResults2,10)
## # A tibble: 10 x 2
## EVTYPE economic
## <fct> <dbl>
## 1 FLOOD 115032500
## 2 STORM SURGE 31300000
## 3 HURRICANE/TYPHOON 16930000
## 4 RIVER FLOOD 10000000
## 5 TROPICAL STORM 5150000
## 6 ICE STORM 5000500
## 7 WINTER STORM 5000000
## 8 STORM SURGE/TIDE 4000000
## 9 HURRICANE 3500000
## 10 TORNADO 2800000
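One caveat of max(..., na.rm = TRUE) is worth keeping in mind: if every value in a group is missing, base R returns -Inf. The sketch below uses hypothetical values to show this, and how such a value behaves when log10() is applied and the result is filtered with subset(). This is a plausible explanation, not a verified one, for the NaN warning reported further down when the economic results are filtered on a log scale.

```r
# A group in which every economic damage is missing (hypothetical)
allMissing <- c(NA_real_, NA_real_)
m <- suppressWarnings(max(allMissing, na.rm = TRUE))
print(m)

# The logarithm of a negative value, including -Inf, is NaN
l <- suppressWarnings(log10(m))
print(l)

# subset() drops rows whose condition evaluates to NA
toy <- data.frame(EVTYPE = c("A", "B"), economic = c(m, 1e5),
                  stringsAsFactors = FALSE)
kept <- suppressWarnings(subset(toy, log10(economic) >= 4.5))
print(kept)
```

Under this assumption, event types without any usable damage value are simply left out of the filtered results, so the ranking of the remaining types is not affected.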
Results are summarized in two figures created with the library ggplot2, one for the health damages and one for the economic damages. Note that the y-axis shows the log10 of the variable of interest because of the large range involved. Also, since there are 985 event types, only subsets of the results are shown in the following figures. In particular, for the health damages, only the event types whose fatality or injury count is at least 10^1.5 (about 32) are shown (~40 types). For the economic damages, only the event types with at least 10^4.5 (about 31,600) thousands of dollars of damage, i.e. about 31.6 million dollars, are shown (~60 types).
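The two thresholds quoted above follow directly from the log10 cut-offs used in the plotting code. A short sketch of the arithmetic, with a hypothetical vector of counts to show which values pass the filter:

```r
healthCut   <- 10^1.5   # threshold on the number of victims
economicCut <- 10^4.5   # threshold in thousands of dollars

cat(sprintf("health threshold: %.1f victims\n", healthCut))
cat(sprintf("economic threshold: %.0f thousand dollars (about %.1f million dollars)\n",
            economicCut, economicCut * 1e3 / 1e6))

# Hypothetical counts: which ones satisfy log10(count) >= 1.5?
counts <- c(0, 31, 32, 583)
keep <- counts[log10(counts) >= 1.5]
print(keep)
```

Since 10^1.5 is slightly above 31, a count of exactly 31 is excluded and 32 is the smallest whole number that passes. A count of 0 gives log10(0) = -Inf and is excluded as well.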
library(ggplot2)
g1<-ggplot(subset(dfHealth,log10(count)>=1.5))
p1<-g1+geom_bar(aes(EVTYPE,log10(count),fill=damage),stat="identity",position = "dodge") +
scale_fill_manual(values=c("coral2","cornflowerblue")) +
theme(axis.text.x = element_text(angle = 90,hjust = 1)) +
geom_vline(xintercept=29,color="black",size=1) +
xlab("event type") +
labs(title = "Victims of natural disaster events in USA") +
theme(plot.title = element_text(hjust = 0.5))
print(p1)
With regard to the health damages, the figure above shows that the most dangerous type of atmospheric event is the tornado, marked by a black line, with up to 1700 injuries and 42 fatalities in a single event. Slightly less dangerous are ice storms, with up to 1568 injuries and a single fatality. Heat, on the other hand, caused up to 583 deaths, making it the deadliest type of event. Other types of events that caused more than 300 injuries are hurricanes/typhoons, floods, excessive heat, and blizzards.
g2<-ggplot(subset(dataResults2,log10(economic)>=4.5))
## Warning in eval(e, x, parent.frame()): NaNs produced
p2<-g2+geom_point(aes(EVTYPE,log10(economic)),size=3,color="blue4")+
geom_vline(xintercept=15,color="black",size=1) +
theme(axis.text.x = element_text(angle = 90,hjust = 1)) +
xlab("event type") + ylab("log10 of thousands of $") +
labs(title = "Economic damages of natural disaster events in USA")+
theme(plot.title = element_text(hjust = 0.5))
print(p2)
With regard to the economic damages, the figure above shows that the most expensive type of atmospheric event is the flood, marked by a black line, with up to 115 million thousands of dollars (about 115 billion dollars) of damage. Other significantly expensive atmospheric events are storm surges, hurricanes/typhoons, and river floods, with up to about 31, 17, and 10 million thousands of dollars of damage, respectively.
In conclusion, the atmospheric event with the worst combined impact on health and the economy is the flood, although tornados can cause more than twice as many injuries and more than 2 million thousands of dollars (over 2 billion dollars) of damage. Beyond ranking the worst atmospheric event, it is worth noting that many types of events can cause substantial health and economic damage. For this reason, it is important to monitor these types of events and, where possible, to plan interventions that can prevent, at least partially, the disastrous consequences of these natural calamities.
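Because "millions of thousands of dollars" is an awkward unit, it may help to convert the figures to billions of dollars. The sketch below copies the four largest maxima from the table of economic damages and rescales them. The vector names are hypothetical.

```r
# Maxima in thousands of dollars, copied from the table of economic damages
economicK <- c(FLOOD = 115032500, `STORM SURGE` = 31300000,
               `HURRICANE/TYPHOON` = 16930000, `RIVER FLOOD` = 10000000)

# thousands of dollars -> dollars -> billions of dollars
economicBillions <- economicK * 1e3 / 1e9
print(round(economicBillions, 1))
```

One million thousands of dollars equals one billion dollars, so the values read as roughly 115, 31, 17, and 10 billion dollars.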