The purpose of this paper is to identify the natural disasters that are most devastating in terms of human health and of the economy. It is based on statistics collected by the National Weather Service of the National Oceanic and Atmospheric Administration (NOAA) from 1950 to November 2011. Throughout the study, health effects were measured as the sum of fatalities and injuries, whereas economic effects were measured as the sum of property damage and crop damage. The study finds that tornadoes are the most devastating natural disasters in terms of human casualties, while floods cause the largest economic loss.
For each event type (EVTYPE), the data were aggregated by summing FATALITIES and INJURIES (counts) as casualties, and PROPDMG and CROPDMG (in USD) as economic loss. The PROPDMGEXP and CROPDMGEXP fields were used as multipliers ("K" = thousand, "M" = million, "B" = billion; any other code is counted as 0).
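In symbols, for an event type $e$ whose records are indexed by $i$, the two totals computed in the code are:

$$
\mathrm{harm}_e = \sum_{i \in e} \left( \mathrm{FATALITIES}_i + \mathrm{INJURIES}_i \right)
$$

$$
\mathrm{ecoloss}_e = \sum_{i \in e} \left( \mathrm{PROPDMG}_i \, m(\mathrm{PROPDMGEXP}_i) + \mathrm{CROPDMG}_i \, m(\mathrm{CROPDMGEXP}_i) \right)
$$

$$
m(\mathrm{K}) = 10^3, \quad m(\mathrm{M}) = 10^6, \quad m(\mathrm{B}) = 10^9
$$

Here $m(\cdot)$ is the exponent multiplier, and $m = 0$ for any other code.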
The raw data were downloaded from
https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2
and are described in the National Weather Service Storm Data documentation:
https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf
The analysis was performed in RStudio 1.1 with R 3.5.3.
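The code that follows reads a CSV that has already been downloaded and decompressed into a fixed local folder. For a more self-contained setup, the sketch below uses a hypothetical helper, read_storm_data(), that is not part of the original analysis. The helper downloads the original .bz2 file only if it is missing and then reads it directly. This works because R's file connections detect bzip2 compression when reading, so no manual decompression step is needed. Passing stringsAsFactors = TRUE keeps EVTYPE a factor, as it was by default under R 3.5.3. R 4.0.0 changed that default to FALSE.

```r
# Hypothetical helper (not part of the original analysis): fetch the raw
# bzip2-compressed file once, then read it directly with read.csv().
read_storm_data <- function(
    path = "repdata_data_StormData.csv.bz2",
    url = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2") {
  if (!file.exists(path)) {
    # mode = "wb" keeps the binary archive intact on Windows
    download.file(url, destfile = path, mode = "wb")
  }
  # read.csv() opens the path with file(), which detects bzip2 compression
  read.csv(path, stringsAsFactors = TRUE)
}

# Usage:
# df <- read_storm_data()
```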
setwd("D:/Box Sync/Data Science/Reproducible research")
df<- read.csv("repdata_data_StormData.csv")
# Create a data frame with a new field, harm, holding the casualties (fatalities + injuries) of each record
df1=data.frame(EVTYPE=df$EVTYPE, harm= df$FATALITIES+df$INJURIES)
# Aggregate (harm) by Event type
m1=aggregate(harm ~ EVTYPE, data=df1, sum)
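Summing fatalities and injuries into a single harm figure hides how much each one contributes. A possible extension, not part of the original study, keeps both counts side by side by aggregating several columns at once with cbind() on the left-hand side of the formula. The small data frame below contains made-up illustration data.

```r
# Made-up example rows standing in for the storm data
toy <- data.frame(
  EVTYPE = c("TORNADO", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(5, 1, 2, 3),
  INJURIES = c(40, 10, 0, 1)
)

# Aggregate fatalities and injuries separately per event type
by_type <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, sum)
# Same combined measure as `harm` in the analysis
by_type$harm <- by_type$FATALITIES + by_type$INJURIES
by_type
```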
# Define multiplier(): it returns 1,000 for "K", 1,000,000 for "M" and
# 1,000,000,000 for "B" in the PROPDMGEXP and CROPDMGEXP fields.
# Any other code (lowercase letters, digits, symbols or a blank) returns 0.
multiplier = function(x) {
  if (x == "K") 1e3
  else if (x == "M") 1e6
  else if (x == "B") 1e9
  else 0
}
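multiplier() handles one value at a time, so it has to be called once per row. The sketch below is a vectorized version with the same behaviour. It uses the hypothetical name multiplier_vec(), introduced here for illustration, and looks each exponent code up in a named vector. Codes that are not exactly "K", "M" or "B" get 0, just as in the original function.

```r
# Hypothetical vectorized counterpart of multiplier(): same mapping,
# applied to a whole column at once.
multiplier_vec <- function(x) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  # Indexing by name gives NA for unknown codes and for "" (which never matches a name)
  r <- unname(lookup[as.character(x)])
  r[is.na(r)] <- 0
  r
}

multiplier_vec(c("K", "M", "B", "", "k", "5"))
# returns c(1000, 1e6, 1e9, 0, 0, 0)
```

With it, the property loss of every row can be computed in one expression, for example `df$PROPDMG * multiplier_vec(df$PROPDMGEXP)`.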
# Limit the analysis to the "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP",
# "CROPDMG" and "CROPDMGEXP" fields (columns 8 and 23 to 28 of the raw file)
d = df[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
# Economic loss (ecoloss) in USD: PROPDMG and CROPDMG scaled by the multipliers of
# PROPDMGEXP and CROPDMGEXP. multiplier() takes one value, so it is applied
# element-wise with sapply(); a row-by-row loop over a data frame this size is very slow.
prop_mult = sapply(as.character(d$PROPDMGEXP), multiplier, USE.NAMES = FALSE)
crop_mult = sapply(as.character(d$CROPDMGEXP), multiplier, USE.NAMES = FALSE)
d$ecoloss = d$PROPDMG * prop_mult + d$CROPDMG * crop_mult
# Aggregate (ecoloss) by event type
m2 = aggregate(ecoloss ~ EVTYPE, data = d, sum)
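The sketch below applies the economic-loss formula to three made-up rows to make it concrete. The mapping is the same K/M/B-to-USD rule as multiplier(), inlined here as the hypothetical to_usd() so the example runs on its own. The first row, with 25 "K" of property damage and 2 "M" of crop damage, contributes 25 × 1,000 + 2 × 1,000,000 = 2,025,000 USD. The lowercase "k" in the last row is counted as 0, exactly as multiplier() would count it.

```r
# Made-up rows; the real data has the same column names
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 1.5, 10),
  PROPDMGEXP = c("K", "B", "K"),
  CROPDMG    = c(2, 0, 5),
  CROPDMGEXP = c("M", "", "k"),
  stringsAsFactors = FALSE
)

# Hypothetical helper with the same mapping as multiplier():
# only exact "K", "M", "B" count; anything else is 0
to_usd <- function(exp_code) {
  ifelse(exp_code == "K", 1e3,
         ifelse(exp_code == "M", 1e6,
                ifelse(exp_code == "B", 1e9, 0)))
}

# Row-level loss, then the per-event-type total
toy$ecoloss <- toy$PROPDMG * to_usd(toy$PROPDMGEXP) +
               toy$CROPDMG * to_usd(toy$CROPDMGEXP)
aggregate(ecoloss ~ EVTYPE, data = toy, sum)
```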
# Keep only the event types whose total harm is higher than the mean across event types
m1_lim=m1[m1[,2]>mean(m1[,2]),]
barplot(m1_lim$harm, names.arg = m1_lim$EVTYPE,
main="Most harmful natural disasters for human health", cex.names = 0.5, cex.axis = 0.5)
# Maximum effect
max_events = max(m1_lim[,2])
max_events
## [1] 96979
max_effect=m1_lim[m1_lim[,2]==max_events,1]
c("Maximum Effect")
## [1] "Maximum Effect"
max_effect
## [1] TORNADO
## 985 Levels: HIGH SURF ADVISORY COASTAL FLOOD ... WND
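The above-the-mean filter depends on how many small event types exist (985 distinct EVTYPE values here), so the number of bars it keeps can vary. A common alternative is to sort the totals and keep a fixed number of top event types. It is offered here as a suggestion, not the method used above. top_n_events() is a hypothetical helper, and the totals are made up.

```r
# Hypothetical helper: rows with the n largest values of value_col, in decreasing order
top_n_events <- function(totals, value_col, n = 10) {
  ordered <- totals[order(totals[[value_col]], decreasing = TRUE), ]
  head(ordered, n)
}

# Made-up aggregated totals in the same shape as m1
totals <- data.frame(EVTYPE = c("FLOOD", "TORNADO", "HEAT", "HAIL"),
                     harm = c(120, 900, 300, 15))
top3 <- top_n_events(totals, "harm", n = 3)
top3
```

The result can be passed to barplot() in the same way as m1_lim.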
# Keep only the event types whose total economic loss is higher than the mean across event types
m2_lim=m2[m2[,2]>mean(m2[,2]),]
barplot(m2_lim$ecoloss, names.arg = m2_lim[, 1],
main="Most harmful natural disasters for the economy", cex.names = 0.5, cex.axis = 0.5)
# Maximum loss in USD
max_loss = max(m2_lim[,2])
max_loss
## [1] 150319678250
max_effect_eco=m2_lim[m2_lim[,2]==max_loss,1]
c("Maximum economic loss")
## [1] "Maximum economic loss"
max_effect_eco
## [1] FLOOD
## 985 Levels: HIGH SURF ADVISORY COASTAL FLOOD ... WND
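The raw totals printed above are easier to read with thousands separators or in billions. The following small base-R sketch uses the two values reported in this study, under new variable names so that the analysis variables are left untouched.

```r
# Totals printed by the analysis above
tornado_harm <- 96979          # fatalities + injuries for TORNADO
flood_loss   <- 150319678250   # economic loss in USD for FLOOD

format(tornado_harm, big.mark = ",")
# "96,979"
sprintf("%.1f billion USD", flood_loss / 1e9)
# "150.3 billion USD"
```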
As the results show, tornadoes have the most harmful effect on human health (96,979 fatalities and injuries combined), while floods cause the greatest economic loss (about 150.3 billion USD).