Synopsis

This paper identifies the most devastating type of natural disaster in terms of human health and economic impact. It is based on statistics collected by the National Weather Service, National Oceanic and Atmospheric Administration, from 1950 to November 2011. Throughout the study, health effects were measured as the sum of fatalities and injuries, whereas economic effects were measured as the sum of property damage and crop damage. The study finds that tornadoes are the most devastating natural disasters in terms of human casualties, while floods cause the greatest economic loss.

For each event type, the data were aggregated by summing FATALITIES and INJURIES (counts) as casualties, and PROPDMG and CROPDMG (in USD) as economic loss, with PROPDMGEXP and CROPDMGEXP used as multiplier fields.

The initial data was captured from

https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2

and is described in the National Weather Service documentation:

https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf

The analysis was run in RStudio version 1.1 with R 3.5.3.

Data Acquisition:

setwd("D:/Box Sync/Data Science/Reproducible research")
df<- read.csv("repdata_data_StormData.csv")
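
The source file is distributed as a bzip2-compressed CSV. As far as base R is concerned, read.csv can read such a file without unpacking it first, because file connections opened for reading detect the compression. The sketch below illustrates this with a tiny temporary file standing in for the real download; the file contents are made up for illustration only.

```r
# Stand-in for StormData.csv.bz2: write a tiny bzip2-compressed CSV
# (illustrative values, not taken from the real data set)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES,INJURIES", "TORNADO,1,5", "FLOOD,0,2"), con)
close(con)

# read.csv opens the path through a connection that detects bzip2 compression
sample_df <- read.csv(tmp)
print(sample_df)
```

With the real data, calling download.file(url, destfile, mode = "wb") and then read.csv(destfile) would follow the same pattern, and it avoids depending on a hard-coded working directory.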

Data Processing for human casualties:

# Create a data frame with a new field (harm) that sums fatalities and injuries as casualties

df1=data.frame(EVTYPE=df$EVTYPE, harm= df$FATALITIES+df$INJURIES)

# Aggregate (harm) by Event type

m1=aggregate(harm ~ EVTYPE, data=df1, sum)
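
To make the aggregation step concrete, here is a minimal sketch on a toy data frame. The values are invented for illustration; only the column names follow the storm data.

```r
# Toy data (made-up values) using the same column names as the storm data
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(2, 1, 0, 3),
  INJURIES   = c(10, 0, 5, 1)
)

# Casualties per record, then summed within each event type
toy$harm <- toy$FATALITIES + toy$INJURIES
toy_m1 <- aggregate(harm ~ EVTYPE, data = toy, sum)
print(toy_m1)
```

The result has one row per event type, with the two TORNADO records collapsed into a single total.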

Data Processing for economic losses:

# First define the function multiplier, which returns 1,000 for "K", 1,000,000 for "M",
# and 1,000,000,000 for "B" in the PROPDMGEXP and CROPDMGEXP fields, and 0 for any other value.

multiplier <- function(x) {
  if (x == "K") {
    r <- 1000
  } else if (x == "M") {
    r <- 1000000
  } else if (x == "B") {
    r <- 1000000000
  } else {
    r <- 0
  }
  r
}
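
A quick check of how this function treats different exponent codes may be useful. The sketch below redefines the same function so it runs on its own and applies it to a few sample codes; the sample codes are chosen for illustration and are not claimed to be the exact set present in the data.

```r
# Same logic as the multiplier function above, repeated so the example is self-contained
multiplier <- function(x) {
  if (x == "K") 1000
  else if (x == "M") 1e6
  else if (x == "B") 1e9
  else 0
}

# Sample codes (illustrative): upper case, lower case, blank, and a digit
codes <- c("K", "M", "B", "k", "", "5")
result <- sapply(codes, multiplier)
print(result)
```

Only upper-case "K", "M" and "B" are scaled; every other code, including a blank or a lower-case letter, maps to 0, so records carrying such codes contribute nothing to the economic loss. Running table(df$PROPDMGEXP) on the real data would show how many records this affects.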

# Limit the analysis to the fields "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP"

d=df[, c(8,23,24,25,26,27,28)]

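# Create a data frame with a new field ecoloss, initialised to 0 and filled in below.
# Because the whole data frame d is passed under the name EVTYPE, every column gets
# an "EVTYPE." prefix (EVTYPE.EVTYPE, EVTYPE.PROPDMG, ...).

df2=data.frame(EVTYPE=d, ecoloss= 0)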

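# Fill in ecoloss (USD): PROPDMG and CROPDMG scaled by the multipliers from PROPDMGEXP and CROPDMGEXP.
# Column positions in df2: 4 = PROPDMG, 5 = PROPDMGEXP, 6 = CROPDMG, 7 = CROPDMGEXP, 8 = ecoloss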

l=dim(df2)[1]

for (i in 1:l)
{
  if (df2[i,4] !=0)   {df2[i,8]=df2[i,4]*multiplier(df2[i,5])}
  if  (df2[i,6] !=0)  {df2[i,8]=df2[i,8] + df2[i,6]*multiplier(df2[i,7])}
}
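
A row-by-row loop over the full data set can be slow. As a possible alternative, the same quantity can be computed in one vectorized step with a named lookup vector. This is a sketch on toy data with made-up values; the names lookup and to_mult are hypothetical helpers introduced here, not part of the original analysis.

```r
# Toy data (made-up values) with the same damage fields as the storm data
toy <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "FLOOD"),
  PROPDMG    = c(2.5, 10, 0),
  PROPDMGEXP = c("M", "K", ""),
  CROPDMG    = c(0, 1, 3),
  CROPDMGEXP = c("", "K", "B"),
  stringsAsFactors = FALSE
)

# Named lookup vector; codes missing from it give NA, which is mapped to 0
# to mirror the "else 0" branch of the multiplier function
lookup <- c(K = 1e3, M = 1e6, B = 1e9)
to_mult <- function(code) {
  m <- unname(lookup[as.character(code)])
  m[is.na(m)] <- 0
  m
}

# Property and crop damage in USD, added per record without a loop
toy$ecoloss <- toy$PROPDMG * to_mult(toy$PROPDMGEXP) +
  toy$CROPDMG * to_mult(toy$CROPDMGEXP)
print(toy)
```

The as.character call keeps the lookup working when the exponent columns are factors, which is the default for read.csv in R 3.5.3.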


# Aggregate (ecoloss) by Event type

m2=aggregate(ecoloss ~ EVTYPE.EVTYPE, data=df2, sum)

Data Display for Casualties

# Keep only the event types whose total casualties are above the mean across event types

m1_lim=m1[m1[,2]>mean(m1[,2]),]
barplot(m1_lim$harm, names.arg = m1_lim$EVTYPE, 
main="Most harmful natural disasters for human health", cex.names = 0.5, cex.axis = 0.5)

# Maximum effect

max_events = max(m1_lim[,2])
max_events
## [1] 96979
max_effect=m1_lim[m1_lim[,2]==max_events,1]
c("Maximum Effect")
## [1] "Maximum Effect"
max_effect
## [1] TORNADO
## 985 Levels:    HIGH SURF ADVISORY  COASTAL FLOOD ... WND
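
The lookup of the maximum can also be written with which.max, and order gives a full ranking rather than only the top entry. The sketch below uses a toy aggregate with made-up totals; toy_m1 stands in for m1.

```r
# Toy aggregate (made-up totals) standing in for m1
toy_m1 <- data.frame(
  EVTYPE = c("FLOOD", "HEAT", "TORNADO"),
  harm   = c(1, 4, 17),
  stringsAsFactors = FALSE
)

# Event type with the largest total
top_event <- toy_m1$EVTYPE[which.max(toy_m1$harm)]

# All event types ranked from most to least harmful
ranked <- toy_m1[order(toy_m1$harm, decreasing = TRUE), ]
print(top_event)
print(ranked)
```

Note that which.max returns only the first maximum, so ties would need separate handling.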

Data Display for Economic Losses

# Keep only the event types whose total economic loss is above the mean across event types

m2_lim=m2[m2[,2]>mean(m2[,2]),]
barplot(m2_lim$ecoloss, names.arg = m2_lim$EVTYPE.EVTYPE, 
main="Most harmful natural disasters for economy", cex.names = 0.5, cex.axis = 0.5)

# Maximum loss in USD

max_loss = max(m2_lim[,2])
max_loss
## [1] 150319678250
max_effect_eco=m2_lim[m2_lim[,2]==max_loss,1]
c("Maximum economic loss")
## [1] "Maximum economic loss"
max_effect_eco
## [1] FLOOD
## 985 Levels:    HIGH SURF ADVISORY  COASTAL FLOOD ... WND

Results

As shown above, tornadoes have the most harmful effect on human health (96,979 combined fatalities and injuries), while floods cause the greatest economic loss (about 150.3 billion USD).