In this report we aim to identify the natural disasters that were the most harmful to both human health and the economy in the US between 1950 and 2012.
Our overall hypothesis is that tornadoes are by far the most destructive in total, because they recur every season in the US and cannot be prevented. However, the average tornado may be less damaging than other, rarer types of natural disaster.
The data comes from the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database and can be downloaded as a bzip2-compressed CSV file from the URL used in the code below.
The data is downloaded and read into R with this code:
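## download the archive only if it is not already present; read.csv() reads the bzip2 file directly
if (!file.exists("data.bz2")) {
  download.file(url = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "data.bz2", mode = "wb")
}
df <- read.csv("data.bz2")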
Total number of observations:
nrow(df)
## [1] 902297
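Next, we convert the date variable (the second column, BGN_DATE) to a proper Date type. We also add two variables that contain the total property damage and the total crop damage, each obtained by multiplying the recorded damage value by the multiplier that its exponent code stands for.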
df[,2]<-as.Date(df[,2],"%m/%d/%Y")
## f() converts the exponent codes "K" (thousand), "M" (million) and "B" (billion) to their numeric multipliers; any other code is treated as 1
f <- function(a) {
  a <- as.character(a)
  for (i in seq_along(a)) {
    if (a[i] == "K")
      a[i] <- 1000
    else if (a[i] == "M")
      a[i] <- 1000000
    else if (a[i] == "B")
      a[i] <- 1000000000
    else
      a[i] <- 1
  }
  as.numeric(a)
}
df$totalpropdmg<-df$PROPDMG * f(df$PROPDMGEXP)
df$totalcropdmg<-df$CROPDMG * f(df$CROPDMGEXP)
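The exponent conversion can also be written without an explicit loop. The following is a minimal sketch, not the code used for this report: `exp_multiplier` is a hypothetical helper name, and it assumes the same mapping as f() (only upper-case "K", "M" and "B" are recognised, everything else counts as 1).

```r
# Hypothetical vectorised alternative to f(); not part of the original analysis.
exp_multiplier <- function(code) {
  # Named lookup table: exponent code -> numeric multiplier
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  # Indexing by name returns NA for any code that is not in the table
  mult <- lookup[as.character(code)]
  # Assumption carried over from f(): unknown or empty codes mean "no scaling"
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Small made-up input, including an empty and an unrecognised code
exp_multiplier(c("K", "M", "B", "", "?"))
```

Because the lookup works on the whole vector at once, it avoids looping over all rows of the data set one element at a time.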
We also load some packages that we will need later.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(reshape2)
We are going to look at the impact of these disasters on both population health and the economy.
Human casualties can be categorized into fatalities and injuries, so we visualize both the sum and the mean of those two variables for each event type (EVTYPE).
By calculating the sum of casualties for each event type, we can find the total harm caused by each type of disaster between 1950 and 2012.
df2<-group_by(df,EVTYPE) ## group the data by event type
## TOTAL INJURIES / FATALITIES
x<-summarize(df2, fatalities=sum(FATALITIES),injuries=sum(INJURIES)) ## one row per event type with its total fatalities and injuries
y<-x ## keep a copy of the unsorted totals; we need it later
x<-arrange(x,-(injuries+fatalities)) ## sort by the combined injuries and fatalities, largest first
x<-x[1:10,] ## keep only the 10 event types with the most injuries+fatalities so the plot stays readable
x<-melt(x,id.vars="EVTYPE") ## reshape to long format: one row per event type and casualty kind
## draw the barplot
g<-ggplot(data=as.data.frame(x),aes(x=EVTYPE,y=(value), fill=variable)) +
geom_bar(stat="identity" , aes(fill=variable)) +
scale_x_discrete(limits=rev(x[1:10,][[1]]))+
coord_flip() +
scale_fill_brewer(palette="Set1")+
labs(title="Total human casualties per event type (1950-2012)", x="Event type" , y="Casualties")
g
NOTE: Tornadoes have caused more human casualties than any other type of disaster.
In fact, tornadoes alone have caused almost twice as many injuries as all other disasters combined:
sprintf("Total tornado injuries : %g",sum(y[y$EVTYPE=="TORNADO",]$injuries))
## [1] "Total tornado injuries : 91346"
sprintf("Total non-tornado injuries : %g",sum(y[y$EVTYPE!="TORNADO",]$injuries))
## [1] "Total non-tornado injuries : 49182"
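The two sprintf() calls above sum only the injuries column. To compare combined casualties (injuries plus fatalities), the same split can be applied to a combined column. The sketch below illustrates this on a small invented table; `toy` and its numbers are made up for illustration and are not taken from the NOAA data, although the column names mirror those of the summarised table `y`.

```r
# Invented per-event totals, shaped like the summarised table `y`
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HEAT"),
  fatalities = c(5, 2, 3),
  injuries   = c(90, 10, 40)
)

# Combined casualties = fatalities + injuries
toy$casualties <- toy$fatalities + toy$injuries

# Split the totals into tornado and non-tornado rows
is_tornado    <- toy$EVTYPE == "TORNADO"
tornado_total <- sum(toy$casualties[is_tornado])
other_total   <- sum(toy$casualties[!is_tornado])

sprintf("Tornado casualties: %g, non-tornado casualties: %g",
        tornado_total, other_total)
```

Applied to `y`, the same three steps would give the combined figures for the real data.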
The sum of the casualties does not tell us which single occurrence of a disaster is likely to be the most damaging or deadliest, since tornadoes are recorded far more often than many other types of natural disaster in the US. We therefore draw the same barplot as above, but with the mean number of casualties per event instead of the sum.
### MEAN INJURIES / FATALITIES
x<-summarize(df2, fatalities=mean(FATALITIES),injuries=mean(INJURIES))
y<-x
x<-arrange(x,-(injuries+fatalities))
x<-x[1:10,]
x<-melt(x,id.vars="EVTYPE")
h<-ggplot(data=as.data.frame(x),aes(x=EVTYPE,y=(value), fill=variable)) +
geom_bar(stat="identity" , aes(fill=variable)) +
scale_x_discrete(limits=rev(x[1:10,][[1]]))+
coord_flip() +
scale_fill_brewer(palette="Set1")+
labs(title="Mean of the casualties per event type (1950-2012)", x="Event type" , y="Casualties")
h
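To make the difference between the two rankings concrete, the sketch below computes both the total and the mean on a small invented data set. The `toy` data frame and its values are assumptions for illustration only; the column names follow the storm data. The extra `n_events` column is not part of the analysis above; it is included here because a mean based on very few recorded events is less reliable than one based on many.

```r
library(dplyr)

# Invented records: many moderate tornado events, few severe heat events
toy <- data.frame(
  EVTYPE     = c(rep("TORNADO", 6), rep("HEAT", 2)),
  FATALITIES = c(0, 1, 0, 3, 2, 5, 10, 6),
  INJURIES   = c(2, 0, 5, 9, 6, 15, 20, 4)
)

per_event <- toy %>%
  group_by(EVTYPE) %>%
  summarise(
    n_events         = n(),                          # number of recorded events
    total_casualties = sum(FATALITIES + INJURIES),   # overall harm
    mean_casualties  = mean(FATALITIES + INJURIES)   # harm per single event
  ) %>%
  arrange(desc(mean_casualties))

per_event
```

In this toy example TORNADO has the larger total (48 against 40) while HEAT has the larger mean (20 against 8), which is the pattern that the two barplots are meant to distinguish.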
NOTE: If a specific event can be predicted or foreseen, the second barplot (mean casualties per event) is the better guide to how harmful it is likely to be.
Economic losses can be calculated by summing the estimated property damage and crop damage.
x<-summarise(df2,dmg=sum(totalpropdmg+totalcropdmg))
x<-arrange(x,-dmg)
x<-x[1:10,]
g<-ggplot(x,aes(x=EVTYPE , y=dmg , fill=dmg)) + geom_bar(stat="identity")+
coord_flip()+
scale_x_discrete(limits=rev(x[1:10,][[1]]))+
labs(title="Total economic damage per disaster type (1950-2012)", x="Event type" , y="Economic damage (USD)")
g