This analysis explores the impact of severe weather events in the U.S. from 1950 to November 2011. The total health impact (fatalities plus injuries) and the economic damage (property plus crop losses) are summed by event type and shown as bar charts. The results show that tornadoes are the most harmful event type for population health, while floods cause the greatest economic damage. Finally, tornado locations are plotted on a map to identify the areas most prone to tornadoes.
url<-'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
if (!file.exists('data.csv.bz2')) download.file(url,'data.csv.bz2',method='curl')
library(readr)
data <- read_csv('data.csv.bz2', col_types = 'dcccdcccdccccdcdccdddddddcdccccddddcd', progress = FALSE)
# Return df with its rows sorted by `by` in decreasing order
sortDF <- function(df, by) {
  df[order(by, decreasing = TRUE), ]
}
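# Convert damage-exponent codes (PROPDMGEXP / CROPDMGEXP) to powers of ten:
# h = hundred, k = thousand, m = million, b = billion; digit codes are kept as-is.
# Blank, "-", "?", "+" and missing (NA) codes are treated as 0 (multiplier 1).
transformEXP <- function(var) {
  x <- tolower(as.character(var))
  x[x == 'b'] <- 9
  x[x == 'h'] <- 2
  x[x == 'k'] <- 3
  x[x == 'm'] <- 6
  # read_csv() reads empty fields as NA, so NA must be mapped to 0 as well
  x[is.na(x) | x %in% c("", "-", "?", "+")] <- 0
  as.integer(x)
}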
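The same decoding can be written with a named lookup vector. The sketch below is a hypothetical alternative (`decodeExp` is not part of the original analysis); unlike `transformEXP()`, it maps any unrecognised code to 0 instead of returning NA.

```r
# Hypothetical alternative to transformEXP(): look letter codes up in a named
# vector, keep digit codes as-is, and send everything else (including NA) to 0.
decodeExp <- function(code) {
  code <- tolower(as.character(code))
  lookup <- c(h = 2L, k = 3L, m = 6L, b = 9L)
  power <- suppressWarnings(as.integer(code))  # "0".."8" parse; letters become NA
  isLetter <- code %in% names(lookup)          # NA %in% ... is FALSE
  power[isLetter] <- lookup[code[isLetter]]
  power[is.na(power)] <- 0L                    # "", "-", "?", "+", NA, anything else
  power
}

codes <- c("K", "m", "B", "h", "5", "", "?", "+", NA)
decodeExp(codes)
# [1] 3 6 9 2 5 0 0 0 0
```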
data$healthImpact<-data$FATALITIES+data$INJURIES
healthImpactByType<-aggregate(healthImpact~EVTYPE,data=data,sum)
healthImpactByType<-sortDF(healthImpactByType,healthImpactByType$healthImpact)
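The aggregate-then-sort step can be checked on a handful of rows. The data frame `toy` below is invented for illustration; only its column names mirror the storm data.

```r
# Invented rows (not from the storm dataset) with the same column names
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(5, 1, 2, 0, 0),
  INJURIES   = c(20, 3, 4, 1, 2)
)
toy$healthImpact <- toy$FATALITIES + toy$INJURIES

# Sum the health impact within each event type ...
byType <- aggregate(healthImpact ~ EVTYPE, data = toy, sum)
# ... then sort in decreasing order, as sortDF() does
byType <- byType[order(byType$healthImpact, decreasing = TRUE), ]
byType
```

Here TORNADO comes first with 31 (25 + 6), followed by FLOOD with 6 and HAIL with 1.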
data$PROPDMGEXP <- transformEXP(data$PROPDMGEXP)
data$CROPDMGEXP <- transformEXP(data$CROPDMGEXP)
# Total economic loss in USD: property damage plus crop damage
data$PropLoss <- with(data, PROPDMG * 10^PROPDMGEXP + CROPDMG * 10^CROPDMGEXP)
PropLossByType<-aggregate(PropLoss~EVTYPE,data=data,sum)
PropLossByType<-sortDF(PropLossByType,PropLossByType$PropLoss)
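Missing exponents matter because `read_csv()` turns empty fields into NA by default, and the formula interface of `aggregate()` drops rows containing NA (`na.action = na.omit`). The invented rows below show a flood whose property loss is known but whose crop exponent is missing: unless NA is mapped to 0, the whole row silently disappears from the total.

```r
# Invented rows: the second has a valid property loss but a missing crop exponent
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD"),
  PROPDMG    = c(1, 2),
  PROPDMGEXP = c(6, 3),
  CROPDMG    = c(0, 0),
  CROPDMGEXP = c(0, NA)
)
lossOf <- function(d) with(d, PROPDMG * 10^PROPDMGEXP + CROPDMG * 10^CROPDMGEXP)

toy$loss <- lossOf(toy)                                   # row 2 is NA: 0 * 10^NA is NA
dropped <- aggregate(loss ~ EVTYPE, data = toy, sum)$loss # NA row omitted: 1e6

toy$CROPDMGEXP[is.na(toy$CROPDMGEXP)] <- 0                # treat missing exponent as 0
toy$loss <- lossOf(toy)
kept <- aggregate(loss ~ EVTYPE, data = toy, sum)$loss    # both rows counted: 1002000

c(dropped = dropped, kept = kept)
```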
Tornadoes are the most harmful event type in the U.S. with respect to population health (fatalities plus injuries).
Floods cause the greatest economic damage in the U.S. (property plus crop losses).
library(ggplot2)
# Top 10 event types; reorder() sorts the bars by value instead of alphabetically
ggplot(healthImpactByType[1:10, ], aes(reorder(EVTYPE, healthImpact), healthImpact)) +
  geom_bar(stat = 'identity', fill = 'red') + theme_bw() + coord_flip() +
  labs(title = 'Disaster Impact on Health', x = 'Type of Disaster', y = 'Fatalities + Injuries')
ggplot(PropLossByType[1:10, ], aes(reorder(EVTYPE, PropLoss), PropLoss)) +
  geom_bar(stat = 'identity', fill = 'blue') + theme_bw() + coord_flip() +
  labs(title = 'Disaster Impact on Economic Loss', x = 'Type of Disaster', y = 'Property + Crop Loss (USD)')
library(ggmap)
Tornado<-data[data$EVTYPE=='TORNADO',]
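# Coordinates appear to be stored as integers such as 3040 (degrees, then minutes);
# dividing by 100 approximates decimal degrees, and longitudes are negated (west)
Tornado$LONGITUDE <- -Tornado$LONGITUDE / 100
Tornado$LATITUDE <- Tornado$LATITUDE / 100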
# Recent ggmap versions require a Google Maps API key, registered with register_google()
map <- get_map('United States', zoom = 4)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=United+States&zoom=4&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
Tornado$size<-round(Tornado$healthImpact / max(Tornado$healthImpact) *10 ,2)
ggmap(map)+geom_jitter(aes(LONGITUDE,LATITUDE,size=size),data=Tornado,alpha=0.05,colour='red')
## Warning: Removed 1081 rows containing missing values (geom_point).
Tornadoes occur mostly in the central United States and along the southeast coast.
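Dividing the raw coordinates by 100 treats minutes as hundredths of a degree, which can shift points by up to about a quarter of a degree. Assuming the values are encoded as degrees and minutes (DDMM, e.g. 3040 = 30°40′) and that 0 marks a missing position, a hypothetical helper `ddmmToDecimal` could convert them exactly:

```r
# Hypothetical helper: convert DDMM-encoded coordinates to decimal degrees.
# Assumes 3040 means 30 degrees 40 minutes and that 0 means "no position recorded".
ddmmToDecimal <- function(x) {
  x[!is.na(x) & x == 0] <- NA  # 0 cannot be a U.S. location, so treat it as missing
  degrees <- x %/% 100         # whole degrees
  minutes <- x %% 100          # remaining minutes
  degrees + minutes / 60
}

coords <- ddmmToDecimal(c(3040, 8812, 0))
coords
```

On the raw map data this would be applied as `Tornado$LATITUDE <- ddmmToDecimal(Tornado$LATITUDE)` and `Tornado$LONGITUDE <- -ddmmToDecimal(Tornado$LONGITUDE)` (negated because U.S. longitudes are west of Greenwich); rows set to NA are then dropped by ggplot2 with a warning.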