An Analysis of the Impact of U.S. Disasters on Population Health and Economic Damage

Synopsis:

This analysis explores the impact of disasters in the U.S. from 1950 through November 2011. Population health impact (fatalities plus injuries) and economic damage (property plus crop losses) are summed by disaster type and plotted as bar charts. The results show that tornadoes are the most severe disaster type for population health and floods are the most severe for economic damage. The last section plots tornado locations on a map to identify the areas most prone to tornadoes.

Data Processing:

Downloading and Loading dataset:

url<-'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
if (!file.exists('data.csv.bz2')) download.file(url,'data.csv.bz2',method='curl')
library(readr)
data<-read_csv('data.csv.bz2',col_types='dcccdcccdccccdcdccdddddddcdccccddddcd',progress=FALSE)
  • Function to sort a data frame in descending order:
sortDF<-function(df,by){
  # 'by' is the vector of values to sort on, e.g. df$healthImpact
  df<-df[order(by,decreasing = TRUE),]
  return (df)
}
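  • Function to convert the EXP magnitude codes to integer powers of ten:
transformEXP<-function(var){
  # Letter codes map to powers of ten (h=2, k=3, m=6, b=9); digit codes are kept as-is
  x<-tolower(as.character(var))
  x[x=='b']<-9
  x[x=='h']<-2
  x[x=='k']<-3
  x[x=='m']<-6
  # Missing or symbol codes count as 10^0; read_csv reads empty fields as NA,
  # and an NA exponent would drop the row from the later aggregate() call
  x[is.na(x) | x %in% c("","-","?","+")]<-0
  x<-as.integer(x)
  return (x)
}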
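
As a minimal sketch of how the magnitude codes turn into dollar amounts, the stand-alone snippet below applies the same letter-to-exponent mapping to a few made-up rows. The helper name exp_to_power and the sample values are hypothetical and are not part of the analysis.

```r
# Hypothetical helper (not part of the analysis): map an EXP code to a power of ten.
exp_to_power <- function(code) {
  code <- tolower(as.character(code))
  lookup <- c(h = 2L, k = 3L, m = 6L, b = 9L)
  # Digit codes such as "5" convert directly; everything else becomes NA for now
  power <- suppressWarnings(as.integer(code))
  is_letter <- code %in% names(lookup)
  power[is_letter] <- unname(lookup[code[is_letter]])
  # Blank, missing, or symbol codes ("", NA, "-", "?", "+") count as 10^0
  power[is.na(power)] <- 0L
  power
}

# Made-up sample rows: reported damage figure and its magnitude code
dmg  <- c(25, 1.5, 2, 3, 10)
code <- c("K", "m", "B", "5", NA)

# Dollar loss = damage figure times ten raised to the decoded exponent
loss <- dmg * 10^exp_to_power(code)
print(loss)
```

Under this mapping, a damage figure of 25 with code K stands for 25,000 dollars, and a figure with a missing code is taken at face value.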

Aggregate Fatality and Injury by Type:

data$healthImpact<-data$FATALITIES+data$INJURIES
healthImpactByType<-aggregate(healthImpact~EVTYPE,data=data,sum)
healthImpactByType<-sortDF(healthImpactByType,healthImpactByType$healthImpact)
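
To illustrate what the aggregation and sorting steps produce, here is a small sketch on a made-up data frame. The toy rows and the variable names toy and byType are assumptions for illustration only; the column names follow the dataset.

```r
# Made-up events; column names follow the storm dataset
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 1, 0, 4, 0),
  INJURIES   = c(10, 0, 5, 1, 3),
  stringsAsFactors = FALSE
)

# Health impact per event is fatalities plus injuries
toy$healthImpact <- toy$FATALITIES + toy$INJURIES

# Sum the per-event impact within each event type
byType <- aggregate(healthImpact ~ EVTYPE, data = toy, sum)

# Sort so the most harmful event type comes first
byType <- byType[order(byType$healthImpact, decreasing = TRUE), ]
print(byType)
```

In this toy data TORNADO totals 17, HEAT 5, and FLOOD 4, so TORNADO ends up in the first row, which is the row the later plots would pick up first.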

Aggregate Economic Loss by Type:

data$PROPDMGEXP<-transformEXP(data$PROPDMGEXP)
data$CROPDMGEXP<-transformEXP(data$CROPDMGEXP)
data$PropLoss<-with(data,PROPDMG*10^PROPDMGEXP+CROPDMG*10^CROPDMGEXP)
PropLossByType<-aggregate(PropLoss~EVTYPE,data=data,sum)
PropLossByType<-sortDF(PropLossByType,PropLossByType$PropLoss)

Results:

Plot the top 10 disaster types:

Tornadoes are the most severe disaster type in the U.S. in terms of population health (fatalities plus injuries).
Floods are the most severe disaster type in the U.S. in terms of economic damage (property plus crop losses).

library(ggplot2)
ggplot(healthImpactByType[1:10,],aes(EVTYPE,healthImpact))+geom_bar(stat='identity',fill='red')+theme_bw()+coord_flip()+
  labs(title='Disaster Impact on Health',x='Type of Disaster',y='Fatality+Injury')


ggplot(PropLossByType[1:10,],aes(EVTYPE,PropLoss))+geom_bar(stat='identity',fill='blue')+theme_bw()+coord_flip()+
  labs(title='Disaster Impact on Economic Loss',x='Type of Disaster',y='Property Loss+Crop Loss')


Plot the locations of tornadoes:

library(ggmap)
Tornado<-data[data$EVTYPE=='TORNADO',]
Tornado$LONGITUDE<-(-Tornado$LONGITUDE/100)
Tornado$LATITUDE<-Tornado$LATITUDE/100
map<-get_map('United States',zoom=4)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=United+States&zoom=4&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
Tornado$size<-round(Tornado$healthImpact / max(Tornado$healthImpact) *10 ,2)
ggmap(map)+geom_jitter(aes(LONGITUDE,LATITUDE,size=size),data=Tornado,alpha=0.05,colour='red')
## Warning: Removed 1081 rows containing missing values (geom_point).


Tornadoes occur mostly in the central United States and along the southeast coast.