This report addresses the following two questions:
- Across the United States, which types of events are most harmful with respect to population health? The report uses a Population Damage Units model (1 death = 5 injuries) to score overall population health damage and lists the top five event types.
- Across the United States, which types of events have the greatest economic consequences?
The code below shows how the raw data is read into R and processed into a clean dataset for analysis.
The raw dataset is subset to the necessary columns: date, county, state, event type, fatality and injury counts, and property and crop damage with their exponent codes. The date is extracted from the “BGN_DATE” column, and the year is also prepared for year-based analysis.
##read data
raw<-read.csv("repdata-data-StormData.csv.bz2")
data<-subset(raw,select=c("BGN_DATE","COUNTYNAME","STATE","EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP"))
##extract date information
data$Date<-as.Date(as.POSIXct(as.character(data$BGN_DATE),format = "%m/%d/%Y %H:%M:%S",tz="US/Central"),tz="US/Central")
data$BGN_DATE<-NULL
data$Year<-substr(data$Date,1,4)
##calculate total span in years
ny <- 2011-1950
##plot(table(data$Year)) shows that the number of recorded events increases over the years, but this may
## be due to improved data collection and reporting in more recent years.
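The date handling above can also be checked on a couple of values in isolation. The following is only a sketch: `sample_dates` holds made-up strings in the same month/day/year layout as BGN_DATE, and it assumes the time-of-day part can be ignored, which is why it parses with `as.Date` directly instead of going through `as.POSIXct`.

```r
# Made-up sample strings in the same layout as the BGN_DATE column (assumption)
sample_dates <- c("4/18/1950 0:00:00", "11/28/2011 0:00:00")

# Parse only the date part; trailing text after the matched format is ignored
parsed <- as.Date(sample_dates, format = "%m/%d/%Y")

# Extract the year as an integer instead of a character string
years <- as.integer(format(parsed, "%Y"))
print(years)
```

Keeping the year numeric avoids a later conversion when it is used on a plot axis.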
To answer the first question, we introduced a People Health Damage (PHD) units model in which 1 death = 5 injuries = 5 PHD units. We add the death and injury units and aggregate them by event type. The EVTYPE values are categorized very inconsistently, but for lack of a more convincing and reasonable classification or grouping rule, we kept them as-is for this analysis.
##subset data for problem 1
p1<-subset(data,select=c("Date","Year","EVTYPE","FATALITIES","INJURIES"))
## add score model of people health damage based on weight of fatalities : injuries as 5:1
p1$phdmg<-p1$FATALITIES*5+p1$INJURIES
## calculate the total damage score over all years for each type of event.
dmg<-aggregate(p1$phdmg,by=list(type=p1$EVTYPE),sum)
names(dmg)<-c("type","phdmg")
##sort event by damage descending order and get top 5 events
dmg<-dmg[order(dmg$phdmg,decreasing=T),]
dmg5<-dmg[1:5,]
##select TORNADO events and aggregate yearly damage data.
tor<-p1[p1$EVTYPE=="TORNADO",]
dmgtor<-aggregate(tor$phdmg,by=list(Year=tor$Year),sum)
names(dmgtor)<-c("Year","DamageUnits")
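To make the scoring model concrete, here is a small worked example on invented numbers. The data frame `toy` is purely illustrative and is not taken from the storm dataset; the weighting (5 units per death, 1 unit per injury) is the one described above. The formula interface of `aggregate` is used here as one possible alternative to the `by=` form.

```r
# Toy data, invented for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(2, 1, 0, 3),
  INJURIES   = c(10, 0, 4, 1)
)

# 1 death = 5 units, 1 injury = 1 unit
toy$phdmg <- toy$FATALITIES * 5 + toy$INJURIES

# Sum the score per event type, then rank from most to least harmful
toy_dmg <- aggregate(phdmg ~ EVTYPE, data = toy, FUN = sum)
toy_dmg <- toy_dmg[order(toy_dmg$phdmg, decreasing = TRUE), ]
print(toy_dmg)
```

In this toy case TORNADO scores 24 units (2 deaths and 14 injuries), HEAT scores 16, and FLOOD scores 5.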
To answer the second question, we added property damage and crop damage together. Before that, we converted each exponent symbol into a numeric power of ten, e.g. “B” to 1e9 and “M” to 1e6. After computing the total damage column, we aggregated the all-time sum by event type. Again, we kept EVTYPE as-is for lack of a convincing and reasonable categorization guideline.
p2<-subset(data,select=c("Date","Year","EVTYPE","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP"))
## turn PROPDMGEXP into character
p2$PROPDMGEXP<-as.character(p2$PROPDMGEXP)
##convert damage unit symbols to exponent values; unknown symbols (?, +, -) are blanked, become NA in the numeric conversion, and end up as zero damage
p2[p2$PROPDMGEXP=="?"|p2$PROPDMGEXP=="+"|p2$PROPDMGEXP=="-","PROPDMGEXP"]<-""
## H/h for a hundred
p2[p2$PROPDMGEXP=="h"|p2$PROPDMGEXP=="H","PROPDMGEXP"]<-"2"
## K/k for a thousand
p2[p2$PROPDMGEXP=="k"|p2$PROPDMGEXP=="K","PROPDMGEXP"]<-"3"
## M/m for a million
p2[p2$PROPDMGEXP=="m"|p2$PROPDMGEXP=="M","PROPDMGEXP"]<-"6"
## B/b for a billion
p2[p2$PROPDMGEXP=="b"|p2$PROPDMGEXP=="B","PROPDMGEXP"]<-"9"
##change to numeric for exponent
p2$PROPDMGEXP<-as.numeric(as.character(p2$PROPDMGEXP))
p2$pdmg<-p2$PROPDMG*10^p2$PROPDMGEXP
## do the same for crop damage
p2$CROPDMGEXP<-as.character(p2$CROPDMGEXP)
p2[p2$CROPDMGEXP=="?"|p2$CROPDMGEXP=="+"|p2$CROPDMGEXP=="-","CROPDMGEXP"]<-""
## H/h for a hundred
p2[p2$CROPDMGEXP=="h"|p2$CROPDMGEXP=="H","CROPDMGEXP"]<-"2"
## K/k for a thousand
p2[p2$CROPDMGEXP=="k"|p2$CROPDMGEXP=="K","CROPDMGEXP"]<-"3"
## M/m for a million
p2[p2$CROPDMGEXP=="m"|p2$CROPDMGEXP=="M","CROPDMGEXP"]<-"6"
## B/b for a billion
p2[p2$CROPDMGEXP=="b"|p2$CROPDMGEXP=="B","CROPDMGEXP"]<-"9"
##change to numeric for exponent
p2$CROPDMGEXP<-as.numeric(as.character(p2$CROPDMGEXP))
p2$cdmg<-p2$CROPDMG*10^p2$CROPDMGEXP
##change NA values to zero before calculating total damage
p2[is.na(p2$pdmg),"pdmg"]<-0
p2[is.na(p2$cdmg),"cdmg"]<-0
p2$totaldmg<-p2$pdmg+p2$cdmg
dmg2<-aggregate(p2$totaldmg,by=list(type=p2$EVTYPE),sum)
names(dmg2)<-c("type","totaldamage")
dmg2<-dmg2[order(dmg2$totaldamage,decreasing=T),]
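The exponent cleanup above repeats the same steps for the property and crop columns. As a sketch only, the same mapping could be written once as a lookup-based helper. The function names `exp_to_power` and `damage_value` are hypothetical and not part of the original analysis; the behaviour is intended to mirror the code above, including treating blank or unknown codes as zero damage.

```r
# Hypothetical helper: map an exponent code to a power of ten
exp_to_power <- function(code) {
  code <- toupper(as.character(code))
  lookup <- c(H = 2, K = 3, M = 6, B = 9)
  # Digit codes such as "5" are used as-is; everything else becomes NA for now
  power <- suppressWarnings(as.numeric(code))
  is_letter <- code %in% names(lookup)
  power[is_letter] <- lookup[code[is_letter]]
  power  # "", "?", "+", "-" remain NA
}

# Hypothetical helper: damage in USD, with unknown codes counted as zero
damage_value <- function(amount, code) {
  value <- amount * 10^exp_to_power(code)
  value[is.na(value)] <- 0
  value
}

out <- damage_value(c(25, 2.5, 1, 3), c("K", "m", "B", "?"))
print(out)
```

With helpers like these, the two columns could be filled with `damage_value(p2$PROPDMG, p2$PROPDMGEXP)` and `damage_value(p2$CROPDMG, p2$CROPDMGEXP)`.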
The top five event types that caused the most population health damage are shown in the bar chart below (left). We used the calculation 1 death = 5 injuries = 5 units. From this chart, we can see that the most damaging event type is TORNADO.
The other chart (right) shows TORNADO population damage over the years.
par(mfrow=c(1,2),mar=c(4,4,3,2))
barplot(dmg5$phdmg,names.arg=dmg5$type,col="Grey10",cex.names=0.8,
xlab="Events Type",ylab="Population Damage Units")
title("Top 5 Population Health Damage Hazards in US",cex.main=0.7)
plot(as.numeric(dmgtor$Year),dmgtor$DamageUnits,type="l",lwd=2,
col="Grey10",xlab="Year",ylab="Population Damage Units")
title("Tornado Population Damage 1950-2011",cex.main=0.8)
The top five event types that caused the most economic damage are shown in the bar chart below. We calculated the sum of property and crop damage and ranked event types by this total. From this chart, we can see that the most damaging event type is FLOOD.
##plotting top economy loss events
dmg2top5<-dmg2[1:5,]
par(mfrow=c(1,1),mar=c(5,4,1,1))
barplot(dmg2top5$totaldamage,names.arg=dmg2top5$type,col="Grey10",cex.names=0.8,
xlab="Events Type",ylab="Economic Damage (USD)")
title("Top 5 Economy Damage Weather Hazards in US",cex.main=1.2)
Thank you for reviewing this report. The author declares no conflicts of interest with any other research or report.