This report studies the damage caused by different storm types. The data come from the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database, which covers events from 1950 to November 2011. Specifically, the report examines harm to population health and economic damage to property and crops by storm type. It also lists the 10 storm types most harmful to population health and to the economy.
Load the data from the compressed CSV file:
data<-read.csv("repdata-data-StormData.csv.bz2")
We want to investigate the damage resulting from each type of event, so the only columns we need from the data set are EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP. First, we need to do some data cleaning, because the data entry contains some typos. Damage is recorded in dollars in two parts: the DMG column holds the number and the DMGEXP column holds the unit of that number. According to the data handbook, the unit should only be K (thousands), M (millions), or B (billions). But when we print the levels of these unit columns, some other characters show up. We need to clean them out.
levels(data$PROPDMGEXP)
## [1] "" "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"
levels(data$CROPDMGEXP)
## [1] "" "?" "0" "2" "B" "k" "K" "m" "M"
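Before dropping the unexpected codes, it can help to count how many rows carry each one. The following is a minimal sketch on a small made-up vector; `exp_codes` is a hypothetical stand-in for `data$PROPDMGEXP`, and the counts are illustrative only, not figures from the NOAA data.

```r
# Hypothetical sample of unit codes, standing in for data$PROPDMGEXP
exp_codes <- c("K", "k", "M", "", "B", "?", "5", "K", "h", "m")

# Normalise case, then count how often each code occurs
code_counts <- table(tolower(exp_codes))

# Codes that the handbook does not define (blank is kept as "no unit")
valid_codes <- c("", "k", "m", "b")
invalid <- setdiff(names(code_counts), valid_codes)

# Number of entries that would be dropped by the cleaning step
n_invalid <- sum(code_counts[invalid])
```

Because `table()` works on character vectors as well as factors, this check does not depend on how `read.csv` stored the column.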
We convert everything to lower case:
data$PROPDMGEXP<-tolower(data$PROPDMGEXP)
data$CROPDMGEXP<-tolower(data$CROPDMGEXP)
To be more efficient, we keep only the rows whose units are valid (blank, k, m, or b) and only the columns we need:
validunits<-c("","k","m","b")
keepcols<-c("EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")
storm<-data[data$PROPDMGEXP %in% validunits & data$CROPDMGEXP %in% validunits,keepcols]
To explore the damage to health and the economy, we define two new variables, one for each category. Health is the total number of fatalities and injuries. Economy is the total dollar amount of property damage and crop damage, where each DMG number is multiplied by the value of its unit (a blank unit counts as 1).
storm$health<-storm$FATALITIES+storm$INJURIES
storm$propunit<-1
storm[which(storm$PROPDMGEXP=="k"),]$propunit<-1000
storm[which(storm$PROPDMGEXP=="m"),]$propunit<-1000000
storm[which(storm$PROPDMGEXP=="b"),]$propunit<-1000000000
storm$cropunit<-1
storm[which(storm$CROPDMGEXP=="k"),]$cropunit<-1000
storm[which(storm$CROPDMGEXP=="m"),]$cropunit<-1000000
storm[which(storm$CROPDMGEXP=="b"),]$cropunit<-1000000000
storm$economy<-storm$PROPDMG*storm$propunit+storm$CROPDMG*storm$cropunit
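The same unit conversion can also be written as a single lookup instead of one assignment per unit. This is only a sketch of an alternative, not the code used in this report: `to_multiplier` is a hypothetical helper and `toy` is made-up data.

```r
# Lookup table: unit code -> multiplier (blank means plain dollars)
unit_codes <- c("", "k", "m", "b")
unit_values <- c(1, 1e3, 1e6, 1e9)

# Hypothetical helper: returns NA for any code not in unit_codes
to_multiplier <- function(codes) {
  unit_values[match(tolower(codes), unit_codes)]
}

# Made-up rows for illustration
toy <- data.frame(
  PROPDMG = c(25, 2.5, 0),
  PROPDMGEXP = c("k", "m", ""),
  CROPDMG = c(0, 1, 5),
  CROPDMGEXP = c("", "k", "k"),
  stringsAsFactors = FALSE
)

# Total damage in dollars: number times unit, property plus crops
toy$economy <- toy$PROPDMG * to_multiplier(toy$PROPDMGEXP) +
  toy$CROPDMG * to_multiplier(toy$CROPDMGEXP)
```

For the first toy row this gives 25 x 1,000 + 0 x 1 = 25,000 dollars. An unrecognised code produces `NA` rather than silently defaulting to 1, which makes leftover bad entries easier to spot.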
Now we take a look at the event types. Part of the result looks like the following:
levels(storm$EVTYPE)[677:750]
## [1] "STRONG WIND GUST" "Strong winds"
## [3] "Strong Winds" "STRONG WINDS"
## [5] "Summary August 10" "Summary August 11"
## [7] "Summary August 17" "Summary August 2-3"
## [9] "Summary August 21" "Summary August 28"
## [11] "Summary August 4" "Summary August 7"
## [13] "Summary August 9" "Summary Jan 17"
## [15] "Summary July 23-24" "Summary June 18-19"
## [17] "Summary June 5-6" "Summary June 6"
## [19] "Summary of April 12" "Summary of April 13"
## [21] "Summary of April 21" "Summary of April 27"
## [23] "Summary of April 3rd" "Summary of August 1"
## [25] "Summary of July 11" "Summary of July 2"
## [27] "Summary of July 22" "Summary of July 26"
## [29] "Summary of July 29" "Summary of July 3"
## [31] "Summary of June 10" "Summary of June 11"
## [33] "Summary of June 12" "Summary of June 13"
## [35] "Summary of June 15" "Summary of June 16"
## [37] "Summary of June 18" "Summary of June 23"
## [39] "Summary of June 24" "Summary of June 3"
## [41] "Summary of June 30" "Summary of June 4"
## [43] "Summary of June 6" "Summary of March 14"
## [45] "Summary of March 23" "Summary of March 24"
## [47] "SUMMARY OF MARCH 24-25" "SUMMARY OF MARCH 27"
## [49] "SUMMARY OF MARCH 29" "Summary of May 10"
## [51] "Summary of May 13" "Summary of May 14"
## [53] "Summary of May 22" "Summary of May 22 am"
## [55] "Summary of May 22 pm" "Summary of May 26 am"
## [57] "Summary of May 26 pm" "Summary of May 31 am"
## [59] "Summary of May 31 pm" "Summary of May 9-10"
## [61] "Summary Sept. 25-26" "Summary September 20"
## [63] "Summary September 23" "Summary September 3"
## [65] "Summary September 4" "Summary: Nov. 16"
## [67] "Summary: Nov. 6-7" "Summary: Oct. 20-21"
## [69] "Summary: October 31" "Summary: Sept. 18"
## [71] "Temperature record" "THUDERSTORM WINDS"
## [73] "THUNDEERSTORM WINDS" "THUNDERESTORM WINDS"
We noticed that some entries in the event type are summaries, which are not actual events, so we need to take them out.
storm$event<-tolower(storm$EVTYPE)
realstorm<-storm[which(substr(storm$event,1,7)!="summary"),]
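The prefix test can be tried on a few labels in isolation. The sketch below uses base R's `startsWith()` as an equivalent of the `substr()` comparison; the `events` vector is made up for illustration, with its two summary labels copied from the listing above.

```r
# Illustrative event labels; the two summary entries appear in the listing
events <- c("Summary of May 22 am", "TORNADO", "SUMMARY OF MARCH 27", "Flood")

# Lower-case first so the prefix test ignores case
events_lower <- tolower(events)

# TRUE for every label that begins with "summary"
is_summary <- startsWith(events_lower, "summary")

# Keep only the real events
real_events <- events_lower[!is_summary]
```

`startsWith()` requires character input, so the `tolower()` call (which also converts factors to character) has to come first.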
Now we calculate the total health and economic damage for each event type:
Health<-tapply(realstorm$health,realstorm$event,FUN=sum)
Health<-data.frame(Health)
Health$event<-rownames(Health)
Economy<-tapply(realstorm$economy,realstorm$event,FUN=sum)
Economy<-data.frame(Economy)
Economy$event<-rownames(Economy)
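As an alternative to calling `tapply` once per measure and rebuilding a data frame each time, both totals can be computed in one step with `aggregate()`. This is a sketch on made-up numbers (`toy` and `totals` are hypothetical names), not a replacement for the results reported here.

```r
# Made-up per-record damage values for three event types
toy <- data.frame(
  event = c("tornado", "flood", "tornado", "flood", "hail"),
  health = c(10, 2, 5, 1, 0),
  economy = c(1e6, 5e6, 2e6, 1e6, 3e5),
  stringsAsFactors = FALSE
)

# One row per event type, with both columns summed in a single call
totals <- aggregate(cbind(health, economy) ~ event, data = toy, FUN = sum)
```

The result is already a data frame with an `event` column, so no row-name juggling is needed. Note that the formula interface drops rows containing `NA` by default, which may or may not be what an analysis wants.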
After data processing, we can finally answer the following questions:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
To find the event type most harmful to population health, we take the maximum of the total number of people harmed (fatalities plus injuries):
Health[which(Health$Health==max(Health$Health)),]
## Health event
## tornado 96951 tornado
So the answer is tornado. To see how it compares with the rest of the top 10 events, we make the following plot:
sortHealth<-Health[order(-Health$Health),]
top10Health<-sortHealth[c(1:10),]
top10Health
## Health event
## tornado 96951 tornado
## excessive heat 8428 excessive heat
## tstm wind 7461 tstm wind
## flood 7259 flood
## lightning 6046 lightning
## heat 3037 heat
## flash flood 2755 flash flood
## ice storm 2064 ice storm
## thunderstorm wind 1621 thunderstorm wind
## winter storm 1527 winter storm
plot(top10Health$Health,xaxt="n",xlab="Event",ylab="Fatalities and injuries",main="Top 10 Most Harmful Events to Population Health")
axis(1, at=1:10,labels=rownames(top10Health))
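With ten long labels on a default axis, some names may be skipped to avoid overlap. One possible alternative is a bar chart with rotated labels. The sketch below hard-codes the totals from the top-10 table above so that it runs on its own; `top10` and `bar_mids` are names introduced only for this example.

```r
# Totals copied from the top-10 health table above
top10 <- c(
  "tornado" = 96951, "excessive heat" = 8428, "tstm wind" = 7461,
  "flood" = 7259, "lightning" = 6046, "heat" = 3037,
  "flash flood" = 2755, "ice storm" = 2064,
  "thunderstorm wind" = 1621, "winter storm" = 1527
)

# Widen the bottom margin so the rotated labels fit
old_par <- par(mar = c(9, 5, 4, 2))

# las = 2 draws the labels perpendicular to the axis
bar_mids <- barplot(top10, las = 2, cex.names = 0.8,
                    ylab = "Fatalities and injuries",
                    main = "Top 10 event types by health impact")

# Restore the previous graphics settings
par(old_par)
```

The same pattern would apply to the economic totals.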
To find the event type most harmful to the economy, we take the maximum dollar amount of economic damage:
Economy[which(Economy$Economy==max(Economy$Economy)),]
## Economy event
## flood 1.503e+11 flood
So the answer is flood. To see how it compares with the rest of the top 10 events, we make the following plot:
sortEconomy<-Economy[order(-Economy$Economy),]
top10Economy<-sortEconomy[c(1:10),]
top10Economy
## Economy event
## flood 1.503e+11 flood
## hurricane/typhoon 7.191e+10 hurricane/typhoon
## tornado 5.730e+10 tornado
## storm surge 4.332e+10 storm surge
## hail 1.873e+10 hail
## flash flood 1.756e+10 flash flood
## drought 1.502e+10 drought
## hurricane 1.461e+10 hurricane
## river flood 1.015e+10 river flood
## ice storm 8.967e+09 ice storm
plot(top10Economy$Economy,xaxt="n",xlab="Event",ylab="Damage (dollars)",main="Top 10 Most Harmful Events to the Economy")
axis(1, at=1:10,labels=rownames(top10Economy))