United States Storm Damage Analysis

This report examines the damage caused by different types of storms. The data come from the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database, which covers events from 1950 through November 2011. Specifically, the report measures harm to population health and economic damage to property and crops for each storm type. It also lists the 10 storm types most harmful to population health and to the economy.

Data Processing

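Load the data from the compressed CSV file:

# stringsAsFactors=TRUE keeps text columns as factors so that levels() works below
# (the default changed to FALSE in R 4.0)
data<-read.csv("repdata-data-StormData.csv.bz2",stringsAsFactors=TRUE)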

We want to investigate the damage resulting from each type of event, so the only columns we need from the data set are EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP. First, we need to do some data cleaning, because the data contain some entry errors. All damages are entered as dollar amounts: DMG is the number and DMGEXP is the unit of that number. As explained in the data handbook, the unit should be only k, m, or b (thousands, millions, or billions of dollars). But when we print the levels of these units, other characters show up. We need to clean them out.

levels(data$PROPDMGEXP)
##  [1] ""  "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"
levels(data$CROPDMGEXP)
## [1] ""  "?" "0" "2" "B" "k" "K" "m" "M"
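To gauge how many records the cleanup would affect, one option is to tabulate the unit codes and flag the unexpected ones. The following is a minimal sketch on a made-up vector; `exp_codes` is a hypothetical stand-in for `data$PROPDMGEXP`, and the counts are not those of the NOAA file.

```r
# Hypothetical unit codes standing in for data$PROPDMGEXP
exp_codes <- c("K", "k", "M", "", "B", "?", "5", "K", "h")

# Normalize case, then count how often each code occurs
code_counts <- table(tolower(exp_codes))

# Codes documented in the handbook, plus blank for "no unit"
valid_units <- c("", "k", "m", "b")

# Which codes are unexpected, and how many records carry them
invalid_codes <- setdiff(names(code_counts), valid_units)
n_invalid <- sum(!(tolower(exp_codes) %in% valid_units))
```

Running the same two steps on the real columns would show whether the invalid codes are rare enough to drop safely.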

We convert everything to lower case:

data$PROPDMGEXP<-tolower(data$PROPDMGEXP)
data$CROPDMGEXP<-tolower(data$CROPDMGEXP)

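To be more efficient, we keep only the rows whose unit codes are valid (blank, k, m, or b) and only the columns we need:

validunits<-c("","k","m","b")
storm<-data[which(data$PROPDMGEXP %in% validunits & data$CROPDMGEXP %in% validunits),c("EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]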
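The row filter can be illustrated on a small example. This is a sketch with an invented data frame (`toy` and its values are assumptions for illustration only); it shows that a record is kept only when both unit columns hold a valid code.

```r
# Toy data frame; values are invented for illustration only
toy <- data.frame(
  EVTYPE     = c("FLOOD", "HAIL", "TORNADO", "HEAT"),
  PROPDMG    = c(10, 5, 2.5, 1),
  PROPDMGEXP = c("k", "?", "m", ""),
  CROPDMG    = c(0, 1, 0, 3),
  CROPDMGEXP = c("", "k", "2", "k"),
  stringsAsFactors = FALSE
)

valid_units <- c("", "k", "m", "b")

# Keep a row only when BOTH unit columns hold a valid code
keep <- toy$PROPDMGEXP %in% valid_units & toy$CROPDMGEXP %in% valid_units
toy_clean <- toy[keep, ]
```

Here the HAIL row is dropped because of its property unit and the TORNADO row because of its crop unit, even though the other unit in each row is valid.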

To explore the damage to health and the economy, we define two new variables, one for each category. Health is the total number of fatalities and injuries. Economy is the total dollar amount of property damage and crop damage.

storm$health<-storm$FATALITIES+storm$INJURIES
storm$propunit<-1
storm[which(storm$PROPDMGEXP=="k"),]$propunit<-1000
storm[which(storm$PROPDMGEXP=="m"),]$propunit<-1000000
storm[which(storm$PROPDMGEXP=="b"),]$propunit<-1000000000
storm$cropunit<-1
storm[which(storm$CROPDMGEXP=="k"),]$cropunit<-1000
storm[which(storm$CROPDMGEXP=="m"),]$cropunit<-1000000
storm[which(storm$CROPDMGEXP=="b"),]$cropunit<-1000000000
storm$economy<-storm$PROPDMG*storm$propunit+storm$CROPDMG*storm$cropunit
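The unit conversion above can also be written as a lookup, which avoids repeating one assignment per unit. The sketch below is an alternative formulation, not the code used for the results in this report; `unit_multiplier` is a hypothetical helper name and the damage values are invented.

```r
# Map a unit code to its dollar multiplier.
# Blank or unrecognized codes fall back to 1, as in the step-by-step code.
# unit_multiplier is a hypothetical helper, not part of the original analysis.
unit_multiplier <- function(code) {
  lookup <- c(k = 1e3, m = 1e6, b = 1e9)
  mult <- unname(lookup[tolower(code)])  # "" never matches a name, so it yields NA
  mult[is.na(mult)] <- 1
  mult
}

# Worked example with invented amounts
propdmg <- c(2.5, 10, 1)
propexp <- c("m", "k", "")
dollars <- propdmg * unit_multiplier(propexp)
```

In the example, 2.5 with unit "m" becomes 2,500,000 dollars, 10 with unit "k" becomes 10,000 dollars, and a blank unit leaves the amount unchanged.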

Now we take a look at the event types. Part of the result looks like the following:

levels(storm$EVTYPE)[677:750]
##  [1] "STRONG WIND GUST"       "Strong winds"          
##  [3] "Strong Winds"           "STRONG WINDS"          
##  [5] "Summary August 10"      "Summary August 11"     
##  [7] "Summary August 17"      "Summary August 2-3"    
##  [9] "Summary August 21"      "Summary August 28"     
## [11] "Summary August 4"       "Summary August 7"      
## [13] "Summary August 9"       "Summary Jan 17"        
## [15] "Summary July 23-24"     "Summary June 18-19"    
## [17] "Summary June 5-6"       "Summary June 6"        
## [19] "Summary of April 12"    "Summary of April 13"   
## [21] "Summary of April 21"    "Summary of April 27"   
## [23] "Summary of April 3rd"   "Summary of August 1"   
## [25] "Summary of July 11"     "Summary of July 2"     
## [27] "Summary of July 22"     "Summary of July 26"    
## [29] "Summary of July 29"     "Summary of July 3"     
## [31] "Summary of June 10"     "Summary of June 11"    
## [33] "Summary of June 12"     "Summary of June 13"    
## [35] "Summary of June 15"     "Summary of June 16"    
## [37] "Summary of June 18"     "Summary of June 23"    
## [39] "Summary of June 24"     "Summary of June 3"     
## [41] "Summary of June 30"     "Summary of June 4"     
## [43] "Summary of June 6"      "Summary of March 14"   
## [45] "Summary of March 23"    "Summary of March 24"   
## [47] "SUMMARY OF MARCH 24-25" "SUMMARY OF MARCH 27"   
## [49] "SUMMARY OF MARCH 29"    "Summary of May 10"     
## [51] "Summary of May 13"      "Summary of May 14"     
## [53] "Summary of May 22"      "Summary of May 22 am"  
## [55] "Summary of May 22 pm"   "Summary of May 26 am"  
## [57] "Summary of May 26 pm"   "Summary of May 31 am"  
## [59] "Summary of May 31 pm"   "Summary of May 9-10"   
## [61] "Summary Sept. 25-26"    "Summary September 20"  
## [63] "Summary September 23"   "Summary September 3"   
## [65] "Summary September 4"    "Summary: Nov. 16"      
## [67] "Summary: Nov. 6-7"      "Summary: Oct. 20-21"   
## [69] "Summary: October 31"    "Summary: Sept. 18"     
## [71] "Temperature record"     "THUDERSTORM WINDS"     
## [73] "THUNDEERSTORM WINDS"    "THUNDERESTORM WINDS"

We notice that some entries in the event type are summaries, which are not actual events, so we need to take them out.

storm$event<-tolower(storm$EVTYPE)
realstorm<-storm[which(substr(storm$event,1,7)!="summary"),]
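As a quick check of this filter, the sketch below applies the same idea to a few labels taken from the listing above. It uses base R's `startsWith` instead of `substr`; the variable names are assumptions made for this illustration.

```r
# A few labels from the EVTYPE listing, in their original mixed case
events <- c("Summary of May 22 am", "SUMMARY OF MARCH 27",
            "Summary: Nov. 16", "TORNADO", "Strong Winds")

# Lower-case first so that all spellings of "summary" are caught
event <- tolower(events)

# TRUE for entries that are daily summaries rather than events
is_summary <- startsWith(event, "summary")
real_events <- event[!is_summary]
```

Lower-casing before the comparison matters: without it, "SUMMARY OF MARCH 27" would not match a lower-case prefix.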

Now we calculate the health and economy damage for each event type:

Health<-tapply(realstorm$health,realstorm$event,FUN=sum)
Health<-data.frame(Health)
Health$event<-rownames(Health)
Economy<-tapply(realstorm$economy,realstorm$event,FUN=sum)
Economy<-data.frame(Economy)
Economy$event<-rownames(Economy)
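The same per-event totals could be computed in one step with `aggregate`, which returns a data frame directly. This is a sketch on invented numbers (`toy` is a hypothetical data frame), shown as an alternative and not as the code behind the results below.

```r
# Toy records; values are invented for illustration only
toy <- data.frame(
  event   = c("flood", "tornado", "flood", "hail"),
  health  = c(2, 10, 3, 0),
  economy = c(1000, 500, 250, 75),
  stringsAsFactors = FALSE
)

# Sum both damage measures within each event type
totals <- aggregate(cbind(health, economy) ~ event, data = toy, FUN = sum)
```

The result has one row per event type with columns `event`, `health`, and `economy`, so no separate step is needed to copy row names into a column.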

Results

After data processing, we can finally answer the following questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

To find the event type most harmful to population health, we take the maximum of the total number of people harmed:

Health[which(Health$Health==max(Health$Health)),]
##         Health   event
## tornado  96951 tornado

So the answer is tornado. To see how it compares with the rest of the top 10 events, we make the following plot:

sortHealth<-Health[order(-Health$Health),]
top10Health<-sortHealth[c(1:10),]
top10Health
##                   Health             event
## tornado            96951           tornado
## excessive heat      8428    excessive heat
## tstm wind           7461         tstm wind
## flood               7259             flood
## lightning           6046         lightning
## heat                3037              heat
## flash flood         2755       flash flood
## ice storm           2064         ice storm
## thunderstorm wind   1621 thunderstorm wind
## winter storm        1527      winter storm
plot(top10Health$Health,xaxt="n",xlab="Event",main="Top 10 Harmful Event to Population Health")
axis(1, at=1:10,labels=rownames(top10Health))

Top 10 Harmful Event to Population

To find the event type most harmful to the economy, we take the maximum dollar amount of economic damage:

Economy[which(Economy$Economy==max(Economy$Economy)),]
##         Economy event
## flood 1.503e+11 flood

So the answer is flood. To see how it compares with the rest of the top 10 events, we make the following plot:

sortEconomy<-Economy[order(-Economy$Economy),]
top10Economy<-sortEconomy[c(1:10),]
top10Economy
##                     Economy             event
## flood             1.503e+11             flood
## hurricane/typhoon 7.191e+10 hurricane/typhoon
## tornado           5.730e+10           tornado
## storm surge       4.332e+10       storm surge
## hail              1.873e+10              hail
## flash flood       1.756e+10       flash flood
## drought           1.502e+10           drought
## hurricane         1.461e+10         hurricane
## river flood       1.015e+10       river flood
## ice storm         8.967e+09         ice storm
plot(top10Economy$Economy,xaxt="n",xlab="Event",main="Top 10 Harmful Event to Economy")
axis(1, at=1:10,labels=rownames(top10Economy))

Top 10 Harmful Event to Economy
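Both rankings follow the same pattern: sort by a total in decreasing order, then keep the first rows. A small helper can capture that pattern. The sketch below is an assumption-laden illustration; `top_events` is a hypothetical function name and the numbers are invented.

```r
# Return the n rows of df with the largest values in value_col.
# top_events is a hypothetical helper, not part of the original analysis.
top_events <- function(df, value_col, n = 10) {
  ranked <- df[order(df[[value_col]], decreasing = TRUE), , drop = FALSE]
  head(ranked, n)
}

# Toy totals; values are invented for illustration only
toy <- data.frame(
  event  = c("hail", "flood", "tornado", "heat"),
  Health = c(3, 20, 90, 8),
  stringsAsFactors = FALSE
)

top2 <- top_events(toy, "Health", n = 2)
```

The returned data frame could then be drawn as a bar chart, for example with `barplot(top2$Health, names.arg = top2$event, las = 2)`, where `las = 2` rotates the labels so that long event names stay readable.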