Synopsis

The analysis in the following sections attempts to determine which types of events are most harmful to population health and which have the greatest economic consequences.

Data Processing

This section loads and processes the storm data from the NOAA Storm Database to find out which types of events most significantly affect human health and the economy. The data is loaded into an R data frame, the values in the relevant damage columns are aggregated, and the results are ordered so that the most significant events are evident from the order of the rows in each table.

Economic damage is quantified by the cost of damage to crops and property. The columns CROPDMGEXP and PROPDMGEXP contain characters such as ‘k’ (1,000), ‘m’ (1,000,000) and ‘b’ (1,000,000,000). These characters are replaced with the corresponding numbers so that the damage amounts can be computed.
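The mapping from exponent codes to multipliers can be sketched with a named lookup vector. The helper name exp_multiplier is hypothetical and is not part of the analysis; it only illustrates the rule stated above, with every code other than k/K, m/M and b/B treated as 0.

```r
# Hypothetical helper (not part of the original analysis): map an exponent
# code to its numeric multiplier. Codes other than k/K, m/M, b/B map to 0.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(code)])  # unmatched codes come back as NA
  mult[is.na(mult)] <- 0                 # treat every unmatched code as 0
  mult
}

# Example codes, including some that are not recognised
codes <- c("k", "M", "b", "", "?", "2")
multipliers <- exp_multiplier(codes)
multipliers
```

A lookup of this kind keeps the rule in one place, at the cost of silently zeroing any code that is not listed.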

The data file is compressed with bzip2 and has a “bz2” extension. It can be loaded into R as follows:

data<-read.csv(bzfile("repdata_data_StormData.csv.bz2"),stringsAsFactors=FALSE)

Human Health Impact

The total fatalities and injuries for each event type are obtained by summing the “FATALITIES” and “INJURIES” columns by EVTYPE. The result is then ordered twice, once by fatalities and once by injuries, each from the highest value to the lowest.

library(plyr)
healthharm<-ddply(data,~EVTYPE,summarise,fatalities = sum(FATALITIES),injuries=sum(INJURIES))
orderedhealthharmfatal<-healthharm[order(-healthharm$fatalities),]
orderedhealthharminjury<-healthharm[order(-healthharm$injuries),]
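To see what the grouped sums and the ordering do, here is a minimal sketch on a small invented data frame. It uses base R's aggregate as a stand-in for the ddply call; the toy values and the names toy, toyharm and toyordered are made up for illustration.

```r
# Toy data, invented for illustration only (not taken from the NOAA file).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(3, 1, 2, 4),
  INJURIES   = c(10, 0, 5, 1),
  stringsAsFactors = FALSE
)

# Sum both columns per event type (base-R counterpart of the ddply call).
toyharm <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Sort from the most to the fewest fatalities.
toyordered <- toyharm[order(-toyharm$FATALITIES), ]
toyordered
```

The two TORNADO rows collapse into a single row holding their combined counts, which then sorts to the top.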

Economic Damage

To isolate the sources of economic damage, the columns PROPDMG and PROPDMGEXP are selected together with EVTYPE and put into one new table, and the columns CROPDMG and CROPDMGEXP are selected together with EVTYPE and put into another.

propdmgdata <- data[, c("EVTYPE", "PROPDMG", "PROPDMGEXP")]
cropdmgdata <- data[, c("EVTYPE", "CROPDMG", "CROPDMGEXP")]

Crop Damage

Crop damage is found by aggregating the values in the CROPDMG column by EVTYPE and CROPDMGEXP. The values in the CROPDMGEXP column are then replaced by their corresponding numbers (only k/K, m/M and b/B are used; all other values are set to 0), and the damage cost is obtained by multiplying CROPDMG by the new CROPDMGEXP. The resulting table is again ordered by total damage cost, from highest to lowest.

# Sum the raw damage figures per event type and exponent code
cropdmgcost <- aggregate(CROPDMG ~ EVTYPE + CROPDMGEXP, cropdmgdata, FUN = sum)
# Replace the exponent codes with multipliers; the remaining codes are set to 0
cropdmgcost$CROPDMGEXP[cropdmgcost$CROPDMGEXP %in% c("k", "K")] <- "1e3"
cropdmgcost$CROPDMGEXP[cropdmgcost$CROPDMGEXP %in% c("m", "M")] <- "1e6"
cropdmgcost$CROPDMGEXP[cropdmgcost$CROPDMGEXP %in% c("b", "B")] <- "1e9"
cropdmgcost$CROPDMGEXP[cropdmgcost$CROPDMGEXP %in% c("?", "2", "")] <- "0"
cropdmgcost$CROPDMGEXP <- as.numeric(cropdmgcost$CROPDMGEXP)
# Convert to dollars, total per event type, and sort in descending order
cropdmgcost$SubTotal <- cropdmgcost$CROPDMG * cropdmgcost$CROPDMGEXP
cropdmgtotal <- aggregate(SubTotal ~ EVTYPE, cropdmgcost, FUN = sum)
cropdmgtotal <- cropdmgtotal[order(-cropdmgtotal$SubTotal), ]
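Summing within each (EVTYPE, CROPDMGEXP) group before multiplying gives the same totals as multiplying every row first, because all rows in a group share one multiplier. The sketch below checks this on invented rows; the data and the names toy, mult, route1 and route2 are assumptions made for illustration.

```r
# Toy rows, invented for illustration only.
toy <- data.frame(
  EVTYPE     = c("DROUGHT", "DROUGHT", "DROUGHT", "HAIL"),
  CROPDMG    = c(2, 3, 1.5, 40),
  CROPDMGEXP = c("K", "K", "M", "K"),
  stringsAsFactors = FALSE
)
mult <- c(K = 1e3, M = 1e6, B = 1e9)

# Route 1: sum within (EVTYPE, CROPDMGEXP) groups, then scale each group.
grouped <- aggregate(CROPDMG ~ EVTYPE + CROPDMGEXP, toy, FUN = sum)
grouped$SubTotal <- grouped$CROPDMG * unname(mult[as.character(grouped$CROPDMGEXP)])
route1 <- aggregate(SubTotal ~ EVTYPE, grouped, FUN = sum)

# Route 2: scale every row first, then sum per event type.
toy$Cost <- toy$CROPDMG * unname(mult[as.character(toy$CROPDMGEXP)])
route2 <- aggregate(Cost ~ EVTYPE, toy, FUN = sum)

# Both routes should agree: DROUGHT = 5 * 1e3 + 1.5 * 1e6, HAIL = 40 * 1e3.
all.equal(route1$SubTotal, route2$Cost)
```

Grouping first keeps the intermediate table small, since the multiplier is applied once per group rather than once per row.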

Property Damage

Property damage is found by aggregating the values in the PROPDMG column by EVTYPE and PROPDMGEXP. As with crop damage, the values in the PROPDMGEXP column are replaced by their corresponding numeric values. Only k/K, m/M and b/B are used; all other values are set to 0. The total cost of damage is found by multiplying the values in the PROPDMG column by those in the new PROPDMGEXP column. The resulting table is sorted by total value in descending order.

# Sum the raw damage figures per event type and exponent code
propdmgcost <- aggregate(PROPDMG ~ EVTYPE + PROPDMGEXP, propdmgdata, FUN = sum)
# Replace the exponent codes with multipliers; the remaining codes are set to 0
propdmgcost$PROPDMGEXP[propdmgcost$PROPDMGEXP %in% c("k", "K")] <- "1e3"
propdmgcost$PROPDMGEXP[propdmgcost$PROPDMGEXP %in% c("m", "M")] <- "1e6"
propdmgcost$PROPDMGEXP[propdmgcost$PROPDMGEXP %in% c("b", "B")] <- "1e9"
propdmgcost$PROPDMGEXP[propdmgcost$PROPDMGEXP %in% c("?", "2", "", "-", "1", "3", "4", "5", "6", "7", "8", "h", "H", "+")] <- "0"
propdmgcost$PROPDMGEXP <- as.numeric(propdmgcost$PROPDMGEXP)
# Convert to dollars, total per event type, and sort in descending order
propdmgcost$SubTotal <- propdmgcost$PROPDMG * propdmgcost$PROPDMGEXP
propdmgtotal <- aggregate(SubTotal ~ EVTYPE, propdmgcost, FUN = sum)
propdmgtotal <- propdmgtotal[order(-propdmgtotal$SubTotal), ]

Results

The top cause of injury is TORNADO. The top cause of death is TORNADO.

Here are the top 10 causes of death among the tabulated events:

orderedhealthharmfatal[1:10,c(1,2)]
##             EVTYPE fatalities
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
## 856      TSTM WIND        504
## 170          FLOOD        470
## 585    RIP CURRENT        368
## 359      HIGH WIND        248
## 19       AVALANCHE        224

Here are the top 10 causes of injury among the tabulated events:

orderedhealthharminjury[1:10,c(1,3)]
##                EVTYPE injuries
## 834           TORNADO    91346
## 856         TSTM WIND     6957
## 170             FLOOD     6789
## 130    EXCESSIVE HEAT     6525
## 464         LIGHTNING     5230
## 275              HEAT     2100
## 427         ICE STORM     1975
## 153       FLASH FLOOD     1777
## 760 THUNDERSTORM WIND     1488
## 244              HAIL     1361

The following plots show the ten event types with the most deaths and the ten with the most injuries.

par(mfrow=c(2,1))
barplot(orderedhealthharmfatal[1:10,]$fatalities, names.arg=orderedhealthharmfatal[1:10,]$EVTYPE,main="Deaths from Events", xlab="EventType", ylab="Deaths")
barplot(orderedhealthharminjury[1:10,]$injuries, names.arg=orderedhealthharminjury[1:10,]$EVTYPE,main="Injuries from Events", xlab="EventType", ylab="Injuries")
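With ten long event names on the x-axis, the default horizontal labels may overlap or be dropped. One possible adjustment, sketched here on the top five fatality counts from the table above, is to rotate the labels with las = 2 and enlarge the bottom margin. The margin and cex.names values are assumptions chosen for illustration, not settings used in the analysis.

```r
# Top five fatality counts copied from the table above.
deaths <- c(TORNADO = 5633, "EXCESSIVE HEAT" = 1903, "FLASH FLOOD" = 978,
            HEAT = 937, LIGHTNING = 816)

pdf(NULL)                  # null device so no file is written; drop to view the plot
par(mar = c(9, 4, 3, 1))   # larger bottom margin leaves room for rotated labels
mids <- barplot(deaths, las = 2, cex.names = 0.8,
                main = "Deaths from Events", ylab = "Deaths")
dev.off()
```

barplot returns the bar midpoints, stored here in mids, which can be passed to text() if value labels are wanted above the bars.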

The three event types that cause the most property damage are FLOOD, HURRICANE/TYPHOON and TORNADO.

propdmgtotal[1:10,]
##                EVTYPE     SubTotal
## 170             FLOOD 144657709800
## 411 HURRICANE/TYPHOON  69305840000
## 834           TORNADO  56937160480
## 670       STORM SURGE  43323536000
## 153       FLASH FLOOD  16140811510
## 244              HAIL  15732266720
## 402         HURRICANE  11868319010
## 848    TROPICAL STORM   7703890550
## 972      WINTER STORM   6688497250
## 359         HIGH WIND   5270046260

The three event types that cause the most crop damage are DROUGHT, FLOOD and RIVER FLOOD.

cropdmgtotal[1:10,]
##                EVTYPE    SubTotal
## 95            DROUGHT 13972566000
## 170             FLOOD  5661968450
## 590       RIVER FLOOD  5029459000
## 427         ICE STORM  5022113500
## 244              HAIL  3025954450
## 402         HURRICANE  2741910000
## 411 HURRICANE/TYPHOON  2607872800
## 153       FLASH FLOOD  1421317100
## 140      EXTREME COLD  1292973000
## 212      FROST/FREEZE  1094086000

The following plots show the ten event types that contribute most to crop damage and to property damage.

par(mfrow=c(2,1))
barplot(cropdmgtotal[1:10,]$SubTotal, names.arg=cropdmgtotal[1:10,]$EVTYPE,main="Damage to Crops", xlab="EventType", ylab="Damage Cost($)")
barplot(propdmgtotal[1:10,]$SubTotal, names.arg=propdmgtotal[1:10,]$EVTYPE,main="Damage to Property", xlab="EventType", ylab="Damage Cost($)")