The analysis in the following sections attempts to determine which types of weather events are most harmful to population health and which have the greatest economic consequences.
This section loads the storm data from the NOAA Storm Database and processes it to find which types of events have the most significant impact on human health and on the economy. The data is loaded into an R data frame, the values in the relevant damage columns are aggregated by event type, and the results are sorted in descending order so that the most significant events appear at the top of each table.
Economic damage is quantified as the cost of damage to crops and property. The columns CROPDMGEXP and PROPDMGEXP hold magnitude codes for the corresponding damage values: ‘k’ means thousand (1,000), ‘m’ means million (1,000,000) and ‘b’ means billion (1,000,000,000). These codes are replaced with the numbers they stand for, so that each damage value can be multiplied out into a dollar amount.
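A minimal sketch of the code-to-multiplier mapping described above, written as a standalone function. The name `exp_to_multiplier` is hypothetical and not part of the original analysis; following the rule used here, any code other than k/K, m/M or b/B maps to 0.

```r
# Hypothetical helper: convert NOAA damage-exponent codes to numeric multipliers.
# k/K = thousand, m/M = million, b/B = billion; every other code (including "") becomes 0.
exp_to_multiplier <- function(codes) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  multipliers <- unname(lookup[toupper(codes)])  # unmatched codes come back as NA
  multipliers[is.na(multipliers)] <- 0
  multipliers
}

exp_to_multiplier(c("k", "M", "b", "", "?", "h"))
```

A named lookup vector keeps the mapping in one place and treats upper and lower case alike through `toupper()`, instead of testing each code separately.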
The data file is compressed with bzip2 (hence the “.bz2” extension). It can be read into R directly through a bzfile() connection:
data<-read.csv(bzfile("repdata_data_StormData.csv.bz2"),stringsAsFactors=FALSE)
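To illustrate how the `bzfile()` connection handles compression in both directions, the sketch below writes a tiny invented CSV through a bzip2-compressing connection in a temporary directory and reads it back the same way the storm data is read. The file and column values are made up for the example.

```r
# Write a small invented table through a bzip2-compressing connection
sample_file <- tempfile(fileext = ".csv.bz2")
con <- bzfile(sample_file, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(2, 0)),
          con, row.names = FALSE)
close(con)

# Read it back exactly as the storm data is read
storm_sample <- read.csv(bzfile(sample_file), stringsAsFactors = FALSE)
str(storm_sample)
```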
The total fatalities and injuries for each event type are obtained by summing the “FATALITIES” and “INJURIES” columns per event type. The resulting table is then sorted twice, once by fatalities and once by injuries, from highest to lowest.
library(plyr)
# Sum fatalities and injuries for each event type
healthharm<-ddply(data,~EVTYPE,summarise,fatalities = sum(FATALITIES),injuries=sum(INJURIES))
# Sort by fatalities and by injuries, highest first
orderedhealthharmfatal<-healthharm[order(-healthharm$fatalities),]
orderedhealthharminjury<-healthharm[order(-healthharm$injuries),]
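For readers who prefer to avoid the plyr dependency, the same per-event totals can be sketched with base R's `aggregate()`. This is a swapped-in alternative, not the method used above, and the toy data frame is invented; it only mimics the EVTYPE, FATALITIES and INJURIES columns.

```r
# Toy stand-in for the storm data (values are invented for illustration)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(3, 0, 1, 2, 0),
  INJURIES   = c(10, 1, 4, 0, 2),
  stringsAsFactors = FALSE
)

# Sum both columns for each event type in one call
health_totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Order from most to fewest fatalities, as in the plyr version
health_totals <- health_totals[order(-health_totals$FATALITIES), ]
health_totals
```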
To isolate the sources of economic damage, the EVTYPE column is combined with PROPDMG and PROPDMGEXP into one new table, and with CROPDMG and CROPDMGEXP into another.
propdmgdata<-data[,c("EVTYPE","PROPDMG","PROPDMGEXP")]
cropdmgdata<-data[,c("EVTYPE","CROPDMG","CROPDMGEXP")]
Crop damage is found by summing the CROPDMG values by EVTYPE and CROPDMGEXP. The codes in the CROPDMGEXP column are then replaced by the numbers they stand for (only k/K, m/M and b/B are used; all other codes are set to 0), and the total damage cost is obtained by multiplying CROPDMG by the new CROPDMGEXP multiplier. Again, the resulting table is sorted by descending total damage cost.
cropdmgcost<-aggregate(CROPDMG~EVTYPE+CROPDMGEXP, cropdmgdata, FUN=sum)
# Map exponent codes to multipliers: k/K = 1e3, m/M = 1e6, b/B = 1e9, every other code = 0
multiplier<-c(K=1e3, M=1e6, B=1e9)
cropdmgcost$CROPDMGEXP<-unname(multiplier[toupper(cropdmgcost$CROPDMGEXP)])
cropdmgcost$CROPDMGEXP[is.na(cropdmgcost$CROPDMGEXP)]<-0
cropdmgcost$SubTotal<- cropdmgcost$CROPDMG * cropdmgcost$CROPDMGEXP
cropdmgtotal<-aggregate(SubTotal~EVTYPE,cropdmgcost,FUN=sum)
cropdmgtotal<-cropdmgtotal[order(-cropdmgtotal$SubTotal),]
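Summing CROPDMG within each (EVTYPE, CROPDMGEXP) group before multiplying gives the same per-event total as multiplying row by row and then summing, because every row in such a group shares one multiplier and multiplication distributes over addition. A small check on invented data, assuming the same k/m/b mapping:

```r
# Invented rows: the two TORNADO "K" rows share a multiplier, so they can be summed first
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "TORNADO", "HAIL"),
  CROPDMG    = c(5, 2.5, 1, 4),
  CROPDMGEXP = c("K", "K", "M", "K"),
  stringsAsFactors = FALSE
)
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Route 1: multiply each row, then total per event type
toy$cost <- toy$CROPDMG * unname(multiplier[toy$CROPDMGEXP])
row_level <- aggregate(cost ~ EVTYPE, data = toy, FUN = sum)

# Route 2 (the approach used above): sum per (EVTYPE, exponent) group, then multiply and total
grouped <- aggregate(CROPDMG ~ EVTYPE + CROPDMGEXP, data = toy, FUN = sum)
grouped$SubTotal <- grouped$CROPDMG * unname(multiplier[grouped$CROPDMGEXP])
group_level <- aggregate(SubTotal ~ EVTYPE, data = grouped, FUN = sum)

all.equal(row_level$cost, group_level$SubTotal)
```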
Property damage is found by summing the PROPDMG values by EVTYPE and PROPDMGEXP. As in the case of crop damage, the codes in the PROPDMGEXP column are replaced by their numeric values: only k/K, m/M and b/B are used, and all other codes are set to 0. The total damage cost is found by multiplying PROPDMG by the new PROPDMGEXP multiplier, and the resulting table is sorted by total cost in descending order.
propdmgcost<-aggregate(PROPDMG~EVTYPE+PROPDMGEXP, propdmgdata, FUN=sum)
# Map exponent codes to multipliers: k/K = 1e3, m/M = 1e6, b/B = 1e9, every other code = 0
multiplier<-c(K=1e3, M=1e6, B=1e9)
propdmgcost$PROPDMGEXP<-unname(multiplier[toupper(propdmgcost$PROPDMGEXP)])
propdmgcost$PROPDMGEXP[is.na(propdmgcost$PROPDMGEXP)]<-0
propdmgcost$SubTotal<- propdmgcost$PROPDMG * propdmgcost$PROPDMGEXP
propdmgtotal<-aggregate(SubTotal~EVTYPE,propdmgcost,FUN=sum)
propdmgtotal<-propdmgtotal[order(-propdmgtotal$SubTotal),]
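Since the crop and property steps differ only in their column names, they could be folded into one helper. The function `damage_by_event()` and its arguments are hypothetical, not part of the original analysis; it is a sketch that applies the same rule (k/K, m/M and b/B kept, all other codes counted as 0) and returns totals sorted in descending order.

```r
# Hypothetical helper: total damage cost per event type for one value/exponent column pair
damage_by_event <- function(df, value_col, exp_col) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  multiplier <- unname(lookup[toupper(df[[exp_col]])])
  multiplier[is.na(multiplier)] <- 0  # unknown or empty codes contribute nothing
  cost <- df[[value_col]] * multiplier
  totals <- aggregate(cost, by = list(EVTYPE = df$EVTYPE), FUN = sum)
  names(totals)[2] <- "SubTotal"
  totals[order(-totals$SubTotal), ]
}

# Invented example rows
toy <- data.frame(
  EVTYPE     = c("FLOOD", "HAIL", "FLOOD", "HAIL"),
  PROPDMG    = c(2, 50, 1.5, 3),
  PROPDMGEXP = c("B", "K", "M", "h"),
  stringsAsFactors = FALSE
)
damage_by_event(toy, "PROPDMG", "PROPDMGEXP")
```

Applied as `damage_by_event(cropdmgdata, "CROPDMG", "CROPDMGEXP")`, it should give the same per-event totals as `cropdmgtotal`, though row names and the order of tied rows may differ.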
TORNADO is the top cause of both deaths and injuries.
The top 10 event types by number of deaths are:
orderedhealthharmfatal[1:10,c(1,2)]
##             EVTYPE fatalities
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
## 856      TSTM WIND        504
## 170          FLOOD        470
## 585    RIP CURRENT        368
## 359      HIGH WIND        248
## 19       AVALANCHE        224
The top 10 event types by number of injuries are:
orderedhealthharminjury[1:10,c(1,3)]
##                EVTYPE injuries
## 834           TORNADO    91346
## 856         TSTM WIND     6957
## 170             FLOOD     6789
## 130    EXCESSIVE HEAT     6525
## 464         LIGHTNING     5230
## 275              HEAT     2100
## 427         ICE STORM     1975
## 153       FLASH FLOOD     1777
## 760 THUNDERSTORM WIND     1488
## 244              HAIL     1361
The following plots show the top 10 event types by deaths and by injuries.
par(mfrow=c(2,1))
barplot(orderedhealthharmfatal[1:10,]$fatalities, names.arg=orderedhealthharmfatal[1:10,]$EVTYPE,main="Deaths from Events", xlab="EventType", ylab="Deaths")
barplot(orderedhealthharminjury[1:10,]$injuries, names.arg=orderedhealthharminjury[1:10,]$EVTYPE,main="Injuries from Events", xlab="EventType", ylab="Injuries")
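With ten event names under each bar, the default horizontal labels are likely to overlap or be partly dropped. One option, sketched here on invented counts, is to rotate the labels with `las = 2`, shrink them with `cex.names`, and widen the bottom margin with `par(mar = ...)`. The values and margin sizes are illustrative assumptions, the x-axis title is omitted because it would collide with rotated labels, and `pdf(NULL)` is used only so the example runs without a screen.

```r
# Invented counts, purely to demonstrate the layout options
counts <- c(TORNADO = 120, "EXCESSIVE HEAT" = 45, "FLASH FLOOD" = 30)

pdf(NULL)                            # null device so the example runs anywhere
old_par <- par(mar = c(9, 4, 3, 1))  # extra bottom margin for the rotated labels
mids <- barplot(counts, names.arg = names(counts), las = 2, cex.names = 0.7,
                main = "Deaths from Events", ylab = "Deaths")
par(old_par)                         # restore the previous margins
invisible(dev.off())
```

In the report itself, the `pdf(NULL)` and `dev.off()` lines would be dropped so that the plot goes to the usual output device.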
The three event types with the greatest property damage are FLOOD, HURRICANE/TYPHOON and TORNADO.
propdmgtotal[1:10,]
##                EVTYPE     SubTotal
## 170             FLOOD 144657709800
## 411 HURRICANE/TYPHOON  69305840000
## 834           TORNADO  56937160480
## 670       STORM SURGE  43323536000
## 153       FLASH FLOOD  16140811510
## 244              HAIL  15732266720
## 402         HURRICANE  11868319010
## 848    TROPICAL STORM   7703890550
## 972      WINTER STORM   6688497250
## 359         HIGH WIND   5270046260
The three event types with the greatest crop damage are DROUGHT, FLOOD and RIVER FLOOD.
cropdmgtotal[1:10,]
##                EVTYPE    SubTotal
## 95            DROUGHT 13972566000
## 170             FLOOD  5661968450
## 590       RIVER FLOOD  5029459000
## 427         ICE STORM  5022113500
## 244              HAIL  3025954450
## 402         HURRICANE  2741910000
## 411 HURRICANE/TYPHOON  2607872800
## 153       FLASH FLOOD  1421317100
## 140      EXTREME COLD  1292973000
## 212      FROST/FREEZE  1094086000
The following plots show the top 10 event types by damage cost to crops and to property.
par(mfrow=c(2,1))
barplot(cropdmgtotal[1:10,]$SubTotal, names.arg=cropdmgtotal[1:10,]$EVTYPE,main="Damage to Crops", xlab="EventType", ylab="Damage Cost($)")
barplot(propdmgtotal[1:10,]$SubTotal, names.arg=propdmgtotal[1:10,]$EVTYPE,main="Damage to Property", xlab="EventType", ylab="Damage Cost($)")