Peer Assessment 2 - Storm Data Analysis

This report answers two questions.

  1. Which types of events in the U.S. are most harmful to population health?

  2. Which types of events in the U.S. have the greatest economic consequences?

Synopsis

We analysed the data from the U.S. National Oceanic and Atmospheric Administration storm database (https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2) to answer the above two questions.

The analysis shows that the natural event with the greatest impact on population health is TORNADOES. Next is EXCESSIVE HEAT (for instance, the Chicago heatwave of 1995 had 583 fatalities).

For economic consequences, the events with the highest impact are TORNADOES followed by FLOODS, although crops were damaged mostly by HAIL.

The data indicates that Hurricane Katrina and the California floods of 2006 caused the most economic damage in recent times (a combined cost of over 180 billion dollars between the two).

We notice that deaths attributed to Hurricane Katrina seem to be under-reported in this data set.

Data Processing

Having loaded the necessary packages, we read in the data file. We changed the variable names from capitals to lowercase to aid legibility and extracted the columns we would concentrate on. When looking at economic consequences, we followed the course forum advice of translating the “exp” letter codes into numeric multipliers.
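
As a minimal sketch of the loading step, and assuming a bzip2-compressed CSV like the one linked in the synopsis, base R's read.csv can open the compressed file without unpacking it first. The toy file and the `storm` name below are illustrative only and are not part of the analysis.

```r
# Write a tiny, made-up stand-in for the storm file as bzip2-compressed CSV
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"),
                     FATALITIES = c(2, 0)),
          con, row.names = FALSE)
close(con)

# read.csv detects the compression and reads the file directly
storm <- read.csv(tmp, stringsAsFactors = FALSE)

# Lowercase the column names, as done for the real data
names(storm) <- tolower(names(storm))
storm
```

Passing stringsAsFactors = FALSE keeps text columns as character vectors regardless of the R version in use.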

library(knitr)
library(plyr)
library(dplyr)
library(ggplot2)


# Read in the data file

data<-read.csv("repdata%2Fdata%2FStormData.csv",sep=",", header=TRUE)

# Check data size 

dim(data)
## [1] 902297     37
# Make the variable names more readable 

names(data)<-tolower(names(data))

# Extract the columns we want to work with

sdata<-data[,c("evtype","bgn_date","fatalities","injuries","propdmg","propdmgexp","cropdmg","cropdmgexp")]

# Assemble population damage (injuries + fatalities) according to event type

population<-with(sdata, aggregate(injuries+fatalities~evtype, data=sdata, FUN=sum))

# Change "injuries+fatalities" to "Combined_casualties"
names(population)[2]<-"Combined_casualties"

#Order population on casualties

ordered_population<-population[order(population$Combined_casualties, decreasing=TRUE),]

#Extract top 6 event types

top_population<-head(ordered_population,6)
top_population
##             evtype Combined_casualties
## 834        TORNADO               96979
## 130 EXCESSIVE HEAT                8428
## 856      TSTM WIND                7461
## 170          FLOOD                7259
## 464      LIGHTNING                6046
## 275           HEAT                3037
# Now plot this 

 ggplot(top_population, aes(evtype, Combined_casualties,fill=evtype))+geom_bar(stat="Identity")+ylab("Total casualties")+xlab("Event Type")+ggtitle("Which event types are most harmful to population health")

# Now find the event with the highest number of fatalities.
# (named max_fatal_row so that base R's max() is not masked)
max_fatal_row<-which.max(sdata$fatalities)
sdata[max_fatal_row, ]
##        evtype          bgn_date fatalities injuries propdmg propdmgexp
## 198704   HEAT 7/12/1995 0:00:00        583        0       0           
##        cropdmg cropdmgexp
## 198704       0
# Also we will find the overall numbers of injuries and fatalities

sum(sdata$injuries)
## [1] 140528
sum(sdata$fatalities)
## [1] 15145
# To gain a clearer picture, we look at injuries separately

injured<-with(sdata, aggregate(injuries~evtype,data=sdata, FUN=sum))
ordered_injured<-injured[order(injured$injuries,decreasing=TRUE),]
head(ordered_injured,6)
##             evtype injuries
## 834        TORNADO    91346
## 856      TSTM WIND     6957
## 170          FLOOD     6789
## 130 EXCESSIVE HEAT     6525
## 464      LIGHTNING     5230
## 275           HEAT     2100
# and we look at fatalities separately

fatal<-with(sdata, aggregate(fatalities~evtype, data=sdata, FUN=sum))
ordered_fatal<-fatal[order(fatal$fatalities, decreasing=TRUE),]
head(ordered_fatal,6)
##             evtype fatalities
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
## 856      TSTM WIND        504
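
The aggregate, order, head pattern used three times above can be wrapped in a small helper. This is a sketch only: `top_by_event` and the toy data frame are hypothetical names introduced for illustration.

```r
# Sum one numeric column per event type and return the n largest totals
top_by_event <- function(df, column, n = 6) {
  totals <- aggregate(df[[column]], by = list(evtype = df$evtype), FUN = sum)
  names(totals)[2] <- column
  head(totals[order(totals[[column]], decreasing = TRUE), ], n)
}

# Made-up rows standing in for the storm data
toy <- data.frame(evtype     = c("TORNADO", "HEAT", "TORNADO", "FLOOD"),
                  injuries   = c(10, 3, 5, 7),
                  fatalities = c(1, 4, 0, 2))

top_injuries   <- top_by_event(toy, "injuries", n = 2)
top_fatalities <- top_by_event(toy, "fatalities", n = 2)
top_injuries
```

With the real data the same helper could be called once for injuries and once for fatalities.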

Results for Question 1. Which types of events in the U.S. are most harmful to population health?

We see that the event type most injurious to population health is “tornado”, followed by excessive heat.

Event type        Injuries       Fatalities

Tornado           91346 (65%)    5633 (37%)

Excessive Heat    6525 (5%)      1903 (13%)
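
The percentages in the table are each event type's share of the overall totals printed earlier (140528 injuries and 15145 fatalities). A short sketch of that arithmetic, with the counts re-entered by hand and `shares` as an illustrative name:

```r
# Overall totals taken from the sum() output above
total_injuries   <- 140528
total_fatalities <- 15145

# Per-event counts taken from the injuries and fatalities tables above
shares <- data.frame(evtype     = c("TORNADO", "EXCESSIVE HEAT"),
                     injuries   = c(91346, 6525),
                     fatalities = c(5633, 1903))

# Share of the overall total, rounded to whole percent
shares$injury_pct   <- round(100 * shares$injuries / total_injuries)
shares$fatality_pct <- round(100 * shares$fatalities / total_fatalities)
shares
```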
# We now translate the "propdmgexp" letter codes to the appropriate numerical value

prop_damage_translate<-mapvalues(sdata$propdmgexp,c("H","h","K","M","m","B",
"+","-","?","0","1","2","3","4","5","6","7","8",""), c(1e2,1e2,1e3,1e6,1e6,1e9,1,0,0,10,10,10,10,10,10,10,10,10,0))

# and we translate the "cropdmgexp" letter codes to the appropriate numerical value

cropdmg_translate<-mapvalues(sdata$cropdmgexp, c("K","k","M","m","B","?","0","2",""),
c(1e3,1e3,1e6,1e6,1e9,0,10,10,0))

# We introduce a new variable to store the correct value of the property damage

sdata$proptotaldmg<-as.numeric(prop_damage_translate)*sdata$propdmg

# and we introduce a corresponding new variable to hold the correct value of the crop damage

sdata$croptotaldmg<-as.numeric(cropdmg_translate)*sdata$cropdmg

# We now look separately at crop and property damage, before looking at the total picture

# First, we look at the 6 top causes of crop damage
crop<-with(sdata, aggregate(croptotaldmg~evtype, data=sdata, FUN=sum))
ordered_crop<-crop[order(crop$croptotaldmg, decreasing=TRUE),]
top_cropdmg<-head(ordered_crop,6)
top_cropdmg
##                evtype croptotaldmg
## 244              HAIL    2320785.0
## 153       FLASH FLOOD     718045.2
## 170             FLOOD     677650.9
## 856         TSTM WIND     437255.7
## 834           TORNADO     400069.5
## 760 THUNDERSTORM WIND     267514.2
# We shall plot this

ggplot(top_cropdmg, aes(evtype, croptotaldmg,fill=evtype))+geom_bar(stat="Identity")+ylab("Crop Damage")+xlab("Event Type")+ggtitle("Which event caused the most crop damage")

# Secondly, we look at the 6 top causes of property damage
prop<-with(sdata, aggregate(proptotaldmg~evtype, data=sdata, FUN=sum))
ordered_prop<-prop[order(prop$proptotaldmg, decreasing=TRUE),]
head(ordered_prop,6)
##                evtype proptotaldmg
## 834           TORNADO     19321050
## 153       FLASH FLOOD      8532395
## 856         TSTM WIND      8018781
## 170             FLOOD      5420630
## 760 THUNDERSTORM WIND      5263238
## 244              HAIL      4144327
# Now we look at the total damage picture

# We introduce a new variable containing the total economic damage

sdata$totaldmg<-sdata$proptotaldmg+sdata$croptotaldmg

# We calculate the total economic damage per event type
totaldamage<-aggregate(totaldmg~evtype, data=sdata, FUN=sum)

# and we sort this

ordered_totaldamage<-totaldamage[order(totaldamage$totaldmg, decreasing=TRUE),]

# and select the top six event types

top_totaldmg<-head(ordered_totaldamage,6)
top_totaldmg
##                evtype totaldmg
## 834           TORNADO 19721119
## 153       FLASH FLOOD  9250440
## 856         TSTM WIND  8456036
## 244              HAIL  6465112
## 170             FLOOD  6098281
## 760 THUNDERSTORM WIND  5530752
# We plot this to see which event type contributes most to economic damage

ggplot(top_totaldmg, aes(evtype, totaldmg, fill=evtype))+geom_bar(stat="Identity")+ylab("Total Damage")+xlab("Event Type")+ggtitle("Which event caused the greatest economic damage")

Results for Question 2. Which types of events in the U.S. have the greatest economic consequences?

We see that the event type that causes the greatest economic damage is “tornado”, followed by flood events (Flash Flood + Flood). Hail causes the most crop damage.
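
One caveat when converting the exponent codes to multipliers: if the “exp” column is a factor (the default for read.csv in R versions before 4.0), mapvalues returns a factor, and as.numeric() on a factor returns its integer level codes rather than the mapped values. The row printed for the 2006 flood appears consistent with this, since a propdmg of 115 with code B gives a proptotaldmg of 460 rather than 1.15e11. A minimal sketch of a lookup that avoids the issue is shown below. `exp_multiplier` and `to_dollars` are hypothetical names, and unlike the mapping above the sketch assumes that only the letter codes H, K, M and B carry meaning and treats every other code as zero.

```r
# Assumed multipliers for the letter codes; all other codes are treated as 0
exp_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

to_dollars <- function(value, exp_code) {
  # Convert a possible factor to text first, so level codes are never used
  code <- toupper(as.character(exp_code))
  mult <- unname(exp_multiplier[code])
  mult[is.na(mult)] <- 0   # assumption: blank or unrecognised codes count as zero
  value * mult
}

# Made-up values, with the codes deliberately supplied as a factor
dollars <- to_dollars(c(115, 2.5, 10), factor(c("B", "k", "")))
dollars
```

Looking the codes up by name keeps the conversion independent of how the column was read in.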

We finish the analysis by looking at the individual events with the highest economic cost.

# Find events with economic cost in billions

billion_list<-which(sdata$propdmgexp=="B")
data_billions<-sdata[billion_list,]
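# Row with the largest property damage among the billion-dollar events
# (named max_billion_row so that base R's max() is not masked)
max_billion_row<-which.max(data_billions$propdmg)
data_billions[max_billion_row,]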
##        evtype         bgn_date fatalities injuries propdmg propdmgexp
## 605953  FLOOD 1/1/2006 0:00:00          0        0     115          B
##        cropdmg cropdmgexp proptotaldmg croptotaldmg totaldmg
## 605953    32.5          M          460        162.5    622.5
# Sort the events with a billion dollar cost
data_billions_ordered<-arrange(data_billions, propdmg)

# Inspect the most expensive events
tail(data_billions_ordered)
##               evtype           bgn_date fatalities injuries propdmg
## 35 HURRICANE/TYPHOON  8/28/2005 0:00:00          0        0    7.35
## 36 HURRICANE/TYPHOON 10/24/2005 0:00:00          5        0   10.00
## 37       STORM SURGE  8/29/2005 0:00:00          0        0   11.26
## 38 HURRICANE/TYPHOON  8/28/2005 0:00:00          0        0   16.93
## 39       STORM SURGE  8/29/2005 0:00:00          0        0   31.30
## 40             FLOOD   1/1/2006 0:00:00          0        0  115.00
##    propdmgexp cropdmg cropdmgexp proptotaldmg croptotaldmg totaldmg
## 35          B     0.0                   29.40          0.0    29.40
## 36          B     0.0                   40.00          0.0    40.00
## 37          B     0.0                   45.04          0.0    45.04
## 38          B     0.0                   67.72          0.0    67.72
## 39          B     0.0                  125.20          0.0   125.20
## 40          B    32.5          M       460.00        162.5   622.50
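
The combined figure of over 180 billion dollars quoted in the synopsis can be checked against the rows printed above. The sketch below re-enters the six propdmg values by hand and assumes that the rows dated 8/28/2005 and 8/29/2005 belong to Hurricane Katrina; `top_events` is a hypothetical name.

```r
# propdmg values (in billions of dollars) copied from the tail() output above
top_events <- data.frame(
  evtype = c("HURRICANE/TYPHOON", "HURRICANE/TYPHOON", "STORM SURGE",
             "HURRICANE/TYPHOON", "STORM SURGE", "FLOOD"),
  bgn_date = c("8/28/2005", "10/24/2005", "8/29/2005",
               "8/28/2005", "8/29/2005", "1/1/2006"),
  propdmg_billions = c(7.35, 10.00, 11.26, 16.93, 31.30, 115.00)
)

# Assumption: rows dated 28 and 29 August 2005 are attributed to Katrina
katrina    <- top_events$bgn_date %in% c("8/28/2005", "8/29/2005")
flood_2006 <- top_events$bgn_date == "1/1/2006"

# Combined property damage of the two events, in billions of dollars
combined <- sum(top_events$propdmg_billions[katrina | flood_2006])
combined
```

Only the six rows shown are included, so the sum covers property damage for those rows alone.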

Conclusion

We see that the California floods of 2006 and Hurricane Katrina in August 2005 were the most expensive recent events.