This report answers two questions.
Which types of events in the U.S. are most harmful to population health?
Which types of events in the U.S. have the greatest economic consequences?
We analysed the data from the U.S. National Oceanic and Atmospheric Administration storm database (https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2) to answer the above two questions.
The analysis shows that the event type with the greatest impact on population health is TORNADOES, followed by EXCESSIVE HEAT (for instance, the Chicago heatwave of 1995 caused 583 fatalities).
For economic consequences, the events with the highest impact are TORNADOES followed by FLOODS, although crops were damaged mostly by HAIL.
The data indicates that Hurricane Katrina and the California floods of 2006 caused the most economic damage in recent times (a combined property damage of over 180 billion dollars between the two).
We note that deaths attributed to Hurricane Katrina seem to be under-reported in this data set: its largest damage records list no fatalities at all.
Having loaded the necessary packages, we read in the compressed (.bz2) file directly. We changed variable names from capitals to lowercase to aid legibility, and extracted the columns of data that we would concentrate on. When looking at economic consequences, we followed the course forum advice of translating the “exp” values (propdmgexp and cropdmgexp) from letter codes to numeric multipliers.
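The letter-code translation can also be written in base R with a named lookup vector, which sidesteps factor handling entirely. A minimal sketch, assuming the same code-to-multiplier mapping this report uses for property and crop damage; the names exp_multiplier and translate_exp are hypothetical:

```r
# Hypothetical lookup table: exponent code -> multiplier, combining the
# codes used for property and crop damage in this report
exp_multiplier <- c(
  "H" = 1e2, "h" = 1e2, "K" = 1e3, "k" = 1e3,
  "M" = 1e6, "m" = 1e6, "B" = 1e9,
  "+" = 1, "-" = 0, "?" = 0,
  "0" = 10, "1" = 10, "2" = 10, "3" = 10, "4" = 10,
  "5" = 10, "6" = 10, "7" = 10, "8" = 10
)

# Hypothetical helper: map a vector of exponent codes to numeric multipliers
translate_exp <- function(codes) {
  # as.character() makes factor input behave exactly like character input
  mult <- unname(exp_multiplier[as.character(codes)])
  # Codes missing from the table, including the empty string, count as 0
  mult[is.na(mult)] <- 0
  mult
}

translate_exp(c("K", "B", "", "5"))   # 1e3, 1e9, 0, 10
25 * translate_exp("K")               # 25000 dollars
```

Because the lookup is done on character codes, the result is already numeric and no as.numeric() conversion step is needed afterwards.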
library(knitr)
library(plyr)
library(dplyr)
library(ggplot2)
# Read in the compressed data file; read.csv() decompresses .bz2 files transparently
data<-read.csv("repdata%2Fdata%2FStormData.csv.bz2", sep=",", header=TRUE)
# Check data size
dim(data)
## [1] 902297 37
# Make the variable names more readable
names(data)<-tolower(names(data))
# Extract the columns we want to work with
sdata<-data[,c("evtype","bgn_date","fatalities","injuries","propdmg","propdmgexp","cropdmg","cropdmgexp")]
# Assemble population damage (injuries + fatalities) according to event type
population<-aggregate(injuries+fatalities~evtype, data=sdata, FUN=sum)
# Change "injuries+fatalities" to "Combined_casualties"
names(population)[2]<-"Combined_casualties"
# Order event types by combined casualties
ordered_population<-population[order(population$Combined_casualties, decreasing=TRUE),]
# Extract top 6 event types
top_population<-head(ordered_population,6)
top_population
## evtype Combined_casualties
## 834 TORNADO 96979
## 130 EXCESSIVE HEAT 8428
## 856 TSTM WIND 7461
## 170 FLOOD 7259
## 464 LIGHTNING 6046
## 275 HEAT 3037
# Now plot this
ggplot(top_population, aes(evtype, Combined_casualties, fill=evtype))+geom_bar(stat="identity")+
  ylab("Total casualties")+xlab("Event Type")+ggtitle("Which event types are most harmful to population health?")
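The aggregate, order and head steps above are repeated for every measure in this report. A minimal sketch of folding them into one function; top_events is a hypothetical helper, and the toy rows are invented and only mimic the evtype, injuries and fatalities columns of sdata:

```r
# Hypothetical helper: total one numeric column by event type, keep the n largest
top_events <- function(df, value_col, n = 6) {
  totals <- aggregate(df[[value_col]], by = list(evtype = df$evtype), FUN = sum)
  # aggregate() gives the summed column a generic name, so rename it
  names(totals)[2] <- value_col
  # Sort descending and keep the first n rows, as done by hand in this report
  head(totals[order(totals[[value_col]], decreasing = TRUE), ], n)
}

# Toy data (invented for illustration) with the same column names as sdata
toy <- data.frame(
  evtype = c("TORNADO", "HAIL", "TORNADO", "FLOOD"),
  injuries = c(10, 1, 5, 3),
  fatalities = c(2, 0, 1, 0)
)
toy$casualties <- toy$injuries + toy$fatalities
top_events(toy, "casualties", n = 2)   # TORNADO (18), then FLOOD (3)
```

On the real data, adding sdata$casualties the same way and calling top_events(sdata, "casualties") should give the same ranking as top_population.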
# Now find the event with the highest number of fatalities
idx<-which.max(sdata$fatalities)
sdata[idx, ]
## evtype bgn_date fatalities injuries propdmg propdmgexp
## 198704 HEAT 7/12/1995 0:00:00 583 0 0
## cropdmg cropdmgexp
## 198704 0
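which.max() reports a single row, the first one that attains the maximum, so ties and close runners-up stay hidden. A minimal sketch of listing the n deadliest individual records with order() instead; the toy rows below are invented and only mimic the evtype and fatalities columns of sdata:

```r
# Toy records (invented for illustration) with the same columns as sdata
toy <- data.frame(
  evtype = c("HEAT", "TORNADO", "FLOOD", "HAIL"),
  fatalities = c(5, 9, 0, 9)
)

# which.max() returns one index: the first row holding the maximum (row 2 here)
toy[which.max(toy$fatalities), ]

# order() ranks all rows, so tied or near-maximal records are visible together
deadliest <- head(toy[order(toy$fatalities, decreasing = TRUE), ], 3)
deadliest
```

On the real data the same pattern would be head(sdata[order(sdata$fatalities, decreasing=TRUE), ], 6).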
# Also we will find the overall numbers of injuries and fatalities
sum(sdata$injuries)
## [1] 140528
sum(sdata$fatalities)
## [1] 15145
# To gain a clearer picture, we look at injuries separately
injured<-aggregate(injuries~evtype, data=sdata, FUN=sum)
ordered_injured<-injured[order(injured$injuries,decreasing=TRUE),]
head(ordered_injured,6)
## evtype injuries
## 834 TORNADO 91346
## 856 TSTM WIND 6957
## 170 FLOOD 6789
## 130 EXCESSIVE HEAT 6525
## 464 LIGHTNING 5230
## 275 HEAT 2100
# and we look at fatalities separately
fatal<-aggregate(fatalities~evtype, data=sdata, FUN=sum)
ordered_fatal<-fatal[order(fatal$fatalities, decreasing=TRUE),]
head(ordered_fatal,6)
## evtype fatalities
## 834 TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 275 HEAT 937
## 464 LIGHTNING 816
## 856 TSTM WIND 504
We see that the event type most injurious to population health is “tornado”, followed by excessive heat.
| Event type | Injuries (share of total) | Fatalities (share of total) |
|---|---|---|
| Tornado | 91346 (65%) | 5633 (37%) |
| Excessive Heat | 6525 (5%) | 1903 (13%) |
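The percentages in the table are each event type's share of the all-event totals printed earlier (140528 injuries, 15145 fatalities). A quick check of that arithmetic, using only numbers from the output above:

```r
# All-event totals, from sum(sdata$injuries) and sum(sdata$fatalities) above
total_injuries   <- 140528
total_fatalities <- 15145

# Counts for the two leading event types, from the injury and fatality tables
injuries   <- c(TORNADO = 91346, EXCESSIVE_HEAT = 6525)
fatalities <- c(TORNADO = 5633,  EXCESSIVE_HEAT = 1903)

round(100 * injuries / total_injuries)     # TORNADO 65, EXCESSIVE_HEAT 5
round(100 * fatalities / total_fatalities) # TORNADO 37, EXCESSIVE_HEAT 13
```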
# We now translate the "propdmgexp" letter codes to the appropriate numerical value
prop_damage_translate<-mapvalues(sdata$propdmgexp,c("H","h","K","M","m","B",
"+","-","?","0","1","2","3","4","5","6","7","8",""), c(1e2,1e2,1e3,1e6,1e6,1e9,1,0,0,10,10,10,10,10,10,10,10,10,0))
# and we translate the "cropdmgexp" letter codes to the appropriate numerical value
cropdmg_translate<-mapvalues(sdata$cropdmgexp, c("K","k","M","m","B","?","0","2",""),
c(1e3,1e3,1e6,1e6,1e9,0,10,10,0))
# We introduce a new variable to store the correct value of the property damage;
# as.character() comes first so that factor labels, not internal level codes, are converted
sdata$proptotaldmg<-as.numeric(as.character(prop_damage_translate))*sdata$propdmg
# and we introduce a corresponding new variable to hold the correct value of the crop damage
sdata$croptotaldmg<-as.numeric(as.character(cropdmg_translate))*sdata$cropdmg
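If propdmgexp and cropdmgexp are read as factors (the read.csv() default before R 4.0.0), plyr::mapvalues() relabels the factor's levels, and as.numeric() on the result returns the internal level indices rather than the multipliers. The billion-dollar FLOOD record inspected at the end of this analysis appears to show this: propdmg 115 with exponent B yields proptotaldmg 460, which is 115 × 4 rather than 115 × 1e9. A minimal sketch of the pitfall and the usual fix, with invented values:

```r
# Multipliers held as a factor, as after mapvalues() relabels factor levels
mult <- factor(c("1000", "1e+09", "1e+06", "1000"))

# Wrong: as.numeric() on a factor returns the internal level indices
as.numeric(mult)

# Right: convert the labels to character first, then to numeric
as.numeric(as.character(mult))   # 1e3, 1e9, 1e6, 1e3

# Applying the multipliers to invented damage mantissas
propdmg <- c(25, 115, 2.5, 10)
propdmg * as.numeric(as.character(mult))   # 25000, 1.15e11, 2.5e6, 10000
```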
# We now look separately at crop and property damage, before looking at the total picture
# First, we look at the 6 top causes of crop damage
crop<-aggregate(croptotaldmg~evtype, data=sdata, FUN=sum)
ordered_crop<-crop[order(crop$croptotaldmg, decreasing=TRUE),]
top_cropdmg<-head(ordered_crop,6)
top_cropdmg
## evtype croptotaldmg
## 244 HAIL 2320785.0
## 153 FLASH FLOOD 718045.2
## 170 FLOOD 677650.9
## 856 TSTM WIND 437255.7
## 834 TORNADO 400069.5
## 760 THUNDERSTORM WIND 267514.2
# We shall plot this
ggplot(top_cropdmg, aes(evtype, croptotaldmg, fill=evtype))+geom_bar(stat="identity")+
  ylab("Crop Damage")+xlab("Event Type")+ggtitle("Which event types caused the most crop damage?")
# Secondly, we look at the 6 top causes of property damage
prop<-aggregate(proptotaldmg~evtype, data=sdata, FUN=sum)
ordered_prop<-prop[order(prop$proptotaldmg, decreasing=TRUE),]
head(ordered_prop,6)
## evtype proptotaldmg
## 834 TORNADO 19321050
## 153 FLASH FLOOD 8532395
## 856 TSTM WIND 8018781
## 170 FLOOD 5420630
## 760 THUNDERSTORM WIND 5263238
## 244 HAIL 4144327
# Now we look at the total damage picture
# We introduce a new variable containing the total economic damage
sdata$totaldmg<-sdata$proptotaldmg+sdata$croptotaldmg
# We calculate the total economic damage per event type
totaldamage<-aggregate(totaldmg~evtype, data=sdata, FUN=sum)
# and we sort this
ordered_totaldamage<-totaldamage[order(totaldamage$totaldmg, decreasing=TRUE),]
# and select the top six event types
top_totaldmg<-head(ordered_totaldamage,6)
top_totaldmg
## evtype totaldmg
## 834 TORNADO 19721119
## 153 FLASH FLOOD 9250440
## 856 TSTM WIND 8456036
## 244 HAIL 6465112
## 170 FLOOD 6098281
## 760 THUNDERSTORM WIND 5530752
# We plot this to see which event type contributes most to economic damage
ggplot(top_totaldmg, aes(evtype, totaldmg, fill=evtype))+geom_bar(stat="identity")+
  ylab("Total Damage")+xlab("Event Type")+ggtitle("Which event types caused the greatest economic damage?")
We see that the event type that causes the greatest economic damage is “tornado”, followed by flood events (Flash Flood + Flood). Hail causes the most crop damage.
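Grouping flash floods and floods into one category is done by hand in the sentence above. A minimal sketch of doing it in code, using the top-six totals printed earlier as a named vector; the combined label "FLOOD (ALL)" is a hypothetical choice, not an NOAA event type:

```r
# Total damage per event type, copied from the top_totaldmg output above
totals <- c(
  "TORNADO" = 19721119, "FLASH FLOOD" = 9250440, "TSTM WIND" = 8456036,
  "HAIL" = 6465112, "FLOOD" = 6098281, "THUNDERSTORM WIND" = 5530752
)

# Hypothetical grouping: both flood types are pooled under one label
group <- ifelse(names(totals) %in% c("FLASH FLOOD", "FLOOD"), "FLOOD (ALL)", names(totals))

# Sum within each group, then rank the groups
grouped <- sort(vapply(split(totals, group), sum, numeric(1)), decreasing = TRUE)
grouped   # TORNADO first, then FLOOD (ALL) at 15348721
```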
We finish the analysis by looking at the individual events with the highest economic cost.
# Find events whose property damage is recorded in billions (propdmgexp "B")
billion_list<-which(sdata$propdmgexp=="B")
data_billions<-sdata[billion_list,]
idx<-which.max(data_billions$propdmg)
data_billions[idx,]
## evtype bgn_date fatalities injuries propdmg propdmgexp
## 605953 FLOOD 1/1/2006 0:00:00 0 0 115 B
## cropdmg cropdmgexp proptotaldmg croptotaldmg totaldmg
## 605953 32.5 M 460 162.5 622.5
# Sort the billion-dollar events by property damage (ascending)
data_billions_ordered<-arrange(data_billions, propdmg)
# Inspect the most expensive events
tail(data_billions_ordered)
## evtype bgn_date fatalities injuries propdmg
## 35 HURRICANE/TYPHOON 8/28/2005 0:00:00 0 0 7.35
## 36 HURRICANE/TYPHOON 10/24/2005 0:00:00 5 0 10.00
## 37 STORM SURGE 8/29/2005 0:00:00 0 0 11.26
## 38 HURRICANE/TYPHOON 8/28/2005 0:00:00 0 0 16.93
## 39 STORM SURGE 8/29/2005 0:00:00 0 0 31.30
## 40 FLOOD 1/1/2006 0:00:00 0 0 115.00
## propdmgexp cropdmg cropdmgexp proptotaldmg croptotaldmg totaldmg
## 35 B 0.0 29.40 0.0 29.40
## 36 B 0.0 40.00 0.0 40.00
## 37 B 0.0 45.04 0.0 45.04
## 38 B 0.0 67.72 0.0 67.72
## 39 B 0.0 125.20 0.0 125.20
## 40 B 32.5 M 460.00 162.5 622.50
We see that the California floods of 2006 and Hurricane Katrina in August 2005 were the most expensive recent events.
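The “over 180 billion dollars” figure in the synopsis can be reproduced from the propdmg column of these rows (all with exponent B, so values are in billions). A worked check, assuming the four August 2005 HURRICANE/TYPHOON and STORM SURGE records are the Katrina entries and leaving out the 10/24/2005 hurricane record:

```r
# Property damage in billions of dollars, copied from the tail() output above
flood_2006 <- 115.00                        # FLOOD, 1/1/2006
katrina    <- c(7.35, 16.93, 11.26, 31.30)  # 8/28 and 8/29/2005 records

sum(katrina)               # 66.84
flood_2006 + sum(katrina)  # 181.84
```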