Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
In this report we analyse the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database (events from 1950 through November 2011) to answer the following questions:
1. Across the United States, which types of events are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?
With respect to population health, this project aggregates Fatalities and Injuries by Event Type, first separately and then combined. I decided that both measures were required and weighted them equally to identify the most harmful events. The results are sorted and the top event types displayed.
With respect to economic consequences, this project converts the Property (PROPDMG) and Crop (CROPDMG) damage figures into dollars before aggregating (the stored values are scaled by an exponent code for hundreds, thousands, millions or billions). The results are sorted and the top event types displayed.
Start by downloading the data source, a comma-separated values (CSV) file compressed with the bzip2 algorithm to reduce its size: approximately 47 MB compressed and 535 MB uncompressed. The data set contains 902,297 rows and 37 variables.
Note the file can also be downloaded from the following link:
Set the working directory and load the storm data
setwd("c:/R Programming/Rep_Research/Assignment2")
storm.data <- read.csv("repdata-data-StormData.csv.bz2", header = TRUE)
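No separate decompression step is needed because R's file readers detect bzip2 compression when reading. The sketch below illustrates this on a tiny made-up file; the toy rows and the names `tmp` and `toy` are assumptions for the example and are not part of the NOAA data.

```r
# Write a tiny bzip2-compressed CSV to a temporary file (toy data only)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES,INJURIES",
             "TORNADO,2,10",
             "FLOOD,1,0"), con)
close(con)

# read.csv opens the compressed file directly and decompresses on the fly
toy <- read.csv(tmp, stringsAsFactors = FALSE)

# Checking the dimensions after loading is a cheap sanity test;
# for the real file the report expects 902,297 rows and 37 variables
dim(toy)

unlink(tmp)
```

The same `dim()` check applied to `storm.data` is a quick way to confirm that the full file was read completely.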
Aggregate Fatalities and Injuries separately by Event Type (and sort)
fatalities <- aggregate(FATALITIES ~ EVTYPE, data=storm.data, FUN = sum)
sorted.fatalities <- fatalities[order(-fatalities$FATALITIES), ]
injuries <- aggregate(INJURIES ~ EVTYPE, data=storm.data, FUN = sum)
sorted.injuries <- injuries[order(-injuries$INJURIES), ]
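The aggregate-then-sort pattern above is repeated for each measure. As a sketch only, it could be wrapped in a small helper; `totals.by.event` and the toy data frame are hypothetical names introduced here, not part of the original analysis.

```r
# Toy data standing in for storm.data (values are made up)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 0, 5, 1, 2),
  stringsAsFactors = FALSE
)

# Sum one numeric column per event type and sort in descending order
totals.by.event <- function(data, column) {
  totals <- aggregate(data[[column]], by = list(EVTYPE = data$EVTYPE), FUN = sum)
  names(totals)[2] <- column          # aggregate() names the value column "x"
  totals[order(-totals[[column]]), ]  # largest total first
}

toy.fatalities <- totals.by.event(toy, "FATALITIES")
toy.injuries   <- totals.by.event(toy, "INJURIES")
```

On the toy data, TORNADO ranks first for both measures, while HEAT and FLOOD swap places between fatalities and injuries, which is why the report looks at the two measures separately as well as combined.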
Aggregate combined Fatalities + Injuries by Event Type
combined <- aggregate(cbind(FATALITIES + INJURIES) ~ EVTYPE, data=storm.data, FUN = sum)
# Keep only event types with more than 1,700 combined casualties so the plot stays readable
combined.injuries.fatalities <- subset(combined, V1 > 1700)
Rename the population health columns so they are more meaningful when displayed on plots
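The cut-off of 1700 is specific to this data set. One alternative, sketched below on made-up toy data, is to add the combined total as an explicit column (which avoids the automatically generated `V1` name) and keep the top N rows after sorting. The column name `TOTAL` and the choice of N = 3 are assumptions for illustration.

```r
# Toy data standing in for storm.data (values are made up)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(3, 1, 2, 4, 0, 0),
  INJURIES   = c(10, 0, 5, 1, 2, 1),
  stringsAsFactors = FALSE
)

# Equal weighting: one fatality counts the same as one injury
toy$TOTAL <- toy$FATALITIES + toy$INJURIES

# Sum per event type, sort descending, and keep the three largest
toy.combined <- aggregate(TOTAL ~ EVTYPE, data = toy, FUN = sum)
toy.top <- head(toy.combined[order(-toy.combined$TOTAL), ], n = 3)
toy.top
```

Selecting by rank rather than by threshold keeps the number of bars in the plot fixed even if the underlying data changes.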
names(combined.injuries.fatalities)[names(combined.injuries.fatalities)=="V1"] <- "Total"
names(combined.injuries.fatalities)[names(combined.injuries.fatalities)=="EVTYPE"] <- "Event_Type"
Convert Property (PROPDMG) damages into dollars ($). This report only processes the exponent codes “H”, “K”, “M” and “B”; I wasn't able to determine the meaning of the other values.
storm.data$PROPDMG = ifelse(as.character(storm.data$PROPDMGEXP)=="H", storm.data$PROPDMG*100, storm.data$PROPDMG)
storm.data$PROPDMG = ifelse(as.character(storm.data$PROPDMGEXP)=="K", storm.data$PROPDMG*1000, storm.data$PROPDMG)
storm.data$PROPDMG = ifelse(as.character(storm.data$PROPDMGEXP)=="M", storm.data$PROPDMG*1000000, storm.data$PROPDMG)
storm.data$PROPDMG = ifelse(as.character(storm.data$PROPDMGEXP)=="B", storm.data$PROPDMG*1000000000, storm.data$PROPDMG)
Convert Crop (CROPDMG) damages into dollars ($). This report only processes the exponent codes “H”, “K”, “M” and “B”; I wasn't able to determine the meaning of the other values.
storm.data$CROPDMG = ifelse(as.character(storm.data$CROPDMGEXP)=="H", storm.data$CROPDMG*100, storm.data$CROPDMG)
storm.data$CROPDMG = ifelse(as.character(storm.data$CROPDMGEXP)=="K", storm.data$CROPDMG*1000, storm.data$CROPDMG)
storm.data$CROPDMG = ifelse(as.character(storm.data$CROPDMGEXP)=="M", storm.data$CROPDMG*1000000, storm.data$CROPDMG)
storm.data$CROPDMG = ifelse(as.character(storm.data$CROPDMGEXP)=="B", storm.data$CROPDMG*1000000000, storm.data$CROPDMG)
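The two conversions above apply the same rule eight times. A more compact way to express it, shown here only as a sketch, is a named lookup vector of multipliers. The names `exp.multiplier` and `to.dollars` are hypothetical; as in the report, only upper-case “H”, “K”, “M” and “B” are scaled and every other code leaves the value unchanged.

```r
# Multiplier for each recognised exponent code
exp.multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Convert a damage value and its exponent code into dollars.
# Codes that are not in the lookup (including "" and NA) give NA on
# indexing, and are then treated as a multiplier of 1.
to.dollars <- function(value, exponent) {
  mult <- exp.multiplier[as.character(exponent)]
  mult[is.na(mult)] <- 1
  unname(value * mult)
}

to.dollars(c(25, 2.5, 1, 7), c("K", "M", "B", ""))
# 25,000; 2,500,000; 1,000,000,000; 7 (unchanged)
```

Because the function returns a new vector instead of overwriting PROPDMG in place, it can be re-run safely; the in-place version would multiply the values again if the chunk were executed twice.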
Aggregate and sort the PROPDMG and CROPDMG columns by EVTYPE, separately and then combined
prop.cost.by.evtype <- aggregate(PROPDMG ~ EVTYPE, data = storm.data, FUN=sum)
sorted.prop.cost <- prop.cost.by.evtype[order(-prop.cost.by.evtype$PROPDMG), ]
crop.cost.by.evtype <- aggregate(CROPDMG ~ EVTYPE, data = storm.data, FUN=sum)
sorted.crop.cost <- crop.cost.by.evtype[order(-crop.cost.by.evtype$CROPDMG), ]
cost.by.evtype <- aggregate(cbind(PROPDMG + CROPDMG) ~ EVTYPE, data = storm.data, FUN=sum)
sorted.cost.by.evtype <- cost.by.evtype[order(-cost.by.evtype$V1), ]
Rescale and rename the economic columns so they are more meaningful when displayed on plots
sorted.prop.cost$PROPDMG<-sorted.prop.cost$PROPDMG/1000000 ## convert to millions of dollars
sorted.crop.cost$CROPDMG<-sorted.crop.cost$CROPDMG/1000000 ## convert to millions of dollars
names(sorted.prop.cost)[names(sorted.prop.cost)=="PROPDMG"] <- "Property_Damage(Millions_$)"
names(sorted.crop.cost)[names(sorted.crop.cost)=="CROPDMG"] <- "Crop_Damage(Millions_$)"
names(sorted.prop.cost)[names(sorted.prop.cost)=="EVTYPE"] <- "Event_Type"
names(sorted.crop.cost)[names(sorted.crop.cost)=="EVTYPE"] <- "Event_Type"
names(sorted.cost.by.evtype)[names(sorted.cost.by.evtype)=="V1"] <- "Total_Cost_Billions"
names(sorted.cost.by.evtype)[names(sorted.cost.by.evtype)=="EVTYPE"] <- "Event_Type"
sorted.cost.by.evtype$Total_Cost_Billions <- sorted.cost.by.evtype$Total_Cost_Billions / 1000000000 ## convert to billions of dollars
The event type most harmful to population health in terms of fatalities is Tornado, followed by Excessive Heat.
The event types causing the most fatalities are listed below in descending order, together with the corresponding number of fatalities:
print(head(sorted.fatalities))
## EVTYPE FATALITIES
## 834 TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 275 HEAT 937
## 464 LIGHTNING 816
## 856 TSTM WIND 504
The event types causing the most injuries are listed below in descending order (once again, Tornado ranks highest):
print(head(sorted.injuries))
## EVTYPE INJURIES
## 834 TORNADO 91346
## 856 TSTM WIND 6957
## 170 FLOOD 6789
## 130 EXCESSIVE HEAT 6525
## 464 LIGHTNING 5230
## 275 HEAT 2100
library("ggplot2")
ggplot(data=combined.injuries.fatalities, aes(x=Event_Type, y=Total, fill=Event_Type)) +
geom_bar(colour="black", stat="identity") +
guides(fill="none") +
ggtitle("Combined Top (Injuries + Fatalities) by Event Type \n ") +
theme(axis.text.x = element_text(angle=90, vjust=0.5, size=10))
From the plot above we can see that the Tornado event was the most harmful to population health.
The analysis identifies Flood as the event type with the greatest economic consequences: its property damage alone is approximately $144.66 billion.
The event types causing the greatest property damage are listed below in descending order, with costs in millions of dollars:
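By default ggplot2 orders a categorical x axis alphabetically, so the tallest bar does not necessarily appear first. A possible refinement, sketched below, is to reorder the factor levels by the plotted value. The data frame `casualties` is built from the fatality and injury counts printed above for the event types that appear in both tables, so it is not the complete ranking (FLOOD, for example, is missing because its fatality count is not shown).

```r
# Combined casualties, taken from the two printed tables (fatalities + injuries)
casualties <- data.frame(
  Event_Type = c("EXCESSIVE HEAT", "HEAT", "LIGHTNING", "TORNADO", "TSTM WIND"),
  Total      = c(1903 + 6525, 937 + 2100, 816 + 5230, 5633 + 91346, 504 + 6957),
  stringsAsFactors = FALSE
)

# reorder() sorts levels by increasing value, so negate Total for descending bars
casualties$Event_Type <- reorder(casualties$Event_Type, -casualties$Total)
levels(casualties$Event_Type)

# Build the plot only if ggplot2 is installed; use print(p) to draw it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(casualties, aes(x = Event_Type, y = Total)) +
    geom_col(colour = "black") +
    labs(x = "Event type", y = "Fatalities + injuries") +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, size = 10))
}
```

With the levels reordered, Tornado is drawn first and the remaining event types follow in decreasing order of combined casualties.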
print(head(sorted.prop.cost))
## Event_Type Property_Damage(Millions_$)
## 170 FLOOD 144657.71
## 411 HURRICANE/TYPHOON 69305.84
## 834 TORNADO 56925.66
## 670 STORM SURGE 43323.54
## 153 FLASH FLOOD 16140.81
## 244 HAIL 15727.37
Drought was identified as the event type causing the greatest crop damage.
The event types with the highest crop damage are listed below in descending order:
print(head(sorted.crop.cost))
## Event_Type Crop_Damage(Millions_$)
## 95 DROUGHT 13972.566
## 170 FLOOD 5661.968
## 590 RIVER FLOOD 5029.459
## 427 ICE STORM 5022.114
## 244 HAIL 3025.538
## 402 HURRICANE 2741.910
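As a quick arithmetic check, the combined Flood cost can be worked out from the two tables printed above, which both report values in millions of dollars. The variable names below are introduced only for this check.

```r
# Flood figures copied from the printed tables (millions of dollars)
flood.property.millions <- 144657.71
flood.crop.millions     <- 5661.968

# Combined cost, converted from millions to billions of dollars
flood.total.billions <- (flood.property.millions + flood.crop.millions) / 1000
round(flood.total.billions, 2)
# about 150.32
```

Property damage accounts for roughly 96% of that total (144657.71 / 150319.678), so for Flood the combined ranking is driven almost entirely by property damage.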
library("ggplot2")
# Select only the events with a high cost so the plot is more readable
high.cost.events <- head(sorted.cost.by.evtype)
ggplot(data=high.cost.events, aes(x=Event_Type, y=Total_Cost_Billions, fill=Event_Type)) +
geom_bar(colour="black", stat="identity") +
guides(fill="none") +
ggtitle("Combined Property & Crop Damage by Event Type \n ") +
theme(axis.text.x = element_text(angle=90, vjust=0.5, size=10))
From the plot above we can see that the Flood event has the largest combined economic consequence.