We analyzed the harm caused by weather events recorded in the NOAA storm database. For health effects, we examined total and mean casualties (fatalities plus injuries) by event type, with a particular focus on incidents that killed more than 20 people. For economic damage, we focused on event types that caused more than $2.5 billion in total damage across all the incidents in the database.
Getting data
download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "./stormdata.csv.bz2", mode="wb")
data <- read.csv("./stormdata.csv.bz2")
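The analysis only uses a handful of the file's columns, so one way to lighten the load on memory is to skip the rest at read time. The following is a minimal sketch on a toy file, not the method used in this report; the file contents and the `keep` selection are made up for illustration.

```r
# Toy stand-in for the storm file; the real file has many more columns.
path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(EVTYPE = c("TORNADO", "FLOOD"),
             FATALITIES = c(3, 0),
             REMARKS = c("long text", "more text")),
  path, row.names = FALSE
)

# Columns to keep (hypothetical selection for this sketch).
keep <- c("EVTYPE", "FATALITIES")

# Read only the first row to learn the column names.
hdr <- names(read.csv(path, nrows = 1))

# "NULL" tells read.csv to skip a column; NA lets it pick the type itself.
classes <- rep("NULL", length(hdr))
classes[hdr %in% keep] <- NA

slim <- read.csv(path, colClasses = classes)
names(slim)
```

The same `colClasses` vector could in principle be passed when reading the compressed storm file, at the cost of reading the header once beforehand.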
length(data$FATALITIES[data$FATALITIES > 0])
## [1] 6974
This tells us that 6974 incidents killed at least one person.
length(data$INJURIES[data$INJURIES > 0])
## [1] 17604
This tells us that 17604 incidents injured at least one person.
length(data$FATALITIES[data$FATALITIES > 0 | data$INJURIES > 0])
## [1] 21929
21929 incidents killed and/or injured someone.
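The three counts above take the length of a filtered vector. An equivalent and slightly more direct form is to sum the logical condition itself, since `TRUE` counts as 1. This is a small sketch on made-up numbers, not the storm data.

```r
# Toy data standing in for the FATALITIES and INJURIES columns.
toy <- data.frame(FATALITIES = c(0, 2, 0, 1, 0),
                  INJURIES   = c(0, 5, 3, 0, 0))

# TRUE counts as 1, so sum() over a condition counts the matching rows.
n_fatal   <- sum(toy$FATALITIES > 0)
n_injured <- sum(toy$INJURIES > 0)
n_either  <- sum(toy$FATALITIES > 0 | toy$INJURIES > 0)

# Same answer as the length-of-subset form.
stopifnot(n_fatal == length(toy$FATALITIES[toy$FATALITIES > 0]))

c(fatal = n_fatal, injured = n_injured, either = n_either)
```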
Create a subset to examine health:
healthdata <- data[(data$FATALITIES > 0 | data$INJURIES > 0),]
length(unique(healthdata$EVTYPE))
## [1] 220
220 different types of event either killed or injured at least one person.
Determine what types of events caused the most harm to people total.
casualtiesbyeventtype <- aggregate(FATALITIES+INJURIES ~ EVTYPE, data, sum)
Determine what types of events caused the most harm to people per incident.
meancasualtiesbyeventtype <- aggregate(FATALITIES+INJURIES ~ EVTYPE, data, mean)
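The formula `FATALITIES+INJURIES ~ EVTYPE` collapses deaths and injuries into a single casualty figure per event type. If the two need to stay separate, `aggregate` also accepts `cbind()` on the left-hand side. The sketch below uses a made-up four-row data frame to show the shape of the result.

```r
# Toy incidents; values are invented for illustration.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(1, 3, 10, 0),
  INJURIES   = c(20, 40, 5, 2)
)

# cbind() on the left-hand side aggregates each column separately.
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, toy, sum)
means  <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, toy, mean)

totals
means
```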
Graph the results, focused on deaths but with consideration for injuries. We use the log rather than the actual number, because otherwise outliers would make the graph difficult to read.
library(ggplot2)
deadly <- healthdata[healthdata$FATALITIES > 20, ]
healthplot <- qplot(log(FATALITIES), EVTYPE, data = deadly, color = log(INJURIES))
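One caveat with taking the log of the injury count: an incident with zero injuries maps to `-Inf`, which a colour scale cannot place. Whether any incident with more than 20 deaths recorded zero injuries is not checked here, so the following is only a sketch of a possible safeguard, using `log1p()` on made-up values.

```r
# Invented injury counts, including a zero.
injuries <- c(0, 1, 50, 1150)

# Plain log: the zero becomes -Inf.
plain <- log(injuries)

# log1p(x) computes log(1 + x): zero maps to 0 and large values stay compressed.
shifted <- log1p(injuries)

plain
shifted
```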
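First, we calculate the actual damage values, rather than the abbreviated ones. The database stores each damage figure as a number (PROPDMG, CROPDMG) plus an exponent code (PROPDMGEXP, CROPDMGEXP), where H means hundreds, K thousands, M millions and B billions. Adding a column of actual values to the dataset for more fine-grained analysis was intractable for the computer used in the analysis. Instead we constructed a vector of the total damage each type of event caused across all the years in the dataset. We make separate vectors for property damage and crop damage.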
propdmgvector <- c()
cropdmgvector <- c()
for(i in as.character(unique(data$EVTYPE))) {
    # Rows for this event type, damage columns only
    dmgvars <- subset(data, EVTYPE == i, select = c(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP))
    actualpropdmg <- 0
    actualcropdmg <- 0
    for(j in seq_len(nrow(dmgvars))) {
        # Property damage: rows with any other exponent code are skipped
        if(dmgvars[j, "PROPDMGEXP"] == "H") {
            actualpropdmg <- actualpropdmg + (dmgvars[j, "PROPDMG"] * 100)
        }
        else if(dmgvars[j, "PROPDMGEXP"] == "K") {
            actualpropdmg <- actualpropdmg + (dmgvars[j, "PROPDMG"] * 1000)
        }
        else if(dmgvars[j, "PROPDMGEXP"] == "M") {
            actualpropdmg <- actualpropdmg + (dmgvars[j, "PROPDMG"] * 1000000)
        }
        else if(dmgvars[j, "PROPDMGEXP"] == "B") {
            # B is billions: 1e9, not 1e8
            actualpropdmg <- actualpropdmg + (dmgvars[j, "PROPDMG"] * 1000000000)
        }
        # Crop damage: same exponent codes
        if(dmgvars[j, "CROPDMGEXP"] == "H") {
            actualcropdmg <- actualcropdmg + (dmgvars[j, "CROPDMG"] * 100)
        }
        else if(dmgvars[j, "CROPDMGEXP"] == "K") {
            actualcropdmg <- actualcropdmg + (dmgvars[j, "CROPDMG"] * 1000)
        }
        else if(dmgvars[j, "CROPDMGEXP"] == "M") {
            actualcropdmg <- actualcropdmg + (dmgvars[j, "CROPDMG"] * 1000000)
        }
        else if(dmgvars[j, "CROPDMGEXP"] == "B") {
            actualcropdmg <- actualcropdmg + (dmgvars[j, "CROPDMG"] * 1000000000)
        }
    }
    # Totals are appended in the same order as unique(data$EVTYPE)
    propdmgvector <- c(propdmgvector, actualpropdmg)
    cropdmgvector <- c(cropdmgvector, actualcropdmg)
}
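The row-by-row loop above can also be written as a vectorized lookup, which may be lighter on a constrained machine. This is a sketch under stated assumptions rather than the code used for the results: it runs on an invented four-row data frame, handles only the four exponent codes H, K, M and B (with B read as billions), and treats every other code as zero damage.

```r
# Multipliers for the four exponent codes; B is taken to mean 1e9.
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Toy rows; values are invented for illustration.
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "TORNADO", "HAIL"),
  PROPDMG    = c(2.5, 10, 1.5, 7),
  PROPDMGEXP = c("B", "M", "K", "")
)

# Look up each row's multiplier by its code. Codes not in the table
# come back as NA and are set to 0, so those rows contribute nothing.
mult <- unname(multiplier[as.character(toy$PROPDMGEXP)])
mult[is.na(mult)] <- 0

# Actual dollar value per row, then the total per event type.
toy$actualpropdmg <- toy$PROPDMG * mult
propdmg_by_type <- tapply(toy$actualpropdmg, toy$EVTYPE, sum)

propdmg_by_type
```

The same lookup would apply to CROPDMG and CROPDMGEXP. Because `tapply` returns a vector named by event type, a separate step to match totals to event names would not be needed in this variant.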
Now, match the damage totals to the types of event that caused the damage.
propdmgframe <- data.frame(damage = propdmgvector, evtype = as.character(unique(data$EVTYPE)))
cropdmgframe <- data.frame(damage = cropdmgvector, evtype = as.character(unique(data$EVTYPE)))
We’ll make plots examining the damage for event types that did more than $2.5 billion in damage over the period studied. Damage is divided by 1e9 so that the axes read in billions of dollars.
bigprop <- propdmgframe[propdmgframe$damage > 2.5e+9, ]
propdmgplot <- qplot(damage / 1e+9, evtype, data = bigprop)
bigcrop <- cropdmgframe[cropdmgframe$damage > 2.5e+9, ]
cropdmgplot <- qplot(damage / 1e+9, evtype, data = bigcrop)
casualtiesbyeventtype[which.max(casualtiesbyeventtype[,2]),1]
## [1] TORNADO
## 985 Levels: HIGH SURF ADVISORY COASTAL FLOOD ... WND
We find that tornadoes have caused the most casualties in total.
meancasualtiesbyeventtype[which.max(meancasualtiesbyeventtype[,2]),1]
## [1] Heat Wave
## 985 Levels: HIGH SURF ADVISORY COASTAL FLOOD ... WND
The event type with the most casualties per incident, however, is "Heat Wave".
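`which.max` returns only the single top entry. To see the runners-up as well, the aggregated frame can be sorted and truncated. The sketch below uses an invented table; the column name `casualties` is hypothetical and does not match the column name that `aggregate` produces in this report.

```r
# Invented per-type totals for illustration.
toy <- data.frame(EVTYPE = c("FLOOD", "HEAT", "TORNADO", "HAIL"),
                  casualties = c(2, 15, 64, 0))

# Sort rows from most to fewest casualties, then keep the top three.
ranked <- toy[order(toy$casualties, decreasing = TRUE), ]
top3 <- head(ranked, 3)

top3
```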
We see that tornadoes and different types of heat event are some of the greatest killers. Tornadoes are particularly prone to creating large numbers of injuries.
healthplot + scale_color_gradient(name="Log injuries") + labs(x="Log fatalities", y="Event type",
title="Casualties in events with over 20 deaths")
propdmgplot + labs(x="Property damage (billions of dollars)", y="Event type",
title="Property damage for event types with over $2.5 billion in total damage")
We see that tornadoes and floods are particular threats to property.
cropdmgplot + labs(x="Crop damage (billions of dollars)", y="Event type",
title="Crop damage for event types with over $2.5 billion in total damage")
Fewer types of events cause extensive crop damage than extensive property damage. Floods are a threat to crops, just as they are to property, but droughts cause the greatest harm to crops.
We conclude that tornadoes are the primary concern, as they cause great damage both to population health and to property. Events of secondary concern include heat waves, for their effect on health, and flooding, for its effect on property and crops.