This study examines the impact of weather events on population health and the economy in the US, using the National Weather Service Storm Data. Tornadoes are by far the most important event type for both population health and economic damage, followed by heat, flood and wind events. For crop damage, hail is the most important event type, but the amount of damage (in $) is much smaller than the property damage caused by tornadoes, floods and winds.
The data is downloaded into the current working directory if it is not already present. The data file is distributed as a bz2 archive. For performance reasons, it is decompressed once so that the file can be read repeatedly without decompressing it each time.
Information about the data can be found at the following locations:
Population health can be assessed by number of fatalities and injuries. Economic damage can be assessed by property and crop damage (both in $).
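library(R.utils)
library(data.table)
# Download the compressed file only if it is not already present.
if (!file.exists("repdata-data-StormData.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destfile = "repdata-data-StormData.csv.bz2", mode = "wb")
}
# Decompress once, keeping the archive, so repeated reads use the plain CSV.
if (!file.exists("repdata-data-StormData.csv")) {
  bunzip2("repdata-data-StormData.csv.bz2", remove = FALSE)
}
data <- fread("repdata-data-StormData.csv")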
Data column names are normalized to lower case letters.
colnames(data)=tolower(colnames(data))
Many event types appear more than once in the data set under different spellings, for example Damaging Freeze and DAMAGING FREEZE, but there are also more complex cases such as DOWNBURST and DOWNBURST WINDS. To handle the simplest cases, the event types are converted to lower case before the analysis.
data$evtype<-tolower(data$evtype)
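As a small illustration of what this normalization does and does not merge, the following sketch applies `tolower` to a made-up vector containing the labels named above. The vector and the variable names are assumptions for illustration only and are not part of the analysis.

```r
# Made-up event labels mirroring the cases named above.
evtype <- c("Damaging Freeze", "DAMAGING FREEZE", "DOWNBURST", "DOWNBURST WINDS")

# Before normalization every label is distinct.
distinct_before <- length(unique(evtype))

# Lower-casing merges labels that differ only in letter case.
normalized <- tolower(evtype)
distinct_after <- length(unique(normalized))

print(sort(unique(normalized)))
```

The two freeze labels collapse into one, while the two downburst labels stay separate, since they differ in more than letter case.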
Additionally, a simplified data set is prepared that contains only the columns needed for the following analysis.
library(dplyr)
eventData=select(data, evtype, fatalities, injuries, propdmg, cropdmg)
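The propdmg and cropdmg columns are used here as they appear in the file. The Storm Data file also carries companion columns, propdmgexp and cropdmgexp, that hold a magnitude code for each amount. A reader who wants to scale the amounts could proceed along the following lines. This is only a sketch on invented values, assuming that K, M and B stand for thousand, million and billion and that every other code means no scaling. The names `toy`, `multiplier`, `mult` and `propdmg_usd` are hypothetical.

```r
# Invented rows; propdmgexp mimics the magnitude code column of the data set.
toy <- data.frame(
  propdmg    = c(25, 2.5, 1, 10),
  propdmgexp = c("K", "M", "B", "")
)

# Assumed mapping: K = thousand, M = million, B = billion.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up each code case-insensitively; unknown or empty codes scale by 1.
mult <- unname(multiplier[toupper(toy$propdmgexp)])
mult[is.na(mult)] <- 1

toy$propdmg_usd <- toy$propdmg * mult
print(toy)
```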
To determine which events are most harmful, the data is summed by event type.
library(reshape2)
molten=melt(eventData, id.vars=c("evtype"))
summedByEventtype=dcast(molten, evtype ~ variable, sum)
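To make the result of this step concrete, the same kind of per-event-type summation can be sketched in base R with `aggregate` on a toy table. This swaps in a different technique from the melt and dcast calls above, and the toy numbers are invented.

```r
# Toy stand-in for eventData; the numbers are invented for illustration.
toy <- data.frame(
  evtype     = c("tornado", "flood", "tornado", "hail", "flood"),
  fatalities = c(3, 1, 2, 0, 4),
  injuries   = c(10, 0, 5, 1, 2)
)

# Sum every measure column within each event type.
summed <- aggregate(cbind(fatalities, injuries) ~ evtype, data = toy, FUN = sum)
print(summed)
```

The result has one row per event type and one column per measure, which is the shape the plotting function below expects.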
For plotting, the data is ordered by a given category (fatalities, injuries, propdmg or cropdmg). Additionally, once the largest event types together cover 90% of the total, the remaining event types, which account for less than 10% of the impact, are dropped.
library(ggplot2)
library(gridExtra)
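topEventsPlot <- function(category, label) {
  # Order the event types by the chosen category, largest total first.
  orderedByCategory <- summedByEventtype[
    order(summedByEventtype[[category]], decreasing = TRUE), ]
  # Running total before each event type (0 for the first one), so that the
  # event type crossing the 90% mark is still kept.
  totals <- orderedByCategory[[category]]
  cumulativeBefore <- c(0, head(cumsum(totals), -1))
  topCategories <- orderedByCategory[cumulativeBefore < 0.9 * sum(totals), ]
  # Fix the factor levels so the bars keep the sorted order.
  topCategories$evtype <- factor(topCategories$evtype, levels = topCategories$evtype)
  # Return the plot object so that several plots can be arranged side by side.
  ggplot(topCategories, aes(x = evtype, y = .data[[category]])) +
    geom_col() +
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
    labs(x = "Event types", y = label)
}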
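The cutoff inside topEventsPlot compares the running total before each event type with 90% of the grand total. The following sketch replays that logic on invented numbers. The vector `totals` and the names `before` and `kept` are assumptions for illustration.

```r
# Invented totals per event type, already sorted in decreasing order.
totals <- c(tornado = 60, heat = 20, flood = 12, lightning = 5, wind = 3)

# Running total before each event type: 0, 60, 80, 92, 97.
before <- c(0, head(cumsum(totals), -1))

# Keep event types until 90% of the grand total (here 90 of 100) is reached.
kept <- names(totals)[before < 0.9 * sum(totals)]
print(kept)
```

Because the comparison uses the total before each entry, the event type that crosses the 90% mark is still kept, so the retained set covers at least 90% of the total (92% in this toy case).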
Population health is summarized by fatalities and injuries. For each of these categories, there is a column in the data set.
Let's sort the events and look at the cumulative distribution. In the following, we concentrate on the events that cause 90% of the fatalities or injuries.
p1<-topEventsPlot("fatalities", "Number of fatalities")
p2<-topEventsPlot("injuries", "Number of injuries")
grid.arrange(p1, p2, ncol = 2,
             top = "Population health impact of weather events accounting for 90% of the numbers")
Based on the plot, the most dangerous weather event is the tornado, with the highest number of both fatalities and injuries. While tornadoes are by far the most important category for injuries, excessive heat is the second most important factor for fatalities, causing roughly one third as many fatalities as tornadoes.
Using the same procedure as for the health impact analysis, we can now look at the economic consequences. These can again be separated into general property damage and crop damage.
p1<-topEventsPlot("propdmg", "Property damage [$]")
p2<-topEventsPlot("cropdmg", "Crop damage [$]")
grid.arrange(p1, p2, ncol = 2,
             top = "Economic consequences of weather events accounting for 90% of the damage")
The most important events with respect to economic consequences are tornadoes (property) and hail (crop). The next most important events are wind and flood events (summarized).