The U.S. National Oceanic and Atmospheric Administration’s storm database tracks fatalities, injuries, and property and crop damage attributable to major storms and weather events in the United States since the year 1950. Analysis of this data shows that tornadoes are by far the most harmful to public health: since 1950 they have caused 5630 fatalities (37% of total fatalities attributable to severe weather events) and 91285 injuries (65% of total injuries). Flood events have been the most harmful in economic terms, measured as the combination of all damages to property and crops. Damage from floods since 1950 amounts to $150 billion, 32% of all damages attributable to severe weather events. These findings emphasize the importance of effective tornado and flood warning and response systems, as well as effective preparedness programs to educate the U.S. population.
Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
The U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database tracks characteristics of major storms and weather events in the United States since the year 1950, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
Using the NOAA storm database, this paper aims to answer the following questions about severe weather events:
Across the United States, which types of events are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
The data analysis uses the following R packages:
library(R.utils)
library(ggplot2)
library(gridExtra)
library(plyr)
The data was downloaded on September 10, 2014 from NOAA Storm Data and comes in the form of a comma-separated-value file compressed via the bzip2 algorithm.
destPath <- "StormData.csv"
# download and uncompress data if not in working directory
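if (!file.exists(destPath)) {
  fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  # temporary file for the compressed download; fileext needs the leading dot
  tmp <- tempfile(fileext=".bz2")
  download.file(fileUrl, tmp, method="curl")
  # bunzip2() is provided by the R.utils package
  bunzip2(tmp, destPath)
}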
# load the StormData in a data frame
df <- read.csv(destPath)
The data contains 902297 observations with 37 variables.
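As an aside, base R can usually read a bzip2-compressed CSV without a separate decompression step, because `read.csv()` opens the file through a connection that detects the compression. The sketch below illustrates this on a small made-up file; the toy rows and the temporary file name are illustrative assumptions, not part of the NOAA data.

```r
# Write a tiny, made-up CSV through a bzip2 connection (illustrative data only)
toy_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(toy_path, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1)),
          con, row.names = FALSE)
close(con)

# read.csv() detects the compression and reads the file directly
toy_raw <- read.csv(toy_path, stringsAsFactors = FALSE)
str(toy_raw)
```

Whether this is preferable to an explicit `bunzip2()` call depends on how often the file is re-read, since reading the compressed file decompresses it on every load.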
The variables of interest to answer the questions are the following:
EVTYPE: event name describing the meteorological event leading to fatalities, injuries, damages, etc.
FATALITIES: number of deaths directly or indirectly attributable to the event.
INJURIES: number of injuries directly or indirectly attributable to the event.
PROPDMG and PROPDMGEXP: property damage estimates in US dollars attributable to the event. The alphabetical character in PROPDMGEXP signifies the magnitude: “K” for thousands, “M” for millions, and “B” for billions.
CROPDMG and CROPDMGEXP: crop damage estimates in US dollars attributable to the event. The alphabetical character in CROPDMGEXP signifies the magnitude: “K” for thousands, “M” for millions, and “B” for billions.
The following data processing steps are performed on the raw data to tidy up the data set:
Removal of invalid records: we keep only observations whose magnitude characters are valid (blank, K, M, or B) in both the PROPDMGEXP and CROPDMGEXP variables.
Dollar amount calculation: property and crop damages are converted to billions of US dollars from the dollar amounts and the magnitude characters, and stored in two new variables, PropDmg and CropDmg.
# dataframe with metric values for dollar amount calculations (unit is 1 billion USD)
metric <- data.frame(Symb = c("","K","M","B"), Mult = 10^c(-9,-6,-3,0))
# cleaning up data set, only observations with valid damages
df2 <- subset(df,PROPDMGEXP %in% metric$Symb & CROPDMGEXP %in% metric$Symb)
df2 <- transform(df2, PROPDMGEXP=factor(PROPDMGEXP), CROPDMGEXP=factor(CROPDMGEXP))
# adding variables with dollar amounts for damages
df2 <- merge(df2, metric, by.x="PROPDMGEXP", by.y="Symb")
df2 <- transform(df2, PropDmg = PROPDMG * Mult, Mult=NULL)
df2 <- merge(df2, metric, by.x="CROPDMGEXP", by.y="Symb")
df2 <- transform(df2, CropDmg = CROPDMG * Mult, Mult=NULL)
The tidy data contains 901921 observations with 39 variables.
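To make the conversion concrete, the following sketch applies the same multiplier table to a few made-up rows. It uses `match()` instead of `merge()` for the lookup, a swapped-in technique that keeps the original row order; the `toy` data frame and its values are hypothetical.

```r
# Multiplier table as in the analysis (unit: 1 billion USD)
metric <- data.frame(Symb = c("", "K", "M", "B"), Mult = 10^c(-9, -6, -3, 0))

# Hypothetical rows: 25 thousand, 2.5 million, 1.2 billion, 500 dollars,
# and one row with an invalid magnitude character
toy <- data.frame(PROPDMG = c(25, 2.5, 1.2, 500, 7),
                  PROPDMGEXP = c("K", "M", "B", "", "?"),
                  stringsAsFactors = FALSE)

# Keep only rows with a valid magnitude character
toy <- subset(toy, PROPDMGEXP %in% metric$Symb)

# Look up each row's multiplier and convert to billions of USD
toy$PropDmg <- toy$PROPDMG * metric$Mult[match(toy$PROPDMGEXP, metric$Symb)]
print(toy)
```

The row with the "?" code is dropped by the filter, and the remaining four amounts are expressed in billions of US dollars.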
As a last step, we calculate the total recorded fatalities, injuries, and total damages per event-type.
# total recorded fatalities, injuries, and total damages per event-type
df3 <- ddply(df2, .(EVTYPE), plyr::summarize,
Fatalities = sum(FATALITIES), Injuries = sum(INJURIES),
TotalDmg = sum(PropDmg+CropDmg))
The summarized data contains fatality, injury, and total damage totals for 981 event types.
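The per-event-type totals can be illustrated on a small made-up data set. This sketch uses base R's `aggregate()` rather than `ddply()`, purely so that it runs without extra packages; the `toy` rows are hypothetical.

```r
# Hypothetical observations with damages already expressed in billions of USD
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
                  FATALITIES = c(2, 1, 3, 0),
                  INJURIES = c(10, 0, 5, 1),
                  PropDmg = c(0.1, 0.5, 0.2, 0),
                  CropDmg = c(0, 0.1, 0, 0.05),
                  stringsAsFactors = FALSE)

# Total damage per observation, then sums per event type
toy$TotalDmg <- toy$PropDmg + toy$CropDmg
toy_totals <- aggregate(toy[, c("FATALITIES", "INJURIES", "TotalDmg")],
                        by = list(EVTYPE = toy$EVTYPE), FUN = sum)
print(toy_totals)
```

Each event type appears once in `toy_totals`, with its fatalities, injuries, and total damage summed over all matching rows.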
The most harmful events with respect to population health since 1950 are displayed in the graph below. In the first plot, the y-axis shows the event types with more than 125 fatalities, ordered from high to low, and the x-axis indicates the fatality count per event type. In the second plot, the y-axis shows the event types with more than 1000 injuries, ordered from high to low, and the x-axis indicates the total injury count per event type.
p1 <-
ggplot(subset(df3,Fatalities>125),aes(reorder(EVTYPE,Fatalities),Fatalities)) + coord_flip() +
geom_bar(stat="identity",fill="white", colour="darkgreen") + ylim(c(0,6000)) +
geom_text(aes(label=round(Fatalities,1)),hjust=-0.2,size=3) +
ggtitle("Fatalities") + ylab("Fatality Count") + xlab("Event Type")
p2 <-
ggplot(subset(df3,Injuries>1000),aes(reorder(EVTYPE,Injuries),Injuries)) + coord_flip() +
geom_bar(stat="identity",fill="white", colour="darkgreen") + ylim(c(0,100000)) +
geom_text(aes(label=round(Injuries,1)),hjust=-0.2,size=3) +
ggtitle("Injuries") + ylab("Injury Count") + xlab("Event Type")
# gridExtra 2.0.0 and later take the overall title as 'top' (earlier versions used 'main')
grid.arrange(p1,p2,top="Harm to U.S. Population Health by Meteorological Event")
As observed in the plots above, tornadoes have been the most harmful events to U.S. population health in terms of both fatalities and injuries.
Since 1950, tornadoes have caused 5630 of the 15135 fatalities recorded for all types of meteorological events combined, or 37%.
In addition, tornadoes have caused 91285 of the 140436 injuries recorded for all types of meteorological events combined, or 65%.
This underlines the importance of effective tornado warning systems and tornado preparedness and response programs, as outlined by the Centers for Disease Control and Prevention (CDC) in “CDC - Emergency Preparedness and Response - Tornadoes”.
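The percentages above follow directly from the reported counts. As a quick check, the arithmetic can be reproduced as below; `share_pct` is a hypothetical helper name introduced only for this illustration.

```r
# Hypothetical helper: share of a part in a total, as a whole-number percentage
share_pct <- function(part, total) round(100 * part / total)

share_pct(5630, 15135)    # tornado fatalities vs. all fatalities: 37
share_pct(91285, 140436)  # tornado injuries vs. all injuries: 65
```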
To assess economic consequences, we use the sum of property damage and crop damage attributable to severe weather events. The event types most harmful to the U.S. economy in that sense are displayed below. The x-axis shows the total damage since 1950, and the y-axis shows the event types with a total damage larger than $1 billion.
ggplot(subset(df3,TotalDmg>1 ),aes(reorder(EVTYPE,TotalDmg),TotalDmg)) + coord_flip() +
geom_bar(stat="identity",fill="white", colour="darkgreen") + ylim(c(0,170)) +
geom_text(aes(label=round(TotalDmg,1)),hjust=-0.2,size=3) +
ggtitle("Total Economic Damage from Meteorological Events") +
ylab("Total Damage [billion USD]") +
xlab("Event Type")
As observed in the plot above, flood events have been the most harmful of all meteorological events to the U.S. economy, measured as the combination of all damages to property and crops since 1950.
Flood events account for $150 billion of the $476 billion in damages from all types of meteorological events combined, or 32%.
This underlines the importance of effective flood preparedness and response programs, as outlined by the Centers for Disease Control and Prevention (CDC) in “CDC - Emergency Preparedness and Response - Floods”.