Most Harmful and Economically Consequential Weather Events

Reproducible Research: Project 2

by Arif A. Arshad

Synopsis:

Using the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database, a data analysis was performed to identify the weather events that caused the most harm to population health and the weather events that were the most economically consequential.  Comparing totals for fatalities, injuries, crop damage, and property damage by kind of weather event identifies two overlapping sets of top events that contribute heavily to health harm and economic damage.

Data and Data Processing:

The data were downloaded from the U.S. NOAA storm database using the link provided in the assignment; more recent data are available at https://www.ncdc.noaa.gov/stormevents/.  The downloaded data include (among other variables) the number of fatalities, the number of injuries, the cost of crop damage, and the cost of property damage by weather event for the years 1951-2011.
# Test for directory and create if need be
if(!file.exists("Storm Data")){
  dir.create("Storm Data")
}

# Download and unzip
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url = url, destfile = "./Storm Data/repdata%2Fdata%2FStormData.csv.bz2",
              method = "auto")
              
# Read and decompress the bz2 file
df <- read.csv("./Storm Data/repdata%2Fdata%2FStormData.csv.bz2")

Choice of Proxy Measures: Interpretation of “most harmful” and “greatest economic consequences”

For measures of harm to population health, the data set includes both the number of fatalities and the number of injuries that result from a weather event.  While fatalities are primary in any public health strategy, injuries occur at much greater rates, which strengthens the case for looking at both measures when assessing overall harmfulness.
There are also two measures that can capture what we mean by the "greatest economic consequences".  Crop and property damages measured in dollars provide a picture of the economic consequences of a weather event.  Because each kind of cost has different implications for the economy, it is prudent to look at the distribution of both kinds of costs when assessing what is most economically consequential.

Data Transformations:

The first transformation involved cleaning the EVTYPE column, since "THUNDERSTORM WIND" and "THUNDERSTORM WINDS" were used interchangeably.  The dataset's documentation does not list them as two different event types.  As a result, every EVTYPE value containing "THUNDERSTORM WIND" was replaced with "THUNDERSTORM WIND", the name found in the documentation.

Other transformations applied to the data set involve summing the fatalities, injuries, crop damages, and property damages by weather event in order to get a list of totals for each kind of harm or damage. The totals were ranked by type of harm or damage, and then plotted in bar charts.
# Fix inconsistency in dataset.  THUNDERSTORM WIND and THUNDERSTORM WINDS are synonymous.
# Replace every EVTYPE value that contains THUNDERSTORM WIND with THUNDERSTORM WIND
df$EVTYPE <- as.character(df$EVTYPE)
df$EVTYPE <- gsub(".*THUNDERSTORM WIND.*", "THUNDERSTORM WIND", df$EVTYPE)
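
The pattern `.*THUNDERSTORM WIND.*` matches any label that contains the phrase, so compound labels would be folded into the same category. If only the singular/plural pair should be merged, a stricter, anchored pattern is one option. The following is a minimal sketch on a toy vector; the sample values and the helper name `normalize_evtype` are assumptions for illustration, not part of the original analysis.

```r
# Toy EVTYPE values (hypothetical, for illustration only)
evtype <- c("THUNDERSTORM WIND", "THUNDERSTORM WINDS",
            " Thunderstorm Winds ", "TORNADO", "HAIL")

# Normalise case and surrounding whitespace, then merge only the
# exact singular/plural pair (the pattern is anchored with ^ and $)
normalize_evtype <- function(x) {
  x <- toupper(trimws(x))
  sub("^THUNDERSTORM WINDS?$", "THUNDERSTORM WIND", x)
}

cleaned <- normalize_evtype(evtype)
print(table(cleaned))
```

Whether the broad or the anchored pattern is preferable depends on how compound labels should be counted; the sketch only shows the difference in matching behavior.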

Subsetting for Measures of Physical Harm and then Ranking the Top Ten of Each Kind

# Select columns, total each measure by event type, and rank the totals
library(dplyr)
df_fat <- select(df, EVTYPE, FATALITIES)
df_inj <- select(df, EVTYPE, INJURIES)
fat_aggregate <- aggregate(df_fat$FATALITIES, by = list(df_fat$EVTYPE), FUN = sum)
inj_aggregate <- aggregate(df_inj$INJURIES, by = list(df_inj$EVTYPE), FUN = sum)
fat_aggregate <- arrange(fat_aggregate, desc(x))
inj_aggregate <- arrange(inj_aggregate, desc(x))
# Stack both rankings and label each row by kind of harm
n_events <- nrow(fat_aggregate)  # number of distinct event types
df_trim_arranged <- rbind(fat_aggregate, inj_aggregate)
status <- c(rep("FATALITY", n_events), rep("INJURY", n_events))
df_trim_arranged <- cbind(df_trim_arranged, status)
colnames(df_trim_arranged) <- c("EVTYPE", "COUNT", "STATUS")
# Keep the top ten of each kind
df_ranked <- rbind(df_trim_arranged[1:10,], df_trim_arranged[(n_events + 1):(n_events + 10),])
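
The sum-then-rank step can also be written as a small reusable function, which avoids indexing into a stacked table by row position. This is a sketch in base R on toy data; the data values and the function name `top_n_by_event` are hypothetical and are not taken from the NOAA file.

```r
# Toy data (hypothetical values, not taken from the NOAA file)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(5, 1, 3, 4, 2, 0),
  INJURIES   = c(50, 3, 20, 10, 4, 1)
)

# Sum one measure by event type, sort descending, keep the n largest totals
top_n_by_event <- function(data, measure, status, n = 10) {
  totals <- aggregate(data[[measure]], by = list(EVTYPE = data$EVTYPE), FUN = sum)
  names(totals)[2] <- "COUNT"
  totals <- totals[order(-totals$COUNT), ]
  totals$STATUS <- status
  head(totals, n)
}

# Top two of each kind, stacked in the same long format used for plotting
ranked <- rbind(
  top_n_by_event(toy, "FATALITIES", "FATALITY", n = 2),
  top_n_by_event(toy, "INJURIES", "INJURY", n = 2)
)
rownames(ranked) <- NULL
print(ranked)
```

Because each ranking is truncated before the two are stacked, the result does not depend on how many distinct event types the data contain.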

Subsetting for Measures of Economic Damage and then Ranking the Top Ten

# Subset data for EVTYPE and the various damages and get totals
library(dplyr)
df_dama <- select(df, EVTYPE, PROPDMG, CROPDMG)

prop_aggregates <- as.data.frame(tapply(df_dama$PROPDMG, df_dama$EVTYPE, sum))
prop_aggregates <- cbind(rownames(prop_aggregates), prop_aggregates)
colnames(prop_aggregates) <- c("EVTYPE", "DAMAGE")
prop_aggregates <- arrange(prop_aggregates, desc(DAMAGE))

crop_aggregates <- as.data.frame(tapply(df_dama$CROPDMG, df_dama$EVTYPE, sum))
crop_aggregates <- cbind(rownames(crop_aggregates), crop_aggregates)
colnames(crop_aggregates) <- c("EVTYPE", "DAMAGE")
crop_aggregates <- arrange(crop_aggregates, desc(DAMAGE))

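# Stack the two rankings and keep the top ten rows of each
n_events <- nrow(prop_aggregates)  # number of distinct event types
damages <- rbind(prop_aggregates, crop_aggregates)
d <- c(1:10, (n_events + 1):(n_events + 10))
damages_top <- damages[d,]
# Scale the totals for display; PROPDMG and CROPDMG are summed as recorded
damages_top$DAMAGE <- round(damages_top$DAMAGE / 100000, 2)
Damage.Type <- c(rep("Property", 10), rep("Crop", 10))
damages <- cbind(damages_top, Damage.Type)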
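
The totals above sum PROPDMG and CROPDMG as recorded. The storm data also carry the columns PROPDMGEXP and CROPDMGEXP, which encode the magnitude of each figure (for example K for thousands, M for millions, B for billions). A possible refinement is to apply those multipliers before summing. The sketch below uses toy rows; the helper name `exp_multiplier` and the decision to leave any other code unscaled are assumptions, and a real analysis would need to decide how to treat the remaining codes.

```r
# Toy rows (hypothetical values); PROPDMGEXP holds the magnitude code for PROPDMG
toy <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "HAIL", "FLOOD"),
  PROPDMG    = c(2.5, 10, 500, 1),
  PROPDMGEXP = c("M", "K", "", "B"),
  stringsAsFactors = FALSE
)

# Assumed mapping: K = thousand, M = million, B = billion;
# any other code (including blank) is left unscaled in this sketch
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Convert each record to dollars, then total by event type
toy$PROPDMG_USD <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
totals <- sapply(split(toy$PROPDMG_USD, toy$EVTYPE), sum)
print(sort(totals, decreasing = TRUE))
```

Applying the multipliers can change the ranking considerably, since a single record coded in billions outweighs many records coded in thousands.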

Results:

Which weather events caused the most harm to population health?

# Plot barchart of the Top 10
library(ggplot2)
g <- ggplot(df_ranked, aes(EVTYPE, COUNT, fill = STATUS))
gg <- g + geom_bar(stat = "identity", position = "dodge")
ggt <- gg + theme(axis.text.x=element_text(angle = 90))
ggtt <- ggt + geom_text(aes(label=COUNT), position=position_dodge(width=.9), vjust=-0.75)
ggttc <-  ggtt + coord_cartesian(ylim = c(0,10000)) + annotate("text", x=12.25, y=10000, label= "91346")
ggttcl <- ggttc + labs(title = "Total Deaths and Injuries from Weather Events 1951-2011",
                       x = "Weather Event",
                       subtitle = "Top 10 Counts of Fatalities and Injuries Respectively, by Weather Event")
print(ggttcl)

Figure 1: Number of Deaths and Injuries by Weather Event

Figure 1 shows the top ten weather events by number of injuries and the top ten weather events by number of fatalities.  The tallies are printed above the bars because the tornado bar for injuries (91,346) is cut off at the top of the chart.  Tornadoes are the greatest cause of harm to health in general.  They have killed and injured more people than any other weather event by a large margin.

Significantly, seven of the top ten causes of fatalities are also among the top ten causes of injuries.  As a result, we can conclude that the weather events that generate the greatest physical harm are tornadoes primarily, and heat, excessive heat, floods, flash floods, lightning, and TSTM winds secondarily.

Which weather events have the greatest economic consequences?

# Plot top of both kinds of damages by weather event
options(scipen=10000)
library(ggplot2)
g <- ggplot(damages, aes(EVTYPE, DAMAGE, fill = Damage.Type))
gg <- g + geom_bar(stat="identity", position = "dodge")
ggt <- gg + theme(axis.text.x = element_text(angle=90))
ggtt <- ggt + geom_text(aes(label=DAMAGE), position=position_dodge(width=.9), vjust=-0.15)
ggttc <-  ggtt + coord_cartesian(ylim = c(0,18)) + annotate("text", x=11.25, y=18, label= "32.12")
ggttcl <- ggttc + labs(title = "Top Causes of Economic Damage in the US: 1951-2011",
                       subtitle = "Top 10 Crop and Property Damage Totals by Weather Event",
                       x = "Weather Event",
                       y = "Damages in millions of $")
print(ggttcl)

Figure 2: Crop and Property Damage Totals by Weather Event

Figure 2 shows the totals for crop and property damage in millions of dollars.  Again, seven of the top ten weather events are common to both categories of damage.  As with fatalities and injuries, tornadoes cause more damage than any other weather event by a large margin.  The other events that rank among the top causes of both crop and property damage are floods, flash floods, TSTM winds, thunderstorm winds, hail, and high winds.

Are there any weather events that are top causes for both harm to health and economic damage?

Indeed, four weather events contribute heavily to both harm to health and economic damage: tornadoes, floods, flash floods, and TSTM winds.
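
An overlap of this kind can be computed directly with set operations rather than read off the charts. The sketch below is illustrative only: the four short vectors are hypothetical stand-ins for the top-ten lists and do not reproduce the actual rankings.

```r
# Hypothetical stand-ins for the four top-ten lists (shortened for illustration)
top_fatalities <- c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "LIGHTNING")
top_injuries   <- c("TORNADO", "FLOOD", "EXCESSIVE HEAT", "LIGHTNING")
top_property   <- c("TORNADO", "FLASH FLOOD", "FLOOD", "HAIL")
top_crop       <- c("HAIL", "FLOOD", "FLASH FLOOD", "TORNADO")

# Events common to both health measures, and to both damage measures
health_common   <- intersect(top_fatalities, top_injuries)
economic_common <- intersect(top_property, top_crop)

# Events that appear in every one of the four lists
both <- intersect(health_common, economic_common)
print(both)
```

Using `union` instead of `intersect` within each category would give a looser definition of a top event (in either list rather than in both), so the choice should match the interpretation intended in the text.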