Impact of Storm Events on Health and Economic Damage in the US

Reproducible Research: Assignment 2

by R Bagnall
29 June 2015

Synopsys

The U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The aims of this study are:

  • Identify the storm events that are most harmful to health
  • Identify the storm events causing the greatest economic damage

The database currently contains data from January 1950 to present, however, since recording of all 48 event types commenced in 1996, only data collected from 1996 to present was analysed. I will re-format the storm event types and subset data to required columns. An initial exploratory analysis revealed an erronious outlier value of US$115 billion damage for one storm event, which was removed. The storm event with the highest impact on population health was Tornado, whereas Hurricane/Typhoon was the event with the greatest economic impact.

Data Processing

The data for this analysis is in the form of a bz2 zipped comma separated value file (.csv.bz2). Unless already present, the file will be downloaded. The following timelines show the different time spans for each period of unique data collection:

  • 1950 through 1954, only tornado events were recorded
  • 1955 through 1992, only tornado, thunderstorm wind and hail events were recorded
  • 1993 to 1995, only tornado, thunderstorm wind and hail events were recorded
  • 1996 to present, all 48 event types are recorded, as defined on page 6 of the storm data preparation guide.

Only data from 1996 onwards will be analysed. The storm event types (EVTYPES) have many values that do not correspond to the official 48 event types recorded by NOAA. Therefore, limited re-formatting of event type data will be preformed so that the majority of values conform to official values. Additionally, the economic damage values will be converted to $US denominations, i.e. B - billion; M - million; K - thousand, and empty values will be replaced with 0. The following data procesing will be performed:

  • The year of storm event will be added to each row to enable subsetting from 1996 onwards
  • Required columns and data from years 1996 onwards will be selected using of the sqldf library
  • The month of storm event will be added to each row
  • The sum of injuries and fatalities of each event will be added
  • Re-formatting of event type data
  • Add a column of the total economic damage to crops (CROPTOTAL)
  • Add a column of the total economic damage to property (PROPTOTAL)
  • Add a column of the sum of CROPTOTAL and PROPTOTAL (ECONOMIC).

This data processing will reduce the size of the database, making subsequent analysis quicker.

require(sqldf)
require(ggplot2)
require(reshape2)
require(grid)
options(scipen=999)

# download data file if needed
if(!file.exists("Storm-data.csv.bz2"))
     download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "Storm-data.csv.bz2", method="curl", mode="wb")
# read in data
df <- read.csv("Storm-data.csv.bz2", sep=",")
# add a year column
df$YEAR <- as.factor(as.numeric(format(as.Date(sapply(strsplit(as.character(df$BGN_DATE), ' '), '[',1), "%m/%d/%Y"), "%Y")))

# sebset data to 1996 onwards, and add required columns for data analysis
df2 <- sqldf("select BGN_DATE, EVTYPE, FATALITIES, INJURIES, STATE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP from df where YEAR >1995 and (INJURIES >0 or FATALITIES >0 or PROPDMG > 0 or CROPDMG >0)")

# add column for month and (INJURIES + FATALITIES)
df2$MONTH <- as.factor(as.numeric(format(as.Date(sapply(strsplit(as.character(df2$BGN_DATE), ' '), '[',1), "%m/%d/%Y"), "%m")))
df2$HARM <- rowSums(df2[, c(3, 4)])

# Re-format EVTYPE values
df2$EVTYPE <- toupper(df2$EVTYPE)

#string replacements
df2$EVTYPE <- gsub("CURRENTS", "CURRENT", df2$EVTYPE)
df2$EVTYPE <- gsub("HEAT WAVE", "EXCESSIVE HEAT", df2$EVTYPE)
df2$EVTYPE <- gsub("EXTREME HEAT", "EXCESSIVE HEAT", df2$EVTYPE)
df2$EVTYPE <- gsub("EXTREME COLD$", "EXTREME COLD/WIND CHILL", df2$EVTYPE)
df2$EVTYPE <- gsub("TSTM", "THUNDERSTORM", df2$EVTYPE)
df2$EVTYPE <- gsub("HURRICANE$", "HURRICANE/TYPHOON", df2$EVTYPE)
df2$EVTYPE <- gsub("^COLD$", "COLD/WIND CHILL", df2$EVTYPE)
df2$EVTYPE <- gsub("WINDS", "WIND", df2$EVTYPE)
df2$EVTYPE <- gsub("^FOG", "FREEZING FOG", df2$EVTYPE)
df2$EVTYPE <- gsub("HEAVY SURF/", "", df2$EVTYPE)
df2$EVTYPE <- gsub("LANDSLIDE", "DEBRIS FLOW", df2$EVTYPE)
df2$EVTYPE <- gsub("/FOREST", "", df2$EVTYPE)

# Clean PROPDMG/EXP and CROPDMG/EXP
df2$PROPDMGEXP[!nzchar(as.character(df2$PROPDMGEXP))] <- "0"
df2$CROPDMGEXP[!nzchar(as.character(df2$CROPDMGEXP))] <- "0"

# B M K to denominations 
df2$PROPDMGEXP <- gsub("K", "1000", df2[,7])
df2$PROPDMGEXP <- gsub("M", "1000000", df2[,7])
df2$PROPDMGEXP <- gsub("B", "1000000000", df2[,7])

df2$CROPDMGEXP <- gsub("K", "1000", df2[,9])
df2$CROPDMGEXP <- gsub("M", "1000000", df2[,9])
df2$CROPDMGEXP <- gsub("B", "1000000000", df2[,9])

# add column for total crop damage and total property damage and (crop + property damage)
df2$CROPTOTAL <- as.numeric(df2$CROPDMG) * as.numeric(df2$CROPDMGEXP)
df2$PROPTOTAL <- as.numeric(df2$PROPDMG) * as.numeric(df2$PROPDMGEXP)
df2$ECONOMIC <- rowSums(df2[, c(12, 13)])

Preliminary Exploratory Data Analysis:

Following extensive data cleaning, it is prudent to perform a summary analysis of data. A summary of the injuries + fatalities, and the economic costs will be analysed.

summary(df2$HARM)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##    0.0000    0.0000    0.0000    0.3314    0.0000 1308.0000
summary(df2$ECONOMIC)
##         Min.      1st Qu.       Median         Mean      3rd Qu. 
##            0         2500        10000      1994000        35000 
##         Max. 
## 115000000000

The summary of ECONOMIC damage show that the highest single event resulted in US$115 billion damage. This seems unreasonably high, therefore, we will explore the spread of the ECONOMIC damage values.

par(mar=c(5.1, 6.1, 4.1, 2.1))

boxplot(df2$ECONOMIC, main="Economic Impact of Storm Events", ylab="Cost of Damage (Billion US$)", xlab="Economic Damage", las=1, yaxt="n")

axis(2, at=c(0, 20000000000, 40000000000, 60000000000, 80000000000, 100000000000), labels=c("0", "20", "40", "60", "80", "100"), las=2)

The boxplot shows tha spread of the ECOMOMIC damage, where each dot represent the cost of crop damage plus property damage for each storm event. The highest value represents a single storm event causing US$115032500000 worth of damage to property and crops.

Outlier Removal: The total ECONOMIC damage to have occurred since 1996 is US$401520344110.
The highest single ECONOMIC damage value is US$115032500000, representing 28.6% of the total ECONOMIC damage since 1996. This event was a FLOOD which occurred in the state of CA on 1/1/2006. In fact, a preliminary assessment estimates that the storms resulted in $300 million in damages. Since the value of US$115 billion represents an error, it will be removed.

df3 <- sqldf("select * from df2 where ECONOMIC < 100000000000")

Results

The aim of step 1 is to identify the storm event types having the greatest impact on population health. The total injuries and fatalities will be plotted for each event type, and the highest 15 event types will be shown.

# find the strom events affecting injuries and fatalities 
inj <- aggregate(df3$INJURIES ~ df3$EVTYPE, df3, sum)
fat <- aggregate(df3$FATALITIES ~ df3$EVTYPE, df3, sum)
# merge together
harmmerged <- merge(inj, fat)
colnames(harmmerged) <- c("EVTYPE", "INJURIES", "FATALITIES")
# order by health impact
harmmelt <- head(harmmerged[order(-harmmerged[,2]),], 15)
harmmelted<- melt(harmmelt, id="EVTYPE")

harmmelted <- transform(harmmelted, EVTYPE = reorder(EVTYPE, -value))
# plot events having the greatest health impact
ggplot(harmmelted, aes(x=EVTYPE, y=value, fill=variable)) + geom_bar(stat="identity")  + theme(axis.text.x = element_text(angle=45, vjust=1, hjust=1), plot.title =element_text(face="bold"), legend.title=element_blank()) + ggtitle("Storm Events Most Harmful to Population Health") + ylab("Total Injuries and Fatalities") + xlab("Storm Event Type")

The bar chart shows the 15 storm event types with the greatest impact on human health. Injuries are shown in red and fatalities are shown in blue.

The cumulative total of fatalities and injuries from 1996 is 66707 and the cumulative total of the top 15 events is 59907, representing 89.8%. Tornados have the greatest impact on health in the US, followed by excessive heat and flood events.

The aim of step 2 is to identify the storm events having the greatest economic impact on property and crops.

# find the strom events affecting property and crops
prop <- aggregate(df3$PROPTOTAL ~ df3$EVTYPE, df3, sum)
crop <- aggregate(df3$CROPTOTAL ~ df3$EVTYPE, df3, sum)
cropmerged <- merge(prop, crop)

# find total economic cost column
colnames(cropmerged) <- c("EVTYPE", "PROPTOTAL", "CROPTOTAL")
cropmelt <- head(cropmerged[order(-rowSums(cropmerged[, c(2, 3)])),], 15)
cropmelted<- melt(cropmelt, id="EVTYPE")
# order by economic impact
cropmelted <- transform(cropmelted[, 1:3], EVTYPE = reorder(EVTYPE, -value))

ggplot(cropmelted, aes(x=EVTYPE, y=value, fill=variable)) + geom_bar(stat="identity")  + theme(axis.text.x = element_text(angle=45, vjust=1, hjust=1), plot.title =element_text(face="bold"), legend.title=element_blank()) + ggtitle("Storm Event Economic Impact\non Property and Crops") + ylab("Property and Crop Damage (Billion $US)") + xlab("Storm Event Type") + scale_y_continuous(breaks=c(0, 25000000000, 50000000000, 75000000000, 100000000000), labels=c("0", "25", "50", "75", "100")) + theme(plot.margin=unit(c(1,1,1,3), "cm"))

The bar chart shows the 15 storm event types causing the greatest economic damage to property (in red) and crops (in blue). Hurricane/Typhoon storm events cause the greatest economic damage to property, and overall damage, whereas drought events cause the greatest economic damage to crops.