Public Health and Economic Consequences from Natural Disasters

Synopsis

In this report, I attempt to describe the most serious natural disasters requiring the attention of local, state, and federal government agencies and emergency managers. Various natural disasters exact tolls in both human terms (injuries and deaths) and economic impact (property and crop loss). The report analyzes data from the National Weather Service to identify the most consequential of the natural disasters faced by the United States. Because the data are spread across a large time period, and are less accurate in earlier years, I focused only on the years 2000-2011. A full description of the data on which the analysis is performed is available here:

Loading and Processing the Raw Data

First we load the raw data (see link above) into a data frame, and (for convenience) change the column names to lower case.

Reading Data

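# read.csv reads .bz2 files directly and, unlike read.table's defaults,
# does not treat apostrophes or '#' inside free-text fields as special
data<-read.csv('repdata_data_StormData.csv.bz2',header = TRUE)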
colnames(data)<-tolower(colnames(data))

Data Processing

Because I chose to focus on the 21st-century data available (2000-2011), I first convert the “bgn_date” column into a date using the lubridate package https://cran.r-project.org/web/packages/lubridate/index.html, with the pipe operator supplied by the dplyr package https://cran.r-project.org/web/packages/dplyr/index.html. I then add a column called “year” containing just the year from that date, and remove the rows prior to 2000.

library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:lubridate':
## 
##     intersect, setdiff, union
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
data$year<-mdy_hms(data$bgn_date) %>% year 
data<-data[which(data$year>1999),]
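
As a small illustration of the year extraction above, the following sketch parses a few made-up date strings and filters them. The sample values and their "month/day/year hour:minute:second" layout are assumptions chosen for illustration, not rows taken from the data set.

```r
library(lubridate)

# Hypothetical sample values (not taken from the storm data)
bgn_date <- c("4/18/1950 0:00:00", "8/29/2005 0:00:00", "5/22/2011 0:00:00")

# Parse the strings as date-times and pull out the calendar year
yr <- year(mdy_hms(bgn_date))

# Keep only the entries from 2000 onward
recent <- bgn_date[yr > 1999]

print(yr)
print(recent)
```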

The data.table package https://cran.r-project.org/web/packages/data.table/index.html makes many of the following transformations much easier and more efficient. The data set uses a notation for economic losses in which dollar amounts are written across two columns, such as propdmg and propdmgexp, where the second column explains whether the first is to be multiplied by 1,000, 1,000,000, or 1,000,000,000 (K,M,B). For the first set of transformations we convert this into a single numerical column.

library(data.table)
## 
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last
## The following objects are masked from 'package:lubridate':
## 
##     hour, isoweek, mday, minute, month, quarter, second, wday,
##     week, yday, year
data<-as.data.table(data)
data[propdmgexp=='K',propdmg:=propdmg*1000]
data[propdmgexp=='M',propdmg:=propdmg*1000000]
data[propdmgexp=='B',propdmg:=propdmg*1000000000]
data[cropdmgexp=='K',cropdmg:=cropdmg*1000]
data[cropdmgexp=='M',cropdmg:=cropdmg*1000000]
data[cropdmgexp=='B',cropdmg:=cropdmg*1000000000]
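
The same conversion can also be sketched with a lookup vector, which applies every multiplier in a single step. This is an alternative illustration on made-up rows, not the code used for the results in this report. It assumes that lower-case codes mean the same as upper-case ones and that blank or unrecognized codes leave the amount unchanged; whether such codes occur in the 2000-2011 subset is not checked here.

```r
# Multipliers for the exponent codes described above
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical toy rows; the blank code stands for "no exponent recorded"
toy <- data.frame(
  propdmg    = c(25, 2.5, 1.2, 10, 500),
  propdmgexp = c("K", "M", "B", "", "k"),
  stringsAsFactors = FALSE
)

# Look up each code after upper-casing it; blank or unknown codes give NA
m <- unname(multiplier[toupper(toy$propdmgexp)])

# Assumption: treat blank or unknown codes as a multiplier of 1
m[is.na(m)] <- 1

# Dollar amount as a single numeric column
toy$propdmg_usd <- toy$propdmg * m
print(toy)
```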

Next, I created a new column, “totaldmg”, that combines crop losses and property losses into a single total financial loss for each event.

data[,totaldmg:=cropdmg+propdmg]

To understand the effect per year of each type of disaster, I created the columns “fatalities_year_evtype”, “injuries_year_evtype”, and “totaldmg_year_evtype”, each of which can be read as the variable (fatalities, injuries, total damage) summed per year per event type.

data[,fatalities_year_evtype:=sum(fatalities), by=list(year,evtype)]
data[,injuries_year_evtype:=sum(injuries), by=list(year,evtype)]
data[,totaldmg_year_evtype:=sum(totaldmg), by=list(year,evtype)]
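
To make the behavior of these grouped assignments concrete, here is a minimal sketch on made-up data. It shows that `:=` combined with `by` keeps every row and writes the group total onto each row of the group, and how a per-year leader can then be read off with `which.max`. The table `toy` and its values are hypothetical.

```r
library(data.table)

# Hypothetical toy data: two years, two event types
toy <- data.table(
  year       = c(2000, 2000, 2000, 2001, 2001),
  evtype     = c("HEAT", "HEAT", "FLOOD", "HEAT", "FLOOD"),
  fatalities = c(3, 4, 2, 1, 6)
)

# := with by= adds the group sum to every row; no rows are dropped
toy[, fatalities_year_evtype := sum(fatalities), by = list(year, evtype)]
print(toy)

# For each year, pick the event type on the row with the largest group sum
leaders <- toy[, .(evtype = evtype[which.max(fatalities_year_evtype)]), by = year]
print(leaders)
```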

I also derived columns to examine the total impact by event type.

data[,fatalities_evtype:=sum(fatalities), by=evtype]
data[,injuries_evtype:=sum(injuries), by=evtype]
data[,propdmg_evtype:=sum(propdmg), by=evtype]
data[,cropdmg_evtype:=sum(cropdmg), by=evtype]
data[,totaldmg_evtype:=sum(totaldmg), by=evtype]

With all of the derived columns added, analysis is far easier.

Results

With the data processed, I can find the most impactful event type for each of the years examined by type of casualty and type of damage. Fatalities are examined first:

data[,evtype[which.max(fatalities_year_evtype)], by=year]
##     year             V1
##  1: 2000 EXCESSIVE HEAT
##  2: 2001 EXCESSIVE HEAT
##  3: 2002 EXCESSIVE HEAT
##  4: 2003    FLASH FLOOD
##  5: 2004    FLASH FLOOD
##  6: 2005 EXCESSIVE HEAT
##  7: 2006 EXCESSIVE HEAT
##  8: 2007        TORNADO
##  9: 2008        TORNADO
## 10: 2009    RIP CURRENT
## 11: 2010    FLASH FLOOD
## 12: 2011        TORNADO
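
To see how often each event type leads, the yearly leaders printed above can be tallied. This is a small sketch that copies the twelve values from that output into a vector by hand; the vector name `leaders` is introduced here only for illustration.

```r
# Yearly leaders for fatalities, copied from the output above (2000-2011)
leaders <- c("EXCESSIVE HEAT", "EXCESSIVE HEAT", "EXCESSIVE HEAT", "FLASH FLOOD",
             "FLASH FLOOD", "EXCESSIVE HEAT", "EXCESSIVE HEAT", "TORNADO",
             "TORNADO", "RIP CURRENT", "FLASH FLOOD", "TORNADO")

# Count how many of the twelve years each event type leads, largest first
counts <- sort(table(leaders), decreasing = TRUE)
print(counts)
```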

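Excessive heat is the leading cause of death in five of the twelve years, more than any other event type; flash floods and tornadoes each lead in three years, and rip currents in one. And now injuries: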

data[,evtype[which.max(injuries_year_evtype)], by=year]
##     year                V1
##  1: 2000           TORNADO
##  2: 2001           TORNADO
##  3: 2002           TORNADO
##  4: 2003           TORNADO
##  5: 2004 HURRICANE/TYPHOON
##  6: 2005           TORNADO
##  7: 2006    EXCESSIVE HEAT
##  8: 2007           TORNADO
##  9: 2008           TORNADO
## 10: 2009           TORNADO
## 11: 2010           TORNADO
## 12: 2011           TORNADO

Clearly, tornadoes are the leading source of injuries during the period reviewed, topping the list in ten of the twelve years. Next, I look at economic damage, measured as total crop and property damage by year:

data[,evtype[which.max(totaldmg_year_evtype)], by=year]
##     year                V1
##  1: 2000           DROUGHT
##  2: 2001    TROPICAL STORM
##  3: 2002         HURRICANE
##  4: 2003          WILDFIRE
##  5: 2004 HURRICANE/TYPHOON
##  6: 2005 HURRICANE/TYPHOON
##  7: 2006             FLOOD
##  8: 2007           TORNADO
##  9: 2008  STORM SURGE/TIDE
## 10: 2009              HAIL
## 11: 2010             FLOOD
## 12: 2011           TORNADO

Notice that tornadoes appear on all three lists; they are clearly a significant concern for much of the US.

For the final part of the analysis, I examined the five most impactful disaster types overall, from both a public health and an economic perspective. Using gather from the tidyr library https://cran.r-project.org/web/packages/tidyr/index.html, I reshaped the data to enable a plot of the most harmful event types from a public health perspective.

library(tidyr)
# Select the previously processed 'fatalities_evtype','injuries_evtype', and 'evtype' columns and remove duplicates
casualties<-data %>% select('fatalities_evtype','injuries_evtype','evtype') %>% unique
# create a sum of casualties (fatalities + injuries)
casualties[,total:=fatalities_evtype+injuries_evtype]
# rename the columns for cleaner labeling later
colnames(casualties)[1:2]<-c('fatalities','injuries')
# order the frame by the total casualty count, take the top five entries, and gather the data into a narrow frame with keys for type of casualty
casualties<-casualties[order(total, decreasing = TRUE)]%>%head(5) %>% gather(casualty_type,casualties,fatalities:injuries)

library(ggplot2)
# use ggplot2 to plot the data
p<-ggplot(casualties, aes(evtype,casualties, fill=casualty_type))+geom_bar(stat='identity')+ theme(axis.text.x = element_text(angle = 90)) + labs(title ="Fig. 1 - Top Five Causes of Casualties by Disaster Type 2000-2011", x = "Event Type", y = "Number of Casualties") 
print(p)
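
The wide-to-long reshape performed by gather above can also be written with tidyr's newer pivot_longer function. The following is a minimal sketch on a made-up table, not the code used for Fig. 1; the data frame `wide` and its values are hypothetical. Note that pivot_longer orders the resulting rows differently from gather, which does not affect a stacked bar chart.

```r
library(tidyr)

# Hypothetical wide table: one row per event type, one column per casualty type
wide <- data.frame(
  evtype     = c("A", "B"),
  fatalities = c(10, 3),
  injuries   = c(200, 40)
)

# Equivalent in content to gather(casualty_type, casualties, fatalities:injuries)
long <- pivot_longer(wide,
                     cols = fatalities:injuries,
                     names_to = "casualty_type",
                     values_to = "casualties")
print(long)
```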

Now we look at the same plot for economic impact:

# Select the previously processed 'propdmg_evtype','cropdmg_evtype','totaldmg_evtype','evtype' columns and remove duplicates
losses<-data %>% select('propdmg_evtype','cropdmg_evtype','totaldmg_evtype','evtype') %>% unique
# rename the columns for cleaner labeling later
colnames(losses)[1:2]<-c('property','crop')
# order the frame by the total loss, take the top five entries, and gather the data into a narrow frame with keys for type of loss
losses<-losses[order(totaldmg_evtype, decreasing = TRUE)]%>%head(5) %>% gather(loss_type,losses,property:crop)

p<-ggplot(losses, aes(evtype,losses, fill=loss_type))+geom_bar(stat='identity')+ theme(axis.text.x = element_text(angle = 90)) + labs(title ="Fig. 2 - Top Five Causes of Loss by Disaster Type 2000-2011", x = "Event Type", y = "Amount of Loss") 
print(p)
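
As an aside, a top-N table like the ones plotted above can also be produced by aggregating to one row per event type, rather than adding derived columns to every row and removing duplicates afterwards. This is a sketch of that alternative under made-up data; the table `toy`, its values, and the choice of three rows are assumptions for illustration.

```r
library(data.table)

# Hypothetical toy losses in dollars
toy <- data.table(
  evtype  = c("FLOOD", "FLOOD", "HAIL", "TORNADO", "DROUGHT", "HAIL"),
  propdmg = c(5e6, 1e6, 2e5, 3e6, 0, 1e5),
  cropdmg = c(1e5, 0, 4e5, 0, 2e6, 0)
)

# Aggregate to one row per event type
summary_dt <- toy[, .(property = sum(propdmg), crop = sum(cropdmg)), by = evtype]
summary_dt[, total := property + crop]

# Sort by total loss, largest first, and keep the top three
top <- head(summary_dt[order(-total)], 3)
print(top)
```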

Conclusion

As the graphs show, tornadoes are by far the most costly event type in terms of public health. Floods, hurricanes, and storm surges cause more property damage, but tornadoes are also significant contributors to economic damage from natural disasters.