A Preliminary Review of the Impact of Natural Disasters on Human Health and Economics in the U.S.

Synopsis

A high-level analysis of the impact of various natural disasters in the United States was performed using data provided by the National Weather Service. The purpose of the analysis was to determine which types of natural disasters have the greatest impact on human health and economics in the U.S. As the data contain information for some regions outside of the U.S., every part of the analysis uses only the data gathered for the 50 U.S. states and the District of Columbia. To determine the type of disaster with the greatest impact on human health, fatality and injury data were aggregated by disaster type and visualized such that conclusions could be drawn. A similar analysis was performed using property damage and crop damage estimates to determine the economic impacts of disasters. The data were cleaned only minimally, in two major steps: 1) some entries were inconsistent in their capitalization of words or letters, and this was remedied by converting entries to all capital letters so that the aggregation would be more robust; 2) property and crop damage estimates were recorded using three significant figures and a suffix indicating magnitude (for example, ‘B’ for billion), and these magnitudes were applied to make the cost estimates numerical and standardized for aggregation.

Data Processing

Data processing includes loading the data into the R session and doing preliminary cleaning. The user should first download the zipped data linked from the Coursera Peer Assignment 2 website and unzip it. The .csv file should then be moved into the user’s working directory and loaded into the R session.

storm.data <- read.csv('repdata-data-StormData.csv', header=TRUE, sep=',')

As the scope of the analysis is limited to the U.S., only data from the 50 U.S. states and the District of Columbia (‘DC’) are kept. The data should then be cleaned in two steps: 1) converting event-type entries (EVTYPE) to all capital letters to eliminate some inconsistencies in data entry; 2) using the suffix on expense-related entries to standardize cost estimates as numerical entries. Step 2 involves using the CROPDMGEXP and PROPDMGEXP fields, which encode the magnitude of the expense as an alphabetic character, e.g. ‘B’ for billion (x10^9) or ‘K’ for thousand (x10^3). These alphabetic characters are also inconsistently capitalized, so converting all of them to capital letters is necessary for standardization.

The user should note that the cleaning done here does not correct for spelling variations in entries. For example, there are entries for “MUD SLIDES” and “MUDSLIDE”. A future analysis could include additional data cleaning to consolidate such entries. That was not done here, as the results give no indication that these variations affect the outcomes of the analysis.

states <- c(state.abb,'DC')
storm.data <- storm.data[storm.data[,'STATE'] %in% states, ]
storm.data$EVTYPE <- toupper(storm.data$EVTYPE)
storm.data$CROPDMGEXP <- toupper(storm.data$CROPDMGEXP)
storm.data$PROPDMGEXP <- toupper(storm.data$PROPDMGEXP)
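
The spelling variants mentioned above were left as-is in this analysis. As a rough illustration of how a future cleaning step might consolidate them, the sketch below collapses the mudslide variants into one label. The helper name normalize.evtype and the toy input vector are hypothetical and not part of the original analysis; a real cleaning pass would need a pattern for each group of variants found in the data.

```r
# Hypothetical helper (not part of the original analysis): collapse a few
# known spelling variants of event types into one canonical label.
normalize.evtype <- function(x) {
  x <- toupper(trimws(x))                    # same capitalization step as above
  x <- gsub("\\s+", " ", x)                  # squeeze repeated whitespace
  x <- sub("^MUD ?SLIDES?$", "MUDSLIDE", x)  # MUD SLIDE(S) and MUDSLIDE(S)
  x
}

# Toy input for illustration only
ev <- c("Mud Slides", "MUDSLIDE", "mudslides", "FLOOD")
ev.clean <- normalize.evtype(ev)
print(table(ev.clean))
```

Entries that do not match the pattern, such as FLOOD here, pass through unchanged apart from capitalization and whitespace.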

Analysis

The goal of the analysis is to provide a high-level overview of which type of natural disaster is: 1) most detrimental to human health; and 2) most detrimental to the local economy. The data are partitioned based on relevance into the health.data and damage.data sets. For the health impact, the total numbers of injuries and fatalities are considered by summing both variables for each event type. For economic impact, the estimated monetary losses from property damage and crop damage are summed by event type. The aggregate function is used for both summations.

# health related impact
health.data <- storm.data[ ,c('STATE','COUNTYNAME','EVTYPE','FATALITIES','INJURIES')]
health.agg <- with(health.data, aggregate(cbind(INJURIES,FATALITIES)~EVTYPE, FUN=sum))
health.agg <- health.agg[with(health.agg, order(-FATALITIES, -INJURIES)), ]

# economic impact
damage.data <- storm.data[ ,c('STATE','COUNTYNAME','EVTYPE','PROPDMG','PROPDMGEXP','CROPDMG','CROPDMGEXP')]
damage.data$PROPDMG <- with(damage.data, ifelse(PROPDMGEXP=='K', PROPDMG*1000, ifelse(PROPDMGEXP=='M', PROPDMG*1000000, ifelse(PROPDMGEXP=='B', PROPDMG*1000000000,PROPDMG))))
damage.data$CROPDMG <- with(damage.data, ifelse(CROPDMGEXP=='K', CROPDMG*1000, ifelse(CROPDMGEXP=='M', CROPDMG*1000000, ifelse(CROPDMGEXP=='B', CROPDMG*1000000000,CROPDMG))))
damage.agg <- with(damage.data, aggregate(cbind(PROPDMG, CROPDMG)~EVTYPE, FUN=sum))
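
The nested ifelse calls above apply the suffix multipliers one case at a time. An alternative that may be easier to extend is a named lookup vector, sketched below on toy data. The names toy, multiplier, mult and PROPDMG.USD are introduced here for illustration only. Like the code above, the sketch leaves values unscaled when the suffix is not K, M or B.

```r
# Toy data for illustration only; column names follow the storm data set
toy <- data.frame(
  PROPDMG    = c(2.5, 10, 1.2, 7),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Named vector mapping each suffix to its multiplier
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up each row's multiplier by suffix; suffixes missing from the
# table come back as NA and are treated as a multiplier of 1
mult <- unname(multiplier[toy$PROPDMGEXP])
mult[is.na(mult)] <- 1

toy$PROPDMG.USD <- toy$PROPDMG * mult
print(toy)
```

Supporting another suffix would then only require adding one entry to the multiplier vector.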

Results

The partitioned and aggregated data are plotted so that conclusions can be drawn from the analysis (the user must load the ggplot2 library to generate plots with this code). The results show that tornadoes are the most detrimental natural disaster to human health, with 91,346 associated injuries and 5,633 fatalities.

library(ggplot2)
ggplot(subset(health.agg, FATALITIES>100 & INJURIES>100), aes(x=EVTYPE, y=FATALITIES, fill=INJURIES)) + geom_bar(stat='identity') + coord_flip() + scale_fill_continuous(low='yellow', high='red') + labs(title='Impact of Natural Disasters on Human Health', x='Event Type', y='Fatalities')

Regarding economic impact, floods cause by far the most property damage, while drought causes the greatest crop losses. However, when total damage is summed (crop + property damage expenses), flood-related expenses are approximately 10 times drought-related expenses ($150.3 billion vs. $15.0 billion).

ggplot(subset(damage.agg, PROPDMG>1000000000 & CROPDMG > 500000000), aes(x=CROPDMG, y=PROPDMG, color=EVTYPE)) + geom_point() + labs(title='Natural Disaster Impact on Economy', x='Crop Damage Related Expenses', y='Property Damage Related Expenses') + scale_color_manual(name='Event Type', values=c('red','orange','magenta','purple','green','blue','grey','black','yellow','cyan','dodgerblue'))

However, there is a large difference between the number of flood events and the number of drought events recorded in the data: there are 25,327 entries for flood and 2,488 entries for drought. Factoring this into the analysis, the average damage per recorded event is about $5.9 million for a flood vs. $6.0 million for a drought.

damages.norm <- damage.agg[damage.agg[,'EVTYPE'] %in% c('FLOOD','DROUGHT'), ]
damages.norm$TOTALDMG <- damages.norm$PROPDMG + damages.norm$CROPDMG
damages.norm[damages.norm[,'EVTYPE']=='FLOOD','COUNT'] <- sum(storm.data[,'EVTYPE']=='FLOOD')
damages.norm[damages.norm[,'EVTYPE']=='DROUGHT','COUNT'] <- sum(storm.data[,'EVTYPE']=='DROUGHT')
damages.norm$NORMDMG <- damages.norm$TOTALDMG/damages.norm$COUNT
ggplot(damages.norm, aes(x=EVTYPE, y=NORMDMG)) + geom_bar(stat='identity', fill='dodgerblue') + labs(title='Normalized Total Expenses for Flood and Drought Events', x='Event Type', y='Expenses ($)')
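
As a quick sanity check on the per-event figures, the division can be reproduced directly from the totals and entry counts quoted in the text. The variable names below are introduced for this check only, and the inputs are the rounded values reported above rather than values recomputed from the data.

```r
# Totals and entry counts as reported in the text above
total.dmg <- c(FLOOD = 150.3e9, DROUGHT = 15.0e9)  # dollars
n.events  <- c(FLOOD = 25327,   DROUGHT = 2488)    # recorded entries

# Average damage per recorded event
per.event <- total.dmg / n.events
print(round(per.event / 1e6, 1))  # millions of dollars per event
```

With these inputs the result rounds to 5.9 for flood and 6.0 for drought, matching the figures stated above.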