Effect of severe weather events in population health & economic impact across the United States

Synopsis

The purpose of this data analysis is to address the Effect of severe weather events in population health & economic impact across the United States.

This document describes the steps taken to address the (2) main questions, as well as the code chunks (written in R) for each step performed upon the original data set. The study was performed using the U.S. National Oceanic and Athmospheric Administration´s storm database. The analysis recodes certain variables and creates an Economic Indicator as well as a Health Indicator to assist the study.

The analysis answers:

Across the United States, which types of events are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?

Data Processing

To begin the analysis, load the required packages into R:

library(dplyr)
library(stringdist)
library(stargazer)
library(dplyr)
library(ggplot2)

1. Loading the Data

The data is loaded from the official download link provided and stored in the StormData object.

fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

if (!file.exists("./repdata_data_StormData.csv")) { 
        download.file(fileURL, destfile = "repdata_data_StormData.csv.bz2")
        unzip("./repdata_data_StormData.csv.bz2")
}

StormData <- read.csv("repdata_data_StormData.csv", stringsAsFactors = FALSE, na.strings = c("NA", ""))

2. Reformatting variables

The beggining date (BGN_DATE) is changed into Date format. Notice that for every other date/time variable (such as BGN_TIME, END_DATE, etc.), the format is kept the same for sake of simplicity.

StormData$BGN_DATE <- as.Date(StormData$BGN_DATE, format = "%m/%d/%Y %H:%M:%S")

3. Event type classification

According to the official NOAA Storm Database documentation, there are only 48 official event types, a quick glance of our StormData data set shows 985 unique “categories” for the EVTYPE variable.

length(unique(StormData$EVTYPE))

## [1] 985

This means that there are certain event types in the original dataset not included in the official event type list.

The weather events will be reclassified so they match the official event type list:

Astronomical Low Tide
Avalanche
Blizzard
Coastal Flood
Cold/Wind Chill
Debris Flow
Dense Fog
Dense Smoke
Drought
Dust Devil
Dust Storm
Excessive Heat
Extreme Cold/Wind Chill
Flash Flood
Flood
Frost/Freeze
Funnel Cloud
Freezing Fog
Hail
Heat
Heavy Rain
Heavy Snow
High Surf
High Wind
Hurricane (Typhoon)
Ice Storm
Lake-Effect Snow
Lakeshore Flood
Lightning
Marine Hail
Marine High Wind
Marine Strong Wind
Marine Thunderstorm Wind
Rip Current
Seiche
Sleet
Storm Surge/Tide
Strong Wind
Thunderstorm Wind
Tornado
Tropical Depression
Tropical Storm
Tsunami
Volcanic Ash
Waterspout
Wildfire
Winter Storm
Winter Weather

The official list is built according to the documentation:

OEVT <- c("Astronomical Low Tide", "Avalanche", "Blizzard", "Coastal Flood", "Cold/Wind Chill", "Debris Flow", "Dense Fog", "Dense Smoke", "Drought", "Dust Devil", "Dust Storm", "Excessive Heat", "Extreme Cold/Wind Chill", "Flash Flood", "Flood", "Frost/Freeze", "Funnel Cloud", "Freezing Fog", "Hail", "Heat", "Heavy Rain", "Heavy Snow", "High Surf", "High Wind", "Hurricane (Typhoon)", "Ice Storm", "Lake-Effect Snow", "Lakeshore Flood", "Lightning", "Marine Hail", "Marine High Wind", "Marine Strong Wind", "Marine Thunderstorm Wind", "Rip Current", "Seiche", "Sleet", "Storm Surge/Tide", "Strong Wind", "Thunderstorm Wind", "Tornado", "Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout", "Wildfire", "Winter Storm", "Winter Weather")

Then, each observation of the EVTYPE column of the StormData dataset is renamed to its closest match using Approximate String Matching via “amatch” from the “stringdist” package.

StormData$EVTYPE <- OEVT[amatch(StormData$EVTYPE, table = OEVT, nomatch = "Other", maxDist = 15)]

Notice that this method will probably not be perfect for every observation and because the maxDist parameter is set to 15 (to avoid illogical cases), some events might be classified as NA(coerced).

Events coerced to NA´s:

sum(is.na(StormData$EVTYPE))

## [1] 11683

Events classified as NA will be renamed to “Other” (Event types that couldn´t be matched to an official event type.)

StormData$EVTYPE[is.na(StormData$EVTYPE)] <- "Other"

4. Handling exponent values for: crop damage & property damage.

The value for both CROPDMG and PROPDMG variables are in different scales specifeied by both CROPDMGEXP & PROPDMGEXP. A work (Provided by Usama A.F. Khalil under the discussion forums for this course) on How To Handle Exponent Value of PROPDMGEXP and CROPDMGEXP was done to determine the correct multiplier values for each “legend”.

4.1 Auxiliary dataframe

The following chunk of code creates an auxiliary dataframe that will be used to scale both crop damage and property damage variables:

expValues <- data.frame(legend = c(unique(StormData$PROPDMGEXP), "k"), multiplier = c(1000, 1000000, 0, 1000000000, 1000000, 1, 10, 10, 10, 0, 10, 10, 10, 100, 10, 100, 0, 10, 10, 1000))

The multiplier for each legend:

stargazer(expValues, summary = FALSE, type = "html", title = "Exponent Values and Multipliers")

**Exponent Values and Multipliers**

	legend	multiplier

1	K	1,000
2	M	1,000,000
3		0
4	B	1,000,000,000
5	m	1,000,000
6	+	1
7	0	10
8	5	10
9	6	10
10	?	0
11	4	10
12	2	10
13	3	10
14	h	100
15	7	10
16	H	100
17	-	0
18	1	10
19	8	10
20	k	1,000

4.2 Scaling

Property damage and crop damage are scaled to millions of USD (by first multipliying the value times the multiplier according to its legend) into new variables (PROPDMG.USD & CROPDMG.USD).

## PROPERTY DAMAGE:
StormData <- mutate(StormData, PROPDMG.USD = (StormData$PROPDMG * (expValues$multiplier[match(StormData$PROPDMGEXP, expValues$legend)]))/1000000)

## CROP DAMAGE:
StormData <- mutate(StormData, CROPDMG.USD = (StormData$CROPDMG * (expValues$multiplier[match(StormData$CROPDMGEXP, expValues$legend)]))/1000000)

5. Population Health: TV Indicator

Another variable is created: “Total Victims Indicator” ( TVI ) which is the sum of fatalities and injuries \[TVI = \text{Fatalities} + \text{Injuries}\]

StormData <- mutate(StormData, TVI = StormData$FATALITIES + StormData$INJURIES )

6. Economic Damage: TD Indicator

Another variable is created: Total Damage Indicator (TDI) which is the sum of property damage (in millions USD) and crop damage (in millions USD) \[TDI = \text{Property Damage }_{M.USD} + \text{Crop Damage }_{M.USD}\]

StormData <- mutate(StormData, TDI = StormData$PROPDMG.USD + StormData$CROPDMG.USD )

Analysis

Population Health

The data will be summarized calculating the mean TVI across each event type to obtain the average number of victims for each type. This method was chosen because it ignores the frequency of occurrence for each event type (the mean could be interpreted as an “expected number of victims”). In contrast; a summary by total number of victims (sum TVI), would accumulate fatalities and injuries from 1950 to 2011 for each event type.

The study will focus on the top 5 event types (with higher TVI values)

HealthSummary <- StormData %>% group_by(EVTYPE) %>% summarise(mean.TVI = mean(TVI))
HealthSummary <- arrange(HealthSummary, desc(mean.TVI))
HealthSummary <- HealthSummary[1:5, ]

Economic Impact

The data will be summarized calculating the mean TDI across each event type to obtain the average economic damage for each type. The reason this method was chosen is explained in the previous section.

NOTICE: The average TDI could be interpreted as the expected economic damage (in million USD) for each event type.

The study will focus on the top 5 event types (with higher TDI values)

EconSummary <- StormData %>% group_by(EVTYPE) %>% summarise(mean.TDI = mean(TDI))
EconSummary <- arrange(EconSummary, desc(mean.TDI))
EconSummary <- EconSummary[1:5, ]

Results

1. Across the United States, which types of events are most harmful with respect to population health?

From the analysis performed upon the original data, the event types that present (on average) higher values for TVI (Total Victims Indicator), or fatalities & injuries are: Excessive Heat, Heat, Seiches, Tornadoes and Rip Currents. Notice that the frequency of occurrence for this events is not taken into consideration.

g <- ggplot(HealthSummary, aes(x = EVTYPE, y = mean.TVI))
g + geom_col(aes(fill = EVTYPE)) + labs(title = "Weather Event Impact on Population Health", subtitle = "Top 5 - US: 1950-2011") + labs(x = "Event Type", y = "Mean TVI")

Figure 2: Health Impact

2. Across the United States, which types of events have the greatest economic consequences?

From the analysis performed upon the original data, the event types that present (on average) higher values for TDI (Total Damage Indicator), or Crop Damages & Property Damages are: Dense Smoke, Storm surge/tide, Other (events not matched to official event categories), Droughts and Floods. Notice that the frequency of occurrence for this events is not taken into consideration.

g <- ggplot(EconSummary, aes(x = EVTYPE, y = mean.TDI))
g + geom_col(aes(fill = EVTYPE)) + labs(title = "Weather Event Impact on Economic Damage", subtitle = "Top 5 - US: 1950-2011") + labs(x = "Event Type", y = "Mean TDI (millions of USD)") + scale_fill_brewer(palette = "Dark2")

Figure 3: Economic Damage