Synopsis

In this project, we analyze the storm database of the U.S. National Oceanic and Atmospheric Administration (NOAA). We estimate the fatalities, injuries, property damage, and crop damage for each type of event (e.g., Flood, Typhoon, Tornado, Hail, Hurricane). Our goal is to determine which event type is most harmful to the health of the US population and which has the largest economic consequences. Our analysis of fatalities and injuries concludes that Tornado is the most harmful event type with respect to population health. Based on property and crop damage, we conclude that Flood has the greatest economic consequences.

Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Questions

The data analysis addresses the following questions:

1. Across the United States, which types of events are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?

Data Processing

0. Setup (load libraries)

library(data.table)
library(ggplot2)

1. Loading the data into R

First, we read the data using read.csv:

data <- read.csv("repdata-data-StormData.csv.bz2", header = TRUE, sep = ",")

2. Inspecting the data

Use colnames to check the column names:

colnames(data)

3. Subsetting the data

selection <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
data <- data[, selection]
summary(data)

data <- as.data.table(data)

# Keep only rows with a known event type and at least one non-zero impact.
data <- data[(EVTYPE != "?" &
              (INJURIES > 0 | FATALITIES > 0 | PROPDMG > 0 | CROPDMG > 0)),
             c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]

4. Converting the exponent columns (PROPDMGEXP and CROPDMGEXP)

cols <- c("PROPDMGEXP", "CROPDMGEXP")
data[, (cols) := c(lapply(.SD, toupper)), .SDcols = cols]

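# Multipliers for the exponent codes. Codes that are not listed here
# (including a blank exponent) produce NA in the lookup and are set to 10^0
# by the is.na() assignments that follow.
PROPDMGKey <- c("-" = 10^0, "+" = 10^0, "0" = 10^0, "1" = 10^1, "2" = 10^2,
                "3" = 10^3, "4" = 10^4, "5" = 10^5, "6" = 10^6, "7" = 10^7,
                "8" = 10^8, "9" = 10^9, "H" = 10^2, "K" = 10^3, "M" = 10^6,
                "B" = 10^9)
CROPDMGKey <- c("?" = 10^0, "0" = 10^0, "K" = 10^3, "M" = 10^6, "B" = 10^9)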

data[, PROPDMGEXP := PROPDMGKey[as.character(data[, PROPDMGEXP])]]
data[is.na(PROPDMGEXP), PROPDMGEXP := 10^0]

data[, CROPDMGEXP := CROPDMGKey[as.character(data[, CROPDMGEXP])]]
data[is.na(CROPDMGEXP), CROPDMGEXP := 10^0]
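The lookup-and-fallback pattern used for the exponent columns can be tried on a small vector. The sketch below uses made-up damage values and a shortened, hypothetical key (exp_key); it is only meant to show how codes missing from the key end up with a multiplier of 1.

```r
# Toy illustration with made-up values; not taken from the NOAA data.
# exp_key is a shortened, hypothetical version of the keys in this report.
exp_key <- c("H" = 10^2, "K" = 10^3, "M" = 10^6, "B" = 10^9)

dmg     <- c(25, 2.5, 10, 3)
dmg_exp <- c("K", "M", "", "?")   # "" and "?" are not in the key

# Indexing a named vector by name returns NA for codes it does not contain.
multiplier <- unname(exp_key[dmg_exp])

# Fall back to a multiplier of 1 for the unmatched codes.
multiplier[is.na(multiplier)] <- 10^0

# Damage in dollars = reported figure * multiplier.
cost <- dmg * multiplier
print(cost)
```

Here the four costs come out as 25,000, 2,500,000, 10, and 3 dollars: the first two are scaled by K and M, while the blank and "?" codes leave the reported figure unchanged.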

5. Creating two new columns of Property Cost and Crop Cost

data <- data[, .(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, PROPCOST = PROPDMG * PROPDMGEXP, CROPDMG, CROPDMGEXP, CROPCOST = CROPDMG * CROPDMGEXP)]

Analysis

1. Estimating the total of Fatalities and Injuries (Health Impacts)

Health_Impact <- data[, .(FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES), TOTAL_HEALTH_IMPACTS = sum(FATALITIES) + sum(INJURIES)), by = .(EVTYPE)]

Health_Impact <- Health_Impact[order(-TOTAL_HEALTH_IMPACTS), ]

Health_Impact <- Health_Impact[1:10, ]

head(Health_Impact, 10)
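The group, sum, and rank steps above can be checked on a few rows. This is a minimal sketch with made-up records; the names toy, toy_impact, and toy_top are hypothetical and do not appear in the analysis.

```r
library(data.table)

# Made-up records for illustration only.
toy <- data.table(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(2, 1, 3, 0, 0),
  INJURIES   = c(10, 0, 5, 1, 2)
)

# Sum each impact per event type, plus a combined total.
toy_impact <- toy[, .(FATALITIES = sum(FATALITIES),
                      INJURIES   = sum(INJURIES),
                      TOTAL      = sum(FATALITIES) + sum(INJURIES)),
                  by = .(EVTYPE)]

# Sort by the combined total, largest first, and keep the top two types.
toy_top <- head(toy_impact[order(-TOTAL)], 2)
print(toy_top)
```

In this toy table TORNADO ranks first with a total of 20, followed by FLOOD with 3. The sketch uses head() to take the top rows because it simply returns fewer rows when the table is shorter than requested, whereas indexing with 1:n would pad the result with NA rows; with the full storm data either approach gives ten rows.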

2. Estimating the total of Property Cost and Crop Cost (Economic Impacts)

Eco_Impact <- data[, .(PROPCOST = sum(PROPCOST), CROPCOST = sum(CROPCOST), TOTAL_ECO_IMPACTS = sum(PROPCOST) + sum(CROPCOST)), by = .(EVTYPE)]

Eco_Impact <- Eco_Impact[order(-TOTAL_ECO_IMPACTS), ]

Eco_Impact <- Eco_Impact[1:10, ]

head(Eco_Impact, 10)

Results

1. Answer to Question 1: Events that are most harmful with respect to population health

Health_Consequences <- melt(Health_Impact, id.vars = "EVTYPE", variable.name = "Fatalities_or_Injuries")

ggplot(Health_Consequences, aes(x = reorder(EVTYPE, -value), y = value)) +
  geom_bar(stat = "identity", aes(fill = Fatalities_or_Injuries), position = "dodge") +
  ylab("Total Injuries/Fatalities") +
  xlab("Event Type") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ggtitle("Top 10 US Weather Events that are Most Harmful to Population") +
  theme(plot.title = element_text(hjust = 0.5))
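To see what melt does to the summary table before it is plotted, the sketch below reshapes a two-row table from wide to long format. The values are made up and the names toy_wide and toy_long are hypothetical.

```r
library(data.table)

# Made-up wide table: one row per event type, one column per measure.
toy_wide <- data.table(
  EVTYPE     = c("TORNADO", "FLOOD"),
  FATALITIES = c(5, 1),
  INJURIES   = c(15, 2)
)

# Long format: one row per (event type, measure) pair.
# Every column not named in id.vars becomes a level of the new variable column.
toy_long <- melt(toy_wide, id.vars = "EVTYPE",
                 variable.name = "Fatalities_or_Injuries")
print(toy_long)
```

The result has four rows and the columns EVTYPE, Fatalities_or_Injuries, and value, which is the shape ggplot needs to draw one bar per measure. Because every non-id column is melted, the TOTAL_HEALTH_IMPACTS column of Health_Impact also becomes a level and appears as a third bar for each event type in the plot.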

2. Answer to Question 2: Events that have the greatest economic consequences

Eco_Consequences <- melt(Eco_Impact, id.vars = "EVTYPE", variable.name = "Damage_Type")

ggplot(Eco_Consequences, aes(x = reorder(EVTYPE, -value), y = value/1e9)) +
  geom_bar(stat = "identity", aes(fill = Damage_Type), position = "dodge") +
  ylab("Cost/Damage (in billion USD)") +
  xlab("Event Type") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ggtitle("Top 10 US Weather Events that have the Greatest Economic Consequences") +
  theme(plot.title = element_text(hjust = 0.5))