Synopsis

This paper is an analysis of the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The purpose of the analysis is to determine the following:
1. Which event types are most harmful with respect to population health?
2. Which event types have the greatest economic consequences?
The summary of the findings:
1. Tornados cause the most harm to population health over time, although wild fires cause the most harm to population health on a per-event basis.
2. Tornados cause the most economic harm over time.

Data Processing

Downloading and reading the data from the NOAA database

download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "repdata%2Fdata%2FStormData.csv.bz2")
raw_data <- read.csv("repdata%2Fdata%2FStormData.csv.bz2")

The event types in the data include references to specific tropical storms (e.g. “TROPICAL STORM GORDON”). Because we want to evaluate the damage of different event types, not of specific events, the following code replaces every specific tropical storm name in the EVTYPE column with the general name “TROPICAL STORM”:

raw_data$EVTYPE[grep("TROPICAL STORM", raw_data$EVTYPE)] <- "TROPICAL STORM"
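
To make the replacement step concrete, here is a minimal sketch on a small toy vector. The vector `evtype` and its values are invented for illustration and stand in for `raw_data$EVTYPE`; only the `grep()` pattern is taken from the code above.

```r
# Toy vector standing in for raw_data$EVTYPE (values are illustrative only)
evtype <- c("TROPICAL STORM GORDON", "TORNADO", "TROPICAL STORM",
            "FLOOD", "TROPICAL STORM DEAN")

# grep() returns the positions of every element that contains the pattern
idx <- grep("TROPICAL STORM", evtype)

# Overwrite those positions with the generic label
evtype[idx] <- "TROPICAL STORM"

# Named and generic tropical storms now fall under a single event type
print(table(evtype))
```

Note that the pattern matches anywhere in the string, so any event type that merely contains “TROPICAL STORM” is folded into the generic label as well.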

Creating a new data frame based on the raw data (with the replaced values), grouped by event type, which includes an event count and 6 new result variables for each event type:

library(dplyr)
ev_types <- group_by(raw_data, EVTYPE)
pop_health_summary <- summarize(ev_types, count = n(), sum_fat = sum(FATALITIES, na.rm = T), sum_inj = sum(INJURIES, na.rm = T), mean_fat = mean(FATALITIES, na.rm = T), mean_inj = mean(INJURIES, na.rm = T), sum_dam = sum_fat + sum_inj, mean_dam = mean_fat + mean_inj)
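
The difference between the total and the per-event measures can be seen on a toy example. This is only a sketch: the data frame `toy` and its numbers are invented, and base R's `tapply()` is used here instead of the dplyr calls above so that the block runs without extra packages.

```r
# Invented toy data: three tornado records and a single wild fire record
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "TORNADO", "WILD FIRES"),
  FATALITIES = c(0, 1, 2, 3),
  INJURIES   = c(3, 0, 6, 9)
)

# Harm per record = fatalities + injuries, as defined in this analysis
toy$harm <- toy$FATALITIES + toy$INJURIES

# Total harm per event type (corresponds to sum_dam)
sum_dam <- tapply(toy$harm, toy$EVTYPE, sum)

# Mean harm per event (corresponds to mean_dam, since the mean of a sum
# equals the sum of the means when no values are missing)
mean_dam <- tapply(toy$harm, toy$EVTYPE, mean)

print(sum_dam)
print(mean_dam)
```

In this toy data the two event types tie on total harm, yet the single wild fire record has a much higher mean, which is why the two rankings are examined separately.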

Creating 2 new data frames based on pop_health_summary, arranged in descending order of the total damage (sum_dam) and the mean damage (mean_dam), respectively. Each data frame includes the top 10 event types for the arranged parameter.

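# arrange() and desc() are provided by dplyr (loaded above).
# plyr is not loaded here: attaching it after dplyr masks dplyr's summarize().
top_sum_dam <- head(dplyr::arrange(pop_health_summary, dplyr::desc(sum_dam)), 10)
top_mean_dam <- head(dplyr::arrange(pop_health_summary, dplyr::desc(mean_dam)), 10)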

Creating a new data frame based on the raw data, grouped by event type, which includes a new result variable for each event type:

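library(dplyr)
# Call dplyr::summarize() explicitly so the grouping in ev_types is respected
# even if plyr has been attached and masks summarize().
eco_summary <- dplyr::summarize(ev_types, count = dplyr::n(), sum_dam_eco = sum(PROPDMG, na.rm = TRUE) + sum(CROPDMG, na.rm = TRUE))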

Creating a new data frame, based on eco_summary, arranged in descending order of the total damage (sum_dam_eco). The data frame will include the top 10 event types for the arranged parameter.

# dplyr's arrange() and desc() are sufficient here; plyr is not needed.
top_sum_eco <- head(dplyr::arrange(eco_summary, dplyr::desc(sum_dam_eco)), 10)
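
One caveat about the economic totals above: in the NOAA storm data, PROPDMG and CROPDMG are accompanied by the magnitude columns PROPDMGEXP and CROPDMGEXP (for example K for thousands, M for millions, B for billions), and the summary above sums the raw figures without applying them. The following is a minimal sketch of how the magnitudes might be taken into account. The helper `exp_multiplier()` and the data frame `toy` are hypothetical, and treating every unrecognised code as a multiplier of 1 is an assumption rather than documented NOAA behaviour.

```r
# Hypothetical helper: translate a magnitude code into a numeric multiplier.
# Only H, K, M and B are handled; any other code (including "") maps to 1,
# which is an assumption made for this sketch.
exp_multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Invented toy records using the same column names as the storm data
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "TORNADO", "TORNADO"),
  PROPDMG    = c(5, 2, 25, 10),
  PROPDMGEXP = c("B", "M", "K", ""),
  CROPDMG    = c(0, 1, 0, 3),
  CROPDMGEXP = c("", "M", "", "K"),
  stringsAsFactors = FALSE
)

# Damage in dollars = figure * multiplier, for property and crops
toy$damage_usd <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP) +
  toy$CROPDMG * exp_multiplier(toy$CROPDMGEXP)

# Total dollar damage per event type, largest first
totals <- tapply(toy$damage_usd, toy$EVTYPE, sum)
print(sort(totals, decreasing = TRUE))
```

In the toy data, tornados have the larger sum of raw figures (38 versus 8), but floods dominate once the magnitudes are applied, so a ranking based on raw figures may differ from one based on dollar amounts.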

Results

  1. For the purpose of this analysis, I have defined “harm to population health” as the sum of fatalities and injuries caused by a specific type of event. It is important to examine the harm over time as well as the harm caused per event of a given type. Two charts follow. The first illustrates the top 10 event types in terms of total harm over time. The second illustrates the top 10 event types in terms of mean harm per event.

Plotting the top event types by total damage next to the top event types by mean damage per event of each type:

y_sum <- top_sum_dam$sum_dam
names(y_sum) <- top_sum_dam$EVTYPE
y_mean <- top_mean_dam$mean_dam
names(y_mean) <- top_mean_dam$EVTYPE
par(mfrow = c(2,1), mai = c(3,1.5,.5,.5))
# One colour per bar; passing EVTYPE itself as col fails when it is a character vector
barplot(y_sum, main = "Total population health damage by event type", ylab = "Total damage", las = 2, col = rainbow(length(y_sum)), legend.text = names(y_sum))
barplot(y_mean, main = "Mean population health damage by event type", ylab = "Mean damage", las = 2, col = rainbow(length(y_mean)), legend.text = names(y_mean))

The charts show that tornados cause the most harm to population health over time, although wild fires cause the most harm to population health on a per-event basis.

  2. For the purpose of this analysis, I have defined “economic consequences” as the sum of property damage and crop damage caused, directly or indirectly, by a given event type. In this analysis I examined only the total damage over time. The following chart illustrates the top 10 event types in terms of total damage over time.

y_sum_eco <- top_sum_eco$sum_dam_eco
names(y_sum_eco) <- top_sum_eco$EVTYPE
par(mfrow = c(1,1), mai = c(3,1.5,.5,.5))
# Colours and legend are taken from the economic summary, not the health summary
barplot(y_sum_eco, main = "Total economic damage by event type", ylab = "Total damage", las = 2, col = rainbow(length(y_sum_eco)), legend.text = names(y_sum_eco))

The chart shows that tornados cause the most economic damage over time.