This paper is an analysis of the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The purpose of the analysis is to determine the following:
1. Which event types are most harmful with respect to population health?
2. Which event types have the greatest economic consequences?
The summary of the findings:
1. Tornadoes cause the most total harm to population health over time, although wild fires cause the most harm to population health on a per-event basis.
2. Tornadoes cause the most economic harm over time.
Downloading and reading the data from the NOAA database
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "repdata%2Fdata%2FStormData.csv.bz2")
raw_data <- read.csv("repdata%2Fdata%2FStormData.csv.bz2")
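No separate decompression step is needed above because read.csv() opens bzip2-compressed files transparently. The following is a minimal sketch of that behavior on a temporary toy file; the rows are made up for illustration and are not NOAA data.

```r
# Toy illustration (made-up rows, not NOAA data):
# write a small bzip2-compressed CSV to a temporary file.
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0))
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() detects the compression when it opens the file
# and decompresses it on the fly.
toy_back <- read.csv(tmp, stringsAsFactors = FALSE)
print(toy_back)
```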
The event types in the data include references to specific tropical storms (e.g. “TROPICAL STORM GORDON”). Because we want to evaluate the damage caused by event types rather than by specific events, the following code replaces every specific tropical storm name in the EVTYPE column with the general name “TROPICAL STORM”:
raw_data$EVTYPE[grep("TROPICAL STORM", raw_data$EVTYPE)] <- "TROPICAL STORM"
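The effect of this replacement can be checked on a small vector. The sketch below uses made-up event labels (only “TROPICAL STORM GORDON” comes from the text above) and grepl() with a fixed, case-sensitive pattern, so labels in a different case would not be matched.

```r
# Toy event labels for illustration only.
evtype <- c("TROPICAL STORM GORDON", "TORNADO", "TROPICAL STORM",
            "TROPICAL STORM DEAN", "FLOOD")

# Flag every label that contains the literal text "TROPICAL STORM"
# and collapse all of them to the general name.
is_tropical <- grepl("TROPICAL STORM", evtype, fixed = TRUE)
evtype[is_tropical] <- "TROPICAL STORM"

# Count the labels after the replacement.
print(table(evtype))
```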
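The following code groups the raw data (with the replaced values) by event type and computes seven summary variables for each event type: the number of events (count), the total and mean fatalities (sum_fat, mean_fat), the total and mean injuries (sum_inj, mean_inj), and the combined total and combined mean of fatalities and injuries (sum_dam, mean_dam):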
library(dplyr)
ev_types <- group_by(raw_data, EVTYPE)
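pop_health_summary <- summarize(ev_types,
    count = n(),                                  # number of recorded events
    sum_fat = sum(FATALITIES, na.rm = TRUE),      # total fatalities
    sum_inj = sum(INJURIES, na.rm = TRUE),        # total injuries
    mean_fat = mean(FATALITIES, na.rm = TRUE),    # fatalities per event
    mean_inj = mean(INJURIES, na.rm = TRUE),      # injuries per event
    sum_dam = sum_fat + sum_inj,                  # total health damage
    mean_dam = mean_fat + mean_inj)               # health damage per event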
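A small worked example may help show why the total and the per-event mean can rank event types differently. The sketch below applies the same summary to a made-up data frame (the numbers are illustrative, not taken from the NOAA data): a frequent event type accumulates the larger total, while a rare but severe one has the larger mean.

```r
library(dplyr)

# Made-up records: three tornado events and a single wild fire event.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "TORNADO", "WILD FIRES"),
  FATALITIES = c(1, 0, 2, 2),
  INJURIES   = c(3, 0, 6, 8),
  stringsAsFactors = FALSE
)

# Same summary variables as in the analysis, computed per event type.
toy_summary <- dplyr::summarize(
  dplyr::group_by(toy, EVTYPE),
  count    = n(),
  sum_fat  = sum(FATALITIES, na.rm = TRUE),
  sum_inj  = sum(INJURIES, na.rm = TRUE),
  mean_fat = mean(FATALITIES, na.rm = TRUE),
  mean_inj = mean(INJURIES, na.rm = TRUE),
  sum_dam  = sum_fat + sum_inj,
  mean_dam = mean_fat + mean_inj
)

print(as.data.frame(toy_summary))
```

In this toy data the tornado rows give sum_dam = 12 and mean_dam = 4, while the single wild fire row gives sum_dam = 10 and mean_dam = 10, so the two rankings disagree.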
The following code creates two data frames from pop_health_summary, one arranged in descending order of total damage (sum_dam) and one in descending order of mean damage (mean_dam). Each data frame keeps the top 10 event types for its parameter.
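# arrange() and desc() are provided by dplyr, so plyr is not loaded here;
# attaching plyr after dplyr would mask dplyr's summarize()
top_sum_dam <- head(arrange(pop_health_summary, desc(sum_dam)), 10)
top_mean_dam <- head(arrange(pop_health_summary, desc(mean_dam)), 10)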
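The top-N selection can be sketched on a small table. The example below uses made-up values and hypothetical event labels "A" to "D", and qualifies arrange() and desc() with dplyr:: so that the result does not depend on which other packages are attached.

```r
library(dplyr)

# Made-up summary table with hypothetical event labels.
toy_summary <- data.frame(
  EVTYPE  = c("A", "B", "C", "D"),
  sum_dam = c(5, 40, 12, 30),
  stringsAsFactors = FALSE
)

# Sort by total damage, largest first, then keep the first two rows.
top2 <- head(dplyr::arrange(toy_summary, dplyr::desc(sum_dam)), 2)
print(top2)
```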
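The following code summarizes the raw data by event type, reusing the ev_types grouping, and computes one result variable for each event type: sum_dam_eco, the sum of the PROPDMG and CROPDMG values.
library(dplyr)
# dplyr:: is written out so that the grouped summarize() is used even if plyr is attached
eco_summary <- dplyr::summarize(ev_types, count = n(), sum_dam_eco = sum(PROPDMG, na.rm = TRUE) + sum(CROPDMG, na.rm = TRUE))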
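The summary above adds the PROPDMG and CROPDMG values as recorded. In the NOAA storm data these columns are accompanied by magnitude codes (PROPDMGEXP and CROPDMGEXP), where "K" stands for thousands, "M" for millions and "B" for billions. The sketch below shows one way the values could be scaled to dollars before summing. It is not part of the analysis in this paper: the helper name exp_to_multiplier is hypothetical, the rows are made up, and treating every other code as a multiplier of 1 is an assumption.

```r
# Hypothetical helper: map a magnitude code to a numeric multiplier.
# Assumption: only K, M and B are interpreted; any other code
# (including blanks and NA) is treated as a multiplier of 1.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Made-up rows for illustration only.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Scale the recorded value by its magnitude code.
toy$prop_dollars <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
print(toy)
```

The same helper could be applied to CROPDMG and CROPDMGEXP before the two amounts are added per event type.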
The following code creates a data frame from eco_summary, arranged in descending order of total damage (sum_dam_eco). The data frame keeps the top 10 event types for that parameter.
top_sum_eco <- head(arrange(eco_summary, desc(sum_dam_eco)), 10)
y_sum <- top_sum_dam$sum_dam
names(y_sum) <- top_sum_dam$EVTYPE
y_mean <- top_mean_dam$mean_dam
names(y_mean) <- top_mean_dam$EVTYPE
par(mfrow = c(2,1), mai = c(3,1.5,.5,.5))
# col takes one palette color per bar; event type names are not valid color names
barplot(y_sum, main = "Total population health damage by event type", ylab = "Total damage", las = 2, col = rainbow(length(y_sum)), legend.text = names(y_sum))
barplot(y_mean, main = "Mean population health damage by event type", ylab = "Mean damage", las = 2, col = rainbow(length(y_mean)), legend.text = names(y_mean))
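The plotting pattern used here, a named numeric vector passed to barplot(), can be tried on a small example. The sketch below uses made-up values; the names of the vector become the bar labels, las = 2 turns them perpendicular to the axis, and a larger bottom margin leaves room for them. The margin sizes are an assumption chosen for this toy plot.

```r
# Made-up values for illustration; the names become the bar labels.
y <- c(TORNADO = 12, "WILD FIRES" = 10, FLOOD = 4)

# Enlarge the bottom margin (in lines of text) for the rotated labels,
# remembering the previous settings so they can be restored.
old_par <- par(mar = c(8, 4, 2, 1))

# barplot() returns the x positions of the bar midpoints.
mids <- barplot(y,
                main = "Toy health damage by event type",
                ylab = "Damage",
                las = 2,
                col = rainbow(length(y)))

par(old_par)
```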
The charts show that tornadoes cause the most total harm to population health over time, although wild fires cause the most harm to population health on a per-event basis.
y_sum_eco <- top_sum_eco$sum_dam_eco
names(y_sum_eco) <- top_sum_eco$EVTYPE
par(mfrow = c(1,1), mai = c(3,1.5,.5,.5))
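# colors and legend are taken from the economic data (y_sum_eco), not from the health data
barplot(y_sum_eco, main = "Total economic damage by event type", ylab = "Total damage", las = 2, col = rainbow(length(y_sum_eco)), legend.text = names(y_sum_eco))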
The chart shows that tornadoes cause the most economic damage over time.