This report uses data from the NOAA Storm Database, which records the property damage, injuries, and loss of life caused by past extreme weather events.

We use this data to find the event types that are most harmful to public safety, and to identify which event types cause the greatest cost in damages to property and crops.

Data Processing

Get the data:

# Load the comma separated data into a dataframe for analysis

setwd("C:/Users/Gabriel/Documents/Coursera/5 Reproducible Research/week3")

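# bzfile() creates a connection to the compressed file. Its "open" argument
# is a mode string (such as "r"), not a file name, so we leave it at its default
# and let read.csv() open and close the connection.
csvfile <- bzfile("repdata-data-StormData.csv.bz2")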

SD <- read.csv(csvfile, sep=',')

library(dplyr)
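
As a side note on the loading step, read.csv() can usually read a bzip2-compressed file directly from its path, so an explicit connection is optional. The following is a minimal sketch, not the original analysis: it writes a small invented table to a temporary .bz2 file and reads it back. The names `tmp`, `con`, and `toy_sd`, and the toy values, are assumptions made for illustration.

```r
# Write a tiny, invented data set to a temporary bzip2-compressed CSV file.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                     FATALITIES = c(1, 0)),
          con, row.names = FALSE)
close(con)

# read.csv() is given the path itself; the compression is detected on read.
toy_sd <- read.csv(tmp, stringsAsFactors = FALSE)
```

The same call pattern should apply to the storm data file, given its path relative to the working directory.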

First we’ll find which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health in the US.

# Using the dplyr library, we'll take just the data columns which show:
# - Type of event 
# - Number of Fatalities 
# - Number of Injuries

# We'll group by the type of event and add up the fatalities and injuries.

# We'll sort by injuries (ties broken by fatalities) so the worst events are at the top of the dataset.

q1_data <- select(SD, EVTYPE, FATALITIES, INJURIES) %>%
           group_by(EVTYPE) %>%
           summarize(Fatalities = sum(FATALITIES), 
                     Injuries= sum(INJURIES)
           ) %>%
           arrange(desc(Injuries), desc(Fatalities))

# Next we take the Top 10 (worst)
q1_top_10 <- head(q1_data,10)

# reshape the data for plotting
library(reshape2)
DF1 <- melt(q1_top_10, id.var="EVTYPE")

# Make a plot showing each event type in our Top 10 (worst), 
# and a stacked count of injuries + fatalities for each type of event.
library(ggplot2)
ggplot(DF1, aes(x = EVTYPE, y = value, fill = variable)) + 
  geom_bar(stat = "identity") +
  labs(y = "Total Fatalities and Injuries") +
  labs(x = "Type of Event") +
  labs(title = "Top 10 Events with the Greatest Risk to Public Safety") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
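
The ranking above sorts by injuries first, with fatalities only breaking ties. If a combined ranking is preferred, one option is to sort on the sum of both counts. The following is a minimal base R sketch on invented toy data; the names `toy`, `totals`, and `ranked`, and the `Total` column, are assumptions for illustration and are not part of the original analysis.

```r
# Toy data standing in for the storm data; values are invented for illustration.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(5, 20, 2, 3, 10),
  INJURIES   = c(50, 10, 8, 40, 5)
)

# Sum fatalities and injuries per event type.
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank by combined casualties (an assumption of this sketch;
# the report itself sorts by injuries first).
totals$Total <- totals$FATALITIES + totals$INJURIES
ranked <- totals[order(-totals$Total), ]
```

Whether injuries and fatalities should be weighted equally is a judgment call; the sum is only one possible measure of harm.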

Next, we do a similar analysis to identify which types of events (as indicated in the EVTYPE variable) are most devastating with respect to cost in damages.

# Using the dplyr library, we'll take just the data columns which show:
# - Type of event 
# - Cost of Property Damages (PROPDMG) and its unit code (PROPDMGEXP)
# - Cost of Crop Damages (CROPDMG) and its unit code (CROPDMGEXP)

# We'll group by the type of event and add up the cost in damages.

# We'll sort so the worst events are at the top of the dataset.
q2_data <- select(SD, EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
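  # Convert damages to dollars: K = thousands, M = millions, B = billions.
  # Any other exponent code is counted as 0.
  mutate(pdollars = ifelse(PROPDMGEXP == "K", PROPDMG * 1e3, 
                    ifelse(PROPDMGEXP == "M", PROPDMG * 1e6,
                    ifelse(PROPDMGEXP == "B", PROPDMG * 1e9, 0))),
         cdollars = ifelse(CROPDMGEXP == "K", CROPDMG * 1e3, 
                    ifelse(CROPDMGEXP == "M", CROPDMG * 1e6,
                    ifelse(CROPDMGEXP == "B", CROPDMG * 1e9, 0)))
  ) %>%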
  group_by(EVTYPE) %>%
  summarize(Property = sum(pdollars), 
            Crop = sum(cdollars)
  ) %>%
  arrange(desc(Property + Crop))

# Next, we take the Top 10 (worst)
q2_top_10 <- head(q2_data,10)

# shape the data for plotting
library(reshape2)
DF2 <- melt(q2_top_10, id.var="EVTYPE")

# Make a plot showing each event type in our Top 10 (worst), 
# and a stacked sum of damages (property and crops) for each type of event.
library(ggplot2)
ggplot(DF2, aes(x = EVTYPE, y = value/1000000000, fill = variable)) + 
  geom_bar(stat = "identity") +
  labs(y = "Total Damages (in Billions USD)") +
  labs(x = "Type of Event") +
  labs(title = "Top 10 Events with the Greatest Economic Consequences") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
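
The damage conversion above relies on nested ifelse() calls over the exponent columns. A named lookup vector is one alternative that may be easier to extend. The sketch below is illustrative only: `exp_to_multiplier` is a hypothetical helper, the toy values are invented, and treating every unmapped code as 0 is an assumption of this sketch rather than an official NOAA rule.

```r
# Hypothetical helper: map an exponent code to a dollar multiplier.
# Only K, M and B are mapped (case-insensitively); every other code,
# including blanks and NA, is treated as 0 in this sketch.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 0
  unname(mult)
}

# Toy values, invented for illustration.
dmg  <- c(25, 2.5, 1, 10)
code <- c("K", "m", "B", "?")
dollars <- dmg * exp_to_multiplier(code)
```

Inside the pipeline, such a helper could be called as `mutate(pdollars = PROPDMG * exp_to_multiplier(PROPDMGEXP))`. Note that it also accepts lowercase codes, which the ifelse() version does not.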

Results:

The Worst Events for Public Safety:

Worst Events in Terms of Cost in Damages: