Course Project 2 - Storm Database Analysis

Synopsis

The following analysis uses data from the U.S. National Oceanic and Atmospheric Administration (NOAA) on the characteristics of major storms and weather events in the United States. Based on this data, we intend to answer the following questions:

1. Across the United States, which types of events are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?

In the following sections, we document the steps taken to answer them, as well as the results obtained.

Data Processing

We start off by loading all packages and datasets that will be used throughout the analysis.

library(dplyr)
library(stringr)
library(ggplot2)
library(reshape2)
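# Download the raw data only if it is not already present locally
if (!file.exists("storm data.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "storm data.csv.bz2")
}
# read.csv reads the bzip2-compressed file directly
file <- read.csv("./storm data.csv.bz2")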

We then subset the dataset to remove columns that won’t be used in the analysis and edit the column names and event types in order to make them more readable. The data is now ready to be used in the analysis of the two proposed questions.

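# Keep only the columns needed downstream, selected by position
data <- file[,c(1,7,8,23,24,25,26,27,28,29)]
# Convert event types and column names to title case for readability
data$EVTYPE <- str_to_title(data$EVTYPE)
names(data) <- str_to_title(names(data))
# The third retained column (Evtype) holds the event type
names(data)[3] <- "Event"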
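
Title-casing alone leaves spelling variants of the same event as separate categories; the damage table in the Results section, for example, lists Tstm Wind, Thunderstorm Wind and Thunderstorm Winds on separate rows. The following is a minimal sketch of how such variants could be merged before grouping. The helper normalise_event is hypothetical, its rules cover only these three spellings, and it is not part of the analysis reported here.

```r
# Hypothetical helper: merge a few known spelling variants of event types.
# Only the thunderstorm-wind variants visible in the results are handled.
normalise_event <- function(x) {
  x <- trimws(x)                                            # drop leading/trailing spaces
  x <- gsub("\\s+", " ", x)                                 # collapse repeated whitespace
  x <- sub("^Tstm ", "Thunderstorm ", x)                    # expand the "Tstm" abbreviation
  x <- sub("^Thunderstorm Winds$", "Thunderstorm Wind", x)  # singular form
  x
}

# Illustrative input, already in title case
events <- c("Tstm Wind", "Thunderstorm Winds", " Thunderstorm  Wind", "Tornado")
cleaned <- normalise_event(events)
print(cleaned)
```

If adopted, it would be applied as data$Event <- normalise_event(data$Event) before the grouping steps, which could change the rankings reported in the results.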

Across the United States, which types of events are most harmful with respect to population health?

First, we sum the fatalities and injuries per event type, sort the result in decreasing order (by fatalities, then by injuries) and keep the top 10 rows.

data_health <- data %>% group_by(Event) %>% 
               summarise("Fatalities" = sum(Fatalities), "Injuries" = sum(Injuries))
data_health <- data_health[order(data_health$Fatalities,data_health$Injuries, decreasing = TRUE ),]
data_health <- as.data.frame(data_health[1:10,])

At this point, we already know which event types account for the most fatalities and injuries. To visualize this better, we prepare a second data frame in long format that allows us to put this information in a plot.

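# Reshape to long format: one row per event type and measure
data_health_plot <- melt(data_health, id.vars = "Event")
# Order the event types by injuries (then fatalities) so the bars appear sorted
data_health_plot$Event <- factor(data_health_plot$Event, levels = data_health$Event[order(data_health$Injuries, data_health$Fatalities)])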

Finally, we make a barplot with the data, exhibiting the top 10 event types per number of fatalities and injuries. The obtained plot is displayed in the results section.
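
The plotting code itself is not shown, so the following is only a sketch of one way such a bar plot could be drawn with ggplot2. To stay self-contained it rebuilds data_health from the summary table printed in the Results section; the column names Measure and Count, the object name health_plot and all styling choices are assumptions rather than the original implementation.

```r
library(ggplot2)
library(reshape2)

# Top 10 event types, copied from the summary table in the Results section
data_health <- data.frame(
  Event = c("Tornado", "Excessive Heat", "Flash Flood", "Heat", "Lightning",
            "Tstm Wind", "Flood", "Rip Current", "High Wind", "Avalanche"),
  Fatalities = c(5633, 1903, 978, 937, 816, 504, 470, 368, 248, 224),
  Injuries = c(91346, 6525, 1777, 2100, 5230, 6957, 6789, 232, 1137, 170),
  stringsAsFactors = FALSE
)

# Long format: one row per event type and measure (names are illustrative)
data_health_plot <- melt(data_health, id.vars = "Event",
                         variable.name = "Measure", value.name = "Count")

# Order event types by injuries, then fatalities, so the bars appear sorted
data_health_plot$Event <- factor(
  data_health_plot$Event,
  levels = data_health$Event[order(data_health$Injuries, data_health$Fatalities)]
)

# Dodged horizontal bars: fatalities and injuries side by side per event type
health_plot <- ggplot(data_health_plot, aes(x = Event, y = Count, fill = Measure)) +
  geom_col(position = "dodge") +
  coord_flip() +
  labs(title = "Top 10 event types by fatalities and injuries",
       x = "Event type", y = "Number of people", fill = NULL)
print(health_plot)
```

The same pattern would apply to data_damage, with Crop and Property as the two measures.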

Across the United States, which types of events have the greatest economic consequences?

The analysis of the second question follows the same steps as the previous one. First, we sum the crop and property damage per event type, sort the result in decreasing order (by property damage, then by crop damage) and keep the top 10 rows.

data_damage <- data %>% group_by(Event) %>% 
               summarise("Crop" = sum(Cropdmg), "Property" = sum(Propdmg))
data_damage <- data_damage[order(data_damage$Property,data_damage$Crop, decreasing = TRUE ),]
data_damage <- as.data.frame(data_damage[1:10,])
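
The sums above add the raw Propdmg and Cropdmg magnitudes. The subset also retains the Propdmgexp and Cropdmgexp columns, which in the NOAA data encode a multiplier (K for thousands, M for millions, B for billions), and these are not applied here. Below is a minimal sketch of how the magnitudes could be converted to dollar amounts before summing. The toy rows are invented for illustration, exp_to_multiplier is a hypothetical helper, and treating every other code (blank, digits, symbols) as a multiplier of 1 is an assumption.

```r
# Toy rows standing in for the real data frame (values are made up)
toy <- data.frame(
  Event      = c("Tornado", "Tornado", "Flood", "Hail"),
  Propdmg    = c(25, 2.5, 1.2, 500),
  Propdmgexp = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map exponent codes to numeric multipliers.
# Assumption: any code other than K, M or B counts as a multiplier of 1.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Scale each magnitude by its multiplier, then sum per event type
toy$PropertyUSD <- toy$Propdmg * exp_to_multiplier(toy$Propdmgexp)
totals <- aggregate(PropertyUSD ~ Event, data = toy, FUN = sum)
print(totals)
```

Applying the same scaling to Cropdmg with Cropdmgexp before the summarise step could change which event types rank highest.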

Once again, we prepare a second dataframe that will allow us to put this information in a plot.

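# Reshape to long format: one row per event type and damage category
data_damage_plot <- melt(data_damage, id.vars = "Event")
# Order the event types by property damage (then crop damage) so the bars appear sorted
data_damage_plot$Event <- factor(data_damage_plot$Event, levels = data_damage$Event[order(data_damage$Property, data_damage$Crop)])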

We then make another barplot with the data, exhibiting the top 10 event types per value of damage to crops and properties. The obtained plot is displayed in the results section.

Results

The results of the previous analysis are summarised below. In these tables, tornadoes rank first for fatalities, injuries and property damage, while hail has the largest crop damage total among the listed event types. The damage figures are sums of the raw Propdmg and Cropdmg values.

##             Event Fatalities Injuries
## 1         Tornado       5633    91346
## 2  Excessive Heat       1903     6525
## 3     Flash Flood        978     1777
## 4            Heat        937     2100
## 5       Lightning        816     5230
## 6       Tstm Wind        504     6957
## 7           Flood        470     6789
## 8     Rip Current        368      232
## 9       High Wind        248     1137
## 10      Avalanche        224      170

##                 Event      Crop  Property
## 1             Tornado 100018.52 3212258.2
## 2         Flash Flood 179200.46 1420124.6
## 3           Tstm Wind 109202.60 1335995.6
## 4               Flood 168037.88  899938.5
## 5   Thunderstorm Wind  66791.45  876844.2
## 6                Hail 579596.28  688693.4
## 7           Lightning   3580.61  603351.8
## 8  Thunderstorm Winds  18684.93  446293.2
## 9           High Wind  17283.21  324731.6
## 10       Winter Storm   1978.99  132720.6