The following analysis uses data from the U.S. National Oceanic and Atmospheric Administration (NOAA) on the characteristics of major storms and weather events in the United States. Based on this data, we intend to answer the following questions:
1. Across the United States, which types of events are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?
In the following sections, we document the steps taken to do so, as well as the results obtained.
We start off by loading all packages and datasets that will be used throughout the analysis.
library(dplyr)
library(stringr)
library(ggplot2)
library(reshape2)
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "storm data.csv.bz2")
file <- read.csv("./storm data.csv.bz2")
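Since the compressed file is large, it can help to skip the download when the file is already on disk. A minimal sketch, assuming a hypothetical helper `fetch_once` that is not part of the original analysis:

```r
# Hypothetical helper: download a file only when it is not already on disk.
fetch_once <- function(url, destfile) {
  if (!file.exists(destfile)) {
    download.file(url, destfile = destfile, mode = "wb")
  }
  invisible(destfile)
}

storm_url  <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
storm_file <- "storm data.csv.bz2"

# Not run here, since the first call needs network access:
# fetch_once(storm_url, storm_file)
# file <- read.csv(storm_file)
```

Binary mode (`mode = "wb"`) is used so that the compressed archive is not altered on platforms that translate line endings.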
We then subset the dataset to remove columns that won’t be used in the analysis and edit the column names and event types in order to make them more readable. The data is now ready to be used in the analysis of the two proposed questions.
data <- file[,c(1,7,8,23,24,25,26,27,28,29)]
data$EVTYPE <- str_to_title(data$EVTYPE)
names(data) <- str_to_title(names(data))
names(data)[3] <- "Event"
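Selecting columns by position depends on the column order of the raw file. The sketch below shows the same subsetting and renaming done by name on a made-up two-row data frame. The column names are assumed to match the raw file headers, and `to_title` is a hypothetical base R stand-in for `str_to_title` that is only meant for plain ASCII labels.

```r
# Toy stand-in for the raw file; the values are made up for illustration.
raw <- data.frame(
  STATE      = c("AL", "TX"),
  EVTYPE     = c("TSTM WIND", "FLASH FLOOD"),
  FATALITIES = c(0, 2),
  INJURIES   = c(1, 0),
  stringsAsFactors = FALSE
)

# Lower-case everything, then upper-case the first letter of each word
to_title <- function(x) gsub("\\b([a-z])", "\\U\\1", tolower(x), perl = TRUE)

# Select by name instead of by position
keep <- c("STATE", "EVTYPE", "FATALITIES", "INJURIES")
toy  <- raw[, keep]

toy$EVTYPE <- to_title(toy$EVTYPE)
names(toy) <- to_title(names(toy))
names(toy)[names(toy) == "Evtype"] <- "Event"
```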
First, we summarise the data by number of fatalities and injuries per event type, arrange it in decreasing order and subset the top 10 rows.
# Total fatalities and injuries per event type
data_health <- data %>% group_by(Event) %>%
  summarise("Fatalities" = sum(Fatalities), "Injuries" = sum(Injuries))
# Sort by fatalities, then injuries, largest first, and keep the top 10
data_health <- data_health[order(data_health$Fatalities, data_health$Injuries, decreasing = TRUE), ]
data_health <- as.data.frame(data_health[1:10, ])
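To make the summarise, sort and subset steps easier to follow, here is a small sketch on made-up numbers. It uses base R's `aggregate` in place of the dplyr pipeline, and the `toy` data frame is purely illustrative.

```r
toy <- data.frame(
  Event      = c("Tornado", "Heat", "Tornado", "Flood", "Heat"),
  Fatalities = c(3, 1, 2, 0, 4),
  Injuries   = c(10, 0, 5, 2, 1),
  stringsAsFactors = FALSE
)

# Sum both measures per event type
totals <- aggregate(cbind(Fatalities, Injuries) ~ Event, data = toy, FUN = sum)

# Sort by fatalities, breaking ties by injuries, largest first
totals <- totals[order(totals$Fatalities, totals$Injuries, decreasing = TRUE), ]

# Keep the top 2 rows; head() does not create NA rows when fewer rows exist
top <- head(totals, 2)
```

Tornado and Heat both total 5 fatalities here, so the injuries column decides their order.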
At this point, we already know which event types account for the most fatalities and injuries. To visualize this better, we prepare a second data frame, in long format, that lets us put this information in a plot.
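# Reshape to long format: one row per event type and measure
data_health_plot <- melt(data_health, id.vars = "Event")
# Order the event types by injuries, then fatalities, for the plot axis
data_health_plot$Event <- factor(data_health_plot$Event, levels = data_health$Event[order(data_health$Injuries, data_health$Fatalities)])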
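The reshaping and level ordering can be illustrated on three rows taken from the results table. This sketch builds the long format by hand in base R rather than with `melt`, and the names `wide` and `long` are illustrative only.

```r
wide <- data.frame(
  Event      = c("Tornado", "Excessive Heat", "Flash Flood"),
  Fatalities = c(5633, 1903, 978),
  Injuries   = c(91346, 6525, 1777),
  stringsAsFactors = FALSE
)

# Long format: one row per (Event, measure) pair
long <- data.frame(
  Event    = rep(wide$Event, times = 2),
  variable = rep(c("Fatalities", "Injuries"), each = nrow(wide)),
  value    = c(wide$Fatalities, wide$Injuries),
  stringsAsFactors = FALSE
)

# Factor levels set the axis order in a plot: ascending by injuries,
# with fatalities as the tie-breaker
long$Event <- factor(long$Event,
                     levels = wide$Event[order(wide$Injuries, wide$Fatalities)])
```

Taking the levels from the wide table keeps each event name exactly once, which is what `factor` expects for its `levels` argument.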
Finally, we make a barplot with the data, exhibiting the top 10 event types per number of fatalities and injuries. The obtained plot is displayed in the results section.
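The plotting code itself is not shown in this report. One way such a bar plot could be drawn with ggplot2 is sketched below, using three rows from the results table. The aesthetics, the labels and the use of `coord_flip` are assumptions and do not reproduce the original figure.

```r
library(ggplot2)

# Long-format totals for three event types, copied from the results table
plot_data <- data.frame(
  Event    = rep(c("Tornado", "Excessive Heat", "Flash Flood"), times = 2),
  variable = rep(c("Fatalities", "Injuries"), each = 3),
  value    = c(5633, 1903, 978, 91346, 6525, 1777),
  stringsAsFactors = FALSE
)

# Fix the axis order through the factor levels
plot_data$Event <- factor(plot_data$Event,
                          levels = c("Flash Flood", "Excessive Heat", "Tornado"))

# Side-by-side bars per measure, flipped so the event names are easy to read
p <- ggplot(plot_data, aes(x = Event, y = value, fill = variable)) +
  geom_col(position = "dodge") +
  coord_flip() +
  labs(x = "Event type", y = "Number of people", fill = "Measure",
       title = "Fatalities and injuries by event type")
```

Calling `print(p)` draws the figure.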
The analysis of the second question follows the same steps as the previous one. First, we summarise the data by total crop and property damage per event type, arrange it in decreasing order of property damage (with crop damage as the tie-breaker) and subset the top 10 rows.
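# Total crop and property damage per event type
data_damage <- data %>% group_by(Event) %>%
  summarise("Crop" = sum(Cropdmg), "Property" = sum(Propdmg))
# Sort by property damage, then crop damage, largest first, and keep the top 10
data_damage <- data_damage[order(data_damage$Property, data_damage$Crop, decreasing = TRUE), ]
data_damage <- as.data.frame(data_damage[1:10, ])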
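Note that these sums use the Propdmg and Cropdmg values as recorded. The subset also keeps the Propdmgexp and Cropdmgexp columns, which in the NOAA data hold a magnitude code for each amount. A possible way to scale the amounts before summing is sketched below on made-up rows. It assumes that K, M and B stand for thousands, millions and billions, and that any other code means a multiplier of 1.

```r
# Made-up rows; Propdmg and Propdmgexp mirror the renamed columns above
toy <- data.frame(
  Event      = c("Flood", "Tornado", "Flood", "Hail"),
  Propdmg    = c(2.5, 10, 1, 50),
  Propdmgexp = c("M", "K", "B", ""),
  stringsAsFactors = FALSE
)

# Assumed mapping from magnitude code to multiplier
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up each code; codes without a mapping come back as NA and are set to 1
mult <- multiplier[toupper(toy$Propdmgexp)]
mult[is.na(mult)] <- 1

# Scale the amounts, then total them per event type
toy$Property_usd <- toy$Propdmg * unname(mult)
totals <- aggregate(Property_usd ~ Event, data = toy, FUN = sum)
```

Applying a step like this before the summarise call could change the ranking, since a single amount coded in billions outweighs many amounts coded in thousands.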
Once again, we prepare a second dataframe that will allow us to put this information in a plot.
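# Reshape to long format: one row per event type and damage category
data_damage_plot <- melt(data_damage, id.vars = "Event")
# Order the event types by property damage, then crop damage, for the plot axis
data_damage_plot$Event <- factor(data_damage_plot$Event, levels = data_damage$Event[order(data_damage$Property, data_damage$Crop)])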
We then make another barplot with the data, exhibiting the top 10 event types per value of damage to crops and properties. The obtained plot is displayed in the results section.
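The results obtained by the previous analysis are summarised below, first for population health and then for economic damage. Based on them, we can observe that Tornadoes are the event type with the highest totals for fatalities, injuries and property damage. Among the ten event types listed, Hail has the largest crop damage total.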
##             Event Fatalities Injuries
## 1         Tornado       5633    91346
## 2  Excessive Heat       1903     6525
## 3     Flash Flood        978     1777
## 4            Heat        937     2100
## 5       Lightning        816     5230
## 6       Tstm Wind        504     6957
## 7           Flood        470     6789
## 8     Rip Current        368      232
## 9       High Wind        248     1137
## 10      Avalanche        224      170
##                 Event      Crop  Property
## 1             Tornado 100018.52 3212258.2
## 2         Flash Flood 179200.46 1420124.6
## 3           Tstm Wind 109202.60 1335995.6
## 4               Flood 168037.88  899938.5
## 5   Thunderstorm Wind  66791.45  876844.2
## 6                Hail 579596.28  688693.4
## 7           Lightning   3580.61  603351.8
## 8  Thunderstorm Winds  18684.93  446293.2
## 9           High Wind  17283.21  324731.6
## 10       Winter Storm   1978.99  132720.6
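The second table lists Tstm Wind, Thunderstorm Wind and Thunderstorm Winds as separate event types, although they appear to describe the same kind of event. A sketch of how such labels could be merged before ranking is given below. It uses the property totals from the table, and the regular expression is an assumption that covers only these three spellings.

```r
damage <- data.frame(
  Event    = c("Tstm Wind", "Thunderstorm Wind", "Thunderstorm Winds", "Hail"),
  Property = c(1335995.6, 876844.2, 446293.2, 688693.4),
  stringsAsFactors = FALSE
)

# Assumed rule: map the three spellings to a single label
damage$Event <- sub("^(Tstm|Thunderstorm) Winds?$", "Thunderstorm Wind", damage$Event)

# Re-total per label and sort, largest first
merged <- aggregate(Property ~ Event, data = damage, FUN = sum)
merged <- merged[order(merged$Property, decreasing = TRUE), ]
```

On these rows the merged Thunderstorm Wind total is about 2659133, which would still rank below Tornado (3212258.2). Other spellings outside the top 10 are not covered by this sketch.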