The following report analyzes data in the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to determine the most harmful types of weather events with respect to both population health and economic consequences. Based on our analysis, we draw the following conclusions: (1) the types of events that are most harmful to population health are heat-related events such as heat waves and excessive heat; (2) the types of events that have the greatest economic consequences are wind- and rainstorm events like hurricanes, tropical storms, and typhoons; and (3) overall, hurricanes and typhoons are the single most harmful type of weather event.
We begin by downloading the dataset from the Internet and reading it.
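library(dplyr); library(ggplot2); library(gridExtra)  # packages used throughout the analysis
# mode = "wb" downloads in binary mode so the compressed file is not corrupted on Windows
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "StormData.csv.bz2", mode = "wb")
rawdata <- read.csv("StormData.csv.bz2")  # read.csv() decompresses .bz2 files transparently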
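The reading step relies on `read.csv()` handling bzip2-compressed files directly. The following is a minimal sketch of that behavior on a small temporary file. The toy data frame and the file name are assumptions made for illustration and are not part of the NOAA dataset.

```r
# Write a tiny, hypothetical CSV through a bzip2 connection.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(data.frame(EVTYPE = c("HEAT", "FLOOD"), FATALITIES = c(2, 0)),
          con, row.names = FALSE)
close(con)

# read.csv() detects the compression and reads the file like a plain CSV.
toy <- read.csv(tmp)
print(toy)
```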
We then subset the dataset to select only those columns that are relevant to our subsequent analysis: the event type, the variables indicating population harm (fatalities and injuries), and the variables indicating economic harm (property damage and crop damage).
data <- rawdata %>% select(EVTYPE, FATALITIES, INJURIES, PROPDMG, CROPDMG)
Finally, we convert all the event type labels to uppercase, to correct for duplicate event types that were written in both uppercase and lowercase in the original dataset.
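# toupper() coerces its argument to character, so this works whether EVTYPE was read as a factor or as a character vector
data$EVTYPE <- toupper(data$EVTYPE)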
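To illustrate why this normalization matters, here is a minimal sketch on a handful of hypothetical labels (they are not taken from the NOAA file). Labels that differ only in case collapse into a single event type once they are uppercased.

```r
# Hypothetical labels that differ only in letter case.
evtype <- c("Heat", "HEAT", "Heat Wave", "HEAT WAVE", "heat")

# Before normalization, every spelling counts as its own event type.
n_before <- length(unique(evtype))

# After uppercasing, duplicates merge.
evtype_upper <- toupper(evtype)
n_after <- length(unique(evtype_upper))

print(c(before = n_before, after = n_after))
print(table(evtype_upper))
```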
There are approximately 900 different event types recorded in the dataset. However, the vast majority of them (over 800) have very few recorded occurrences. For the purposes of our analysis, we therefore ignore all event types with 50 or fewer recorded occurrences, since averages computed over so few data points are unreliable. The following code performs this subsetting, which concludes our processing of the data.
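# Count the recorded occurrences of each event type.
grpddata <- data %>% group_by(EVTYPE) %>% summarize(OCCURRENCES = n())
# Keep only the event types with more than 50 occurrences, then subset the data to those types.
sigoccurs <- grpddata %>% filter(OCCURRENCES > 50)
sigevents <- as.character(sigoccurs$EVTYPE)
sigdata <- data %>% filter(as.character(EVTYPE) %in% sigevents)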
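The same count-then-filter idea can be sketched in base R on toy data. The labels and the threshold of 2 below are assumptions chosen to keep the example small. Note that the strict `>` comparison drops a type whose count equals the threshold exactly.

```r
# Hypothetical event labels; the threshold is lowered to 2 for this toy data.
evtype <- c("HEAT", "HEAT", "HEAT",
            "FLOOD", "FLOOD", "FLOOD", "FLOOD",
            "FOG",
            "HAIL", "HAIL")
threshold <- 2

# Count occurrences per event type.
counts <- table(evtype)

# Keep the types with strictly more than `threshold` occurrences.
keep <- names(counts)[counts > threshold]

# Subset the records to the retained types.
filtered <- evtype[evtype %in% keep]

print(keep)
print(length(filtered))
```

Here "HAIL" occurs exactly twice and is therefore excluded, along with "FOG".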
To explore our data, we created two tables summarizing, for each event type, the average and maximum number of fatalities and injuries, as well as the average and maximum amount of property and crop damage, across all recorded occurrences.
healthstats <- sigdata %>% group_by(EVTYPE) %>% summarize(MEAN_FATALITIES = mean(FATALITIES), MAX_FATALITIES = max(FATALITIES), MEAN_INJURIES = mean(INJURIES), MAX_INJURIES = max(INJURIES))
econstats <- sigdata %>% group_by(EVTYPE) %>% summarize(MEAN_PROPDMG = mean(PROPDMG), MAX_PROPDMG = max(PROPDMG), MEAN_CROPDMG = mean(CROPDMG), MAX_CROPDMG = max(CROPDMG))
Looking over these tables, we concluded that the most relevant data points are the averages. In general, event types with a high average number of fatalities also had a high maximum number of fatalities, and similarly for the other variables. More importantly, the averages indicate how harmful each event type is across all its occurrences, whereas the maximums reflect only the single most harmful occurrence of that event type. Therefore, basing our conclusions on the averages allows us to offer a better general recommendation.
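As a small illustration of how the two statistics can disagree in principle, consider the following sketch. The two event types and their fatality counts are entirely hypothetical: type "A" is almost always harmless but has one extreme occurrence, while type "B" is consistently harmful.

```r
# Hypothetical data: 100 occurrences of each event type.
toy <- data.frame(
  EVTYPE = c(rep("A", 100), rep("B", 100)),
  FATALITIES = c(rep(0, 99), 50,   # type A: one extreme occurrence
                 rep(2, 100))      # type B: two fatalities every time
)

# Per-type mean and maximum, analogous to the summary tables above.
means <- tapply(toy$FATALITIES, toy$EVTYPE, mean)
maxes <- tapply(toy$FATALITIES, toy$EVTYPE, max)

print(means)
print(maxes)
```

Ranking by the maximum would put "A" first, while ranking by the mean puts "B" first, which is the behavior wanted for a general recommendation.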
To discover which types of events are in general the most harmful, we created a method for selecting the most harmful event types based on our data. First, with our population health summary table, we created a new table listing only those event types that are in the top 15 of both average number of fatalities and average number of injuries.
cutfatal <- sort(healthstats$MEAN_FATALITIES, decreasing = TRUE)[15]
cutinj <- sort(healthstats$MEAN_INJURIES, decreasing = TRUE)[15]
sighealth <- healthstats %>% filter(MEAN_FATALITIES >= cutfatal & MEAN_INJURIES >= cutinj)
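The cutoff technique can be sketched on a toy table, using the top 3 instead of the top 15. The event labels "A" through "E" and their averages are hypothetical. The k-th largest value of each column serves as the cutoff, and only rows that meet both cutoffs are kept.

```r
# Hypothetical summary table with five event types.
stats <- data.frame(
  EVTYPE          = c("A", "B", "C", "D", "E"),
  MEAN_FATALITIES = c(5, 1, 4, 0.2, 3),
  MEAN_INJURIES   = c(2, 9, 8, 0.1, 7),
  stringsAsFactors = FALSE
)
k <- 3  # the analysis above uses 15

# The k-th largest value in each column is the cutoff for "top k".
cut_fatal <- sort(stats$MEAN_FATALITIES, decreasing = TRUE)[k]
cut_inj   <- sort(stats$MEAN_INJURIES, decreasing = TRUE)[k]

# Keep only the event types that are in the top k on both measures.
top_both <- stats[stats$MEAN_FATALITIES >= cut_fatal &
                  stats$MEAN_INJURIES >= cut_inj, ]

print(top_both$EVTYPE)
```

Type "A" ranks first on fatalities and type "B" first on injuries, yet neither survives, because each misses the cutoff on the other measure. The method therefore favors event types that are harmful on both dimensions.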
Similarly, with our economic summary table, we created a new table listing only those event types that are in the top 15 of both average amount of property damage and average amount of crop damage.
cutprop <- sort(econstats$MEAN_PROPDMG, decreasing = TRUE)[15]
cutcrop <- sort(econstats$MEAN_CROPDMG, decreasing = TRUE)[15]
sigecon <- econstats %>% filter(MEAN_PROPDMG >= cutprop & MEAN_CROPDMG >= cutcrop)
These two tables were then used to summarize and present our results.
Our results are summarized in the following two plots. Our first plot presents the weather events that are most harmful to population health, in terms of fatalities and injuries:
plotfatal <- ggplot(sighealth, aes(x = EVTYPE, y = MEAN_FATALITIES)) + labs(x = "", y = "Average Number of Fatalities", title = "Most Deadly Weather Events") + geom_bar(stat = "identity", fill = "tomato") + theme(axis.text.x = element_text(angle = 30, hjust = 1))
plotinj <- ggplot(sighealth, aes(x = EVTYPE, y = MEAN_INJURIES)) + labs(x = "", y = "Average Number of Injuries", title = "Most Injurious Weather Events") + geom_bar(stat = "identity", fill = "turquoise") + theme(axis.text.x = element_text(angle = 30, hjust = 1))
grid.arrange(plotfatal, plotinj, ncol = 2)
As can be seen from this plot, the weather events that are, on average, the deadliest are all heat-related: heat, heat wave, and excessive heat. These same events also cause a high number of injuries. However, the weather events that cause, on average, the most injuries are hurricanes/typhoons.
As such, we conclude that the types of events that are most harmful to population health are heat-related events such as heat waves and excessive heat. Hurricanes and typhoons should also be kept in mind as the second most harmful event type.
Our second plot presents the weather events that have the greatest economic consequences, in terms of property damage and crop damage:
plotprop <- ggplot(sigecon, aes(x = EVTYPE, y = MEAN_PROPDMG)) + labs(x = "", y = "Average Amount of Property Damage", title = "Weather Event Property Damage") + geom_bar(stat = "identity", fill = "firebrick") + theme(axis.text.x = element_text(angle = 60, hjust = 1))
plotcrop <- ggplot(sigecon, aes(x = EVTYPE, y = MEAN_CROPDMG)) + labs(x = "", y = "Average Amount of Crop Damage", title = "Weather Event Crop Damage") + geom_bar(stat = "identity", fill = "goldenrod") + theme(axis.text.x = element_text(angle = 60, hjust = 1))
grid.arrange(plotprop, plotcrop, ncol = 2)
As can be seen from this plot, the weather events that cause, on average, the most property damage are all water-related, and they are most damaging when they take the form of storms accompanied by high winds, such as hurricanes, tropical storms, and typhoons. Similarly, the weather events that cause, on average, the most crop damage are events like hurricanes and typhoons.
As such, we conclude that the types of events that have the greatest economic consequences are wind- and rainstorm events like hurricanes, tropical storms, and typhoons. Water-related events like flooding should also be kept in mind as very damaging, especially insofar as rainstorms can often lead to flooding.
Hurricanes and typhoons are the only type of event that appears on both of our plots, and in both cases they rank highly, being both the most injurious type of weather event and among the most damaging in terms of both property and crop damage. Therefore, we conclude that hurricanes and typhoons are the single most harmful type of weather event, in terms of both their harms to population health and their economic consequences.
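The overlap between the two filtered tables can also be checked programmatically rather than by inspecting the plots. The sketch below uses short hypothetical stand-in vectors, not the actual contents of `sighealth` and `sigecon`. In the real analysis the same call would presumably take `as.character(sighealth$EVTYPE)` and `as.character(sigecon$EVTYPE)` as its arguments.

```r
# Hypothetical stand-ins for the event types in the two filtered tables.
health_types <- c("EXCESSIVE HEAT", "HEAT", "HEAT WAVE", "HURRICANE/TYPHOON")
econ_types   <- c("FLOOD", "HURRICANE/TYPHOON", "TROPICAL STORM")

# Event types that appear in both lists.
both <- intersect(health_types, econ_types)

print(both)
```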