Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data processing

Loading all of the necessary libraries. If any of them is not installed, install it first with install.packages("name of package") and then load it.

library(ggplot2)
library(dplyr)
library(gridExtra)
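The installation check can also be done in code. The sketch below is one possible approach, not part of the original analysis: the helper name missing_packages is hypothetical, it only reports which packages are absent, and the install call is left commented out so that nothing is installed unintentionally.

```r
# Hypothetical helper: return the names of the packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("ggplot2", "dplyr", "gridExtra"))

# Uncomment to install whatever is missing before calling library():
# if (length(to_install) > 0) install.packages(to_install)
```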

Reading the data and taking a quick look at its structure with str()

StormData <- read.csv("repdata_data_StormData.csv", header=TRUE, sep=",")

str(StormData)

Choosing only the variables that are needed for this analysis.

storm_data = select(StormData, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

Question 1: Which types of events are most harmful to population health?

Our focus will be the FATALITIES and INJURIES variables. We will group the data by event type and calculate the total number of fatalities and injuries for each type.

fatalities = storm_data %>% 
  group_by(EVTYPE) %>% 
  summarize(total_fatalities = sum(FATALITIES)) %>%
  arrange(desc(total_fatalities))

injuries = storm_data %>% 
  group_by(EVTYPE) %>% 
  summarize(total_injuries = sum(INJURIES)) %>%
  arrange(desc(total_injuries))
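The grouping pattern can be checked on a small example. The sketch below uses made-up toy data (not values from the NOAA database) and computes both totals in a single summarize() call, as an alternative to the two separate pipelines. The names toy and health are illustrative only.

```r
library(dplyr)

# Made-up toy data, for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 2, 5, 0, 1)
)

# Both totals per event type, computed in one summarize() call
health <- toy %>%
  group_by(EVTYPE) %>%
  summarize(total_fatalities = sum(FATALITIES),
            total_injuries   = sum(INJURIES)) %>%
  arrange(desc(total_fatalities))

health
```

A single table like this can still be sorted by either column, with arrange(), before plotting.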

Releveling the factors based on the number of fatalities and injuries. This step can be skipped, but it makes the plots show the events in sorted order, which makes them easier to read.

fatalities$EVTYPE = factor(fatalities$EVTYPE, levels = fatalities$EVTYPE)
injuries$EVTYPE = factor(injuries$EVTYPE, levels = injuries$EVTYPE)
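Why releveling matters can be seen on a tiny example. The sketch below uses made-up event names that are assumed to be already sorted by their totals. The last variant is an optional assumption on my side: since coord_flip() should draw the first level at the bottom, reversing the levels would put the largest bar at the top.

```r
# Made-up event names, already in descending order of their totals,
# as arrange(desc(...)) would leave them
events <- c("TORNADO", "HEAT", "FLOOD")

# Without explicit levels, factor() sorts the levels alphabetically
levels(factor(events))

# With levels taken from the sorted column, the ranking is preserved
levels(factor(events, levels = events))

# Reversed levels: a variant that would put the largest bar on top after coord_flip()
levels(factor(events, levels = rev(events)))
```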

Creating plots for fatalities and injuries. We look only at the top 10 events and use coord_flip() so that the event names can be read properly.

plot_fatalities = ggplot(data = fatalities[1:10,], aes(y = total_fatalities, x = EVTYPE)) + 
  geom_bar(stat="identity") + xlab("Event") + ylab("Total number of fatalities") +
  ggtitle("Most fatal events") + coord_flip()

From plot_fatalities it is clear that tornadoes cause the most fatalities, followed by excessive heat, flash floods, and heat (which appears as a separate event type from excessive heat).

plot_injuries = ggplot(data = injuries[1:10,], aes(y = total_injuries, x = EVTYPE)) + 
  geom_bar(stat="identity") + xlab("Event") + ylab("Total number of injuries") +
  ggtitle("Events causing the most injuries") + coord_flip()

plot_injuries shows the same leading event, but here the gap between the 1st and 2nd events is much larger. We can therefore confirm that tornadoes are the leading event type in terms of impact on population health.
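Both plot objects are only assigned, so they still have to be drawn. Since gridExtra is loaded, one option is to place them side by side with grid.arrange(). This is an assumption about the intended layout. The sketch below uses made-up toy data and the illustrative names p_fatal and p_injur.

```r
library(ggplot2)
library(gridExtra)

# Made-up toy totals, for illustration only
toy <- data.frame(EVTYPE = c("A", "B", "C"),
                  fatalities = c(5, 3, 1),
                  injuries = c(20, 4, 2))

p_fatal <- ggplot(toy, aes(x = EVTYPE, y = fatalities)) +
  geom_bar(stat = "identity") + coord_flip()
p_injur <- ggplot(toy, aes(x = EVTYPE, y = injuries)) +
  geom_bar(stat = "identity") + coord_flip()

# Draw the two plots next to each other
grid.arrange(p_fatal, p_injur, ncol = 2)
```

For the objects created in this analysis, the equivalent call would be grid.arrange(plot_fatalities, plot_injuries, ncol = 2).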

We can also calculate what share of all fatalities and injuries tornadoes account for, as the previous plots only compared the top 10 events.

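# Row 1 is the top-ranked event (tornado), because both tables are sorted in descending order
p_fatalities_tornado = round(fatalities$total_fatalities[1] / sum(fatalities$total_fatalities) * 100, 2)
p_injuries_tornado = round(injuries$total_injuries[1] / sum(injuries$total_injuries) * 100, 2)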

As the calculations above confirm, tornadoes account for a very significant portion of fatalities (37%) and injuries (65%).
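The arithmetic behind these percentages is simply the top total divided by the sum of all totals. A minimal sketch with made-up numbers follows. The helper name top_share is hypothetical, and it assumes the totals are already sorted in descending order.

```r
# Hypothetical helper: percentage share of the first (top-ranked) entry
top_share <- function(totals) {
  round(totals[1] / sum(totals) * 100, 2)
}

# Made-up totals, sorted in descending order (not values from the NOAA database)
toy_totals <- c(50, 30, 15, 5)
top_share(toy_totals)  # 50 out of a total of 100, i.e. 50 percent
```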

Question 2: Which types of events have the greatest economic consequences?

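Here we focus on the four damage variables: PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP. The two EXP variables hold the exponent (power of 10) that applies to the corresponding damage value. These exponents can also be represented by symbols: H (10^2), K (10^3), M (10^6), and B (10^9), in upper or lower case. The symbols +, -, / and ?, as well as empty values, are treated as an exponent of 0.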

Standardizing and converting the exponents in the CROPDMGEXP and PROPDMGEXP columns of the storm_data dataset into numeric values.

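# Exponent that replaces each coded symbol (H = 10^2, K = 10^3, M = 10^6, B = 10^9)
actual_value = c("2","3","6","9","0")
# "-" comes first in the last class so it is read literally, not as the range "+" to "/"
coded_value = c("[Hh]","[Kk]","[Mm]","[Bb]","[-+/?]")
for(i in seq_along(actual_value))
{
  storm_data$CROPDMGEXP = gsub(coded_value[i], actual_value[i], storm_data$CROPDMGEXP)
  storm_data$PROPDMGEXP = gsub(coded_value[i], actual_value[i], storm_data$PROPDMGEXP)
}
# Empty exponents mean "no multiplier", i.e. 10^0
storm_data$CROPDMGEXP[storm_data$CROPDMGEXP == ""] = "0"
storm_data$PROPDMGEXP[storm_data$PROPDMGEXP == ""] = "0"
# Every remaining value is a digit string, so the conversion is safe
storm_data$CROPDMGEXP = as.numeric(storm_data$CROPDMGEXP)
storm_data$PROPDMGEXP = as.numeric(storm_data$PROPDMGEXP)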
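The same mapping can also be written as a lookup table, which some readers may find easier to check. This is only a sketch of an alternative, not the method used in this analysis. The names exp_map and decode_exponent are hypothetical. It assumes, like the loop above, that single digits are kept as their own exponent and that everything else (blanks and symbols) becomes 0.

```r
# Hypothetical lookup table: letter code -> power of 10
exp_map <- c(H = 2, K = 3, M = 6, B = 9)

decode_exponent <- function(code) {
  code <- toupper(trimws(as.character(code)))
  out <- unname(exp_map[code])                  # NA for anything that is not H, K, M or B
  is_digit <- grepl("^[0-9]$", code)
  out[is_digit] <- as.numeric(code[is_digit])   # a single digit is kept as its own exponent
  out[is.na(out)] <- 0                          # blanks and symbols such as "+", "-", "?" become 0
  out
}

decode_exponent(c("K", "m", "", "?", "5", "B"))
```

With this helper, each column could be converted in one step, for example storm_data$PROPDMGEXP = decode_exponent(storm_data$PROPDMGEXP).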

Calculating the actual crop and property damages in the storm_data by applying the exponents stored in CROPDMGEXP and PROPDMGEXP.

storm_data = storm_data %>%
  mutate(crop_dmg = CROPDMG * 10^CROPDMGEXP, prop_dmg = PROPDMG * 10^PROPDMGEXP)
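As a worked example of this formula, the sketch below applies it to three made-up rows (not values from the NOAA database) whose exponents are already numeric.

```r
# Made-up rows, for illustration only: damage value and its numeric exponent
toy <- data.frame(PROPDMG = c(25, 1.5, 2), PROPDMGEXP = c(3, 6, 0))

# Same formula as above: value times 10 to the power of the exponent
toy$prop_dmg <- toy$PROPDMG * 10^toy$PROPDMGEXP
toy$prop_dmg  # 25 thousand, 1.5 million and 2 USD
```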

With the damage values calculated, the next step is to group the data by event type and compute the total crop and property damages for each event.

crop = storm_data %>%
  group_by(EVTYPE) %>%
  summarize(crop_dmg = sum(crop_dmg)) %>%
  arrange(desc(crop_dmg))

prop = storm_data %>%
  group_by(EVTYPE) %>%
  summarize(prop_dmg = sum(prop_dmg)) %>%
  arrange(desc(prop_dmg))

Releveling for the plot.

crop$EVTYPE = factor(crop$EVTYPE, levels = crop$EVTYPE)
prop$EVTYPE = factor(prop$EVTYPE, levels = prop$EVTYPE)

Creating plots for crop and property damage. The approach is the same: we look only at the top 10 events.

plot_crop = ggplot(data = crop[1:10,], aes(y = crop_dmg, x = EVTYPE)) + 
  geom_bar(stat="identity") + xlab("Event") + ylab("Total crop damage in USD") +
  ggtitle("Most economically damaging events (crop damage)") + coord_flip()

plot_prop = ggplot(data = prop[1:10,], aes(y = prop_dmg, x = EVTYPE)) + 
  geom_bar(stat="identity") + xlab("Event") + ylab("Total property damage in USD") +
  ggtitle("Most economically damaging events (property damage)") + coord_flip()

These two plots may not make it perfectly clear which event type is more damaging overall, as both flood and drought appear. Let's combine the results and put them in one plot.

Calculating the total damage caused by each event type, so that crop and property damage are considered together.

total_damage = storm_data %>%
  group_by(EVTYPE) %>%
  summarize(damage = sum(prop_dmg) + sum(crop_dmg)) %>%
  arrange(desc(damage))

Releveling and creating the graph.

total_damage$EVTYPE = factor(total_damage$EVTYPE, levels = total_damage$EVTYPE)

ggplot(data = total_damage[1:10,], aes(y = damage, x = EVTYPE)) + 
  geom_bar(stat="identity") + xlab("Event") + ylab("Total damages in USD") +
  ggtitle("Most economically damaging events (total damage)") + coord_flip()
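Because the exponents go up to 10^9, the totals on the damage axis can be hard to read as raw USD amounts. One option is a small label function that expresses them in billions. This is a sketch only, and the helper name usd_billions is hypothetical.

```r
# Hypothetical helper: express USD amounts in billions for axis labels
usd_billions <- function(x) {
  paste0(x / 1e9, " B")
}

usd_billions(c(0, 5e10, 1.5e11))
```

It could be applied by adding scale_y_continuous(labels = usd_billions) to the ggplot() call. After coord_flip() the damage axis is the horizontal one.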

We can clearly see that floods are the most damaging events of all, followed by hurricanes/typhoons and tornadoes.

Results

To summarize the results from both analyses and plots:

  • The most damaging event type for population health is the tornado, which ranks 1st in terms of both fatalities and injuries.
  • The most economically damaging event type is the flood, followed by hurricanes/typhoons and tornadoes.