Synopsis

Severe weather events across the U.S. between 1993 and 2011 had a significant impact on both population health and the economy.

The data recorded by NOAA in its Storm Database show that, during this period, population health was most affected by tornadoes, then excessive heat, then floods, while the economy was most affected by floods, then hurricanes/typhoons, then storm surges.

Taken together, floods and hurricanes/typhoons/tornadoes carry the highest combined human and economic cost.

Data Processing

Load required packages

library(dplyr)
library(tidyr)
library(ggplot2)
library(ggsci)
library(reshape)

Load the NOAA data into the R environment

StormData <- read.csv("repdata_data_StormData.csv")

Main Data

We first tidy the main dataset: data from the early years are dropped, because fewer events were recorded then and the recording was less reliable.

Update the date format of BGN_DATE so that it only shows the year when each event occurred:

StormData$BGN_DATE <- format(as.Date(StormData$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"),"%Y")
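As a quick illustration of this conversion, the same call can be tried on a couple of sample strings. The values in sample_dates are made up for the example and only assume that BGN_DATE follows the "month/day/year hour:minute:second" layout implied by the format string above.

```r
# Toy dates in the same "month/day/year hour:minute:second" layout as BGN_DATE
sample_dates <- c("4/18/1950 0:00:00", "11/28/2011 0:00:00")

# as.Date() parses the string with the given format and keeps the date part;
# format(..., "%Y") then keeps only the year
sample_years <- format(as.Date(sample_dates, format = "%m/%d/%Y %H:%M:%S"), "%Y")

# The result is a character vector, which is why as.numeric() is applied later
sample_years
class(sample_years)
```

Because the years are stored as text, any numeric comparison or histogram needs the as.numeric() conversion used in the following steps.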

Plot a histogram of the number of events per year:

# Breaks start at 1949 so that every year, including 1950, falls in its own bin
hist(as.numeric(StormData$BGN_DATE), main = "Number of Events per Year", xlab = "Year", col = "grey", breaks = seq(1949, 2011, 1))

To work with representative data, we keep only the years 1993 to 2011, since fewer events were recorded before 1993.

StormData_Tidy <- StormData[as.numeric(StormData$BGN_DATE)>=1993,]

We now split the main dataset into two smaller ones, each keeping only the variables relevant to one of the two questions.
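The code that builds these subsets is not shown in this section. As a rough sketch only, the health subset could be obtained by summing the FATALITIES and INJURIES columns per EVTYPE and keeping the most harmful event types. The toy_storms data, the toy_health name, and the cut-off of 3 event types are all assumptions made for illustration, not the report's actual values.

```r
library(dplyr)

# Toy stand-in for StormData_Tidy (values are made up for illustration only)
toy_storms <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT", "HAIL"),
  FATALITIES = c(2, 1, 0, 4, 0),
  INJURIES   = c(10, 5, 3, 1, 0)
)

# Sum fatalities and injuries per event type, then keep the most harmful types
toy_health <- toy_storms %>%
  group_by(EVTYPE) %>%
  summarise(Fatalities = sum(FATALITIES), Injuries = sum(INJURIES)) %>%
  arrange(desc(Fatalities + Injuries)) %>%
  head(3)  # hypothetical cut-off; the report does not state how many types it keeps

toy_health
```

The result has one row per event type with the two totals in columns 2 and 3, which is the shape that a pivot_longer() call with cols = 2:3 expects.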

Results

Most Harmful Events with Respect to Population Health

StormData_Health_Tidy <- pivot_longer(StormData_Health_Tidy, cols = 2:3, names_to = "Impact_per_Damage_Type", values_to = "Occurrence")
ggplot(StormData_Health_Tidy, aes(x = reorder(EVTYPE, -Occurrence), y = Occurrence, fill = Impact_per_Damage_Type)) +
  geom_bar(position = "dodge", stat = "identity") +
  scale_fill_npg() +
  theme(axis.text.y = element_text(size = 6)) +
  labs(title = "Most Harmful Events with Respect to Population Health", x = "Type of Event", fill = "Damage Type") +
  coord_flip()

Between 1993 and 2011, tornadoes had by far the highest impact on health across the US (23,310 injuries and 1,621 fatalities), followed by excessive heat (6,525 injuries and 1,903 fatalities), then floods (6,789 injuries and 470 fatalities).

Events Having the Greatest Economic Impact
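
The Prop_Damages and Crop_Damages columns used in this part are expressed in million USD, but the step that computes them is not shown here. A minimal sketch of one way to derive them, assuming the raw amounts in PROPDMG and CROPDMG are scaled by the letter codes K, M and B in PROPDMGEXP and CROPDMGEXP, might look as follows. The helper exp_to_millions, the toy data, and the choice to ignore every other code are assumptions for illustration, not the report's actual method.

```r
library(dplyr)

# Toy stand-in for StormData_Tidy (values are made up for illustration only)
toy_storms <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "DROUGHT", "HAIL"),
  PROPDMG    = c(2.5, 500, 0, 10),
  PROPDMGEXP = c("B", "M", "", "K"),
  CROPDMG    = c(100, 0, 1.5, 20),
  CROPDMGEXP = c("M", "", "B", "K")
)

# Hypothetical helper: map an exponent code to a multiplier in million USD.
# Only K, M and B are handled; any other code counts as 0 (an assumption).
exp_to_millions <- function(code) {
  multipliers <- c(K = 1e-3, M = 1, B = 1e3)
  out <- unname(multipliers[toupper(code)])
  ifelse(is.na(out), 0, out)
}

# Convert both damage types to million USD and total them per event type
toy_econ <- toy_storms %>%
  mutate(
    Prop_Damages = PROPDMG * exp_to_millions(PROPDMGEXP),
    Crop_Damages = CROPDMG * exp_to_millions(CROPDMGEXP)
  ) %>%
  group_by(EVTYPE) %>%
  summarise(Prop_Damages = sum(Prop_Damages), Crop_Damages = sum(Crop_Damages)) %>%
  arrange(desc(Prop_Damages + Crop_Damages))

toy_econ
```

Treating unrecognised codes as 0 keeps the sketch short, but it silently drops those records from the totals, so a real analysis would need to decide explicitly how to handle them.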

StormData_Econ_Tidy <- select(StormData_Econ_Tidy, EVTYPE, Prop_Damages, Crop_Damages)
StormData_Econ_Tidy <- pivot_longer(StormData_Econ_Tidy, cols = 2:3, names_to = "Impact_per_Damage_Type", values_to = "Cost")
ggplot(StormData_Econ_Tidy, aes(x = reorder(EVTYPE, -Cost), y = Cost, fill = Impact_per_Damage_Type)) +
  geom_bar(position = "dodge", stat = "identity") +
  ylab("Cost in Million USD") +
  scale_fill_npg() +
  theme(axis.text.y = element_text(size = 6)) +
  labs(title = "Most Harmful Events with Respect to Economic Impact", x = "Type of Event", fill = "Damage Type") +
  coord_flip()

Between 1993 and 2011, floods caused by far the highest property damage across the US (144,657 million USD), followed by hurricanes/typhoons (69,305 million USD), then storm surges (43,323 million USD).

The highest crop damage is due first to droughts (13,972 million USD), followed by floods (5,661 million USD), then river floods (5,029 million USD).