Synopsis
Severe weather events that occurred across the U.S. between 1993 and 2011 had a significant impact on both the population’s health and the economy.
The data recorded by NOAA in its Storm Database show that, during this period, the health of the U.S. population was affected mostly by tornadoes, then by excessive heat and floods, while the economy was affected mostly by floods, then by hurricanes/typhoons and storm surges.
Taken together, floods, hurricanes/typhoons and tornadoes account for the highest combined human and economic cost.
Data Processing
Load required packages
library(dplyr)
library(tidyr)
library(ggplot2)
library(ggsci)
library(reshape)
Load the NOAA data into the R environment
StormData <- read.csv("repdata_data_StormData.csv")
Main Data
We will first tidy the main dataset: we want to drop the data from the early years, because fewer events were recorded in those years and the records themselves are less complete.
Update the format of BGN_DATE so that it only shows the year in which each event occurred:
StormData$BGN_DATE <- format(as.Date(StormData$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"),"%Y")
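As a quick illustration of the conversion above, the sketch below applies the same `as.Date()` and `format()` calls to a few made-up timestamps. The `sample_dates` values are assumptions written in the same month/day/year layout, not rows taken from the dataset.

```r
# Toy timestamps in the "month/day/year hour:minute:second" layout
# (illustrative values, not taken from the NOAA file)
sample_dates <- c("4/18/1950 0:00:00", "1/6/1996 0:00:00", "11/28/2011 0:00:00")

# as.Date() parses the date part, format(..., "%Y") keeps only the year
sample_years <- format(as.Date(sample_dates, format = "%m/%d/%Y %H:%M:%S"), "%Y")

# The result is a character vector, not a number
sample_years
```

Because the year comes back as text, the later steps wrap BGN_DATE in `as.numeric()` before plotting or filtering.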
Plot a histogram of the number of events per year:
hist(as.numeric(StormData$BGN_DATE), main = "Nb of events per Year",xlab = "Year", col = "grey", breaks = seq(1950,2011,1))

To focus on representative data, we keep only the years 1993 to 2011, since far fewer events were recorded before 1993.
StormData_Tidy <- StormData[as.numeric(StormData$BGN_DATE)>=1993,]
We will now split the main dataset into two lighter subsets, each holding only the variables relevant to one question.
Economic Impact Related Data
The second subset, related to economic impact, comprises the event type (“EVTYPE”), the value of property damages (“PROPDMG”), the exponents for these values (“PROPDMGEXP”), the value of crop damages (“CROPDMG”) and the exponents for these values (“CROPDMGEXP”).
StormData_Econ <- select(StormData_Tidy, EVTYPE, PROPDMG:CROPDMGEXP)
Let’s first focus on the exponents variables (PROPDMGEXP and CROPDMGEXP).
Check Exponents list for Property and Crop Damages
unique(StormData_Econ$PROPDMGEXP)
## [1] "" "B" "K" "M" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
unique(StormData_Econ$CROPDMGEXP)
## [1] "" "M" "K" "m" "B" "?" "0" "k" "2"
Modify all letters to caps to facilitate the next step.
StormData_Econ$PROPDMGEXP <- toupper(StormData_Econ$PROPDMGEXP)
StormData_Econ$CROPDMGEXP <- toupper(StormData_Econ$CROPDMGEXP)
We will use the following Multipliers for each corresponding Exponent:
“-”, “?”, “” = 0
“0” to “8”, “+” = 1
“H” = 100
“K” = 1000
“M” = 1000000
“B” = 1000000000
# Digits "0" to "8" and "+" get a multiplier of 1. This runs first so that the zeros assigned next are not overwritten, and "H" is excluded so that it can be mapped to 100.
StormData_Econ$PROPDMGEXP[!StormData_Econ$PROPDMGEXP %in% c("H", "K", "M", "B", "-", "?", "")] <- 1
StormData_Econ$PROPDMGEXP[StormData_Econ$PROPDMGEXP %in% c("-", "?", "")] <- 0
StormData_Econ$PROPDMGEXP[StormData_Econ$PROPDMGEXP == "H"] <- 100
StormData_Econ$PROPDMGEXP[StormData_Econ$PROPDMGEXP == "K"] <- 1000
StormData_Econ$PROPDMGEXP[StormData_Econ$PROPDMGEXP == "M"] <- 1000000
StormData_Econ$PROPDMGEXP[StormData_Econ$PROPDMGEXP == "B"] <- 1000000000
# Same order for the crop exponents, which contain no "H" or "-"
StormData_Econ$CROPDMGEXP[!StormData_Econ$CROPDMGEXP %in% c("K", "M", "B", "?", "")] <- 1
StormData_Econ$CROPDMGEXP[StormData_Econ$CROPDMGEXP %in% c("?", "")] <- 0
StormData_Econ$CROPDMGEXP[StormData_Econ$CROPDMGEXP == "K"] <- 1000
StormData_Econ$CROPDMGEXP[StormData_Econ$CROPDMGEXP == "M"] <- 1000000
StormData_Econ$CROPDMGEXP[StormData_Econ$CROPDMGEXP == "B"] <- 1000000000
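The multiplier list given earlier can also be written as a single lookup, which makes the mapping easier to check on a handful of codes. This is only a minimal sketch in base R: `multipliers` and `exp_to_multiplier()` are hypothetical names introduced here, not part of the analysis code.

```r
# Letter codes from the multiplier list, keyed by the upper-cased exponent
multipliers <- c("H" = 100, "K" = 1e3, "M" = 1e6, "B" = 1e9)

# Hypothetical helper: translate exponent codes into numeric multipliers
exp_to_multiplier <- function(codes) {
  codes <- toupper(codes)
  out <- rep(1, length(codes))             # digits "0" to "8" and "+" count as 1
  out[codes %in% c("-", "?", "")] <- 0     # "-", "?" and empty codes count as 0
  known <- codes %in% names(multipliers)
  out[known] <- multipliers[codes[known]]  # letters come from the lookup vector
  out
}

res <- exp_to_multiplier(c("", "k", "M", "5", "h", "B", "?"))
res
```

Since the helper returns numbers directly, no later `as.numeric()` conversion would be needed with this variant.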
Modified columns are converted to numeric values and the Property and Crop Damage costs are calculated by multiplying the value column by the recently modified exponent column.
StormData_Econ$PROPDMGEXP <- as.numeric(StormData_Econ$PROPDMGEXP)
StormData_Econ$CROPDMGEXP <- as.numeric(StormData_Econ$CROPDMGEXP)
StormData_Econ <- StormData_Econ %>%
  # The pipe already supplies the data frame, so it is not passed to mutate() a second time
  mutate(
    Prop_Damages_Costs = PROPDMG * PROPDMGEXP,
    Crop_Damages_Costs = CROPDMG * CROPDMGEXP,
    CROPDMG = NULL, CROPDMGEXP = NULL, PROPDMG = NULL, PROPDMGEXP = NULL
  )
We will now further tidy the economic impact dataset. After grouping it by type of event, we summarize the total cost of property damages and of crop damages for each type, then add a third column (property + crop costs per type of event). We rank the events on this column and keep the top ten so that the plot stays readable. All costs are expressed in million USD.
StormData_Econ_Tidy <- StormData_Econ %>%
group_by(EVTYPE) %>%
summarize(Prop_Damages = sum(Prop_Damages_Costs / 1000000), Crop_Damages = sum(Crop_Damages_Costs / 1000000), Total_Impact_in_MUSD = sum(Prop_Damages_Costs/1000000) + sum(Crop_Damages_Costs/1000000)) %>%
arrange(desc(Total_Impact_in_MUSD)) %>% slice(1:10)
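To make the grouping and ranking step easier to follow, here is a small sketch that runs the same kind of pipeline on a toy table. The `toy` data frame and its values are made up for illustration, and only the top two rows are kept instead of ten.

```r
library(dplyr)

# Toy per-event costs in USD (made-up values, for illustration only)
toy <- data.frame(
  EVTYPE = c("FLOOD", "FLOOD", "HAIL", "DROUGHT", "HAIL"),
  Prop_Damages_Costs = c(2e6, 3e6, 1e6, 0, 5e5),
  Crop_Damages_Costs = c(1e6, 0, 5e5, 4e6, 0)
)

toy_top <- toy %>%
  group_by(EVTYPE) %>%
  summarize(
    Prop_Damages = sum(Prop_Damages_Costs) / 1e6,       # million USD
    Crop_Damages = sum(Crop_Damages_Costs) / 1e6,       # million USD
    Total_Impact_in_MUSD = Prop_Damages + Crop_Damages  # reuses the two sums above
  ) %>%
  arrange(desc(Total_Impact_in_MUSD)) %>%
  slice(1:2)

toy_top
```

In this toy case FLOOD ranks first on total cost and DROUGHT second, even though DROUGHT has no property damage at all, which is why the ranking uses the combined column.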
Results
Most Harmful Events with Respect to Population Health
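The plotting code in this subsection expects a StormData_Health_Tidy table with one row per event type and two count columns, but its construction does not appear in this section. The sketch below shows one way such a table could be built. It assumes the table mirrors the economic subset, is derived from the FATALITIES and INJURIES columns of the NOAA data, and is ranked on their sum. The ranking rule, the `toy_storms` and `toy_health` names and the values are all assumptions.

```r
library(dplyr)

# Toy stand-in for StormData_Tidy (made-up counts, for illustration only)
toy_storms <- data.frame(
  EVTYPE = c("TORNADO", "TORNADO", "FLOOD", "EXCESSIVE HEAT", "HAIL"),
  FATALITIES = c(2, 1, 1, 4, 0),
  INJURIES = c(30, 12, 5, 9, 1)
)

# Assumed construction: totals per event type, ranked on fatalities + injuries
toy_health <- toy_storms %>%
  group_by(EVTYPE) %>%
  summarize(Fatalities = sum(FATALITIES), Injuries = sum(INJURIES)) %>%
  arrange(desc(Fatalities + Injuries)) %>%
  slice(1:3)

toy_health
```

The two count columns land in positions 2 and 3, which matches the `cols = 2:3` selection used by the reshaping step.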
StormData_Health_Tidy <- pivot_longer(StormData_Health_Tidy, cols = 2:3, names_to = "Impact_per_Damage_Type", values_to = "Occurrence")
ggplot(StormData_Health_Tidy, aes(x = reorder(EVTYPE, -Occurrence), y = Occurrence, fill = Impact_per_Damage_Type)) +
  geom_bar(position = "dodge", stat = "identity") +
  scale_fill_npg() +
  theme(axis.text.y = element_text(size = 6)) +
  # The legend title is set on the fill aesthetic, the only one mapped here
  labs(title = "Most Harmful Events with Respect to Population Health", x = "Type of Event", fill = "Damage Type") +
  coord_flip()
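The reshaping call placed just before the plot turns one column per damage type into one row per event and damage type, which is the layout that a dodged bar chart with a `fill` aesthetic needs. A minimal sketch on a made-up two-row table (`wide` and `long` are illustrative names):

```r
library(tidyr)

# Wide layout: one row per event type, one column per damage type (toy values)
wide <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD"),
  Fatalities = c(3, 1),
  Injuries = c(42, 5)
)

# Long layout: the column names move into Impact_per_Damage_Type,
# the counts move into Occurrence
long <- pivot_longer(wide, cols = 2:3,
                     names_to = "Impact_per_Damage_Type",
                     values_to = "Occurrence")

long
```

Each event type now appears twice, once per damage type, so ggplot can draw the two bars side by side.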

The highest health impact of severe weather events across the US between 1993 and 2011 is due to tornadoes, which caused by far the most injuries (23,310) along with 1,621 fatalities. They are followed by excessive heat (6,525 injuries and the highest fatality count, 1,903), then by floods (6,789 injuries and 470 fatalities).
Events Having the Greatest Economic Impact
StormData_Econ_Tidy <- select(StormData_Econ_Tidy, EVTYPE, Prop_Damages, Crop_Damages)
StormData_Econ_Tidy <- pivot_longer(StormData_Econ_Tidy, cols = 2:3, names_to = "Impact_per_Damage_Type", values_to = "Cost")
ggplot(StormData_Econ_Tidy, aes(x = reorder(EVTYPE, -Cost), y = Cost, fill = Impact_per_Damage_Type)) +
  geom_bar(position = "dodge", stat = "identity") +
  scale_fill_npg() +
  theme(axis.text.y = element_text(size = 6)) +
  # The legend title is set on the fill aesthetic, the only one mapped here
  labs(title = "Most Harmful Events with Respect to Economic Impact", x = "Type of Event", y = "Cost in Million USD", fill = "Damage Type") +
  coord_flip()

The highest economic impact of severe weather events on property across the US between 1993 and 2011 is by far due to floods (144,657 million USD), followed by hurricanes/typhoons (69,305 million USD), then by storm surges (43,323 million USD).
The highest impact on crops is due first to droughts (13,972 million USD), followed by floods (5,661 million USD), then by river floods (5,029 million USD).