Synopsis: Severe weather events such as tornadoes and floods can seriously affect population health and cause substantial economic damage. These two effects are the key concerns when planning for such events.

This report uses data from the NOAA storm database, covering events recorded between 1950 and 2011. The analysis seeks to answer two questions to support the decision making of government and municipal managers and to help them prepare for severe weather events.

The two questions are:
1. Across the United States, which types of events are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?

Results: Based on the analysis, we find that:
1. Among severe weather events, tornadoes have the greatest impact on population health, as seen in the number of fatalities and injuries they cause.
2. Among severe weather events, floods have the greatest economic consequences, as seen in the cumulative amount of crop and property damage they cause.

Let us start by loading the libraries required for this assignment.

library(dplyr)
library(magrittr)
library(ggplot2)
library(gridExtra)
library(plotly)
library(knitr)

DATA PROCESSING:

We will now download the data file for this assignment and store its contents in an R object, “data_storm”.

download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "repdata%2Fdata%2FStormData.csv.bz2")
data_storm <- read.csv("repdata%2Fdata%2FStormData.csv.bz2")
##Checking header of the data downloaded
head(data_storm)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6
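
The download step above fetches the file again on every run. A minimal sketch of a cached download follows, assuming a hypothetical helper named fetch_once and a local file name StormData.csv.bz2 (neither is part of the original analysis). The downloader is passed as a parameter only so the helper can be tried without network access.

```r
# Hypothetical helper: download `url` to `dest` only if `dest` does not exist.
# `downloader` defaults to base R's download.file.
fetch_once <- function(url, dest, downloader = utils::download.file) {
  if (!file.exists(dest)) {
    downloader(url, destfile = dest, mode = "wb")
  }
  invisible(dest)
}

storm_url  <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
storm_file <- "StormData.csv.bz2"  # assumed local file name

# Uncomment to run against the real file:
# fetch_once(storm_url, storm_file)
# data_storm <- read.csv(storm_file)
```

With this in place, re-running the report reuses the local copy instead of downloading the archive again.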

We will now add two columns holding the property and crop damage in dollars, calculated from the columns PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP. Only rows whose PROPDMGEXP/CROPDMGEXP value is one of h/H, k/K, m/M or b/B are converted; all other rows keep a damage amount of 0. A third column, TOTALDAMAGEAMOUNT, holds the sum of the two.

##Calculate Property Damage in Dollars and populate in column PROPERTYDAMAGEAMOUNT
data_storm$PROPERTYDAMAGEAMOUNT <- 0
data_storm[data_storm$PROPDMGEXP == "H"|data_storm$PROPDMGEXP=="h",]$PROPERTYDAMAGEAMOUNT <- data_storm[data_storm$PROPDMGEXP == "H"|data_storm$PROPDMGEXP=="h",]$PROPDMG*10^2
data_storm[data_storm$PROPDMGEXP == "M"|data_storm$PROPDMGEXP=="m",]$PROPERTYDAMAGEAMOUNT <- data_storm[data_storm$PROPDMGEXP == "M"|data_storm$PROPDMGEXP=="m",]$PROPDMG*10^6
data_storm[data_storm$PROPDMGEXP == "B"|data_storm$PROPDMGEXP=="b",]$PROPERTYDAMAGEAMOUNT <- data_storm[data_storm$PROPDMGEXP == "B"|data_storm$PROPDMGEXP=="b",]$PROPDMG*10^9
data_storm[data_storm$PROPDMGEXP == "K"|data_storm$PROPDMGEXP=="k",]$PROPERTYDAMAGEAMOUNT <- data_storm[data_storm$PROPDMGEXP == "K"|data_storm$PROPDMGEXP=="k",]$PROPDMG*10^3

##Calculate Crop Damage in Dollars and populate in column CROPDAMAGEAMOUNT
data_storm$CROPDAMAGEAMOUNT <- 0
data_storm[data_storm$CROPDMGEXP == "K"|data_storm$CROPDMGEXP=="k",]$CROPDAMAGEAMOUNT <- data_storm[data_storm$CROPDMGEXP == "K"|data_storm$CROPDMGEXP=="k",]$CROPDMG*10^3
data_storm[data_storm$CROPDMGEXP == "H"|data_storm$CROPDMGEXP=="h",]$CROPDAMAGEAMOUNT <- data_storm[data_storm$CROPDMGEXP == "H"|data_storm$CROPDMGEXP=="h",]$CROPDMG*10^2
data_storm[data_storm$CROPDMGEXP == "M"|data_storm$CROPDMGEXP=="m",]$CROPDAMAGEAMOUNT <- data_storm[data_storm$CROPDMGEXP == "M"|data_storm$CROPDMGEXP=="m",]$CROPDMG*10^6
data_storm[data_storm$CROPDMGEXP == "B"|data_storm$CROPDMGEXP=="b",]$CROPDAMAGEAMOUNT <- data_storm[data_storm$CROPDMGEXP == "B"|data_storm$CROPDMGEXP=="b",]$CROPDMG*10^9
data_storm$TOTALDAMAGEAMOUNT <- data_storm$PROPERTYDAMAGEAMOUNT + data_storm$CROPDAMAGEAMOUNT
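
The same conversion can be written more compactly with a lookup table. The sketch below is an alternative formulation, not the code used in this report. The helper name damage_in_dollars and the toy rows are made up for illustration. It follows the rule above: codes h/H, k/K, m/M and b/B are converted and everything else contributes 0.

```r
# Multipliers for the exponent codes the report keeps.
exp_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert a damage value and its exponent code to dollars.
damage_in_dollars <- function(amount, exponent) {
  mult <- unname(exp_multiplier[toupper(as.character(exponent))])
  mult[is.na(mult)] <- 0  # unrecognised or empty codes contribute 0
  amount * mult
}

# Toy rows (made-up values, not taken from the NOAA file)
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 3, 7),
  PROPDMGEXP = c("K", "m", "B", "", "+"),
  stringsAsFactors = FALSE
)
toy$PROPERTYDAMAGEAMOUNT <- damage_in_dollars(toy$PROPDMG, toy$PROPDMGEXP)
toy
```

The same function would apply unchanged to CROPDMG and CROPDMGEXP, which avoids repeating the four assignments for each damage type.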

We will now proceed to answer the questions based on the data prepared.

Across the United States, which types of events are most harmful with respect to population health?

##Creating datasets on fatalities and injuries grouped by type of events
impactofevents_fatalities <- data_storm %>% group_by(EVTYPE) %>% summarize(TOTALFATALITIES = sum(FATALITIES)) %>% arrange(desc(TOTALFATALITIES))
impactofevents_injuries <- data_storm %>% group_by(EVTYPE) %>% summarize(TOTALINJURIES = sum(INJURIES)) %>% arrange(desc(TOTALINJURIES))

##Modifying the datasets so that the values are plotted in descending order in the graphs that will be plotted
impactofevents_fatalities$EVTYPE <- factor(impactofevents_fatalities$EVTYPE, levels = impactofevents_fatalities$EVTYPE[order(desc(impactofevents_fatalities$TOTALFATALITIES))])
impactofevents_injuries$EVTYPE <- factor(impactofevents_injuries$EVTYPE, levels = impactofevents_injuries$EVTYPE[order(desc(impactofevents_injuries$TOTALINJURIES))])
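
To illustrate what the grouping and factor reordering steps do, here is a small base-R sketch on a toy data frame. The values are made up and are not taken from the NOAA file. It sums fatalities per event type and then fixes the factor levels in descending order of the totals, which is what makes the bars appear in that order rather than alphabetically.

```r
# Toy data (made-up values)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HEAT"),
  FATALITIES = c(3, 1, 4, 5, 0, 1),
  stringsAsFactors = FALSE
)

# Sum fatalities per event type, then sort from highest to lowest.
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)
totals <- totals[order(-totals$FATALITIES), ]

# Fix the level order to match the sorted rows; without this step
# the levels would default to alphabetical order.
totals$EVTYPE <- factor(totals$EVTYPE, levels = totals$EVTYPE)
levels(totals$EVTYPE)
```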

Now, we proceed to plot graphs for the Top 10 severe weather events (in terms of injuries and fatalities caused).

##Creating plots to show most harmful events for Population Health
plot_fatalities <- ggplot(impactofevents_fatalities[1:10,], aes(x = EVTYPE, y = TOTALFATALITIES)) + ggtitle("Events - Maximum Fatalities") + geom_col(fill = "blue") +  theme(axis.text.x = element_text(angle = 90, size = 10), plot.title = element_text(hjust = 0.5,  colour = "black", size = 14), axis.title= element_text(colour = "purple", face = "bold", size = 12), axis.text = element_text(size = 10, hjust = 0)) + ylab("Total Fatalities") + xlab("Type of Event")
plot_injuries <- ggplot(impactofevents_injuries[1:10,], aes(x = EVTYPE, y = TOTALINJURIES)) + ggtitle("Events - Maximum Injuries") + geom_col(fill = "blue") +  theme(axis.text.x = element_text(angle = 90, size = 10), plot.title = element_text(hjust = 0.5,  colour = "black", size = 14), axis.title= element_text(colour = "purple", face = "bold", size = 12), axis.text = element_text(size = 10, hjust = 0)) + ylab("Total Injuries") + xlab("Type of Event")

##Plotting the two graphs side by side
grid.arrange(plot_fatalities, plot_injuries, ncol = 2, top = "Most Harmful Events for Population Health")

The graphs above show that, among severe weather events, tornadoes have the highest impact on population health, causing the most fatalities and the most injuries.

We proceed to the next question now.

Across the United States, which types of events have the greatest economic consequences?

##Creating a dataset on property, crop and overall damage grouped by type of event
impactofevents_damage <- data_storm %>% group_by(EVTYPE) %>% summarize(TOTALPROPERTYDAMAGEAMOUNT=sum(PROPERTYDAMAGEAMOUNT), TOTALCROPDAMAGEAMOUNT = sum(CROPDAMAGEAMOUNT), OVERALLDAMAGEAMOUNT = sum(TOTALDAMAGEAMOUNT)) %>% arrange(desc(OVERALLDAMAGEAMOUNT))

##Modifying the dataset so that the values are plotted in descending order in the graph that will be plotted
impactofevents_damage$EVTYPE <- factor(impactofevents_damage$EVTYPE, levels = impactofevents_damage$EVTYPE[order(desc(impactofevents_damage$OVERALLDAMAGEAMOUNT))])

Now, we will plot a stacked bar chart of crop and property damage for the 10 severe weather events with the highest total damage.

plot_ly(data=impactofevents_damage[1:10,], x = ~EVTYPE, y = ~TOTALPROPERTYDAMAGEAMOUNT, type = 'bar', name = 'Property Damage') %>%    add_trace(y = ~TOTALCROPDAMAGEAMOUNT, name = 'Crop Damage') %>%   layout(yaxis = list(title = 'Total Damage in US Dollars'), xaxis=list(title='Type of Event'),barmode = 'stack')

We see from the chart that, among severe weather events, floods have the greatest economic consequences, with the highest damage in monetary terms. The chart also shows something else of interest: among the 10 events with the highest total damage, drought has the highest proportion of its damage coming from crop damage.
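
The crop-damage proportion mentioned above can also be computed directly rather than read off the chart. The sketch below shows the calculation on a toy data frame that reuses the column names of impactofevents_damage. The numbers are invented for illustration and are not results from the NOAA data.

```r
# Toy summary table (made-up amounts, same column names as impactofevents_damage)
toy_damage <- data.frame(
  EVTYPE                    = c("FLOOD", "DROUGHT", "HAIL"),
  TOTALPROPERTYDAMAGEAMOUNT = c(900, 100, 600),
  TOTALCROPDAMAGEAMOUNT     = c(100, 900, 200),
  stringsAsFactors = FALSE
)

# Share of each event's total damage that comes from crops.
toy_damage$CROPSHARE <- toy_damage$TOTALCROPDAMAGEAMOUNT /
  (toy_damage$TOTALPROPERTYDAMAGEAMOUNT + toy_damage$TOTALCROPDAMAGEAMOUNT)

# Rank events by crop share, highest first.
crop_ranking <- toy_damage[order(-toy_damage$CROPSHARE), c("EVTYPE", "CROPSHARE")]
crop_ranking
```

Applying the same two steps to impactofevents_damage[1:10,] would give the crop share for each of the top 10 events.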