Summary

In this document, we analyze storm-related data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. We analyze the data to identify the storm event types with the worst impact on population health and on the economy, respectively. We conclude that tornadoes cause the most injuries and fatalities, while floods lead to the highest economic damage. The following sections present the data processing steps and then the main results.

Data Processing

We start by unzipping the original file and placing the resulting .csv file in the working directory. We then read the .csv file into R.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(grid)
StormData <- read.csv("repdata-data-StormData.csv")
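The manual unzipping step can sometimes be skipped. The sketch below assumes the download is a bzip2-compressed CSV, which the text above does not state. It builds a tiny hypothetical file (the file name and toy values are invented for illustration) and shows that read.csv can read such a file directly, because R detects the compression when opening the file for reading.

```r
# Assumption: the raw download is a bzip2-compressed CSV (not stated in the report).
# Build a small, hypothetical compressed file to demonstrate the idea.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, "w")
write.csv(
  data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(2, 1)),
  con,
  row.names = FALSE
)
close(con)

# read.csv opens the path with file(), which detects the compression on read
toy <- read.csv(tmp)
print(toy)
```

If the download is a regular .zip archive instead, base R's unzip() function is the usual way to extract it from a script.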

To assess the health impact of storms, we select from the storm data the “EVTYPE”, “FATALITIES” and “INJURIES” variables, which give the type of a storm event and the number of deaths and injuries it caused. The subset is grouped by event type and summarized so that we obtain, for each event type, the total number of deaths and injuries. The summarized data is then sorted by decreasing number of fatalities and by decreasing number of injuries, respectively. This produces two data frames with the event types causing the most health damage at the top.

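library(dplyr)
# StormData was already loaded above, so the file is not read a second time
# Keep only the event type and the two health-related columns
healthset <- subset(StormData, select = c(EVTYPE, FATALITIES, INJURIES))
# Total fatalities and injuries per event type
groupedSet <- group_by(healthset, EVTYPE)
SummaryData <- summarise(groupedSet, FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES))
# Sort by decreasing fatalities and by decreasing injuries
orderfatal <- SummaryData[order(SummaryData$FATALITIES, decreasing = TRUE), ]
orderInjury <- SummaryData[order(SummaryData$INJURIES, decreasing = TRUE), ]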
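To make the group, sum and sort steps easy to check by hand, here is a minimal sketch on a hypothetical toy data frame (the rows and values are invented, and the names toy, toy_summary, by_fatal and by_injury are not part of the analysis). It uses the dplyr pipe and arrange(), which is one possible way to write the same steps.

```r
library(dplyr)

# Hypothetical toy data standing in for the three selected columns
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 2, 5, 1, 7)
)

# One row per event type, with totals for both health variables
toy_summary <- toy %>%
  group_by(EVTYPE) %>%
  summarise(FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES))

# Two separate rankings, one per health variable
by_fatal  <- arrange(toy_summary, desc(FATALITIES))
by_injury <- arrange(toy_summary, desc(INJURIES))

print(by_fatal)
print(by_injury)
```

In this toy example the two rankings differ below the first place, which is why the report keeps one sorted data frame per variable.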

To assess the economic impact of storms, we keep only the storm events whose property damage is estimated in billions of dollars. We do so using the “PROPDMGEXP” variable, which gives the unit of the property damage estimate “PROPDMG” for a given storm event (“B” for billions). For these events, we select the “EVTYPE” and “PROPDMG” variables, group the events by event type, and compute the total property damage caused by each event type. The summarized data is then sorted so that the event types causing the most property damage are at the top of the resulting data frame. Note that we did not consider crop damage in our analysis, since a summary look at the data shows that property damage has a much larger economic impact. We therefore assume that property damage is reasonably representative of the overall economic impact of storm events.

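# Keep events whose property damage is expressed in billions of dollars ("B")
economyset <- subset(StormData, PROPDMGEXP == "B", select = c(EVTYPE, PROPDMG))
# Total property damage (in billions of dollars) per event type
groupedPropSet <- group_by(economyset, EVTYPE)
PropDmgSummary <- summarise(groupedPropSet, PROPDMG = sum(PROPDMG))
# Sort by decreasing total property damage
orderPropDmg <- PropDmgSummary[order(PropDmgSummary$PROPDMG, decreasing = TRUE), ]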
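The billions-only filter can be compared with a total that uses every unit. The sketch below is not part of the analysis: it uses a hypothetical toy data frame and assumes that “K”, “M” and “B” stand for thousands, millions and billions of dollars. The real column may contain other codes that this lookup does not handle.

```r
# Hypothetical toy data; values are invented for illustration
toy <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "FLOOD", "HAIL"),
  PROPDMG    = c(2.5, 1.0, 500, 30),
  PROPDMGEXP = c("B", "B", "M", "K"),
  stringsAsFactors = FALSE
)

# Assumed unit codes: K = thousands, M = millions, B = billions
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)
toy$dollars <- toy$PROPDMG * unname(multiplier[toy$PROPDMGEXP])

# Total damage in dollars per event type, using every unit
total_all <- tapply(toy$dollars, toy$EVTYPE, sum)

# Total damage in billions per event type, using the "B" rows only
toy_b <- subset(toy, PROPDMGEXP == "B")
billions_only <- tapply(toy_b$PROPDMG, toy_b$EVTYPE, sum)

# Share of the overall damage that the billions-only filter retains
share_kept <- sum(toy_b$PROPDMG * 1e9) / sum(toy$dollars)

print(total_all)
print(billions_only)
print(share_kept)
```

A share close to 1 on the real data would support restricting the analysis to events recorded in billions; a noticeably lower share would suggest converting all units instead.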

Results

In this section, we present the main results obtained from the data processed in the previous section. We start by selecting the five event types causing the highest numbers of fatalities and injuries, respectively. The results are displayed in the figure below.

library(ggplot2)
library(grid)
# Five event types with the most fatalities and with the most injuries
fatalevents <- head(orderfatal, n = 5)
Injuryevents <- head(orderInjury, n = 5)
# Bar chart of total fatalities per event type
g <- ggplot(fatalevents, aes(x = EVTYPE, y = FATALITIES)) +
  geom_bar(stat = "identity") +
  labs(title = "Impact of the most harmful storm events on population health", x = "", y = "Number of fatalities")
# Bar chart of total injuries per event type
g2 <- ggplot(Injuryevents, aes(x = EVTYPE, y = INJURIES)) +
  geom_bar(stat = "identity") +
  labs(x = "", y = "Number of injuries")
# Stack the two charts in a single figure with aligned widths
grid.newpage()
grid.draw(rbind(ggplotGrob(g), ggplotGrob(g2), size = "last"))

As the figure shows, tornadoes have the largest negative impact on population health: they cause the highest number of both fatalities and injuries. For fatalities, the other event types in the top five are excessive heat, heat, flash flood and lightning. For injuries, the other event types in the top five are flood, TSTM wind, excessive heat and lightning. Overall, tornadoes, excessive heat and lightning appear in both top fives and can be considered the storm events with the most negative impact on health.
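By default, ggplot2 places a character x variable in alphabetical order, so the bar order does not reflect the ranking. The sketch below shows one way to sort the bars by value with reorder(). It uses a hypothetical toy data frame, and the names toy_top, bar_order and p are invented for illustration.

```r
library(ggplot2)

# Hypothetical toy totals; values are invented for illustration
toy_top <- data.frame(
  EVTYPE     = c("HEAT", "TORNADO", "FLOOD"),
  FATALITIES = c(4, 5, 1)
)

# reorder() returns a factor whose levels follow increasing -FATALITIES,
# that is, decreasing FATALITIES
bar_order <- levels(reorder(toy_top$EVTYPE, -toy_top$FATALITIES))
print(bar_order)

# Same ordering applied directly inside the aesthetic mapping
p <- ggplot(toy_top, aes(x = reorder(EVTYPE, -FATALITIES), y = FATALITIES)) +
  geom_col() +
  labs(x = "", y = "Number of fatalities")
print(p)
```

Here geom_col() is used as a shorthand for geom_bar(stat = "identity"); either form draws bars whose heights are the values in the data.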

We then select the five event types causing the most property damage. The results are displayed in the figure below.

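# Five event types with the highest total property damage
economicevents <- head(orderPropDmg, n = 5)
# Bar chart of total property damage (in billions of dollars) per event type
g3 <- ggplot(economicevents, aes(x = EVTYPE, y = PROPDMG)) +
  geom_bar(stat = "identity") +
  labs(title = "Impact of the most harmful storm events on the economy", x = "", y = "Economic damage (billions of dollars)")
print(g3)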

As the figure shows, floods cause the highest economic damage, estimated at around 125 billion dollars, followed by typhoons, storm surges, hurricanes and tornadoes.