Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

1. Synopsis

The purpose of this analysis is to identify the extreme weather events that cause the most damage, both in terms of health damage (fatalities and injuries) and in terms of material damage (property damage and crop damage). As will be seen, tornadoes are the extreme events causing the most damage in the US, both in terms of health and in terms of material damage.

2. Data loading and processing

# Install each package if it is missing, then attach it
if (!require(dplyr)) install.packages("dplyr"); library(dplyr)
if (!require(ggplot2)) install.packages("ggplot2"); library(ggplot2)
if (!require(reshape2)) install.packages("reshape2"); library(reshape2)

2.1 Loading the Data from the Data folder

        data <- read.csv("Data/repdata-data-StormData.csv.bz2")
        names(data)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

2.2 Selecting the variables we need

As the output shows, the dataset has 37 variables, but we only need a subset of them. Let’s therefore create a smaller dataset containing only the variables we need, i.e. EVTYPE, FATALITIES, INJURIES, PROPDMG and CROPDMG, and rename them using the camelCase convention.

data <- data %>% select(eventType = EVTYPE, fatalities = FATALITIES, injuries = INJURIES, propertyDamage = PROPDMG, cropDamage = CROPDMG)
head(data)
##   eventType fatalities injuries propertyDamage cropDamage
## 1   TORNADO          0       15           25.0          0
## 2   TORNADO          0        0            2.5          0
## 3   TORNADO          0        2           25.0          0
## 4   TORNADO          0        2            2.5          0
## 5   TORNADO          0        2            2.5          0
## 6   TORNADO          0        6            2.5          0
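The selection above keeps the raw PROPDMG and CROPDMG figures, and the rest of this report works with those raw values. The names() output also lists PROPDMGEXP and CROPDMGEXP columns. The sketch below is not part of the analysis; it only illustrates, on made-up rows, how such a column could be used to scale the damage figures, assuming that the codes "K", "M" and "B" stand for thousands, millions and billions. The helper name expToMultiplier and the output column propertyDamageUSD are hypothetical.

```r
# Hypothetical helper: map a damage-exponent code to a numeric multiplier.
# Assumption: "K" = thousands, "M" = millions, "B" = billions. Any other
# code (including a blank) is treated as 1, which is a simplification.
expToMultiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  multiplier <- unname(lookup[toupper(as.character(code))])
  multiplier[is.na(multiplier)] <- 1
  multiplier
}

# Toy rows mimicking PROPDMG / PROPDMGEXP (values are made up)
toy <- data.frame(PROPDMG = c(25, 2.5, 1.2, 10),
                  PROPDMGEXP = c("K", "M", "B", ""),
                  stringsAsFactors = FALSE)

# Scale each raw figure by the multiplier implied by its code
toy$propertyDamageUSD <- toy$PROPDMG * expToMultiplier(toy$PROPDMGEXP)
print(toy)
```

Whether the real columns follow this coding exactly should be checked against the database documentation before relying on such a conversion.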

To ease the analysis of the data, and also the plotting with ggplot2, let’s restructure the data using the function melt. Indeed, the different types of damage are just values of a categorical variable. The resulting dataset will contain 3 variables:

- eventType
- categoryOfDamage
- valueOfTheDamage

data2 <- melt(data, id=c('eventType'), measure.vars = c('fatalities', 'injuries','cropDamage', 'propertyDamage'), variable.name = "categoryOfDamage", value.name = "valueOfTheDamage")
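To make the effect of melt concrete, here is a minimal sketch on a toy data frame. The column names match the ones used above, but the values are made up for illustration.

```r
library(reshape2)

# Toy wide-format data: one row per event, one column per damage type
wide <- data.frame(eventType = c("TORNADO", "HAIL"),
                   fatalities = c(1, 0),
                   injuries = c(15, 2),
                   propertyDamage = c(25, 2.5),
                   cropDamage = c(0, 5))

# Long format: one row per (event, damage category) pair
long <- melt(wide, id.vars = "eventType",
             measure.vars = c("fatalities", "injuries",
                              "cropDamage", "propertyDamage"),
             variable.name = "categoryOfDamage",
             value.name = "valueOfTheDamage")

# 2 events x 4 damage categories give 8 rows and 3 columns
print(dim(long))
print(long)
```

Each original row is expanded into four rows, one per damage category, which is the shape that group_by and ggplot2 facets work with most naturally.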

3. Results

Before answering the questions, let’s compare the damages per event type and damage category. To get an idea of the damage caused across the four damage categories, we first compute the mean and the sum of the damage values per event type and per damage category.

- Per Event X Category
DamEventCat <- data2 %>% group_by(eventType, categoryOfDamage) %>% summarize(mean = mean(valueOfTheDamage, na.rm = TRUE), sum = sum(valueOfTheDamage, na.rm = TRUE))
head(DamEventCat)
## # A tibble: 6 x 4
## # Groups:   eventType [2]
##   eventType               categoryOfDamage  mean   sum
##   <fct>                   <fct>            <dbl> <dbl>
## 1 "   HIGH SURF ADVISORY" fatalities           0     0
## 2 "   HIGH SURF ADVISORY" injuries             0     0
## 3 "   HIGH SURF ADVISORY" cropDamage           0     0
## 4 "   HIGH SURF ADVISORY" propertyDamage     200   200
## 5 " COASTAL FLOOD"        fatalities           0     0
## 6 " COASTAL FLOOD"        injuries             0     0
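The first rows of this table show event labels with leading spaces, which suggests that the same event may appear under several spellings. This report aggregates the labels as they are. As a hedged illustration only, the sketch below shows one way such labels could be normalized before grouping, using made-up rows; trimming whitespace and upper-casing are assumptions about what counts as "the same" event.

```r
library(dplyr)

# Toy rows: the same two events under inconsistent spellings (made-up values)
toy <- data.frame(eventType = c("   HIGH SURF ADVISORY", "High Surf Advisory",
                                " COASTAL FLOOD", "COASTAL FLOOD"),
                  valueOfTheDamage = c(200, 10, 0, 5),
                  stringsAsFactors = FALSE)

cleaned <- toy %>%
  # as.character() guards against eventType being stored as a factor
  mutate(eventType = toupper(trimws(as.character(eventType)))) %>%
  group_by(eventType) %>%
  summarize(sum = sum(valueOfTheDamage, na.rm = TRUE)) %>%
  data.frame()

print(cleaned)
```

After normalization, the four toy rows collapse into two event types, so their damage values are summed together instead of being split across spellings.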

3.1 Which types of events are most harmful with respect to population health?

To answer that question, we’ll select the 10 events with the most fatalities and the 10 events with the most injuries.

First, let’s calculate the top 10:

topDamEventCat <- DamEventCat[DamEventCat$categoryOfDamage == "fatalities" | DamEventCat$categoryOfDamage == "injuries",] %>% group_by(categoryOfDamage) %>% top_n(n = 10, wt = sum) %>% data.frame()
ggplot(data = topDamEventCat[topDamEventCat$categoryOfDamage == "fatalities",], aes(reorder(eventType,sum), y=sum)) +
        geom_bar(stat ="identity", position = "dodge") +
        theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
        coord_flip() +
        labs(x = "Type of extreme event", y = "Sum", title = "Top 10 fatalities of extreme weather events in the USA")

The barplot shows that the events causing the most fatalities are, in descending order, tornadoes, excessive heat, and flash floods.

ggplot(data = topDamEventCat[topDamEventCat$categoryOfDamage == "injuries",], aes(reorder(eventType,sum), y=sum)) +
        geom_bar(stat ="identity", position = "dodge") +
        theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
        coord_flip() +
        labs(x = "Type of extreme event", y = "Sum", title = "Top 10 injuries of extreme weather events in the USA")

The events causing the most injuries are, in descending order, tornadoes, TSTM (thunderstorm) wind and hail, and floods.
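The per-category top-n selection used in this section can be illustrated on a small toy table. The sketch below mirrors the group_by and top_n calls with n = 2 instead of 10; the event names and sums are made up. Note that top_n keeps ties, so it can return more than n rows per group when several events share the same sum.

```r
library(dplyr)

# Toy summary table: three events, two damage categories (made-up sums)
toy <- data.frame(eventType = c("A", "B", "C", "A", "B", "C"),
                  categoryOfDamage = c(rep("fatalities", 3), rep("injuries", 3)),
                  sum = c(10, 30, 20, 5, 1, 8),
                  stringsAsFactors = FALSE)

top2 <- toy %>%
  group_by(categoryOfDamage) %>%          # rank within each damage category
  top_n(n = 2, wt = sum) %>%              # keep the 2 largest sums per category
  arrange(categoryOfDamage, desc(sum)) %>%
  data.frame()

print(top2)
```

Because the ranking is done within each category, an event can be in the top rows for injuries without being in the top rows for fatalities, as event A is here.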

3.2 Which types of events have the greatest economic consequences?

To answer that question, we’ll select the 10 events with the most crop damage and the 10 events with the most property damage.

First, let’s calculate the top 10:

topDamEventCat2 <- DamEventCat[DamEventCat$categoryOfDamage == "cropDamage" | DamEventCat$categoryOfDamage == "propertyDamage",] %>% group_by(categoryOfDamage) %>% top_n(n = 10, wt = sum) %>% data.frame()
ggplot(data = topDamEventCat2, aes(reorder(eventType,sum), y=sum)) +
        geom_bar(stat ="identity", position = "dodge") +
        facet_grid(categoryOfDamage~.)+
        theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
        coord_flip() +
        labs(x = "Type of extreme event", y = "Sum", title = "Top 10 crop/property damages of extreme weather events in the US")

The barplot shows that the events causing the most crop damage are, in descending order, hail, flash floods, and floods. The events causing the most property damage are, in descending order, tornadoes, flash floods, and TSTM wind.