Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data Processing

The first step is to download the database and load it into R:

url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# Save into the working directory so download.file() and read.csv() use the same path
destfile <- "sd.csv.bz2"
download.file(url, destfile, mode = "wb")  # binary mode keeps the bz2 archive intact
# read.csv() decompresses bz2 files transparently
df <- read.csv(destfile)
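
Because the archive is large, it can help to skip the download when a local copy already exists. The following is a minimal sketch of that idea, not part of the original analysis: `fetch_once` and `fake_downloader` are hypothetical helpers, and the stand-in downloader writes a tiny bz2-compressed CSV so the example runs offline.

```r
# Hypothetical helper: download only when the local copy is missing.
# `downloader` defaults to download.file and can be swapped for offline use.
fetch_once <- function(url, destfile, downloader = download.file) {
  if (!file.exists(destfile)) {
    downloader(url, destfile, mode = "wb")  # binary mode keeps the archive intact
  }
  destfile
}

# Stand-in downloader (assumption for illustration): writes a small bz2 CSV.
fake_downloader <- function(url, destfile, mode = "wb") {
  con <- bzfile(destfile, open = "w")
  writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "FLOOD,1"), con)
  close(con)
  invisible(0)
}

tmp <- tempfile(fileext = ".csv.bz2")
path <- fetch_once("https://example.invalid/storm.csv.bz2", tmp,
                   downloader = fake_downloader)
toy <- read.csv(path)  # the bz2 file is read without manual decompression
```

With the real data, `fetch_once(url, destfile)` would call `download.file` on the first run only.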

Questions to answer:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

Health Impact

Harm to population health is measured with two variables: fatalities (FATALITIES) and injuries (INJURIES). Both are summed per event type, and the event types are sorted by total fatalities, then by total injuries.

library(dplyr)
df.fatalities <- df %>%
  select(EVTYPE, FATALITIES, INJURIES) %>%
  group_by(EVTYPE) %>%
  summarise(total.fatalities = sum(FATALITIES), total.injuries = sum(INJURIES)) %>%
  arrange(desc(total.fatalities), desc(total.injuries)) %>%
  mutate(Total = total.fatalities + total.injuries)
head(df.fatalities,10)
## # A tibble: 10 x 4
##    EVTYPE         total.fatalities total.injuries Total
##    <chr>                     <dbl>          <dbl> <dbl>
##  1 TORNADO                    5633          91346 96979
##  2 EXCESSIVE HEAT             1903           6525  8428
##  3 FLASH FLOOD                 978           1777  2755
##  4 HEAT                        937           2100  3037
##  5 LIGHTNING                   816           5230  6046
##  6 TSTM WIND                   504           6957  7461
##  7 FLOOD                       470           6789  7259
##  8 RIP CURRENT                 368            232   600
##  9 HIGH WIND                   248           1137  1385
## 10 AVALANCHE                   224            170   394
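
The aggregation can be traced on a toy data frame. This is an illustrative sketch with made-up numbers, not storm data. Unlike the analysis above, it sorts by the combined `Total` rather than by fatalities, which is one alternative ranking when both outcomes should count equally.

```r
library(dplyr)

# Made-up records for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(2, 1, 3, 4),
  INJURIES   = c(10, 0, 5, 1),
  stringsAsFactors = FALSE
)

toy.health <- toy %>%
  group_by(EVTYPE) %>%
  summarise(total.fatalities = sum(FATALITIES),
            total.injuries   = sum(INJURIES)) %>%
  mutate(Total = total.fatalities + total.injuries) %>%
  arrange(desc(Total))  # assumption: rank by combined harm, not fatalities

toy.health
```

Here TORNADO comes first with 5 fatalities and 15 injuries (Total 20), followed by HEAT (Total 5) and FLOOD (Total 1).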

Economic Impact

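The base data record economic impact in two columns: property damage (PROPDMG) and crop damage (CROPDMG). Each has a companion exponent column (PROPDMGEXP and CROPDMGEXP) whose symbol gives the magnitude of the recorded value.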

The total damage caused by each event type is calculated with the following code:

df.damage <- df %>% select(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

Symbol<- sort(unique(as.character(df.damage$PROPDMGEXP)))
Multiplier<- c(0,0,0,1,10,10,10,10,10,10,10,10,10,10^9,10^2,10^2,10^3,10^6,10^6)
convert.Multiplier<- data.frame(Symbol, Multiplier)

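df.damage$Prop.Multiplier<- convert.Multiplier$Multiplier[match(df.damage$PROPDMGEXP, convert.Multiplier$Symbol)]
# Crop multipliers come from CROPDMGEXP, not PROPDMGEXP; toupper() maps any lowercase symbols onto the table
# Note: the totals printed further down depend on this lookup; re-run the chunk to refresh them
df.damage$Crop.Multiplier<- convert.Multiplier$Multiplier[match(toupper(df.damage$CROPDMGEXP), convert.Multiplier$Symbol)]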

df.damage <- df.damage %>%
  mutate(PROPDMG = PROPDMG * Prop.Multiplier) %>%
  mutate(CROPDMG = CROPDMG * Crop.Multiplier) %>%
  mutate(TOTAL.DMG = PROPDMG + CROPDMG)

df.damage.total <- df.damage %>%
  group_by(EVTYPE) %>%
  summarize(TOTAL.DMG.EVTYPE = sum(TOTAL.DMG)) %>%
  arrange(desc(TOTAL.DMG.EVTYPE))

head(df.damage.total,10)
## # A tibble: 10 x 2
##    EVTYPE            TOTAL.DMG.EVTYPE
##    <chr>                        <dbl>
##  1 HURRICANE             814750235010
##  2 HURRICANE/TYPHOON     802074291330
##  3 FLOOD                 231909682070
##  4 TORNADO                85207035607
##  5 FLASH FLOOD            54962957791
##  6 STORM SURGE            43328536000
##  7 HAIL                   31046432377
##  8 HIGH WIND              12444111890
##  9 TSTM WIND              12169568890
## 10 WILDFIRE               11938922200
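
The exponent conversion can be checked by hand on a few rows. The sketch below is illustrative only: it uses made-up values, a named vector covering just four symbols, and a hypothetical helper `to_dollars`. Blank or unrecognized symbols are treated as a multiplier of 0, matching the value the analysis assigns to the empty symbol.

```r
# Illustrative subset of the symbol-to-multiplier mapping
multiplier <- c("H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9)

# Made-up records for illustration only
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1),
  PROPDMGEXP = c("K", "M", "B"),
  CROPDMG    = c(5, 0, 10),
  CROPDMGEXP = c("k", "", "M"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: scale a value by the multiplier for its symbol.
# Lowercase symbols are upper-cased first; unmatched symbols become 0.
to_dollars <- function(value, symbol) {
  m <- unname(multiplier[toupper(symbol)])
  m[is.na(m)] <- 0
  value * m
}

toy$PROP.USD  <- to_dollars(toy$PROPDMG, toy$PROPDMGEXP)
toy$CROP.USD  <- to_dollars(toy$CROPDMG, toy$CROPDMGEXP)
toy$TOTAL.DMG <- toy$PROP.USD + toy$CROP.USD
toy
```

The first row works out to 25 × 1,000 + 5 × 1,000 = 30,000 dollars. Each damage column is scaled by its own exponent column.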

Results

Health Impact

The ten event types with the most fatalities are plotted below, ordered by their combined total of fatalities and injuries.

library(ggplot2)
g <- ggplot(df.fatalities[1:10,], aes(x = reorder(EVTYPE, -Total), y = Total)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  geom_text(aes(label = Total), vjust = -0.1, color = "black", size = 3.5) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  ggtitle("Events with Highest Health Impacts") +
  labs(x = "EVENT TYPE", y = "Total Health Impact")
g

As the graph shows, tornadoes have by far the greatest health impact: 5,633 fatalities and 91,346 injuries, more than ten times the combined total of any other event type.

Economic Impact

g <- ggplot(df.damage.total[1:10,], aes(x = reorder(EVTYPE, -TOTAL.DMG.EVTYPE), y = TOTAL.DMG.EVTYPE)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  ggtitle("Events with Highest Economic Impact") +
  labs(x = "Event Type", y = "Total Economic Impact ($USD)")
g

As shown, the events with the greatest economic impact are hurricanes and typhoons (the HURRICANE and HURRICANE/TYPHOON categories), followed by floods.