Synopsis

This study focuses in determining which natural disaster has had the most negative impact in the United States. We take into account two variables from the Storm database from the National Weather Service: deaths and damage to property. The results show that tornadoes have the highest death toll of all and in 45 out of the past 61 years they have caused the most deaths. Although in recent years deaths by heat waves have increased. On the other hand, natural disaster caused by hurricanes and heavy storms cause the most damage to property.

Data Processing

I set up the r code for saving the CACHE and turn outdependencies on so that I can write different pieces of code more fluently. Warning and messages are dissabled to that it does not create noise when printing results.

library(knitr)
# set global chunk options: 
opts_chunk$set(cache=TRUE, autodep = TRUE,warning=FALSE,message=FALSE)

Processing

Since this is a big dataset I decided to just keep the columns I was interested in to get preliminary results. Then in order to have all the damages in the same values I created a table based on the exponential factors presented in the data so that I could left join them and multiply them by the correct exponent. I deleted some columns that we usedfull for the transformation in order to get the dataset I would finally use which is the data frame called df.

#Load necessary packages for manipulating and visualizing data. 
library(readr)
library(ggplot2)
library(dplyr)
library(lubridate)
library(htmlTable)
library(htmltools)

#Read the Data
storm=read_csv("storm.bz2")

storm$CROPDMGEXP=as.character(storm$CROPDMGEXP)

#Keep useful information
storm=storm %>% select(c(2,3,"STATE","EVTYPE","FATALITIES","INJURIES","CROPDMG","CROPDMGEXP","PROPDMG","PROPDMGEXP"))

#Create dataframe with exponent variables to have standarized values (thousands, millions, billions)
exp=data.frame(letter=c("K","M","B"),number=c(1000,1000000,1000000000))

#Left join the data and rename the columns to process the number
storm=left_join(storm,exp,by=c("PROPDMGEXP"="letter"))
storm=storm %>% rename(prop=number)
storm=left_join(storm,exp,by=c("CROPDMGEXP"="letter"))
storm=storm %>% rename(crop=number)
storm$date=as.Date(storm$BGN_DATE,"%m/%d/%Y 0:00:00")
storm=storm %>% select(c(ncol(storm),3:(ncol(storm)-1)))

storm=storm %>% mutate(crop=CROPDMG*crop,prop=PROPDMG*prop) %>% select(-c(6:9))

#summarize the data by year and climate type
df=storm %>% group_by(year(date),EVTYPE) %>% summarise(death=sum(FATALITIES,na.rm = T),damage=sum(prop,na.rm = T)) 

df=df %>% rename(Year=`year(date)`)

Analysis of the sample

I wanted to know which event caused the most deaths and cost more to the country per year. This helped create the frequency tables and determine the natural disaster with the greater impact.

Then I also looked for top 5 events with the most deaths and with the higher damages in order to get a better insight.

#get which event produced and most death and the most economic impact per year. To see if there is any event that calls for more detailed attention.
mostfatal= df %>% group_by(Year) %>% slice_max(death,n=1,with_ties = TRUE)
mostcost= df %>% group_by(Year) %>% slice_max(damage,n=1,with_ties = TRUE)

totaldeaths= df %>% group_by(EVTYPE) %>% summarize(deaths=sum(death)) %>% slice_max(deaths,n=5,with_ties=TRUE)%>% arrange(desc(deaths))

totaldamage= df %>% group_by(EVTYPE) %>% summarize(damages=sum(damage)) %>% slice_max(damages,n=5,with_ties=TRUE) %>% arrange(desc(damages))

deadbytype=as.data.frame(table(mostfatal$EVTYPE)) %>% arrange(desc(Freq))%>% rename("Natural disaster"=Var1,"Frequency"=Freq)

damagebytype=as.data.frame(table(mostcost$EVTYPE))%>% arrange(desc(Freq)) %>% rename("Natural disaster"=Var1,"Frequency"=Freq)

Results

For the result I created 2 graphs using ggplot and four html tables. this results are summarized in the synopsys of this document.

FLOOD, HURRICANE/TYPHOON, TORNADO, STORM SURGE, FLASH FLOOD account for damages totalling $330,353,607,790.

ggplot(totaldeaths,aes(reorder(EVTYPE,-deaths),deaths,fill=EVTYPE))+geom_col(show.legend = FALSE)+xlab("Natural Disaster")+labs(title="Top 5 deaths by Natural Disaster",subtitle ="1950 - 2011")

ggplot(totaldamage,aes(reorder(EVTYPE,-damages),damages,fill=EVTYPE))+geom_col(show.legend = FALSE)+xlab("Natural Disaster")+ylab("Damage")+labs(title="Top 5 costs by Natural Disaster",subtitle ="1950 - 2011")

totaldamage$damages=scales::dollar(totaldamage$damages)
htmlTable(totaldamage)
EVTYPE damages
1 FLOOD $144,657,709,800
2 HURRICANE/TYPHOON $69,305,840,000
3 TORNADO $56,925,660,480
4 STORM SURGE $43,323,536,000
5 FLASH FLOOD $16,140,861,510
htmlTable(totaldeaths)
EVTYPE deaths
1 TORNADO 5633
2 EXCESSIVE HEAT 1903
3 FLASH FLOOD 978
4 HEAT 937
5 LIGHTNING 816

How many times an event cause the most death in a year?

htmlTable(deadbytype)
Natural disaster Frequency
1 TORNADO 45
2 EXCESSIVE HEAT 8
3 FLASH FLOOD 4
4 BLIZZARD 1
5 HEAT 1
6 LIGHTNING 1
7 RIP CURRENT 1
8 TSTM WIND 1

How many times an event cause the most damage in a year?

htmlTable(damagebytype)
Natural disaster Frequency
1 TORNADO 47
2 HURRICANE 3
3 FLOOD 2
4 HAIL 2
5 HURRICANE/TYPHOON 2
6 HURRICANE OPAL 1
7 STORM SURGE/TIDE 1
8 TROPICAL STORM 1
9 WILD/FOREST FIRE 1
10 WILDFIRE 1
11 WINTER STORM 1