This study focuses in determining which natural disaster has had the most negative impact in the United States. We take into account two variables from the Storm database from the National Weather Service: deaths and damage to property. The results show that tornadoes have the highest death toll of all and in 45 out of the past 61 years they have caused the most deaths. Although in recent years deaths by heat waves have increased. On the other hand, natural disaster caused by hurricanes and heavy storms cause the most damage to property.
I set up the r code for saving the CACHE and turn outdependencies on so that I can write different pieces of code more fluently. Warning and messages are dissabled to that it does not create noise when printing results.
library(knitr)
# set global chunk options:
opts_chunk$set(cache=TRUE, autodep = TRUE,warning=FALSE,message=FALSE)
Since this is a big dataset I decided to just keep the columns I was interested in to get preliminary results. Then in order to have all the damages in the same values I created a table based on the exponential factors presented in the data so that I could left join them and multiply them by the correct exponent. I deleted some columns that we usedfull for the transformation in order to get the dataset I would finally use which is the data frame called df.
#Load necessary packages for manipulating and visualizing data.
library(readr)
library(ggplot2)
library(dplyr)
library(lubridate)
library(htmlTable)
library(htmltools)
#Read the Data
storm=read_csv("storm.bz2")
storm$CROPDMGEXP=as.character(storm$CROPDMGEXP)
#Keep useful information
storm=storm %>% select(c(2,3,"STATE","EVTYPE","FATALITIES","INJURIES","CROPDMG","CROPDMGEXP","PROPDMG","PROPDMGEXP"))
#Create dataframe with exponent variables to have standarized values (thousands, millions, billions)
exp=data.frame(letter=c("K","M","B"),number=c(1000,1000000,1000000000))
#Left join the data and rename the columns to process the number
storm=left_join(storm,exp,by=c("PROPDMGEXP"="letter"))
storm=storm %>% rename(prop=number)
storm=left_join(storm,exp,by=c("CROPDMGEXP"="letter"))
storm=storm %>% rename(crop=number)
storm$date=as.Date(storm$BGN_DATE,"%m/%d/%Y 0:00:00")
storm=storm %>% select(c(ncol(storm),3:(ncol(storm)-1)))
storm=storm %>% mutate(crop=CROPDMG*crop,prop=PROPDMG*prop) %>% select(-c(6:9))
#summarize the data by year and climate type
df=storm %>% group_by(year(date),EVTYPE) %>% summarise(death=sum(FATALITIES,na.rm = T),damage=sum(prop,na.rm = T))
df=df %>% rename(Year=`year(date)`)
I wanted to know which event caused the most deaths and cost more to the country per year. This helped create the frequency tables and determine the natural disaster with the greater impact.
Then I also looked for top 5 events with the most deaths and with the higher damages in order to get a better insight.
#get which event produced and most death and the most economic impact per year. To see if there is any event that calls for more detailed attention.
mostfatal= df %>% group_by(Year) %>% slice_max(death,n=1,with_ties = TRUE)
mostcost= df %>% group_by(Year) %>% slice_max(damage,n=1,with_ties = TRUE)
totaldeaths= df %>% group_by(EVTYPE) %>% summarize(deaths=sum(death)) %>% slice_max(deaths,n=5,with_ties=TRUE)%>% arrange(desc(deaths))
totaldamage= df %>% group_by(EVTYPE) %>% summarize(damages=sum(damage)) %>% slice_max(damages,n=5,with_ties=TRUE) %>% arrange(desc(damages))
deadbytype=as.data.frame(table(mostfatal$EVTYPE)) %>% arrange(desc(Freq))%>% rename("Natural disaster"=Var1,"Frequency"=Freq)
damagebytype=as.data.frame(table(mostcost$EVTYPE))%>% arrange(desc(Freq)) %>% rename("Natural disaster"=Var1,"Frequency"=Freq)
For the result I created 2 graphs using ggplot and four html tables. this results are summarized in the synopsys of this document.
FLOOD, HURRICANE/TYPHOON, TORNADO, STORM SURGE, FLASH FLOOD account for damages totalling $330,353,607,790.
ggplot(totaldeaths,aes(reorder(EVTYPE,-deaths),deaths,fill=EVTYPE))+geom_col(show.legend = FALSE)+xlab("Natural Disaster")+labs(title="Top 5 deaths by Natural Disaster",subtitle ="1950 - 2011")
ggplot(totaldamage,aes(reorder(EVTYPE,-damages),damages,fill=EVTYPE))+geom_col(show.legend = FALSE)+xlab("Natural Disaster")+ylab("Damage")+labs(title="Top 5 costs by Natural Disaster",subtitle ="1950 - 2011")
totaldamage$damages=scales::dollar(totaldamage$damages)
htmlTable(totaldamage)
| EVTYPE | damages | |
|---|---|---|
| 1 | FLOOD | $144,657,709,800 |
| 2 | HURRICANE/TYPHOON | $69,305,840,000 |
| 3 | TORNADO | $56,925,660,480 |
| 4 | STORM SURGE | $43,323,536,000 |
| 5 | FLASH FLOOD | $16,140,861,510 |
htmlTable(totaldeaths)
| EVTYPE | deaths | |
|---|---|---|
| 1 | TORNADO | 5633 |
| 2 | EXCESSIVE HEAT | 1903 |
| 3 | FLASH FLOOD | 978 |
| 4 | HEAT | 937 |
| 5 | LIGHTNING | 816 |
htmlTable(deadbytype)
| Natural disaster | Frequency | |
|---|---|---|
| 1 | TORNADO | 45 |
| 2 | EXCESSIVE HEAT | 8 |
| 3 | FLASH FLOOD | 4 |
| 4 | BLIZZARD | 1 |
| 5 | HEAT | 1 |
| 6 | LIGHTNING | 1 |
| 7 | RIP CURRENT | 1 |
| 8 | TSTM WIND | 1 |
htmlTable(damagebytype)
| Natural disaster | Frequency | |
|---|---|---|
| 1 | TORNADO | 47 |
| 2 | HURRICANE | 3 |
| 3 | FLOOD | 2 |
| 4 | HAIL | 2 |
| 5 | HURRICANE/TYPHOON | 2 |
| 6 | HURRICANE OPAL | 1 |
| 7 | STORM SURGE/TIDE | 1 |
| 8 | TROPICAL STORM | 1 |
| 9 | WILD/FOREST FIRE | 1 |
| 10 | WILDFIRE | 1 |
| 11 | WINTER STORM | 1 |