Dear Governor,
I have prepared a presentation for you to view that details human and economic costs of natural disasters in the United States since 1980 so as to help you to make wise decisions for how to allocate disaster prevention funds. When it comes to deaths and injuries: heat, storms and tornadoes have the biggest impacts. Heat-related deaths are more prevalent than any other weather-related deaths while tornadoes injure far more people than does any other type of weather event. Floods and lightning also figure prominently into these figures. Storms include factors that are classified by high winds or directly by “storms”, not including tropical storms/hurricanes or dust storms. When it comes to economic costs: heavy rains are the most costly (the figures combine property damage and crop damage). Heavy rains include hurricanes and tropical storms (though a vast majority or cost comes from other heavy rains) and are far more damaging than hail and storms, which are in turn far more damaging than anything else. The work here shows how I calculated these figures and includes 3 bar plots, each of which show the total cost of the most damaging weather events by category.
To start, I maade sure I was in the proper directly and I downloaded the data from the NOAA. This was a very large data frame, so I pared it down to columns I cared about, such as Beginning Date, Event Type, death and injury columns and economic columns. I also downloaded the lubridate and dplyr packages to get started.
# Set the working directory
setwd("~/Desktop/r")
# Download data frame with all the data
orgFile<-read.csv("repdata_data_StormData.csv.bz2")
# Reduce dataframe to pertinent columns
RorgFile<-orgFile[,c(2,8,23:28)]
# Download necessary packages
library(lubridate)
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:lubridate':
##
## intersect, setdiff, union
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
I then continued to clean up the data frame. I sorted by date so as to include only events from 1980 or later, since the first few decades were not as reliable.
# Include only records from 1980 or later, since earlier records were not as complete or trustworthy.
RorgFile[,1]=as.character(RorgFile[,1])
RorgFile[,1]=as.Date(RorgFile[,1],format="%m/%d/%Y %H:%M:%S")
RorgFile<-RorgFile[order(RorgFile$BGN_DATE),]
head(which(RorgFile[,1]>="1980-01-01")) ## The first row is 75367
## [1] 75367 75368 75369 75370 75371 75372
RorgFile2<-RorgFile[75367:902297,]
The first goal was to find the deadliest event types. I grouped the data frame by event type and then summarized to find out how many deaths were attributed to each type.
# Group data frame by event type
File_by_type<-group_by(RorgFile2,EVTYPE)
# Find deadliest by fatality. First, summarize dataframe by event type and total fatalities
Fatal<-summarize(File_by_type,Fatalities=sum(FATALITIES))
# Then, make the "EVTYPE" column a character vector
Fatal<-as.data.frame(Fatal)
Fatal[,1]<-as.character(Fatal[,1])
# Pare down to only those with fatalities
pared_fatal<-as.data.frame(subset(Fatal,Fatalities!=0))
pared_fatal<-pared_fatal[order(pared_fatal$Fatalities,decreasing = TRUE),]
# Snapshot/preview of which event types have a lot of fatalities
sum(pared_fatal[,2]) ## (This is the number of total fatlities)
## [1] 11786
subset(Fatal,Fatalities>=250)
## EVTYPE Fatalities
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 170 FLOOD 470
## 275 HEAT 937
## 464 LIGHTNING 816
## 585 RIP CURRENT 368
## 834 TORNADO 2274
## 856 TSTM WIND 504
Now I have the data I want, but the problem is that there are a lot of overlapping names. For example “TSTM Winds” and “Thunderstorm Winds” mean the same thing but are classifired separately. After studying the data frame, I was able to reasonably combine entries that were of the same or similar types.
# Combine like terms
# Wind/Storm events have been combined:
WindRows<-which((grepl("wind",pared_fatal[,1],ignore.case = TRUE)|grepl("storm",pared_fatal[,1],ignore.case = TRUE)|grepl("blizzard",pared_fatal[,1],ignore.case = TRUE)|grepl("bad weather",pared_fatal[,1],ignore.case = TRUE))&!((grepl("chill",pared_fatal[,1],ignore.case = TRUE)|grepl("dust",pared_fatal[,1],ignore.case = TRUE))&grepl("storm",pared_fatal[,1],ignore.case = TRUE)))
# "Cold" events, "Hot" events, flood events, tornadoes and lightning all have high casualty numbers, too
ColdRows<-which(grepl("cold",pared_fatal[,1],ignore.case = TRUE))
HotRows<-which(grepl("heat",pared_fatal[,1],ignore.case = TRUE)|grepl("hot",pared_fatal[,1],ignore.case = TRUE))
TornadoRows<-which(grepl("tornado",pared_fatal[,1],ignore.case = TRUE)|grepl("funnel",pared_fatal[,1],ignore.case = TRUE)|grepl("waterspout",pared_fatal[,1],ignore.case = TRUE))
FloodRows<-which(grepl("flood",pared_fatal[,1],ignore.case = TRUE)|grepl("heavy rain",pared_fatal[,1],ignore.case = TRUE))
Lightning<-which(grepl("lightning",pared_fatal[,1],ignore.case = TRUE))
Other<-sum(pared_fatal[,2])-(sum(pared_fatal[HotRows,2])+sum(pared_fatal[WindRows,2])+sum(pared_fatal[TornadoRows,2])+sum(pared_fatal[FloodRows,2])+sum(pared_fatal[Lightning,2]))
Next, I organized a data frame by category which included the deadliest event types.
# Display the top 5 event types by fatalities
emptydf<-data.frame(nrow=0,ncol=2)
names(emptydf)<-c("Event_Type","Total_Fatalities")
emptydf<-rbind(emptydf,c("Heat",sum(pared_fatal[HotRows,2])),c("Tornados",sum(pared_fatal[TornadoRows,2])),c("Storms",sum(pared_fatal[WindRows,2])),c("Floods",sum(pared_fatal[FloodRows,2])),c("Lightning",sum(pared_fatal[Lightning,2])),c("Other",Other))
emptydf<-emptydf[2:7,]
emptydf[,2]<-as.numeric(emptydf[,2])
head(emptydf)
## Event_Type Total_Fatalities
## 2 Heat 3138
## 3 Tornados 2305
## 4 Storms 1952
## 5 Floods 1623
## 6 Lightning 817
## 7 Other 1951
# Plot deadliest
barplot(height=emptydf$Total_Fatalities,names=emptydf$Event_Type,col="red",xlab="Event Types",ylab="Total Fatalities",main="Total Fatalities by Event Type, 1980-Present")
## Data Processing: Event Types with Most Injuries The second goal was to find the deadliest event types. I grouped the data frame by event type and then summarized to find out how many deaths were attributed to each type.
# Find most costly to health by injury. First, summarize dataframe by event type and total injuries
Injure<-summarize(File_by_type,Injuries=sum(INJURIES))
# Then, make the "EVTYPE" column a character vector
Injure<-as.data.frame(Injure)
Injure[,1]<-as.character(Injure[,1])
# Pare down to only those that have injuries on record
pared_injure<-as.data.frame(subset(Injure,Injuries!=0))
pared_injure<-pared_injure[order(pared_injure$Injuries,decreasing = TRUE),]
# Snapshot/preview of which event types have a lot of injuries
sum(pared_injure[,2]) ## (This is the number of total injuries)
## [1] 87153
subset(Injure,Injuries>=1000)
## EVTYPE Injuries
## 130 EXCESSIVE HEAT 6525
## 153 FLASH FLOOD 1777
## 170 FLOOD 6789
## 244 HAIL 1361
## 275 HEAT 2100
## 310 HEAVY SNOW 1021
## 359 HIGH WIND 1137
## 411 HURRICANE/TYPHOON 1275
## 427 ICE STORM 1975
## 464 LIGHTNING 5230
## 760 THUNDERSTORM WIND 1488
## 834 TORNADO 37971
## 856 TSTM WIND 6957
## 972 WINTER STORM 1321
Now I have the data I want, but the problem is that there are a lot of overlapping names. For example “TSTM Winds” and “Thunderstorm Winds” mean the same thing but are classifired separately. After studying the data frame, I was able to reasonably combine entries that were of the same or similar types.
# Combine like terms
# Wind/Storm events have been combined:
WindRows2<-which((grepl("wind",pared_injure[,1],ignore.case = TRUE)|grepl("storm",pared_injure[,1],ignore.case = TRUE)|grepl("blizzard",pared_injure[,1],ignore.case = TRUE)|grepl("bad weather",pared_injure[,1],ignore.case = TRUE))&!((grepl("chill",pared_injure[,1],ignore.case = TRUE)|grepl("dust",pared_injure[,1],ignore.case = TRUE))&grepl("storm",pared_injure[,1],ignore.case = TRUE)))
# "Cold" events, "Hot" events, flood events, tornadoes and lightning all have high casualty numbers, too
ColdRows2<-which(grepl("cold",pared_injure[,1],ignore.case = TRUE))
HotRows2<-which(grepl("heat",pared_injure[,1],ignore.case = TRUE)|grepl("hot",pared_injure[,1],ignore.case = TRUE))
TornadoRows2<-which(grepl("tornado",pared_injure[,1],ignore.case = TRUE)|grepl("funnel",pared_injure[,1],ignore.case = TRUE)|grepl("waterspout",pared_injure[,1],ignore.case = TRUE))
FloodRows2<-which(grepl("flood",pared_injure[,1],ignore.case = TRUE)|grepl("heavy rain",pared_injure[,1],ignore.case = TRUE))
Lightning2<-which(grepl("lightning",pared_injure[,1],ignore.case = TRUE))
Other2<-sum(pared_injure[,2])-(sum(pared_injure[HotRows2,2])+sum(pared_injure[WindRows2,2])+sum(pared_injure[TornadoRows2,2])+sum(pared_injure[FloodRows2,2])+sum(pared_injure[Lightning2,2]))
Next, I organized a data frame by category which included the event types with the most injuries.
# Display the top 5 event types by injuries
emptydf2<-data.frame(nrow=0,ncol=2)
names(emptydf2)<-c("Event_Type","Total_Injuries")
emptydf2<-rbind(emptydf2,c("Tornados",sum(pared_injure[TornadoRows2,2])),c("Storms",sum(pared_injure[WindRows2,2])),c("Heat",sum(pared_injure[HotRows2,2])),c("Floods",sum(pared_injure[FloodRows2,2])),c("Lightning",sum(pared_injure[Lightning2,2])),c("Other",Other2))
emptydf2<-emptydf2[2:7,]
emptydf2[,2]<-as.numeric(emptydf2[,2])
emptydf2
## Event_Type Total_Injuries
## 2 Tornados 38064
## 3 Storms 16102
## 4 Heat 9224
## 5 Floods 8859
## 6 Lightning 5232
## 7 Other 9672
# Plot this information
barplot(height=emptydf2$Total_Injuries,names=emptydf2$Event_Type,col="blue",xlab="Event Types",ylab="Total Injuries",main="Total Injuries by Event Type, 1980-Present")
## Data Processing: Finding Most Expensive Event Types First, we will find the most costly events to the economy. We will create a new dataframe.
# Take only the rows and columns that are pertinent to economic cost
newRorgFile<-orgFile[,c(2,8,23:28)]
moneydf<-newRorgFile[75367:902297,5:8]
moneydf[,2]<-as.character(moneydf[,2])
moneydf[,4]<-as.character(moneydf[,4])
# Code the cost values by which letter they have. E.g.: every PROPDMGEXP column with a "B" means you multiply the PROPDMG number by one billion
hRows<-which(grepl("h",moneydf[,2],ignore.case = TRUE))
MRows<-which(grepl("m",moneydf[,2],ignore.case = TRUE))
BRows<-which(grepl("b",moneydf[,2],ignore.case = TRUE))
KRows<-which(grepl("k",moneydf[,2],ignore.case = TRUE))
# Code must be added here to account for the fact that 22 rows had a "K" code but had a number for property damage that was itself 1000 or more. I didn't want to multiply this by 1000 again, thus this next line of code.
KRows<-subset(KRows, moneydf[KRows,1]<1000)
moneydf[hRows,1]=moneydf[hRows,1]*100
moneydf[MRows,1]=moneydf[MRows,1]*1000000
moneydf[BRows,1]=moneydf[BRows,1]*1000000000
moneydf[KRows,1]=moneydf[KRows,1]*1000
# Do the same for Crop Damage, another economic component
hRows2<-which(grepl("h",moneydf[,4],ignore.case = TRUE))
MRows2<-which(grepl("m",moneydf[,4],ignore.case = TRUE))
BRows2<-which(grepl("b",moneydf[,4],ignore.case = TRUE))
KRows2<-which(grepl("k",moneydf[,4],ignore.case = TRUE))
# Code must be added here to account for the fact that 22 rows had a "K" code but had a number for property damage that was itself 1000 or more. I didn't want to multiply this by 1000 again, thus this next line of code.
KRows2<-subset(KRows, moneydf[KRows2,3]<1000)
moneydf[hRows2,3]=moneydf[hRows2,3]*100
moneydf[MRows2,3]=moneydf[MRows2,3]*1000000
moneydf[BRows2,3]=moneydf[BRows2,3]*1000000000
moneydf[KRows2,3]=moneydf[KRows2,3]*1000
# Reduce data frame to just the two pertinent columns and then add them to find the total economic cost
moneydf=moneydf[,c(1,3)]
EcoCost<-moneydf[,1]+moneydf[,2]
Ecodf<-cbind(as.data.frame(File_by_type[,2]),EcoCost)
names(Ecodf)<-c("Event_Type","ECODMG")
head(Ecodf,20)
## Event_Type ECODMG
## 1 TSTM WIND 0
## 2 TSTM WIND 0
## 3 TSTM WIND 0
## 4 TSTM WIND 0
## 5 TSTM WIND 0
## 6 TSTM WIND 0
## 7 TSTM WIND 0
## 8 TORNADO 0
## 9 TSTM WIND 0
## 10 TSTM WIND 0
## 11 TSTM WIND 0
## 12 TSTM WIND 0
## 13 TORNADO 0
## 14 HAIL 0
## 15 HAIL 0
## 16 HAIL 0
## 17 TSTM WIND 0
## 18 TSTM WIND 0
## 19 TSTM WIND 0
## 20 TSTM WIND 0
Next, we will group the data frame by event type and summarize to find the overall economic cost of each event type
# Group data frame by event type and then summarize
Eco<-group_by(Ecodf,Event_Type)
Eco<-summarize(Eco,ECODMG=sum(ECODMG))
# Then, make the "EVTYPE" column a character vector
Eco<-as.data.frame(Eco)
Eco[,1]<-as.character(Eco[,1])
Then, we will clean up the data frame to show the event types that have provided the most economic costs to the economy.
# Pare down to only those that have economic damage on record
dmg<-as.data.frame(subset(Eco,ECODMG!=0))
dmg<-dmg[order(dmg$ECODMG,decreasing = TRUE),]
# Snapshot/preview of which event types have a lot of economic damage
sum(dmg[,2]) ## (This is the number of total economic cost)
## [1] 1.26351e+13
subset(dmg,ECODMG>=1000000000)
## Event_Type ECODMG
## 290 HEAVY RAIN 5.235243e+12
## 244 HAIL 2.582759e+12
## 760 THUNDERSTORM WIND 1.316997e+12
## 153 FLASH FLOOD 1.011911e+12
## 834 TORNADO 4.315718e+11
## 856 TSTM WIND 4.273852e+11
## 978 WINTER WEATHER 3.120672e+11
## 786 THUNDERSTORM WINDS 2.421071e+11
## 464 LIGHTNING 2.200814e+11
## 972 WINTER STORM 1.794639e+11
## 359 HIGH WIND 1.353791e+11
## 310 HEAVY SNOW 9.184351e+10
## 955 WILD/FOREST FIRE 7.312498e+10
## 170 FLOOD 6.755644e+10
## 676 STRONG WIND 6.734107e+10
## 886 UNSEASONABLY WARM 5.571687e+10
## 489 MARINE THUNDERSTORM WIND 4.061617e+10
## 671 STORM SURGE/TIDE 1.502340e+10
## 95 DROUGHT 1.425448e+10
## 936 WATERSPOUT 1.238740e+10
## 216 FUNNEL CLOUD 1.152321e+10
## 376 HIGH WINDS 1.031900e+10
## 848 TROPICAL STORM 7.834502e+09
## 117 DUST STORM 6.387927e+09
## 957 WILDFIRE 6.008750e+09
## 140 EXTREME COLD 5.136926e+09
## 19 AVALANCHE 5.128749e+09
## 642 SNOW FREEZING RAIN 5.000702e+09
## 490 MARINE TSTM WIND 4.569452e+09
## 275 HEAT 4.222493e+09
## 54 COASTAL FLOOD 4.030404e+09
## 30 BLIZZARD 3.460153e+09
## 919 URBAN/SML STREAM FLD 3.214792e+09
## 130 EXCESSIVE HEAT 3.212971e+09
## 967 WIND GUSTS 3.000100e+09
## 427 ICE STORM 2.167837e+09
## 89 DENSE FOG 2.069869e+09
## 115 DUST DEVIL 2.002002e+09
## 488 MARINE STRONG WIND 2.000182e+09
## 485 MARINE HAIL 1.711862e+09
## 177 FLOOD/FLASH FLOOD 1.551561e+09
## 221 Glaze 1.400560e+09
## 440 LAKE-EFFECT SNOW 1.045724e+09
## 585 RIP CURRENT 1.031174e+09
# Combine like terms
# Wind/Storm events have been combined. "Heavy rain" includes hurricanes and tropical storms.
RainRows3<-which(grepl("rain",dmg[,1],ignore.case = TRUE)|grepl("hurricane",dmg,ignore.case=TRUE)|grepl("tropical",dmg,ignore.case = TRUE))
HailRows3<-which(grepl("hail",dmg[,1],ignore.case=TRUE))
WindRows3<-which((grepl("wind",dmg[,1],ignore.case = TRUE)|grepl("storm",dmg[,1],ignore.case = TRUE)|grepl("blizzard",dmg[,1],ignore.case = TRUE)|grepl("bad weather",dmg[,1],ignore.case = TRUE))&!(grepl("chill",dmg[,1],ignore.case = TRUE)|grepl("dust",dmg[,1],ignore.case = TRUE)|grepl("tropical",dmg[,1],ignore.case = TRUE)))
ColdRows3<-which((grepl("cold",dmg[,1],ignore.case = TRUE)|grepl("Winter",dmg,ignore.case = TRUE)))
HotRows3<-which(grepl("heat",dmg[,1],ignore.case = TRUE)|grepl("hot",dmg[,1],ignore.case = TRUE))
TornadoRows3<-which(grepl("tornado",dmg[,1],ignore.case = TRUE)|grepl("funnel",dmg[,1],ignore.case = TRUE)|grepl("waterspout",dmg[,1],ignore.case = TRUE))
FloodRows3<-which(grepl("flood",dmg[,1],ignore.case = TRUE)|grepl("heavy rain",dmg[,1],ignore.case = TRUE))
Lightning3<-which(grepl("lightning",dmg[,1],ignore.case = TRUE))
Other3<-sum(dmg[,2])-(sum(dmg[WindRows3,2])+sum(dmg[TornadoRows3,2])+sum(dmg[RainRows3,2])+sum(dmg[HailRows3,2]))
Then, we will create a data frame with this information
# Display the top events by cost
emptydf3<-data.frame(nrow=0,ncol=2)
names(emptydf3)<-c("Event_Type","Total_Cost")
emptydf3<-rbind(emptydf3,c("Heavy Rain",sum(dmg[RainRows3,2])/sum(dmg[,2])),c("Hail",sum(dmg[HailRows3,2])/sum(dmg[,2])),c("Storms",sum(dmg[WindRows3,2])/sum(dmg[,2])),c("Tornados",sum(dmg[TornadoRows3,2])/sum(dmg[,2])))
emptydf3<-emptydf3[2:5,]
emptydf3[,2]<-as.numeric(emptydf3[,2])
emptydf3
## Event_Type Total_Cost
## 2 Heavy Rain 0.6257163
## 3 Hail 0.2045542
## 4 Storms 0.1939226
## 5 Tornados 0.0360506
# Plot this information
barplot(height=emptydf3$Total_Cost,names=emptydf3$Event_Type,col="green",xlab="Event Types",ylab="Cost as Percentage",main="Percentage Cost by Event Type, 1980-Present")