Synopsis

Dear Governor,

I have prepared a presentation for you to view that details human and economic costs of natural disasters in the United States since 1980 so as to help you to make wise decisions for how to allocate disaster prevention funds. When it comes to deaths and injuries: heat, storms and tornadoes have the biggest impacts. Heat-related deaths are more prevalent than any other weather-related deaths while tornadoes injure far more people than does any other type of weather event. Floods and lightning also figure prominently into these figures. Storms include factors that are classified by high winds or directly by “storms”, not including tropical storms/hurricanes or dust storms. When it comes to economic costs: heavy rains are the most costly (the figures combine property damage and crop damage). Heavy rains include hurricanes and tropical storms (though a vast majority or cost comes from other heavy rains) and are far more damaging than hail and storms, which are in turn far more damaging than anything else. The work here shows how I calculated these figures and includes 3 bar plots, each of which show the total cost of the most damaging weather events by category.

Pre-Processing the Data

To start, I maade sure I was in the proper directly and I downloaded the data from the NOAA. This was a very large data frame, so I pared it down to columns I cared about, such as Beginning Date, Event Type, death and injury columns and economic columns. I also downloaded the lubridate and dplyr packages to get started.

# Set the working directory
setwd("~/Desktop/r")

# Download data frame with all the data
orgFile<-read.csv("repdata_data_StormData.csv.bz2")

# Reduce dataframe to pertinent columns
RorgFile<-orgFile[,c(2,8,23:28)]

# Download necessary packages
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:lubridate':
## 
##     intersect, setdiff, union
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

I then continued to clean up the data frame. I sorted by date so as to include only events from 1980 or later, since the first few decades were not as reliable.

# Include only records from 1980 or later, since earlier records were not as complete or trustworthy.
RorgFile[,1]=as.character(RorgFile[,1])
RorgFile[,1]=as.Date(RorgFile[,1],format="%m/%d/%Y %H:%M:%S")
RorgFile<-RorgFile[order(RorgFile$BGN_DATE),]
head(which(RorgFile[,1]>="1980-01-01"))  ## The first row is 75367
## [1] 75367 75368 75369 75370 75371 75372
RorgFile2<-RorgFile[75367:902297,]

Data Processing: Finding Deadliest Event Types

The first goal was to find the deadliest event types. I grouped the data frame by event type and then summarized to find out how many deaths were attributed to each type.

# Group data frame by event type
File_by_type<-group_by(RorgFile2,EVTYPE)

# Find deadliest by fatality.  First, summarize dataframe by event type and total fatalities
Fatal<-summarize(File_by_type,Fatalities=sum(FATALITIES))

# Then, make the "EVTYPE" column a character vector
Fatal<-as.data.frame(Fatal)
Fatal[,1]<-as.character(Fatal[,1])

# Pare down to only those with fatalities
pared_fatal<-as.data.frame(subset(Fatal,Fatalities!=0))
pared_fatal<-pared_fatal[order(pared_fatal$Fatalities,decreasing = TRUE),]

# Snapshot/preview of which event types have a lot of fatalities
sum(pared_fatal[,2]) ## (This is the number of total fatlities)
## [1] 11786
subset(Fatal,Fatalities>=250)
##             EVTYPE Fatalities
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 170          FLOOD        470
## 275           HEAT        937
## 464      LIGHTNING        816
## 585    RIP CURRENT        368
## 834        TORNADO       2274
## 856      TSTM WIND        504

Now I have the data I want, but the problem is that there are a lot of overlapping names. For example “TSTM Winds” and “Thunderstorm Winds” mean the same thing but are classifired separately. After studying the data frame, I was able to reasonably combine entries that were of the same or similar types.

# Combine like terms
# Wind/Storm events have been combined:
WindRows<-which((grepl("wind",pared_fatal[,1],ignore.case = TRUE)|grepl("storm",pared_fatal[,1],ignore.case = TRUE)|grepl("blizzard",pared_fatal[,1],ignore.case = TRUE)|grepl("bad weather",pared_fatal[,1],ignore.case = TRUE))&!((grepl("chill",pared_fatal[,1],ignore.case = TRUE)|grepl("dust",pared_fatal[,1],ignore.case = TRUE))&grepl("storm",pared_fatal[,1],ignore.case = TRUE)))
# "Cold" events, "Hot" events, flood events, tornadoes and lightning all have high casualty numbers, too
ColdRows<-which(grepl("cold",pared_fatal[,1],ignore.case = TRUE))
HotRows<-which(grepl("heat",pared_fatal[,1],ignore.case = TRUE)|grepl("hot",pared_fatal[,1],ignore.case = TRUE))
TornadoRows<-which(grepl("tornado",pared_fatal[,1],ignore.case = TRUE)|grepl("funnel",pared_fatal[,1],ignore.case = TRUE)|grepl("waterspout",pared_fatal[,1],ignore.case = TRUE))
FloodRows<-which(grepl("flood",pared_fatal[,1],ignore.case = TRUE)|grepl("heavy rain",pared_fatal[,1],ignore.case = TRUE))
Lightning<-which(grepl("lightning",pared_fatal[,1],ignore.case = TRUE))
Other<-sum(pared_fatal[,2])-(sum(pared_fatal[HotRows,2])+sum(pared_fatal[WindRows,2])+sum(pared_fatal[TornadoRows,2])+sum(pared_fatal[FloodRows,2])+sum(pared_fatal[Lightning,2]))

Next, I organized a data frame by category which included the deadliest event types.

# Display the top 5 event types by fatalities
emptydf<-data.frame(nrow=0,ncol=2)
names(emptydf)<-c("Event_Type","Total_Fatalities")
emptydf<-rbind(emptydf,c("Heat",sum(pared_fatal[HotRows,2])),c("Tornados",sum(pared_fatal[TornadoRows,2])),c("Storms",sum(pared_fatal[WindRows,2])),c("Floods",sum(pared_fatal[FloodRows,2])),c("Lightning",sum(pared_fatal[Lightning,2])),c("Other",Other))
emptydf<-emptydf[2:7,]
emptydf[,2]<-as.numeric(emptydf[,2])
head(emptydf)
##   Event_Type Total_Fatalities
## 2       Heat             3138
## 3   Tornados             2305
## 4     Storms             1952
## 5     Floods             1623
## 6  Lightning              817
## 7      Other             1951

Results: Finding the Deadliest Event Types

# Plot deadliest
barplot(height=emptydf$Total_Fatalities,names=emptydf$Event_Type,col="red",xlab="Event Types",ylab="Total Fatalities",main="Total Fatalities by Event Type, 1980-Present")

## Data Processing: Event Types with Most Injuries The second goal was to find the deadliest event types. I grouped the data frame by event type and then summarized to find out how many deaths were attributed to each type.

# Find most costly to health by injury.  First, summarize dataframe by event type and total injuries
Injure<-summarize(File_by_type,Injuries=sum(INJURIES))

# Then, make the "EVTYPE" column a character vector
Injure<-as.data.frame(Injure)
Injure[,1]<-as.character(Injure[,1])

# Pare down to only those that have injuries on record
pared_injure<-as.data.frame(subset(Injure,Injuries!=0))
pared_injure<-pared_injure[order(pared_injure$Injuries,decreasing = TRUE),]

# Snapshot/preview of which event types have a lot of injuries
sum(pared_injure[,2]) ## (This is the number of total injuries)
## [1] 87153
subset(Injure,Injuries>=1000)
##                EVTYPE Injuries
## 130    EXCESSIVE HEAT     6525
## 153       FLASH FLOOD     1777
## 170             FLOOD     6789
## 244              HAIL     1361
## 275              HEAT     2100
## 310        HEAVY SNOW     1021
## 359         HIGH WIND     1137
## 411 HURRICANE/TYPHOON     1275
## 427         ICE STORM     1975
## 464         LIGHTNING     5230
## 760 THUNDERSTORM WIND     1488
## 834           TORNADO    37971
## 856         TSTM WIND     6957
## 972      WINTER STORM     1321

Now I have the data I want, but the problem is that there are a lot of overlapping names. For example “TSTM Winds” and “Thunderstorm Winds” mean the same thing but are classifired separately. After studying the data frame, I was able to reasonably combine entries that were of the same or similar types.

# Combine like terms
# Wind/Storm events have been combined:
WindRows2<-which((grepl("wind",pared_injure[,1],ignore.case = TRUE)|grepl("storm",pared_injure[,1],ignore.case = TRUE)|grepl("blizzard",pared_injure[,1],ignore.case = TRUE)|grepl("bad weather",pared_injure[,1],ignore.case = TRUE))&!((grepl("chill",pared_injure[,1],ignore.case = TRUE)|grepl("dust",pared_injure[,1],ignore.case = TRUE))&grepl("storm",pared_injure[,1],ignore.case = TRUE)))
# "Cold" events, "Hot" events, flood events, tornadoes and lightning all have high casualty numbers, too
ColdRows2<-which(grepl("cold",pared_injure[,1],ignore.case = TRUE))
HotRows2<-which(grepl("heat",pared_injure[,1],ignore.case = TRUE)|grepl("hot",pared_injure[,1],ignore.case = TRUE))
TornadoRows2<-which(grepl("tornado",pared_injure[,1],ignore.case = TRUE)|grepl("funnel",pared_injure[,1],ignore.case = TRUE)|grepl("waterspout",pared_injure[,1],ignore.case = TRUE))
FloodRows2<-which(grepl("flood",pared_injure[,1],ignore.case = TRUE)|grepl("heavy rain",pared_injure[,1],ignore.case = TRUE))
Lightning2<-which(grepl("lightning",pared_injure[,1],ignore.case = TRUE))
Other2<-sum(pared_injure[,2])-(sum(pared_injure[HotRows2,2])+sum(pared_injure[WindRows2,2])+sum(pared_injure[TornadoRows2,2])+sum(pared_injure[FloodRows2,2])+sum(pared_injure[Lightning2,2]))

Next, I organized a data frame by category which included the event types with the most injuries.

# Display the top 5 event types by injuries
emptydf2<-data.frame(nrow=0,ncol=2)
names(emptydf2)<-c("Event_Type","Total_Injuries")
emptydf2<-rbind(emptydf2,c("Tornados",sum(pared_injure[TornadoRows2,2])),c("Storms",sum(pared_injure[WindRows2,2])),c("Heat",sum(pared_injure[HotRows2,2])),c("Floods",sum(pared_injure[FloodRows2,2])),c("Lightning",sum(pared_injure[Lightning2,2])),c("Other",Other2))
emptydf2<-emptydf2[2:7,]
emptydf2[,2]<-as.numeric(emptydf2[,2])
emptydf2
##   Event_Type Total_Injuries
## 2   Tornados          38064
## 3     Storms          16102
## 4       Heat           9224
## 5     Floods           8859
## 6  Lightning           5232
## 7      Other           9672

Results: Plotting the events with most injuries

# Plot this information
barplot(height=emptydf2$Total_Injuries,names=emptydf2$Event_Type,col="blue",xlab="Event Types",ylab="Total Injuries",main="Total Injuries by Event Type, 1980-Present")

## Data Processing: Finding Most Expensive Event Types First, we will find the most costly events to the economy. We will create a new dataframe.

# Take only the rows and columns that are pertinent to economic cost
newRorgFile<-orgFile[,c(2,8,23:28)]
moneydf<-newRorgFile[75367:902297,5:8]
moneydf[,2]<-as.character(moneydf[,2])
moneydf[,4]<-as.character(moneydf[,4])

# Code the cost values by which letter they have.  E.g.: every PROPDMGEXP column with a "B" means you multiply the PROPDMG number by one billion
hRows<-which(grepl("h",moneydf[,2],ignore.case = TRUE))
MRows<-which(grepl("m",moneydf[,2],ignore.case = TRUE))
BRows<-which(grepl("b",moneydf[,2],ignore.case = TRUE))
KRows<-which(grepl("k",moneydf[,2],ignore.case = TRUE))

# Code must be added here to account for the fact that 22 rows had a "K" code but had a number for property damage that was itself 1000 or more.  I didn't want to multiply this by 1000 again, thus this next line of code.
KRows<-subset(KRows, moneydf[KRows,1]<1000)
moneydf[hRows,1]=moneydf[hRows,1]*100
moneydf[MRows,1]=moneydf[MRows,1]*1000000
moneydf[BRows,1]=moneydf[BRows,1]*1000000000
moneydf[KRows,1]=moneydf[KRows,1]*1000

# Do the same for Crop Damage, another economic component
hRows2<-which(grepl("h",moneydf[,4],ignore.case = TRUE))
MRows2<-which(grepl("m",moneydf[,4],ignore.case = TRUE))
BRows2<-which(grepl("b",moneydf[,4],ignore.case = TRUE))
KRows2<-which(grepl("k",moneydf[,4],ignore.case = TRUE))

# Code must be added here to account for the fact that 22 rows had a "K" code but had a number for property damage that was itself 1000 or more.  I didn't want to multiply this by 1000 again, thus this next line of code.
KRows2<-subset(KRows, moneydf[KRows2,3]<1000)
moneydf[hRows2,3]=moneydf[hRows2,3]*100
moneydf[MRows2,3]=moneydf[MRows2,3]*1000000
moneydf[BRows2,3]=moneydf[BRows2,3]*1000000000
moneydf[KRows2,3]=moneydf[KRows2,3]*1000

# Reduce data frame to just the two pertinent columns and then add them to find the total economic cost
moneydf=moneydf[,c(1,3)]
EcoCost<-moneydf[,1]+moneydf[,2]
Ecodf<-cbind(as.data.frame(File_by_type[,2]),EcoCost)
names(Ecodf)<-c("Event_Type","ECODMG")
head(Ecodf,20)
##    Event_Type ECODMG
## 1   TSTM WIND      0
## 2   TSTM WIND      0
## 3   TSTM WIND      0
## 4   TSTM WIND      0
## 5   TSTM WIND      0
## 6   TSTM WIND      0
## 7   TSTM WIND      0
## 8     TORNADO      0
## 9   TSTM WIND      0
## 10  TSTM WIND      0
## 11  TSTM WIND      0
## 12  TSTM WIND      0
## 13    TORNADO      0
## 14       HAIL      0
## 15       HAIL      0
## 16       HAIL      0
## 17  TSTM WIND      0
## 18  TSTM WIND      0
## 19  TSTM WIND      0
## 20  TSTM WIND      0

Next, we will group the data frame by event type and summarize to find the overall economic cost of each event type

# Group data frame by event type and then summarize
Eco<-group_by(Ecodf,Event_Type)
Eco<-summarize(Eco,ECODMG=sum(ECODMG))

# Then, make the "EVTYPE" column a character vector
Eco<-as.data.frame(Eco)
Eco[,1]<-as.character(Eco[,1])

Then, we will clean up the data frame to show the event types that have provided the most economic costs to the economy.

# Pare down to only those that have economic damage on record
dmg<-as.data.frame(subset(Eco,ECODMG!=0))
dmg<-dmg[order(dmg$ECODMG,decreasing = TRUE),]

# Snapshot/preview of which event types have a lot of economic damage
sum(dmg[,2]) ## (This is the number of total economic cost)
## [1] 1.26351e+13
subset(dmg,ECODMG>=1000000000)
##                   Event_Type       ECODMG
## 290               HEAVY RAIN 5.235243e+12
## 244                     HAIL 2.582759e+12
## 760        THUNDERSTORM WIND 1.316997e+12
## 153              FLASH FLOOD 1.011911e+12
## 834                  TORNADO 4.315718e+11
## 856                TSTM WIND 4.273852e+11
## 978           WINTER WEATHER 3.120672e+11
## 786       THUNDERSTORM WINDS 2.421071e+11
## 464                LIGHTNING 2.200814e+11
## 972             WINTER STORM 1.794639e+11
## 359                HIGH WIND 1.353791e+11
## 310               HEAVY SNOW 9.184351e+10
## 955         WILD/FOREST FIRE 7.312498e+10
## 170                    FLOOD 6.755644e+10
## 676              STRONG WIND 6.734107e+10
## 886        UNSEASONABLY WARM 5.571687e+10
## 489 MARINE THUNDERSTORM WIND 4.061617e+10
## 671         STORM SURGE/TIDE 1.502340e+10
## 95                   DROUGHT 1.425448e+10
## 936               WATERSPOUT 1.238740e+10
## 216             FUNNEL CLOUD 1.152321e+10
## 376               HIGH WINDS 1.031900e+10
## 848           TROPICAL STORM 7.834502e+09
## 117               DUST STORM 6.387927e+09
## 957                 WILDFIRE 6.008750e+09
## 140             EXTREME COLD 5.136926e+09
## 19                 AVALANCHE 5.128749e+09
## 642       SNOW FREEZING RAIN 5.000702e+09
## 490         MARINE TSTM WIND 4.569452e+09
## 275                     HEAT 4.222493e+09
## 54             COASTAL FLOOD 4.030404e+09
## 30                  BLIZZARD 3.460153e+09
## 919     URBAN/SML STREAM FLD 3.214792e+09
## 130           EXCESSIVE HEAT 3.212971e+09
## 967               WIND GUSTS 3.000100e+09
## 427                ICE STORM 2.167837e+09
## 89                 DENSE FOG 2.069869e+09
## 115               DUST DEVIL 2.002002e+09
## 488       MARINE STRONG WIND 2.000182e+09
## 485              MARINE HAIL 1.711862e+09
## 177        FLOOD/FLASH FLOOD 1.551561e+09
## 221                    Glaze 1.400560e+09
## 440         LAKE-EFFECT SNOW 1.045724e+09
## 585              RIP CURRENT 1.031174e+09
# Combine like terms
# Wind/Storm events have been combined.  "Heavy rain" includes hurricanes and tropical storms.
RainRows3<-which(grepl("rain",dmg[,1],ignore.case = TRUE)|grepl("hurricane",dmg,ignore.case=TRUE)|grepl("tropical",dmg,ignore.case = TRUE))
HailRows3<-which(grepl("hail",dmg[,1],ignore.case=TRUE))
WindRows3<-which((grepl("wind",dmg[,1],ignore.case = TRUE)|grepl("storm",dmg[,1],ignore.case = TRUE)|grepl("blizzard",dmg[,1],ignore.case = TRUE)|grepl("bad weather",dmg[,1],ignore.case = TRUE))&!(grepl("chill",dmg[,1],ignore.case = TRUE)|grepl("dust",dmg[,1],ignore.case = TRUE)|grepl("tropical",dmg[,1],ignore.case = TRUE)))
ColdRows3<-which((grepl("cold",dmg[,1],ignore.case = TRUE)|grepl("Winter",dmg,ignore.case = TRUE)))
HotRows3<-which(grepl("heat",dmg[,1],ignore.case = TRUE)|grepl("hot",dmg[,1],ignore.case = TRUE))
TornadoRows3<-which(grepl("tornado",dmg[,1],ignore.case = TRUE)|grepl("funnel",dmg[,1],ignore.case = TRUE)|grepl("waterspout",dmg[,1],ignore.case = TRUE))
FloodRows3<-which(grepl("flood",dmg[,1],ignore.case = TRUE)|grepl("heavy rain",dmg[,1],ignore.case = TRUE))
Lightning3<-which(grepl("lightning",dmg[,1],ignore.case = TRUE))
Other3<-sum(dmg[,2])-(sum(dmg[WindRows3,2])+sum(dmg[TornadoRows3,2])+sum(dmg[RainRows3,2])+sum(dmg[HailRows3,2]))

Then, we will create a data frame with this information

# Display the top events by cost
emptydf3<-data.frame(nrow=0,ncol=2)
names(emptydf3)<-c("Event_Type","Total_Cost")
emptydf3<-rbind(emptydf3,c("Heavy Rain",sum(dmg[RainRows3,2])/sum(dmg[,2])),c("Hail",sum(dmg[HailRows3,2])/sum(dmg[,2])),c("Storms",sum(dmg[WindRows3,2])/sum(dmg[,2])),c("Tornados",sum(dmg[TornadoRows3,2])/sum(dmg[,2])))
emptydf3<-emptydf3[2:5,]
emptydf3[,2]<-as.numeric(emptydf3[,2])
emptydf3
##   Event_Type Total_Cost
## 2 Heavy Rain  0.6257163
## 3       Hail  0.2045542
## 4     Storms  0.1939226
## 5   Tornados  0.0360506

Results: Economic Damage

# Plot this information
barplot(height=emptydf3$Total_Cost,names=emptydf3$Event_Type,col="green",xlab="Event Types",ylab="Cost as Percentage",main="Percentage Cost by Event Type, 1980-Present")