Based on the NOAA Storm Damages Database, which natural disasters cause the most damage in the U.S., both economically and in human terms?
This analysis finds that, across the natural disasters examined, tornadoes cause by far the most injuries in total. The other categories (avalanches, extreme cold, dust, fog, rain, and even hurricanes) account for comparatively few injuries overall. Tornadoes also cause the most fatalities in total; every other category has fewer fatalities overall. These totals may partly reflect the fact that there are fewer occurrences or records of the other disasters. Because of this, it is useful to look at the average injuries and fatalities per record as well. On average, extreme heat and hurricanes cause the most fatalities per record, while hurricanes, tornadoes, and extreme heat cause the most injuries per record, followed closely by fog, winter storms, dust, and floods.
Tornadoes also cause the most property damage, both in total and per occurrence. Among the remaining categories, high winds, floods, and thunderstorms account for the largest totals, while extreme cold, fog, hurricanes, and winter storms rank highest per occurrence.
Data processing for this analysis began with downloading the data from the link below, and loading the data into R.
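##downloading the file (saved under the same name that is read below)
download.file(url = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "StormData.csv.bz2", mode = "wb")
##reading in file (read.csv decompresses .bz2 files automatically)
stormData <- read.csv(file = "StormData.csv.bz2")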
##removing EVTYPE entries that say "Summary" as it is not possible to use them for this analysis
##grepl gives a logical mask, so no rows are lost if nothing matches
stormData2 <- stormData[!grepl("Summary", stormData$EVTYPE),]
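A small illustration of the row-removal step, on made-up EVTYPE values (the `toy` data frame is hypothetical and not taken from the storm data). Negative indexing with `grep()` selects zero rows when the pattern matches nothing, whereas a logical mask from `grepl()` leaves the table intact in that case.

```r
## hypothetical toy data standing in for stormData
toy <- data.frame(EVTYPE = c("TORNADO", "Summary of May 5", "FLOOD"),
                  FATALITIES = c(1, 0, 2))

## logical mask: TRUE for rows whose EVTYPE mentions "Summary"
isSummary <- grepl("Summary", toy$EVTYPE)
toyClean <- toy[!isSummary, ]

## when nothing matches, -grep() yields an empty index and selects zero rows
noMatch <- toy[-grep("NoSuchLabel", toy$EVTYPE), ]

## the grepl() form keeps every row in the same situation
allKept <- toy[!grepl("NoSuchLabel", toy$EVTYPE), ]
```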
Here, the code creates a new table called WeatherConditions1, which records total fatalities, injuries, and occurrences for 12 disaster categories: Avalanche, Cold, Dust, Flood, Fog, Heat, Hurricane, Rain, Thunderstorm, Tornado, Wind, and Winter Storm. The original data records over 900 different event types under EVTYPE. The following code uses functions such as grep to pull as much of the original data as possible and consolidate it automatically under the 12 chosen categories. For example, anything mentioning ice or snow was categorized under Winter Storm for simplicity's sake. It would be neither effective nor meaningful to graph over 900 different EVTYPEs, many of which are the same event spelled or phrased differently. The search terms were left intentionally short and simple so that they also match misspellings of some EVTYPEs. The code is commented throughout to help the reader follow it.
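To make the matching idea concrete, here is a minimal sketch on a handful of made-up labels (the `labels` vector is illustrative and not drawn from the data). It shows how a short, case-insensitive fragment picks up spelling variants, and also that a single label can match more than one fragment, which is worth keeping in mind when reading the consolidated totals.

```r
## illustrative EVTYPE-style labels, including a misspelling
labels <- c("TORNADO", "TORNADOES", "Tornado F1", "TORNDAO",
            "THUNDERSTORM WIND", "HIGH WIND")

## "torn" catches the spelling variants that start the same way
tornHits <- grepl("torn", labels, ignore.case = TRUE)

## a label can match two fragments at once
windHits <- grepl("wind", labels, ignore.case = TRUE)
thundHits <- grepl("thund", labels, ignore.case = TRUE)
both <- labels[windHits & thundHits]
```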
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
##removing all entries that have neither fatalities nor injuries as they are not relevant to this analysis
stormData3 <- filter(stormData2, FATALITIES > 0 | INJURIES > 0)
##grouping data by EVTYPE for easy tabling
stormDataGrouped1 <- group_by(stormData3, EVTYPE)
##creating a table of EVTYPE
Evtabs2 <- data.frame(table(stormData3$EVTYPE))
##cleaning table
Evtabs2 <- filter(Evtabs2, Freq > 0)
##create table of the sums of fatalities and injuries by grouped EVTYPE
##n() counts the records in each group, so the frequency stays on the same row as its EVTYPE
humanCostData <- summarize(stormDataGrouped1, fat = sum(FATALITIES), inj = sum(INJURIES), freq = n())
##categories of events to be used
##most EV types will be consolidated to fall under following categories
ListOfWeatherConditions1 <- c("Avalanche", "Flood", "Cold", "Dust", "Heat", "Rain", "Wind", "Hurricane", "ThunderStorm", "WinterStorm", "Fog", "Tornado")
##transforming event categories into dataframe
##weatherConditions1 dataframe will be the main dataframe used for analysis
WeatherConditions1 <- data.frame(ListOfWeatherConditions1)
##creating and initializing total fatalities, injuries, and frequencies for easy editing later on
WeatherConditions1 <- mutate(WeatherConditions1, fat = 0, inj = 0, freq = 0)
##function to scrape EVTYPEs from humanCostData and consolidate fatalities and injuries under preset categories in WeatherConditions1
conditionNumsCalc <- function(x,y) {
## creating words to look for
conditions <- c("flood", "cold", "dust", "heat", "rain", "wind", "hurri", "thund", "wint", "fog","torn")
## x = humanCostData
##cycle through the various conditions
for(i in 1:length(conditions)){
##find indexes of all the EVTYPEs that match the current condition
Indx1 <- grep(conditions[i], x$EVTYPE, ignore.case = TRUE)
## y = WeatherConditions1
##for the WeatherConditions1 data, find the row that corresponds with the current condition, and store the summed fatalities, injuries, and frequencies
fatSums <- x[Indx1,2]
injSums <- x[Indx1,3]
freqSums <- x[Indx1,4]
y$fat[grep(conditions[i], y$ListOfWeatherConditions1, ignore.case = TRUE)] <- colSums(fatSums)
y$inj[grep(conditions[i], y$ListOfWeatherConditions1, ignore.case = TRUE)] <- colSums(injSums)
y$freq[grep(conditions[i], y$ListOfWeatherConditions1, ignore.case = TRUE)] <- colSums(freqSums)
}
y
}
WeatherConditions1 <- conditionNumsCalc(x = humanCostData, y = WeatherConditions1)
##adding in Avalanche separately
##creating an index of rows from humanCostData that will consolidate under "Avalanche" in WeatherConditions1
AvalancheIndx1 <- c(grep("AVALANCHE", humanCostData$EVTYPE,ignore.case = TRUE),grep("AVALANCE", humanCostData$EVTYPE),grep("LANDSLIDE", humanCostData$EVTYPE, ignore.case = TRUE),grep("Mudslide", humanCostData$EVTYPE, ignore.case = TRUE))
WeatherConditions1$fat[WeatherConditions1$ListOfWeatherConditions1 == "Avalanche"] <- sum(humanCostData[AvalancheIndx1,2])
WeatherConditions1$inj[WeatherConditions1$ListOfWeatherConditions1 == "Avalanche"] <- sum(humanCostData[AvalancheIndx1,3])
WeatherConditions1$freq[WeatherConditions1$ListOfWeatherConditions1 == "Avalanche"] <- sum(humanCostData[AvalancheIndx1,4])
##adding in lightning data separately
##lightning occurs with thunderstorms, so the two will be aggregated as one for the sake of simplicity
LightIndx1 <- grep("Light", humanCostData$EVTYPE, ignore.case = TRUE)
##fatalities
WeatherConditions1$fat[WeatherConditions1$ListOfWeatherConditions1 == "ThunderStorm"] <- WeatherConditions1$fat[WeatherConditions1$ListOfWeatherConditions1 == "ThunderStorm"] + sum(humanCostData[LightIndx1,2])
##injuries
WeatherConditions1$inj[WeatherConditions1$ListOfWeatherConditions1 == "ThunderStorm"] <- WeatherConditions1$inj[WeatherConditions1$ListOfWeatherConditions1 == "ThunderStorm"] + sum(humanCostData[LightIndx1,3])
WeatherConditions1$freq[WeatherConditions1$ListOfWeatherConditions1 == "ThunderStorm"] <- WeatherConditions1$freq[WeatherConditions1$ListOfWeatherConditions1 == "ThunderStorm"] + sum(humanCostData[LightIndx1,4])
##adding in snow and ice separately
##snow and ice are being aggregated into WinterStorm for simplicity
##a single pattern returns each matching row once, so EVTYPEs mentioning both snow and ice are not counted twice
SnowIndx1 <- grep("snow|ice", humanCostData$EVTYPE, ignore.case = TRUE)
##fatalities
WeatherConditions1$fat[WeatherConditions1$ListOfWeatherConditions1 == "WinterStorm"] <- WeatherConditions1$fat[WeatherConditions1$ListOfWeatherConditions1 == "WinterStorm"] + sum(humanCostData[SnowIndx1,2])
##injuries
WeatherConditions1$inj[WeatherConditions1$ListOfWeatherConditions1 == "WinterStorm"] <- WeatherConditions1$inj[WeatherConditions1$ListOfWeatherConditions1 == "WinterStorm"] + sum(humanCostData[SnowIndx1,3])
WeatherConditions1$freq[WeatherConditions1$ListOfWeatherConditions1 == "WinterStorm"] <- WeatherConditions1$freq[WeatherConditions1$ListOfWeatherConditions1 == "WinterStorm"] + sum(humanCostData[SnowIndx1,4])
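One way to keep each EVTYPE from being counted under two categories is to assign every label to the first fragment it matches and then aggregate once. The sketch below is an alternative under that assumption, not the method used above; the `assignCategory` helper, the fragment order, and the toy data are all hypothetical.

```r
## hypothetical helper: returns the first category whose fragment matches
assignCategory <- function(evtype, fragments) {
  out <- rep(NA_character_, length(evtype))
  for (nm in names(fragments)) {
    ## only fill in labels that have not been assigned yet
    hit <- is.na(out) & grepl(fragments[[nm]], evtype, ignore.case = TRUE)
    out[hit] <- nm
  }
  out
}

## fragment order decides ties, so the more specific categories come first
fragments <- c(Tornado = "torn", ThunderStorm = "thund|light",
               WinterStorm = "wint|snow|ice", Wind = "wind")

toy <- data.frame(EVTYPE = c("TORNADO", "THUNDERSTORM WIND", "SNOW AND ICE",
                             "HIGH WIND", "LIGHTNING"),
                  FATALITIES = c(5, 1, 2, 3, 4))
toy$category <- assignCategory(toy$EVTYPE, fragments)

## one pass of aggregation; each row contributes to exactly one category
totals <- tapply(toy$FATALITIES, toy$category, sum)
```

Because every row lands in exactly one category, the category totals add up to the overall total, which gives a quick sanity check on the consolidation.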
This code was written specifically to create the WeatherConditions2 table, which records the total property damage per disaster category, as well as the frequency. The categories are the same as those used above for WeatherConditions1. Much of the code is similar to the code above, though some changes were made to handle and transform the monetary property damage values.
library(dplyr)
##removing all entries with no property damage as they are irrelevant to analysis
stormData4 <- filter(stormData2, PROPDMG > 0)
##factor in the PROPDMGEXP
dmgMultiplier <- function(x){
##start with a multiplier of 1 for every row
mult <- rep(1, nrow(x))
##if K, then mult by 1000
mult[x$PROPDMGEXP == "K"] <- 1000
##if M, then mult by 1000000
mult[x$PROPDMGEXP == "M"] <- 1000000
##scale every row at once (a loop over 1:length(x) would only cover as many rows as the data frame has columns)
x$PROPDMG <- x$PROPDMG * mult
x
}
##apply dmgMultiplier
stormData4 <- dmgMultiplier(stormData4)
##grouping by EVTYPE
stormData4 <- group_by(stormData4, EVTYPE)
##aggregating EVTYPEs
Evtabs5 <- data.frame(table(stormData4$EVTYPE))
##cleaning table
Evtabs5 <- filter(Evtabs5, Freq > 0)
##creating dataframe of aggregated prop damage by EVTYPE
##n() counts the records in each group, so the frequency stays on the same row as its EVTYPE
propDmgsData <- summarize(stormData4, propDMG = sum(PROPDMG), freq = n())
##categories of events to be used
##most EV types will be consolidated to fall under following categories
ListOfWeatherConditions2 <- c("Avalanche", "Flood", "Cold", "Dust", "Heat", "Rain", "Wind", "Hurricane", "ThunderStorm", "WinterStorm", "Fog", "Tornado")
##transforming event categories into dataframe
##weatherConditions2 dataframe will be the main dataframe used for analysis
WeatherConditions2 <- data.frame(ListOfWeatherConditions2)
##creating and initializing total damages and frequencies for easy editing later on
WeatherConditions2 <- mutate(WeatherConditions2, DMG = 0, freq = 0)
##function to scrape EVTYPEs from propDmgsData and consolidate property damage under preset categories in WeatherConditions2
conditionNumsCalc2 <- function(x,y) {
## creating words to look for
conditions2 <- c("flood", "cold", "dust", "heat", "rain", "wind", "hurri", "thund", "wint", "fog", "torn")
## x = propDmgsData
##cycle through the various conditions
for(i in 1:length(conditions2)){
##find indexes of all the EVTYPEs that match the current condition
Indx1 <- grep(conditions2[i], x$EVTYPE, ignore.case = TRUE)
## y = WeatherConditions2
##for the WeatherConditions2 data, find the row that corresponds with the current condition, and store the summed damages and frequencies
dmgSums <- x[Indx1,2]
freqSums <- x[Indx1,3]
y$DMG[grep(conditions2[i], y$ListOfWeatherConditions2, ignore.case = TRUE)] <- colSums(dmgSums)
y$freq[grep(conditions2[i], y$ListOfWeatherConditions2, ignore.case = TRUE)] <- colSums(freqSums)
}
y
}
WeatherConditions2 <- conditionNumsCalc2(x = propDmgsData, y = WeatherConditions2)
##adding in Avalanche separately
##creating an index of rows from propDmgsData that will consolidate under "Avalanche" in WeatherConditions2
AvalancheIndx1 <- c(grep("AVALANCHE", propDmgsData$EVTYPE,ignore.case = TRUE),grep("AVALANCE", propDmgsData$EVTYPE),grep("LANDSLIDE", propDmgsData$EVTYPE, ignore.case = TRUE),grep("Mudslide", propDmgsData$EVTYPE, ignore.case = TRUE))
WeatherConditions2$DMG[WeatherConditions2$ListOfWeatherConditions2 == "Avalanche"] <- sum(propDmgsData[AvalancheIndx1,2])
WeatherConditions2$freq[WeatherConditions2$ListOfWeatherConditions2 == "Avalanche"] <- sum(propDmgsData[AvalancheIndx1,3])
##adding in lightning data separately
##lightning occurs with thunderstorms, so the two will be aggregated as one for the sake of simplicity
LightIndx1 <- grep("Light", propDmgsData$EVTYPE, ignore.case = TRUE)
WeatherConditions2$DMG[WeatherConditions2$ListOfWeatherConditions2 == "ThunderStorm"] <- WeatherConditions2$DMG[WeatherConditions2$ListOfWeatherConditions2 == "ThunderStorm"] + sum(propDmgsData[LightIndx1,2])
WeatherConditions2$freq[WeatherConditions2$ListOfWeatherConditions2 == "ThunderStorm"] <- WeatherConditions2$freq[WeatherConditions2$ListOfWeatherConditions2 == "ThunderStorm"] + sum(propDmgsData[LightIndx1,3])
##adding in snow and ice separately
##snow and ice are being aggregated into WinterStorm for simplicity
##a single pattern returns each matching row once, so EVTYPEs mentioning both snow and ice are not counted twice
SnowIndx1 <- grep("snow|ice", propDmgsData$EVTYPE, ignore.case = TRUE)
WeatherConditions2$DMG[WeatherConditions2$ListOfWeatherConditions2 == "WinterStorm"] <- WeatherConditions2$DMG[WeatherConditions2$ListOfWeatherConditions2 == "WinterStorm"] + sum(propDmgsData[SnowIndx1,2])
WeatherConditions2$freq[WeatherConditions2$ListOfWeatherConditions2 == "WinterStorm"] <- WeatherConditions2$freq[WeatherConditions2$ListOfWeatherConditions2 == "WinterStorm"] + sum(propDmgsData[SnowIndx1,3])
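The exponent handling can also be expressed as a vectorised lookup, which avoids visiting rows one at a time. This is a sketch on made-up values; the `multLookup` and `PROPDMGUSD` names are hypothetical. As in the code above, only `K` and `M` are treated as multipliers and every other code is left unscaled, which is an assumption carried over rather than a statement about what the remaining codes mean.

```r
## made-up rows standing in for stormData4
toy <- data.frame(PROPDMG = c(25, 2.5, 10, 3),
                  PROPDMGEXP = c("K", "M", "", "K"),
                  stringsAsFactors = FALSE)

## named lookup of multipliers
multLookup <- c(K = 1000, M = 1000000)

## match each code against the lookup; unmatched codes give NA, replaced by 1
mult <- multLookup[match(toy$PROPDMGEXP, names(multLookup))]
mult[is.na(mult)] <- 1

## scaled damage in a new column so the raw value stays available
toy$PROPDMGUSD <- toy$PROPDMG * unname(mult)
```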
The following code was used to create a panel plot of total fatalities and injuries, and average fatalities and injuries per occurrence.
library(reshape)
##
## Attaching package: 'reshape'
## The following object is masked from 'package:dplyr':
##
## rename
library(ggplot2)
require(gridExtra)
## Loading required package: gridExtra
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
##melt data for better graphing
moltenFatsInjs <- melt(WeatherConditions1, id = c("ListOfWeatherConditions1", "freq"))
##total fatalities and injuries
a <- ggplot(moltenFatsInjs, aes(factor(ListOfWeatherConditions1), value, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge") +
  ggtitle("Fatalities and Injuries Across Different Weather Conditions") +
  xlab("Weather Conditions") + ylab("Fatalities or Injuries") +
  scale_fill_discrete(name = "", breaks = c("fat", "inj"), labels = c("Fatalities", "Injuries"))
##average fatalities and injuries per occurrence
b <- ggplot(moltenFatsInjs, aes(factor(ListOfWeatherConditions1), value/freq, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge") +
  ggtitle("Average Fatalities and Injuries Across Different Weather Conditions") +
  xlab("Weather Conditions") + ylab("Average Fatalities or Injuries") +
  scale_fill_discrete(name = "", breaks = c("fat", "inj"), labels = c("Fatalities", "Injuries"))
grid.arrange(a,b, ncol = 1)
The graphs show clearly that, in total, tornadoes cause the most injuries; no other condition comes close. Tornado fatalities are much lower than tornado injuries, but tornadoes still cause more fatalities than any other condition. On average per occurrence, however, hurricanes cause the most injuries and extreme heat is the deadliest.
WeatherConditions1
##    ListOfWeatherConditions1  fat   inj freq
##  1                Avalanche  269   226  265
##  2                    Flood 1525  8604 1411
##  3                     Cold  451   320  332
##  4                     Dust   24   483   60
##  5                     Heat 3138  9224  938
##  6                     Rain  114   305  136
##  7                     Wind 1451 11498 5155
##  8                Hurricane  135  1328   69
##  9             ThunderStorm 1030  7714 4376
## 10              WinterStorm  550  5300  653
## 11                      Fog   81  1077  127
## 12                  Tornado 5661 91407 7935
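The per-occurrence averages follow directly from this table by dividing each total by `freq`. A short sketch, re-entering the printed values by hand (the `wc1` and `perRecord` names are only for illustration). Note that `freq` counts records with at least one fatality or injury, so these are averages per harmful record rather than per storm.

```r
## values copied from the WeatherConditions1 printout
wc1 <- data.frame(
  condition = c("Avalanche", "Flood", "Cold", "Dust", "Heat", "Rain", "Wind",
                "Hurricane", "ThunderStorm", "WinterStorm", "Fog", "Tornado"),
  fat  = c(269, 1525, 451, 24, 3138, 114, 1451, 135, 1030, 550, 81, 5661),
  inj  = c(226, 8604, 320, 483, 9224, 305, 11498, 1328, 7714, 5300, 1077, 91407),
  freq = c(265, 1411, 332, 60, 938, 136, 5155, 69, 4376, 653, 127, 7935)
)

## average fatalities and injuries per record
perRecord <- transform(wc1, fatAvg = fat / freq, injAvg = inj / freq)

## conditions with the highest averages
deadliest <- perRecord$condition[which.max(perRecord$fatAvg)]
mostInjurious <- perRecord$condition[which.max(perRecord$injAvg)]
```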
The following code was used to create a panel plot of total damages to property, and average property damage per occurrence.
library(reshape)
library(ggplot2)
require(gridExtra)
##total and average property damage plots (no melting needed, as there is a single value column)
propTotalPlot <- ggplot(WeatherConditions2, aes(factor(ListOfWeatherConditions2), DMG)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  ggtitle("Property Damages Across Different Weather Conditions") +
  xlab("Weather Conditions") + ylab("Total Damages in Dollars")
propAvgPlot <- ggplot(WeatherConditions2, aes(factor(ListOfWeatherConditions2), DMG/freq)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  ggtitle("Average Property Damages Across Different Weather Conditions") +
  xlab("Weather Conditions") + ylab("Average Damages in Dollars")
grid.arrange(propTotalPlot, propAvgPlot, ncol = 1)
Here, we see very clearly that tornadoes cause the most property damage, both overall and on average per occurrence. No other disaster comes close on either graph.
WeatherConditions2
##    ListOfWeatherConditions2         DMG   freq
##  1                Avalanche    21072.84    254
##  2                    Flood  2436131.51  31585
##  3                     Cold    15171.09    105
##  4                     Dust     5838.63    165
##  5                     Heat     3232.86     44
##  6                     Rain    59426.21   1083
##  7                     Wind  3134605.26 125825
##  8                Hurricane    23757.25    209
##  9             ThunderStorm  1937663.93  65837
## 10              WinterStorm   380107.14   4385
## 11                      Fog    17259.26    138
## 12                  Tornado 11744712.51  39061
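The average damage per occurrence can be checked from this table in the same way, by dividing `DMG` by `freq` and sorting. The sketch below re-enters the printed values by hand and takes them as printed; the `wc2` and `ranked` names are hypothetical.

```r
## values copied from the WeatherConditions2 printout
wc2 <- data.frame(
  condition = c("Avalanche", "Flood", "Cold", "Dust", "Heat", "Rain", "Wind",
                "Hurricane", "ThunderStorm", "WinterStorm", "Fog", "Tornado"),
  DMG  = c(21072.84, 2436131.51, 15171.09, 5838.63, 3232.86, 59426.21,
           3134605.26, 23757.25, 1937663.93, 380107.14, 17259.26, 11744712.51),
  freq = c(254, 31585, 105, 165, 44, 1083, 125825, 209, 65837, 4385, 138, 39061)
)

## average damage per record
wc2$dmgAvg <- wc2$DMG / wc2$freq

## categories ordered from highest to lowest average
ranked <- wc2[order(-wc2$dmgAvg), c("condition", "dmgAvg")]
```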