Question

Based on the NOAA Storm Damages Database, which natural disasters cause the most damage in the U.S., both economically and in terms of human cost?

Synopsis

This analysis finds that, of all the natural disasters considered, tornados cause by far the most injuries in total. Other disasters, such as avalanches, extreme cold, dust, fog, rain, and even hurricanes, cause comparatively few injuries overall. Tornados also caused the most fatalities in total; every other disaster has fewer. These totals may partly reflect the fact that other disasters occur, or are recorded, less often. For that reason it is also useful to look at the average injuries and fatalities per record for each major disaster. On average, extreme heat, tornados, and hurricanes cause the most fatalities and injuries per record, followed closely by fog, flood, dust, and winter storms.

In total, high winds, floods, and thunderstorms cause the most property damage by far, while extreme cold, fog, hurricanes, and winter storms cause the most property damage per occurrence.

Data Processing

Data processing for this analysis began with downloading the data from the link below, and loading the data into R.

##downloading the file (mode = "wb" keeps the compressed file intact on Windows)
download.file(url = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "StormData.csv.bz2", mode = "wb")
##reading in file (read.csv reads the .bz2 file directly)
stormData <- read.csv(file = "StormData.csv.bz2")
##removing EVTYPE entries that say "Summary" as it is not possible to use them for this analysis
stormData2 <- stormData[!grepl("Summary", stormData$EVTYPE),]

Here, the code creates a new table called WeatherConditions1, which records the total fatalities, injuries, and occurrences for 12 disaster categories: Avalanche, Cold, Dust, Flood, Fog, Heat, Hurricane, Rain, Thunderstorm, Tornado, Wind, and Winter Storm. The original data records over 900 different disasters under EVTYPE. The following code uses functions such as grep to pull as much as possible from the original data and automatically consolidate it under the 12 chosen categories. For example, anything with ice or snow was categorized under Winter Storm for simplicity's sake. It would be neither effective nor meaningful to graph over 900 different EVTYPEs, many of which are repeats that are only spelled or phrased differently. The search terms were intentionally left vague and simple in order to accommodate even misspellings of some of the EVTYPEs. The code is commented throughout to aid the reader in understanding it.
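As a rough illustration of the matching idea described above, the sketch below applies grepl with ignore.case = TRUE to a small made-up vector of event labels. The labels and the toyEvents and isFlood names are assumptions for illustration only; they are not taken from the database.

```r
##toy event labels (hypothetical, for illustration only)
toyEvents <- c("FLASH FLOOD", "Flooding", "URBAN FLOODS", "HEAT WAVE", "Dense Fog")

##a short, case-insensitive search term matches every spelling variant that contains it
isFlood <- grepl("flood", toyEvents, ignore.case = TRUE)

##labels that would be consolidated under "Flood"
toyEvents[isFlood]

##number of matching labels
sum(isFlood)
```

One consequence of such short terms is that a label containing two search terms matches both, so it would be counted under two categories.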

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
##keeping only entries with at least one fatality or injury, as the rest are not relevant to this analysis
stormData3 <- filter(stormData2, FATALITIES > 0 | INJURIES > 0)
##grouping data by EVTYPE for easy tabling
stormDataGrouped1 <- group_by(stormData3, EVTYPE)
##create table of the sums of fatalities and injuries by grouped EVTYPE
##freq = n() counts the records per EVTYPE in the same step, so the counts always line up with the sums
humanCostData <- summarize(stormDataGrouped1, fat = sum(FATALITIES), inj = sum(INJURIES), freq = n())
##categories of events to be used
##most EVTYPEs will be consolidated to fall under the following categories
ListOfWeatherConditions1 <- c("Avalanche", "Flood", "Cold", "Dust", "Heat", "Rain", "Wind", "Hurricane", "ThunderStorm", "WinterStorm", "Fog", "Tornado")
##transforming event categories into dataframe
##WeatherConditions1 dataframe will be the main dataframe used for analysis
WeatherConditions1 <- data.frame(ListOfWeatherConditions1)

##creating and initializing total fatalities, injuries, and frequencies for easy editing later on
WeatherConditions1 <- mutate(WeatherConditions1, fat = 0, inj = 0, freq = 0)


##function to collect EVTYPEs from humanCostData and consolidate fatalities, injuries, and frequencies under the preset categories in WeatherConditions1
## x = humanCostData, y = WeatherConditions1
conditionNumsCalc <- function(x,y) {

  ##search terms to look for (Avalanche is handled separately below)
  conditions <- c("flood", "cold", "dust", "heat", "rain", "wind", "hurri", "thund", "wint", "fog","torn")

  ##cycle through the various conditions
  for(i in seq_along(conditions)){

    ##find indexes of all the EVTYPEs that match the current condition
    ##an EVTYPE that contains several search terms is counted under each matching category
    Indx1 <- grep(conditions[i], x$EVTYPE, ignore.case = TRUE)

    ##find the row of y that corresponds to the current condition
    rowIndx <- grep(conditions[i], y$ListOfWeatherConditions1, ignore.case = TRUE)

    ##sum the fatalities, injuries, and frequencies of the matched EVTYPEs
    y$fat[rowIndx] <- sum(x$fat[Indx1])
    y$inj[rowIndx] <- sum(x$inj[Indx1])
    y$freq[rowIndx] <- sum(x$freq[Indx1])

  }
  y

}

WeatherConditions1 <- conditionNumsCalc(x = humanCostData, y = WeatherConditions1)

##adding in Avalanche separately
##creating an index of rows from humanCostData that will consolidate under "Avalanche" in WeatherConditions1
AvalancheIndx1 <- c(grep("AVALANCHE", humanCostData$EVTYPE, ignore.case = TRUE),
                    grep("AVALANCE", humanCostData$EVTYPE),
                    grep("LANDSLIDE", humanCostData$EVTYPE, ignore.case = TRUE),
                    grep("Mudslide", humanCostData$EVTYPE, ignore.case = TRUE))

WeatherConditions1$fat[WeatherConditions1$ListOfWeatherConditions1 == "Avalanche"] <- sum(humanCostData[AvalancheIndx1,2])

WeatherConditions1$inj[WeatherConditions1$ListOfWeatherConditions1 == "Avalanche"] <- sum(humanCostData[AvalancheIndx1,3])

WeatherConditions1$freq[WeatherConditions1$ListOfWeatherConditions1 == "Avalanche"] <- sum(humanCostData[AvalancheIndx1,4])

##adding in lightning data separately
##lightning occurs with thunderstorms, so the two will be aggregated as one for the sake of simplicity
LightIndx1 <- grep("Light", humanCostData$EVTYPE, ignore.case = TRUE)
##fatalities
WeatherConditions1$fat[WeatherConditions1$ListOfWeatherConditions1 == "ThunderStorm"] <- WeatherConditions1$fat[WeatherConditions1$ListOfWeatherConditions1 == "ThunderStorm"] + sum(humanCostData[LightIndx1,2])
##injuries
WeatherConditions1$inj[WeatherConditions1$ListOfWeatherConditions1 == "ThunderStorm"] <- WeatherConditions1$inj[WeatherConditions1$ListOfWeatherConditions1 == "ThunderStorm"] + sum(humanCostData[LightIndx1,3])

WeatherConditions1$freq[WeatherConditions1$ListOfWeatherConditions1 == "ThunderStorm"] <- WeatherConditions1$freq[WeatherConditions1$ListOfWeatherConditions1 == "ThunderStorm"] + sum(humanCostData[LightIndx1,4])


##adding in snow and ice separately
##snow and ice are being aggregated into WinterStorm for simplicity
SnowIndx1 <- c(grep("snow", humanCostData$EVTYPE,ignore.case = TRUE), grep("ice", humanCostData$EVTYPE, ignore.case = TRUE))
##fatalities
WeatherConditions1$fat[WeatherConditions1$ListOfWeatherConditions1 == "WinterStorm"] <- WeatherConditions1$fat[WeatherConditions1$ListOfWeatherConditions1 == "WinterStorm"] + sum(humanCostData[SnowIndx1,2])
##injuries
WeatherConditions1$inj[WeatherConditions1$ListOfWeatherConditions1 == "WinterStorm"] <- WeatherConditions1$inj[WeatherConditions1$ListOfWeatherConditions1 == "WinterStorm"] + sum(humanCostData[SnowIndx1,3])

WeatherConditions1$freq[WeatherConditions1$ListOfWeatherConditions1 == "WinterStorm"] <- WeatherConditions1$freq[WeatherConditions1$ListOfWeatherConditions1 == "WinterStorm"] + sum(humanCostData[SnowIndx1,4])

This code was written specifically to create the WeatherConditions2 table, which records the total property damage per disaster category, as well as the frequency. The categories are the same as those used above for WeatherConditions1. Much of the code is similar to the code above, though some changes were made to better handle and transform the monetary property damage values.
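To illustrate how the exponent column scales the damage figures, here is a minimal sketch on a made-up data frame. It assumes, as the code in this section does, that "K" in PROPDMGEXP means thousands and "M" means millions, and that any other value leaves PROPDMG unchanged. The toyDmg data and the multLookup, mult, and dollars names are hypothetical.

```r
##toy damage records (hypothetical values, for illustration only)
toyDmg <- data.frame(PROPDMG = c(25, 2.5, 10, 3),
                     PROPDMGEXP = c("K", "M", "", "K"),
                     stringsAsFactors = FALSE)

##lookup of multipliers by exponent code
multLookup <- c(K = 1000, M = 1000000)

##look up one multiplier per row; codes not in the lookup come back as NA
mult <- unname(multLookup[toyDmg$PROPDMGEXP])

##treat unknown or empty codes as "no scaling"
mult[is.na(mult)] <- 1

##damage expressed in dollars
toyDmg$dollars <- toyDmg$PROPDMG * mult
toyDmg
```

Working on whole columns at once avoids looping over rows, which matters for a table with hundreds of thousands of records.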

library(dplyr)
##removing all entries with no property damage as they are irrelevant to analysis
stormData4 <- filter(stormData2, PROPDMG > 0)
##factor in the PROPDMGEXP
dmgMultiplier <- function(x){
  ##one multiplier per row: 1000 for "K", 1000000 for "M", otherwise 1
  ##vectorized over all nrow(x) rows
  mult <- rep(1, nrow(x))
  mult[x$PROPDMGEXP %in% "K"] <- 1000
  mult[x$PROPDMGEXP %in% "M"] <- 1000000
  x$PROPDMG <- x$PROPDMG * mult
  x
}
##apply dmgMultiplier
stormData4 <- dmgMultiplier(stormData4)
##grouping by EVTYPE
stormData4 <- group_by(stormData4, EVTYPE)
##creating dataframe of aggregated prop damage by EVTYPE
##freq = n() counts the records per EVTYPE in the same step, so the counts always line up with the sums
propDmgsData <- summarize(stormData4, propDMG = sum(PROPDMG), freq = n())

##categories of events to be used
##most EVTYPEs will be consolidated to fall under the following categories
ListOfWeatherConditions2 <- c("Avalanche", "Flood", "Cold", "Dust", "Heat", "Rain", "Wind", "Hurricane", "ThunderStorm", "WinterStorm", "Fog", "Tornado")
##transforming event categories into dataframe
##WeatherConditions2 dataframe will be the main dataframe used for analysis
WeatherConditions2 <- data.frame(ListOfWeatherConditions2)

##creating and initializing total damages and frequencies for easy editing later on
WeatherConditions2 <- mutate(WeatherConditions2, DMG = 0, freq = 0)



##function to collect EVTYPEs from propDmgsData and consolidate property damage and frequencies under the preset categories in WeatherConditions2
## x = propDmgsData, y = WeatherConditions2
conditionNumsCalc2 <- function(x,y) {

  ##search terms to look for (Avalanche is handled separately below)
  conditions2 <- c("flood", "cold", "dust", "heat", "rain", "wind", "hurri", "thund", "wint", "fog", "torn")

  ##cycle through the various conditions
  for(i in seq_along(conditions2)){

    ##find indexes of all the EVTYPEs that match the current condition
    Indx1 <- grep(conditions2[i], x$EVTYPE, ignore.case = TRUE)

    ##find the row of y that corresponds to the current condition
    rowIndx <- grep(conditions2[i], y$ListOfWeatherConditions2, ignore.case = TRUE)

    ##sum the property damage and frequencies of the matched EVTYPEs
    y$DMG[rowIndx] <- sum(x$propDMG[Indx1])
    y$freq[rowIndx] <- sum(x$freq[Indx1])

  }
  y
}

WeatherConditions2 <- conditionNumsCalc2(x = propDmgsData, y = WeatherConditions2)

##adding in Avalanche separately
##creating an index of rows from propDmgsData that will consolidate under "Avalanche" in WeatherConditions2
AvalancheIndx1 <- c(grep("AVALANCHE", propDmgsData$EVTYPE, ignore.case = TRUE),
                    grep("AVALANCE", propDmgsData$EVTYPE),
                    grep("LANDSLIDE", propDmgsData$EVTYPE, ignore.case = TRUE),
                    grep("Mudslide", propDmgsData$EVTYPE, ignore.case = TRUE))

WeatherConditions2$DMG[WeatherConditions2$ListOfWeatherConditions2 == "Avalanche"] <- sum(propDmgsData[AvalancheIndx1,2])

WeatherConditions2$freq[WeatherConditions2$ListOfWeatherConditions2 == "Avalanche"] <- sum(propDmgsData[AvalancheIndx1,3])


##adding in lightning data separately
##lightning occurs with thunderstorms, so the two will be aggregated as one for the sake of simplicity
LightIndx1 <- grep("Light", propDmgsData$EVTYPE, ignore.case = TRUE)

WeatherConditions2$DMG[WeatherConditions2$ListOfWeatherConditions2 == "ThunderStorm"] <- WeatherConditions2$DMG[WeatherConditions2$ListOfWeatherConditions2 == "ThunderStorm"] + sum(propDmgsData[LightIndx1,2])

WeatherConditions2$freq[WeatherConditions2$ListOfWeatherConditions2 == "ThunderStorm"] <- WeatherConditions2$freq[WeatherConditions2$ListOfWeatherConditions2 == "ThunderStorm"] + sum(propDmgsData[LightIndx1,3])

##adding in snow and ice separately
##snow and ice are being aggregated into WinterStorm for simplicity
SnowIndx1 <- c(grep("snow", propDmgsData$EVTYPE,ignore.case = TRUE), grep("ice", propDmgsData$EVTYPE, ignore.case = TRUE))

WeatherConditions2$DMG[WeatherConditions2$ListOfWeatherConditions2 == "WinterStorm"] <- WeatherConditions2$DMG[WeatherConditions2$ListOfWeatherConditions2 == "WinterStorm"] + sum(propDmgsData[SnowIndx1,2])

WeatherConditions2$freq[WeatherConditions2$ListOfWeatherConditions2 == "WinterStorm"] <- WeatherConditions2$freq[WeatherConditions2$ListOfWeatherConditions2 == "WinterStorm"] + sum(propDmgsData[SnowIndx1,3])

Results

The following code was used to create a panel plot of total fatalities and injuries, and of average fatalities and injuries per occurrence.

library(reshape)
## 
## Attaching package: 'reshape'
## The following object is masked from 'package:dplyr':
## 
##     rename
library(ggplot2)
require(gridExtra)
## Loading required package: gridExtra
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
##melt data for better graphing 
moltenFatsInjs <- melt(WeatherConditions1, id = c("ListOfWeatherConditions1", "freq"))

##total fatalities and injuries
a <- ggplot(moltenFatsInjs, aes(factor(ListOfWeatherConditions1), value, fill = variable)) +
  geom_bar(stat="identity", position = "dodge") +
  ggtitle("Fatalities and Injuries Across Different Weather Conditions") +
  xlab("Weather Conditions") + ylab("Fatalities or Injuries") +
  scale_fill_discrete(name = "", breaks = c("fat", "inj"), labels = c("Fatalities", "Injuries"))
##average fatalities and injuries per occurrence
b <- ggplot(moltenFatsInjs, aes(factor(ListOfWeatherConditions1), value/freq, fill = variable)) +
  geom_bar(stat="identity", position = "dodge") +
  ggtitle("Average Fatalities and Injuries Across Different Weather Conditions") +
  xlab("Weather Conditions") + ylab("Average Fatalities or Injuries") +
  scale_fill_discrete(name = "", breaks = c("fat", "inj"), labels = c("Fatalities", "Injuries"))
grid.arrange(a,b, ncol = 1)

The graphs show that, in total, tornados cause by far the most injuries (91,407); no other condition comes close. Although the fatality figure for tornados is much smaller (5,661), it is still the highest of any condition. Per occurrence, however, hurricanes cause the most injuries and extreme heat is the deadliest.

WeatherConditions1
##    ListOfWeatherConditions1  fat   inj freq
## 1                 Avalanche  269   226  265
## 2                     Flood 1525  8604 1411
## 3                      Cold  451   320  332
## 4                      Dust   24   483   60
## 5                      Heat 3138  9224  938
## 6                      Rain  114   305  136
## 7                      Wind 1451 11498 5155
## 8                 Hurricane  135  1328   69
## 9              ThunderStorm 1030  7714 4376
## 10              WinterStorm  550  5300  653
## 11                      Fog   81  1077  127
## 12                  Tornado 5661 91407 7935
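The per-occurrence figures can be checked by hand from the table above. The sketch below copies three rows of that table into a small data frame and divides the totals by the record counts. The check data frame and the avgFat and avgInj column names are assumptions introduced for this illustration.

```r
##three rows copied from the WeatherConditions1 output above
check <- data.frame(condition = c("Heat", "Hurricane", "Tornado"),
                    fat = c(3138, 135, 5661),
                    inj = c(9224, 1328, 91407),
                    freq = c(938, 69, 7935),
                    stringsAsFactors = FALSE)

##average fatalities and injuries per record
check$avgFat <- check$fat / check$freq
check$avgInj <- check$inj / check$freq

##rounded: avgFat is 3.35, 1.96, 0.71 and avgInj is 9.83, 19.25, 11.52
round(check[, c("avgFat", "avgInj")], 2)

##condition with the highest average fatalities, then the highest average injuries
check$condition[which.max(check$avgFat)]
check$condition[which.max(check$avgInj)]
```

Among these three rows, heat has the highest average fatalities and hurricanes the highest average injuries, even though tornados lead both totals.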

The following code was used to create a panel plot of total property damage and of average property damage per occurrence.

library(reshape)
library(ggplot2)
require(gridExtra)

##total property damage
c <- ggplot(WeatherConditions2, aes(factor(ListOfWeatherConditions2), DMG)) +
  geom_bar(stat="identity", fill = "steelblue") +
  ggtitle("Property Damages Across Different Weather Conditions") +
  xlab("Weather Conditions") + ylab("Total Damages in Dollars")

##average property damage per occurrence
d <- ggplot(WeatherConditions2, aes(factor(ListOfWeatherConditions2), DMG/freq)) +
  geom_bar(stat="identity", fill = "steelblue") +
  ggtitle("Average Property Damages Across Different Weather Conditions") +
  xlab("Weather Conditions") + ylab("Average Damages in Dollars")

grid.arrange(c,d, ncol = 1)

Here, tornados clearly cause the most property damage, both in total and on average per occurrence. No other disaster comes close on either graph.

WeatherConditions2
##    ListOfWeatherConditions2         DMG   freq
## 1                 Avalanche    21072.84    254
## 2                     Flood  2436131.51  31585
## 3                      Cold    15171.09    105
## 4                      Dust     5838.63    165
## 5                      Heat     3232.86     44
## 6                      Rain    59426.21   1083
## 7                      Wind  3134605.26 125825
## 8                 Hurricane    23757.25    209
## 9              ThunderStorm  1937663.93  65837
## 10              WinterStorm   380107.14   4385
## 11                      Fog    17259.26    138
## 12                  Tornado 11744712.51  39061