Population health and damage costs of severe weather events

Synopsis

The NOAA Storm Database, covering 1950 through November 2011, was analyzed to answer two basic questions:

  1. Which types of events are most harmful with respect to population health?
  2. Which types of events have the greatest economic consequences?

The top five events with population health consequences were determined and ranked by either the total number of injuries or the total number of fatalities. Similarly, the top five events with economic consequences were determined and ranked by the total damage to crops and property. While tornados cause the greatest number of injuries and fatalities, floods cause the greatest economic damage as measured by the sum of crop and property damage.

Data Processing

Data from the NOAA Storm Database, covering 1950 through November 2011, were read into R directly from the bz2-compressed CSV file. The database tracks the characteristics of major storms and severe weather events, including property and crop damage estimates as well as any associated injuries and fatalities. The EVTYPE (event type) variable was used to group the dataset to answer specific questions regarding population health, as measured by injuries and fatalities, and regarding economic consequences, as measured by the sum of crop and property damage estimates for each incident.
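As a rough illustration of how a damage estimate is assembled, the sketch below combines a magnitude with its order-of-magnitude code on a few made-up records. The toy values, the `multiplier` lookup, and the `to_multiplier` helper are assumptions for illustration only and are not part of the analysis; the code-to-multiplier mapping (H, K, M, B) mirrors the one used in the processing code.

```r
# Toy records (hypothetical values, not taken from the NOAA data)
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL"),
  PROPDMG    = c(2.5, 10, 500),
  PROPDMGEXP = c("M", "K", "K"),
  CROPDMG    = c(0, 5, 1),
  CROPDMGEXP = c("", "K", "M"),
  stringsAsFactors = FALSE
)

# Multipliers assumed for each exponent code; anything else counts as 1
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
to_multiplier <- function(code) {
  m <- unname(multiplier[toupper(code)])
  m[is.na(m)] <- 1
  m
}

# Per-incident total damage in millions of USD
toy$total_damage <- (toy$PROPDMG * to_multiplier(toy$PROPDMGEXP) +
                     toy$CROPDMG * to_multiplier(toy$CROPDMGEXP)) / 1e6

# Total damage per event type, in millions of USD
damage_by_event <- tapply(toy$total_damage, toy$EVTYPE, sum)
```

Summing per incident first and then per event type is the same order of operations the analysis follows for the full dataset.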

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(lattice)
library(ggplot2)
library(knitr)
# https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2
newdata<- read.csv("repdata-data-StormData.csv.bz2")
#dim(newdata) check size of data set
#names(newdata) check variables included in data set
#make event type values uniformly upper case
newdata$EVTYPE <- toupper(as.character(newdata$EVTYPE))

#Setup analysis by event types as factors
newdata$EVTYPE <- as.factor(newdata$EVTYPE)

#create subsets of data set for answers to questions
health <- group_by(newdata, EVTYPE)

y <- colnames(newdata)
#find column numbers for events and damages
damage_logic <- grep("[A-Z]DMG|EVTYPE", y)
#subset dataset for events and damages

DMGdata <- newdata[damage_logic]

factorTOnumber <- function(x) {
  # function to convert order of magnitude factor to number
  facts <- c("B", "M", "K", "H")
  num <- c("1000000000", "1000000", "1000", "100")
  #convert lower case entries to upper case
  x <- toupper(as.character(x))
  
  #assume entries which are not in facts variable are typos
  #set typos to 1
  x <- gsub("[^MBKH]","1", x)
  #convert factors to numbers
  for (i in seq_along(facts)) {
    x <- gsub(facts[i], num[i], x, fixed = TRUE)
  }
  
  x <- as.numeric(x)
  # blank entries become NA after conversion; set them to 1 as well
  x[is.na(x)] <- 1
  x
}

#cleanup damages dataset
DMGdata$PROPDMGEXP <- factorTOnumber(DMGdata$PROPDMGEXP)
DMGdata$CROPDMGEXP <- factorTOnumber(DMGdata$CROPDMGEXP)
# Question 1
#sort q1 dataset by total injuries or fatalities in EVTYPE

health_sum <- health %>% group_by (EVTYPE) %>% summarize( sum_fatalities = sum (FATALITIES), sum_injured = sum (INJURIES) )
# order dataset by events causing the greatest number of fatalities
fatality_sum <- arrange (health_sum, desc (sum_fatalities) )
# order dataset by events causing the greatest number of injuries
injured_sum <- arrange (health_sum, desc (sum_injured) )
# determine top five event types
topfive_dead <- head(fatality_sum, 5) 
topfive_injury <- head(injured_sum, 5)
# Question 2
# calculate damages in millions of USD

DMGdata <- mutate(DMGdata, property_damage = (PROPDMG * PROPDMGEXP) / 10^6, 
                  crop_damage = (CROPDMG * CROPDMGEXP) /10^6, total_damage = crop_damage + property_damage)

crop_tot_damageDF <- DMGdata %>% group_by (EVTYPE) %>% summarize(tot_crop_damage = sum (crop_damage) )
crop_tot_damageDF<- arrange (crop_tot_damageDF, desc (tot_crop_damage) )

property_tot_damageDF <- DMGdata %>% group_by (EVTYPE) %>% summarize(tot_property_damage = sum (property_damage) )
property_tot_damageDF <- arrange (property_tot_damageDF, desc(tot_property_damage) )

sum_tot_damageDF <- DMGdata %>% group_by(EVTYPE) %>% summarize(tot_damage = sum(total_damage) )
sum_tot_damageDF <- arrange (sum_tot_damageDF, desc (tot_damage) )


# determine top five event types
topfive_crop <- head (crop_tot_damageDF, 5)
topfive_prop <- head (property_tot_damageDF, 5) 
topfive_cropprop <- head (sum_tot_damageDF, 5)

Results

Although the NOAA database contains a number of faulty entries, these entries did not appear to affect the rankings of the top five event types for either population health or total economic consequences. Tornados are responsible for by far the greatest number of injuries and deaths. Excessive heat is also associated with a high number of fatalities, but it accounts for only about one-third as many as tornados. Floods are associated with the greatest combined losses in crop and property damage.
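One way to gauge how much faulty entries might matter is to tally how many order-of-magnitude codes fall outside the recognized set. The sketch below does this on a made-up vector of codes. It assumes that "faulty" refers to unrecognized exponent codes, which the report does not state explicitly, and the `codes` vector is hypothetical; in practice it would be replaced by a column such as PROPDMGEXP.

```r
# Hypothetical exponent codes, mimicking a mix of valid and faulty entries
codes <- c("K", "k", "M", "", "B", "?", "5", "K", "+", "m")

# Codes the processing step recognizes (after upper-casing)
valid <- c("H", "K", "M", "B")
is_valid <- toupper(codes) %in% valid

# Frequency of each raw code
code_counts <- table(codes)

# Share of entries that would fall back to a multiplier of 1
share_unrecognized <- mean(!is_valid)
```

A small share of unrecognized codes would be consistent with the rankings being insensitive to them, although it would not by itself prove it.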

Question 1: Which types of events are most harmful with respect to population health?
Tornados are most harmful to population health as measured by the number of either injuries or deaths (Figures 1 and 2).

#Figure 1
ggplot(data = topfive_injury, aes(EVTYPE,sum_injured)) + 
  labs(x = "Event type", y = "Total injuries", title = "Figure 1: Top five events causing greatest number of injuries") + geom_col()

#Figure 2
ggplot(data = topfive_dead, aes(EVTYPE,sum_fatalities)) + labs(x = "Event type", y =
         "Total deaths", title = "Figure 2: Top five events causing greatest number of fatalities") + geom_col()

Question 2: Which types of events have the greatest economic consequences?
Floods are associated with the greatest damage in economic terms, followed by hurricanes and tornados (Figure 3).

# Figure 3
ggplot(data = topfive_cropprop, aes(EVTYPE,tot_damage)) + labs(x = "Event type", y =
        "Total damage (in millions USD)", title = "Figure 3: Top five events causing greatest economic damage") + geom_col()