Effect of Severe Weather Events on Public Health and Economics

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

This report extracted the relevant information from the data and condensed it by event type. Two stacked bar charts were then plotted to address two questions about the United States:
1. Which weather events are most harmful to population health?
2. Which weather events are most damaging to the economy?

Libraries

The following libraries were used throughout the code.

library(ggplot2)
library(Hmisc)

library(knitr)
library(reshape2)

Data Processing

A bzip2-compressed CSV file containing the data was downloaded from Amazon CloudFront on 22 August 2014 into a data folder in the working directory.

# check if a data folder exists; if not then create one
if (!file.exists("data")) {dir.create("data")}

# file URL and destination file (the data is a bzip2-compressed CSV, not a zip)
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destfile <- "./data/stormdata.csv.bz2"

# download the file in binary mode and note the time
download.file(fileUrl, destfile = destfile, mode = "wb")
dateDownloaded <- date()
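Re-knitting the report re-downloads the full file every time. A minimal sketch that skips the download when the file is already on disk is shown below; the helper name `download_if_missing` is hypothetical and not part of the original code.

```r
# Hypothetical helper: download `url` to `destfile` only when the file is not
# already present. Returns the download timestamp, or NA if it was skipped.
download_if_missing <- function(url, destfile) {
  # create the parent folder (e.g. ./data) if needed; silent if it exists
  dir.create(dirname(destfile), showWarnings = FALSE, recursive = TRUE)
  if (file.exists(destfile)) {
    return(NA)
  }
  # mode = "wb" keeps the compressed archive intact on Windows
  download.file(url, destfile = destfile, mode = "wb")
  date()
}

# usage (downloads only on the first run):
# dateDownloaded <- download_if_missing(fileUrl, "./data/stormdata.csv.bz2")
```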

The relevant file was then loaded directly into R and subsetted to get only the columns relevant to the scope of this report.

# read the bzip2-compressed csv file (read.csv decompresses it transparently)
data_ <- read.csv("./data/stormdata.csv.bz2", header = TRUE)

# keep only the relevant columns
data_ <- subset(data_, select = c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP"))
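Reading every column and then subsetting holds the whole table in memory first. One possible alternative, sketched here under the assumption that memory is the concern, reads only the header, then passes `colClasses = "NULL"` for the unwanted columns so `read.csv` skips them. The function name `read_selected` is hypothetical.

```r
# Read only the named columns of a (possibly bzip2-compressed) CSV file.
# Columns not listed in `keep` get colClasses "NULL", which read.csv skips;
# the kept columns get NA, i.e. read.csv's usual type detection.
read_selected <- function(path, keep) {
  header <- names(read.csv(path, nrows = 1, check.names = FALSE))
  classes <- ifelse(header %in% keep, NA, "NULL")
  read.csv(path, colClasses = classes)
}

# usage:
# data_ <- read_selected("./data/stormdata.csv.bz2",
#                        c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG",
#                          "PROPDMGEXP", "CROPDMG", "CROPDMGEXP"))
```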

For the first part of the report, the three relevant columns were extracted into three vectors: events, fatalities and injuries.

# assign variables to the relevant columns
events <- data_$EVTYPE
fatalities <- data_$FATALITIES
injuries <- data_$INJURIES

Many event types overlap; for example, “EXCESSIVE HEAT” and “HEAT” are recorded as two different events. A function was written to combine similar events and give them clearer names, which reduced the number of event types from 985 to 252.

# standardize the event types and group similar ones together (985 -> 252 levels);
# the tests run in order, so the first matching pattern decides the group
events <- sapply(events, FUN = function(x){
  x <- tolower(x)
  if (grepl("storm surge", x)){
    return("Storm surge")
  }
  if (grepl("flood", x)){
    return("Flood")
  }
  if (grepl("tornado", x)){
    return("Tornado")
  }
  if (grepl("snow|ice|wintry|freez|blizzard|cold|winter", x)){
    return("Wintry")
  }
  if (grepl("rain|shower", x)){
    return("Rain")
  }
  if (grepl("thunder|lightning", x)){
    return("Lightning")
  } 
  if (grepl("wind", x)){
    return("Wind")
  } 
  if (grepl("hurricane|tropical|typhoon", x)){
    return("Hurricane")
  }
  if (grepl("dry|drought", x)){
    return("Dry weather")
  }
  if (grepl("heat|warm", x)){
    return("Heat")
  }
  if (grepl("hail", x)){
    return("Hail")
  }
  if (grepl("fire", x)){
    return("Fire")
  }
  # anything unmatched keeps its own name, with the first letter capitalized
  capitalize(x)
}, USE.NAMES = FALSE)
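The chain of `if` tests can also be written as an ordered lookup table, which makes the "first match wins" rule explicit and processes the whole vector at once instead of element by element. This is a sketch of an alternative, not the code used for the results; `event_patterns` and `classify_events` are hypothetical names, and a small base-R expression stands in for `Hmisc::capitalize`.

```r
# Ordered pattern table: for each event the first matching regular expression
# wins, mirroring the sequence of if() tests above.
event_patterns <- c(
  "Storm surge" = "storm surge",
  "Flood"       = "flood",
  "Tornado"     = "tornado",
  "Wintry"      = "snow|ice|wintry|freez|blizzard|cold|winter",
  "Rain"        = "rain|shower",
  "Lightning"   = "thunder|lightning",
  "Wind"        = "wind",
  "Hurricane"   = "hurricane|tropical|typhoon",
  "Dry weather" = "dry|drought",
  "Heat"        = "heat|warm",
  "Hail"        = "hail",
  "Fire"        = "fire"
)

classify_events <- function(x) {
  x <- tolower(as.character(x))
  out <- rep(NA_character_, length(x))
  for (label in names(event_patterns)) {
    # only assign events that no earlier pattern has claimed
    hit <- is.na(out) & grepl(event_patterns[[label]], x)
    out[hit] <- label
  }
  # unmatched events keep their own name, first letter capitalized
  unmatched <- is.na(out)
  out[unmatched] <- paste0(toupper(substring(x[unmatched], 1, 1)),
                           substring(x[unmatched], 2))
  out
}

# usage: events <- classify_events(data_$EVTYPE)
```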

The processed event names were then converted to a factor so that fatalities and injuries could be summed for each event type. A data frame of these totals was created and ordered by injuries (then fatalities) to obtain the ten weather events with the most injuries and fatalities.

# factor variable to distinguish events
events_factors <- factor(events)

# sum up the fatalities and injuries for each event
fatalities_sum <- aggregate(fatalities, list(events_factors), sum)
injuries_sum <- aggregate(injuries, list(events_factors), sum)
names(fatalities_sum) <- c("Event", "Count"); names(injuries_sum) <- c("Event", "Count")

# create a DF of Event, Injuries, Fatalities
health <- data.frame(fatalities_sum$Event, injuries_sum$Count, fatalities_sum$Count)
names(health) <- c("Event", "Injuries", "Fatalities")

# reorder by injuries and fatalities, then take the top 10 rows
health <- health[with(health, order(-Injuries, -Fatalities)), ][1:10,]

head(health)

##         Event Injuries Fatalities
## 213   Tornado    91407       5661
## 56       Heat     9228       3143
## 251      Wind     8961        988
## 43      Flood     8604       1525
## 89  Lightning     7710       1028
## 252    Wintry     6449       1093
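The two separate `aggregate()` calls above rely on both results having their rows in the same order. A sketch of an alternative that sums both columns in one call with the formula interface, so the totals cannot be misaligned, is given here; `top_health` is a hypothetical name. Note that the formula method drops rows containing `NA` by default.

```r
# Sum injuries and fatalities per event type in a single aggregate() call,
# then keep the n events with the most injuries (ties broken by fatalities).
top_health <- function(event, fatalities, injuries, n = 10) {
  df <- data.frame(Event = event, Injuries = injuries, Fatalities = fatalities)
  sums <- aggregate(cbind(Injuries, Fatalities) ~ Event, data = df, FUN = sum)
  sums <- sums[order(-sums$Injuries, -sums$Fatalities), ]
  head(sums, n)
}

# usage: health <- top_health(events, fatalities, injuries)
```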

Next, the data frame was reshaped to obtain a bar chart in which the bars are ordered by size and fatalities are stacked on top of injuries.

## Using Event as id variables
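The code for this step is not shown in the report; the message above comes from `reshape2::melt`. A minimal sketch of one way to do it, assuming the `health` data frame from the previous step, is to melt to long format, order the `Event` factor by total count, and draw stacked bars with `ggplot2`. The function names are hypothetical, and the stacking order follows the levels of `variable`.

```r
library(reshape2)
library(ggplot2)

# Long format: one row per (Event, outcome) pair, with the outcome name in
# `variable` (Injuries/Fatalities) and the count in `value`.
health_long <- function(health) {
  long <- melt(health, id.vars = "Event")
  ev <- as.character(long$Event)
  # order the bars by total count, largest first
  totals <- tapply(long$value, ev, sum)
  long$Event <- factor(ev, levels = names(totals)[order(totals, decreasing = TRUE)])
  long
}

plot_health <- function(health) {
  ggplot(health_long(health), aes(x = Event, y = value, fill = variable)) +
    geom_col() +   # bars for the same Event are stacked by default
    labs(x = "Event type", y = "Count", fill = NULL)
}

# usage: plot_health(health)
```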

For the second part of the report, the property and crop damage columns each had a companion column containing the exponent of their units (e.g. K = 10^3, M = 10^6, B = 10^9). A function was written to combine the coefficient and exponent into a single value.

# function to combine the coefficient and exponent
convertUnits <- function(coeff, expon){
  
  if (is.na(expon)){
    as.numeric(coeff)
  }
  else if (toupper(expon)== "K"){
    as.numeric(coeff)*10^3
  }
  else if (toupper(expon) == "M"){
    as.numeric(coeff)*10^6
  }
  else if (toupper(expon)== "B"){
    as.numeric(coeff)*10^9
  }
  else{
    as.numeric(coeff)
  }
}

# assign variables to the relevant columns and apply function
prop_dmg <- apply(data_[, c('PROPDMG', 'PROPDMGEXP')], 1, function(y) convertUnits(y['PROPDMG'], y['PROPDMGEXP']))
crop_dmg <- apply(data_[, c('CROPDMG', 'CROPDMGEXP')], 1, function(y) convertUnits(y['CROPDMG'], y['CROPDMGEXP']))
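`apply()` coerces each row to a character vector before calling `convertUnits()`, which is slow on a data set this large. A vectorized sketch of the same conversion looks up each exponent letter in a named vector of multipliers; blanks, `NA` and any other symbol fall back to a multiplier of 1, matching the `else` branches above. `multipliers` and `convert_units_vec` are hypothetical names.

```r
# Multipliers for the exponent letters handled by convertUnits()
multipliers <- c(K = 1e3, M = 1e6, B = 1e9)

convert_units_vec <- function(coeff, expon) {
  # indexing by an unknown name, "" or NA yields NA, which becomes 1
  mult <- unname(multipliers[toupper(as.character(expon))])
  mult[is.na(mult)] <- 1
  as.numeric(coeff) * mult
}

# usage:
# prop_dmg <- convert_units_vec(data_$PROPDMG, data_$PROPDMGEXP)
# crop_dmg <- convert_units_vec(data_$CROPDMG, data_$CROPDMGEXP)
```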

The property and crop damage values were then aggregated by event type and plotted using the same procedure as in part one.

## Using Event as id variables
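As with the health data, the aggregation step for damage is not shown. A sketch under the assumption that events are ranked by combined property and crop damage (the original ordering rule is not stated) could look like this; `top_damage` is a hypothetical name, and the formula method drops rows containing `NA`.

```r
# Total property and crop damage (in dollars) per event type, ranked by
# their sum, keeping the n most damaging events.
top_damage <- function(event, prop_dmg, crop_dmg, n = 10) {
  df <- data.frame(Event = event, Property = prop_dmg, Crop = crop_dmg)
  sums <- aggregate(cbind(Property, Crop) ~ Event, data = df, FUN = sum)
  sums$Total <- sums$Property + sums$Crop
  head(sums[order(-sums$Total), ], n)
}

# usage: economy <- top_damage(events, prop_dmg, crop_dmg)
```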

Chris Daly

Friday, August 22, 2014
