Data Science Specialization - Reproducible Research: Peer Assessment 2

Marco Antonio Gonzalez Junior - July 25, 2015

Impact of weather on public health and economy in the United States of America

Synopsis

The goal of this research is to analyze the impact of different weather events on public health and the economy, based on the weather events database collected by the U.S. National Oceanic and Atmospheric Administration (NOAA) from 1950 to 2011. Estimates of fatalities, injuries, property damage, and crop damage are used to decide which types of events are most harmful to population health and the economy. The conclusion is that excessive heat and tornadoes are most harmful to population health, while floods, droughts, and hurricanes/typhoons have the greatest economic consequences.

Basic setup

Library dependencies:

library(R.utils)
## Loading required package: R.oo
## Loading required package: R.methodsS3
## R.methodsS3 v1.7.0 (2015-02-19) successfully loaded. See ?R.methodsS3 for help.
## R.oo v1.19.0 (2015-02-27) successfully loaded. See ?R.oo for help.
## 
## Attaching package: 'R.oo'
## 
## The following objects are masked from 'package:methods':
## 
##     getClasses, getMethods
## 
## The following objects are masked from 'package:base':
## 
##     attach, detach, gc, load, save
## 
## R.utils v2.1.0 (2015-05-27) successfully loaded. See ?R.utils for help.
## 
## Attaching package: 'R.utils'
## 
## The following object is masked from 'package:utils':
## 
##     timestamp
## 
## The following objects are masked from 'package:base':
## 
##     cat, commandArgs, getOption, inherits, isOpen, parse, warnings
library(plyr)
library(ggplot2)
require(gridExtra)
## Loading required package: gridExtra

Data processing

Download and unzip the data.

setwd("~/projects/DSS_ReproducibleResearch_PA2/")
if (!"stormData.csv.bz2" %in% dir("./data/")) {
    download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "data/stormData.csv.bz2")
    bunzip2("data/stormData.csv.bz2", overwrite = TRUE, remove = FALSE)
}
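
As a side note, base R's read.csv() can read a bzip2-compressed file directly, because file connections detect the compression when reading. The explicit bunzip2() step is therefore optional. The sketch below shows this on a small temporary file; the toy data frame and the temporary path are assumptions for illustration and are not the storm data.

```r
# Toy data standing in for the storm file (illustrative values only)
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))

# Write it as a bzip2-compressed CSV in a temporary location
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the compressed file transparently
roundTrip <- read.csv(tmp, stringsAsFactors = FALSE)
unlink(tmp)
```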

Read the generated csv file.

if (!"weatherData" %in% ls()) {
    weatherData <- read.csv("data/stormData.csv", sep = ",")
}
if (dim(weatherData)[2] == 37) {
    weatherData$year <- as.numeric(format(as.Date(weatherData$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y"))
}
hist(weatherData$year, breaks = 30, main = "Bad weather events registered per year", xlab = "Year", ylab = "Events")

The histogram above shows that the number of recorded events starts to increase significantly around 1995. We therefore use the subset of the data from 1995 to 2011, where the records are most complete.

storms <- weatherData[weatherData$year >= 1995, ]
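
The year extraction and the filtering step can be checked on a few hand-made rows. This is a minimal sketch: the toy data frame is an assumption, with dates written in the month/day/year style that the format string above expects.

```r
# Three illustrative events (not taken from the real database)
toy <- data.frame(
    BGN_DATE = c("4/18/1950 0:00:00", "1/6/1995 0:00:00", "11/28/2011 0:00:00"),
    EVTYPE = c("TORNADO", "HAIL", "FLOOD"),
    stringsAsFactors = FALSE
)

# Parse the date and keep only the four-digit year as a number
toy$year <- as.numeric(format(as.Date(toy$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y"))

# Keep events from 1995 onwards, as in the subset above
recent <- toy[toy$year >= 1995, ]
```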

Impact on public health

This section calculates the number of fatalities and injuries caused by the bad weather events.

sortHelper <- function(fieldName, top = 15, dataset = weatherData) {
    index <- which(colnames(dataset) == fieldName)
    field <- aggregate(dataset[, index], by = list(dataset$EVTYPE), FUN = "sum")
    names(field) <- c("EVTYPE", fieldName)
    field <- arrange(field, field[, 2], decreasing = TRUE)
    field <- head(field, n = top)
    field <- within(field, EVTYPE <- factor(x = EVTYPE, levels = field$EVTYPE))
    return(field)
}
fatalities <- sortHelper("FATALITIES", dataset = storms)
injuries <- sortHelper("INJURIES", dataset = storms)
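
What sortHelper does can be seen on a small example using only base R. The sketch below is not the helper itself: it sums a column per event type, orders the totals from largest to smallest, and keeps the top rows. The toy data and the names toy, totals, and top2 are assumptions.

```r
# Six illustrative records across three event types
toy <- data.frame(
    EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "TORNADO"),
    FATALITIES = c(5, 2, 1, 4, 1, 0),
    stringsAsFactors = FALSE
)

# Sum fatalities per event type
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Rank from largest to smallest and keep the top two
totals <- totals[order(totals$FATALITIES, decreasing = TRUE), ]
top2 <- head(totals, n = 2)

# Fix the factor levels in ranked order so plots keep this ordering
top2$EVTYPE <- factor(top2$EVTYPE, levels = top2$EVTYPE)
```

Setting the factor levels explicitly matters for the plots later on: without it, the bars would be drawn in alphabetical order rather than in ranked order.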

Impact on economy

Property damage and crop damage are converted into comparable numerical form according to the units described in the code book (Storm Events). The PROPDMGEXP and CROPDMGEXP columns record a multiplier for each observation: hundred (H), thousand (K), million (M), or billion (B).

convertHelper <- function(dataset = storms, fieldName, newFieldName) {
    totalLen <- dim(dataset)[2]
    index <- which(colnames(dataset) == fieldName)
    # Work on the multiplier codes as text so they can be recoded
    dataset[, index] <- as.character(dataset[, index])
    logic <- !is.na(toupper(dataset[, index]))
    # Replace each letter code with its power of ten; an empty code means 10^0
    dataset[logic & toupper(dataset[, index]) == "B", index] <- "9"
    dataset[logic & toupper(dataset[, index]) == "M", index] <- "6"
    dataset[logic & toupper(dataset[, index]) == "K", index] <- "3"
    dataset[logic & toupper(dataset[, index]) == "H", index] <- "2"
    dataset[logic & toupper(dataset[, index]) == "", index] <- "0"
    # Any other non-numeric code becomes NA here (hence the coercion warning) and is then set to 0
    dataset[, index] <- as.numeric(dataset[, index])
    dataset[is.na(dataset[, index]), index] <- 0
    # Assumes the damage amount is the column immediately before the multiplier column
    dataset <- cbind(dataset, dataset[, index - 1] * 10^dataset[, index])
    names(dataset)[totalLen + 1] <- newFieldName
    return(dataset)
}
storms <- convertHelper(storms, "PROPDMGEXP", "propertyDamage")
## Warning in convertHelper(storms, "PROPDMGEXP", "propertyDamage"): NAs
## introduced by coercion
storms <- convertHelper(storms, "CROPDMGEXP", "cropDamage")
## Warning in convertHelper(storms, "CROPDMGEXP", "cropDamage"): NAs
## introduced by coercion
property <- sortHelper("propertyDamage", dataset = storms)
crop <- sortHelper("cropDamage", dataset = storms)
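
The multiplier logic can also be illustrated with a named lookup vector. This is an alternative sketch, not the convertHelper function above: the names exponents and toDollars are hypothetical, and the sketch handles only the four letter codes. Like the helper, it treats an empty or unrecognized code as an exponent of 0.

```r
# Powers of ten for the letter codes described in the text
exponents <- c(H = 2, K = 3, M = 6, B = 9)

toDollars <- function(value, code) {
    # Look up the exponent by upper-cased code; unknown codes give NA
    power <- exponents[toupper(as.character(code))]
    # Empty or unrecognized code: multiply by 10^0, that is, leave the value as is
    power[is.na(power)] <- 0
    unname(value * 10^power)
}

# Illustrative amounts with mixed-case and empty codes
damage <- toDollars(c(25, 2.5, 10, 1.5, 7), c("K", "M", "", "b", "h"))
```

A lookup vector keeps the code-to-exponent mapping in one place, which makes it easier to audit than a chain of assignments.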

Results

For the impact on public health, the two lists below rank severe weather events by the number of fatalities and by the number of injuries.

fatalities
##               EVTYPE FATALITIES
## 1     EXCESSIVE HEAT       1903
## 2            TORNADO       1545
## 3        FLASH FLOOD        934
## 4               HEAT        924
## 5          LIGHTNING        729
## 6              FLOOD        423
## 7        RIP CURRENT        360
## 8          HIGH WIND        241
## 9          TSTM WIND        241
## 10         AVALANCHE        223
## 11      RIP CURRENTS        204
## 12      WINTER STORM        195
## 13         HEAT WAVE        161
## 14 THUNDERSTORM WIND        131
## 15      EXTREME COLD        126
injuries
##               EVTYPE INJURIES
## 1            TORNADO    21765
## 2              FLOOD     6769
## 3     EXCESSIVE HEAT     6525
## 4          LIGHTNING     4631
## 5          TSTM WIND     3630
## 6               HEAT     2030
## 7        FLASH FLOOD     1734
## 8  THUNDERSTORM WIND     1426
## 9       WINTER STORM     1298
## 10 HURRICANE/TYPHOON     1275
## 11         HIGH WIND     1093
## 12              HAIL      916
## 13          WILDFIRE      911
## 14        HEAVY SNOW      751
## 15               FOG      718

The following pair of graphs shows the total fatalities and total injuries caused by these severe weather events.

fatalitiesPlot <- qplot(EVTYPE, data = fatalities, weight = FATALITIES, geom = "bar", binwidth = 1) + 
    scale_y_continuous("Number of fatalities") + 
    theme(axis.text.x = element_text(angle = 45, 
    hjust = 1)) + xlab("Weather condition") + 
    ggtitle("Total fatalities by bad weather\n events in the U.S.A.\n from 1995 to 2011")
injuriesPlot <- qplot(EVTYPE, data = injuries, weight = INJURIES, geom = "bar", binwidth = 1) + 
    scale_y_continuous("Number of injuries") + 
    theme(axis.text.x = element_text(angle = 45, 
    hjust = 1)) + xlab("Weather condition") + 
    ggtitle("Total injuries by bad weather\n events in the U.S.A.\n from 1995 to 2011")
grid.arrange(fatalitiesPlot, injuriesPlot, ncol = 2)

Based on the bar charts above, excessive heat and tornadoes caused the most fatalities, and tornadoes caused the most injuries, in the United States of America from 1995 to 2011.
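
The charts for fatalities and injuries are drawn with qplot(), which ggplot2 releases from 3.4.0 onward mark as deprecated. An equivalent chart can be sketched with ggplot() and geom_col(). The data frame below is an assumption that copies only the top three rows of the fatalities table, and the names top3 and fatalitiesSketch are hypothetical.

```r
library(ggplot2)

# Top three rows copied from the fatalities table
top3 <- data.frame(
    EVTYPE = c("EXCESSIVE HEAT", "TORNADO", "FLASH FLOOD"),
    FATALITIES = c(1903, 1545, 934),
    stringsAsFactors = FALSE
)

# Keep the ranked order on the x axis instead of alphabetical order
top3$EVTYPE <- factor(top3$EVTYPE, levels = top3$EVTYPE)

# geom_col() draws bars whose heights are the values in FATALITIES
fatalitiesSketch <- ggplot(top3, aes(x = EVTYPE, y = FATALITIES)) +
    geom_col() +
    labs(x = "Weather condition", y = "Number of fatalities") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
```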

For the impact on the economy, the two lists below rank events by the cost of property damage and of crop damage in USD.

property
##               EVTYPE propertyDamage
## 1              FLOOD   144022037057
## 2  HURRICANE/TYPHOON    69305840000
## 3        STORM SURGE    43193536000
## 4            TORNADO    24935939545
## 5        FLASH FLOOD    16047794571
## 6               HAIL    15048722103
## 7          HURRICANE    11812819010
## 8     TROPICAL STORM     7653335550
## 9          HIGH WIND     5259785375
## 10          WILDFIRE     4759064000
## 11  STORM SURGE/TIDE     4641188000
## 12         TSTM WIND     4482361440
## 13         ICE STORM     3643555810
## 14 THUNDERSTORM WIND     3399282992
## 15    HURRICANE OPAL     3172846000
crop
##               EVTYPE  cropDamage
## 1            DROUGHT 13922066000
## 2              FLOOD  5422810400
## 3          HURRICANE  2741410000
## 4               HAIL  2614127070
## 5  HURRICANE/TYPHOON  2607872800
## 6        FLASH FLOOD  1343915000
## 7       EXTREME COLD  1292473000
## 8       FROST/FREEZE  1094086000
## 9         HEAVY RAIN   728399800
## 10    TROPICAL STORM   677836000
## 11         HIGH WIND   633561300
## 12         TSTM WIND   553947350
## 13    EXCESSIVE HEAT   492402000
## 14 THUNDERSTORM WIND   414354000
## 15              HEAT   401411500

The following graphs show the total property damage and total crop damage caused by the bad weather events.

propertyPlot <- qplot(EVTYPE, data = property, weight = propertyDamage, geom = "bar", binwidth = 1) + 
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) + scale_y_continuous("Property damage in USD") + 
    xlab("Weather condition") + ggtitle("Total property damage by\n bad weather events in\n the U.S.A. from 1995 to 2011")

cropPlot <- qplot(EVTYPE, data = crop, weight = cropDamage, geom = "bar", binwidth = 1) + 
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) + scale_y_continuous("Crop damage in USD") + 
    xlab("Weather condition") + ggtitle("Total crop damage by\n bad weather events in\n the U.S.A. from 1995 to 2011")
grid.arrange(propertyPlot, cropPlot, ncol = 2)

Based on the bar charts above, floods and hurricanes/typhoons caused the most property damage, and droughts and floods caused the most crop damage, in the United States of America from 1995 to 2011.

Conclusion

The research leads to the conclusion that excessive heat and tornadoes are most harmful to population health, while flood, drought and hurricane/typhoon have the greatest economic consequences.