Synopsis

In this report we aim to assess the impact that severe weather events have on health and on the economy of the United States of America. We do this by exploring data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database in order to discover which types of events are most harmful with respect to population health and which types of events have the greatest economic consequences.

Data Processing

Load helper libraries.

library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(stringr)
library(ggplot2)

Loading and Processing the Raw Data

Download and read the data.

if (!file.exists("StormData.csv.bz2")) {
    download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                  destfile = "StormData.csv.bz2", method = "curl")
}
conn <- bzfile("StormData.csv.bz2", open = "r")
stormData <- read.csv(conn)
close(conn)

Subset the original data into two data frames. The first holds the data related to fatalities and injuries; observations where both of these values are zero are discarded. The second holds the property and crop damage data, which we use as a proxy for economic impact. Again, observations where both values are zero are discarded. Once we have these two data frames, the original is removed to free memory.

We also group the health set by event type and summarize each group by calculating the total of injuries and fatalities.

healthData <- filter(stormData,  FATALITIES > 0 | INJURIES > 0) %>%
    select(EVTYPE, FATALITIES, INJURIES) %>%
    group_by(EVTYPE) %>%
    summarize(total_fatalities = sum(FATALITIES), total_injuries = sum(INJURIES))

economicData <- filter(stormData,  PROPDMG > 0 | CROPDMG > 0) %>%
    select(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

rm(stormData)
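The filter-then-summarize step can be illustrated on a small scale. The sketch below uses base R only and a hypothetical `toy` data frame with made-up values (not taken from the NOAA data); `aggregate()` stands in for the `group_by()` and `summarize()` calls used above.

```r
# Hypothetical toy data standing in for stormData (values are made up).
toy <- data.frame(
    EVTYPE     = c("TORNADO", "TORNADO", "HAIL", "FLOOD", "FLOOD"),
    FATALITIES = c(2, 0, 0, 1, 0),
    INJURIES   = c(5, 3, 0, 0, 0),
    stringsAsFactors = FALSE
)

# Keep rows where at least one of the two counts is positive.
kept <- toy[toy$FATALITIES > 0 | toy$INJURIES > 0, ]

# Sum both counts per event type (base R analogue of group_by + summarize).
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = kept, FUN = sum)
print(totals)
```

The HAIL row and the second FLOOD row have no fatalities or injuries, so they drop out before the totals are computed.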

Damage data are split into a column with the value and an “EXP” column with a magnitude code (h, k, m or b, in either case) that gives the power of ten by which the value must be multiplied. To make this easier we create a simple function that performs the multiplication and apply it to the economic set. We can then group this set by event type and summarize it by calculating the total for each group.

exponent_map <- c(2, 3, 6, 9)
names(exponent_map) <- c("h", "k", "m", "b")
calculateValue <- function(num, exp) {
    num * 10 ^ exponent_map[tolower(exp)]
}

economicData$PROPDMGVAL <- calculateValue(economicData$PROPDMG, economicData$PROPDMGEXP)
economicData$CROPDMGVAL <- calculateValue(economicData$CROPDMG, economicData$CROPDMGEXP)
economicData <- group_by(economicData, EVTYPE) %>%
    summarize(total_propdmg = sum(PROPDMGVAL), total_cropdmg = sum(CROPDMGVAL))
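As a worked example of the conversion, the sketch below applies the same lookup to a few made-up value and code pairs (hypothetical, not taken from the data set). It also shows what happens when a code is not one of h, k, m or b: the lookup returns NA, and a plain `sum()` over values containing NA is itself NA unless `na.rm = TRUE` is passed. Whether such rows should be dropped or treated differently is a judgment call that the sketch does not settle.

```r
# Same mapping as above, written with named elements.
exponent_map <- c(h = 2, k = 3, m = 6, b = 9)

# Made-up values and magnitude codes; the last code is blank.
num <- c(25, 2.5, 1, 0)
exp <- c("K", "M", "B", "")

mapped <- unname(exponent_map[tolower(exp)])  # blank code has no match
value  <- num * 10 ^ mapped                   # last element is NA

print(value)
print(sum(value))                 # NA, because one element is NA
print(sum(value, na.rm = TRUE))   # total of the three convertible values
```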

Results

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

To answer this question we just need to arrange the total number of fatalities from the health set in descending order and extract the top 10 values.

topFatalities <- arrange(healthData, -total_fatalities)[1:10,]
ggplot(topFatalities, aes(EVTYPE, total_fatalities, fill = EVTYPE)) + 
    geom_bar(stat = "identity") +
    theme(axis.text.x = element_text(angle = 90)) + 
    xlab("Event Type") + ylab("Number of Fatalities") + 
    ggtitle("Fatalities by Severe Weather Events")

Tornadoes, with 5,633 fatalities, seem to be the most harmful event by far: Excessive Heat, the next event with 1,903, has 3,730 fewer fatalities.
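ggplot2 places the categories of a discrete x axis in factor-level order, which is alphabetical by default, so the bars in the plot above are not sorted by height. One possible way to sort them, sketched here with base R's `reorder()` on a hypothetical `toy` data frame with made-up counts, is to order the event types by the negated total.

```r
# Hypothetical totals (made-up numbers, not the report's results).
toy <- data.frame(
    EVTYPE           = c("FLOOD", "HEAT", "TORNADO"),
    total_fatalities = c(10, 30, 20),
    stringsAsFactors = FALSE
)

# reorder() returns a factor whose levels are sorted by the second argument;
# negating the totals puts the largest total first.
ordered_type <- reorder(toy$EVTYPE, -toy$total_fatalities)
print(levels(ordered_type))

# In a plot call this could be used as the x aesthetic, for example:
# aes(reorder(EVTYPE, -total_fatalities), total_fatalities, fill = EVTYPE)
```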

With respect to injuries, we need to arrange the total number of injuries from the health set in descending order and extract the top 10 values.

topInjuries <- arrange(healthData, -total_injuries)[1:10,]
ggplot(topInjuries, aes(EVTYPE, total_injuries, fill = EVTYPE)) + 
    geom_bar(stat = "identity") +
    theme(axis.text.x = element_text(angle = 90)) + 
    xlab("Event Type") + ylab("Number of Injuries") + 
    ggtitle("Injuries by Severe Weather Events")

Tornadoes, with 91,346 injuries, seem to be the most harmful event by far: TSTM Wind, the next event with 6,957, has 84,389 fewer injuries.

  2. Across the United States, which types of events have the greatest economic consequences?

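To answer this question we add the total crop damage to the total property damage for each event type, sort the sums in descending order and keep only the top 10 values. A group total is NA whenever any observation in the group has an EXP code other than h, k, m or b (including a blank code); such NA totals are treated as zero (0).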

topDamages <- economicData %>%
    mutate(total_damages = (ifelse(is.na(total_propdmg), 0, total_propdmg) +
               ifelse(is.na(total_cropdmg), 0, total_cropdmg)) / 10^9)
topDamages <- topDamages[order(-topDamages$total_damages),][1:10,]
ggplot(topDamages, aes(EVTYPE, total_damages, fill = EVTYPE)) + 
    geom_bar(stat = "identity") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    xlab("Event Type") + ylab("Damages (in Billions USD)") + 
    ggtitle("Damages by Severe Weather Events")

HURRICANE/TYPHOON, with USD 69.3 billion, and STORM SURGE, with USD 43.3 billion, are the events most harmful to the economy.

sessionInfo()
## R version 3.2.2 (2015-08-14)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.3 LTS
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggplot2_2.0.0 stringr_1.0.0 dplyr_0.4.3  
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.2      digest_0.6.8     assertthat_0.1   plyr_1.8.3      
##  [5] grid_3.2.2       R6_2.1.1         gtable_0.1.2     DBI_0.3.1       
##  [9] formatR_1.2.1    magrittr_1.5     scales_0.3.0     evaluate_0.8    
## [13] stringi_1.0-1    lazyeval_0.1.10  rmarkdown_0.9.1  labeling_0.3    
## [17] tools_3.2.2      munsell_0.4.2    yaml_2.1.13      parallel_3.2.2  
## [21] colorspace_1.2-6 htmltools_0.2.6  knitr_1.11.22