Coursera - Reproducible Research: Peer Assignment 2

Synopsis

Storms are well-known contributors to both population health risk and economic risk in the United States. A rich source of information for analysing such risk is the NOAA storm database. This database compiles storms and other significant weather phenomena together with their human cost in fatalities and injuries and their economic cost in property and crop damage.

Data Processing

Data is pulled directly from the Coursera - Reproducible Research Project 2 description link.

# Packages
library(dplyr)
library(lubridate)
library(ggplot2)
library(RCurl)

# pull data (skipped if the archive is already present)
URL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists('StormData.csv.bz2')) {
        download.file(URL, destfile='StormData.csv.bz2', method='curl')
}

# read data and parse dates
stormData <- read.csv("StormData.csv.bz2")
stormData$BGN_DATE <- mdy_hms(stormData$BGN_DATE)
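
As a quick sanity check on the date parsing step, the sketch below applies mdy_hms to two hand-written strings. The strings are hypothetical samples that assume the month/day/year plus time layout of BGN_DATE; they are not read from the file.

```r
library(lubridate)

# Hypothetical sample values, assuming the "month/day/year h:m:s" layout
sample_dates <- c("4/18/1950 0:00:00", "11/28/2011 0:00:00")

# mdy_hms returns POSIXct values in UTC
parsed <- mdy_hms(sample_dates)

# Extract the year to confirm the fields were read in the intended order
parsed_years <- year(parsed)
print(parsed_years)
```

Any value that fails to parse comes back as NA, so counting NA values in the parsed column is one simple way to spot malformed dates.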

The arithmetic mean of fatalities and of injuries is calculated for each event type. The top 10 event types for each measure are stored for further analysis.

# population health analysis
# calculates average fatalities and injuries by event type
fatalities_mean <- stormData[!is.na(stormData$FATALITIES),] %>% group_by(EVTYPE) %>%
        summarise(Average = mean(FATALITIES))
fatalities_mean <- fatalities_mean[order(fatalities_mean$Average, decreasing = TRUE), ]
fatalities_mean_top <- fatalities_mean[1:10, ]
# identifies ordered factor for plotting
fatalities_mean_top$EVTYPE <- factor(fatalities_mean_top$EVTYPE, 
                                     levels = fatalities_mean_top$EVTYPE) 
injuries_mean <- stormData[!is.na(stormData$INJURIES),] %>% group_by(EVTYPE) %>%
        summarise(Average = mean(INJURIES))
injuries_mean <- injuries_mean[order(injuries_mean$Average, decreasing = TRUE), ]
injuries_mean_top <- injuries_mean[1:10, ]
injuries_mean_top$EVTYPE <- factor(injuries_mean_top$EVTYPE, 
                                   levels = injuries_mean_top$EVTYPE)
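
Because the ranking uses the mean per event type, a category with very few records can top the list on the strength of a single severe event. The sketch below illustrates this on hypothetical toy data (the event names and counts are invented, not taken from the NOAA database) and adds an event count next to each average, which is one possible way to make such cases visible.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
toy <- data.frame(
        EVTYPE = c("A", "A", "A", "B", "C", "C"),
        FATALITIES = c(0, 1, 2, 9, 0, 0)
)

# Mean fatalities per event type, with the number of records behind each mean
toy_mean <- toy %>%
        group_by(EVTYPE) %>%
        summarise(Average = mean(FATALITIES), Events = n()) %>%
        arrange(desc(Average))

# Keep the two highest averages
toy_top <- head(toy_mean, 2)
print(toy_top)
```

Here type B ranks first even though it was recorded only once, while type A has more records and a lower average.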

Similarly, the top 10 events by damage cost are stored. The total damage for each event type is calculated by summing the property and crop damage, after each recorded value is scaled by the multiplier for its exponent code.

# Economic damage analysis
damage <- data.frame(stormData$EVTYPE, stormData$PROPDMG, stormData$PROPDMGEXP,
                     stormData$CROPDMG, stormData$CROPDMGEXP)
colnames(damage) <- c('Event', 'PropertyDamage', 'PROPDMGEXP', 'CropDamage',
                      'CROPDMGEXP')

# EXP lookup table - maps each exponent code to a multiplier. The mapping rests
# on assumptions about how the data were recorded. Full explanation can be found at
# https://rstudio-pubs-static.s3.amazonaws.com/58957_37b6723ee52b455990e149edde45e5b6.html

EXP_lookup <- data.frame(c('-','?','+','0','1','2','3','4','5','6','7',
                           '8','B','h','H','k','K','m','M'),
                         c(0, 0, 1, 10, 10, 10, 10, 10, 10, 10, 10, 10,
                           1000000000, 100, 100, 1000, 1000, 1000000, 1000000))
colnames(EXP_lookup) <- c('code', 'multiplier')
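# all.x = TRUE keeps rows whose code is absent from the lookup (e.g. a blank
# code); without it, such rows are dropped along with their other damage type
damage <- merge(damage, EXP_lookup, by.x = "PROPDMGEXP", by.y = "code",
                all.x = TRUE)
damage$PropertyDamageTotal <- damage$PropertyDamage * damage$multiplier
damage <- merge(damage, EXP_lookup, by.x = "CROPDMGEXP", by.y = "code",
                all.x = TRUE)
damage$CropDamageTotal <- damage$CropDamage * damage$multiplier.y
# unmatched codes give NA; they are counted as zero damage (an assumption)
damage$PropertyDamageTotal[is.na(damage$PropertyDamageTotal)] <- 0
damage$CropDamageTotal[is.na(damage$CropDamageTotal)] <- 0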
damage <- data.frame(damage$Event, (damage$CropDamageTotal + 
                                            damage$PropertyDamageTotal) / 1000000000)
colnames(damage) <- c('Event', 'Damage')
damage <- damage %>% group_by(Event) %>% summarise(Damage = sum(Damage))
damage <- damage[order(damage$Damage, decreasing = TRUE), ]
damage_top <- damage[1:10, ]
damage_top$Event <- factor(damage_top$Event, levels = damage_top$Event)
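
The exponent lookup above can be illustrated on a small scale. The sketch below uses hypothetical toy records and only three of the codes (K, M, B) with the same multipliers as the lookup table. Treating a code that is missing from the lookup as a multiplier of zero is an assumption of this sketch.

```r
# Hypothetical toy records, invented for illustration only
toy <- data.frame(
        Event = c("FLOOD", "FLOOD", "HAIL", "HAIL"),
        PROPDMG = c(2.5, 10, 5, 1),
        PROPDMGEXP = c("B", "K", "M", ""),
        stringsAsFactors = FALSE
)

# Subset of the exponent codes, with the multipliers used in the lookup table
lookup <- data.frame(
        code = c("K", "M", "B"),
        multiplier = c(1e3, 1e6, 1e9),
        stringsAsFactors = FALSE
)

# match() returns NA for codes not in the lookup (here the blank code)
toy$mult <- lookup$multiplier[match(toy$PROPDMGEXP, lookup$code)]
toy$mult[is.na(toy$mult)] <- 0  # assumption: unknown code contributes nothing

# Scale each record to dollars, then total per event in billions
toy$dollars <- toy$PROPDMG * toy$mult
totals <- aggregate(dollars ~ Event, data = toy, FUN = sum)
totals$billions <- totals$dollars / 1e9
print(totals)
```

Using match() keeps every row in place, so a record with an unrecognised code still appears in the per-event totals with a contribution of zero.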

Results

Across the United States, the events (as indicated in the EVTYPE variable of the NOAA Storm Database) that are most harmful with respect to population health have been broken down by fatalities and by injuries, because the two rankings share few event types. Measured by average fatalities per recorded event, the category TORNADOES, TSTM WIND AND HAIL is by far the deadliest, while HEAT WAVE has the highest average number of injuries per recorded event.

ggplot(fatalities_mean_top, aes(x=EVTYPE, y=Average)) +
        geom_bar(stat="identity") +
        coord_flip() +
        ggtitle("Top Fatalities") +
        xlab("Event")

ggplot(injuries_mean_top, aes(x=EVTYPE, y=Average)) +
        geom_bar(stat="identity") +
        coord_flip() +
        ggtitle("Top Injuries") +
        xlab("Event")

The types of events that have the greatest economic consequences are shown in a single comparison of the summed property and crop damage cost totals. FLOOD is by far the largest source of economic damage.

ggplot(damage_top, aes(x=Event, y=Damage)) +
        geom_bar(stat="identity") +
        coord_flip() +
        ggtitle("Top Damage") +
        ylab("Total Damage (billions)")