Health and economic consequences of severe weather events in the United States

Synopsis

This project explores the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The analysis examines the health (injuries, fatalities) and economic (property damage) consequences of severe weather events in the US. The data are downloaded from the Internet, lightly cleaned, and summarized by event type; two plots then show the event types with the worst health and economic consequences.

Data Processing

packages <- c("R.utils", "ggplot2", "grid", "gridExtra", "Hmisc")
lapply(packages, library, character.only = TRUE)

## Loading file
if (!file.exists("data")) {dir.create("data")}  
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists("./data/StormData.csv.bz2")) 
    {download.file(fileUrl, destfile = "./data/StormData.csv.bz2", method = "curl")}
if (!file.exists("./data/StormData.csv"))
    {bunzip2(filename = "./data/StormData.csv.bz2", remove = FALSE)}
stormData <- read.csv("./data/StormData.csv")
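
As an aside, base R can usually read a bzip2-compressed CSV directly, because file connections opened for reading detect the compression, so the explicit bunzip2 step is optional. A minimal sketch on a small made-up file (the temporary path and the two toy rows are assumptions for illustration, not part of the storm data):

```r
# Toy data standing in for the storm file (hypothetical values).
toy <- data.frame(EVTYPE = c("FLOOD", "TORNADO"), INJURIES = c(2, 5))

# Write it as a bzip2-compressed CSV in a temporary location.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv opens the path through file(), which detects the compression.
toyBack <- read.csv(tmp)
print(toyBack)
```

Reading the compressed file directly saves disk space, at the cost of decompressing it on every read.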

## Cleaning of PROPDMG and PROPDMGEXP variables
stormData$PROPDMGEXP <- toupper(stormData$PROPDMGEXP) 
## Allowed exponent values: 0, H, K, M, B, which stand for 10^0, 10^2, 10^3, 10^6, 10^9
## We'll just assume that "dirty" values are to be multiplied by 10^0.
allowed <- data.frame(char = c("0", "H", "K", "M", "B"), 
              val = c(10^0, 10^2, 10^3, 10^6, 10^9))
stormData$PROPDMGEXP[stormData$PROPDMGEXP %nin% allowed$char] <- "0"
## seq_len() visits every row of `allowed`, so each exponent code is applied
for (i in seq_len(nrow(allowed))) {
    rows <- stormData$PROPDMGEXP == allowed$char[i]
    stormData$PROPDMG[rows] <- stormData$PROPDMG[rows] * allowed$val[i]
}
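
The same scaling can also be written without a loop by using a named lookup vector. The sketch below runs on a few made-up rows; `toy`, `multiplier`, `mult`, and `DAMAGE` are illustrative names that do not appear in the analysis, and unrecognised codes are treated as 10^0, following the assumption stated in the comments above.

```r
# Made-up rows: amounts with exponent codes, including a "dirty" one ("?").
toy <- data.frame(PROPDMG = c(25, 2.5, 1, 7),
                  PROPDMGEXP = c("K", "M", "b", "?"),
                  stringsAsFactors = FALSE)

# Named lookup: exponent code -> multiplier.
multiplier <- c("0" = 1, H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Look up each code; codes missing from the table give NA, which is mapped to 1.
mult <- unname(multiplier[toupper(toy$PROPDMGEXP)])
mult[is.na(mult)] <- 1

# Store the scaled amount in a separate column so the raw value is kept.
toy$DAMAGE <- toy$PROPDMG * mult
print(toy)
```

Keeping the scaled value in its own column leaves the raw `PROPDMG` intact, which makes the step safe to re-run.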

## Plotting
## Plot1 - Across the United States, which types of events (as indicated in the 
## EVTYPE variable) are most harmful with respect to population health?
plot1data <-  aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, stormData, sum)
#top n events 
df1 <- droplevels(head(plot1data[order(plot1data$INJURIES, decreasing=TRUE),], n = 6))
df2 <- droplevels(head(plot1data[order(plot1data$FATALITIES, decreasing=TRUE),], n = 6))
plot1.1 <- ggplot(df1, 
        aes(reorder(EVTYPE, -INJURIES), 
            INJURIES)) + geom_bar(stat = "identity") + xlab("EVENT TYPE")
plot1.2 <- ggplot(df2, 
        aes(reorder(EVTYPE, -FATALITIES), 
            FATALITIES)) + geom_bar(stat = "identity") + xlab("EVENT TYPE")

## Plot2 - Across the United States, which types of events 
## have the greatest economic consequences?
plot2data <-  aggregate(PROPDMG ~ EVTYPE, stormData, sum)
df3 <- droplevels(head(plot2data[order(plot2data$PROPDMG, decreasing=TRUE),], n = 6))
plot2 <- ggplot(df3, aes(reorder(EVTYPE, -PROPDMG), PROPDMG)) + 
geom_bar(aes(fill = EVTYPE), stat = "identity") + xlab("EVENT TYPE") + 
labs(title = "Most Harmful Events By Economic Consequences")
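
To make the summarise-then-rank step concrete, here is a small sketch of the same `aggregate` / `order` / `head` pattern on made-up data. The `toy` rows and the `topN` helper are hypothetical and introduced only for illustration.

```r
# Made-up observations: one row per recorded event.
toy <- data.frame(
  EVTYPE   = c("FLOOD", "TORNADO", "FLOOD", "HAIL", "TORNADO", "HEAT"),
  INJURIES = c(3, 10, 4, 0, 6, 2),
  stringsAsFactors = FALSE
)

# Sum injuries within each event type.
totals <- aggregate(INJURIES ~ EVTYPE, toy, sum)

# Hypothetical helper: keep the n rows with the largest value in `column`.
topN <- function(df, column, n) {
  head(df[order(df[[column]], decreasing = TRUE), ], n = n)
}

top2 <- topN(totals, "INJURIES", 2)
print(top2)
```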

Results

## Recent gridExtra versions take the title as `top` (older ones used `main`)
grid.arrange(plot1.1, plot1.2, nrow = 2, 
         top = "Events Most Harmful To Population Health In The US")


The maximum number of injuries for a single event type is 9.1346 × 10^4, while the mean is 142.668. The maximum number of fatalities is 5633.

plot2


From the economic perspective, the most damage is done by FLOOD, with property damage totalling 1.225 × 10^11 US dollars. The mean value is 2.8006 × 10^8 US dollars.
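
Maxima and means like those quoted in this section can be obtained from per-event-type totals. The sketch below assumes the mean is taken across event types, which the text does not state explicitly; the `totals` values are invented and are not taken from the storm data.

```r
# Invented per-event-type totals (not real storm figures).
totals <- data.frame(EVTYPE = c("A", "B", "C"),
                     PROPDMG = c(120000, 30, 6),
                     stringsAsFactors = FALSE)

# Largest total, the event type it belongs to, and the mean across types.
maxVal  <- max(totals$PROPDMG)
worst   <- totals$EVTYPE[which.max(totals$PROPDMG)]
meanVal <- mean(totals$PROPDMG)

# Scientific notation keeps very large totals readable in running text.
cat(worst, format(maxVal, scientific = TRUE, digits = 5), "\n")
cat("mean:", format(meanVal, scientific = TRUE, digits = 5), "\n")
```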