Reproducible Research: Peer Assessment 2

Synopsis

In this document, we analyse NOAA storm data to quantify the impact of adverse weather on human health and property. We focus on events that caused fatalities or injuries, and on events that caused property or crop damage.

From the data and the plots, we infer that tornadoes have had the greatest adverse effect on human health: they have caused over 5,000 deaths and more than 90,000 injuries.

In terms of economic damage, floods are the leading cause of property damage, at over $150B. Drought and excessive heat, however, have been the leading cause of agricultural damage in the US.

Data Processing

The following sections load the data and prepare it for the health and economic assessments.

Data Loading and Big Picture of Data

suppressMessages(library(dplyr))
suppressMessages(library(ggplot2))
suppressMessages(library(grid))
suppressMessages(library(gridExtra))
suppressMessages(library(scales))

stormData <- read.csv("repdata-data-StormData.csv", as.is = TRUE)
str(stormData)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

Data preparation

In this section, we weed out data elements that are not directly relevant to assessing the impact on health and the economy. As the structure of the data set shows, many elements are unrelated to our assessment.

# Identify the fields needed for our assessment so that we can reduce the data during analysis
reqColumns <- c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Subset the data to include only the essential data elements
essentialStormData <- stormData[, reqColumns]
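
BGN_DATE is retained above but is stored as text. If the analysis were ever restricted to a range of years, the year could be derived roughly as follows. This is a minimal sketch on made-up sample strings in the same layout as the str() output; `sampleDates`, `parsedDates` and `eventYear` are illustrative names that do not appear in the original analysis.

```r
# Hypothetical sample values in the same "m/d/Y H:M:S" layout as BGN_DATE
sampleDates <- c("4/18/1950 0:00:00", "2/20/1951 0:00:00", "6/8/1951 0:00:00")

# as.Date() reads only as far as the format requires, so the time part is ignored
parsedDates <- as.Date(sampleDates, format = "%m/%d/%Y")

# Extract the calendar year as an integer
eventYear <- as.integer(format(parsedDates, "%Y"))
print(eventYear)
```

A logical condition such as `eventYear >= 1995` could then be used to subset rows, if a narrower time window were wanted.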

Consolidate Event Types

There are over 900 event types listed in the dataset, a great many of which are either variants of one another or have minimal impact in terms of economic or health damage. Therefore, we consolidate the event types into a manageable set of 10.

Event types that match none of the patterns below are categorized as “Other”.

# Collect the unique event types; length(eventTypes) gives their number
eventTypes <- unique(stormData$EVTYPE)

# Create a new variable to represent the new set of event types and set to "Other"
essentialStormData$disasterType <- "Other"

# Search for approximate matches and assign the consolidated type;
# the assignments run in order, so a later match overrides an earlier one
essentialStormData[grepl("flood|surf|blow-out|swells|fld|dam break|tsunami", 
                         essentialStormData$EVTYPE, ignore.case = TRUE), "disasterType"] <- "Floods"
essentialStormData[grepl("storm|hurricane|typhoon", 
                         essentialStormData$EVTYPE, ignore.case = TRUE), "disasterType"] <- "Hurricanes & Storms"
essentialStormData[grepl("tornado|spout|funnel|whirlwind", 
                         essentialStormData$EVTYPE, ignore.case = TRUE), "disasterType"] <- "Tornadoes"
essentialStormData[grepl("heat|warmth|warm|dry|hot|drought|record high|record temperature|temperature record", 
                         essentialStormData$EVTYPE, ignore.case = TRUE), "disasterType"] <- "Drought & Heat"
essentialStormData[grepl("cold|cool|ice|icy|frost|freeze|snow|winter|blizzard|chill|freezing|avalanche|glaze|sleet|wintry|wintery", 
                         essentialStormData$EVTYPE, ignore.case = TRUE), "disasterType"] <- "Freeze"
essentialStormData[grepl("tstm wind|tstm|thunderstorm|lightning|thunder|hail", 
                         essentialStormData$EVTYPE, ignore.case = TRUE), "disasterType"] <- "Thunder & Lightning"
essentialStormData[grepl("fire|smoke|volcanic", 
                         essentialStormData$EVTYPE, ignore.case = TRUE), "disasterType"] <- "Fire"
essentialStormData[grepl("erosion|slide|slump", 
                         essentialStormData$EVTYPE, ignore.case = TRUE), "disasterType"] <- "Erosion"
essentialStormData[grepl("dust|saharan|wind|wnd", 
                         essentialStormData$EVTYPE, ignore.case = TRUE), "disasterType"] <- "Winds"

# Convert the new event type to a factor
essentialStormData$disasterType <- as.factor(essentialStormData$disasterType)
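
Because the grepl() assignments above run one after another, an event type that matches several patterns ends up with the label of the last matching one. The sketch below illustrates this on a handful of made-up event names with a reduced rule set; `classifyEvents`, `rules` and `sampleTypes` are hypothetical names introduced only for illustration, and the rule table is not the full list used above.

```r
# Hypothetical helper that applies (pattern, label) rules in order;
# a later matching rule overwrites an earlier one, as in the assignments above
classifyEvents <- function(evtype, rules) {
    label <- rep("Other", length(evtype))
    for (i in seq_len(nrow(rules))) {
        hit <- grepl(rules$pattern[i], evtype, ignore.case = TRUE)
        label[hit] <- rules$label[i]
    }
    label
}

# A reduced rule set taken from the patterns above (not the full list)
rules <- data.frame(
    pattern = c("flood|fld", "storm|hurricane|typhoon",
                "tornado|spout|funnel", "dust|saharan|wind|wnd"),
    label   = c("Floods", "Hurricanes & Storms", "Tornadoes", "Winds"),
    stringsAsFactors = FALSE
)

# Made-up event names; "THUNDERSTORM WIND" matches both "storm" and "wind"
sampleTypes <- c("FLASH FLOOD", "TORNADO", "THUNDERSTORM WIND", "HURRICANE", "FOG")
classified <- classifyEvents(sampleTypes, rules)
print(data.frame(sampleTypes, classified))
```

In this sketch "THUNDERSTORM WIND" is labelled "Winds" rather than "Hurricanes & Storms", because the wind rule is applied last. The order of the rules is therefore part of the classification.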

Subset desired data

Because we are assessing only the impact on health and the economy, we can further subset the dataset to those rows where the impact is non-zero.

# Keep only the rows where fatalities or injuries are non-zero
healthEssentialStormData <- subset(essentialStormData, !(FATALITIES == 0 & INJURIES == 0))

# Keep only the rows where property damage or crop damage is non-zero
economyEssentialStormData <- subset(essentialStormData, !(PROPDMG == 0 & CROPDMG == 0)) 
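
The filter condition `!(A == 0 & B == 0)` drops a row only when both measures are zero. A small sketch on made-up rows shows which rows survive, and that the condition is equivalent to "either measure is non-zero"; the `toy` data frame and its values are invented for illustration.

```r
# Made-up rows: only the first has no health impact at all
toy <- data.frame(
    EVTYPE     = c("HAIL", "TORNADO", "HEAT", "FLOOD"),
    FATALITIES = c(0, 1, 0, 2),
    INJURIES   = c(0, 0, 5, 3),
    stringsAsFactors = FALSE
)

# Same condition as above: drop rows where both counts are zero
kept <- subset(toy, !(FATALITIES == 0 & INJURIES == 0))

# Equivalent form by De Morgan's law: keep rows where either count is non-zero
keptAlt <- subset(toy, FATALITIES != 0 | INJURIES != 0)

print(kept$EVTYPE)
```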

Creating a uniform property and crop damage representation

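PROPDMGEXP and CROPDMGEXP are character codes (the file was read with as.is = TRUE) that indicate the multiplier to apply to PROPDMG and CROPDMG to obtain the actual damage value, for example "K" for thousands, "M" for millions and "B" for billions. We first translate these codes into numeric multipliers, stored as PROPEXP and CROPEXP, and then store the multiplied values as PROPDMGVAL and CROPDMGVAL. Any code that is not mapped below leaves the multiplier, and therefore the damage value, as NA.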

# Map the property damage exponent codes to numeric multipliers
economyEssentialStormData$PROPEXP[economyEssentialStormData$PROPDMGEXP == ""] <- 1
economyEssentialStormData$PROPEXP[economyEssentialStormData$PROPDMGEXP == "B"] <- 1e+09
economyEssentialStormData$PROPEXP[economyEssentialStormData$PROPDMGEXP == "H"] <- 100
economyEssentialStormData$PROPEXP[economyEssentialStormData$PROPDMGEXP == "K"] <- 1000
economyEssentialStormData$PROPEXP[economyEssentialStormData$PROPDMGEXP == "M"] <- 1e+06
economyEssentialStormData$PROPEXP[economyEssentialStormData$PROPDMGEXP == "h"] <- 100
economyEssentialStormData$PROPEXP[economyEssentialStormData$PROPDMGEXP == "m"] <- 1e+06
economyEssentialStormData$PROPEXP[economyEssentialStormData$PROPDMGEXP == "0"] <- 1
economyEssentialStormData$PROPEXP[economyEssentialStormData$PROPDMGEXP == "1"] <- 10
economyEssentialStormData$PROPEXP[economyEssentialStormData$PROPDMGEXP == "2"] <- 100
economyEssentialStormData$PROPEXP[economyEssentialStormData$PROPDMGEXP == "3"] <- 1000
economyEssentialStormData$PROPEXP[economyEssentialStormData$PROPDMGEXP == "4"] <- 10000
economyEssentialStormData$PROPEXP[economyEssentialStormData$PROPDMGEXP == "5"] <- 1e+05
economyEssentialStormData$PROPEXP[economyEssentialStormData$PROPDMGEXP == "6"] <- 1e+06
economyEssentialStormData$PROPEXP[economyEssentialStormData$PROPDMGEXP == "7"] <- 1e+07
economyEssentialStormData$PROPEXP[economyEssentialStormData$PROPDMGEXP == "8"] <- 1e+08

# Map the crop damage exponent codes to numeric multipliers
economyEssentialStormData$CROPEXP[economyEssentialStormData$CROPDMGEXP == "B"] <- 1e+09
economyEssentialStormData$CROPEXP[economyEssentialStormData$CROPDMGEXP == "K"] <- 1000
economyEssentialStormData$CROPEXP[economyEssentialStormData$CROPDMGEXP == "M"] <- 1e+06
economyEssentialStormData$CROPEXP[economyEssentialStormData$CROPDMGEXP == "k"] <- 1000
economyEssentialStormData$CROPEXP[economyEssentialStormData$CROPDMGEXP == "m"] <- 1e+06
economyEssentialStormData$CROPEXP[economyEssentialStormData$CROPDMGEXP == "0"] <- 1
economyEssentialStormData$CROPEXP[economyEssentialStormData$CROPDMGEXP == "2"] <- 100
economyEssentialStormData$CROPEXP[economyEssentialStormData$CROPDMGEXP == ""] <- 1
economyEssentialStormData$CROPEXP[economyEssentialStormData$CROPDMGEXP == "?"] <- 0

# Create new consolidated variables for Damage Values
economyEssentialStormData$PROPDMGVAL <- economyEssentialStormData$PROPDMG * economyEssentialStormData$PROPEXP
economyEssentialStormData$CROPDMGVAL <- economyEssentialStormData$CROPDMG * economyEssentialStormData$CROPEXP
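
The one-assignment-per-code mapping above can also be expressed as a lookup. The following is a minimal sketch under stated assumptions, not the method used in this report: `multipliers` and `toMultiplier` are hypothetical names, letter codes are treated case-insensitively (which the assignments above do only for some letters), digits 0 to 8 are read as powers of ten, and every other code, including "?", is left as NA.

```r
# Multipliers for the letter codes handled above; anything not listed maps to NA
multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: translate exponent codes into numeric multipliers
toMultiplier <- function(expCode) {
    code <- toupper(expCode)                  # assumption: ignore letter case
    out <- unname(multipliers[code])          # letter codes via named lookup
    digit <- grepl("^[0-8]$", code)
    out[digit] <- 10^as.numeric(code[digit])  # digit codes are powers of ten
    out[code == ""] <- 1                      # blank means no scaling
    out
}

# Made-up amounts and codes
sampleCodes <- c("K", "m", "", "3", "B", "+")
sampleMult <- toMultiplier(sampleCodes)
sampleDamage <- c(25, 2.5, 10, 4, 1, 7) * sampleMult
print(data.frame(sampleCodes, sampleMult, sampleDamage))
```

A lookup keeps the code-to-multiplier table in one place, which makes it easier to see which codes are deliberately left unmapped.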

Aggregate data sets for reporting

We create health- and economy-related aggregate datasets for plotting.

# Aggregate health related data; injuries and fatalities by disaster type
injuries <- healthEssentialStormData %>% group_by(disasterType) %>% summarise(total = sum(INJURIES)) %>% arrange(desc(total))
fatalities <- healthEssentialStormData %>% group_by(disasterType) %>% summarise(total = sum(FATALITIES)) %>% arrange(desc(total))

# Aggregate economy related data; property and crop damage by disaster types
propDamage <- aggregate(PROPDMGVAL ~ disasterType, economyEssentialStormData, sum) %>% arrange(desc(PROPDMGVAL))
cropDamage <- aggregate(CROPDMGVAL ~ disasterType, economyEssentialStormData, sum) %>% arrange(desc(CROPDMGVAL))
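
One detail of the formula interface of aggregate() matters here: rows whose value is NA are dropped before summing (the default na.action is na.omit), so damage values left as NA by an unmapped exponent code do not turn a whole group total into NA. A minimal sketch on made-up numbers, using only base R; `toyDamage` and `totals` are illustrative names.

```r
# Made-up damage values; one Floods row has an NA value
toyDamage <- data.frame(
    disasterType = c("Floods", "Tornadoes", "Floods", "Fire", "Tornadoes"),
    PROPDMGVAL   = c(2e6, 5e5, NA, 1e5, 2.5e5),
    stringsAsFactors = FALSE
)

# The formula interface drops rows with NA before summing (na.action = na.omit)
totals <- aggregate(PROPDMGVAL ~ disasterType, toyDamage, sum)

# Sort from the largest to the smallest total
totals <- totals[order(-totals$PROPDMGVAL), ]
print(totals)
```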

Results

The following sub-sections use plots to detail the impact of adverse weather on human health and the economy.

Impact to human health

The two leading causes of weather-related fatalities in the US are tornadoes and excessive heat.

# Bar chart of total injuries per disaster type
# (no date filter is applied above, so the data start in 1950)
injuriesPlot <- ggplot(injuries, aes(x = disasterType, y = total)) + 
    geom_col() + 
    scale_y_continuous("Number of Injuries") + 
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) + 
    xlab("Severe Weather Type") + 
    ggtitle("Total Injuries by Severe Weather\n Events in the U.S.\n from 1950 - 2011")

# Bar chart of total fatalities per disaster type
fatalitiesPlot <- ggplot(fatalities, aes(x = disasterType, y = total)) + 
    geom_col() + 
    scale_y_continuous("Number of Fatalities") + 
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) + 
    xlab("Severe Weather Type") + 
    ggtitle("Total Fatalities by Severe Weather\n Events in the U.S.\n from 1950 - 2011")

grid.arrange(fatalitiesPlot, injuriesPlot, ncol = 2)
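
Sorting the rows of the aggregate tables does not change the order of the bars, because ggplot2 places a factor on the axis in the order of its levels, which is alphabetical by default. If bars ranked by total were preferred, the factor could be re-levelled first. The sketch below shows the idea with base R on made-up totals; `toyTotals`, `levelsBefore` and `levelsAfter` are illustrative names.

```r
# Made-up totals, already sorted from largest to smallest
toyTotals <- data.frame(
    disasterType = factor(c("Tornadoes", "Floods", "Fire")),
    total        = c(900, 300, 50)
)

# Default levels are alphabetical, regardless of the row order
levelsBefore <- levels(toyTotals$disasterType)

# Re-level the factor by descending total so that bars would follow the ranking
toyTotals$disasterType <- reorder(toyTotals$disasterType, -toyTotals$total)
levelsAfter <- levels(toyTotals$disasterType)

print(levelsBefore)
print(levelsAfter)
```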

Impact to economy

Most property damage in the US is caused by floods and storms. Significant agricultural damage, on the other hand, is caused by several kinds of event, including drought, floods, excessive cold and storms.

# Bar chart of property damage per disaster type, in billions of USD
propDamagePlot <- ggplot(propDamage, aes(x = disasterType, y = PROPDMGVAL / 1e+09)) + 
    geom_col() + 
    scale_y_continuous("Damage in Billions USD") + 
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) + 
    xlab("Severe Weather Type") + 
    ggtitle("Property Damage by Severe Weather\n Events in the U.S.\n from 1950 - 2011")

# Bar chart of crop damage per disaster type, in billions of USD
cropDamagePlot <- ggplot(cropDamage, aes(x = disasterType, y = CROPDMGVAL / 1e+09)) + 
    geom_col() + 
    scale_y_continuous("Damage in Billions USD") + 
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) + 
    xlab("Severe Weather Type") + 
    ggtitle("Crop Damage by Severe Weather\n Events in the U.S.\n from 1950 - 2011")

grid.arrange(propDamagePlot, cropDamagePlot, ncol = 2)