Severe Weather Events in the United States

Synopsis

This analysis concludes the Coursera Reproducible Research course, part of the Data Science Specialization. The goal of the assignment is to explore the NOAA Storm Database and assess the effects of severe weather events on both population and economy.

The database covers the time period between 1950 and November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

The analysis investigates which types of severe weather events are most harmful to population health, measured by injuries and fatalities. It also examines the economic consequences by exploring the financial damage done to both general property and agriculture (i.e., crops).

Data Processing

Documentation for the database is also available. It explains how some of the variables are constructed and defined:

National Weather Service Storm Data Documentation

National Climatic Data Center Storm Events FAQ

Prior to analysis, this code chunk cleans up the R environment and loads packages that I’ll use:

rm(list=ls())

setwd("C:\\Users\\Joe\\Documents\\R_data")

packages <- c("lubridate", "downloader", "plyr", "dplyr", "tidyr", "stringr", "ggplot2", "rmarkdown", "knitr")

sapply(packages, require, character.only = TRUE, quietly = TRUE)
##  lubridate downloader       plyr      dplyr      tidyr    stringr 
##       TRUE       TRUE       TRUE       TRUE       TRUE       TRUE 
##    ggplot2  rmarkdown      knitr 
##       TRUE       TRUE       TRUE
rm(packages)

And this code chunk downloads and loads the NOAA storm data. Note that I set ‘cache=TRUE’ because knitting is otherwise very slow.

URL <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destfile <- "stormData.csv.bz2"

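# Download only if the file is not already present; mode = "wb" keeps the
# compressed file intact on Windows
if(!file.exists(destfile)){
        download.file(URL, destfile, mode = "wb")
} 

# read.csv decompresses .bz2 files transparently
sData_complete <- read.csv(destfile)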

rm(URL, destfile)

Since this analysis focuses only on the health and economic consequences of severe weather events, I only keep the relevant variables (EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) and drop the rest.

stormData <- sData_complete[,c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
rm(sData_complete)

Results

The analysis focuses on two questions:

First, “across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?”

Next, “across the United States, which types of events have the greatest economic consequences?”

Weather Events and Fatalities

To answer the first question, I look at the relationship between event type (tornado, flood, etc.) and fatalities as well as the relationship between event and injuries.

fatal <- aggregate (FATALITIES~EVTYPE, stormData, sum)
fatal <- fatal [order(fatal$FATALITIES, decreasing=TRUE),]
barplot(height = fatal$FATALITIES[1:10], names.arg = fatal$EVTYPE[1:10], las = 2, cex.names= 0.65,
         col = rainbow (10, start=0, end=0.5))
title (main = "Fatalities: Top 10 Events")
title (ylab = "Total number of Fatalities")

The bar plot shows that tornados are the deadliest severe weather event in the US over the period covered. Excessive heat and flash flooding are the 2nd and 3rd deadliest events.
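The aggregate-then-sort pattern used for the fatalities plot is easier to follow on a tiny table. The sketch below uses made-up numbers (the `toy_fatal` data frame is purely illustrative and not taken from the NOAA data) to show how `aggregate` sums per event type and `order` ranks the totals.

```r
# Illustrative data only: these counts are invented for the example.
toy_fatal <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 5, 2, 0),
  stringsAsFactors = FALSE
)

# Sum fatalities within each event type.
toy_totals <- aggregate(FATALITIES ~ EVTYPE, toy_fatal, sum)

# Rank event types from most to fewest fatalities.
toy_totals <- toy_totals[order(toy_totals$FATALITIES, decreasing = TRUE), ]

print(toy_totals)
```

Taking the first rows of the sorted result (for example `toy_totals[1:2, ]`) gives the "top N" events, which is what the `[1:10]` indexing does in the plotting call.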

Weather Events and Injuries

injuries <- aggregate (INJURIES ~ EVTYPE, stormData, sum)
injuries <- injuries [order(injuries$INJURIES, decreasing=TRUE),]
barplot (height = injuries$INJURIES[1:10], names.arg = injuries$EVTYPE[1:10], las = 2, cex.names= 0.65, cex.axis = .75,
         col = rainbow (10, start=0, end=0.5))
title (main = "Injuries: Top 10 Events")
title (ylab = "Total number of Injuries", line = 3)

The bar plot shows that tornados also account for the plurality of injuries in the US. High winds, floods, and excessive heat each account for a large number of injuries as well.
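One way to check a "plurality" claim is to convert the totals into shares of the overall count. The sketch below does this on invented numbers (the `toy_injuries` data frame and its values are assumptions for illustration, not results from the storm data).

```r
# Illustrative data only: these totals are invented for the example.
toy_injuries <- data.frame(
  EVTYPE   = c("TORNADO", "FLOOD", "HEAT", "LIGHTNING"),
  INJURIES = c(40, 30, 20, 10),
  stringsAsFactors = FALSE
)

# Share of all injuries attributed to each event type.
toy_injuries$SHARE <- toy_injuries$INJURIES / sum(toy_injuries$INJURIES)

# The event type with the largest share holds the plurality;
# it is also a majority only if its share exceeds one half.
top_event   <- toy_injuries[which.max(toy_injuries$SHARE), ]
is_majority <- top_event$SHARE > 0.5

print(top_event)
print(is_majority)
```

Applied to the real `injuries` table, the same two lines would show how far ahead the leading event type is of the rest.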

Weather Events and Property Damage

To answer the second question, I need to tidy up the variables estimating property and crop damage. For some unknown and deeply annoying reason, NOAA stores damage estimates in two variables rather than one (e.g., PROPDMG and PROPDMGEXP). The PROPDMG variable indicates the amount and the PROPDMGEXP variable indicates the unit: “K” for thousands, “M” for millions, or “B” for billions. The same is true for the CROPDMG and CROPDMGEXP variables.

The following code chunk changes the values to something I can interpret.

First, I define a basic function to convert the unit into its numeric equivalent (e.g. “K” becomes 1000) and apply the function to both the property and crop damage.

Next, I define two new variables, PROPDMGUSD and CROPDMGUSD, that contain the US dollar amount of weather damages for property and crops and add them together into a comprehensive DAMAGES variable, given in billions of US dollars.

nested_ifelse <- function(x){
        x <- as.character(x)
        ifelse (x == "B", as.numeric(1000000000),
        ifelse(x == "M", as.numeric(1000000), 
        ifelse(x == "K", as.numeric(1000), 0)))
}
stormData$PROPDMGEXP <- toupper(stormData$PROPDMGEXP)
stormData$PROPDMGEXP <- nested_ifelse(stormData$PROPDMGEXP)
stormData$PROPDMGUSD <- as.numeric(stormData$PROPDMG*stormData$PROPDMGEXP)

stormData$CROPDMGEXP <- toupper(stormData$CROPDMGEXP)
stormData$CROPDMGEXP <- nested_ifelse(stormData$CROPDMGEXP)
stormData$CROPDMGUSD <- as.numeric(stormData$CROPDMG*stormData$CROPDMGEXP)

stormData$DAMAGES <- (stormData$PROPDMGUSD + stormData$CROPDMGUSD)/1000000000
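As a cross-check on the conversion, the same unit mapping can be written as a named lookup vector instead of nested `ifelse` calls. This is only an alternative sketch on made-up values: `toy_dmg` and `unit_multiplier` are hypothetical names introduced here, and treating every code other than K, M, or B as 0 simply mirrors what `nested_ifelse` does.

```r
# Illustrative data only: amounts and unit codes are invented for the example.
toy_dmg <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 10),
  PROPDMGEXP = c("K", "m", "B", ""),
  stringsAsFactors = FALSE
)

# Named vector mapping each unit code to its multiplier.
unit_multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up each (upper-cased) code; codes not in the table come back as NA
# and are set to 0, matching the behaviour of nested_ifelse().
mult <- unname(unit_multiplier[toupper(toy_dmg$PROPDMGEXP)])
mult[is.na(mult)] <- 0

# Dollar amount = reported amount times its unit multiplier.
toy_dmg$PROPDMGUSD <- toy_dmg$PROPDMG * mult

print(toy_dmg)
```

For instance, an amount of 25 with unit "K" becomes 25,000 USD, while a row with a blank unit contributes 0. A lookup vector makes it easy to add further codes later if the documentation turns out to define them.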

The following bar plot shows the breakdown of economic damages by event type.

damages <- aggregate (DAMAGES ~ EVTYPE, stormData, sum)
damages <- damages[order(damages$DAMAGES, decreasing=TRUE),]

barplot (height = damages$DAMAGES[1:10], names.arg = damages$EVTYPE[1:10], las = 2, cex.names= 0.65,
         col = rainbow (10, start=0, end=0.5))
title (main = "Economic Damages: Top 10 Events")
title (ylab = "Total Cost of Damages (Billions of USD)")

Based on this bar plot, we see that floods account for substantially more damages than other weather events, followed by hurricanes/typhoons and tornados.