Impact of weather events in the United States

Synopsis

This report is produced in partial fulfillment of the requirements for the Reproducible Research Course offered by John Hopkins Bloomberg School of Public Health and Coursera.

This report describes processing and analytic steps performed on the Storm Events Database provided by the National Climatic Data Center and strives to provide analysis on the impact of weather events to the United States in terms of population health and economic consequences.

This report concludes with displaying the top 10 events that cause the greatest casualties and greatest monetary damage.

Data Processing

Reading data

  1. Raw data is downloaded and read into R
## Download and read raw data
url <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url, destfile="stormdata.csv.bz2")
raw <- read.csv(bzfile("stormdata.csv.bz2"), header=TRUE)

Normalizing and selecting data

1. Column names are normalized using the make.names function and then converted to lowercase. Dates are normalized to R Date() objects 2. Required columns are then selected as part of this research report 3. Based on the National Climatic Data Center’s descriptions of the event types as described in source [1], only data recorded after 01 January 1996 is taken into account. This is done to correct potential bias of data to Tornados, Thunderstorm Wind and Hail.

## Normalization of names and required date
names(raw) <- tolower(make.names(names(raw)))
raw$bgn_date <- as.Date(raw$bgn_date, format="%m/%d/%Y")

## Extract required time frame and data variables
data <- subset(raw, bgn_date >= as.Date("1996-01-01"), select=c(bgn_date, countyname, state, evtype, fatalities, injuries, propdmg, propdmgexp, cropdmg, cropdmgexp))

4. Derived values for combined population casualties and combined economic damage are then calculated.
5.Combined population casualties is calculated by summing fatalities and injuries.
6.Combined economic damage is calculated by summing property damage and crop damage. Units interpretation is based on source [2]

## Calculate derived values
data$propdmgexpvalue[data$propdmgexp == "K"] <- 1e3
data$propdmgexpvalue[data$propdmgexp == "M"] <- 1e6
data$propdmgexpvalue[data$propdmgexp == "B"] <- 1e9
data$propdmgexpvalue[data$propdmgexp == "0" | data$propdmgexp == ""] <- 1
data$cropdmgexpvalue[data$cropdmgexp == "K"] <- 1e3
data$cropdmgexpvalue[data$cropdmgexp == "M"] <- 1e6
data$cropdmgexpvalue[data$cropdmgexp == "B"] <- 1e9
data$cropdmgexpvalue[data$cropdmgexp == "0" | data$cropdmgexp == ""] <- 1

## calculated combined property and crop damage
data$combineddamage <- data$propdmg*data$propdmgexpvalue+data$cropdmg*data$cropdmgexpvalue

Harmonizing Event Types

1. It is observed that there are much more event types (see output below) in the data than the described 48 official event types as per source [3].

length(unique(data$evtype))
## [1] 516

2.Therefore the event type values are harmonized to fit with the events within the official 48 types as stipulated from Directive 10-1605. Event names are also normalized to lowercase.

## harmonization
data$evtype <- tolower(data$evtype)
data$evtype[grep("*marine thunderstorm wind*", data$evtype)] <- "marine thunderstorm wind"
data$evtype[grep("*extreme cold*", data$evtype)] <- "extreme cold/wind chill"
data$evtype[grep("*extreme wind chill*", data$evtype)] <- "extreme cold/wind chill"
data$evtype[grep("*astronomical low tide*", data$evtype)] <- "astronomical low tide"
data$evtype[grep("*hurricane*", data$evtype)] <- "hurricane (typhoon)"
data$evtype[grep("*typhoon*", data$evtype)] <- "hurricane (typhoon)"
data$evtype[grep("*tropical depression*", data$evtype)] <- "tropical depression"
data$evtype[grep("*marine strong wind*", data$evtype)] <- "marine strong wind"
data$evtype[grep("*thunderstorm wind*", data$evtype)] <- "thunderstorm wind"
data$evtype[grep("*lake-effect snow*", data$evtype)] <- "lake-effect snow"
data$evtype[grep("*marine high wind*", data$evtype)] <- "marine high wind"
data$evtype[grep("*storm surge/tide*", data$evtype)] <- "storm surge/tide"
data$evtype[grep("*storm surge*", data$evtype)] <- "storm surge/tide"
data$evtype[grep("*tide*", data$evtype)] <- "tide"
data$evtype[grep("^(!extreme)*cold*", data$evtype)] <- "cold/wind chill"
data$evtype[grep("^(!extreme)*wind chill*", data$evtype)] <- "cold/wind chill"
data$evtype[grep("*lakeshore flood*", data$evtype)] <- "lakeshore flood"
data$evtype[grep("*excessive heat*", data$evtype)] <- "excessive heat"
data$evtype[grep("*tropical storm*", data$evtype)] <- "tropical storm"
data$evtype[grep("*winter weather*", data$evtype)] <- "winter weather"
data$evtype[grep("*coastal flood*", data$evtype)] <- "coastal flood"
data$evtype[grep("*frost*", data$evtype)] <- "frost/freeze"
data$evtype[grep("*freeze*", data$evtype)] <- "frost/freeze"
data$evtype[grep("*funnel cloud*", data$evtype)] <- "funnel cloud"
data$evtype[grep("*freezing fog*", data$evtype)] <- "freezing fog"
data$evtype[grep("*volcanic ash*", data$evtype)] <- "volcanic ash"
data$evtype[grep("*winter storm*", data$evtype)] <- "winter storm"
data$evtype[grep("*debris flow*", data$evtype)] <- "debris flow"
data$evtype[grep("*dense smoke*", data$evtype)] <- "dense smoke"
data$evtype[grep("*flash flood*", data$evtype)] <- "flash flood"
data$evtype[grep("*marine hail*", data$evtype)] <- "marine hail"
data$evtype[grep("*rip current*", data$evtype)] <- "rip current"
data$evtype[grep("*strong wind*", data$evtype)] <- "strong wind"
data$evtype[grep("*dust devil*", data$evtype)] <- "dust devil"
data$evtype[grep("*dust storm*", data$evtype)] <- "dust storm"
data$evtype[grep("*heavy rain*", data$evtype)] <- "heavy rain"
data$evtype[grep("*heavy snow*", data$evtype)] <- "heavy snow"
data$evtype[grep("*waterspout*", data$evtype)] <- "waterspout"
data$evtype[grep("*astronomical low tide*", data$evtype)] <- "astronomical low tide"
data$evtype[grep("*avalanche*", data$evtype)] <- "avalanche"
data$evtype[grep("*dense fog*", data$evtype)] <- "dense fog"
data$evtype[grep("*high surf*", data$evtype)] <- "high surf"
data$evtype[grep("*high wind*", data$evtype)] <- "high wind"
data$evtype[grep("*ice storm*", data$evtype)] <- "ice storm"
data$evtype[grep("*lightning*", data$evtype)] <- "lightning"
data$evtype[grep("*blizzard*", data$evtype)] <- "blizzard"
data$evtype[grep("*wildfire*", data$evtype)] <- "wildfire"
data$evtype[grep("*drought*", data$evtype)] <- "drought"
data$evtype[grep("*tornado*", data$evtype)] <- "tornado"
data$evtype[grep("*tsunami*", data$evtype)] <- "tsunami"
data$evtype[grep("*seiche*", data$evtype)] <- "seiche"
data$evtype[grep("^(!lakeshore)*flood*", data$evtype)] <- "flood"
data$evtype[grep("*sleet*", data$evtype)] <- "sleet"
data$evtype[grep("*hail*", data$evtype)] <- "hail"
data$evtype[grep("^(!excessive)*heat*", data$evtype)] <- "heat"

Dataset for analysis

1. Population casualties and economic damage are tabulated for each recorded event type using the plyr package. 2. The data is then sorted by population casualties and economic damage respectively in descending order. 3. The top 10 events are then selected for the aforementioned areas of interest.

library(plyr)
populationData <- ddply(data, .(evtype), summarize, casualties=sum(fatalities, injuries))
populationData <- populationData[order(-populationData$casualties),]

## Select top 10
populationTop10 <- populationData[1:11,]
names(populationTop10) <- c("events", "casualties")
economyData <- ddply(data, .(evtype), summarize, monetarydamage=sum(combineddamage))
economyData <- economyData[order(-economyData$monetarydamage),]

## Select top 10
economyTop10 <- economyData[1:10,]
names(economyTop10) <- c("events", "monetarydamage")

Results

Motivation to use the “Top 10” approach

Firstly, the idea of a “Top 10” is familiar with most people, therefore making the information easy to interpret.

Secondly, the top 10 events for each area of interest covers more than 80% of the casualties caused and more than 90% of monetary damages caused as shown below.

Top 10 Coverage for Population Casualty coverage:

sum(populationTop10$casualties)/sum(populationData$casualties)
## [1] 0.8589

Top 10 Coverage for monetary damage coverage:

sum(economyTop10$monetarydamage)/sum(economyData$monetarydamage)
## [1] 0.9368

Which types of events are most harmful to population health?

It is noted that amongst the top 10 events, there is an event “tstm wind” that is not part of the official 48 events described.

In the absence of any official description, it is recommended not to be considered until further clarification. In light of this, the top eleventh event (high wind) is thus recommended to be considered.

Both events “tstm wind” and “high wind” are nevertheless still shown in the top 10 to reflect this finding.

# hist(populationTop10$casualties)
library(ggplot2)
popplot <- ggplot(populationTop10, aes(x=events, y=casualties)) + geom_point() + coord_flip()
popplot <- popplot + ggtitle("Top 10 population casualties by event")
popplot

plot of chunk populationPlot

populationTop10$rank <- 1:11
names(populationTop10) <- c("Event", "Number of Casualties", "Rank")
populationTop10
##                   Event Number of Casualties Rank
## 266             tornado                22178    1
## 47       excessive heat                 8188    2
## 60                flood                 7172    3
## 110           lightning                 4792    4
## 271           tstm wind                 3870    5
## 73                 heat                 2710    6
## 59          flash flood                 2561    7
## 263   thunderstorm wind                 1567    8
## 331        winter storm                 1483    9
## 83  hurricane (typhoon)                 1453   10
## 79            high wind                 1320   11

Which types of events have the greatest economic consequences?

## format monetary damage into billions of dollars
economyTop10$monetarydamage <- format(economyTop10$monetarydamage/1e9, scientific=FALSE, big.mark=",")

# hist(economyTop10$monetarydamage)
econplot <- ggplot(economyTop10, aes(x=events, y=monetarydamage)) + geom_point() + coord_flip()
econplot <- econplot + ggtitle("Top 10 monetary damage (Billions of US Dollars) by event")
econplot

plot of chunk economyPlot

economyTop10$rank <- 1:10
names(economyTop10) <- c("Events", "Billions of US Dollars", "Rank")
economyTop10
##                  Events Billions of US Dollars Rank
## 60                flood                148.920    1
## 83  hurricane (typhoon)                 87.069    2
## 265                tide                 47.845    3
## 266             tornado                 24.900    4
## 71                 hail                 17.201    5
## 59          flash flood                 16.557    6
## 32              drought                 14.414    7
## 269      tropical storm                  8.320    8
## 79            high wind                  5.883    9
## 323            wildfire                  5.054   10

Environment Information

sessionInfo()
## R version 3.1.1 (2014-07-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggplot2_1.0.0 plyr_1.8.1   
## 
## loaded via a namespace (and not attached):
##  [1] colorspace_1.2-4 digest_0.6.4     evaluate_0.5.5   formatR_0.10    
##  [5] grid_3.1.1       gtable_0.1.2     htmltools_0.2.4  knitr_1.6       
##  [9] labeling_0.2     MASS_7.3-33      munsell_0.4.2    proto_0.3-10    
## [13] Rcpp_0.11.2      reshape2_1.4     rmarkdown_0.2.49 scales_0.2.4    
## [17] stringr_0.6.2    tools_3.1.1