0. Synopsis
Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This analysis addresses two questions, using events recorded from 2000 to 2009 and property damage as the measure of economic impact (crop damage is excluded):
- Which types of events (as indicated in the EVTYPE variable) are most harmful to population health across the United States?
- Which types of events have the greatest economic consequences across the United States?
1. Data Processing
storm_data_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(storm_data_url, destfile = "stormdata.csv.bz2", mode = "wb")
datedl <- date()
storm_data <- read.csv("stormdata.csv.bz2", stringsAsFactors = FALSE)
## keep only the columns used in this analysis
sub_sd <- storm_data[,c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP")]
### date transformations (using data from 2000-2009)
sub_sd$BGN_DATE <- gsub(" .*", "", sub_sd$BGN_DATE)   # drop the time part
sub_sd$BGN_DATE <- as.Date(sub_sd$BGN_DATE, "%m/%d/%Y")
sub_sd <- sub_sd[sub_sd$BGN_DATE >= as.Date("2000-01-01") & sub_sd$BGN_DATE <= as.Date("2009-12-31"), ]
sub_sd$BGN_DATE <- format(sub_sd$BGN_DATE, "%Y")      # keep only the year
### damage transformations; crop damage is not considered in this analysis.
### PROPDMGEXP gives the magnitude of PROPDMG:
### "K" = thousand, "M" = million, "B" = billion; any other code is treated as 1.
dmg_mult <- c(K = 1e3, M = 1e6, B = 1e9)
mult <- unname(dmg_mult[sub_sd$PROPDMGEXP])   # NA for codes not in the lookup
mult[is.na(mult)] <- 1
sub_sd$DMG <- as.numeric(sub_sd$PROPDMG) * mult
sub_sd <- subset(sub_sd, select = -c(PROPDMG, PROPDMGEXP))
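The K/M/B conversion can be checked on a small hand-made data frame. This is a minimal sketch using a named lookup vector; the data frame `toy` and its values are invented for illustration, and only the column names `PROPDMG` and `PROPDMGEXP` come from the Storm Data set.

```r
# Toy rows (invented values) covering each exponent code plus a blank one.
toy <- data.frame(
  PROPDMG    = c(2.5, 10, 1.2, 500),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Named lookup vector: exponent code -> multiplier.
dmg_mult <- c(K = 1e3, M = 1e6, B = 1e9)

# Codes missing from the lookup (here the blank one) come back as NA;
# treat them as a multiplier of 1.
mult <- unname(dmg_mult[toy$PROPDMGEXP])
mult[is.na(mult)] <- 1

toy$DMG <- toy$PROPDMG * mult
print(toy)
```

A lookup keeps the multipliers numeric throughout, so no step depends on how strings compare with numbers.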
The Storm Data file (47 MB) was downloaded on Thu Mar 09 00:49:19 2017.
Documentation on how the variables are constructed and defined:
- The National Weather Service Storm Data Documentation
- National Climatic Data Center Storm Events FAQ
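The date handling in the processing step can also be tried on a few hand-made values. This is a sketch only: the timestamps in `raw` are invented and assume the "month/day/year time" layout that the `BGN_DATE` parsing code expects.

```r
# Invented timestamps in the layout the BGN_DATE parsing code expects.
raw <- c("12/31/1999 0:00:00", "1/1/2000 0:00:00",
         "8/29/2005 0:00:00", "1/1/2010 0:00:00")

# Drop everything from the first space onward, then parse as a Date.
d <- as.Date(gsub(" .*", "", raw), format = "%m/%d/%Y")

# Keep dates inside the closed interval 2000-01-01 .. 2009-12-31.
keep <- d >= as.Date("2000-01-01") & d <= as.Date("2009-12-31")

# Reduce the surviving dates to a four-digit year string.
years <- format(d[keep], "%Y")
print(years)
```

Only the two middle values fall inside the window, so `years` holds "2000" and "2005".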
2. Results
Top single events by property damage, fatalities, and injuries, in that order.
head(sub_sd[order(sub_sd$DMG, decreasing = TRUE),])
##        BGN_DATE            EVTYPE FATALITIES INJURIES       DMG
## 605953     2006             FLOOD          0        0 1.150e+11
## 577676     2005       STORM SURGE          0        0 3.130e+10
## 577675     2005 HURRICANE/TYPHOON          0        0 1.693e+10
## 581535     2005       STORM SURGE          0        0 1.126e+10
## 569308     2005 HURRICANE/TYPHOON          5        0 1.000e+10
## 581533     2005 HURRICANE/TYPHOON          0        0 7.350e+09
head(sub_sd[order(sub_sd$FATALITIES, decreasing = TRUE),])
##        BGN_DATE         EVTYPE FATALITIES INJURIES     DMG
## 598500     2005 EXCESSIVE HEAT         49        0 0.0e+00
## 606363     2006 EXCESSIVE HEAT         46       18 1.7e+05
## 629242     2006 EXCESSIVE HEAT         42        0 0.0e+00
## 785239     2009        TSUNAMI         32      129 8.1e+07
## 565388     2005 EXCESSIVE HEAT         30        0 0.0e+00
## 611803     2006 EXCESSIVE HEAT         24        0 0.0e+00
head(sub_sd[order(sub_sd$INJURIES, decreasing = TRUE),])
##        BGN_DATE            EVTYPE FATALITIES INJURIES      DMG
## 529351     2004 HURRICANE/TYPHOON          7      780 5.42e+09
## 667233     2007    EXCESSIVE HEAT          2      519 0.00e+00
## 625168     2006    EXCESSIVE HEAT          4      437 0.00e+00
## 484801     2002 HURRICANE/TYPHOON          1      316 1.75e+08
## 625173     2006    EXCESSIVE HEAT          3      306 0.00e+00
## 621296     2006              HEAT          0      215 0.00e+00
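The `order()` plus `head()` idiom used for these rankings can be illustrated on a small example. The data frame `events` and all of its values are invented for illustration.

```r
# Invented per-event records; values are for illustration only.
events <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "HEAT", "HAIL"),
  FATALITIES = c(0, 3, 12, 0),
  DMG        = c(5e6, 2e7, 0, 1e5),
  stringsAsFactors = FALSE
)

# order() returns the row positions that sort DMG from largest to smallest;
# indexing with it reorders the rows, and head() keeps the first n of them.
top_dmg <- head(events[order(events$DMG, decreasing = TRUE), ], 2)
print(top_dmg)
```

Reordering keeps the original row names (here 2 and 1), which is why the leftmost column of the listings above shows values such as 605953 instead of 1 to 6.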
Property damage, 2000-2009.
df <- aggregate(cbind(FATALITIES,INJURIES,DMG) ~ EVTYPE, sub_sd, sum)
topdmg <- head(df[order(df$DMG, decreasing = TRUE),], 3)
# barplot() returns the bar midpoints; use them so the labels sit under the bars
bp <- barplot(topdmg$DMG, xaxt = 'n', ylab = "Total Property Damages in USD", xlab = "Weather Event", main = "Top Damages by Extreme Weather Events 2000-2009", cex.axis = 0.7)
axis(1, at = bp, labels = topdmg$EVTYPE, cex.axis = 0.7)

topdmg
##                EVTYPE FATALITIES INJURIES          DMG
## 46              FLOOD        167      178 123879368090
## 80  HURRICANE/TYPHOON         64     1275  69305840000
## 143       STORM SURGE          0        4  43170935000
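The per-event-type totals come from `aggregate()` with `cbind()` on the left-hand side of the formula. A minimal sketch of that step follows; the data frame `events` and its values are invented for illustration.

```r
# Invented records with repeated event types; values are for illustration only.
events <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(1, 2, 0, 3, 5),
  INJURIES   = c(0, 10, 2, 4, 1),
  DMG        = c(1e6, 5e5, 3e6, 2e5, 0),
  stringsAsFactors = FALSE
)

# cbind() on the left-hand side sums several columns at once, grouped by EVTYPE.
totals <- aggregate(cbind(FATALITIES, INJURIES, DMG) ~ EVTYPE,
                    data = events, FUN = sum)
print(totals)
```

The result has one row per event type, with the groups in sorted order. Note that the formula interface of `aggregate()` drops rows containing `NA` in any of the listed columns by default, so a missing damage value would also remove that row's fatalities and injuries from the totals.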
Harm to population health, 2000-2009.
topfatal <- head(df[order(df$FATALITIES, decreasing = TRUE),], 3)
# barplot() returns the bar midpoints; use them so the labels sit under the bars
bp <- barplot(topfatal$FATALITIES, xaxt = 'n', ylab = "Total Fatalities", xlab = "Weather Event", main = "Top Fatalities by Extreme Weather Events 2000-2009", cex.axis = 0.7)
axis(1, at = bp, labels = topfatal$EVTYPE, cex.axis = 0.7)

topfatal
##             EVTYPE FATALITIES INJURIES        DMG
## 36  EXCESSIVE HEAT        938     3507    3170000
## 152        TORNADO        561     8351 8504158410
## 45     FLASH FLOOD        465      594 9653669510
topinj <- head(df[order(df$INJURIES, decreasing = TRUE),], 3)
# barplot() returns the bar midpoints; use them so the labels sit under the bars
bp <- barplot(topinj$INJURIES, xaxt = 'n', ylab = "Total Injuries", xlab = "Weather Event", main = "Top Injuries by Extreme Weather Events 2000-2009", cex.axis = 0.7)
axis(1, at = bp, labels = topinj$EVTYPE, cex.axis = 0.7)

topinj
##             EVTYPE FATALITIES INJURIES        DMG
## 152        TORNADO        561     8351 8504158410
## 36  EXCESSIVE HEAT        938     3507    3170000
## 93       LIGHTNING        411     2617  479874750
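The three rankings repeat the same ordering step with a different column each time. One possible way to factor that out is a small helper. `top_events` is a hypothetical function, not part of the original analysis, and the `totals` data frame below is invented for illustration.

```r
# Hypothetical helper (not part of the original analysis): return the n rows
# of `totals` with the largest values in the column named by `measure`.
top_events <- function(totals, measure, n = 3) {
  head(totals[order(totals[[measure]], decreasing = TRUE), ], n)
}

# Invented totals, for illustration only.
totals <- data.frame(
  EVTYPE     = c("A", "B", "C", "D"),
  FATALITIES = c(10, 40, 5, 20),
  INJURIES   = c(300, 50, 10, 700),
  stringsAsFactors = FALSE
)

top_fatal <- top_events(totals, "FATALITIES")
top_inj   <- top_events(totals, "INJURIES", n = 2)
print(top_fatal$EVTYPE)
print(top_inj$EVTYPE)
```

With the analysis data, a call such as `top_events(df, "DMG")` would be expected to give the same rows as the explicit `head(df[order(...), ], 3)` form.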
*** Disclaimer: The suggestions and remarks in this page are based on personal research experience. Research practices and approaches vary. Exercise your own judgment regarding the suitability of the content.
*** Analysis environment
sessionInfo()
## R version 3.3.2 (2016-10-31)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 14393)
##
## locale:
## [1] LC_COLLATE=English_Singapore.1252 LC_CTYPE=English_Singapore.1252
## [3] LC_MONETARY=English_Singapore.1252 LC_NUMERIC=C
## [5] LC_TIME=English_Singapore.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] backports_1.0.5 magrittr_1.5 rprojroot_1.2 tools_3.3.2
## [5] htmltools_0.3.5 yaml_2.1.14 Rcpp_0.12.8 stringi_1.1.2
## [9] rmarkdown_1.3 knitr_1.15.1 stringr_1.1.0 digest_0.6.10
## [13] evaluate_0.10