0. Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This analysis addresses two questions for the period 2000-2009:
  • Which types of events (as indicated in the EVTYPE variable) are most harmful to population health across the United States?
  • Which types of events have the greatest economic consequences (measured here as property damage) across the United States?

1. Data Processing

storm_data_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(storm_data_url, destfile = "stormdata.csv.bz2", mode = "wb")
datedl <- date()
storm_data <- read.csv("stormdata.csv.bz2", stringsAsFactors = FALSE)

## taking subset
sub_sd <- storm_data[,c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP")]

### date transformations (using data from 2000-2009)
sub_sd$BGN_DATE <- gsub(" .*", "", sub_sd$BGN_DATE)
sub_sd$BGN_DATE <- as.Date(sub_sd$BGN_DATE, "%m/%d/%Y")
sub_sd <- sub_sd[sub_sd$BGN_DATE >= as.Date("2000-01-01") & sub_sd$BGN_DATE <= as.Date("2009-12-31"), ]
sub_sd$BGN_DATE <- format(sub_sd$BGN_DATE, "%Y")

### damage transformations, neglecting crop damages in this analysis.
### ("K" as thousand, "M" as million & "B" as billion; any other code counts as 1)
sub_sd$PROPDMGEXP <- gsub("^K$", "1e3", sub_sd$PROPDMGEXP)
sub_sd$PROPDMGEXP <- gsub("^M$", "1e6", sub_sd$PROPDMGEXP)
sub_sd$PROPDMGEXP <- gsub("^B$", "1e9", sub_sd$PROPDMGEXP)
## match the converted codes explicitly; "<" on character strings compares them as text
sub_sd$PROPDMGEXP[!sub_sd$PROPDMGEXP %in% c("1e3", "1e6", "1e9")] <- "1"
sub_sd$DMG <- as.numeric(sub_sd$PROPDMG) * as.numeric(sub_sd$PROPDMGEXP)
sub_sd <- subset(sub_sd, select = -c(PROPDMG, PROPDMGEXP))
Storm Data (47 MB) was downloaded Thu Mar 09 00:49:19 2017.
Documentation on how the variables are constructed and defined is available in:
  1. The National Weather Service Storm Data Documentation
  2. National Climatic Data Center Storm Events FAQ
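
The date window and the damage multiplier can be checked on a few made-up rows before running them on the full file. The sketch below is only an illustration: the `toy` data frame and the `decode_exp` helper are hypothetical and not part of the original analysis, and the helper uses a lookup vector instead of `gsub()` to turn the exponent codes into multipliers.

```r
# Made-up rows in the same layout as the Storm Data columns used above.
toy <- data.frame(
  BGN_DATE   = c("4/18/1950 0:00:00", "8/29/2005 0:00:00",
                 "1/1/2006 0:00:00", "12/31/2009 0:00:00"),
  PROPDMG    = c(25, 1.5, 10, 3),
  PROPDMGEXP = c("K", "B", "", "M"),
  stringsAsFactors = FALSE
)

# Drop the time of day, then parse month/day/year.
toy$BGN_DATE <- as.Date(sub(" .*", "", toy$BGN_DATE), format = "%m/%d/%Y")

# Keep 2000-2009 by comparing against Date objects.
in_window <- toy$BGN_DATE >= as.Date("2000-01-01") &
  toy$BGN_DATE <= as.Date("2009-12-31")
toy <- toy[in_window, ]

# Hypothetical helper: look up the multiplier for each code.
# Blank or unrecognised codes give NA in the lookup and are counted as 1.
decode_exp <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[code])
  mult[is.na(mult)] <- 1
  mult
}

toy$DMG  <- toy$PROPDMG * decode_exp(toy$PROPDMGEXP)
toy$YEAR <- format(toy$BGN_DATE, "%Y")
toy[, c("YEAR", "DMG")]
```

The 1950 row falls outside the window and is dropped; the remaining rows come out as 1.5e9, 10, and 3e6 USD. Treating blank codes as a multiplier of 1 is an assumption carried over from the processing code above.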

2. Results

Top single events by property damage, fatalities, and injuries.
head(sub_sd[order(sub_sd$DMG, decreasing = TRUE),])
##        BGN_DATE            EVTYPE FATALITIES INJURIES       DMG
## 605953     2006             FLOOD          0        0 1.150e+11
## 577676     2005       STORM SURGE          0        0 3.130e+10
## 577675     2005 HURRICANE/TYPHOON          0        0 1.693e+10
## 581535     2005       STORM SURGE          0        0 1.126e+10
## 569308     2005 HURRICANE/TYPHOON          5        0 1.000e+10
## 581533     2005 HURRICANE/TYPHOON          0        0 7.350e+09
head(sub_sd[order(sub_sd$FATALITIES, decreasing = TRUE),])
##        BGN_DATE         EVTYPE FATALITIES INJURIES     DMG
## 598500     2005 EXCESSIVE HEAT         49        0 0.0e+00
## 606363     2006 EXCESSIVE HEAT         46       18 1.7e+05
## 629242     2006 EXCESSIVE HEAT         42        0 0.0e+00
## 785239     2009        TSUNAMI         32      129 8.1e+07
## 565388     2005 EXCESSIVE HEAT         30        0 0.0e+00
## 611803     2006 EXCESSIVE HEAT         24        0 0.0e+00
head(sub_sd[order(sub_sd$INJURIES, decreasing = TRUE),])
##        BGN_DATE            EVTYPE FATALITIES INJURIES      DMG
## 529351     2004 HURRICANE/TYPHOON          7      780 5.42e+09
## 667233     2007    EXCESSIVE HEAT          2      519 0.00e+00
## 625168     2006    EXCESSIVE HEAT          4      437 0.00e+00
## 484801     2002 HURRICANE/TYPHOON          1      316 1.75e+08
## 625173     2006    EXCESSIVE HEAT          3      306 0.00e+00
## 621296     2006              HEAT          0      215 0.00e+00
Total property damage by event type, 2000-2009.
df <- aggregate(cbind(FATALITIES,INJURIES,DMG) ~ EVTYPE, sub_sd, sum)
topdmg <- head(df[order(df$DMG, decreasing = TRUE),], 3)

## barplot() returns the bar midpoints; use them so each label sits under its bar
bp <- barplot(topdmg$DMG, xaxt = 'n', ylab = "Total Property Damages in USD", xlab = "Weather Event", main = "Top Damages by Extreme Weather Events 2000-2009", cex.axis = 0.7)
axis(1, at = bp, labels = topdmg$EVTYPE, cex.axis = 0.7)

topdmg
##                EVTYPE FATALITIES INJURIES          DMG
## 46              FLOOD        167      178 123879368090
## 80  HURRICANE/TYPHOON         64     1275  69305840000
## 143       STORM SURGE          0        4  43170935000
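
As an aside on the aggregation step, the following sketch runs the same `aggregate()` and `order()` pattern on a small made-up data frame so the totals can be verified by hand. The `toy` values are invented for illustration only.

```r
# Made-up events: two floods, two tornadoes, one heat event.
toy <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "FLOOD", "HEAT", "TORNADO"),
  FATALITIES = c(1, 3, 0, 5, 2),
  INJURIES   = c(0, 10, 2, 1, 4),
  DMG        = c(1e6, 5e5, 2e6, 0, 1e5),
  stringsAsFactors = FALSE
)

# Sum every measure within each event type.
totals <- aggregate(cbind(FATALITIES, INJURIES, DMG) ~ EVTYPE, data = toy, FUN = sum)

# Rank by total damage and keep the two largest.
top2 <- head(totals[order(totals$DMG, decreasing = TRUE), ], 2)
top2
```

Here FLOOD totals 3e6 and TORNADO totals 6e5, so they rank first and second. Note that the formula interface of `aggregate()` drops rows containing `NA` in any of the listed columns by default (`na.action = na.omit`), which matters if a damage value failed to convert to a number.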
Events most harmful to population health, 2000-2009.
topfatal <- head(df[order(df$FATALITIES, decreasing = TRUE),], 3)

## barplot() returns the bar midpoints; use them so each label sits under its bar
bp <- barplot(topfatal$FATALITIES, xaxt = 'n', ylab = "Total Fatalities", xlab = "Weather Event", main = "Top Fatalities by Extreme Weather Events 2000-2009", cex.axis = 0.7)
axis(1, at = bp, labels = topfatal$EVTYPE, cex.axis = 0.7)

topfatal
##             EVTYPE FATALITIES INJURIES        DMG
## 36  EXCESSIVE HEAT        938     3507    3170000
## 152        TORNADO        561     8351 8504158410
## 45     FLASH FLOOD        465      594 9653669510
topinj <- head(df[order(df$INJURIES, decreasing = TRUE),], 3)

## barplot() returns the bar midpoints; use them so each label sits under its bar
bp <- barplot(topinj$INJURIES, xaxt = 'n', ylab = "Total Injuries", xlab = "Weather Event", main = "Top Injuries by Extreme Weather Events 2000-2009", cex.axis = 0.7)
axis(1, at = bp, labels = topinj$EVTYPE, cex.axis = 0.7)

topinj
##             EVTYPE FATALITIES INJURIES        DMG
## 152        TORNADO        561     8351 8504158410
## 36  EXCESSIVE HEAT        938     3507    3170000
## 93       LIGHTNING        411     2617  479874750
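
A note on labelling the bar charts: with the default `width` and `space`, `barplot()` does not centre its bars on the integers 1, 2, 3, so axis labels placed at `1:3` drift to the right of the bars. The sketch below, using made-up counts, shows the midpoints that `barplot()` returns and how they can be passed to `axis()`.

```r
# Made-up totals, for illustration only.
counts <- c(30, 20, 10)
labels <- c("TORNADO", "HEAT", "LIGHTNING")

# plot = FALSE returns the bar midpoints without drawing anything.
mids <- barplot(counts, plot = FALSE)
mids

# Draw the bars without default names, then label them at their midpoints.
bp <- barplot(counts, axisnames = FALSE, ylab = "Total")
axis(1, at = bp, labels = labels, cex.axis = 0.7)
```

With three bars and default settings the midpoints fall at 0.7, 1.9, and 3.1 rather than at 1, 2, and 3.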
*** Disclaimer: The suggestions and remarks in this page are based on personal research experience. Research practices and approaches vary. Exercise your own judgment regarding the suitability of the content.
*** Analysis environment
sessionInfo()
## R version 3.3.2 (2016-10-31)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 14393)
## 
## locale:
## [1] LC_COLLATE=English_Singapore.1252  LC_CTYPE=English_Singapore.1252   
## [3] LC_MONETARY=English_Singapore.1252 LC_NUMERIC=C                      
## [5] LC_TIME=English_Singapore.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] backports_1.0.5 magrittr_1.5    rprojroot_1.2   tools_3.3.2    
##  [5] htmltools_0.3.5 yaml_2.1.14     Rcpp_0.12.8     stringi_1.1.2  
##  [9] rmarkdown_1.3   knitr_1.15.1    stringr_1.1.0   digest_0.6.10  
## [13] evaluate_0.10