Synopsis

The United States is subject to a broad array of meteorological phenomena of varying severity, many of which directly affect human populations. In this analysis, we consider which event types are most harmful to population health by examining the total reported fatalities and total reported injuries per event type. We also consider which event types have the greatest economic consequences by examining the total estimated crop damage and property damage per event type.

Data Processing

The data for this assignment are stored in a bzip2-compressed (.bz2) comma-separated-value file. The archive can be downloaded from the course web site:

Storm Data [47Mb]

Accompanying documentation (PDF) of the database is available. It describes some of the variables and can be obtained at the addresses below:

National Weather Service Storm Data Documentation

National Climatic Data Center Storm Events FAQ

The required libraries are loaded and the session info is printed for reproducibility. The data are read directly from the original .bz2 file and, to reduce processing time, only the relevant columns are kept. A small helper function, myKable, applies consistent formatting to the results tables.

library(dplyr)
library(stringr)
library(ggplot2)
library(kableExtra)
library(RCurl)
library(scales)

sessionInfo()
## R version 4.0.0 (2020-04-24)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18362)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] scales_1.1.1     RCurl_1.98-1.2   kableExtra_1.1.0 ggplot2_3.3.0   
## [5] stringr_1.4.0    dplyr_0.8.5     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.4.6      pillar_1.4.4      compiler_4.0.0    bitops_1.0-6     
##  [5] tools_4.0.0       digest_0.6.25     evaluate_0.14     lifecycle_0.2.0  
##  [9] tibble_3.0.1      gtable_0.3.0      viridisLite_0.3.0 pkgconfig_2.0.3  
## [13] rlang_0.4.6       rstudioapi_0.11   yaml_2.2.1        xfun_0.13        
## [17] withr_2.2.0       httr_1.4.1        knitr_1.28        xml2_1.3.2       
## [21] vctrs_0.2.4       hms_0.5.3         webshot_0.5.2     grid_4.0.0       
## [25] tidyselect_1.0.0  glue_1.4.0        R6_2.4.1          rmarkdown_2.1    
## [29] purrr_0.3.4       readr_1.3.1       magrittr_1.5      codetools_0.2-16 
## [33] ellipsis_0.3.0    htmltools_0.4.0   assertthat_0.2.1  rvest_0.3.5      
## [37] colorspace_1.4-1  stringi_1.4.6     munsell_0.5.0     crayon_1.3.4
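# Source archive (the download step is left commented out)
#url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

# Machine-specific working directory; adjust to wherever the archive is stored
setwd("d:/courses/coursera-r-programming/reproducible-research/week-four/assignment/")

#getURL(url)

# read.csv reads the .bz2 archive directly, without manual decompression
data <- read.csv("repdata_data_StormData.csv.bz2", stringsAsFactors = FALSE, header = TRUE)

# Keep column 8 (EVTYPE) and columns 23 to 28
# (FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
dataSubset <- data[,c(8,23:28)]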

myKable <- function(x, columnNames, caption){

  kable(x, format = "html", col.names = columnNames, align= "l", caption = caption) %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "bordered"), full_width = FALSE, font_size = 11, position = "left") %>%
    column_spec(1, width = "20em") %>%
    column_spec(2, width = "14em")
}
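
The subset above selects columns by position, which silently picks the wrong fields if the file layout ever differs. As a hedged alternative, the same columns could be selected by name, using the field names that the later code relies on. The sketch below uses a small made-up data frame (`toy`, with invented values) in place of the full data set.

```r
# Toy stand-in for the full data frame; all values are invented for illustration.
toy <- data.frame(
  STATE      = c("AL", "TX"),
  EVTYPE     = c("TORNADO", "FLOOD"),
  FATALITIES = c(0, 2),
  INJURIES   = c(15, 0),
  PROPDMG    = c(25, 1.5),
  PROPDMGEXP = c("K", "M"),
  CROPDMG    = c(0, 10),
  CROPDMGEXP = c("", "K"),
  REMARKS    = c("", ""),
  stringsAsFactors = FALSE
)

# The seven fields used in this analysis.
wanted <- c("EVTYPE", "FATALITIES", "INJURIES",
            "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Fail early if any expected column is missing.
stopifnot(all(wanted %in% names(toy)))

# Select by name; columns come back in the order given in `wanted`.
toySubset <- toy[, wanted]
names(toySubset)
```

Selecting by name also documents which fields the analysis depends on, at the cost of a slightly longer statement.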

Sum the reported fatalities per event type and keep the 12 event types with the highest totals.

totalFatalities <- dataSubset %>%
  group_by(EVTYPE) %>%
  summarise(FAT = sum(FATALITIES)) 

totalFatalitiesTable <- totalFatalities %>%
  arrange(desc(FAT)) %>%
  head(n = 12L) %>%
  myKable(columnNames = c("Event Type", "Total Fatalities"), caption = "Event types with most fatalities")

Sum the reported injuries per event type and keep the 12 event types with the highest totals.

totalInjuries <- dataSubset %>%
  group_by(EVTYPE) %>%
  summarise(INJ = sum(INJURIES)) 

totalInjuriesTable <- totalInjuries %>%
  arrange(desc(INJ)) %>%
  head(n = 12L) %>%
  myKable(columnNames = c("Event Type", "Total Injuries"), caption = "Event types with most injuries")

Each damage estimate is represented by two fields: the first is a numeric value and the second is an alphabetic character that signifies its magnitude, including "H" for hundreds, "K" for thousands, "M" for millions, and "B" for billions. Replace these alphabetic values with their respective base 10 exponents, multiply each value by ten raised to its exponent, and sum the results by event type. Records whose exponent cannot be converted to a number (for example, a blank) evaluate to NA and are excluded from the sums by na.rm = TRUE. The resulting frames are subsetted to present the 12 most costly event types for property damage and for crop damage.

dataSubset <- dataSubset %>%
  mutate(CROPMULTIPLIER = CROPDMGEXP, PROPMULTIPLIER = PROPDMGEXP) 

dataSubset$CROPMULTIPLIER <- str_replace(dataSubset$CROPMULTIPLIER, "[hH]", "2")
dataSubset$CROPMULTIPLIER <- str_replace(dataSubset$CROPMULTIPLIER, "[Kk]", "3")
dataSubset$CROPMULTIPLIER <- str_replace(dataSubset$CROPMULTIPLIER, "[mM]", "6")
dataSubset$CROPMULTIPLIER <- str_replace(dataSubset$CROPMULTIPLIER, "[bB]", "9")
dataSubset$PROPMULTIPLIER <- str_replace(dataSubset$PROPMULTIPLIER, "[hH]", "2")
dataSubset$PROPMULTIPLIER <- str_replace(dataSubset$PROPMULTIPLIER, "[Kk]", "3")
dataSubset$PROPMULTIPLIER <- str_replace(dataSubset$PROPMULTIPLIER, "[mM]", "6")
dataSubset$PROPMULTIPLIER <- str_replace(dataSubset$PROPMULTIPLIER, "[bB]", "9")

# Note: the summary column keeps the name INJ but holds total property damage in dollars
propertyCost <- dataSubset %>%
  group_by(EVTYPE) %>%
  summarise(INJ = sum(10^(as.numeric(PROPMULTIPLIER)) * as.numeric(PROPDMG), na.rm = TRUE))

propertyCostTable <- propertyCost %>%
  arrange(desc(INJ)) %>%
  head(n = 12L) %>%
  myKable(columnNames = c("Event Type", "Property Damage ($)"), caption = "Event types most damaging to property")

# Note: the summary column keeps the name INJ but holds total crop damage in dollars
cropCost <- dataSubset %>%
  group_by(EVTYPE) %>%
  summarise(INJ = sum(10^(as.numeric(CROPMULTIPLIER)) * as.numeric(CROPDMG), na.rm = TRUE))

cropCostTable <- cropCost %>%
  arrange(desc(INJ)) %>%
  head(n = 12L) %>%
  myKable(columnNames = c("Event Type", "Crop Damage ($)"), caption = "Event types most damaging to crops")
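
The eight str_replace calls above could also be expressed as a single lookup. The sketch below is one possible formulation, not the code used for the results: `expToMultiplier` is a hypothetical helper name, and it assumes each exponent field holds a single character. It aims to mirror the behaviour of the code above, where letters map to fixed exponents, single digits are used as exponents directly, and anything else becomes NA.

```r
# Hypothetical helper: convert exponent codes to power-of-ten multipliers.
expToMultiplier <- function(code) {
  code <- toupper(trimws(code))
  lookup <- c(H = 2, K = 3, M = 6, B = 9)

  # Codes missing from the lookup (including "") yield NA.
  exponent <- unname(lookup[code])

  # Single digits are treated as exponents, as as.numeric() does in the code above.
  isDigit <- grepl("^[0-9]$", code)
  exponent[isDigit] <- as.numeric(code[isDigit])

  10^exponent
}

codes <- c("K", "m", "B", "h", "5", "", "?")
multipliers <- expToMultiplier(codes)
multipliers

# Damage in dollars for a reported value of 2.5 with exponent "M".
2.5 * expToMultiplier("M")
```

Because the NA values propagate through the multiplication, a call such as sum(..., na.rm = TRUE) would drop those records in the same way as the summaries above.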

Results

totalFatalitiesTable
Event types with most fatalities

| Event Type | Total Fatalities |
|---|---|
| TORNADO | 5633 |
| EXCESSIVE HEAT | 1903 |
| FLASH FLOOD | 978 |
| HEAT | 937 |
| LIGHTNING | 816 |
| TSTM WIND | 504 |
| FLOOD | 470 |
| RIP CURRENT | 368 |
| HIGH WIND | 248 |
| AVALANCHE | 224 |
| WINTER STORM | 206 |
| RIP CURRENTS | 204 |

totalInjuriesTable
Event types with most injuries

| Event Type | Total Injuries |
|---|---|
| TORNADO | 91346 |
| TSTM WIND | 6957 |
| FLOOD | 6789 |
| EXCESSIVE HEAT | 6525 |
| LIGHTNING | 5230 |
| HEAT | 2100 |
| ICE STORM | 1975 |
| FLASH FLOOD | 1777 |
| THUNDERSTORM WIND | 1488 |
| HAIL | 1361 |
| WINTER STORM | 1321 |
| HURRICANE/TYPHOON | 1275 |

propertyCostTable
Event types most damaging to property

| Event Type | Property Damage ($) |
|---|---|
| FLOOD | 144657709800 |
| HURRICANE/TYPHOON | 69305840000 |
| TORNADO | 56947380614 |
| STORM SURGE | 43323536000 |
| FLASH FLOOD | 16822673772 |
| HAIL | 15735267456 |
| HURRICANE | 11868319010 |
| TROPICAL STORM | 7703890550 |
| WINTER STORM | 6688497251 |
| HIGH WIND | 5270046260 |
| RIVER FLOOD | 5118945500 |
| WILDFIRE | 4765114000 |

cropCostTable
Event types most damaging to crops

| Event Type | Crop Damage ($) |
|---|---|
| DROUGHT | 13972566000 |
| FLOOD | 5661968450 |
| RIVER FLOOD | 5029459000 |
| ICE STORM | 5022113500 |
| HAIL | 3025954470 |
| HURRICANE | 2741910000 |
| HURRICANE/TYPHOON | 2607872800 |
| FLASH FLOOD | 1421317100 |
| EXTREME COLD | 1292973000 |
| FROST/FREEZE | 1094086000 |
| HEAVY RAIN | 733399800 |
| TROPICAL STORM | 678346000 |
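
The tables above list some closely related labels as separate event types, for example RIP CURRENT and RIP CURRENTS, or TSTM WIND and THUNDERSTORM WIND. The analysis here groups on EVTYPE exactly as recorded. If merging such labels were desired, one possible approach is sketched below. The helper `normaliseEvtype`, its two rules, and the toy counts are all illustrative assumptions, and treating TSTM as an abbreviation of THUNDERSTORM is an assumption about the data rather than something the source states.

```r
library(dplyr)
library(stringr)

# Hypothetical helper: tidy whitespace and case, then merge two known variants.
normaliseEvtype <- function(x) {
  x <- str_to_upper(str_squish(x))
  x <- str_replace(x, "^TSTM", "THUNDERSTORM")       # assumed abbreviation
  x <- str_replace(x, "^RIP CURRENTS$", "RIP CURRENT")
  x
}

# Toy records with invented counts, not values from the storm database.
toy <- data.frame(
  EVTYPE     = c("RIP CURRENT", "Rip Currents", " TSTM WIND", "THUNDERSTORM WIND"),
  FATALITIES = c(3, 2, 1, 6),
  stringsAsFactors = FALSE
)

toyTotals <- toy %>%
  mutate(EVTYPE = normaliseEvtype(EVTYPE)) %>%
  group_by(EVTYPE) %>%
  summarise(FAT = sum(FATALITIES))

toyTotals
```

In this toy case the four input labels collapse to two groups. Any such merging changes the rankings, so the rules would need to be chosen and documented with care.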

Injuries and fatalities are then summed per event type to indicate which events have the greatest overall effect on population health. The combined totals for the top 12 weather events are plotted.

totalHealth <- cbind(totalInjuries, totalFatalities = totalFatalities$FAT)

totalHealth$TOTAL <- totalHealth[,2] + totalHealth[,3]

totalHealth %>%
  arrange(desc(TOTAL)) %>%
  head(n = 12L) %>%
  ggplot(aes(x = reorder(EVTYPE, +TOTAL), TOTAL)) + 
  geom_bar(stat='identity') + 
  xlab("Event type") + ylab("Total fatalities and injuries") +
  ggtitle("Top 12 weather events most harmful to human health") +
  theme(plot.title = element_text(hjust = .5)) + 
  coord_flip()
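
The cbind call above combines two separate summaries and so depends on both having their event types in the same row order. A hedged alternative is to compute both totals in a single summarise call, which keeps each row's values tied to its event type. The sketch below uses a small invented data frame (`toy`) rather than the storm data.

```r
library(dplyr)

# Toy records with invented counts, for illustration only.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(2, 1, 0, 3),
  INJURIES   = c(10, 4, 5, 0),
  stringsAsFactors = FALSE
)

# One pass: both totals per event type, then their sum, sorted descending.
toyHealth <- toy %>%
  group_by(EVTYPE) %>%
  summarise(FAT = sum(FATALITIES), INJ = sum(INJURIES)) %>%
  mutate(TOTAL = FAT + INJ) %>%
  arrange(desc(TOTAL))

toyHealth
```

Here TORNADO ranks first with a combined total of 17, followed by FLOOD with 5 and HEAT with 3.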

The combined property and crop damage cost is shown for the 12 most costly event types. This indicates that flooding is the weather event that causes the greatest economic cost.

totalCost <- cbind(propertyCost, cropCost = cropCost$INJ)

totalCost$TOTAL <- totalCost[,2] + totalCost[,3]

totalCost %>%
  arrange(desc(TOTAL)) %>%
  head(n = 12L) %>%
  ggplot(aes(x = reorder(EVTYPE, +TOTAL), TOTAL / 1000000000)) + 
  geom_bar(stat='identity') + 
  xlab("Event type") + ylab("Total cost (Billions of dollars)") +
  ggtitle("Top 12 weather events with greatest economic consequence") +
  theme(plot.title = element_text(hjust = .5)) + 
  coord_flip()
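
As a quick arithmetic check on the scaling used in the plot, the combined cost for the two largest entries can be recomputed from the property and crop tables in the Results section. The figures below are copied from those tables; the variable names are illustrative only.

```r
# Totals copied from the property and crop damage tables (US dollars).
property <- c(FLOOD = 144657709800, `HURRICANE/TYPHOON` = 69305840000)
crop     <- c(FLOOD = 5661968450,   `HURRICANE/TYPHOON` = 2607872800)

# Combined cost per event type, in dollars and in billions of dollars.
combined <- property + crop
combinedBillions <- round(combined / 1e9, 2)
combinedBillions

# Dollar totals with thousands separators, for readability.
formatC(combined, format = "f", digits = 0, big.mark = ",")
```

On these figures FLOOD comes to roughly 150.32 billion dollars and HURRICANE/TYPHOON to roughly 71.91 billion dollars.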

Conclusion

Flooding is the weather event type with the greatest economic consequences. Tornadoes are the weather event type most harmful to population health.