The United States is subject to a broad array of meteorological phenomena of varying severity. Many such events have a direct impact on human populations. In this analysis, we consider which event types are most harmful to population health by examining the total reported fatalities and total reported injuries per event type. We also consider which event types have the greatest economic consequences by examining the sum of estimated property damage and crop damage per event type.
The data for this assignment are in a bzip2-compressed (.bz2) comma-separated-value file, which can be downloaded from the course web site:
Storm Data [47Mb]
Accompanying documentation (PDF) of the database is available. It describes some of the variables and can be obtained at the addresses below:
National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ
The required libraries are loaded and session info is printed for reproducibility. The data are read directly from the original .bz2 file, and only the relevant columns are kept to reduce downstream processing time. A small helper function, myKable, applies consistent formatting to the results tables.
library(dplyr)
library(stringr)
library(ggplot2)
library(kableExtra)
library(RCurl)
library(scales)
sessionInfo()
## R version 4.0.0 (2020-04-24)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18362)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] scales_1.1.1 RCurl_1.98-1.2 kableExtra_1.1.0 ggplot2_3.3.0
## [5] stringr_1.4.0 dplyr_0.8.5
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.4.6 pillar_1.4.4 compiler_4.0.0 bitops_1.0-6
## [5] tools_4.0.0 digest_0.6.25 evaluate_0.14 lifecycle_0.2.0
## [9] tibble_3.0.1 gtable_0.3.0 viridisLite_0.3.0 pkgconfig_2.0.3
## [13] rlang_0.4.6 rstudioapi_0.11 yaml_2.2.1 xfun_0.13
## [17] withr_2.2.0 httr_1.4.1 knitr_1.28 xml2_1.3.2
## [21] vctrs_0.2.4 hms_0.5.3 webshot_0.5.2 grid_4.0.0
## [25] tidyselect_1.0.0 glue_1.4.0 R6_2.4.1 rmarkdown_2.1
## [29] purrr_0.3.4 readr_1.3.1 magrittr_1.5 codetools_0.2-16
## [33] ellipsis_0.3.0 htmltools_0.4.0 assertthat_0.2.1 rvest_0.3.5
## [37] colorspace_1.4-1 stringi_1.4.6 munsell_0.5.0 crayon_1.3.4
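# Download the compressed data once; read.csv() decompresses .bz2 files transparently
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
dataFile <- "repdata_data_StormData.csv.bz2"
if (!file.exists(dataFile)) {
  download.file(url, destfile = dataFile, mode = "wb")
}
data <- read.csv(dataFile, stringsAsFactors = FALSE, header = TRUE)
# Keep event type, health counts, and damage values with their magnitude codes
dataSubset <- data[, c("EVTYPE", "FATALITIES", "INJURIES",
                       "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]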
# Render a results table as a compact, striped HTML table
myKable <- function(x, columnNames, caption) {
  kable(x, format = "html", col.names = columnNames, align = "l", caption = caption) %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "bordered"),
                  full_width = FALSE, font_size = 11, position = "left") %>%
    column_spec(1, width = "20em") %>%
    column_spec(2, width = "14em")
}
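Subsetting with data[, ...] happens only after read.csv() has already parsed every column of the file. A minimal base-R sketch of skipping unwanted columns at read time instead, by passing colClasses = "NULL" for them; readSelectedColumns is a hypothetical helper name, and the demo uses a small made-up CSV rather than the storm data.

```r
# Hypothetical helper: read only the named columns of a CSV file.
# read.csv() opens paths with file(), which also handles .bz2 compression.
readSelectedColumns <- function(file, keep) {
  # Read just the header (plus one row) to learn the column names
  header <- names(read.csv(file, nrows = 1, check.names = FALSE))
  # NA lets read.csv() guess the type; "NULL" skips the column entirely
  classes <- ifelse(header %in% keep, NA, "NULL")
  read.csv(file, colClasses = classes, stringsAsFactors = FALSE)
}

# Usage with a small made-up file
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(STATE = c("AL", "TX"),
                     EVTYPE = c("TORNADO", "HAIL"),
                     FATALITIES = c(1, 0),
                     REMARKS = c("a", "b")),
          tmp, row.names = FALSE)
selected <- readSelectedColumns(tmp, c("EVTYPE", "FATALITIES"))
str(selected)
```

Skipped columns are never converted or stored, which mainly saves memory; how much reading time it saves on the full file would need to be measured.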
Fatalities are summed for each event type, and the 12 event types with the most fatalities are kept.
totalFatalities <- dataSubset %>%
  group_by(EVTYPE) %>%
  summarise(FAT = sum(FATALITIES))
totalFatalitiesTable <- totalFatalities %>%
  arrange(desc(FAT)) %>%
  head(n = 12L) %>%
  myKable(columnNames = c("Event Type", "Total Fatalities"), caption = "Event types with most fatalities")
Injuries are summed for each event type in the same way, and the 12 event types with the most injuries are kept.
totalInjuries <- dataSubset %>%
  group_by(EVTYPE) %>%
  summarise(INJ = sum(INJURIES))
totalInjuriesTable <- totalInjuries %>%
  arrange(desc(INJ)) %>%
  head(n = 12L) %>%
  myKable(columnNames = c("Event Type", "Total Injuries"), caption = "Event types with most injuries")
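The fatality and injury summaries repeat the same group, sum, sort, and head steps. A base-R sketch of a single reusable helper that takes the column to total as a string; topEventTypes is a hypothetical name and the demo counts are made up.

```r
# Hypothetical helper: total `column` per event type and keep the n largest
topEventTypes <- function(df, column, n = 12L) {
  totals <- aggregate(df[[column]], by = list(EVTYPE = df$EVTYPE), FUN = sum)
  names(totals)[2] <- "TOTAL"
  # Largest totals first; reset row names so the result prints cleanly
  totals <- totals[order(totals$TOTAL, decreasing = TRUE), ]
  rownames(totals) <- NULL
  head(totals, n)
}

# Usage with made-up counts: A totals 5, B totals 7, C totals 0
toy <- data.frame(EVTYPE = c("A", "B", "A", "C", "B", "A"),
                  FATALITIES = c(1, 2, 3, 0, 5, 1),
                  stringsAsFactors = FALSE)
topEventTypes(toy, "FATALITIES", n = 2L)
```

With the storm data, a call such as topEventTypes(dataSubset, "INJURIES") should give the same ranking as the dplyr pipeline above, and its result could be passed to myKable in the same way.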
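Each damage estimate is stored in two fields: a numeric value (PROPDMG, CROPDMG) and a magnitude code (PROPDMGEXP, CROPDMGEXP). The magnitude codes include “H” for hundreds, “K” for thousands, “M” for millions, and “B” for billions, in either case. These letters are replaced with their base-10 exponents (2, 3, 6, and 9), and codes that are already digits are used as exponents as-is. Each value is then multiplied by 10 raised to its exponent and the results are summed by event type. Codes that cannot be converted to a number (such as “+”, “-”, “?”, or an empty string) become NA and are removed by na.rm = TRUE, so those records do not contribute to the totals. The resulting totals are sorted to present the 12 event types with the greatest property damage and, separately, the 12 with the greatest crop damage.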
# Convert magnitude codes (H/K/M/B, either case) to base-10 exponents;
# digit codes pass through unchanged, and any other code later becomes NA
toExponent <- function(code) {
  code <- str_replace(code, "[hH]", "2")
  code <- str_replace(code, "[kK]", "3")
  code <- str_replace(code, "[mM]", "6")
  str_replace(code, "[bB]", "9")
}
dataSubset <- dataSubset %>%
  mutate(CROPMULTIPLIER = toExponent(CROPDMGEXP),
         PROPMULTIPLIER = toExponent(PROPDMGEXP))
propertyCost <- dataSubset %>%
  group_by(EVTYPE) %>%
  summarise(INJ = sum(10^(as.numeric(PROPMULTIPLIER)) * as.numeric(PROPDMG), na.rm = TRUE))
propertyCostTable <- propertyCost %>%
  arrange(desc(INJ)) %>%
  head(n = 12L) %>%
  myKable(columnNames = c("Event Type", "Property Damage ($)"), caption = "Event types most damaging to property")
cropCost <- dataSubset %>%
  group_by(EVTYPE) %>%
  summarise(INJ = sum(10^(as.numeric(CROPMULTIPLIER)) * as.numeric(CROPDMG), na.rm = TRUE))
cropCostTable <- cropCost %>%
  arrange(desc(INJ)) %>%
  head(n = 12L) %>%
  myKable(columnNames = c("Event Type", "Crop Damage ($)"), caption = "Event types most damaging to crops")
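The repeated str_replace() passes that turn magnitude codes into exponents can also be expressed as a single lookup from code to exponent. A minimal base-R sketch, assuming the same rules as the conversion above (H/K/M/B in either case, single digits used as exponents as-is, anything else treated as unknown); damageInDollars is a hypothetical name and the demo values are made up.

```r
# Hypothetical helper: convert damage values plus magnitude codes to dollars.
# Unknown codes give NA, matching how as.numeric() treats them above.
damageInDollars <- function(value, code) {
  exponents <- c(H = 2, K = 3, M = 6, B = 9)
  code <- toupper(trimws(code))
  # Named-vector lookup; codes not in the table (including digits) give NA
  exponent <- unname(exponents[code])
  # Single-digit codes are used directly as the exponent
  isDigit <- grepl("^[0-9]$", code)
  exponent[isDigit] <- as.numeric(code[isDigit])
  value * 10^exponent
}

# Usage with made-up values: 25K, 2.5m, 1.5B, an unknown code, and exponent 2
damageInDollars(c(25, 2.5, 1.5, 10, 3), c("K", "m", "B", "?", "2"))
```

The lookup keeps the code-to-exponent rules in one place, which makes them easier to review or extend than a chain of replacements.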
totalFatalitiesTable
| Event Type | Total Fatalities |
|---|---|
| TORNADO | 5633 |
| EXCESSIVE HEAT | 1903 |
| FLASH FLOOD | 978 |
| HEAT | 937 |
| LIGHTNING | 816 |
| TSTM WIND | 504 |
| FLOOD | 470 |
| RIP CURRENT | 368 |
| HIGH WIND | 248 |
| AVALANCHE | 224 |
| WINTER STORM | 206 |
| RIP CURRENTS | 204 |
totalInjuriesTable
| Event Type | Total Injuries |
|---|---|
| TORNADO | 91346 |
| TSTM WIND | 6957 |
| FLOOD | 6789 |
| EXCESSIVE HEAT | 6525 |
| LIGHTNING | 5230 |
| HEAT | 2100 |
| ICE STORM | 1975 |
| FLASH FLOOD | 1777 |
| THUNDERSTORM WIND | 1488 |
| HAIL | 1361 |
| WINTER STORM | 1321 |
| HURRICANE/TYPHOON | 1275 |
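EVTYPE is free text, so the tables above count variants such as “RIP CURRENT” and “RIP CURRENTS”, or “TSTM WIND” and “THUNDERSTORM WIND”, as separate event types. A base-R sketch of normalising labels before grouping; the rules in normaliseEventType (a hypothetical helper) are illustrative assumptions, not the official NWS event-type list, and the counts are made up.

```r
# Hypothetical normalisation of free-text event labels. The rules are
# illustrative assumptions, not the official NWS event-type mapping.
normaliseEventType <- function(evtype) {
  x <- toupper(trimws(evtype))
  x <- gsub("\\s+", " ", x)                    # collapse repeated spaces
  x <- sub("^TSTM ", "THUNDERSTORM ", x)        # expand common abbreviation
  x <- sub("^RIP CURRENTS$", "RIP CURRENT", x)  # merge plural variant
  x
}

# Usage with made-up counts
toy <- data.frame(EVTYPE = c("RIP CURRENT", "RIP CURRENTS", "TSTM WIND", " thunderstorm wind"),
                  FATALITIES = c(3, 2, 5, 1),
                  stringsAsFactors = FALSE)
toy$EVTYPE <- normaliseEventType(toy$EVTYPE)
aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)
```

Applying such a function to dataSubset$EVTYPE before the summaries would merge these particular variants; covering the other spellings in the data would need a fuller mapping.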
propertyCostTable
| Event Type | Property Damage ($) |
|---|---|
| FLOOD | 144657709800 |
| HURRICANE/TYPHOON | 69305840000 |
| TORNADO | 56947380614 |
| STORM SURGE | 43323536000 |
| FLASH FLOOD | 16822673772 |
| HAIL | 15735267456 |
| HURRICANE | 11868319010 |
| TROPICAL STORM | 7703890550 |
| WINTER STORM | 6688497251 |
| HIGH WIND | 5270046260 |
| RIVER FLOOD | 5118945500 |
| WILDFIRE | 4765114000 |
cropCostTable
| Event Type | Crop Damage ($) |
|---|---|
| DROUGHT | 13972566000 |
| FLOOD | 5661968450 |
| RIVER FLOOD | 5029459000 |
| ICE STORM | 5022113500 |
| HAIL | 3025954470 |
| HURRICANE | 2741910000 |
| HURRICANE/TYPHOON | 2607872800 |
| FLASH FLOOD | 1421317100 |
| EXTREME COLD | 1292973000 |
| FROST/FREEZE | 1094086000 |
| HEAVY RAIN | 733399800 |
| TROPICAL STORM | 678346000 |
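Records whose magnitude code cannot be interpreted are silently dropped from the damage totals above, so it can be worth measuring how many damage-bearing records that affects. A minimal base-R sketch; unrecognisedShare is a hypothetical helper that applies the same rules as the conversion (H/K/M/B in either case, or a single digit), and the demo values are made up.

```r
# Hypothetical check: among records with a positive damage value, the share
# whose magnitude code is not H/K/M/B (any case) or a single digit. Such codes
# become NA in the conversion and are removed by na.rm = TRUE.
unrecognisedShare <- function(value, code) {
  recognised <- grepl("^[HhKkMmBb0-9]$", code)
  hasDamage <- value > 0
  # max(..., 1) avoids dividing by zero when no record has damage
  sum(hasDamage & !recognised) / max(sum(hasDamage), 1)
}

# Usage with made-up values: two of the three damaged records have
# codes ("" and "?") that the conversion cannot interpret
unrecognisedShare(c(10, 5, 0, 2), c("K", "", "", "?"))
```

For the storm data, a call such as unrecognisedShare(dataSubset$PROPDMG, dataSubset$PROPDMGEXP) would give the proportion for property damage; no result for the real data is reported here.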
Injuries and fatalities are summed per event type to give a single measure of harm to population health; this simple sum weights a fatality and an injury equally. The combined totals for the 12 most harmful event types are plotted.
# Join on event type (rather than relying on row order) and add the two counts
totalHealth <- inner_join(totalInjuries, totalFatalities, by = "EVTYPE") %>%
  mutate(TOTAL = INJ + FAT)
totalHealth %>%
  arrange(desc(TOTAL)) %>%
  head(n = 12L) %>%
  ggplot(aes(x = reorder(EVTYPE, TOTAL), y = TOTAL)) +
  geom_col() +
  xlab("Event type") + ylab("Total fatalities and injuries") +
  ggtitle("Top 12 weather events most harmful to human health") +
  theme(plot.title = element_text(hjust = 0.5)) +
  coord_flip()
Property and crop damage are summed per event type, and the combined cost of the 12 most costly event types is plotted. Flooding is the weather event type with the greatest economic cost.
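# Join property and crop totals on event type and add them
totalCost <- inner_join(rename(propertyCost, PROPCOST = INJ),
                        rename(cropCost, CROPCOST = INJ),
                        by = "EVTYPE") %>%
  mutate(TOTAL = PROPCOST + CROPCOST)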
totalCost %>%
  arrange(desc(TOTAL)) %>%
  head(n = 12L) %>%
  ggplot(aes(x = reorder(EVTYPE, TOTAL), y = TOTAL / 1e9)) +
  geom_col() +
  xlab("Event type") + ylab("Total cost (billions of dollars)") +
  ggtitle("Top 12 weather events with greatest economic consequence") +
  theme(plot.title = element_text(hjust = 0.5)) +
  coord_flip()
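Flooding is the weather event type with the greatest economic consequence, with about $150.3 billion in combined property ($144.7 billion) and crop ($5.7 billion) damage. Tornadoes are the weather event type most harmful to population health, accounting for 5,633 fatalities and 91,346 injuries. These rankings use the event-type labels as recorded, so variants such as “HURRICANE” and “HURRICANE/TYPHOON” are counted separately.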