Our analysis shows that tornadoes have by far the largest impact on population health, in terms of both fatalities and injuries. The impact on the economy is twofold. Property is damaged most often, and at the highest cost, by hurricanes/typhoons and storm surges. Crops, on the other hand, are damaged mostly by floods (including river floods) and ice storms. Note that neither of the two leading causes of crop damage is a major cause of property damage.
library(tidyverse)
## ── Attaching packages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2 ✓ purrr 0.3.4
## ✓ tibble 3.0.3 ✓ dplyr 1.0.2
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
The whole data set is read directly from the compressed CSV file. All columns are read as character data to prevent errors caused by wrongly guessed column types on input; the four numeric columns used in the analysis are then converted explicitly.
storms <- read_csv(bzfile("./repdata_data_StormData.csv.bz2"),
                   col_types = str_c(rep("c", 37), collapse = "")) %>%
  mutate(INJURIES = as.numeric(INJURIES),
         FATALITIES = as.numeric(FATALITIES),
         PROPDMG = as.numeric(PROPDMG),
         CROPDMG = as.numeric(CROPDMG))
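The all-character import can be tried out on a small file first. The following is a minimal sketch, not the code used for the report: it swaps the 37-character `col_types` string for readr's `cols(.default = col_character())`, which does not depend on the number of columns, and it runs on a made-up three-column file (`csv_path` and `toy` are hypothetical names).

```r
library(readr)

# Hypothetical toy file standing in for the storm data (assumption: only
# three of the original columns are reproduced here).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("EVTYPE,PROPDMG,PROPDMGEXP",
             "FLOOD,2.50,M",
             "HAIL,10,5"), csv_path)

# Read every column as character, whatever the column count is.
toy <- read_csv(csv_path, col_types = cols(.default = col_character()))

# Convert only the columns that are needed as numbers.
toy$PROPDMG <- as.numeric(toy$PROPDMG)

str(toy)
```

With this approach the exponent column keeps values such as "5" as text, so they are not mistaken for numbers during import.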
Population impact is calculated as the sum of injuries and fatalities, per event type, over all recorded events across the USA.
event_impact <- storms %>%
  group_by(EVTYPE) %>%
  summarize(injuries = sum(INJURIES),
            fatalities = sum(FATALITIES), .groups = "drop") %>%
  rename(event = EVTYPE) %>%
  mutate(total = injuries + fatalities)
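To make the aggregation concrete, here is the same pipeline applied to a made-up three-row table. The data and the names `toy` and `toy_impact` are hypothetical and serve only to illustrate how the per-event totals arise.

```r
library(dplyr)

# Hypothetical records: two tornado events and one flood event.
toy <- tibble(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD"),
  INJURIES   = c(10, 5, 2),
  FATALITIES = c(1, 0, 3)
)

# Sum injuries and fatalities per event type, then add them up.
toy_impact <- toy %>%
  group_by(EVTYPE) %>%
  summarize(injuries = sum(INJURIES),
            fatalities = sum(FATALITIES), .groups = "drop") %>%
  rename(event = EVTYPE) %>%
  mutate(total = injuries + fatalities)

toy_impact
```

In this toy table the tornado rows combine to 15 injuries and 1 fatality (total 16), and the flood row gives a total of 5.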
First, we multiply the property and crop damage estimates by the corresponding magnitude. Some of the characters in PROPDMGEXP and CROPDMGEXP are not described in the attached file; these are not considered further in the analysis, i.e. they are treated as NA. Economic impact is calculated as the sum of damage, per event type, over all recorded events across the USA. The result is reported in millions of US dollars.
event_dmg <- storms %>%
  select(EVTYPE, starts_with("PROPDMG"), starts_with("CROPDMG")) %>%
  mutate(
    # Scale each estimate by its exponent code; a missing code means the
    # value is taken as is, any other code yields NA
    propdmg = case_when(
      PROPDMGEXP == "K" ~ PROPDMG * 1000,
      PROPDMGEXP == "M" ~ PROPDMG * 1000000,
      PROPDMGEXP == "B" ~ PROPDMG * 1000000000,
      is.na(PROPDMGEXP) ~ PROPDMG),
    cropdmg = case_when(
      CROPDMGEXP == "K" ~ CROPDMG * 1000,
      CROPDMGEXP == "M" ~ CROPDMG * 1000000,
      CROPDMGEXP == "B" ~ CROPDMG * 1000000000,
      is.na(CROPDMGEXP) ~ CROPDMG)
  ) %>%
  select(EVTYPE, propdmg, cropdmg) %>%
  group_by(EVTYPE) %>%
  summarize(propdmg = sum(propdmg),
            cropdmg = sum(cropdmg), .groups = "drop") %>%
  # Convert from dollars to millions of dollars
  mutate(total = propdmg + cropdmg,
         across(where(is.numeric), function(x) x / 1000000)) %>%
  rename(event = EVTYPE)
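The effect of treating undocumented exponent codes as NA can be checked on a small example. The sketch below uses made-up data (`toy` and `toy_sums` are hypothetical names) and the same `case_when()` mapping for property damage. It also compares `sum()` with and without `na.rm = TRUE`, because a plain `sum()` returns NA for any event type that contains at least one NA value. Which behavior is appropriate is an analysis decision; the `lenient` column is shown only for comparison and is not part of the analysis above.

```r
library(dplyr)

# Hypothetical records: "5" is an undocumented exponent code, NA is a
# missing one.
toy <- tibble(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL", "HAIL"),
  PROPDMG    = c(2.5, 10, 1, 3),
  PROPDMGEXP = c("M", "5", "K", NA)
)

toy_sums <- toy %>%
  mutate(propdmg = case_when(
    PROPDMGEXP == "K" ~ PROPDMG * 1e3,
    PROPDMGEXP == "M" ~ PROPDMG * 1e6,
    PROPDMGEXP == "B" ~ PROPDMG * 1e9,
    is.na(PROPDMGEXP) ~ PROPDMG      # no code: value taken as is
  )) %>%                             # any other code: NA
  group_by(EVTYPE) %>%
  summarize(strict  = sum(propdmg),                # NA if any value is NA
            lenient = sum(propdmg, na.rm = TRUE),  # NA values skipped
            .groups = "drop")

toy_sums
```

Here the FLOOD group gets NA in the `strict` column because of the single "5" code, while `lenient` keeps the 2.5 million from the documented row. The HAIL group is unaffected and sums to 1003 in both columns.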
DT::datatable(arrange(event_impact, desc(total)))
event_impact %>%
  filter(total >= 500) %>%
  pivot_longer(c(injuries, fatalities),
               names_to = "impact", values_to = "cases") %>%
  ggplot(aes(fct_reorder(event, total), cases)) +
  geom_col(color = "black", fill = "gray80") +
  coord_flip() +
  scale_y_log10() +
  labs(y = "number of cases (log scaled)",
       x = "type of event") +
  facet_wrap(vars(impact)) +
  theme_minimal()
Events arranged by number of fatalities and injuries caused (events with at least 500 total cases shown)
We can conclude that tornadoes are the most harmful events to the population in terms of both injuries and fatalities. Other weather events with a large impact on population health and well-being are excessive heat, marine thunderstorm wind, floods and lightning.
The values are in millions of dollars.
DT::datatable(arrange(event_dmg, desc(total)))
event_dmg %>%
  filter(total >= 500) %>%
  rename(property = propdmg, crops = cropdmg) %>%
  pivot_longer(c(property, crops),
               names_to = "impact", values_to = "damage") %>%
  ggplot(aes(fct_reorder(event, total), damage)) +
  geom_col(color = "black", fill = "gray80") +
  coord_flip() +
  labs(y = "damage (in millions of USD, different scales for crops and property)",
       x = "type of event") +
  facet_wrap(vars(impact), scales = "free") +
  theme_minimal()
Events causing the largest damage to crops and property (events costing at least 0.5 billion USD in total shown)
In terms of economic damage, property is damaged mostly by hurricanes/typhoons and storm surges. Crop damage, on the other hand, is largely caused by floods. Overall, the damage to property is much larger than the damage to crops.