This analysis tries to answer the following two questions:
1 which types of natural events are most harmful with respect to the population health.
2 which types of natural events have the greatest economic consequences.
The data used is from the Storm Data, an official pubblication of the National Oceanic and Atmospheric Administration (NOAA), which documents 1) the occurence of significant weather phenomena with sufficient intensity to cause loss of life, injuries or significant property damage, 2) unusual weather phenomena the generate media attention and 3) other significant meteorological events.
The analysis performed (more details below) shows the the most harmful events for public health (injuries + deaths) are tornadoes, 5.6k fatalities and 91.4k injuries, followed by thunderstorms (712 deaths and 9.5k injuries) and excessive heat (2k deaths and 6.7k injuries).
From an economic perspective, the most harmful events are floods, which costed a total of ~$161bn, followed by hurricanes (~$95bn) and tornadoes (~$60bn).
Previous results are calculated on total values both for health, deaths + injuries, and for economic consequences, property + crop daages. Under results you’ll find a more detailed table, splitted for single variable.
library(magrittr)
library(kableExtra)
library(ggplot2)
Except for magrittr, kableExtra and ggplot2, all the packages used in the analysis are called through the notation package::package_function.
make_table <- function(dataset, pos = "left") {
dataset %>%
knitr::kable(format = "html") %>%
kable_styling(bootstrap_options = "striped", full_width = FALSE, position = pos, font_size = 12)
}
The make_table function uses knitr and kableExtra to create a common output for the tables created in the document, which all use the same style.
get_top <- function(dataset, var_1, var_2) {
var_1 <- as.name(var_1)
var_2 <- as.name(var_2)
dataset %>%
dplyr::group_by({{var_1}}) %>%
dplyr::summarise((!!var_2) := sum(!!sym(var_2))) %>%
dplyr::arrange(desc({{var_2}})) %>%
dplyr::top_n(n = 10, wt = {{var_2}})
}
The get_top function groups by var_1 and than calculates the sum of var_2, orders it and keeps only the top 10 rows by value.
The raw data, which can be downloaded at the following link, is saved in a .csv file compressed via the bzip2 algorithm. The compressed file size is ~47MB.
if (!file.exists("./data/storm_data.csv.bz2")) {
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
downloader::download(url, "./data/storm_data.csv.bz2")
}
noaa <- readr::read_csv("./data/storm_data.csv.bz2", col_types = readr::cols(.default = readr::col_character()))
The code checks if the file storm_data already exists in a data folder under the current project working directory; if not, the downloader::dowload function downloads the file. The readr::read_csv function reads the raw file into R, taking care of the decompression process and assigning it to the variable noaa.
By default, readr guesses the columns types looking at the first 1’000 rows. With this dataset, the guessing process showed multiple parsing failures, including some columns read as logical. To avoid this problem, I forced all data as character and I’ll parse the numeric variables later.
dplyr::glimpse(noaa)
## Observations: 902,297
## Variables: 37
## $ STATE__ <chr> "1.00", "1.00", "1.00", "1.00", "1.00", "1.00", "1.00", ...
## $ BGN_DATE <chr> "4/18/1950 0:00:00", "4/18/1950 0:00:00", "2/20/1951 0:0...
## $ BGN_TIME <chr> "0130", "0145", "1600", "0900", "1500", "2000", "0100", ...
## $ TIME_ZONE <chr> "CST", "CST", "CST", "CST", "CST", "CST", "CST", "CST", ...
## $ COUNTY <chr> "97.00", "3.00", "57.00", "89.00", "43.00", "77.00", "9....
## $ COUNTYNAME <chr> "MOBILE", "BALDWIN", "FAYETTE", "MADISON", "CULLMAN", "L...
## $ STATE <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "A...
## $ EVTYPE <chr> "TORNADO", "TORNADO", "TORNADO", "TORNADO", "TORNADO", "...
## $ BGN_RANGE <chr> "0.00", "0.00", "0.00", "0.00", "0.00", "0.00", "0.00", ...
## $ BGN_AZI <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ BGN_LOCATI <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ END_DATE <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ END_TIME <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ COUNTY_END <chr> "0.00", "0.00", "0.00", "0.00", "0.00", "0.00", "0.00", ...
## $ COUNTYENDN <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ END_RANGE <chr> "0.00", "0.00", "0.00", "0.00", "0.00", "0.00", "0.00", ...
## $ END_AZI <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ END_LOCATI <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ LENGTH <chr> "14.00", "2.00", "0.10", "0.00", "0.00", "1.50", "1.50",...
## $ WIDTH <chr> "100.00", "150.00", "123.00", "100.00", "150.00", "177.0...
## $ F <chr> "3", "2", "2", "2", "2", "2", "2", "1", "3", "3", "1", "...
## $ MAG <chr> "0.00", "0.00", "0.00", "0.00", "0.00", "0.00", "0.00", ...
## $ FATALITIES <chr> "0.00", "0.00", "0.00", "0.00", "0.00", "0.00", "0.00", ...
## $ INJURIES <chr> "15.00", "0.00", "2.00", "2.00", "2.00", "6.00", "1.00",...
## $ PROPDMG <chr> "25.00", "2.50", "25.00", "2.50", "2.50", "2.50", "2.50"...
## $ PROPDMGEXP <chr> "K", "K", "K", "K", "K", "K", "K", "K", "K", "K", "M", "...
## $ CROPDMG <chr> "0.00", "0.00", "0.00", "0.00", "0.00", "0.00", "0.00", ...
## $ CROPDMGEXP <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ WFO <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ STATEOFFIC <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ ZONENAMES <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ LATITUDE <chr> "3040.00", "3042.00", "3340.00", "3458.00", "3412.00", "...
## $ LONGITUDE <chr> "8812.00", "8755.00", "8742.00", "8626.00", "8642.00", "...
## $ LATITUDE_E <chr> "3051.00", "0.00", "0.00", "0.00", "0.00", "0.00", "0.00...
## $ LONGITUDE_ <chr> "8806.00", "0.00", "0.00", "0.00", "0.00", "0.00", "0.00...
## $ REMARKS <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ REFNUM <chr> "1.00", "2.00", "3.00", "4.00", "5.00", "6.00", "7.00", ...
The dataset has 902’297 rows and 37 columns, for a total of 33’384’989 values. As shown by dplyr::glimpse, there are NAs in multiple columns.
The variables needed to answer the questions are less than what we have in the raw dataset. In order to make it more manageable, I kept only 7 columns:
I passed the data frame to janitor::clean_names and to dplyr::rename to convert variable’s names to snake_case, then to readr::type_convert to correctly parse the columns.
noaa %<>%
dplyr::select(EVTYPE, FATALITIES:INJURIES, PROPDMG:CROPDMGEXP) %>%
janitor::clean_names() %>%
dplyr::rename(
ev_type = evtype, prop_dmg = propdmg, prop_dmg_exp = propdmgexp, crop_dmg = cropdmg, crop_dmg_exp = cropdmgexp
)
noaa <-
readr::type_convert(
noaa, col_types = readr::cols(
ev_type = readr::col_character(),
fatalities = readr::col_double(),
injuries = readr::col_double(),
prop_dmg = readr::col_double(),
prop_dmg_exp = readr::col_character(),
crop_dmg = readr::col_double(),
crop_dmg_exp = readr::col_character()
)
)
The table below gives a quick snapshot of the noaa dataset, highlighting the high number of unique values in the event type column (977) and the high number of NAs in the property and crop damages exponents:
noaa %>%
purrr::map_df(
~data.frame(
class = class(.),
count = format(length(.), big.mark = "'"),
unique = format(dplyr::n_distinct(.), big.mark = "'"),
NAs = format(sum(is.na(.)), big.mark = "'")
), .id = "variable"
)
## variable class count unique NAs
## 1 ev_type character 902'297 977 0
## 2 fatalities numeric 902'297 52 0
## 3 injuries numeric 902'297 200 0
## 4 prop_dmg numeric 902'297 1'390 0
## 5 prop_dmg_exp character 902'297 19 465'934
## 6 crop_dmg numeric 902'297 432 0
## 7 crop_dmg_exp character 902'297 9 618'413
The NOAA documentation contains 48 type of events (page 6 of this link), while the ev_type variable of the dataset has 977 unique values.
Below the 10 most frequent event types in the dataset.
noaa %>%
dplyr::group_by(ev_type) %>%
dplyr::tally(sort = TRUE) %>%
dplyr::top_n(n = 10) %>%
dplyr::mutate(n = format(n, big.mark = "'"))
## # A tibble: 10 x 2
## ev_type n
## <chr> <chr>
## 1 HAIL "288'661"
## 2 TSTM WIND "219'944"
## 3 THUNDERSTORM WIND " 82'563"
## 4 TORNADO " 60'652"
## 5 FLASH FLOOD " 54'278"
## 6 FLOOD " 25'326"
## 7 THUNDERSTORM WINDS " 20'843"
## 8 HIGH WIND " 20'212"
## 9 LIGHTNING " 15'755"
## 10 HEAVY SNOW " 15'708"
I found multiple problems in the ev_type column that explain the high number of unique values.
1. Daily summary rows
The dataset has 73 rows that seems a daily summary, which isn’t usefull for the analysis. There’s also one row (not shown) which starts with “?”.
stringr::str_subset(noaa$ev_type, "^Summary") %>% head()
## [1] "Summary Jan 17" "Summary of March 14" "Summary of March 23"
## [4] "Summary of March 24" "Summary of April 3rd" "Summary of April 12"
2. Upper/lowercase, punctuation and multiple spaces
The same event is sometimes uppercase (“COASTAL FLOODING”), lowercase (“coastal flooding”) or a mix of the two (“Coastal Flooding”). Some words are separated by one space (“COASTAL FLOOD”), two spaces (“COASTAL FLOODING/EROSION”) or no spaces at all (“COASTALFLOOD”).
stringr::str_subset(noaa$ev_type, "(?i)^coastal\\s*flood.*") %>% unique()
## [1] "COASTAL FLOOD" "COASTAL FLOODING"
## [3] "Coastal Flooding" "COASTALFLOOD"
## [5] "Coastal Flood" "coastal flooding"
## [7] "COASTAL FLOODING/EROSION" "COASTAL FLOODING/EROSION"
3. Different notations for the same event
As shown in the table above, different observations of the same event have slightly different names. For example “COASTAL FLOOD” and “COASTAL FLOODING”.
In some cases the word is singular, plural or abbreviated: for example “TSTM WIND”, “THUNDERSTORM WIND” and “THUNDERSTORM WINDS” for “Thunderstorm Wind” in the data below.
stringr::str_subset(noaa$ev_type, "(?i)(tstm|thunderstorm).*wind.*") %>% unique() %>% head()
## [1] "TSTM WIND" "THUNDERSTORM WINDS"
## [3] "THUNDERSTORM WIND" "THUNDERSTORM WINDS LIGHTNING"
## [5] "THUNDERSTORM WINDS/HAIL" "THUNDERSTORM WINDS HAIL"
4. multiple events for the same observation
Some observatons have multiple events, as shown below. The rule I used is \(\underline{the\space first\space word\space in\space the\space description\space is\space the\space event\space key\space}\). This is helpfull to make the cleaning process easier and coherent. In the example below, “BLIZZARD/HEAVY SNOW” is recoded as blizzard while “HEAVY SNOW/BLIZZARD” is recoded as heavy snow.
stringr::str_subset(noaa$ev_type, "BLIZZARD") %>% unique()
## [1] "BLIZZARD" "BLIZZARD WEATHER"
## [3] "HIGH WIND/BLIZZARD" "HIGH WIND/BLIZZARD/FREEZING RA"
## [5] "HIGH WIND/ BLIZZARD" "BLIZZARD/HIGH WIND"
## [7] "HIGH WIND/WIND CHILL/BLIZZARD" "BLIZZARD/HEAVY SNOW"
## [9] "GROUND BLIZZARD" "HEAVY SNOW/BLIZZARD"
## [11] "BLIZZARD/FREEZING RAIN" "BLIZZARD AND HEAVY SNOW"
## [13] "BLIZZARD AND EXTREME WIND CHIL" "BLIZZARD/WINTER STORM"
## [15] "HEAVY SNOW/BLIZZARD/AVALANCHE"
The first step I took to clean the data is 1) filter out all rows containing the word summary or starting with a punctuation, 2) use stringr::str_to_lower to lowercase all ev_type, 3) substitute any punctuation with a space and 4) replace any double space with one space.
noaa %<>%
dplyr::filter(
!stringr::str_detect(ev_type, "[Ss]ummary"),
!stringr::str_detect(ev_type, "^[:punct:]")
) %>%
dplyr::mutate(
ev_type = stringr::str_to_lower(ev_type),
ev_type = stringr::str_replace(ev_type, "[:punct:]", " "),
ev_type = stringr::str_replace(ev_type, "\\s{2}", "\\s")
)
The number of unique values decreased to 802 from 977 before, better but still far away from the NOAA’s 48.
The next step I took is to write a code based on some basic RegEx to assign every observation to its event category. I found some events which were already “clean” and that didn’t need any processing: lakeshore flood, lightining, marine hail, marine high wind, marine strong wind, seiche, tropical depression and tsunami.
noaa %<>%
dplyr::mutate(
ev_type = dplyr::case_when(
ev_type = stringr::str_detect(ev_type, "^astro.*") ~ "astronomical low tide",
ev_type = stringr::str_detect(ev_type, "^aval.*") ~ "avalanche",
ev_type = stringr::str_detect(ev_type, "^blizzar.*") ~ "blizzard",
ev_type = stringr::str_detect(ev_type, "^(coastal|cstl).*flood.*|beach.*(flood|fld).*") ~ "coastal flood",
ev_type = stringr::str_detect(ev_type, "^cold.*") ~ "cold/wind chill",
ev_type = stringr::str_detect(ev_type, "^fog|^dense.*fog.*") ~ "dense fog",
ev_type = stringr::str_detect(ev_type, "smoke") ~ "dense smoke",
ev_type = stringr::str_detect(ev_type, "^drough") ~ "drought",
ev_type = stringr::str_detect(ev_type, "^dust.*dev.*") ~ "dust devil",
ev_type = stringr::str_detect(ev_type, "^dust.*storm.*") ~ "dust storm",
ev_type = stringr::str_detect(ev_type, "^(record|excessive|extreme).*heat.*") ~ "excessive heat",
ev_type = stringr::str_detect(ev_type, "^(extreme|record)\\s?(cold|wind).*") ~ "extreme cold/wind chill",
ev_type = stringr::str_detect(ev_type, "^flash.*flood.*") ~ "flash flood",
ev_type = stringr::str_detect(ev_type, "^flood|^(urban|river|small).*(flood|fld).*") ~ "flood",
ev_type = stringr::str_detect(ev_type, "(frost|freeze)") ~ "frost/freeze",
ev_type = stringr::str_detect(ev_type, "^funnel.*cloud.*|^funnel") ~ "funnel cloud",
ev_type = stringr::str_detect(ev_type, "^freez.*fog.*") ~ "freezing fog",
ev_type = stringr::str_detect(ev_type, "^hail.*") ~ "hail",
ev_type = stringr::str_detect(ev_type, "^[^(record|excessive|extreme)].*heat.*|^heat.*") ~ "heat",
ev_type = stringr::str_detect(ev_type, "^(rain.*|heavy.*rain.*)") ~ "heavy rain",
ev_type = stringr::str_detect(ev_type, "^(heavy|excessive)\\s?snow.*") ~ "heavy snow",
ev_type = stringr::str_detect(ev_type, "^(heavy|high|rough).*surf.*") ~ "high surf",
ev_type = stringr::str_detect(ev_type, "^high\\s?wind.*") ~ "high wind",
ev_type = stringr::str_detect(ev_type, "^(hurricane|typhoon)") ~ "hurricane (typhoon)",
ev_type = stringr::str_detect(ev_type, "^ice\\s?storm.*") ~ "ice storm",
ev_type = stringr::str_detect(ev_type, "^lake.*effect.*") ~ "lake-effect snow",
ev_type = stringr::str_detect(ev_type, "^marine.*(tstm|thunderstorm).*") ~ "marine thunderstorm wind",
ev_type = stringr::str_detect(ev_type, "^rip.*") ~ "rip current",
ev_type = stringr::str_detect(ev_type, "^sleet.*") ~ "sleet",
ev_type = stringr::str_detect(ev_type, "^storm.*") ~ "storm surge/tide",
ev_type = stringr::str_detect(ev_type, "^strong.*wind.*") ~ "strong wind",
ev_type = stringr::str_detect(ev_type, "^thunder.*|^tstm.*") ~ "thunderstorm wind",
ev_type = stringr::str_detect(ev_type, "^torn.*") ~ "tornado",
ev_type = stringr::str_detect(ev_type, "^trop.*stor.*") ~ "tropical storm",
ev_type = stringr::str_detect(ev_type, "^volc.*ash.*") ~ "volcanic ash",
ev_type = stringr::str_detect(ev_type, "^water\\s?spout.*") ~ "waterspout",
ev_type = stringr::str_detect(ev_type, "^wild.*fire.*") ~ "wildfire",
ev_type = stringr::str_detect(ev_type, "^winter\\s?storm.*") ~ "winter storm",
ev_type = stringr::str_detect(ev_type, "^winter\\s?weath.*") ~ "winter weather",
TRUE ~ ev_type
)
)
The unique values decresed further to 407.
The event type cleaning has been trickier than first tought. A good example is the event heavy snow. Using Regex to filter out all the string containing the word snow that start with heavy / record / snow (heavy snow), ice (ice storm) and blizzard I got the following unique values:
stringr::str_subset(noaa$ev_type, "^[^(heavy|record|snow|ice|blizzard)].*snow.*") %>% unique()
## [1] "first snow" "freezing rain snow" "freezing rain and snow"
## [4] "prolong cold snow" "monthly snowfall" "mountain snows"
## [7] "moderate snow" "moderate snowfall" "unusually late snow"
## [10] "falling snow ice"
Events like first snow, moderate snowfall or mountain snow are quite difficult to link to the NOAA’s table.
I created a character vector containing all 48 variables from the NOAA document and used it to filter the correct values and see what are the remaining results (10 most frequent in the table below).
noaa_ev <-
c("astronomical low tide", "avalanche", "blizzard", "coastal flood", "cold/wind chill", "debris flow", "dense fog",
"dense smoke", "drought", "dust devil", "dust storm", "excessive heat", "extreme cold/wind chill", "flash flood",
"flood", "frost/freeze", "funnel cloud", "freezing fog", "hail", "heat", "heavy rain", "heavy snow", "high surf",
"high wind", "hurricane (typhoon)", "ice storm", "lake-effect snow", "lakeshore flood", "lightning", "marine hail",
"marine high wind", "marine strong wind", "marine thunderstorm wind", "rip current", "seiche", "sleet",
"storm surge/tide", "strong wind", "thunderstorm wind", "tornado", "tropical depression", "tropical storm", "tsunami",
"volcanic ash", "waterspout", "wildfire", "winter storm", "winter weather")
noaa %>%
dplyr::filter(!ev_type %in% noaa_ev) %>%
dplyr::group_by(ev_type) %>%
dplyr::tally(sort = TRUE) %>%
dplyr::top_n(n = 10)
## # A tibble: 10 x 2
## ev_type n
## <chr> <int>
## 1 snow 617
## 2 landslide 600
## 3 wind 347
## 4 freezing rain 260
## 5 dry microburst 186
## 6 light snow 176
## 7 record warmth 154
## 8 unseasonably warm 126
## 9 moderate snowfall 101
## 10 wintry mix 94
The number of rows of variables not in list is 4’453 equivalent to 0.49% of the full dataset. I consider this acceptable and keep only values in the 48 NOAA variables.
noaa %<>%
dplyr::filter(ev_type %in% noaa_ev)
Property exponents:
unique(noaa$prop_dmg_exp)
## [1] "K" "M" NA "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
Crop exponents:
unique(noaa$crop_dmg_exp)
## [1] NA "M" "K" "m" "B" "?" "0" "k" "2"
To convert the exponent of property and crop damages I’ve used the informations found (here). It is a very well done analysis, based upon a previous one made by Eddie Song.
The possible values of exponent and their relative multiplier are:
The code below multiplies the property and crop damages for the relative exponent and keeps only the columns I need.
num <- as.character(0:8)
noaa %<>%
dplyr::mutate(
prop_dmg_exp = stringr::str_to_lower(prop_dmg_exp),
crop_dmg_exp = stringr::str_to_lower(crop_dmg_exp)
) %>%
dplyr::mutate(
prop_damages = dplyr::case_when(
prop_dmg_exp == "-" ~ prop_dmg * 0,
prop_dmg_exp == "?" ~ prop_dmg * 0,
prop_dmg_exp == "+" ~ prop_dmg * 1,
prop_dmg_exp %in% num ~ prop_dmg * 10,
prop_dmg_exp == "h" ~ prop_dmg * 100,
prop_dmg_exp == "k" ~ prop_dmg * 1000,
prop_dmg_exp == "m" ~ prop_dmg * 1000000,
prop_dmg_exp == "b" ~ prop_dmg * 1000000000,
is.na(prop_dmg_exp) ~ prop_dmg * 0,
TRUE ~ prop_dmg
), crop_damages = dplyr::case_when(
crop_dmg_exp == "-" ~ crop_dmg * 0,
crop_dmg_exp == "?" ~ crop_dmg * 0,
crop_dmg_exp == "+" ~ crop_dmg * 1,
crop_dmg_exp %in% num ~ crop_dmg * 10,
crop_dmg_exp == "h" ~ crop_dmg * 100,
crop_dmg_exp == "k" ~ crop_dmg * 1000,
crop_dmg_exp == "m" ~ crop_dmg * 1000000,
crop_dmg_exp == "b" ~ crop_dmg * 1000000000,
is.na(crop_dmg_exp) ~ crop_dmg * 0,
TRUE ~ prop_dmg
)
) %>%
dplyr::select(-prop_dmg,-prop_dmg_exp, -crop_dmg, -crop_dmg_exp)
noaa %>%
dplyr::select(prop_damages, crop_damages) %>%
purrr::map_df(
~data.frame(
count = format(length(.), big.mark = "'"),
NAs = format(sum(is.na(.)), big.mark = "'"),
zero = format(sum(. == 0), big.mark = "'"),
zero_p = scales::percent(sum(. == 0) / length(.), accuracy = 0.1),
min = min(.),
max = format(max(.), big.mark = "'"),
total = format(sum(.), big.mark = "'")
), .id = "variable"
)
## variable count NAs zero zero_p min max total
## 1 prop_damages 897'770 0 659'583 73.5% 0 115'000'000'000 425'557'040'220
## 2 crop_damages 897'770 0 875'751 97.5% 0 5'000'000'000 48'822'392'260
noaa %>%
dplyr::select(fatalities, injuries) %>%
purrr::map_df(
~data.frame(
count = format(length(.), big.mark = "'"),
NAs = format(sum(is.na(.)), big.mark = "'"),
zero = format(sum(. == 0), big.mark = "'"),
zero_p = scales::percent(sum(. == 0) / length(.), accuracy = 0.1),
min = min(.),
max = format(max(.), big.mark = "'"),
total = format(sum(.), big.mark = "'")
), .id = "variable"
)
## variable count NAs zero zero_p min max total
## 1 fatalities 897'770 0 890'924 99.2% 0 583 14'918
## 2 injuries 897'770 0 880'332 98.1% 0 1'700 139'563
noaa %>%
dplyr::select(ev_type, fatalities, injuries) %>%
dplyr::group_by(ev_type) %>%
dplyr::summarise(
fatalities = sum(fatalities),
injuries = sum(injuries)
) %>%
dplyr::top_n(n = 20, wt = (fatalities + injuries)) %>%
tidyr::pivot_longer(-ev_type) %>%
ggplot(aes(y = reorder(ev_type, value), x = value, fill = name)) +
geom_bar(position = "stack", stat = "identity") +
labs(title = "20 most harmful events \n", x = "", y ="") +
scale_x_continuous(labels = scales::label_number_si()) +
theme(legend.title = element_blank())
The graph above shows the 10 most harmuful events for the population (sum of fatalities and injuries). The most harmful one, by a long way, is tornado which caused 5’685 fatalities (~38% of all dataset’s fatalities) and 91’364 injuries (~65% of total). Regarding fatalities only, the 3 most harmful events are tornado, excessive heat and heat (almost 50% of all dataset’s fatalities). Regarding injuries the 3 most harmful events are tornado, thunderstorm wind and flood (~77% of all dataset’s injuries).
The table below gives a more detailed view of the 10 most harmful events, both for fatalities and injuries.
tbl1 <-
noaa %>%
get_top(var_1 = "ev_type", var_2 = "fatalities") %>%
dplyr::mutate(
percent = scales::percent(fatalities / sum(noaa$fatalities), accuracy = 0.1),
absolute = format(fatalities, big.mark = "'")
) %>%
dplyr::select(event = ev_type, absolute, percent) %>%
make_table(pos = "float_left") %>%
column_spec(1, width = "4.8cm") %>%
column_spec(2:3, width = "2cm") %>%
add_header_above(c(" ", "Fatalities" = 2))
tbl2 <-
noaa %>%
get_top(var_1 = "ev_type", var_2 = "injuries") %>%
dplyr::mutate(
percent = scales::percent(injuries / sum(noaa$injuries), accuracy = 0.1),
absolute = format(injuries, big.mark = "'")
) %>%
dplyr::select(event = ev_type, absolute, percent) %>%
make_table(pos = "center") %>%
column_spec(1, width = "4.8cm") %>%
column_spec(2:3, width = "2cm") %>%
add_header_above(c(" ", "Injuries" = 2))
| event | absolute | percent |
|---|---|---|
| tornado | 5’658 | 37.9% |
| excessive heat | 2’018 | 13.5% |
| heat | 1’118 | 7.5% |
| flash flood | 1’018 | 6.8% |
| lightning | 816 | 5.5% |
| thunderstorm wind | 712 | 4.8% |
| rip current | 577 | 3.9% |
| flood | 528 | 3.5% |
| extreme cold/wind chill | 305 | 2.0% |
| high wind | 293 | 2.0% |
| event | absolute | percent |
|---|---|---|
| tornado | 91’364 | 65.5% |
| thunderstorm wind | 9’509 | 6.8% |
| flood | 6’888 | 4.9% |
| excessive heat | 6’730 | 4.8% |
| lightning | 5’230 | 3.7% |
| heat | 2’494 | 1.8% |
| ice storm | 1’977 | 1.4% |
| flash flood | 1’785 | 1.3% |
| wildfire | 1’606 | 1.2% |
| high wind | 1’471 | 1.1% |
noaa %>%
dplyr::select(ev_type, prop_damages, crop_damages) %>%
dplyr::group_by(ev_type) %>%
dplyr::summarise(
properties = sum(prop_damages),
crops = sum(crop_damages)
) %>%
dplyr::top_n(n = 20, wt = (properties + crops)) %>%
tidyr::pivot_longer(-ev_type) %>%
ggplot(aes(y = reorder(ev_type, value), x = value, fill = name)) +
geom_bar(position = "stack", stat = "identity") +
labs(title = "20 most costly events \n", x = "", y ="") +
scale_x_continuous(labels = scales::label_number_si()) +
theme(legend.title = element_blank())
The graph above shows the 10 events with the greatest economic impact (sum of property and crop damages). The most harmful one is flood which caused US$150bn of property and ~US$11 bn of crop damages (respectively 35% and 22% all dataset’s damages). Regarding property damages only, the 3 most harmful events are flood, hurricane and tornado (68% all dataset’s property damages). Regarding crop damages the 3 most harmful events are drought, flood and hurricane (67% of all dataset’s crop damages).
The table below gives a more detailed view of the 10 most harmful events, both for property and crop damages.
tbl1 <-
noaa %>%
get_top(var_1 = "ev_type", var_2 = "prop_damages") %>%
dplyr::mutate(
percent = scales::percent(prop_damages / sum(noaa$prop_damages), accuracy = 0.1),
absolute = format(prop_damages, big.mark = "'")
) %>%
dplyr::select(event = ev_type, absolute, percent) %>%
make_table(pos = "float_left") %>%
column_spec(1, width = "4.8cm") %>%
column_spec(2:3, width = "2cm") %>%
add_header_above(c(" ", "Properties Damages" = 2))
tbl2 <-
noaa %>%
get_top(var_1 = "ev_type", var_2 = "crop_damages") %>%
dplyr::mutate(
percent = scales::percent(crop_damages / sum(noaa$crop_damages), accuracy = 0.1),
absolute = format(crop_damages, big.mark = "'")
) %>%
dplyr::select(event = ev_type, absolute, percent) %>%
make_table(pos = "center") %>%
column_spec(1, width = "4.8cm") %>%
column_spec(2:3, width = "2cm") %>%
add_header_above(c(" ", "Injuries" = 2))
| event | absolute | percent |
|---|---|---|
| flood | 150’276’219’254 | 35.3% |
| hurricane (typhoon) | 85’356’410’010 | 20.1% |
| tornado | 58’541’934’597 | 13.8% |
| storm surge/tide | 47’964’924’000 | 11.3% |
| flash flood | 16’732’872’111 | 3.9% |
| hail | 15’974’472’377 | 3.8% |
| thunderstorm wind | 9’762’897’822 | 2.3% |
| wildfire | 8’491’563’500 | 2.0% |
| tropical storm | 7’714’390’550 | 1.8% |
| winter storm | 6’748’997’260 | 1.6% |
| event | absolute | percent |
|---|---|---|
| drought | 13’972’571’780 | 28.6% |
| flood | 10’945’975’050 | 22.4% |
| hurricane (typhoon) | 5’516’117’800 | 11.3% |
| ice storm | 5’022’113’500 | 10.3% |
| hail | 3’026’094’800 | 6.2% |
| frost/freeze | 1’997’061’000 | 4.1% |
| flash flood | 1’437’163’150 | 2.9% |
| extreme cold/wind chill | 1’330’023’000 | 2.7% |
| thunderstorm wind | 1’225’459’700 | 2.5% |
| heavy rain | 796’502’800 | 1.6% |