Synopsis


This analysis tries to answer the following two questions:

1 which types of natural events are most harmful with respect to the population health.

2 which types of natural events have the greatest economic consequences.

The data used is from the Storm Data, an official pubblication of the National Oceanic and Atmospheric Administration (NOAA), which documents 1) the occurence of significant weather phenomena with sufficient intensity to cause loss of life, injuries or significant property damage, 2) unusual weather phenomena the generate media attention and 3) other significant meteorological events.

The analysis performed (more details below) shows the the most harmful events for public health (injuries + deaths) are tornadoes, 5.6k fatalities and 91.4k injuries, followed by thunderstorms (712 deaths and 9.5k injuries) and excessive heat (2k deaths and 6.7k injuries).

From an economic perspective, the most harmful events are floods, which costed a total of ~$161bn, followed by hurricanes (~$95bn) and tornadoes (~$60bn).

Previous results are calculated on total values both for health, deaths + injuries, and for economic consequences, property + crop daages. Under results you’ll find a more detailed table, splitted for single variable.

Data processing


Libraries


library(magrittr)
library(kableExtra)
library(ggplot2)

Except for magrittr, kableExtra and ggplot2, all the packages used in the analysis are called through the notation package::package_function.


Functions


make_table <- function(dataset, pos = "left") {
  dataset %>%
      knitr::kable(format = "html") %>%
      kable_styling(bootstrap_options = "striped", full_width = FALSE, position = pos, font_size = 12)
}

The make_table function uses knitr and kableExtra to create a common output for the tables created in the document, which all use the same style.

get_top <- function(dataset, var_1, var_2) {
    
    var_1 <- as.name(var_1)
    var_2 <- as.name(var_2)
  
    dataset %>%
    dplyr::group_by({{var_1}}) %>%
    dplyr::summarise((!!var_2) := sum(!!sym(var_2))) %>%
    dplyr::arrange(desc({{var_2}})) %>%
    dplyr::top_n(n = 10, wt = {{var_2}})
}

The get_top function groups by var_1 and than calculates the sum of var_2, orders it and keeps only the top 10 rows by value.


Load Data


The raw data, which can be downloaded at the following link, is saved in a .csv file compressed via the bzip2 algorithm. The compressed file size is ~47MB.

if (!file.exists("./data/storm_data.csv.bz2")) {
    
    url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
    downloader::download(url, "./data/storm_data.csv.bz2")
    
}

noaa <- readr::read_csv("./data/storm_data.csv.bz2", col_types = readr::cols(.default = readr::col_character()))

The code checks if the file storm_data already exists in a data folder under the current project working directory; if not, the downloader::dowload function downloads the file. The readr::read_csv function reads the raw file into R, taking care of the decompression process and assigning it to the variable noaa.
By default, readr guesses the columns types looking at the first 1’000 rows. With this dataset, the guessing process showed multiple parsing failures, including some columns read as logical. To avoid this problem, I forced all data as character and I’ll parse the numeric variables later.


First Look


dplyr::glimpse(noaa)
## Observations: 902,297
## Variables: 37
## $ STATE__    <chr> "1.00", "1.00", "1.00", "1.00", "1.00", "1.00", "1.00", ...
## $ BGN_DATE   <chr> "4/18/1950 0:00:00", "4/18/1950 0:00:00", "2/20/1951 0:0...
## $ BGN_TIME   <chr> "0130", "0145", "1600", "0900", "1500", "2000", "0100", ...
## $ TIME_ZONE  <chr> "CST", "CST", "CST", "CST", "CST", "CST", "CST", "CST", ...
## $ COUNTY     <chr> "97.00", "3.00", "57.00", "89.00", "43.00", "77.00", "9....
## $ COUNTYNAME <chr> "MOBILE", "BALDWIN", "FAYETTE", "MADISON", "CULLMAN", "L...
## $ STATE      <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "A...
## $ EVTYPE     <chr> "TORNADO", "TORNADO", "TORNADO", "TORNADO", "TORNADO", "...
## $ BGN_RANGE  <chr> "0.00", "0.00", "0.00", "0.00", "0.00", "0.00", "0.00", ...
## $ BGN_AZI    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ BGN_LOCATI <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ END_DATE   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ END_TIME   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ COUNTY_END <chr> "0.00", "0.00", "0.00", "0.00", "0.00", "0.00", "0.00", ...
## $ COUNTYENDN <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ END_RANGE  <chr> "0.00", "0.00", "0.00", "0.00", "0.00", "0.00", "0.00", ...
## $ END_AZI    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ END_LOCATI <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ LENGTH     <chr> "14.00", "2.00", "0.10", "0.00", "0.00", "1.50", "1.50",...
## $ WIDTH      <chr> "100.00", "150.00", "123.00", "100.00", "150.00", "177.0...
## $ F          <chr> "3", "2", "2", "2", "2", "2", "2", "1", "3", "3", "1", "...
## $ MAG        <chr> "0.00", "0.00", "0.00", "0.00", "0.00", "0.00", "0.00", ...
## $ FATALITIES <chr> "0.00", "0.00", "0.00", "0.00", "0.00", "0.00", "0.00", ...
## $ INJURIES   <chr> "15.00", "0.00", "2.00", "2.00", "2.00", "6.00", "1.00",...
## $ PROPDMG    <chr> "25.00", "2.50", "25.00", "2.50", "2.50", "2.50", "2.50"...
## $ PROPDMGEXP <chr> "K", "K", "K", "K", "K", "K", "K", "K", "K", "K", "M", "...
## $ CROPDMG    <chr> "0.00", "0.00", "0.00", "0.00", "0.00", "0.00", "0.00", ...
## $ CROPDMGEXP <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ WFO        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ STATEOFFIC <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ ZONENAMES  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ LATITUDE   <chr> "3040.00", "3042.00", "3340.00", "3458.00", "3412.00", "...
## $ LONGITUDE  <chr> "8812.00", "8755.00", "8742.00", "8626.00", "8642.00", "...
## $ LATITUDE_E <chr> "3051.00", "0.00", "0.00", "0.00", "0.00", "0.00", "0.00...
## $ LONGITUDE_ <chr> "8806.00", "0.00", "0.00", "0.00", "0.00", "0.00", "0.00...
## $ REMARKS    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ REFNUM     <chr> "1.00", "2.00", "3.00", "4.00", "5.00", "6.00", "7.00", ...

The dataset has 902’297 rows and 37 columns, for a total of 33’384’989 values. As shown by dplyr::glimpse, there are NAs in multiple columns.


1st step in processing the dataset


The variables needed to answer the questions are less than what we have in the raw dataset. In order to make it more manageable, I kept only 7 columns:

  • event type (variable EVTYPE),
  • fatalities and injuries (FATALITIES and INJURIES),
  • property and crop damages (PROPDMG and CROPDMG) and
  • their respective exponentials (PROPDMGEXP and CROPDMGEXP).

I passed the data frame to janitor::clean_names and to dplyr::rename to convert variable’s names to snake_case, then to readr::type_convert to correctly parse the columns.

noaa %<>% 
    dplyr::select(EVTYPE, FATALITIES:INJURIES, PROPDMG:CROPDMGEXP) %>%
    janitor::clean_names() %>%
    dplyr::rename(
        ev_type = evtype, prop_dmg = propdmg, prop_dmg_exp = propdmgexp, crop_dmg = cropdmg, crop_dmg_exp = cropdmgexp
    )

noaa <-
    readr::type_convert(
      noaa, col_types = readr::cols(
            ev_type      = readr::col_character(),
            fatalities   = readr::col_double(),
            injuries     = readr::col_double(),
            prop_dmg     = readr::col_double(),
            prop_dmg_exp = readr::col_character(),
            crop_dmg     = readr::col_double(),
            crop_dmg_exp = readr::col_character()
        )
    )


The table below gives a quick snapshot of the noaa dataset, highlighting the high number of unique values in the event type column (977) and the high number of NAs in the property and crop damages exponents:

noaa %>%
    purrr::map_df(
        ~data.frame(
            class  = class(.),
            count  = format(length(.), big.mark = "'"),
            unique = format(dplyr::n_distinct(.), big.mark = "'"),
            NAs    = format(sum(is.na(.)), big.mark = "'")
        ), .id = "variable"
    )
##       variable     class   count unique     NAs
## 1      ev_type character 902'297    977       0
## 2   fatalities   numeric 902'297     52       0
## 3     injuries   numeric 902'297    200       0
## 4     prop_dmg   numeric 902'297  1'390       0
## 5 prop_dmg_exp character 902'297     19 465'934
## 6     crop_dmg   numeric 902'297    432       0
## 7 crop_dmg_exp character 902'297      9 618'413


Event Type variables


The NOAA documentation contains 48 type of events (page 6 of this link), while the ev_type variable of the dataset has 977 unique values.

Below the 10 most frequent event types in the dataset.

noaa %>% 
    dplyr::group_by(ev_type) %>% 
    dplyr::tally(sort = TRUE) %>% 
    dplyr::top_n(n = 10) %>%
    dplyr::mutate(n = format(n, big.mark = "'"))
## # A tibble: 10 x 2
##    ev_type            n        
##    <chr>              <chr>    
##  1 HAIL               "288'661"
##  2 TSTM WIND          "219'944"
##  3 THUNDERSTORM WIND  " 82'563"
##  4 TORNADO            " 60'652"
##  5 FLASH FLOOD        " 54'278"
##  6 FLOOD              " 25'326"
##  7 THUNDERSTORM WINDS " 20'843"
##  8 HIGH WIND          " 20'212"
##  9 LIGHTNING          " 15'755"
## 10 HEAVY SNOW         " 15'708"


I found multiple problems in the ev_type column that explain the high number of unique values.

  1. Daily summary rows

The dataset has 73 rows that seems a daily summary, which isn’t usefull for the analysis. There’s also one row (not shown) which starts with “?”.

stringr::str_subset(noaa$ev_type, "^Summary") %>% head()
## [1] "Summary Jan 17"       "Summary of March 14"  "Summary of March 23" 
## [4] "Summary of March 24"  "Summary of April 3rd" "Summary of April 12"

  2. Upper/lowercase, punctuation and multiple spaces

The same event is sometimes uppercase (“COASTAL FLOODING”), lowercase (“coastal flooding”) or a mix of the two (“Coastal Flooding”). Some words are separated by one space (“COASTAL FLOOD”), two spaces (“COASTAL FLOODING/EROSION”) or no spaces at all (“COASTALFLOOD”).

stringr::str_subset(noaa$ev_type, "(?i)^coastal\\s*flood.*") %>% unique()
## [1] "COASTAL FLOOD"             "COASTAL FLOODING"         
## [3] "Coastal Flooding"          "COASTALFLOOD"             
## [5] "Coastal Flood"             "coastal flooding"         
## [7] "COASTAL FLOODING/EROSION"  "COASTAL  FLOODING/EROSION"

  3. Different notations for the same event

As shown in the table above, different observations of the same event have slightly different names. For example “COASTAL FLOOD” and “COASTAL FLOODING”.

In some cases the word is singular, plural or abbreviated: for example “TSTM WIND”, “THUNDERSTORM WIND” and “THUNDERSTORM WINDS” for “Thunderstorm Wind” in the data below.

stringr::str_subset(noaa$ev_type, "(?i)(tstm|thunderstorm).*wind.*") %>% unique() %>% head()
## [1] "TSTM WIND"                    "THUNDERSTORM WINDS"          
## [3] "THUNDERSTORM WIND"            "THUNDERSTORM WINDS LIGHTNING"
## [5] "THUNDERSTORM WINDS/HAIL"      "THUNDERSTORM WINDS HAIL"

  4. multiple events for the same observation

Some observatons have multiple events, as shown below. The rule I used is \(\underline{the\space first\space word\space in\space the\space description\space is\space the\space event\space key\space}\). This is helpfull to make the cleaning process easier and coherent. In the example below, “BLIZZARD/HEAVY SNOW” is recoded as blizzard while “HEAVY SNOW/BLIZZARD” is recoded as heavy snow.

stringr::str_subset(noaa$ev_type, "BLIZZARD") %>% unique()
##  [1] "BLIZZARD"                       "BLIZZARD WEATHER"              
##  [3] "HIGH WIND/BLIZZARD"             "HIGH WIND/BLIZZARD/FREEZING RA"
##  [5] "HIGH WIND/ BLIZZARD"            "BLIZZARD/HIGH WIND"            
##  [7] "HIGH WIND/WIND CHILL/BLIZZARD"  "BLIZZARD/HEAVY SNOW"           
##  [9] "GROUND BLIZZARD"                "HEAVY SNOW/BLIZZARD"           
## [11] "BLIZZARD/FREEZING RAIN"         "BLIZZARD AND HEAVY SNOW"       
## [13] "BLIZZARD AND EXTREME WIND CHIL" "BLIZZARD/WINTER STORM"         
## [15] "HEAVY SNOW/BLIZZARD/AVALANCHE"


Event Type cleaning - Part 1


The first step I took to clean the data is 1) filter out all rows containing the word summary or starting with a punctuation, 2) use stringr::str_to_lower to lowercase all ev_type, 3) substitute any punctuation with a space and 4) replace any double space with one space.

noaa %<>% 
    dplyr::filter(
        !stringr::str_detect(ev_type, "[Ss]ummary"),
        !stringr::str_detect(ev_type, "^[:punct:]")
    ) %>%
    dplyr::mutate(
        ev_type = stringr::str_to_lower(ev_type),
        ev_type = stringr::str_replace(ev_type, "[:punct:]", " "),
        ev_type = stringr::str_replace(ev_type, "\\s{2}", "\\s")
    )

The number of unique values decreased to 802 from 977 before, better but still far away from the NOAA’s 48.


Event Type cleaning - Part 2


The next step I took is to write a code based on some basic RegEx to assign every observation to its event category. I found some events which were already “clean” and that didn’t need any processing: lakeshore flood, lightining, marine hail, marine high wind, marine strong wind, seiche, tropical depression and tsunami.

noaa %<>% 
    dplyr::mutate(
        ev_type = dplyr::case_when(
            ev_type = stringr::str_detect(ev_type, "^astro.*") ~ "astronomical low tide",
            ev_type = stringr::str_detect(ev_type, "^aval.*") ~ "avalanche",
            ev_type = stringr::str_detect(ev_type, "^blizzar.*") ~ "blizzard",
            ev_type = stringr::str_detect(ev_type, "^(coastal|cstl).*flood.*|beach.*(flood|fld).*") ~ "coastal flood",
            ev_type = stringr::str_detect(ev_type, "^cold.*") ~ "cold/wind chill",
            ev_type = stringr::str_detect(ev_type, "^fog|^dense.*fog.*") ~ "dense fog",
            ev_type = stringr::str_detect(ev_type, "smoke") ~ "dense smoke",
            ev_type = stringr::str_detect(ev_type, "^drough") ~ "drought",
            ev_type = stringr::str_detect(ev_type, "^dust.*dev.*") ~ "dust devil",
            ev_type = stringr::str_detect(ev_type, "^dust.*storm.*") ~ "dust storm",
            ev_type = stringr::str_detect(ev_type, "^(record|excessive|extreme).*heat.*") ~ "excessive heat",
            ev_type = stringr::str_detect(ev_type, "^(extreme|record)\\s?(cold|wind).*") ~ "extreme cold/wind chill",
            ev_type = stringr::str_detect(ev_type, "^flash.*flood.*") ~ "flash flood",
            ev_type = stringr::str_detect(ev_type, "^flood|^(urban|river|small).*(flood|fld).*") ~ "flood",
            ev_type = stringr::str_detect(ev_type, "(frost|freeze)") ~ "frost/freeze",
            ev_type = stringr::str_detect(ev_type, "^funnel.*cloud.*|^funnel") ~ "funnel cloud",
            ev_type = stringr::str_detect(ev_type, "^freez.*fog.*") ~ "freezing fog",
            ev_type = stringr::str_detect(ev_type, "^hail.*") ~ "hail",
            ev_type = stringr::str_detect(ev_type, "^[^(record|excessive|extreme)].*heat.*|^heat.*") ~ "heat",
            ev_type = stringr::str_detect(ev_type, "^(rain.*|heavy.*rain.*)") ~ "heavy rain",
            ev_type = stringr::str_detect(ev_type, "^(heavy|excessive)\\s?snow.*") ~ "heavy snow",
            ev_type = stringr::str_detect(ev_type, "^(heavy|high|rough).*surf.*") ~ "high surf",
            ev_type = stringr::str_detect(ev_type, "^high\\s?wind.*") ~ "high wind",
            ev_type = stringr::str_detect(ev_type, "^(hurricane|typhoon)") ~ "hurricane (typhoon)",
            ev_type = stringr::str_detect(ev_type, "^ice\\s?storm.*") ~ "ice storm",
            ev_type = stringr::str_detect(ev_type, "^lake.*effect.*") ~ "lake-effect snow",
            ev_type = stringr::str_detect(ev_type, "^marine.*(tstm|thunderstorm).*") ~ "marine thunderstorm wind",
            ev_type = stringr::str_detect(ev_type, "^rip.*") ~ "rip current",
            ev_type = stringr::str_detect(ev_type, "^sleet.*") ~ "sleet",
            ev_type = stringr::str_detect(ev_type, "^storm.*") ~ "storm surge/tide",
            ev_type = stringr::str_detect(ev_type, "^strong.*wind.*") ~ "strong wind",
            ev_type = stringr::str_detect(ev_type, "^thunder.*|^tstm.*") ~ "thunderstorm wind",
            ev_type = stringr::str_detect(ev_type, "^torn.*") ~ "tornado",
            ev_type = stringr::str_detect(ev_type, "^trop.*stor.*") ~ "tropical storm",
            ev_type = stringr::str_detect(ev_type, "^volc.*ash.*") ~ "volcanic ash",
            ev_type = stringr::str_detect(ev_type, "^water\\s?spout.*") ~ "waterspout",
            ev_type = stringr::str_detect(ev_type, "^wild.*fire.*") ~ "wildfire",
            ev_type = stringr::str_detect(ev_type, "^winter\\s?storm.*") ~ "winter storm",
            ev_type = stringr::str_detect(ev_type, "^winter\\s?weath.*") ~ "winter weather",
            TRUE ~ ev_type
        )
    )

The unique values decresed further to 407.

The event type cleaning has been trickier than first tought. A good example is the event heavy snow. Using Regex to filter out all the string containing the word snow that start with heavy / record / snow (heavy snow), ice (ice storm) and blizzard I got the following unique values:

stringr::str_subset(noaa$ev_type, "^[^(heavy|record|snow|ice|blizzard)].*snow.*") %>% unique()
##  [1] "first snow"             "freezing rain snow"     "freezing rain and snow"
##  [4] "prolong cold snow"      "monthly snowfall"       "mountain snows"        
##  [7] "moderate snow"          "moderate snowfall"      "unusually late snow"   
## [10] "falling snow ice"

Events like first snow, moderate snowfall or mountain snow are quite difficult to link to the NOAA’s table.

I created a character vector containing all 48 variables from the NOAA document and used it to filter the correct values and see what are the remaining results (10 most frequent in the table below).

noaa_ev <- 
    c("astronomical low tide", "avalanche", "blizzard", "coastal flood", "cold/wind chill", "debris flow", "dense fog",
      "dense smoke", "drought", "dust devil", "dust storm", "excessive heat", "extreme cold/wind chill", "flash flood",
      "flood", "frost/freeze", "funnel cloud", "freezing fog", "hail", "heat", "heavy rain", "heavy snow", "high surf",
      "high wind", "hurricane (typhoon)", "ice storm", "lake-effect snow", "lakeshore flood", "lightning", "marine hail",
      "marine high wind", "marine strong wind", "marine thunderstorm wind", "rip current", "seiche", "sleet", 
      "storm surge/tide", "strong wind", "thunderstorm wind", "tornado", "tropical depression", "tropical storm", "tsunami",
      "volcanic ash", "waterspout", "wildfire", "winter storm", "winter weather")

noaa %>%
    dplyr::filter(!ev_type %in% noaa_ev) %>%
    dplyr::group_by(ev_type) %>% 
    dplyr::tally(sort = TRUE) %>% 
    dplyr::top_n(n = 10)
## # A tibble: 10 x 2
##    ev_type               n
##    <chr>             <int>
##  1 snow                617
##  2 landslide           600
##  3 wind                347
##  4 freezing rain       260
##  5 dry microburst      186
##  6 light snow          176
##  7 record warmth       154
##  8 unseasonably warm   126
##  9 moderate snowfall   101
## 10 wintry mix           94

The number of rows of variables not in list is 4’453 equivalent to 0.49% of the full dataset. I consider this acceptable and keep only values in the 48 NOAA variables.

noaa %<>%
    dplyr::filter(ev_type %in% noaa_ev)


Property/Crop Damages Exponent


Property exponents:

unique(noaa$prop_dmg_exp)
##  [1] "K" "M" NA  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"

Crop exponents:

unique(noaa$crop_dmg_exp)
## [1] NA  "M" "K" "m" "B" "?" "0" "k" "2"


Exponent Conversion


To convert the exponent of property and crop damages I’ve used the informations found (here). It is a very well done analysis, based upon a previous one made by Eddie Song.

The possible values of exponent and their relative multiplier are:

  • h = hundreds = 100
  • k = thousands = 1,000
  • m = millions = 1,000,000
  • b = billions = 1,000,000,000
  • (+) = 1
  • (-) = 0
  • (?) = 0
  • black/empty character = 0
  • numeric 0:8 = 10

The code below multiplies the property and crop damages for the relative exponent and keeps only the columns I need.

num <- as.character(0:8)

noaa %<>%
    dplyr::mutate(
        prop_dmg_exp = stringr::str_to_lower(prop_dmg_exp),
        crop_dmg_exp = stringr::str_to_lower(crop_dmg_exp)
    ) %>%
    dplyr::mutate(
        prop_damages = dplyr::case_when(
            prop_dmg_exp == "-" ~ prop_dmg * 0,
            prop_dmg_exp == "?" ~ prop_dmg * 0,
            prop_dmg_exp == "+" ~ prop_dmg * 1,
            prop_dmg_exp %in% num ~ prop_dmg * 10,
            prop_dmg_exp == "h" ~ prop_dmg * 100,
            prop_dmg_exp == "k" ~ prop_dmg * 1000,
            prop_dmg_exp == "m" ~ prop_dmg * 1000000,
            prop_dmg_exp == "b" ~ prop_dmg * 1000000000,
            is.na(prop_dmg_exp) ~ prop_dmg * 0,
            TRUE ~ prop_dmg
        ), crop_damages = dplyr::case_when(
            crop_dmg_exp == "-" ~ crop_dmg * 0,
            crop_dmg_exp == "?" ~ crop_dmg * 0,
            crop_dmg_exp == "+" ~ crop_dmg * 1,
            crop_dmg_exp %in% num ~ crop_dmg * 10,
            crop_dmg_exp == "h" ~ crop_dmg * 100,
            crop_dmg_exp == "k" ~ crop_dmg * 1000,
            crop_dmg_exp == "m" ~ crop_dmg * 1000000,
            crop_dmg_exp == "b" ~ crop_dmg * 1000000000,
            is.na(crop_dmg_exp) ~ crop_dmg * 0,
            TRUE ~ prop_dmg
        )
    ) %>%
    dplyr::select(-prop_dmg,-prop_dmg_exp, -crop_dmg, -crop_dmg_exp)


Property / Crop Damages


noaa %>%
    dplyr::select(prop_damages, crop_damages) %>%
    purrr::map_df(
        ~data.frame(
            count    = format(length(.), big.mark = "'"),
            NAs      = format(sum(is.na(.)), big.mark = "'"),
            zero     = format(sum(. == 0), big.mark = "'"),
            zero_p   = scales::percent(sum(. == 0) / length(.), accuracy = 0.1),
            min      = min(.),
            max      = format(max(.), big.mark = "'"),
            total    = format(sum(.), big.mark = "'")
        ), .id = "variable"
    )
##       variable   count NAs    zero zero_p min             max           total
## 1 prop_damages 897'770   0 659'583  73.5%   0 115'000'000'000 425'557'040'220
## 2 crop_damages 897'770   0 875'751  97.5%   0   5'000'000'000  48'822'392'260


Fataliaties / Injuries


noaa %>%
    dplyr::select(fatalities, injuries) %>%
    purrr::map_df(
        ~data.frame(
            count    = format(length(.), big.mark = "'"),
            NAs      = format(sum(is.na(.)), big.mark = "'"),
            zero     = format(sum(. == 0), big.mark = "'"),
            zero_p   = scales::percent(sum(. == 0) / length(.), accuracy = 0.1),
            min      = min(.),
            max      = format(max(.), big.mark = "'"),
            total    = format(sum(.), big.mark = "'")
        ), .id = "variable"
    )
##     variable   count NAs    zero zero_p min   max   total
## 1 fatalities 897'770   0 890'924  99.2%   0   583  14'918
## 2   injuries 897'770   0 880'332  98.1%   0 1'700 139'563


Results

Most Harmful Events


noaa %>%
    dplyr::select(ev_type, fatalities, injuries) %>%
    dplyr::group_by(ev_type) %>%
    dplyr::summarise(
        fatalities = sum(fatalities),
        injuries   = sum(injuries)
    ) %>%
    dplyr::top_n(n = 20, wt = (fatalities + injuries)) %>%
    tidyr::pivot_longer(-ev_type) %>%
    ggplot(aes(y = reorder(ev_type, value), x = value, fill = name)) +
    geom_bar(position = "stack", stat = "identity") +
    labs(title = "20 most harmful events \n", x = "", y ="") +
    scale_x_continuous(labels = scales::label_number_si()) + 
    theme(legend.title = element_blank())

The graph above shows the 10 most harmuful events for the population (sum of fatalities and injuries). The most harmful one, by a long way, is tornado which caused 5’685 fatalities (~38% of all dataset’s fatalities) and 91’364 injuries (~65% of total). Regarding fatalities only, the 3 most harmful events are tornado, excessive heat and heat (almost 50% of all dataset’s fatalities). Regarding injuries the 3 most harmful events are tornado, thunderstorm wind and flood (~77% of all dataset’s injuries).

The table below gives a more detailed view of the 10 most harmful events, both for fatalities and injuries.

tbl1 <-
  noaa %>% 
  get_top(var_1 = "ev_type", var_2 = "fatalities") %>%
  dplyr::mutate(
    percent = scales::percent(fatalities / sum(noaa$fatalities), accuracy = 0.1),
    absolute = format(fatalities, big.mark = "'")
  ) %>%
  dplyr::select(event = ev_type, absolute, percent) %>%
  make_table(pos = "float_left") %>%
  column_spec(1, width = "4.8cm") %>%
  column_spec(2:3, width = "2cm") %>%
  add_header_above(c(" ", "Fatalities" = 2))
  
tbl2 <-
  noaa %>% 
  get_top(var_1 = "ev_type", var_2 = "injuries") %>%
  dplyr::mutate(
    percent = scales::percent(injuries / sum(noaa$injuries), accuracy = 0.1),
    absolute = format(injuries, big.mark = "'")
  ) %>%
  dplyr::select(event = ev_type, absolute, percent) %>%
  make_table(pos = "center") %>%
  column_spec(1, width = "4.8cm") %>%
  column_spec(2:3, width = "2cm") %>%
  add_header_above(c(" ", "Injuries" = 2))
Fatalities
event absolute percent
tornado 5’658 37.9%
excessive heat 2’018 13.5%
heat 1’118 7.5%
flash flood 1’018 6.8%
lightning 816 5.5%
thunderstorm wind 712 4.8%
rip current 577 3.9%
flood 528 3.5%
extreme cold/wind chill 305 2.0%
high wind 293 2.0%
Injuries
event absolute percent
tornado 91’364 65.5%
thunderstorm wind 9’509 6.8%
flood 6’888 4.9%
excessive heat 6’730 4.8%
lightning 5’230 3.7%
heat 2’494 1.8%
ice storm 1’977 1.4%
flash flood 1’785 1.3%
wildfire 1’606 1.2%
high wind 1’471 1.1%


Most Costly Events


noaa %>%
    dplyr::select(ev_type, prop_damages, crop_damages) %>%
    dplyr::group_by(ev_type) %>%
    dplyr::summarise(
        properties = sum(prop_damages),
        crops      = sum(crop_damages)
    ) %>%
    dplyr::top_n(n = 20, wt = (properties + crops)) %>%
    tidyr::pivot_longer(-ev_type) %>%
    ggplot(aes(y = reorder(ev_type, value), x = value, fill = name)) +
    geom_bar(position = "stack", stat = "identity") +
    labs(title = "20 most costly events \n", x = "", y ="") +
    scale_x_continuous(labels = scales::label_number_si()) + 
    theme(legend.title = element_blank())

The graph above shows the 10 events with the greatest economic impact (sum of property and crop damages). The most harmful one is flood which caused US$150bn of property and ~US$11 bn of crop damages (respectively 35% and 22% all dataset’s damages). Regarding property damages only, the 3 most harmful events are flood, hurricane and tornado (68% all dataset’s property damages). Regarding crop damages the 3 most harmful events are drought, flood and hurricane (67% of all dataset’s crop damages).

The table below gives a more detailed view of the 10 most harmful events, both for property and crop damages.

tbl1 <-
  noaa %>% 
  get_top(var_1 = "ev_type", var_2 = "prop_damages") %>%
  dplyr::mutate(
    percent = scales::percent(prop_damages / sum(noaa$prop_damages), accuracy = 0.1),
    absolute = format(prop_damages, big.mark = "'")
  ) %>%
  dplyr::select(event = ev_type, absolute, percent) %>%
  make_table(pos = "float_left") %>%
  column_spec(1, width = "4.8cm") %>%
  column_spec(2:3, width = "2cm") %>%
  add_header_above(c(" ", "Properties Damages" = 2))
  
tbl2 <-
  noaa %>% 
  get_top(var_1 = "ev_type", var_2 = "crop_damages") %>%
  dplyr::mutate(
    percent = scales::percent(crop_damages / sum(noaa$crop_damages), accuracy = 0.1),
    absolute = format(crop_damages, big.mark = "'")
  ) %>%
  dplyr::select(event = ev_type, absolute, percent) %>%
  make_table(pos = "center") %>%
  column_spec(1, width = "4.8cm") %>%
  column_spec(2:3, width = "2cm") %>%
  add_header_above(c(" ", "Injuries" = 2))
Properties Damages
event absolute percent
flood 150’276’219’254 35.3%
hurricane (typhoon) 85’356’410’010 20.1%
tornado 58’541’934’597 13.8%
storm surge/tide 47’964’924’000 11.3%
flash flood 16’732’872’111 3.9%
hail 15’974’472’377 3.8%
thunderstorm wind 9’762’897’822 2.3%
wildfire 8’491’563’500 2.0%
tropical storm 7’714’390’550 1.8%
winter storm 6’748’997’260 1.6%
Injuries
event absolute percent
drought 13’972’571’780 28.6%
flood 10’945’975’050 22.4%
hurricane (typhoon) 5’516’117’800 11.3%
ice storm 5’022’113’500 10.3%
hail 3’026’094’800 6.2%
frost/freeze 1’997’061’000 4.1%
flash flood 1’437’163’150 2.9%
extreme cold/wind chill 1’330’023’000 2.7%
thunderstorm wind 1’225’459’700 2.5%
heavy rain 796’502’800 1.6%