This report details the data processing and analyses of the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm/extreme weather events database. The database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The report attempts to answer two questions:
1. Which types of events are most harmful to population health across the US?
2. Which types of events have the highest economic consequences across the US?
The NOAA storm database analyzed here covers the period 1950-2011. In the earlier years of the database there are fewer events, most likely due to a lack of good records; more recent years are more complete.
Extensive documentation of the database and variables is available in these documents:
* National Weather Service Storm Data Documentation
* National Climatic Data Center Storm Events FAQ
Before proceeding with downloading and processing the data, set up the workspace.
This assumes that the project directory has the following subdirectory structure:
* data - raw data files
* doc - Rmd/html notebooks detailing the analyses
* figures - image files/figures for the report, if they should be separate
* output - outputs and processed data
* R - R code/scripts/functions
This helps keep the project organized, and avoids having to type long file paths.
data.dir <- "data"
doc.dir <- "doc"
figures.dir <- "figures"
save.dir <- "output"
functions.dir <- "R"
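If some of these subdirectories do not exist yet, they can be created before anything is written to them. The following is a minimal base R sketch; the helper name `ensure_dirs` and the temporary directory used as a stand-in project root are assumptions made for illustration, not part of the original workflow.

```r
## hypothetical helper: create any missing project subdirectories under `root`
ensure_dirs <- function(root, dirs) {
  paths <- file.path(root, dirs)
  for (p in paths) {
    ## recursive = TRUE also creates missing parents;
    ## showWarnings = FALSE keeps reruns quiet when the directory already exists
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }
  ## report which of the requested directories exist now
  setNames(dir.exists(paths), dirs)
}

## demo in a throwaway location instead of the real project root
demo.root <- file.path(tempdir(), "storm_project")
created <- ensure_dirs(demo.root, c("data", "doc", "figures", "output", "R"))
print(created)
```

In the actual project, `root` would presumably be the directory returned by `here()`.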
Load the necessary libraries.
library(here) ## easily construct relative file paths
## here() starts at /media/eti/SAMSUNG/fun/courses/data_science_specialization/05_reproducible_research/w4
library(tidyverse) ## data cleaning, processing, manipulation and analysis
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4
## ✓ tibble 3.0.6 ✓ dplyr 1.0.4
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(kableExtra) ## html table formatting
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
library(patchwork) ## combine (gg)plots easily
## library(janitor) ## data cleaning
## library(priceR) ## inflation calculations and data from the World Bank
Set the default ggplot theme to black-and-white for all plots.
theme_set(theme_bw())
The data for this report are stored as a comma-separated values (CSV) file, compressed with bzip2 to reduce its size, and are available through the course website.
## url provided on the course website
storm.url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
## increase the timeout so the download does not fail on a slow internet connection
options(timeout = 360)
download.file(storm.url,
  destfile = here(data.dir, "storm.data.csv.bz2"),
  mode = "wb") ## binary mode, so the compressed file is not corrupted on Windows
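Since the file is large, it may be worth skipping the download when a local copy already exists. A minimal sketch of that idea follows; `download_if_missing` and the stand-in downloader are hypothetical names introduced here so that the logic can be tried without network access.

```r
## hypothetical helper: download only when the target file is missing;
## `downloader` is injectable so the logic can be exercised offline
download_if_missing <- function(url, destfile, downloader = utils::download.file) {
  if (file.exists(destfile)) {
    return(FALSE) ## nothing to do, a local copy is already there
  }
  downloader(url, destfile = destfile, mode = "wb")
  TRUE
}

## offline demo: the stand-in downloader only writes a placeholder file
fake.downloader <- function(url, destfile, mode) writeLines("placeholder", destfile)
demo.file <- tempfile(fileext = ".csv.bz2")
first.call <- download_if_missing("https://example.org/file", demo.file, fake.downloader)
second.call <- download_if_missing("https://example.org/file", demo.file, fake.downloader)
c(first = first.call, second = second.call)
```

With the default `downloader`, the call would take `storm.url` and the `here(data.dir, ...)` path used above.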
Read in the data - warning, this takes some time; then check its structure.
## the file will be automatically uncompressed by read_csv
## all columns are read as character to begin with: the file is big, and this also avoids parsing errors
storms.raw <- read_csv(here(data.dir, "storm.data.csv.bz2"),
  col_types = paste0(rep("c", times = 37), collapse = ""))
str(storms.raw)
## spec_tbl_df [902,297 × 37] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ STATE__ : chr [1:902297] "1.00" "1.00" "1.00" "1.00" ...
## $ BGN_DATE : chr [1:902297] "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr [1:902297] "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr [1:902297] "CST" "CST" "CST" "CST" ...
## $ COUNTY : chr [1:902297] "97.00" "3.00" "57.00" "89.00" ...
## $ COUNTYNAME: chr [1:902297] "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr [1:902297] "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr [1:902297] "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : chr [1:902297] "0.00" "0.00" "0.00" "0.00" ...
## $ BGN_AZI : chr [1:902297] NA NA NA NA ...
## $ BGN_LOCATI: chr [1:902297] NA NA NA NA ...
## $ END_DATE : chr [1:902297] NA NA NA NA ...
## $ END_TIME : chr [1:902297] NA NA NA NA ...
## $ COUNTY_END: chr [1:902297] "0.00" "0.00" "0.00" "0.00" ...
## $ COUNTYENDN: chr [1:902297] NA NA NA NA ...
## $ END_RANGE : chr [1:902297] "0.00" "0.00" "0.00" "0.00" ...
## $ END_AZI : chr [1:902297] NA NA NA NA ...
## $ END_LOCATI: chr [1:902297] NA NA NA NA ...
## $ LENGTH : chr [1:902297] "14.00" "2.00" "0.10" "0.00" ...
## $ WIDTH : chr [1:902297] "100.00" "150.00" "123.00" "100.00" ...
## $ F : chr [1:902297] "3" "2" "2" "2" ...
## $ MAG : chr [1:902297] "0.00" "0.00" "0.00" "0.00" ...
## $ FATALITIES: chr [1:902297] "0.00" "0.00" "0.00" "0.00" ...
## $ INJURIES : chr [1:902297] "15.00" "0.00" "2.00" "2.00" ...
## $ PROPDMG : chr [1:902297] "25.00" "2.50" "25.00" "2.50" ...
## $ PROPDMGEXP: chr [1:902297] "K" "K" "K" "K" ...
## $ CROPDMG : chr [1:902297] "0.00" "0.00" "0.00" "0.00" ...
## $ CROPDMGEXP: chr [1:902297] NA NA NA NA ...
## $ WFO : chr [1:902297] NA NA NA NA ...
## $ STATEOFFIC: chr [1:902297] NA NA NA NA ...
## $ ZONENAMES : chr [1:902297] NA NA NA NA ...
## $ LATITUDE : chr [1:902297] "3040.00" "3042.00" "3340.00" "3458.00" ...
## $ LONGITUDE : chr [1:902297] "8812.00" "8755.00" "8742.00" "8626.00" ...
## $ LATITUDE_E: chr [1:902297] "3051.00" "0.00" "0.00" "0.00" ...
## $ LONGITUDE_: chr [1:902297] "8806.00" "0.00" "0.00" "0.00" ...
## $ REMARKS : chr [1:902297] NA NA NA NA ...
## $ REFNUM : chr [1:902297] "1.00" "2.00" "3.00" "4.00" ...
## - attr(*, "spec")=
## .. cols(
## .. STATE__ = col_character(),
## .. BGN_DATE = col_character(),
## .. BGN_TIME = col_character(),
## .. TIME_ZONE = col_character(),
## .. COUNTY = col_character(),
## .. COUNTYNAME = col_character(),
## .. STATE = col_character(),
## .. EVTYPE = col_character(),
## .. BGN_RANGE = col_character(),
## .. BGN_AZI = col_character(),
## .. BGN_LOCATI = col_character(),
## .. END_DATE = col_character(),
## .. END_TIME = col_character(),
## .. COUNTY_END = col_character(),
## .. COUNTYENDN = col_character(),
## .. END_RANGE = col_character(),
## .. END_AZI = col_character(),
## .. END_LOCATI = col_character(),
## .. LENGTH = col_character(),
## .. WIDTH = col_character(),
## .. F = col_character(),
## .. MAG = col_character(),
## .. FATALITIES = col_character(),
## .. INJURIES = col_character(),
## .. PROPDMG = col_character(),
## .. PROPDMGEXP = col_character(),
## .. CROPDMG = col_character(),
## .. CROPDMGEXP = col_character(),
## .. WFO = col_character(),
## .. STATEOFFIC = col_character(),
## .. ZONENAMES = col_character(),
## .. LATITUDE = col_character(),
## .. LONGITUDE = col_character(),
## .. LATITUDE_E = col_character(),
## .. LONGITUDE_ = col_character(),
## .. REMARKS = col_character(),
## .. REFNUM = col_character()
## .. )
From the documentation, the relevant variables for identifying the weather events most harmful to population health are:
* EVTYPE - the type of weather event
* FATALITIES - the number of fatalities
* INJURIES - the number of injuries.
And for estimating the highest economic consequences, also:
* PROPDMG - the estimated dollar value of property damage caused by the event
* PROPDMGEXP - a code for the magnitude (multiplier) to apply to PROPDMG
* CROPDMG - the estimated dollar value of crop damage caused by the event
* CROPDMGEXP - a code for the magnitude (multiplier) to apply to CROPDMG
In case extra information is needed, I’ll also extract the reference number (REFNUM), start and end dates (BGN_DATE, END_DATE), and state (STATE).
storms.sub <- storms.raw %>%
select(REFNUM, STATE, BGN_DATE, END_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
Convert all variable names to lowercase (and clean up anything else that might be wrong with them) using the janitor package.
## not importing the whole package only for this one single function
storms.sub <- storms.sub %>% janitor::clean_names()
Transform the variables into the correct data type. This concerns the unambiguous variables (the reference numbers, the dates and the numbers of casualties/damages).
(storms.sub <- storms.sub %>%
mutate(refnum = as.numeric(refnum),
bgn_date = as.Date(bgn_date, format = "%m/%d/%Y"),
end_date = as.Date(end_date, format = "%m/%d/%Y"),
fatalities = as.numeric(fatalities),
injuries = as.numeric(injuries),
propdmg = as.numeric(propdmg),
cropdmg = as.numeric(cropdmg))
)
## # A tibble: 902,297 x 11
## refnum state bgn_date end_date evtype fatalities injuries propdmg
## <dbl> <chr> <date> <date> <chr> <dbl> <dbl> <dbl>
## 1 1 AL 1950-04-18 NA TORNA… 0 15 25
## 2 2 AL 1950-04-18 NA TORNA… 0 0 2.5
## 3 3 AL 1951-02-20 NA TORNA… 0 2 25
## 4 4 AL 1951-06-08 NA TORNA… 0 2 2.5
## 5 5 AL 1951-11-15 NA TORNA… 0 2 2.5
## 6 6 AL 1951-11-15 NA TORNA… 0 6 2.5
## 7 7 AL 1951-11-16 NA TORNA… 0 1 2.5
## 8 8 AL 1952-01-22 NA TORNA… 0 0 2.5
## 9 9 AL 1952-02-13 NA TORNA… 1 14 25
## 10 10 AL 1952-02-13 NA TORNA… 0 0 25
## # … with 902,287 more rows, and 3 more variables: propdmgexp <chr>,
## # cropdmg <dbl>, cropdmgexp <chr>
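The raw date strings carry a time component ("0:00:00"), while the format string passed to `as.Date()` only describes the date part. The short base R check below illustrates that the trailing time is simply ignored and that missing values stay missing; the sample strings are taken from the first and last dates in the data.

```r
## two date strings in the raw format, plus a missing value (as in END_DATE)
raw.dates <- c("4/18/1950 0:00:00", "11/30/2011 0:00:00", NA)

## the format covers only month/day/year; the remainder of each string is not parsed
parsed.dates <- as.Date(raw.dates, format = "%m/%d/%Y")
print(parsed.dates)

## number of values that could not be parsed (here: only the original NA)
sum(is.na(parsed.dates))
```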
Check the (numeric) variables for NAs, inconsistencies, etc.
storms.sub %>%
select(bgn_date, cropdmg, propdmg, fatalities, injuries) %>%
summary()
## bgn_date cropdmg propdmg fatalities
## Min. :1950-01-03 Min. : 0.000 Min. : 0.00 Min. : 0.0000
## 1st Qu.:1995-04-20 1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.: 0.0000
## Median :2002-03-18 Median : 0.000 Median : 0.00 Median : 0.0000
## Mean :1998-12-27 Mean : 1.527 Mean : 12.06 Mean : 0.0168
## 3rd Qu.:2007-07-28 3rd Qu.: 0.000 3rd Qu.: 0.50 3rd Qu.: 0.0000
## Max. :2011-11-30 Max. :990.000 Max. :5000.00 Max. :583.0000
## injuries
## Min. : 0.0000
## 1st Qu.: 0.0000
## Median : 0.0000
## Mean : 0.1557
## 3rd Qu.: 0.0000
## Max. :1700.0000
There are no NAs in the dates, damages and casualties. But there are many zeroes (i.e. events that did not cause any material damage according to the estimates) - the medians of the variables are 0, and so are most of the third quartiles. So I’d expect there to be a few very destructive events (like a big tornado), but that most events did not cause significant damages or fatalities.
Weather events that cause neither damages to property/crops nor injuries/fatalities are not of interest for this report - its objective is to find the ones with the highest cost. I am going to drop them now to reduce the number of records.
storms.sub.dmg <- storms.sub %>%
filter(!(propdmg == 0 & cropdmg == 0 & fatalities == 0 & injuries == 0))
This reduced the dataset by a factor of 3.5.
I’m keeping a copy with all records in case I decide to calculate overall frequencies, etc.
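To make the filter condition and the reduction factor concrete, here is a small base R sketch. The toy data frame is made up for illustration; the row counts are the ones printed in this report (the total from `str()`, and the two tallies of damage-causing records shown further down).

```r
## toy data (hypothetical values): only the first row has no damage and no casualties
toy <- data.frame(propdmg    = c(0, 25, 0, 0),
                  cropdmg    = c(0, 0, 0, 3),
                  fatalities = c(0, 0, 1, 0),
                  injuries   = c(0, 0, 0, 0))

## an event is dropped only if ALL four measures are zero
all.zero <- with(toy, propdmg == 0 & cropdmg == 0 & fatalities == 0 & injuries == 0)
toy.dmg <- toy[!all.zero, ]
nrow(toy.dmg)

## reduction factor from the counts reported in this document
n.all <- 902297          ## all records
n.dmg <- 27376 + 227257  ## records with some damage or casualties
reduction.factor <- n.all / n.dmg
round(reduction.factor, 1)
```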
According to the documentation, only tornadoes were recorded in the first years of the database (1950-1955). After that, and until the early 1990s, the recorded events were tornadoes, thunderstorm winds, and hail. From 1993 onward, the records cover a much wider range of event types; the current official list contains 48.
Check how many records there are prior to 1993.
## events before 1993
storms.sub.dmg %>%
filter(lubridate::year(bgn_date) < 1993) %>%
tally()
## # A tibble: 1 x 1
## n
## <int>
## 1 27376
## events from 1993 onward
storms.sub.dmg %>%
filter(lubridate::year(bgn_date) >= 1993) %>%
tally()
## # A tibble: 1 x 1
## n
## <int>
## 1 227257
There are about 8 times more records from 1993 onward than before. Since these cover the full range of weather event types, it might be more meaningful to compare only them in order to find out which type of event causes the most damage (otherwise, the tornadoes, thunderstorms and hail are all likely to get artificially inflated values due to the longer period over which they were recorded). Given that comparatively few records will be excluded, it probably won’t bias the conclusions too much.
Subset to only the period from 1993 onward.
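The size of the excluded part can be quantified directly from the two tallies above. This is only a quick arithmetic check in base R, using the counts printed in this report.

```r
n.before <- 27376      ## damage-causing records before 1993
n.from.1993 <- 227257  ## damage-causing records from 1993 onward

## how many times more records there are in the later period
ratio <- n.from.1993 / n.before
round(ratio, 1)

## share of damage-causing records that the subsetting excludes, in percent
share.excluded <- n.before / (n.before + n.from.1993)
round(100 * share.excluded, 1)
```

So roughly one record in nine is dropped by restricting the analysis to 1993 onward.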
storms.post.1993 <- storms.sub.dmg %>%
filter(lubridate::year(bgn_date) >= 1993)
Calculate the actual property and crop damages.
Check the variables “propdmgexp” and “cropdmgexp”.
storms.post.1993 %>%
select(propdmgexp) %>%
table()
## .
## - + 0 2 3 4 5 6 7 B h
## 1 5 210 1 1 4 18 3 3 40 1
## H K m M
## 6 208203 7 8547
storms.post.1993 %>%
select(cropdmgexp) %>%
table()
## .
## ? 0 B k K m M
## 6 17 7 21 99932 1 1985
According to the documentation, these were only supposed to contain “k”, “m” and “b” - for thousands, millions and billions of dollars, respectively.
Following this logic, the “h” should correspond to hundreds.
Reading around, it turns out that the - and + mean “less than” and “more than” the damage amount entered in the corresponding column (the idea was for less experienced personnel to indicate that there was uncertainty in the estimate). Since they are so infrequent compared to the others, I’m going to ignore them, i.e. take the damage amount as entered (a multiplier of 1).
The “?” also means that there is uncertainty about the damage estimate - it is also infrequent overall, so I’m treating it the same way.
The digits 0-8 apparently all correspond to a multiplier of 10 (based on this resource).
Recalculate the actual damages, keeping the above considerations in mind.
storms.post.1993 <- storms.post.1993 %>%
## property damages
mutate(propdmg_actual = case_when(propdmgexp %in% c("-", "+", "?") | is.na(propdmgexp) ~ propdmg * 1,
propdmgexp %in% as.character(0:8) ~ propdmg * 10,
propdmgexp %in% c("h", "H") ~ propdmg * 100,
propdmgexp %in% c("k", "K") ~ propdmg * 1000,
propdmgexp %in% c("m", "M") ~ propdmg * 1000000,
propdmgexp %in% c("b", "B") ~ propdmg * 1000000000),
## crop damages
cropdmg_actual = case_when(cropdmgexp %in% c("-", "+", "?") | is.na(cropdmgexp) ~ cropdmg * 1,
cropdmgexp %in% as.character(0:8) ~ cropdmg * 10,
cropdmgexp %in% c("h", "H") ~ cropdmg * 100,
cropdmgexp %in% c("k", "K") ~ cropdmg * 1000,
cropdmgexp %in% c("m", "M") ~ cropdmg * 1000000,
cropdmgexp %in% c("b", "B") ~ cropdmg * 1000000000))
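The same mapping can be written once as a small lookup function and reused for both columns. The sketch below uses base R only; `exp_multiplier` is a hypothetical helper name, and it follows the rules described above (missing, "-", "+" and "?" give 1; the digits 0-8 give 10; h, k, m and b give hundreds, thousands, millions and billions). Codes outside these rules come back as NA, which makes them easy to spot.

```r
## hypothetical helper: translate an exponent code into a numeric multiplier
exp_multiplier <- function(code) {
  missing.code <- is.na(code)
  code <- toupper(code) ## treat "k" and "K" etc. alike
  mult <- rep(NA_real_, length(code))
  mult[missing.code | code %in% c("-", "+", "?")] <- 1
  mult[code %in% as.character(0:8)] <- 10
  mult[code %in% "H"] <- 1e2
  mult[code %in% "K"] <- 1e3
  mult[code %in% "M"] <- 1e6
  mult[code %in% "B"] <- 1e9
  mult
}

## "x" is a made-up code, included to show that unknown codes give NA
codes <- c("K", "m", "B", "h", "5", "+", NA, "x")
multipliers <- exp_multiplier(codes)
data.frame(code = codes, multiplier = multipliers)

## worked example: a PROPDMG of 2.5 with code "K" is 2,500 dollars
2.5 * exp_multiplier("K")
```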
Adjust the reported costs for inflation, so that the numbers are directly comparable. The package priceR has a handy function that can do that, using data from the World Bank. I’ll convert all costs to (average) 2019 USD, since 2019 is the most recent year available in the dataset at the time of writing of this report.
us.inflation.df <- priceR::retrieve_inflation_data("US")
## Validating iso2Code for US
## Generating URL to request all 297 results
## Retrieving inflation data for US
## Generating URL to request all 61 results
countries.df <- priceR::show_countries()
## Generating URL to request all 297 results
Adjust the costs for inflation against 2019.
## takes an ungodly amount of time - but still preferable to extracting the data myself from whatever bureau of labor statistics there are, and getting a crash course in inflation rates and buying power
storms.post.1993 <- storms.post.1993 %>%
## property damage
mutate(propdmg_actual_2019 = priceR::adjust_for_inflation(propdmg_actual,
from_date = lubridate::year(bgn_date),
to_date = 2019,
country = "US",
inflation_dataframe = us.inflation.df, countries_dataframe = countries.df),
## crop damage
cropdmg_actual_2019 = priceR::adjust_for_inflation(cropdmg_actual,
from_date = lubridate::year(bgn_date),
to_date = 2019,
country = "US",
inflation_dataframe = us.inflation.df, countries_dataframe = countries.df))
Check out the results, to be on the safe side.
head(storms.post.1993 %>%
select(bgn_date, propdmg_actual, propdmg_actual_2019, cropdmg_actual, cropdmg_actual_2019), n = 10)
## # A tibble: 10 x 5
## bgn_date propdmg_actual propdmg_actual_20… cropdmg_actual cropdmg_actual_2…
## <date> <dbl> <dbl> <dbl> <dbl>
## 1 1994-02-09 0 0 0 0
## 2 1993-03-12 5000000000 8848828959. 0 0
## 3 1995-10-04 100000000 167772558. 10000000 16777256.
## 4 1994-04-15 50000 86240. 0 0
## 5 1994-06-26 5000000 8623964. 500000 862396.
## 6 1994-06-26 500000 862396. 0 0
## 7 1994-06-26 500000 862396. 0 0
## 8 1995-08-03 25000000 41943140. 1000000 1677726.
## 9 1995-11-11 50000 83886. 0 0
## 10 1995-10-03 48000000 80530828. 4000000 6710902.
tail(storms.post.1993 %>%
select(bgn_date, propdmg_actual, propdmg_actual_2019, cropdmg_actual, cropdmg_actual_2019), n = 10)
## # A tibble: 10 x 5
## bgn_date propdmg_actual propdmg_actual_20… cropdmg_actual cropdmg_actual_2…
## <date> <dbl> <dbl> <dbl> <dbl>
## 1 2011-11-01 25000 28414. 0 0
## 2 2011-11-26 2000 2273. 0 0
## 3 2011-11-22 5000 5683. 0 0
## 4 2011-11-22 5000 5683. 0 0
## 5 2011-11-09 2000 2273. 0 0
## 6 2011-11-09 5000 5683. 0 0
## 7 2011-11-23 600 682. 0 0
## 8 2011-11-13 1000 1137. 0 0
## 9 2011-11-01 2000 2273. 0 0
## 10 2011-11-30 7500 8524. 0 0
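Within a given year, the adjustment amounts to multiplying by a single factor, as the output above suggests (all 1994 amounts are scaled by the same ratio). The base R sketch below back-calculates approximate factors from the printed values and applies them by year. The helper `adjust_by_year` is hypothetical, and the factors are approximations derived from this report's output, not values retrieved from priceR or the World Bank.

```r
## approximate factors to 2019 USD, back-calculated from the head() output above
factor.to.2019 <- c("1993" = 8848828959 / 5e9,
                    "1994" = 8623964 / 5e6,
                    "1995" = 167772558 / 1e8)
round(factor.to.2019, 4)

## hypothetical helper: look up the factor by year and scale the amount;
## years without a factor give NA
adjust_by_year <- function(amount, year, factors) {
  amount * unname(factors[as.character(year)])
}

## 50,000 USD of damage in 1994 and in 1995, expressed in 2019 USD
adjusted <- adjust_by_year(c(50000, 50000), c(1994, 1995), factor.to.2019)
round(adjusted)
```

Computing one factor per distinct year and multiplying might also shorten the long run time noted above, although that has not been tested here.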
Check the number of distinct event types.
storms.post.1993 %>%
select(evtype) %>%
unique() %>%
tally()
## # A tibble: 1 x 1
## n
## <int>
## 1 485
There should be only 48 unique values, but instead there are 485. There are misspellings, alternate spellings, abbreviations, other non-standard terms used, etc.
First, convert all event types to lowercase, and remove extra white spaces from around/inside the strings - this should immediately deal with some of the duplication.
storms.post.1993 <- storms.post.1993 %>%
mutate(evtype = tolower(evtype),
evtype = str_squish(evtype))
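The effect of this normalisation is easy to see on a few strings. The sketch below reproduces the same two steps in base R (lowercasing, then trimming and collapsing white space); the sample values are made up for illustration.

```r
## hypothetical raw event types that differ only in case and spacing
raw.types <- c("  TSTM WIND", "Heavy  Snow ", "FLASH FLOOD", "flash flood")

## lowercase, trim the ends, and collapse internal runs of white space to one space
clean.types <- gsub("\\s+", " ", trimws(tolower(raw.types)))
print(clean.types)

## distinct values before and after cleaning
c(before = length(unique(raw.types)), after = length(unique(clean.types)))
```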
Check them out, arranging them alphabetically.
storms.post.1993 %>%
pull(evtype) %>%
unique() %>%
sort()
## [1] "?" "agricultural freeze"
## [3] "apache county" "astronomical high tide"
## [5] "astronomical low tide" "avalance"
## [7] "avalanche" "beach erosion"
## [9] "black ice" "blizzard"
## [11] "blizzard/winter storm" "blowing dust"
## [13] "blowing snow" "breakup flooding"
## [15] "brush fire" "coastal erosion"
## [17] "coastal flood" "coastal flooding"
## [19] "coastal flooding/erosion" "coastal storm"
## [21] "coastal surge" "coastalstorm"
## [23] "cold" "cold air tornado"
## [25] "cold and snow" "cold and wet conditions"
## [27] "cold temperature" "cold wave"
## [29] "cold weather" "cold/wind chill"
## [31] "cold/winds" "cool and wet"
## [33] "dam break" "damaging freeze"
## [35] "dense fog" "dense smoke"
## [37] "downburst" "drought"
## [39] "drought/excessive heat" "drowning"
## [41] "dry microburst" "dry mircoburst winds"
## [43] "dust devil" "dust devil waterspout"
## [45] "dust storm" "dust storm/high winds"
## [47] "early frost" "erosion/cstl flood"
## [49] "excessive heat" "excessive rainfall"
## [51] "excessive snow" "excessive wetness"
## [53] "extended cold" "extreme cold"
## [55] "extreme cold/wind chill" "extreme heat"
## [57] "extreme wind chill" "extreme windchill"
## [59] "falling snow/ice" "flash flood"
## [61] "flash flood - heavy rain" "flash flood from ice jams"
## [63] "flash flood landslides" "flash flood winds"
## [65] "flash flood/" "flash flood/ street"
## [67] "flash flood/flood" "flash flood/landslide"
## [69] "flash flooding" "flash flooding/flood"
## [71] "flash flooding/thunderstorm wi" "flash floods"
## [73] "flood" "flood & heavy rain"
## [75] "flood flash" "flood/flash"
## [77] "flood/flash flood" "flood/flash/flood"
## [79] "flood/flashflood" "flood/rain/winds"
## [81] "flood/river flood" "flooding"
## [83] "flooding/heavy rain" "floods"
## [85] "fog" "fog and cold temperatures"
## [87] "forest fires" "freeze"
## [89] "freezing drizzle" "freezing fog"
## [91] "freezing rain" "freezing rain/sleet"
## [93] "freezing rain/snow" "freezing spray"
## [95] "frost" "frost/freeze"
## [97] "frost\\freeze" "funnel cloud"
## [99] "glaze" "glaze ice"
## [101] "glaze/ice storm" "gradient wind"
## [103] "grass fires" "ground blizzard"
## [105] "gustnado" "gusty wind"
## [107] "gusty wind/hail" "gusty wind/hvy rain"
## [109] "gusty wind/rain" "gusty winds"
## [111] "hail" "hail 0.75"
## [113] "hail 075" "hail 100"
## [115] "hail 125" "hail 150"
## [117] "hail 175" "hail 200"
## [119] "hail 275" "hail 450"
## [121] "hail 75" "hail damage"
## [123] "hail/wind" "hail/winds"
## [125] "hailstorm" "hard freeze"
## [127] "hazardous surf" "heat"
## [129] "heat wave" "heat wave drought"
## [131] "heat waves" "heavy lake snow"
## [133] "heavy mix" "heavy precipitation"
## [135] "heavy rain" "heavy rain and flood"
## [137] "heavy rain/high surf" "heavy rain/lightning"
## [139] "heavy rain/severe weather" "heavy rain/small stream urban"
## [141] "heavy rain/snow" "heavy rains"
## [143] "heavy rains/flooding" "heavy seas"
## [145] "heavy shower" "heavy snow"
## [147] "heavy snow and high winds" "heavy snow and strong winds"
## [149] "heavy snow shower" "heavy snow squalls"
## [151] "heavy snow-squalls" "heavy snow/blizzard"
## [153] "heavy snow/blizzard/avalanche" "heavy snow/freezing rain"
## [155] "heavy snow/high winds & flood" "heavy snow/ice"
## [157] "heavy snow/squalls" "heavy snow/wind"
## [159] "heavy snow/winter storm" "heavy snowpack"
## [161] "heavy surf" "heavy surf and wind"
## [163] "heavy surf coastal flooding" "heavy surf/high surf"
## [165] "heavy swells" "high"
## [167] "high seas" "high surf"
## [169] "high surf advisory" "high swells"
## [171] "high tides" "high water"
## [173] "high waves" "high wind"
## [175] "high wind (g40)" "high wind 48"
## [177] "high wind and seas" "high wind damage"
## [179] "high wind/blizzard" "high wind/heavy snow"
## [181] "high wind/seas" "high winds"
## [183] "high winds heavy rains" "high winds/"
## [185] "high winds/coastal flood" "high winds/cold"
## [187] "high winds/heavy rain" "high winds/snow"
## [189] "hurricane" "hurricane edouard"
## [191] "hurricane emily" "hurricane erin"
## [193] "hurricane felix" "hurricane gordon"
## [195] "hurricane opal" "hurricane opal/high winds"
## [197] "hurricane-generated swells" "hurricane/typhoon"
## [199] "hvy rain" "hyperthermia/exposure"
## [201] "hypothermia" "hypothermia/exposure"
## [203] "ice" "ice and snow"
## [205] "ice floes" "ice jam"
## [207] "ice jam flood (minor" "ice jam flooding"
## [209] "ice on road" "ice roads"
## [211] "ice storm" "ice storm/flash flood"
## [213] "ice/strong winds" "icy roads"
## [215] "lake effect snow" "lake flood"
## [217] "lake-effect snow" "lakeshore flood"
## [219] "landslide" "landslides"
## [221] "landslump" "landspout"
## [223] "late season snow" "light freezing rain"
## [225] "light snow" "light snowfall"
## [227] "lighting" "lightning"
## [229] "lightning and heavy rain" "lightning and thunderstorm win"
## [231] "lightning fire" "lightning injury"
## [233] "lightning thunderstorm winds" "lightning wauseon"
## [235] "lightning." "lightning/heavy rain"
## [237] "ligntning" "low temperature"
## [239] "major flood" "marine accident"
## [241] "marine hail" "marine high wind"
## [243] "marine mishap" "marine strong wind"
## [245] "marine thunderstorm wind" "marine tstm wind"
## [247] "microburst" "microburst winds"
## [249] "minor flooding" "mixed precip"
## [251] "mixed precipitation" "mud slide"
## [253] "mud slides" "mud slides urban flooding"
## [255] "mudslide" "mudslides"
## [257] "non tstm wind" "non-severe wind damage"
## [259] "non-tstm wind" "other"
## [261] "rain" "rain/snow"
## [263] "rain/wind" "rainstorm"
## [265] "rapidly rising water" "record cold"
## [267] "record heat" "record rainfall"
## [269] "record snow" "record/excessive heat"
## [271] "rip current" "rip currents"
## [273] "rip currents/heavy surf" "river and stream flood"
## [275] "river flood" "river flooding"
## [277] "rock slide" "rogue wave"
## [279] "rough seas" "rough surf"
## [281] "rural flood" "seiche"
## [283] "severe thunderstorm" "severe thunderstorm winds"
## [285] "severe thunderstorms" "severe turbulence"
## [287] "sleet" "sleet/ice storm"
## [289] "small hail" "small stream flood"
## [291] "snow" "snow accumulation"
## [293] "snow and heavy snow" "snow and ice"
## [295] "snow and ice storm" "snow freezing rain"
## [297] "snow squall" "snow squalls"
## [299] "snow/ bitter cold" "snow/ ice"
## [301] "snow/blowing snow" "snow/cold"
## [303] "snow/freezing rain" "snow/heavy snow"
## [305] "snow/high winds" "snow/ice"
## [307] "snow/ice storm" "snow/sleet"
## [309] "snow/sleet/freezing rain" "snowmelt flooding"
## [311] "storm force winds" "storm surge"
## [313] "storm surge/tide" "strong wind"
## [315] "strong winds" "thuderstorm winds"
## [317] "thundeerstorm winds" "thunderestorm winds"
## [319] "thundersnow" "thunderstorm"
## [321] "thunderstorm damage to" "thunderstorm hail"
## [323] "thunderstorm wind" "thunderstorm wind (g40)"
## [325] "thunderstorm wind 60 mph" "thunderstorm wind 65 mph"
## [327] "thunderstorm wind 65mph" "thunderstorm wind 98 mph"
## [329] "thunderstorm wind g50" "thunderstorm wind g52"
## [331] "thunderstorm wind g55" "thunderstorm wind g60"
## [333] "thunderstorm wind trees" "thunderstorm wind."
## [335] "thunderstorm wind/ tree" "thunderstorm wind/ trees"
## [337] "thunderstorm wind/awning" "thunderstorm wind/hail"
## [339] "thunderstorm wind/lightning" "thunderstorm winds"
## [341] "thunderstorm winds 13" "thunderstorm winds 63 mph"
## [343] "thunderstorm winds and" "thunderstorm winds g60"
## [345] "thunderstorm winds hail" "thunderstorm winds lightning"
## [347] "thunderstorm winds." "thunderstorm winds/ flood"
## [349] "thunderstorm winds/flooding" "thunderstorm winds/funnel clou"
## [351] "thunderstorm winds/hail" "thunderstorm winds53"
## [353] "thunderstorm windshail" "thunderstorm windss"
## [355] "thunderstorm wins" "thunderstorms"
## [357] "thunderstorms wind" "thunderstorms winds"
## [359] "thunderstormw" "thunderstormwinds"
## [361] "thunderstrom wind" "thundertorm winds"
## [363] "thunerstorm winds" "tidal flooding"
## [365] "tornado" "tornado f0"
## [367] "tornado f1" "tornado f2"
## [369] "tornado f3" "tornadoes"
## [371] "tornadoes, tstm wind, hail" "torndao"
## [373] "torrential rainfall" "tropical depression"
## [375] "tropical storm" "tropical storm alberto"
## [377] "tropical storm dean" "tropical storm gordon"
## [379] "tropical storm jerry" "tstm wind"
## [381] "tstm wind (41)" "tstm wind (g35)"
## [383] "tstm wind (g40)" "tstm wind (g45)"
## [385] "tstm wind 40" "tstm wind 45"
## [387] "tstm wind 55" "tstm wind 65)"
## [389] "tstm wind and lightning" "tstm wind damage"
## [391] "tstm wind g45" "tstm wind g58"
## [393] "tstm wind/hail" "tstm winds"
## [395] "tstmw" "tsunami"
## [397] "tunderstorm wind" "typhoon"
## [399] "unseasonable cold" "unseasonably cold"
## [401] "unseasonably warm" "unseasonably warm and dry"
## [403] "unseasonal rain" "urban and small"
## [405] "urban and small stream floodin" "urban flood"
## [407] "urban flooding" "urban floods"
## [409] "urban small" "urban/small stream"
## [411] "urban/small stream flood" "urban/sml stream fld"
## [413] "volcanic ash" "warm weather"
## [415] "waterspout" "waterspout tornado"
## [417] "waterspout-" "waterspout-tornado"
## [419] "waterspout/ tornado" "waterspout/tornado"
## [421] "wet microburst" "whirlwind"
## [423] "wild fires" "wild/forest fire"
## [425] "wild/forest fires" "wildfire"
## [427] "wildfires" "wind"
## [429] "wind and wave" "wind damage"
## [431] "wind storm" "wind/hail"
## [433] "winds" "winter storm"
## [435] "winter storm high winds" "winter storms"
## [437] "winter weather" "winter weather mix"
## [439] "winter weather/mix" "wintry mix"
Check out the ? and other strange entries.
storms.post.1993 %>%
filter(evtype %in% c("?", "apache county", "drowning", "high"))
## # A tibble: 4 x 15
## refnum state bgn_date end_date evtype fatalities injuries propdmg
## <dbl> <chr> <date> <date> <chr> <dbl> <dbl> <dbl>
## 1 189191 AZ 1994-07-30 NA apach… 0 0 5
## 2 192272 CA 1994-11-16 NA high 0 1 0
## 3 246124 WV 1994-02-09 1994-02-09 ? 0 0 5
## 4 474015 PA 2002-05-02 2002-05-02 drown… 1 0 0
## # … with 7 more variables: propdmgexp <chr>, cropdmg <dbl>, cropdmgexp <chr>,
## # propdmg_actual <dbl>, cropdmg_actual <dbl>, propdmg_actual_2019 <dbl>,
## # cropdmg_actual_2019 <dbl>
It’ll probably be best to drop these - I can’t tell what type of event they’re related to.
storms.post.1993 <- storms.post.1993 %>%
filter(!(evtype %in% c("?", "apache county", "drowning", "high")))
Some of the event types contain special characters (brackets, slashes, etc.) - remove them now, which might slightly reduce the number of strings to match.
## define the patterns to remove (as regular expressions). For added fun, most are special characters that need escaping.
remove.patterns <- c("\\(", "\\)", "\\/", "\\\\", "\\.", "\\&", "-")
## to avoid creating monstrous merged strings, I'm going to first replace these patterns with a white space, then trim the extra white spaces again
storms.post.1993 <- storms.post.1993 %>%
mutate(evtype = str_replace_all(evtype, pattern = paste(remove.patterns, collapse = "|"), replacement = " "),
evtype = str_squish(evtype))
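As a quick check of what this replacement does, here is an equivalent base R sketch applied to a handful of event types from the list above. It uses a single character class instead of the alternation of escaped patterns; that is a stylistic substitution and should match the same characters.

```r
## event types taken from the list above ("frost\\freeze" contains one literal backslash)
messy <- c("frost\\freeze", "high wind (g40)", "flash flood/",
           "lightning.", "heavy snow-squalls", "flood & heavy rain")

## replace brackets, slashes, backslashes, dots, ampersands and hyphens with a space
no.specials <- gsub("[()/\\\\.&-]", " ", messy, perl = TRUE)

## then trim and collapse the extra white space
tidy.types <- gsub("\\s+", " ", trimws(no.specials))
print(tidy.types)
```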
Now let’s see how many remain.
storms.post.1993 %>%
select(evtype) %>%
unique() %>%
tally()
## # A tibble: 1 x 1
## n
## <int>
## 1 414
There are still too many. Matching them all to the NOAA list will be too painful, and is not really strictly necessary in this case. It will be better to define some broader categories of similar types, and classify the events into them instead.
I’m using the following:
* floods - including dam breaks and ice jams, but not coastal floods
* tornado-like events - over land, so excluding hurricanes and tropical storms
* wind/thunderstorm/rain - heavy winds and rains, thunderstorms and lightning; excluding heavy snow
* severe cold/winter weather - snow, blizzards, ice, freezing
* extreme heat/drought
* fires
* marine/coastal events - tide, surf, waves, coastal floods, hurricanes, tropical storms, typhoons and waterspouts, tsunamis
* others - smoke, ash, fog, dust storms, landslides, mudslides and rock slides.
Check which broad categories are the most common. This only gives a rough idea: the counts may include duplicates (events matching several categories) or miss some entries due to all the misspellings.
## floods
storms.post.1993 %>%
filter(str_detect(evtype, "flood")) %>%
tally()
## # A tibble: 1 x 1
## n
## <int>
## 1 32485
## winds/rains/thunderstorms
storms.post.1993 %>%
filter(str_detect(evtype, paste(c("wind", "thunder", "rain"), collapse = "|"))) %>%
tally()
## # A tibble: 1 x 1
## n
## <int>
## 1 129871
## winter weather
storms.post.1993 %>%
filter(str_detect(evtype, paste(c("wint", "snow", "freez", "frost", "ice", "bliz"), collapse = "|"))) %>%
tally()
## # A tibble: 1 x 1
## n
## <int>
## 1 5204
## heat and drought
storms.post.1993 %>%
filter(str_detect(evtype, paste(c("heat", "dry", "drought", "warm"), collapse = "|"))) %>%
tally()
## # A tibble: 1 x 1
## n
## <int>
## 1 1334
## fires
storms.post.1993 %>%
filter(str_detect(evtype, "fire")) %>%
tally()
## # A tibble: 1 x 1
## n
## <int>
## 1 1259
## marine
storms.post.1993 %>%
filter(str_detect(evtype, paste(c("marine", "surf", "tide", "coast", "hurricane", "tropical", "typhoon", "waterspout"), collapse = "|"))) %>%
tally()
## # A tibble: 1 x 1
## n
## <int>
## 1 1491
## tornadoes
storms.post.1993 %>%
filter(str_detect(evtype, paste(c("tornado", "dust devil", "funnel", "landspout", "whirlwind"), collapse = "|"))) %>%
tally()
## # A tibble: 1 x 1
## n
## <int>
## 1 14087
So, winds/thunderstorms are the most common, then the floods, tornadoes, winter weather; heat, fires and marine-related events have similar numbers.
I will classify the event types one by one, assigning NA to anything NOT in the current category, and then I’ll coalesce all columns to get the new reclassified event types - effectively, this will replace the NAs with the first non-NA value encountered. Therefore, the order in which the vectors are specified matters (when there are two or more non-NAs at the same position, the first one is retained over all others).
So, in order to be as unambiguous as possible in the reclassification, and avoid “losing” events to thunderstorms - the category with the most “variety” - I will go in this order: winter weather > heat/drought > floods > marine > tornado > winds > fire (and I’ll exclude whatever I think should go to a downstream category).
For the rest of the event types missed by these rules, I will check and assign them manually afterwards - either to a specific category or to “other”.
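The role of the order can be illustrated on a few event types. The sketch below is a simplified base R stand-in for the coalescing step: each category gets a column holding its label where a pattern matches and NA elsewhere, and the first non-NA value per position wins. The patterns are cut-down versions chosen for the example, and `first_non_na` is a hypothetical helper, not the function used in the report.

```r
evtype <- c("ice jam flooding", "heavy snow", "coastal flood", "tornado f1", "volcanic ash")

## one candidate column per category: the label where the pattern matches, NA elsewhere
lab.winter  <- ifelse(grepl("snow|ice", evtype), "winter weather", NA_character_)
lab.flood   <- ifelse(grepl("flood", evtype), "floods", NA_character_)
lab.marine  <- ifelse(grepl("coast", evtype), "marine", NA_character_)
lab.tornado <- ifelse(grepl("torn", evtype), "tornado", NA_character_)

## hypothetical helper: keep the first non-NA value at each position
first_non_na <- function(...) {
  Reduce(function(a, b) ifelse(is.na(a), b, a), list(...))
}

## winter weather first: "ice jam flooding" is taken by the winter category
winter.first <- first_non_na(lab.winter, lab.flood, lab.marine, lab.tornado)
## floods first: the same event now ends up with the floods
flood.first <- first_non_na(lab.flood, lab.winter, lab.marine, lab.tornado)

data.frame(evtype, winter.first, flood.first)
```

In both orders "coastal flood" lands in the floods rather than the marine category, and "volcanic ash" stays NA; cases like these are why exclusions and a manual pass are still needed.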
Note: this is unfortunately very manual. I looked into partial/fuzzy string matching at first, and tried various strategies/algorithms/distance metrics, read a whole lot of articles on the subject, but it soon became unmanageable and swallowed countless hours. In the end, I came back to this semi-manual way.
Define rules for matching events and reclassifying, and a helper function for applying them.
snow <- c("snow", "wint", "freez", "frost", "chill", "sleet", "glaze", "cold", "hypo", "avalan", "ice", "icy", "bliz")
heat <- c("drought", "warm", "heat", "hyper")
floods <- c("flood", "urban", "dam break", "ice jam")
marine <- c("tide", "beach", "coast", "surf", "seiche", "seas", "swell", "high water", "wave", "hurricane", "marine", "current", "tropical", "tsunami", "typhoon", "waterspout", "erosion")
tornado <- c("torn", "dust devil", "funnel", "landspout", "whirl")
winds <- c("rain", "wet", "hail", "precip", "light", "shower", "wind", "tstm", "thund", "gust", "burst", "mix")
## fires don't need a long rule
reclass_hlp <- function(include, exclude = NULL) {
if(is.null(exclude)) {
str_detect(storms.post.1993$evtype, paste(include, collapse = "|"))
} else {
str_detect(storms.post.1993$evtype, paste(include, collapse = "|")) & !str_detect(storms.post.1993$evtype, paste(exclude, collapse = "|"))
}
}
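The helper above reads `storms.post.1993$evtype` directly from the workspace. As a possible alternative, here is a sketch of a variant that takes the character vector as an argument, which makes it easier to try a rule on a small test vector first. The name `match_rule` and the toy vector are hypothetical and are not used elsewhere in this report.

```r
library(stringr)

## hypothetical variant of the helper: x is any character vector of event types
match_rule <- function(x, include, exclude = NULL) {
  ## TRUE where at least one of the include patterns is found
  hit <- str_detect(x, paste(include, collapse = "|"))
  ## optionally drop matches that also contain an exclude pattern
  if (!is.null(exclude)) {
    hit <- hit & !str_detect(x, paste(exclude, collapse = "|"))
  }
  hit
}

## toy check with a flood-like rule
toy.types <- c("ice jam flooding", "ice storm", "coastal flood", "flash flood")
toy.hits <- match_rule(toy.types, include = c("flood", "ice jam"), exclude = c("coast", "erosion"))
toy.hits
```

Here "coastal flood" is matched by the include patterns but removed by the exclude patterns, while "ice storm" is never matched.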
Test if the rules give adequate/expected results.
## winter weather
storms.post.1993 %>%
filter(str_detect(evtype, paste(snow, collapse = "|")), !str_detect(evtype, paste(c("tornado", "flood", "ice jam"), collapse = "|"))) %>%
pull(evtype) %>%
unique() %>%
sort()
## [1] "agricultural freeze" "avalance"
## [3] "avalanche" "black ice"
## [5] "blizzard" "blizzard winter storm"
## [7] "blowing snow" "cold"
## [9] "cold and snow" "cold and wet conditions"
## [11] "cold temperature" "cold wave"
## [13] "cold weather" "cold wind chill"
## [15] "cold winds" "damaging freeze"
## [17] "early frost" "excessive snow"
## [19] "extended cold" "extreme cold"
## [21] "extreme cold wind chill" "extreme wind chill"
## [23] "extreme windchill" "falling snow ice"
## [25] "fog and cold temperatures" "freeze"
## [27] "freezing drizzle" "freezing fog"
## [29] "freezing rain" "freezing rain sleet"
## [31] "freezing rain snow" "freezing spray"
## [33] "frost" "frost freeze"
## [35] "glaze" "glaze ice"
## [37] "glaze ice storm" "ground blizzard"
## [39] "hard freeze" "heavy lake snow"
## [41] "heavy rain snow" "heavy snow"
## [43] "heavy snow and high winds" "heavy snow and strong winds"
## [45] "heavy snow blizzard" "heavy snow blizzard avalanche"
## [47] "heavy snow freezing rain" "heavy snow ice"
## [49] "heavy snow shower" "heavy snow squalls"
## [51] "heavy snow wind" "heavy snow winter storm"
## [53] "heavy snowpack" "high wind blizzard"
## [55] "high wind heavy snow" "high winds cold"
## [57] "high winds snow" "hypothermia"
## [59] "hypothermia exposure" "ice"
## [61] "ice and snow" "ice floes"
## [63] "ice on road" "ice roads"
## [65] "ice storm" "ice strong winds"
## [67] "icy roads" "lake effect snow"
## [69] "late season snow" "light freezing rain"
## [71] "light snow" "light snowfall"
## [73] "rain snow" "record cold"
## [75] "record snow" "sleet"
## [77] "sleet ice storm" "snow"
## [79] "snow accumulation" "snow and heavy snow"
## [81] "snow and ice" "snow and ice storm"
## [83] "snow bitter cold" "snow blowing snow"
## [85] "snow cold" "snow freezing rain"
## [87] "snow heavy snow" "snow high winds"
## [89] "snow ice" "snow ice storm"
## [91] "snow sleet" "snow sleet freezing rain"
## [93] "snow squall" "snow squalls"
## [95] "thundersnow" "unseasonable cold"
## [97] "unseasonably cold" "winter storm"
## [99] "winter storm high winds" "winter storms"
## [101] "winter weather" "winter weather mix"
## [103] "wintry mix"
## heat/drought
storms.post.1993 %>%
filter(str_detect(evtype, paste(heat, collapse = "|"))) %>%
pull(evtype) %>%
unique() %>%
sort()
## [1] "drought" "drought excessive heat"
## [3] "excessive heat" "extreme heat"
## [5] "heat" "heat wave"
## [7] "heat wave drought" "heat waves"
## [9] "hyperthermia exposure" "record excessive heat"
## [11] "record heat" "unseasonably warm"
## [13] "unseasonably warm and dry" "warm weather"
## floods
storms.post.1993 %>%
filter(str_detect(evtype, paste(floods, collapse = "|")), !str_detect(evtype, paste(c("coast", "erosion"), collapse = "|"))) %>%
pull(evtype) %>%
unique() %>%
sort()
## [1] "breakup flooding" "dam break"
## [3] "flash flood" "flash flood flood"
## [5] "flash flood from ice jams" "flash flood heavy rain"
## [7] "flash flood landslide" "flash flood landslides"
## [9] "flash flood street" "flash flood winds"
## [11] "flash flooding" "flash flooding flood"
## [13] "flash flooding thunderstorm wi" "flash floods"
## [15] "flood" "flood flash"
## [17] "flood flash flood" "flood flashflood"
## [19] "flood heavy rain" "flood rain winds"
## [21] "flood river flood" "flooding"
## [23] "flooding heavy rain" "floods"
## [25] "heavy rain and flood" "heavy rain small stream urban"
## [27] "heavy rains flooding" "heavy snow high winds flood"
## [29] "ice jam" "ice jam flood minor"
## [31] "ice jam flooding" "ice storm flash flood"
## [33] "lake flood" "lakeshore flood"
## [35] "major flood" "minor flooding"
## [37] "mud slides urban flooding" "river and stream flood"
## [39] "river flood" "river flooding"
## [41] "rural flood" "small stream flood"
## [43] "snowmelt flooding" "thunderstorm winds flood"
## [45] "thunderstorm winds flooding" "tidal flooding"
## [47] "urban and small" "urban and small stream floodin"
## [49] "urban flood" "urban flooding"
## [51] "urban floods" "urban small"
## [53] "urban small stream" "urban small stream flood"
## [55] "urban sml stream fld"
## marine
storms.post.1993 %>%
filter(str_detect(evtype, paste(marine, collapse = "|")), !str_detect(evtype, paste(c("unseas", "torn"), collapse = "|"))) %>%
pull(evtype) %>%
unique() %>%
sort()
## [1] "astronomical high tide" "astronomical low tide"
## [3] "beach erosion" "coastal erosion"
## [5] "coastal flood" "coastal flooding"
## [7] "coastal flooding erosion" "coastal storm"
## [9] "coastal surge" "coastalstorm"
## [11] "cold wave" "dust devil waterspout"
## [13] "erosion cstl flood" "hazardous surf"
## [15] "heat wave" "heat wave drought"
## [17] "heat waves" "heavy rain high surf"
## [19] "heavy seas" "heavy surf"
## [21] "heavy surf and wind" "heavy surf coastal flooding"
## [23] "heavy surf high surf" "heavy swells"
## [25] "high seas" "high surf"
## [27] "high surf advisory" "high swells"
## [29] "high tides" "high water"
## [31] "high waves" "high wind and seas"
## [33] "high wind seas" "high winds coastal flood"
## [35] "hurricane" "hurricane edouard"
## [37] "hurricane emily" "hurricane erin"
## [39] "hurricane felix" "hurricane generated swells"
## [41] "hurricane gordon" "hurricane opal"
## [43] "hurricane opal high winds" "hurricane typhoon"
## [45] "late season snow" "marine accident"
## [47] "marine hail" "marine high wind"
## [49] "marine mishap" "marine strong wind"
## [51] "marine thunderstorm wind" "marine tstm wind"
## [53] "rip current" "rip currents"
## [55] "rip currents heavy surf" "rogue wave"
## [57] "rough seas" "rough surf"
## [59] "seiche" "storm surge tide"
## [61] "tropical depression" "tropical storm"
## [63] "tropical storm alberto" "tropical storm dean"
## [65] "tropical storm gordon" "tropical storm jerry"
## [67] "tsunami" "typhoon"
## [69] "waterspout" "wind and wave"
## tornado
storms.post.1993 %>%
filter(str_detect(evtype, paste(tornado, collapse = "|"))) %>%
pull(evtype) %>%
unique() %>%
sort()
## [1] "cold air tornado" "dust devil"
## [3] "dust devil waterspout" "funnel cloud"
## [5] "landspout" "thunderstorm winds funnel clou"
## [7] "tornado" "tornado f0"
## [9] "tornado f1" "tornado f2"
## [11] "tornado f3" "tornadoes"
## [13] "tornadoes, tstm wind, hail" "torndao"
## [15] "waterspout tornado" "whirlwind"
## wind/thunderstorm
storms.post.1993 %>%
filter(str_detect(evtype, paste(winds, collapse = "|"))) %>%
pull(evtype) %>%
unique() %>%
sort()
## [1] "cold and wet conditions" "cold wind chill"
## [3] "cold winds" "cool and wet"
## [5] "downburst" "dry microburst"
## [7] "dry mircoburst winds" "dust storm high winds"
## [9] "excessive rainfall" "excessive wetness"
## [11] "extreme cold wind chill" "extreme wind chill"
## [13] "extreme windchill" "flash flood heavy rain"
## [15] "flash flood winds" "flash flooding thunderstorm wi"
## [17] "flood heavy rain" "flood rain winds"
## [19] "flooding heavy rain" "freezing rain"
## [21] "freezing rain sleet" "freezing rain snow"
## [23] "gradient wind" "gustnado"
## [25] "gusty wind" "gusty wind hail"
## [27] "gusty wind hvy rain" "gusty wind rain"
## [29] "gusty winds" "hail"
## [31] "hail 0 75" "hail 075"
## [33] "hail 100" "hail 125"
## [35] "hail 150" "hail 175"
## [37] "hail 200" "hail 275"
## [39] "hail 450" "hail 75"
## [41] "hail damage" "hail wind"
## [43] "hail winds" "hailstorm"
## [45] "heavy mix" "heavy precipitation"
## [47] "heavy rain" "heavy rain and flood"
## [49] "heavy rain high surf" "heavy rain lightning"
## [51] "heavy rain severe weather" "heavy rain small stream urban"
## [53] "heavy rain snow" "heavy rains"
## [55] "heavy rains flooding" "heavy shower"
## [57] "heavy snow and high winds" "heavy snow and strong winds"
## [59] "heavy snow freezing rain" "heavy snow high winds flood"
## [61] "heavy snow shower" "heavy snow wind"
## [63] "heavy surf and wind" "high wind"
## [65] "high wind 48" "high wind and seas"
## [67] "high wind blizzard" "high wind damage"
## [69] "high wind g40" "high wind heavy snow"
## [71] "high wind seas" "high winds"
## [73] "high winds coastal flood" "high winds cold"
## [75] "high winds heavy rain" "high winds heavy rains"
## [77] "high winds snow" "hurricane opal high winds"
## [79] "hvy rain" "ice strong winds"
## [81] "light freezing rain" "light snow"
## [83] "light snowfall" "lighting"
## [85] "lightning" "lightning and heavy rain"
## [87] "lightning and thunderstorm win" "lightning fire"
## [89] "lightning heavy rain" "lightning injury"
## [91] "lightning thunderstorm winds" "lightning wauseon"
## [93] "marine hail" "marine high wind"
## [95] "marine strong wind" "marine thunderstorm wind"
## [97] "marine tstm wind" "microburst"
## [99] "microburst winds" "mixed precip"
## [101] "mixed precipitation" "non severe wind damage"
## [103] "non tstm wind" "rain"
## [105] "rain snow" "rain wind"
## [107] "rainstorm" "record rainfall"
## [109] "severe thunderstorm" "severe thunderstorm winds"
## [111] "severe thunderstorms" "small hail"
## [113] "snow freezing rain" "snow high winds"
## [115] "snow sleet freezing rain" "storm force winds"
## [117] "strong wind" "strong winds"
## [119] "thuderstorm winds" "thundeerstorm winds"
## [121] "thunderestorm winds" "thundersnow"
## [123] "thunderstorm" "thunderstorm damage to"
## [125] "thunderstorm hail" "thunderstorm wind"
## [127] "thunderstorm wind 60 mph" "thunderstorm wind 65 mph"
## [129] "thunderstorm wind 65mph" "thunderstorm wind 98 mph"
## [131] "thunderstorm wind awning" "thunderstorm wind g40"
## [133] "thunderstorm wind g50" "thunderstorm wind g52"
## [135] "thunderstorm wind g55" "thunderstorm wind g60"
## [137] "thunderstorm wind hail" "thunderstorm wind lightning"
## [139] "thunderstorm wind tree" "thunderstorm wind trees"
## [141] "thunderstorm winds" "thunderstorm winds 13"
## [143] "thunderstorm winds 63 mph" "thunderstorm winds and"
## [145] "thunderstorm winds flood" "thunderstorm winds flooding"
## [147] "thunderstorm winds funnel clou" "thunderstorm winds g60"
## [149] "thunderstorm winds hail" "thunderstorm winds lightning"
## [151] "thunderstorm winds53" "thunderstorm windshail"
## [153] "thunderstorm windss" "thunderstorm wins"
## [155] "thunderstorms" "thunderstorms wind"
## [157] "thunderstorms winds" "thunderstormw"
## [159] "thunderstormwinds" "thunderstrom wind"
## [161] "thundertorm winds" "thunerstorm winds"
## [163] "tornadoes, tstm wind, hail" "torrential rainfall"
## [165] "tstm wind" "tstm wind 40"
## [167] "tstm wind 41" "tstm wind 45"
## [169] "tstm wind 55" "tstm wind 65"
## [171] "tstm wind and lightning" "tstm wind damage"
## [173] "tstm wind g35" "tstm wind g40"
## [175] "tstm wind g45" "tstm wind g58"
## [177] "tstm wind hail" "tstm winds"
## [179] "tstmw" "tunderstorm wind"
## [181] "unseasonal rain" "wet microburst"
## [183] "whirlwind" "wind"
## [185] "wind and wave" "wind damage"
## [187] "wind hail" "wind storm"
## [189] "winds" "winter storm high winds"
## [191] "winter weather mix" "wintry mix"
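The rules look good enough. Some matches are spurious (e.g. “cold wave”, “heat wave” and “late season snow” are caught by the marine rule through “wave” and “seas”), but these types also match the winter weather or heat rules, which come earlier in the coalescing order and therefore take precedence.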
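One way to see which event types are decided by the coalescing order rather than by a single rule is to flag the types matched by more than one rule. The sketch below does this on a toy vector with shortened pattern lists. Both the vector and the list `toy.rules` are assumptions for illustration, not the full rules defined above.

```r
library(stringr)

## shortened, illustrative versions of three rules
toy.rules <- list(
  winter = c("snow", "cold"),
  heat = c("heat"),
  marine = c("wave", "seas", "surf")
)
toy.types <- c("cold wave", "heat wave", "late season snow", "high surf")

## logical matrix: one row per event type, one column per rule
hits <- sapply(toy.rules, function(p) str_detect(toy.types, paste(p, collapse = "|")))
rownames(hits) <- toy.types

## keep only the event types matched by two or more rules
overlaps <- hits[rowSums(hits) > 1, , drop = FALSE]
overlaps
```

In this toy case, "late season snow" is flagged because "seas" matches inside "season", which is the kind of accidental overlap that the category order has to resolve.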
Reclassify event types, then coalesce to form a new, hopefully clean variable.
storms.post.1993 <- storms.post.1993 %>%
mutate(snow = if_else(reclass_hlp(snow, exclude = c("tornado", "flood", "ice jam")), "winter weather", NA_character_),
heat = if_else(reclass_hlp(heat), "heat/drought", NA_character_),
floods = if_else(reclass_hlp(floods, exclude = c("coast", "erosion")), "floods", NA_character_),
marine = if_else(reclass_hlp(marine, exclude = c("unseas", "torn")), "marine/coastal", NA_character_),
tornado = if_else(reclass_hlp(tornado), "tornado", NA_character_),
winds = if_else(reclass_hlp(winds), "thunderstorm/wind/rain", NA_character_),
fire = if_else(reclass_hlp(include = "fire"), "fire", NA_character_),
## make clean event type variable
evtypes_clean = coalesce(snow, heat, floods, marine, tornado, winds, fire))
Check the NAs in the clean event types to see if any of them should be manually added to a category; all the rest will be assigned to “other”.
storms.post.1993 %>%
filter(is.na(evtypes_clean)) %>%
pull(evtype) %>%
unique() %>%
sort()
## [1] "blowing dust" "dense fog" "dense smoke"
## [4] "dust storm" "fog" "landslide"
## [7] "landslides" "landslump" "ligntning"
## [10] "low temperature" "mud slide" "mud slides"
## [13] "mudslide" "mudslides" "other"
## [16] "rapidly rising water" "rock slide" "severe turbulence"
## [19] "storm surge" "volcanic ash"
I’m putting the misspelled lightning (“ligntning”) in thunderstorm/wind/rain, “low temperature” in winter weather, and “storm surge” in marine/coastal; all the rest can go to “other”.
storms.post.1993.cl <- storms.post.1993 %>%
mutate(evtypes_clean = case_when(is.na(evtypes_clean) & evtype == "ligntning" ~ "thunderstorm/wind/rain",
is.na(evtypes_clean) & evtype == "low temperature" ~ "winter weather",
is.na(evtypes_clean) & evtype == "storm surge" ~ "marine/coastal",
is.na(evtypes_clean) ~ "other",
TRUE ~ evtypes_clean))
Final check of the clean event types.
storms.post.1993.cl %>%
select(evtypes_clean) %>%
table()
## .
## fire floods heat/drought
## 1258 32957 1256
## marine/coastal other thunderstorm/wind/rain
## 2351 534 168811
## tornado winter weather
## 14087 5999
## subset and reshape df for making the plots
storms.df.plots <- storms.post.1993.cl %>%
select(evtypes_clean, propdmg_actual_2019, cropdmg_actual_2019, injuries, fatalities) %>%
pivot_longer(cols = c(propdmg_actual_2019, cropdmg_actual_2019, injuries, fatalities), names_to = "damage_type") %>%
mutate(damage_type = case_when(damage_type == "propdmg_actual_2019" ~ "property",
damage_type == "cropdmg_actual_2019" ~ "crops",
TRUE ~ damage_type))
## source the half-violin ggplot geom (originally from https://gist.github.com/dgrtwo/eb7750e74997891d7c20 , slightly modified to fix typos)
source("https://raw.githubusercontent.com/datavizpyr/data/master/half_flat_violinplot.R")
## plot property and crop damages
damage.plot <- ggplot(storms.df.plots %>% filter(damage_type %in% c("property", "crops")), aes(x = evtypes_clean, y = log(value + 1), fill = evtypes_clean)) +
geom_flat_violin(position = position_nudge(x = 0.3, y = 0)) +
geom_jitter(aes(color = evtypes_clean), width = 0.05, alpha = 0.25) +
facet_wrap(~damage_type) +
coord_flip() +
labs(x = "", y = "Damages in 2019 $ (log(y+1))") +
scale_fill_viridis_d() +
scale_color_viridis_d() +
theme(legend.position = "none",
plot.margin = margin(r = 0, l = 0, unit = "pt"))
## plot injuries and deaths
health.plot <- ggplot(storms.df.plots %>% filter(damage_type %in% c("injuries", "fatalities")), aes(x = evtypes_clean, y = log(value + 1), fill = evtypes_clean)) +
geom_flat_violin(position = position_nudge(x = 0.3, y = 0)) +
geom_jitter(aes(color = evtypes_clean), width = 0.05, alpha = 0.25) +
facet_wrap(~damage_type) +
coord_flip() +
labs(x = "", y = "Number of cases (log(y+1))") +
scale_fill_viridis_d() +
scale_color_viridis_d() +
theme(legend.position = "none",
plot.margin = margin(r = 0, l = 0, unit = "pt"))
Figure 1 shows the distribution of property and crop damages (A), and injuries and deaths (B) caused by extreme weather events in the USA in the period 1993-2011.
## NB for cross-referencing to work, chunk names MUST NOT have special characters (like _)
## make a panel plot with both economic and population health damages (library patchwork)
(combined.plot <- damage.plot/health.plot + plot_annotation(tag_levels = 'A')
)
Figure 1: Economic damages (as 2019 $) and health damages (as number of cases) caused by extreme weather events in the USA, 1993-2011. Values are log-transformed to better display the distribution.
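A note on the transformation: the plots use `log(value + 1)` rather than `log(value)` because many events have zero damages or zero cases, and the log of zero is `-Inf`. A small sketch with made-up values (the numbers are assumptions for illustration) shows the effect. Base R's `log1p()` computes the same quantity.

```r
## toy damage values, including a zero (invented for illustration)
toy.damages <- c(0, 10, 1000, 1e9)

## plain log sends the zero to -Inf, which cannot be plotted
plain.log <- log(toy.damages)

## adding 1 first keeps zero-damage events at 0 on the transformed scale
shifted.log <- log(toy.damages + 1)

## log1p() is the built-in equivalent of log(x + 1)
same.result <- isTRUE(all.equal(shifted.log, log1p(toy.damages)))
```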
Most often, extreme weather events did not result in serious property and crop damages, and did not cause many injuries or deaths. However, there were occasional very destructive events (e.g. tornadoes, floods, heat and droughts, cold weather), which drove up both the economic costs (most often through significant property damages) and the damages to population health.
## summarize the total damages to property or crops by type of event
damage.smry <- storms.post.1993.cl %>%
group_by(evtypes_clean) %>%
summarise(across(c(propdmg_actual_2019, cropdmg_actual_2019), list(min = min, median = median, mean = mean, max = max, total = sum), .names = "{.col}.{.fn}")) %>%
## join the total number of events in each group
left_join(storms.post.1993.cl %>% group_by(evtypes_clean) %>% tally(), by = "evtypes_clean")
On average, the most damages to property were caused by marine and coastal extreme weather events (Table 1). According to the definition used, these included hurricanes and tropical storms, which are often very destructive. Fires and floods also caused a lot of property damage on average. Judging by the distribution of the costs by event type, however (Fig. 1 A), most often the property damages were relatively small, and there were a few extremely destructive events - the medians for each group are vastly smaller than the means and maximum values.
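The gap between the medians and the means is what one would expect from a heavily right-skewed distribution. As a minimal sketch with made-up numbers (assumptions for illustration, not values from the database), a single extreme event is enough to pull the mean far above the median:

```r
## five toy events: mostly small damages plus one extreme event
toy.costs <- c(0, 0, 5000, 10000, 1e9)

toy.median <- median(toy.costs)  ## middle value, unaffected by the extreme event
toy.mean <- mean(toy.costs)      ## dominated by the single extreme event

c(median = toy.median, mean = toy.mean)
```

This is why the table reports the median and the maximum next to the mean: the mean alone says little about what a typical event costs.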
damage.smry %>%
select(-contains(c("min", "total"))) %>%
arrange(desc(n)) %>%
kable(col.names = c("Event type", rep(c("median", "mean", "max"), times = 2), "Total number of events"), row.names = FALSE, format.args = list(big.mark = ","), caption = "Property and crop damages (as 2019 $) caused by extreme weather events in the USA, 1993-2011.") %>%
kableExtra::kable_classic() %>%
kableExtra::add_header_above(c(" " = 1, "Property damages (2019 $)" = 3, "Crop damages (2019 $)" = 3, " " = 1))
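Table 1: Property and crop damages (as 2019 $) caused by extreme weather events in the USA, 1993-2011.
| Event type | Property damages: median | Property damages: mean | Property damages: max | Crop damages: median | Crop damages: mean | Crop damages: max | Total number of events |
|---|---|---|---|---|---|---|---|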
| thunderstorm/wind/rain | 7,841.851 | 311,871.4 | 4,194,313,956 | 0 | 51,562.39 | 313,674,045 | 168,811 |
| floods | 30,825.508 | 6,644,645.0 | 145,842,352,528 | 0 | 575,783.85 | 8,848,828,959 | 32,957 |
| tornado | 70,576.660 | 2,668,531.6 | 3,182,374,938 | 0 | 43,092.07 | 91,637,393 | 14,087 |
| winter weather | 30,695.773 | 3,342,040.7 | 8,848,828,959 | 0 | 2,370,281.06 | 8,623,964,131 | 5,999 |
| marine/coastal | 11,724.420 | 81,018,647.6 | 40,975,005,633 | 0 | 3,769,042.04 | 1,976,749,473 | 2,351 |
| fire | 72,192.418 | 9,497,961.2 | 2,226,980,981 | 0 | 433,208.02 | 125,469,618 | 1,258 |
| heat/drought | 0.000 | 1,200,793.0 | 896,601,852 | 0 | 17,133,709.24 | 1,268,194,370 | 1,256 |
| other | 32,290.977 | 852,522.5 | 73,179,004 | 0 | 57,923.94 | 25,363,887 | 534 |
The single most destructive events for property were a flood in CA on 2006-01-01 (nearly 146 billion in 2019 dollars) and a storm surge (a marine-related event) in LA on 2005-08-29 (almost 41 billion in 2019 dollars).
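Single events like these can be looked up by sorting on the damage column and keeping the top rows. The sketch below uses `dplyr::slice_max()` on a toy data frame. The column `propdmg_actual_2019` follows the naming used in this report, but the `state` column and all the values are assumptions for illustration.

```r
library(dplyr)

## toy data: values are invented for illustration only
toy.events <- tibble(
  evtype = c("flood", "storm surge", "hail", "tornado"),
  state = c("AA", "BB", "CC", "DD"),
  propdmg_actual_2019 = c(5e6, 2e9, 1e4, 7e8)
)

## keep the two events with the highest property damages, largest first
top.events <- toy.events %>%
  slice_max(order_by = propdmg_actual_2019, n = 2)
top.events
```

Replacing the ordering column with the crop damage, injuries or fatalities column would give the corresponding extremes for those measures.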
On average, the most damages to crops were caused by heat/drought, marine or coastal events, and winter weather (Table 1). Crop damages were generally much smaller than property damages; heat/drought is the exception, with a higher mean for crops than for property. Here again, according to the distribution of the costs by event type (Fig. 1 A), most often the crop damages were relatively small (all medians are 0), with a few extremely destructive events - most often droughts, marine-related events, and cold weather (probably early frosts and snows).
The single most destructive events for crops were a river flood in IL on 1993-08-31 (about 8.8 billion in 2019 dollars) and an ice storm (winter weather) in MS on 1994-02-09 (about 8.6 billion in 2019 dollars).
## summarize the total injuries and deaths by type of event
health.smry <- storms.post.1993.cl %>%
group_by(evtypes_clean) %>%
summarise(across(c(injuries, fatalities), list(min = min, median = median, mean = mean, max = max, total = sum), .names = "{.col}.{.fn}")) %>%
## join the total number of events in each group
left_join(storms.post.1993.cl %>% group_by(evtypes_clean) %>% tally(), by = "evtypes_clean")
Injuries and deaths caused by extreme weather events in the USA were, fortunately, not too frequent. Most often, there were no injuries or fatalities: all medians are 0 except for heat/drought fatalities (median 1), and for the other groups even the third quartiles are 0. A few highly destructive events caused most of the damage to population health and drove the total numbers up (Fig. 1 B).
On average, the most injuries were related to extreme heat (Table 2). The highest numbers of injuries caused by a single event were associated with winter weather and tornadoes: an ice storm in OH on 1994-02-08 (1568 injuries) and a tornado in MO on 2011-05-22 (1150 injuries).
health.smry %>%
select(-contains(c("min", "total"))) %>%
arrange(desc(n)) %>%
kable(col.names = c("Event type", rep(c("median", "mean", "max"), times = 2), "Total number of events"), row.names = FALSE, digits = 2, caption = "Injuries and deaths caused by extreme weather events in the USA, 1993-2011.") %>%
kableExtra::kable_classic() %>%
kableExtra::add_header_above(c(" " = 1, "Injuries" = 3, "Deaths" = 3, " " = 1))
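Table 2: Injuries and deaths caused by extreme weather events in the USA, 1993-2011.
| Event type | Injuries: median | Injuries: mean | Injuries: max | Deaths: median | Deaths: mean | Deaths: max | Total number of events |
|---|---|---|---|---|---|---|---|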
| thunderstorm/wind/rain | 0 | 0.09 | 109 | 0 | 0.01 | 19 | 168811 |
| floods | 0 | 0.26 | 800 | 0 | 0.05 | 20 | 32957 |
| tornado | 0 | 1.66 | 1150 | 0 | 0.12 | 158 | 14087 |
| winter weather | 0 | 1.15 | 1568 | 0 | 0.23 | 14 | 5999 |
| marine/coastal | 0 | 1.19 | 780 | 0 | 0.46 | 32 | 2351 |
| fire | 0 | 1.28 | 150 | 0 | 0.07 | 14 | 1258 |
| heat/drought | 0 | 7.36 | 519 | 1 | 2.53 | 583 | 1256 |
| other | 0 | 2.95 | 78 | 0 | 0.28 | 14 | 534 |
On average, the most fatalities were again related to extreme heat (Table 2). The highest numbers of deaths caused by a single event were associated with heat and tornadoes: a heat event in IL on 1995-07-12 (583 deaths) and a tornado in MO on 2011-05-22 (158 deaths) - the same tornado that caused the second-highest number of injuries from a single event.