ABOUT THIS DOCUMENT

This is the second assessment of the Reproducible Research course offered by Johns Hopkins University in partnership with Coursera. For this reason, some choices in the design and presentation of this document may seem peculiar: the main goal was to meet the required assessment criteria. These criteria are not described here, to keep the document from becoming even longer. Nonetheless, it is worth emphasizing that the focus of the course is research reproducibility.

A real analysis of this type would require a finer disaggregation of the data (e.g., by state), which was not required for this work and would make it much more complex, lengthy, and labor-intensive.


knitr::opts_chunk$set(echo = TRUE, fig.dpi = 400)
options(scipen=999)

#install.packages(c("tidyquant", "tidyverse", "R.utils"))

# These colors are used later to tie text to graph colors.

color_to_hex <- function(color_name) {
  
  rgb_vals <- col2rgb(color_name)
  rgb(rgb_vals[1,], rgb_vals[2,], rgb_vals[3,], maxColorValue = 255)
  
}

color_to_hex("brown4") #"#8B2323"
## [1] "#8B2323"
color_to_hex("orange4") #"#8B5A00"
## [1] "#8B5A00"
color_to_hex("gray30") #"#4D4D4D"
## [1] "#4D4D4D"
.big-text {
font-size: 16px;
font-weight: bold;
color:#333333
}

.text-col1 {
color:#8B2323;
font-weight: bold
}

.text-col2 {

font-weight: bold;
color:#8B5A00
}

.text-col3 {

font-weight: bold;
color:#4D4D4D
}
sessionInfo() # Important in case of error.
## R version 4.4.2 (2024-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26100)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=Portuguese_Brazil.utf8  LC_CTYPE=Portuguese_Brazil.utf8   
## [3] LC_MONETARY=Portuguese_Brazil.utf8 LC_NUMERIC=C                      
## [5] LC_TIME=Portuguese_Brazil.utf8    
## 
## time zone: America/Sao_Paulo
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.37     R6_2.5.1          fastmap_1.2.0     xfun_0.50        
##  [5] cachem_1.1.0      knitr_1.49        htmltools_0.5.8.1 rmarkdown_2.29   
##  [9] lifecycle_1.0.4   cli_3.6.3         sass_0.4.9        jquerylib_0.1.4  
## [13] compiler_4.4.2    rstudioapi_0.17.1 tools_4.4.2       evaluate_1.0.3   
## [17] bslib_0.9.0       yaml_2.3.10       rlang_1.1.5       jsonlite_1.8.9


1. SYNOPSIS

This analysis attempts to identify which types of weather events are most harmful to population health and which have the greatest economic consequences across the United States. Accordingly, states are not analyzed individually; the aim is to identify the events that caused the greatest human and economic losses in the country as a whole.

Data processing relied on the documentation provided by the instructor, a course mentor’s post, and an auxiliary set of automatically generated reports, all referenced in Section 2.2. Initially, some columns were dropped and January 1996 was set as the starting point of the analysis, since the data is more complete from that year onward (according to the mentor’s post). The date column required some processing to be properly interpreted as a date, and the economic losses required two treatments: applying the exponent multipliers and adjusting for inflation (to January 2025).

The analysis covers a period of 15.9 years, during which property damage totaled approximately $602 billion, crop losses amounted to $60.5 billion, and there were 8,732 fatalities and 57,971 injuries.

Overall, the most significant events are associated with wind and water, such as floods, hurricanes, and storms. However, excessive heat stands out as the leading cause of fatalities, while rip currents also represent a major threat in terms of deaths. Regarding economic losses, hail, drought, wildfires, and ice storms emerge as significant events that are not primarily driven by the mechanical energy of wind or water but rather by other environmental and climatic factors.

Further research should focus on a more granular analysis, examining the distribution of events by location, identifying simultaneous occurrences, and assessing which types of events are more preventable.


2. DATA PROCESSING


2.1 Loading the Dataset

First, the data must be downloaded and loaded into the R environment. The number of rows and columns should also be checked at the outset.


library(R.utils) # provides bunzip2() to extract .bz2 files

library(tidyverse)

Sys.setlocale("LC_TIME", "en_US.UTF-8") # Forces English date names; otherwise R renders dates in my native language
## [1] "en_US.UTF-8"
#url.data <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

#dir.create("ProjectData")

#download.file(url = url.data, destfile = "ProjectData/StormData.csv.bz2", mode = "wb")

#bunzip2("ProjectData/StormData.csv.bz2", "ProjectData/StormData.csv", remove = F)

url.df <- "ProjectData/StormData.csv"

df.raw <- read.csv(file = url.df,header = T) 

dim(df.raw) # There are 902297 rows x 37 columns. According to the instructor's post, it is correct.
## [1] 902297     37
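
The download lines above are commented out, presumably because they were run once by hand. As a hedged sketch of how that step could be made repeatable, the helper below downloads a file only when it is not already on disk. The name `fetch_once` and its `downloader` argument are hypothetical and not part of the original analysis; the argument exists only so the logic can be exercised without network access.

```r
# Hypothetical helper (not part of the original analysis): download a file
# only when it is not already present on disk.
# `downloader` defaults to base R's download.file and is a parameter so that
# the logic can be checked offline with a stand-in function.
fetch_once <- function(url, destfile, downloader = utils::download.file) {
  if (file.exists(destfile)) {
    return(invisible(FALSE))  # nothing to do, the file is already there
  }
  # Create the target folder if needed (no warning when it already exists).
  dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
  downloader(url, destfile, mode = "wb")
  invisible(TRUE)
}

# Usage (not run here, since it requires network access):
# fetch_once(
#   "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
#   "ProjectData/StormData.csv.bz2"
# )
```

With a guard like this the chunk can stay uncommented, so knitting on a clean machine fetches the data and later runs skip the download.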


2.2 Making Sense of the Variables

At this point the columns must be identified, for a better overall understanding of the data and an initial cleaning.

df.raw %>% str # Getting acquainted with the data.
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
sapply(df.raw, function(x) length(unique(x))) 
##    STATE__   BGN_DATE   BGN_TIME  TIME_ZONE     COUNTY COUNTYNAME      STATE 
##         70      16335       3608         22        557      29601         72 
##     EVTYPE  BGN_RANGE    BGN_AZI BGN_LOCATI   END_DATE   END_TIME COUNTY_END 
##        985        272         35      54429       6663       3647          1 
## COUNTYENDN  END_RANGE    END_AZI END_LOCATI     LENGTH      WIDTH          F 
##          1        266         24      34506        568        293          7 
##        MAG FATALITIES   INJURIES    PROPDMG PROPDMGEXP    CROPDMG CROPDMGEXP 
##        226         52        200       1390         19        432          9 
##        WFO STATEOFFIC  ZONENAMES   LATITUDE  LONGITUDE LATITUDE_E LONGITUDE_ 
##        542        250      25112       1781       3841       1729       3778 
##    REMARKS     REFNUM 
##     436774     902297
# This function returned interesting values, but many of these
# variables do not need to be addressed in this analysis and
# will therefore be removed later.

# Another preliminary analysis is to calculate the size of each column.

columns.size.mb <- sapply(df.raw, function(x) object.size(x) / (1024^2)) 

df.cols.size.mb <- data.frame (variable = names(columns.size.mb), size.mb = round(columns.size.mb,2))

df.cols.size.mb <- df.cols.size.mb %>% arrange(desc(size.mb))

rownames(df.cols.size.mb) <- NULL

df.cols.size.mb
##      variable size.mb
## 1     REMARKS  216.72
## 2   ZONENAMES   10.76
## 3  BGN_LOCATI   10.17
## 4  COUNTYNAME    9.07
## 5  END_LOCATI    8.95
## 6    BGN_DATE    8.13
## 7    END_DATE    7.39
## 8    END_TIME    7.10
## 9    BGN_TIME    7.09
## 10     EVTYPE    6.95
## 11        WFO    6.91
## 12 STATEOFFIC    6.90
## 13  TIME_ZONE    6.89
## 14      STATE    6.89
## 15    BGN_AZI    6.89
## 16    END_AZI    6.89
## 17 PROPDMGEXP    6.89
## 18    STATE__    6.88
## 19     COUNTY    6.88
## 20  BGN_RANGE    6.88
## 21 COUNTY_END    6.88
## 22  END_RANGE    6.88
## 23     LENGTH    6.88
## 24      WIDTH    6.88
## 25        MAG    6.88
## 26 FATALITIES    6.88
## 27   INJURIES    6.88
## 28    PROPDMG    6.88
## 29    CROPDMG    6.88
## 30 CROPDMGEXP    6.88
## 31   LATITUDE    6.88
## 32  LONGITUDE    6.88
## 33 LATITUDE_E    6.88
## 34 LONGITUDE_    6.88
## 35     REFNUM    6.88
## 36 COUNTYENDN    3.44
## 37          F    3.44


A quick glance at the variables suggests some conclusions. Given our objective, some variables seem unnecessary. Location variables, some weather-event specifications, and other details that do not matter for a country-wide assessment (the goal of this analysis) can probably be removed.

Be that as it may, it is necessary to proceed with further analyses and to make use of the provided documentation.


2.2.1. Documentation Provided by the Instructor

The dataset documentation provided by the instructor can be accessed here:

https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf

And here:

https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2FNCDC%20Storm%20Events-FAQ%20Page.pdf


2.2.2. Mentor’s Course Post

Another very useful aid is this mentor’s course post:

https://www.coursera.org/learn/reproducible-research/discussions/forums/FZGy9Sj0Eea8jw6UvTi2Tw/threads/38y35MMiEeiERhLphT2-QA


2.2.3. My One-Size-Fits-All Approach to Help Get a Sense of the Data

Furthermore, I shared on my GitHub general analysis reports generated for each of the df.raw variables:

https://github.com/antoniovcm/WeatherEventAnalyses/tree/main/Preliminary.Variables.Analyses

These reports are generic, a one-size-fits-all approach to help get a sense of the data. The images generated in the reports are located inside the respective folders, labeled with the variable names.

These steps can initially be taken in this order, but in practice they form an iterative process. Writing a comment beside each variable name is a good way to keep track of findings and later retrieve the information needed to treat the variables properly.


2.2.4. Preliminary Analyses of the Variables

STATE__ : Number identifying state, can be removed.


BGN_DATE : Begin date of weather event. Best variable to be used as date. According to mentor’s post:

“According to NOAA the data recording start from Jan. 1950. At that time they recorded one event type, tornado. They add more events gradually and only from Jan. 1996 they start recording all events type. Since our objective is comparing the effects of different weather events, do we need to include all years, even it has only single event type?”

BGN_DATE Necessary processing:

  1. Check if all dates have same format
  2. Format as date object
  3. Check summary and NAs
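
The three steps listed for BGN_DATE can be sketched on a handful of values. This is only an illustration: the sample strings are copied from the `str()` output above, while the regular expression and the object names are assumptions introduced here.

```r
# Sample values copied from the BGN_DATE column shown in the str() output.
bgn_date_raw <- c("4/18/1950 0:00:00", "2/20/1951 0:00:00", "6/8/1951 0:00:00")

# 1. Check that every value follows the month/day/year + time layout.
#    (The pattern is an assumption about the layout, based on the samples.)
date_pattern <- "^[0-9]{1,2}/[0-9]{1,2}/[0-9]{4} [0-9]{1,2}:[0-9]{2}:[0-9]{2}$"
all_same_format <- all(grepl(date_pattern, bgn_date_raw))

# 2. Parse as Date; the trailing time component is not consumed by the
#    format string and is simply ignored.
bgn_date <- as.Date(bgn_date_raw, format = "%m/%d/%Y")

# 3. Check the range and count the values that failed to parse.
range(bgn_date)
sum(is.na(bgn_date))
```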


BGN_TIME : Same as above, but for the time of day. Can be removed.

TIME_ZONE : Can be removed.

COUNTY : Can be removed.

COUNTYNAME : Can be removed.

STATE : Can be removed.


EVTYPE : Surely must be kept. Type of event. According to mentor’s post there is a mess here to be cleaned:

“The official events type are 48. However, if you use ‘unique’ function on ‘EVTYPE’ column you will get near one thousand events! All that is just typo. The regular expression (‘regex’, grepl, regexpr, gregexpr) and ‘tolower’, and ‘toupper’ functions can be a great help here”

EVTYPE Necessary processing:

  1. Harmonize all EVTYPEs to the original ones.
  2. Exclude irrelevant events if necessary.
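
As a minimal sketch of the harmonization idea, the snippet below normalizes case and whitespace in base R and then lists the labels that still need a manual mapping. The raw labels, the three-name `official` subset, and the helper `normalize_evtype` are hypothetical and serve only as an illustration.

```r
# Hypothetical raw labels illustrating case and whitespace variants.
evtype_raw <- c("Tstm Wind", "  HIGH   WIND ", "high wind", "Flood")

# Upper-case, trim both ends, and collapse internal runs of whitespace.
normalize_evtype <- function(x) {
  gsub("\\s+", " ", trimws(toupper(x)))
}

evtype_clean <- normalize_evtype(evtype_raw)

# A small subset of the official names, for illustration only.
official <- c("FLOOD", "HIGH WIND", "THUNDERSTORM WIND")

# Labels that still need a manual mapping after normalization.
setdiff(unique(evtype_clean), official)
```

Normalization alone resolves the case and spacing variants; abbreviations such as "TSTM WIND" still require an explicit correction table.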


BGN_RANGE : Can be removed.

BGN_AZI : Can be removed.

BGN_LOCATI : Can be removed.

END_DATE : Can be removed.

END_TIME : Can be removed.

COUNTY_END : Can be removed.

COUNTYENDN : Can be removed.

END_RANGE : Can be removed.

END_AZI : Can be removed.

END_LOCATI : Can be removed.

LENGTH : Can be removed.

WIDTH : Can be removed.

F : Can be removed.

MAG : Can be removed.

FATALITIES : Must be kept. Numbers, easy understanding. No processing required.

INJURIES : Must be kept. Numbers, easy understanding. No processing required.

PROPDMG : Must be kept. Property damage. Numbers, easy understanding. No processing required.


PROPDMGEXP : Must be kept. Represents a number by which PROPDMG must be multiplied.

PROPDMGEXP Necessary processing:

  1. Transform to numbers properly (check mentor’s post).


CROPDMG : Must be kept. Crop damage. Numbers, easy understanding. No processing required.


CROPDMGEXP : Must be kept. Represents a number by which CROPDMG must be multiplied.

CROPDMGEXP Necessary processing:

  1. Transform to numbers properly (check mentor’s post).
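
To illustrate what "transforming to numbers" means for the two exponent columns, here is a small sketch on made-up values. The lookup table is only a subset of the codes, the damage values are hypothetical, and treating an empty code as a multiplier of 0 follows the choice made later in this report.

```r
# Subset of exponent codes and their multipliers (illustrative, not complete).
exp_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical damage values with their exponent codes.
damage     <- c(25, 2.5, 1.2, 10)
damage_exp <- c("K", "m", "B", "")

# Indexing a named vector by name returns NA for codes that are not in the
# table. An empty string never matches a name, so "" also yields NA.
multiplier <- unname(exp_multiplier[toupper(damage_exp)])

# Unknown or empty codes get a multiplier of 0, as in this report.
multiplier[is.na(multiplier)] <- 0

damage * multiplier
```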


WFO : Can be useful. It stands for Weather Forecast Office. The information could perhaps be considered more reliable when WFO is available. 142069 rows have “” (empty text) as value.

STATEOFFIC : Can be removed.

ZONENAMES : Can be removed.

LATITUDE : Can be removed.

LONGITUDE : Can be removed.

LATITUDE_E : Can be removed.

LONGITUDE_ : Can be removed.

REMARKS : Description of events, some of them really long. By itself, it accounts for almost half the size of the dataset. For now, it will be removed. I will create a separate dataset with only REMARKS and REFNUM as columns to access if necessary.

REFNUM : For now, it will be kept. It is just the index of the events.


2.3 Handling the Issues Found

This part of data processing will handle what has been found up to this point.

First, it makes sense to remove every column that is not needed.


index.col.remove <- -c(1, 3, 4, 5, 6,  7, 9:22, 30:36) # index of columns to remove

df2 <- df.raw[(index.col.remove)]


Now, BGN_DATE has to be analysed, treated and checked. Finally, the rows are filtered to keep only events from January 1996 onward.


library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
range(nchar(df2$BGN_DATE)) # Looking for pattern.
## [1] 16 18
as.Date(df2$BGN_DATE, format = "%m/%d/%Y") %>% summary # Most of the records are recent. There are no NA values.
##         Min.      1st Qu.       Median         Mean      3rd Qu.         Max. 
## "1950-01-03" "1995-04-20" "2002-03-18" "1998-12-27" "2007-07-28" "2011-11-30"
df2$BGN_DATE <- as.Date(df2$BGN_DATE, format = "%m/%d/%Y")

class(df2$BGN_DATE) # Checking.
## [1] "Date"
df2 <- df2 %>% filter(BGN_DATE >= as.Date("1996-01-01")) # Filtering as stated before.


Now, CROPDMGEXP and PROPDMGEXP have to be analysed, treated and checked. Mentor’s post has good tips. In fact, the replacement of values relied entirely on the analysis provided there.

One issue I only realized later is whether the amounts are in current (nominal) or constant values. They are current values, so all amounts were adjusted to January 2025, using the month of BGN_DATE as the reference. This makes values from different years, and therefore with different purchasing power, comparable.


library(tidyverse)
a <- c("h" = 100,
       "H" = 100,
       "k" = 1000,
       "K" = 1000,
       "m" = 1000000,
       "M" = 1000000,
       "b" = 1000000000,
       "B" = 1000000000,
       "+" = 1,
       "-" = 0,
       "?" = 0,
       # "" = 0, not allowed
       setNames(rep(10, 9), 0:8)
)

prop.multiplier <- a[df2$PROPDMGEXP] %>% unname()

na.prop.multiplier <- is.na(prop.multiplier)

prop.multiplier[na.prop.multiplier] <- 0


crop.multiplier <- a[df2$CROPDMGEXP] %>% unname()

na.crop.multiplier <- is.na(crop.multiplier)

crop.multiplier[na.crop.multiplier] <- 0


table(prop.multiplier)
## prop.multiplier
##          0         10       1000    1000000 1000000000 
##     276185          1     369938       7374         32
table(crop.multiplier)
## crop.multiplier
##          0       1000    1000000 1000000000 
##     373069     278686       1771          4
length(prop.multiplier)
## [1] 653530
length(crop.multiplier)
## [1] 653530
df2 <- df2 %>% mutate(PROP.MULTIPLIED =PROPDMG*prop.multiplier,
                      CROP.MULTIPLIED = CROPDMG*crop.multiplier)

# Dealing with inflation
library(tidyquant)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## ── Attaching core tidyquant packages ─────────────────────── tidyquant 1.0.11 ──
## ✔ PerformanceAnalytics 2.0.8      ✔ TTR                  0.24.4
## ✔ quantmod             0.4.26     ✔ xts                  0.14.1
## ── Conflicts ────────────────────────────────────────── tidyquant_conflicts() ──
## ✖ zoo::as.Date()                 masks base::as.Date()
## ✖ zoo::as.Date.numeric()         masks base::as.Date.numeric()
## ✖ dplyr::filter()                masks stats::filter()
## ✖ xts::first()                   masks dplyr::first()
## ✖ dplyr::lag()                   masks stats::lag()
## ✖ xts::last()                    masks dplyr::last()
## ✖ PerformanceAnalytics::legend() masks graphics::legend()
## ✖ quantmod::summary()            masks base::summary()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# CPI-U (Consumer Price Index for All Urban Consumers)
cpi_data <- tq_get("CPIAUCSL", get = "economic.data", from = "1996-01-01")

# Just taking a look.
#plot(cpi_data$date, y = cpi_data$price)


last.cpi.index <- cpi_data %>%
  arrange(desc(date)) %>% 
  slice(1) %>% 
  pull(price) # seems to be monthly

df2$FIRST.DAY.MONTH <- floor_date(df2$BGN_DATE, unit = "month")

df2 <- df2 %>% left_join( y = cpi_data[2:3], by = c("FIRST.DAY.MONTH" = "date"))

# Just taking a look
#plot(df2$FIRST.DAY.MONTH, y = df2$price)

df2 <- df2 %>% mutate(MONETARY.ADJUST = 1 + ((last.cpi.index - price)/price) )

df2 <- df2 %>% mutate(
  PROP.MULTIPLIED.ADJ = PROP.MULTIPLIED*MONETARY.ADJUST,
  CROP.MULTIPLIED.ADJ = CROP.MULTIPLIED*MONETARY.ADJUST)
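
The adjustment factor computed above, 1 + (last - price)/price, is algebraically the ratio last/price. The sketch below checks this on made-up index values; the CPI numbers are hypothetical and are not real CPIAUCSL observations. Note also that the reference index in the chunk above appears to be whichever observation is most recent when the code runs, so rerunning it later would shift the reference month.

```r
# Hypothetical monthly CPI values (index points); NOT real CPIAUCSL data.
cpi <- data.frame(
  date  = as.Date(c("1996-01-01", "2005-08-01", "2025-01-01")),
  price = c(150, 200, 300)
)

# Reference index: the most recent observation in the table.
reference_cpi <- cpi$price[which.max(cpi$date)]

# Factor written as in the chunk above, and in its simplified form.
factor_long  <- 1 + (reference_cpi - cpi$price) / cpi$price
factor_short <- reference_cpi / cpi$price

all.equal(factor_long, factor_short)

# A nominal loss of 1000 dollars in each month, in reference-month dollars.
1000 * factor_short
```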


Now it is time to handle EVTYPE, the event type. The provided documentation lists the correct event names, of which there are 48.


original.evtypes <- c("ASTRONOMICAL LOW TIDE",
                      "AVALANCHE",
                      "BLIZZARD",
                      "COASTAL FLOOD",
                      "COLD/WIND CHILL",
                      "DEBRIS FLOW",
                      "DENSE FOG",
                      "DENSE SMOKE",
                      "DROUGHT",
                      "DUST DEVIL",
                      "DUST STORM",
                      "EXCESSIVE HEAT",
                      "EXTREME COLD/WIND CHILL",
                      "FLASH FLOOD",
                      "FLOOD",
                      "FREEZING FOG",
                      "FROST/FREEZE",
                      "FUNNEL CLOUD",
                      "HAIL",
                      "HEAT",
                      "HEAVY RAIN",
                      "HEAVY SNOW",
                      "HIGH SURF",
                      "HIGH WIND",
                      "HURRICANE/TYPHOON",
                      "ICE STORM",
                      "LAKESHORE FLOOD",
                      "LAKE-EFFECT SNOW",
                      "LIGHTNING",
                      "MARINE HAIL",
                      "MARINE HIGH WIND",
                      "MARINE STRONG WIND",
                      "MARINE THUNDERSTORM WIND",
                      "RIP CURRENT",
                      "SEICHE",
                      "SLEET",
                      "STORM SURGE/TIDE",
                      "STRONG WIND",
                      "THUNDERSTORM WIND",
                      "TORNADO",
                      "TROPICAL DEPRESSION",
                      "TROPICAL STORM",
                      "TSUNAMI",
                      "VOLCANIC ASH",
                      "WATERSPOUT",
                      "WILDFIRE",
                      "WINTER STORM",
                      "WINTER WEATHER")

# Creating names as a reference: they serve as indices used to number
# the wrong EVTYPE names.
names(original.evtypes) <- 1:length(original.evtypes)

# Harmonizing
original.evtypes <- toupper(original.evtypes) %>% str_squish()

# Harmonizing
df3 <- df2 %>%
  mutate(EVTYPE = str_squish(toupper(EVTYPE)))

# Getting EVTYPEs with all 0
EVTYPE_DROP1 <- df3 %>% 
  filter(!(EVTYPE %in% original.evtypes)) %>% 
  group_by(EVTYPE) %>% 
  summarise(Freq = n(),
            CROPT = sum(CROP.MULTIPLIED.ADJ),
            PROPT = sum(PROP.MULTIPLIED.ADJ),
            INJT = sum(INJURIES),
            FATT = sum(FATALITIES)) %>%
  filter(CROPT == 0, PROPT == 0, INJT == 0, FATT == 0) %>% pull(EVTYPE)

# Cleaning EVTYPEs with all 0.
df3 <- df3 %>% filter(!(EVTYPE %in% EVTYPE_DROP1))

# Getting the remaining non-original EVTYPEs
EVTYPE_DROP2 <- df3 %>% filter(!(EVTYPE %in% original.evtypes)) %>% pull(EVTYPE) %>% unique

# Checking whether these can be ignored
df_t <- df3 %>% 
  filter(EVTYPE %in% EVTYPE_DROP2) %>% 
  group_by(EVTYPE) %>% 
  summarise(Freq = n(),
            CROPT = sum(CROP.MULTIPLIED.ADJ),
            PROPT = sum(PROP.MULTIPLIED.ADJ),
            INJT = sum(INJURIES),
            FATT = sum(FATALITIES))

(df_t$PROPT %>% sum)/10^9 # surely not
## [1] 108.4712
(df_t$CROPT %>% sum)/10^9
## [1] 9.879925
df_t$INJT %>% sum
## [1] 6557
df_t$FATT %>% sum
## [1] 1018
# Helper that prints the names so the numbers can be typed beside them
format_dput <- function(vector.dput) {
  vector.dput <- sort(vector.dput)
  cat("c(\n")
  cat(paste0('  "', vector.dput, '" = ,\n', collapse = ""))
  cat(")\n")
}

# Would print too much making the RPubs even bigger unnecessarily.
# format_dput(EVTYPE_DROP2)

# Here the numerical names of original EVTYPES will be useful.
evtype.correction.numbers <-c(
  "AGRICULTURAL FREEZE" = 17,  # Related to frost affecting crops
  "ASTRONOMICAL HIGH TIDE" = 1,
  "BEACH EROSION" = 23,  # High surf contributes to erosion
  "BLACK ICE" = 16,  # Caused by freezing fog or rain
  "BLOWING DUST" = 11,  # Dust storm-related
  "BLOWING SNOW" = 22,  # Can lead to blizzards and low visibility
  "BRUSH FIRE" = 46,  # Wildfire category
  "COASTAL EROSION" = 23,  
  "COASTAL FLOODING" = 4,  
  "COASTAL FLOODING/EROSION" = 4,  
  "COASTAL STORM" = 47,  
  "COASTALSTORM" = 47,  
  "COLD" = 5,  
  "COLD AND SNOW" = 5,  
  "COLD TEMPERATURE" = 5,  
  "COLD WEATHER" = 5,  
  "DAM BREAK" = 15,  # Can trigger major flooding
  "DAMAGING FREEZE" = 17,  
  "DOWNBURST" = 39,  # Associated with strong thunderstorm winds
  "DROWNING" = 34,  # Most commonly due to rip currents
  "DRY MICROBURST" = 39,  
  "EARLY FROST" = 17,  
  "EROSION/CSTL FLOOD" = 4,  
  "EXCESSIVE SNOW" = 22,  
  "EXTENDED COLD" = 5,  
  "EXTREME COLD" = 13,  
  "EXTREME WINDCHILL" = 13,  
  "FALLING SNOW/ICE" = 22,  
  "FLASH FLOOD/FLOOD" = 14,  
  "FLOOD/FLASH/FLOOD" = 14,  
  "FOG" = 7,  
  "FREEZE" = 17,  
  "FREEZING DRIZZLE" = 16,  
  "FREEZING RAIN" = 16,  
  "FREEZING SPRAY" = 16,  
  "FROST" = 17,  
  "GLAZE" = 16,  # Thin ice layer from freezing rain
  "GRADIENT WIND" = 38,  
  "GUSTY WIND" = 38,  
  "GUSTY WIND/HAIL" = 39,  
  "GUSTY WIND/HVY RAIN" = 39,  
  "GUSTY WIND/RAIN" = 39,  
  "GUSTY WINDS" = 38,  
  "HARD FREEZE" = 17,  
  "HAZARDOUS SURF" = 23,  
  "HEAT WAVE" = 12,  
  "HEAVY RAIN/HIGH SURF" = 21,  
  "HEAVY SEAS" = 23,  
  "HEAVY SNOW SHOWER" = 22,  
  "HEAVY SURF" = 23,  
  "HEAVY SURF AND WIND" = 23,  
  "HEAVY SURF/HIGH SURF" = 23,  
  "HIGH SEAS" = 23,  
  "HIGH SURF ADVISORY" = 23,  
  "HIGH SWELLS" = 23,  
  "HIGH WATER" = 15,  
  "HIGH WIND (G40)" = 24,  
  "HIGH WINDS" = 24,  
  "HURRICANE" = 25,  
  "HURRICANE EDOUARD" = 25,  
  "HYPERTHERMIA/EXPOSURE" = 12,  
  "HYPOTHERMIA/EXPOSURE" = 13,  
  "ICE JAM FLOOD (MINOR" = 15,  
  "ICE ON ROAD" = 16,  
  "ICE ROADS" = 16,  
  "ICY ROADS" = 16,  
  "LAKE EFFECT SNOW" = 28,  
  "LANDSLIDE" = 6,  
  "LANDSLIDES" = 6,  
  "LANDSLUMP" = 6,  
  "LANDSPOUT" = 40,  # Similar to tornadoes
  "LATE SEASON SNOW" = 22,  
  "LIGHT FREEZING RAIN" = 16,  
  "LIGHT SNOW" = 22,  
  "LIGHT SNOWFALL" = 22,  
  "MARINE ACCIDENT" = 34,  
  "MARINE TSTM WIND" = 33,  
  "MICROBURST" = 39,  
  "MIXED PRECIP" = 48,  
  "MIXED PRECIPITATION" = 48,  
  "MUD SLIDE" = 6,  
  "MUDSLIDE" = 6,  
  "MUDSLIDES" = 6,  
  "NON-SEVERE WIND DAMAGE" = 38,  
  "NON-TSTM WIND" = 38,  
  "NON TSTM WIND" = 38,  
  "OTHER" = NA,  # No clear match
  "RAIN" = 21,  
  "RAIN/SNOW" = 48,
  "RECORD HEAT" = 12,
  "RIP CURRENTS" = 34,  
  "RIVER FLOOD" = 15,  
  "RIVER FLOODING" = 15,  
  "ROCK SLIDE" = 6,  
  "ROGUE WAVE" = 23,  
  "ROUGH SEAS" = 23,  
  "ROUGH SURF" = 23,  
  "SMALL HAIL" = 19,  
  "SNOW" = 22,  
  "SNOW AND ICE" = 22,  
  "SNOW SQUALL" = 22,  
  "SNOW SQUALLS" = 22,  
  "STORM SURGE" = 37,  
  "STRONG WINDS" = 38,  
  "THUNDERSTORM" = 39,  
  "THUNDERSTORM WIND (G40)" = 39,  
  "TIDAL FLOODING" = 4,  
  "TORRENTIAL RAINFALL" = 21,  
  "TSTM WIND" = 39,  
  "TSTM WIND (41)" = 39,  
  "TSTM WIND (G35)" = 39,  
  "TSTM WIND (G40)" = 39,  
  "TSTM WIND (G45)" = 39,  
  "TSTM WIND 40" = 39,  
  "TSTM WIND 45" = 39,  
  "TSTM WIND AND LIGHTNING" = 39,  
  "TSTM WIND G45" = 39,  
  "TSTM WIND/HAIL" = 39,  
  "TYPHOON" = 25,  
  "UNSEASONABLE COLD" = 5,  
  "UNSEASONABLY COLD" = 5,  
  "UNSEASONABLY WARM" = 12,  
  "UNSEASONAL RAIN" = 21,  
  "URBAN/SML STREAM FLD" = 15,  
  "WARM WEATHER" = 12,  
  "WET MICROBURST" = 39,  
  "WHIRLWIND" = 40,  
  "WILD/FOREST FIRE" = 46,  
  "WIND" = 38,  
  "WIND AND WAVE" = 32,  
  "WIND DAMAGE" = 38,  
  "WINDS" = 38,  
  "WINTER WEATHER MIX" = 48,  
  "WINTER WEATHER/MIX" = 48,  
  "WINTRY MIX" = 48  
)

# Creating new version.
df4 <- df3

# df4$EVTYPE[df4$EVTYPE  %in% EVTYPE_DROP2] gets all rows with the remaining wrong
# names of EVTYPEs. It becomes a vector. Putting it inside evtype.correction.numbers[]
# the EVTYPE's names work as keys. The result is a vector with the correct numbers
# of original EVTYPES since I put in evtype.correction.numbers considering original.evtypes.
correct.evtypes.index <- evtype.correction.numbers[df4$EVTYPE[df4$EVTYPE  %in% EVTYPE_DROP2]]

# With all wrong names having correct index numbers, putting them inside the
# numerated vector with only original EVTYPE names, it will bring the correspondent
# EVTYPE.
corrected.evtypes <- original.evtypes[correct.evtypes.index]


df4$EVTYPE[df4$EVTYPE  %in% EVTYPE_DROP2] <- corrected.evtypes

refnum.other <- df4 %>% filter(is.na(EVTYPE)) %>% pull(REFNUM)

# Would show too much text, so it is commented out
# df.raw$REMARKS[refnum.other]

df4 %>% filter(is.na(EVTYPE)) %>% summarise(FATALITIES = sum(FATALITIES),
                                            INJURIES = sum(INJURIES),
                                            PROP.MULTIPLIED.ADJ = sum(PROP.MULTIPLIED.ADJ),
                                            CROP.MULTIPLIED.ADJ = sum(CROP.MULTIPLIED.ADJ))
##   FATALITIES INJURIES PROP.MULTIPLIED.ADJ CROP.MULTIPLIED.ADJ
## 1          0        4              113111             2064181
# There is no clear pattern; some values are really not available. Although it is possible to draw some conclusions, fatalities are zero, injuries are only 4, and the economic losses are low. These rows will be dropped.

df5 <- df4 %>% filter(!(is.na(EVTYPE)))


There were many EVTYPEs that differed from the 48 original categories. These non-standard EVTYPEs appeared in more than 100 000 rows. Despite some challenges in handling this issue, I believe a good job was done in addressing it. In the end, only 51 rows were excluded from the dataset.
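
The two-step lookup used above (non-standard label to position, then position to official name) can be illustrated on a toy example. The vectors below are hypothetical and much smaller than the real ones; only the indexing technique mirrors the chunk above.

```r
# Official names (toy subset), named by their position ("1", "2", ...).
official <- c("FLOOD", "HIGH WIND", "THUNDERSTORM WIND")
names(official) <- seq_along(official)

# Non-standard label -> position of the official name (NA = no clear match).
correction <- c("RIVER FLOOD" = 1, "HIGH WINDS" = 2, "TSTM WIND" = 3, "OTHER" = NA)

evtype <- c("TSTM WIND", "FLOOD", "OTHER", "HIGH WINDS")
needs_fix <- !(evtype %in% official)

# Step 1: the labels act as keys and return positions.
position <- correction[evtype[needs_fix]]

# Step 2: the positions index the official names; NA stays NA.
evtype[needs_fix] <- unname(official[position])

evtype
```

Labels mapped to NA survive as missing values, which makes them easy to inspect and drop afterwards with `is.na()`.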


Other Data Processing and Computations

This section is the end of the data processing.

In this study, for the purpose of assessing human damage, each fatality is counted as equivalent to 50 injuries; conversely, each injury counts as 1/50 of a fatality. The variable FATALITIES.SCORE represents this. For example, an event with 10 fatalities and 100 injuries has FATALITIES.SCORE = 10 + 100/50 = 10 + 2 = 12.


fatalities.equal.injuries <- 50

df5 <- df5 %>% 
  mutate(FATALITIES.SCORE = INJURIES/fatalities.equal.injuries + FATALITIES)
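
Since the 50-to-1 weight is a modeling choice, it can help to wrap the score in a small function and see how the result moves under another weight. The helper `fatalities_score` and the alternative weight of 20 are hypothetical and are not used elsewhere in this report.

```r
# Hypothetical helper mirroring the weighting described above.
fatalities_score <- function(fatalities, injuries, injuries_per_fatality = 50) {
  fatalities + injuries / injuries_per_fatality
}

# The worked example from the text: 10 fatalities and 100 injuries.
fatalities_score(10, 100)

# Sensitivity check with an assumed alternative weight of 20 injuries per fatality.
fatalities_score(10, 100, injuries_per_fatality = 20)
```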


The next chunk of code calculates the number of days and years covered by the analysis and generates the table that will be used to create the plots showing total damage and fatalities score by month.


(total.days <- (max(df5$BGN_DATE) - min(df5$BGN_DATE)))
## Time difference of 5812 days
attr(total.days, "units" ) <- "years"

(total.years <- (total.days/365) %>% as.numeric())
## [1] 15.92329
df5.date <- df5 %>% mutate(BGN_MONTH = month(BGN_DATE),
                           BGN_YEAR = year(BGN_DATE)) %>% 
  group_by(BGN_MONTH, BGN_YEAR) %>%
  summarise(TOTAL.DMG.DAY = sum(PROP.MULTIPLIED.ADJ) + sum(CROP.MULTIPLIED.ADJ),
            FATALITIES.SCORE.DAY = sum(FATALITIES.SCORE)) %>% 
  arrange(BGN_YEAR, BGN_MONTH) %>% 
  mutate(BGN_MONTHS_YEARS = paste(month.abb[BGN_MONTH],", ", BGN_YEAR))
## `summarise()` has grouped output by 'BGN_MONTH'. You can override using the
## `.groups` argument.
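
The span in years can also be computed without relabeling the `units` attribute of the difference. The sketch below is one alternative, using the first and last dates reported in this document and the same 365-day year used above; the object names are assumptions.

```r
# First and last event dates, as reported in this document.
start_date <- as.Date("1996-01-01")
end_date   <- as.Date("2011-11-30")

# Number of days as a plain number, then years with a 365-day convention.
span_days  <- as.numeric(difftime(end_date, start_date, units = "days"))
span_years <- span_days / 365

span_days
round(span_years, 1)
```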


The next chunk groups the results by event type (EVTYPE), summarizing the frequency of occurrences and the total number of fatalities, injuries, property damage, and crop damage.


df.summaries1 <- df5 %>%
  group_by(EVTYPE) %>% # 1
  summarise(
    FREQ.EVTYPE = n(), # 2
    FATALITIES.EVTYPE = sum(FATALITIES), # 3
    INJURIES.EVTYPE = sum(INJURIES), # 4
    PROPDMG.EVTYPE = round(sum (PROP.MULTIPLIED.ADJ)/1000000000, 3), # 5
    CROPDMG.EVTYPE = round(sum (CROP.MULTIPLIED.ADJ)/1000000000, 3), # 6
    MEAN.PROPDMG.EVTYPE = mean(PROP.MULTIPLIED.ADJ)/1000,
    MEAN.CROPDMG.EVTYPE = mean(CROP.MULTIPLIED.ADJ)/1000, # 7
    MEAN.FATALITIES.EVTYPE = mean(FATALITIES), # 8
    MEAN.INJURIES.EVTYPE = mean(INJURIES),
    FATALITIES.SCORE = sum(FATALITIES.SCORE)# 9
  ) %>% 
  mutate(TOTAL.DMG = PROPDMG.EVTYPE+CROPDMG.EVTYPE) %>%
  arrange(desc(TOTAL.DMG))


The next code chunk calculates some summary statistics and other numbers needed to present the results.


sum(df5$PROP.MULTIPLIED.ADJ)/10^9 # billions
## [1] 601.9409
sum(df5$CROP.MULTIPLIED.ADJ)/10^9
## [1] 60.49417
prop.losses.total <- round(sum(df5$PROP.MULTIPLIED.ADJ)/10^9,3)
crop.losses.total <- round(sum(df5$CROP.MULTIPLIED.ADJ)/10^9,3)
fatalities.total <- sum(df5$FATALITIES)
injuries.total <- sum(df5$INJURIES)
total.years
## [1] 15.92329
total.years.rounded <- round(total.years,1)
prop.losses.mean.year <- round(prop.losses.total/total.years, 3)
crop.losses.mean.year <- round(crop.losses.total/total.years, 3)
fatalities.mean.year <- round(fatalities.total/total.years, 0)
injuries.mean.year <- round(injuries.total/total.years, 0)

total.months <- nrow(df5.date)

prop.losses.mean.month <- round(prop.losses.total/total.months, 3)
crop.losses.mean.month <- round(crop.losses.total/total.months, 3)
prop.crop.losses.mean.month <- round((prop.losses.total + crop.losses.total)/total.months, 3)
fatalities.mean.month <- round(fatalities.total/total.months, 0)
injuries.mean.month <- round(injuries.total/total.months, 0)

min.date.format <- min(df5$BGN_DATE)
min.date.format <- format(min.date.format, "%d %b %Y")

max.date.format <- max(df5$BGN_DATE)
max.date.format <- format(max.date.format, "%d %b %Y")


3. RESULTS

The processed data yield the following results.


3.1 Impacts on Public Health and the Economy


Total Losses and Fatalities Over 15.9 Years

Considering the period from 01 Jan 1996 to 30 Nov 2011 (15.9 years), total property damage amounted to $601.941 billion, while crop losses reached $60.494 billion. Additionally, there were 8732 fatalities and 57971 injuries.


Annual Average

On average per year, these totals correspond to $37.803 billion in property damage, $3.799 billion in crop losses, 548 fatalities, and 3641 injuries.


Per Month

The plots below show how the monthly totals evolved over the years.

Horizontal dashed lines mark dates close to hurricane formation. Total damage rises around these dates, but within the analyzed time frame no hurricane caused as much damage as Katrina.
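
The dashed lines are placed by matching month labels: the plotting code builds a label for each hurricane date and looks it up in `BGN_MONTHS_YEARS` with `%in%`, so both sides must be built with exactly the same `paste()` call. A minimal sketch with a made-up three-month label vector (`make.label`, `labels.toy`, and `hurricane.date` are illustrative names) shows the lookup. It uses base R date extraction rather than the `month()` and `year()` helpers called in the document:

```r
# paste() separates its arguments with a space by default, so this
# reproduces the label style used in the document: "Sep ,  1998".
make.label <- function(m, y) paste(month.abb[m], ", ", y)

# Illustrative stand-in for df5.date$BGN_MONTHS_YEARS.
labels.toy <- make.label(c(8, 9, 10), c(1998, 1998, 1998))

hurricane.date <- as.Date("1998-09-01")  # date close to Hurricane Georges
m <- as.integer(format(hurricane.date, "%m"))
y <- as.integer(format(hurricane.date, "%Y"))

# Row index of the matching month: the position of the dashed line on the time axis.
idx <- which(labels.toy %in% make.label(m, y))
idx
```

A label built with a different separator, for example `paste0(month.abb[m], ", ", y)`, would not match any row, and the lookup would return no index.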

The orange lines (the TWO BILLIONS LINE and the 20 FATALITIES SCORE LINE) show that weather events frequently cause more than $2 billion in economic losses and a Fatalities Score above 20 in a single month. Only a few months fell below these thresholds, while many were far above them.

In fact, over the span of 191 months, the monthly averages were $3.468 billion in economic losses, 46 fatalities, and 304 injuries.

Weather events therefore deserve close attention: the human and economic costs are very high, even on a monthly basis.
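
The averages quoted in this section can be re-derived from the rounded totals reported above. The sketch below is only an arithmetic check: the inputs are copied from the text (15.92329 years and 191 months), not recomputed from the raw data, and the variable names are illustrative.

```r
# Totals as reported in the text (damage in billions of dollars, plus counts).
prop.total <- 601.941
crop.total <- 60.494
fatalities <- 8732
injuries   <- 57971
years      <- 15.92329  # printed value of total.years
months     <- 191       # number of rows in the monthly summary

annual <- c(
  prop       = round(prop.total / years, 3),
  crop       = round(crop.total / years, 3),
  fatalities = round(fatalities / years, 0),
  injuries   = round(injuries / years, 0)
)

monthly <- c(
  losses     = round((prop.total + crop.total) / months, 3),
  fatalities = round(fatalities / months, 0),
  injuries   = round(injuries / months, 0)
)

annual   # 37.803, 3.799, 548, 3641
monthly  # 3.468, 46, 304
```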


# dates close to hurricane formation
dates.days.hurricanes <- c(
  "1996-09-01",  # Hurricane Fran
  "1998-09-01",  # Hurricane Georges
  "1998-10-01",  # Hurricane Mitch
  "2005-08-01",  # Hurricane Katrina
  "2008-09-01"   # Hurricane Ike
)

dates.days.hurricanes <- as.Date(dates.days.hurricanes)

dates.months.hurricanes <- month(dates.days.hurricanes)

dates.years.hurricanes <- year(dates.days.hurricanes)

dates.months.years.hurricanes <- paste(month.abb[dates.months.hurricanes], ", ", dates.years.hurricanes)

par(mfrow = c (1,2))

# Plot Damage by Month
{
  plot(y=1:length(df5.date$BGN_MONTHS_YEARS), df5.date$TOTAL.DMG.DAY/10^9, type = "l", bty = "n",
       col.main = "gray20",
       col.lab = "gray20",
       col.axis = "gray20",
       col = "gray30",
       main = "Total Economic Losses by Month",
       ylab = "Month Year",
       xlab = "Total Economic Losses (Properties + Crops) ($ billions)",
       yaxt = "n",
       cex.lab = 0.8)
  abline(v= c(0,2), col=c("black", "orange4"))
  text(3,1, "TWO BILLIONS LINE", col = "orange4", cex = 0.7, adj = c(0,1))
  axis(2, at = 1:length(df5.date$BGN_MONTHS_YEARS), labels = df5.date$BGN_MONTHS_YEARS, las = 1, tick = FALSE, cex.axis=0.45)
  
  index.months.years.hurricanes <- which(df5.date$BGN_MONTHS_YEARS %in% dates.months.years.hurricanes)
  
  adjusts <- list(c(0.5, 0),  # up
                  c(0.5, 1),  # down
                  c(0.5, 0),  # up
                  c(0.5, 0),  # up
                  c(0.5, 0))  # up
  
  
  abline(h = index.months.years.hurricanes, lty = 2, lwd = 2,col = c(
    rep(adjustcolor("gray30", alpha.f = 0.5),3),
    "brown4",
    adjustcolor("gray30", alpha.f = 0.5)
  ))
  x.coord <- rep(100,5)
  y.coord <- index.months.years.hurricanes + 1.5*c(+1,-1,+1,+1,+1)
  colrs <- c("gray30", "gray30", "gray30", "brown4", "gray30")
  hurricanes.names <-  c("Fran", "Georges", "Mitch", "Katrina", "Ike")
  
  for (i in seq_along(hurricanes.names)) {
    text(x.coord[i], y.coord[i], labels = hurricanes.names[i], col = colrs[i], adj = adjusts[[i]], cex = 0.8)
  }
  
}

# Plot Fatalities by Month
{
  plot(y=1:length(df5.date$BGN_MONTHS_YEARS), df5.date$FATALITIES.SCORE.DAY, type = "l", bty = "n",
       col.main = "gray20",
       col.lab = "gray20",
       col.axis = "gray20",
       col = "gray30",
       main = "Fatalities Score by Month",
       ylab = "Month Year",
       xlab = "Fatalities Score",
       yaxt = "n",
       cex.lab = 0.8)
  
  abline(v= c(0,20), col=c("black", "orange4"))
  text(21,1, "20 FATALITIES SCORE LINE", col = "orange4", cex = 0.7, adj = c(0,1))
  
  axis(2, at = 1:length(df5.date$BGN_MONTHS_YEARS), labels = df5.date$BGN_MONTHS_YEARS, las = 1, tick = FALSE, cex.axis=0.45)
  
  index.months.years.hurricanes <- which(df5.date$BGN_MONTHS_YEARS %in% dates.months.years.hurricanes)
  
  adjusts <- list(c(0.5, 0),  # up
                  c(0.5, 1),  # down
                  c(0.5, 0),  # up
                  c(0.5, 0),  # up
                  c(0.5, 0))  # up
  
  
  abline(h = index.months.years.hurricanes, lty = 2, lwd = 2,col = c(
    rep(adjustcolor("gray30", alpha.f = 0.5),3),
    "brown4",
    adjustcolor("gray30", alpha.f = 0.5)
  ))
  x.coord <- rep(280,5)
  y.coord <- index.months.years.hurricanes + 1.5*c(+1,-1,+1,+1,+1)
  colrs <- c("gray30", "gray30", "gray30", "brown4", "gray30")
  hurricanes.names <-  c("Fran", "Georges", "Mitch", "Katrina", "Ike")
  
  for (i in seq_along(hurricanes.names)) {
    text(x.coord[i], y.coord[i], labels = hurricanes.names[i], col = colrs[i], adj = adjusts[[i]], cex = 0.8)
  }
  
}

The next figure shows the most harmful event types based on totals over the whole period.

The first plot is divided into four quadrants by two thresholds: an annual average of 30 on the Fatalities Score and an annual average of $0.3 billion in economic losses.

The most critical quadrant is HIGH LEVEL OF DAMAGE AND FATALITIES, which contains the event types that are both the deadliest and the costliest.

At the second level of importance are HIGH LEVEL OF FATALITIES and HIGH LEVEL OF DAMAGE. Here an event type is either deadly or costly, but not both at the same time.

Finally, the LOW LEVEL OF DAMAGE AND FATALITIES quadrant represents the least impactful events.
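
The quadrant boundaries are the annual thresholds scaled by the length of the period: 30 × 15.92 ≈ 478 for the Fatalities Score and 0.3 × 15.92 ≈ 4.78 ($ billion) for total damage. A minimal sketch of the classification follows, using a handful of rows copied from the summary table at the end of this section, with total damage taken as property plus crop damage. The `events` data frame, the `quadrant` column, and the shortened quadrant labels are illustrative and not part of the original code.

```r
total.years <- 15.92329          # printed value from the document
dmg.cut   <- 0.3 * total.years   # total damage threshold, $ billion
score.cut <- 30  * total.years   # Fatalities Score threshold

# Selected rows from the summary table (damage in $ billion).
events <- data.frame(
  EVTYPE           = c("FLOOD", "TORNADO", "EXCESSIVE HEAT", "RIP CURRENT",
                       "HURRICANE/TYPHOON", "HAIL", "BLIZZARD"),
  TOTAL.DMG        = c(232.736 + 8.189, 39.643 + 0.468, 0.013 + 0.774,
                       0.000 + 0.000, 136.348 + 9.448, 24.313 + 4.291,
                       1.011 + 0.014),
  FATALITIES.SCORE = c(583.8, 1925.3, 1929.6, 554.1, 151.6, 21.5, 77.7)
)

high.dmg   <- events$TOTAL.DMG > dmg.cut
high.score <- events$FATALITIES.SCORE > score.cut

# Same strict comparisons as the plotting code, written out per quadrant.
events$quadrant <- ifelse(high.dmg & high.score, "HIGH DAMAGE AND FATALITIES",
                   ifelse(high.dmg,              "HIGH DAMAGE",
                   ifelse(high.score,            "HIGH FATALITIES",
                                                 "LOW DAMAGE AND FATALITIES")))

events[, c("EVTYPE", "quadrant")]
```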

par(mfrow = c(3,1))

# TOTAL DAMAGE VERSUS FATALITIES SCORE
{
  plot(df.summaries1$TOTAL.DMG, df.summaries1$FATALITIES.SCORE,
       main = "Toll on Human Lives and Economic Losses Caused by Weather Events",
       xlab = "Total Damage (billions $)",
       ylab = "Fatalities Score",
       pch = NA,cex = .55, col = ifelse(
         (df.summaries1$TOTAL.DMG>0.3*total.years & df.summaries1$FATALITIES.SCORE>30*total.years),"brown4",
         ifelse(
           (df.summaries1$TOTAL.DMG<=0.3*total.years & df.summaries1$FATALITIES.SCORE<=30*total.years), "gray30", "orange3"
         )),
       col.main = "gray20",
       col.lab = "gray20",
       col.axis = "gray20")
  
  abline(h = 30*total.years, col = "gray30", v = 0.3*total.years)
  
  text(df.summaries1$TOTAL.DMG, df.summaries1$FATALITIES.SCORE, labels = df.summaries1$EVTYPE, adj = c(0.5, 0.5),
       cex = ifelse(
         (df.summaries1$TOTAL.DMG>0.3*total.years & df.summaries1$FATALITIES.SCORE>30*total.years),0.85,
         ifelse(
           (df.summaries1$TOTAL.DMG<=0.3*total.years & df.summaries1$FATALITIES.SCORE<=30*total.years), 0.4, 0.6
         )),
       col = ifelse(
         (df.summaries1$TOTAL.DMG>0.3*total.years & df.summaries1$FATALITIES.SCORE>30*total.years),"brown4",
         ifelse(
           (df.summaries1$TOTAL.DMG<=0.3*total.years & df.summaries1$FATALITIES.SCORE<=30*total.years), "gray30", "orange3"
         )))
  
  
  
  text(75, 1250, labels = "HIGH LEVEL OF DAMAGE AND FATALITIES", cex = 1, col = "brown4", font = 2)
  text(125, 250, labels = "HIGH LEVEL OF DAMAGE", cex = 0.7, col = "orange3", font = 2)
  text(0, 1500, labels = "HIGH LEVEL\nOF\nFATALITIES", cex = 0.7, col = "orange3", font = 2)
  text(c(0), 400, labels = "LOW LEVEL\nOF DAMAGE\nAND\nFATALITIES", cex = 0.7, col = "gray30", font = 2)
}

# FATALITIES SCORE VERSUS FREQUENCY
{
  
  plot(df.summaries1$FREQ.EVTYPE, df.summaries1$FATALITIES.SCORE,
       main = "The Frequency of Weather Events and Their Toll on Human Lives",
       xlab = "Frequency",
       ylab = "Fatalities Score",
       pch = NA,cex = .55, col = ifelse(
         (df.summaries1$FATALITIES.SCORE>30*total.years),"brown4","gray30"),
       col.main = "gray20",
       col.lab = "gray20",
       col.axis = "gray20")
  
  
  abline(h = 30*total.years, col = "gray30")
  
  text(df.summaries1$FREQ.EVTYPE, df.summaries1$FATALITIES.SCORE, labels = df.summaries1$EVTYPE, adj = c(0.5, 0.5),
       cex = ifelse(
         (df.summaries1$FATALITIES.SCORE>30*total.years),0.85, 0.6),
       col = ifelse(
         df.summaries1$FATALITIES.SCORE>30*total.years,"brown4","gray30"))
  
  text(10^5, 1250, labels = "HIGH LEVEL OF FATALITIES", cex = 0.9, col = "brown4", font = 2)
  
}

# TOTAL DAMAGE VERSUS FREQUENCY
{
  
  plot(df.summaries1$FREQ.EVTYPE, df.summaries1$TOTAL.DMG,
       main = "The Frequency of Weather Events and the Total Economic Damage They Cause",
       xlab = "Frequency",
       ylab = "Total Damage ($ Billion)",
       pch = NA,cex = .55, col = ifelse(
         (df.summaries1$TOTAL.DMG>0.3*total.years),"brown4","gray30"), # color by the damage threshold, as in the labels below
       col.main = "gray20",
       col.lab = "gray20",
       col.axis = "gray20")
  
  
  abline(h = 0.3*total.years, col = "gray30")
  
  text(df.summaries1$FREQ.EVTYPE, df.summaries1$TOTAL.DMG, labels = df.summaries1$EVTYPE, adj = c(0.5, 0.5),
       cex = ifelse(
         (df.summaries1$TOTAL.DMG>0.3*total.years),0.85, 0.6),
       col = ifelse(
         df.summaries1$TOTAL.DMG>0.3*total.years,"brown4","gray30"))
  
  text(10^5, 150, labels = "HIGH LEVEL OF TOTAL DAMAGE", cex = 0.9, col = "brown4", font = 2)
  
}


As the three plots show, the event types that are major causes of both fatalities and economic losses (the HIGH LEVEL OF DAMAGE AND FATALITIES quadrant) are Flood, Flash Flood, Tornado, and Thunderstorm Wind.

These events are frequent compared to the others, and the authorities must study them carefully to prevent economic losses and human casualties. A considerable share of the fatalities, injuries, and economic damage from these events (except Thunderstorm Wind) is probably related to Hurricanes/Typhoons. According to the documentation (p. 48):


The hurricane/typhoon will usually include many individual hazards, such as storm tide, freshwater flooding, tornadoes, rip currents, etc. The Hurricane/Typhoon data header-strip will only include fatalities, injuries, and damage amounts associated with wind damage (the other hazards will already be reported in their respective Storm Data entries).


Public policy managers must consider this. How can people be protected from strong winds and floods? An analysis by location could clarify which areas need preventive measures.


However, some event types cause small economic losses but a high level of fatalities compared to other event types (the HIGH LEVEL OF FATALITIES quadrant): Excessive Heat, Lightning, and Rip Current.

In general, these events appear with low frequency in the dataset, but this probably reflects a small number of recorded observations rather than rare occurrence. For example, Rip Currents and Lightning happen all the time. Do only significant occurrences of these events appear in the dataset?

According to the documentation (p. 65):

Rip currents will be listed in Storm Data only when they cause a drowning(s), near-drowning(s), result in numerous rescues (i.e., 5 or more at one beach community), or damage watercraft.


Later, on page 59:

Fatalities and injuries directly related to lightning strikes will be included in Storm Data. Report the specific location (see Table 2 in Section 2.6.1.2), gender and age of fatalities. If reliable estimates of lightning-related damages (such as costs associated with structural fires, equipment loss, and electrical power and/or communications outages) are available or can be made, they should be entered.


These causes of fatalities seem highly preventable. If they are, deaths could be reduced through education and public information. However, the EVTYPE Lightning should be investigated further to determine whether the victims were simply struck by misfortune.
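
One way to look at the reporting question raised above is to compare fatalities per recorded event, which is the Mean Fatalities column of the summary table at the end of this section. The sketch below recomputes that ratio for three event types from the Frequency and Fatalities columns of the table; the `per.record` name is illustrative.

```r
# Frequency and Fatalities copied from the summary table.
freq       <- c("RIP CURRENT" = 736, "LIGHTNING" = 13204, "EXCESSIVE HEAT" = 1818)
fatalities <- c("RIP CURRENT" = 544, "LIGHTNING" = 651,   "EXCESSIVE HEAT" = 1800)

# Fatalities per recorded event, rounded like the table (5 decimals).
per.record <- round(fatalities / freq, 5)
per.record
```

A rip current record carries about 0.74 fatalities on average, against about 0.05 for a lightning record. This is consistent with rip currents being recorded mainly when they cause a drowning, a near-drowning, or rescues.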


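On the other hand, some event types cause a small number of fatalities but high economic losses (the HIGH LEVEL OF DAMAGE quadrant): Drought, Hail, High Wind, Hurricane/Typhoon, Ice Storm, Storm Surge/Tide, Tropical Storm, and Wildfire.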

These events are infrequent in general, with the exception of Hail. Nevertheless, they are a major cause of economic losses. As stated before, an analysis by location could provide deeper insight.


This overview provides only a starting point for considering which climate events were the most harmful during the analyzed time period.

Scientists who study climate and related issues, as well as people working in public safety (e.g., firefighters, lifeguards), likely have valuable insights on how to mitigate economic losses and human casualties. Their opinions, combined iteratively with data analysis, should contribute to a solid assessment of how to address the challenges posed by climate events.


The complete table is shown below:


df.summaries2 <- df.summaries1 %>% arrange(EVTYPE)
knitr::kable(df.summaries2, digits = c(NA,
                                       0,
                                       0,
                                       0,
                                       3,
                                       3,
                                       0,
                                       0,
                                       5,
                                       1,
                                       1,
                                       0),
             align = "c",
             caption = "Statistical Summaries by Event Types",
             format = "simple",
             col.names = c("Event Type (EVTYPE)",
                           "Frequency",
                           "Fatalities",
                           "Injuries",
                           "Property Damage ($ billion)",
                           "Crop Damage ($ billion)",
                           "Mean Property Damage ($ thousand)",
                           "Mean Crop Damage ($ thousand)",
                           "Mean Fatalities",
                           "Mean Injuries",
                           "Fatalities Score",
                           "Total Damage ($ billion)"))
Statistical Summaries by Event Types
| Event Type (EVTYPE) | Frequency | Fatalities | Injuries | Property Damage ($ billion) | Crop Damage ($ billion) | Mean Property Damage ($ thousand) | Mean Crop Damage ($ thousand) | Mean Fatalities | Mean Injuries | Fatalities Score | Total Damage ($ billion) |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| ASTRONOMICAL LOW TIDE | 277 | 0 | 0 | 0.017 | 0.000 | 60 | 0 | 0.00000 | 0.0 | 0.0 | 0 |
| AVALANCHE | 378 | 223 | 156 | 0.006 | 0.000 | 16 | 0 | 0.58995 | 0.4 | 226.1 | 0 |
| BLIZZARD | 2633 | 70 | 385 | 1.011 | 0.014 | 384 | 5 | 0.02659 | 0.1 | 77.7 | 1 |
| COASTAL FLOOD | 776 | 6 | 8 | 0.701 | 0.000 | 904 | 0 | 0.00773 | 0.0 | 6.2 | 1 |
| COLD/WIND CHILL | 608 | 132 | 24 | 0.004 | 0.061 | 7 | 101 | 0.21711 | 0.0 | 132.5 | 0 |
| DEBRIS FLOW | 619 | 43 | 55 | 0.513 | 0.032 | 828 | 51 | 0.06947 | 0.1 | 44.1 | 1 |
| DENSE FOG | 1725 | 69 | 855 | 0.036 | 0.000 | 21 | 0 | 0.04000 | 0.5 | 86.1 | 0 |
| DENSE SMOKE | 10 | 0 | 0 | 0.000 | 0.000 | 15 | 0 | 0.00000 | 0.0 | 0.0 | 0 |
| DROUGHT | 2433 | 0 | 4 | 1.839 | 23.873 | 756 | 9812 | 0.00000 | 0.0 | 0.1 | 26 |
| DUST DEVIL | 136 | 2 | 39 | 0.001 | 0.000 | 8 | 0 | 0.01471 | 0.3 | 2.8 | 0 |
| DUST STORM | 419 | 11 | 376 | 0.009 | 0.005 | 22 | 12 | 0.02625 | 0.9 | 18.5 | 0 |
| EXCESSIVE HEAT | 1818 | 1800 | 6480 | 0.013 | 0.774 | 7 | 426 | 0.99010 | 3.6 | 1929.6 | 1 |
| EXTREME COLD/WIND CHILL | 1829 | 264 | 108 | 0.054 | 2.577 | 30 | 1409 | 0.14434 | 0.1 | 266.2 | 3 |
| FLASH FLOOD | 51003 | 887 | 1674 | 25.602 | 2.210 | 502 | 43 | 0.01739 | 0.0 | 920.5 | 28 |
| FLOOD | 27756 | 447 | 6838 | 232.736 | 8.189 | 8385 | 295 | 0.01610 | 0.2 | 583.8 | 241 |
| FREEZING FOG | 353 | 12 | 272 | 0.007 | 0.000 | 18 | 0 | 0.03399 | 0.8 | 17.4 | 0 |
| FROST/FREEZE | 1471 | 1 | 3 | 0.033 | 2.212 | 22 | 1504 | 0.00068 | 0.0 | 1.1 | 2 |
| FUNNEL CLOUD | 6063 | 0 | 1 | 0.000 | 0.000 | 0 | 0 | 0.00000 | 0.0 | 0.0 | 0 |
| HAIL | 207766 | 7 | 723 | 24.313 | 4.291 | 117 | 21 | 0.00003 | 0.0 | 21.5 | 29 |
| HEAT | 716 | 237 | 1222 | 0.002 | 0.000 | 3 | 0 | 0.33101 | 1.7 | 261.4 | 0 |
| HEAVY RAIN | 11540 | 94 | 234 | 1.020 | 1.329 | 88 | 115 | 0.00815 | 0.0 | 98.7 | 2 |
| HEAVY SNOW | 14657 | 116 | 753 | 1.185 | 0.140 | 81 | 10 | 0.00791 | 0.1 | 131.1 | 1 |
| HIGH SURF | 1067 | 158 | 254 | 0.146 | 0.000 | 136 | 0 | 0.14808 | 0.2 | 163.1 | 0 |
| HIGH WIND | 19912 | 235 | 1083 | 8.688 | 1.091 | 436 | 55 | 0.01180 | 0.1 | 256.7 | 10 |
| HURRICANE/TYPHOON | 271 | 125 | 1328 | 136.348 | 9.448 | 503128 | 34865 | 0.46125 | 4.9 | 151.6 | 146 |
| ICE STORM | 1879 | 82 | 318 | 6.427 | 0.029 | 3421 | 15 | 0.04364 | 0.2 | 88.4 | 6 |
| LAKE-EFFECT SNOW | 656 | 0 | 0 | 0.062 | 0.000 | 95 | 0 | 0.00000 | 0.0 | 0.0 | 0 |
| LAKESHORE FLOOD | 23 | 0 | 0 | 0.011 | 0.000 | 466 | 0 | 0.00000 | 0.0 | 0.0 | 0 |
| LIGHTNING | 13204 | 651 | 4141 | 1.240 | 0.013 | 94 | 1 | 0.04930 | 0.3 | 733.8 | 1 |
| MARINE HAIL | 442 | 0 | 0 | 0.000 | 0.000 | 0 | 0 | 0.00000 | 0.0 | 0.0 | 0 |
| MARINE HIGH WIND | 135 | 1 | 1 | 0.002 | 0.000 | 14 | 0 | 0.00741 | 0.0 | 1.0 | 0 |
| MARINE STRONG WIND | 49 | 14 | 22 | 0.003 | 0.000 | 51 | 0 | 0.28571 | 0.4 | 14.4 | 0 |
| MARINE THUNDERSTORM WIND | 11987 | 19 | 34 | 0.010 | 0.000 | 1 | 0 | 0.00159 | 0.0 | 19.7 | 0 |
| RIP CURRENT | 736 | 544 | 505 | 0.000 | 0.000 | 1 | 0 | 0.73913 | 0.7 | 554.1 | 0 |
| SEICHE | 21 | 0 | 0 | 0.002 | 0.000 | 87 | 0 | 0.00000 | 0.0 | 0.0 | 0 |
| SLEET | 58 | 0 | 0 | 0.000 | 0.000 | 0 | 0 | 0.00000 | 0.0 | 0.0 | 0 |
| STORM SURGE/TIDE | 401 | 13 | 42 | 77.056 | 0.001 | 192159 | 3 | 0.03242 | 0.1 | 13.8 | 77 |
| STRONG WIND | 4195 | 133 | 403 | 0.294 | 0.106 | 70 | 25 | 0.03170 | 0.1 | 141.1 | 0 |
| THUNDERSTORM WIND | 211360 | 382 | 5154 | 13.370 | 1.743 | 63 | 8 | 0.00181 | 0.0 | 485.1 | 15 |
| TORNADO | 23159 | 1512 | 20667 | 39.643 | 0.468 | 1712 | 20 | 0.06529 | 0.9 | 1925.3 | 40 |
| TROPICAL DEPRESSION | 60 | 0 | 0 | 0.003 | 0.000 | 44 | 0 | 0.00000 | 0.0 | 0.0 | 0 |
| TROPICAL STORM | 682 | 57 | 338 | 13.539 | 1.164 | 19852 | 1707 | 0.08358 | 0.5 | 63.8 | 15 |
| TSUNAMI | 20 | 33 | 129 | 0.211 | 0.000 | 10567 | 1 | 1.65000 | 6.4 | 35.6 | 0 |
| VOLCANIC ASH | 23 | 0 | 0 | 0.001 | 0.000 | 39 | 0 | 0.00000 | 0.0 | 0.0 | 0 |
| WATERSPOUT | 3391 | 2 | 2 | 0.008 | 0.000 | 2 | 0 | 0.00059 | 0.0 | 2.0 | 0 |
| WILDFIRE | 4176 | 87 | 1458 | 13.276 | 0.678 | 3179 | 162 | 0.02083 | 0.3 | 116.2 | 14 |
| WINTER STORM | 11328 | 195 | 1294 | 2.454 | 0.023 | 217 | 2 | 0.01721 | 0.1 | 220.9 | 2 |
| WINTER WEATHER | 8234 | 68 | 588 | 0.043 | 0.022 | 5 | 3 | 0.00826 | 0.1 | 79.8 | 0 |