Statistical Analysis

Synopsis

The objective of this report is to analyse the harm caused to population health and to the economy by extreme weather events in the US. We began by loading the Storm Data dataset from the National Oceanic and Atmospheric Administration (NOAA) official website. We then performed an extensive data cleanup: correcting the weather event names so that they conformed to the 48 official events listed in the NOAA documentation, computing the crop and property damage amounts in USD, and removing unnecessary observations. Finally, we analysed which events had the most impact on human health and which had the greatest economic consequences, and we present the results as tables and graphs.

Data Processing

1. Loading The Data Set

Before downloading the Storm Data dump from the NOAA’s official website, we loaded in the following packages:

tidyverse: a set of packages for data cleaning, transformation and visualisation (tidyr, dplyr, ggplot2, etc.)
stringr: a package for working with character strings in R
lubridate: a package for working with dates, datetimes, and times in R
stringdist: a package for approximate string matching based on a variety of statistical methods
kableExtra: a package for styling and formatting tables

###install the packages (ex. install.packages("tidyverse")) if using for the first time
library(tidyverse)
library(stringr)
library(lubridate)
library(stringdist)
library(kableExtra)

We downloaded the file to our working directory. Then, we loaded it into R as the data frame stormData.

#download the file to your working directory, and then load the dataset
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
              destfile = "StormData.csv.bz2")
stormData <- read.csv("StormData.csv.bz2", stringsAsFactors = FALSE)

#tibbles are similar to data frames, and easier to use
stormData <- as_tibble(stormData)

stormData has 902297 rows and 37 columns. Some of the important columns that we used in our analysis include:

EVTYPE: the extreme weather event type (thunderstorm wind, flash flood, etc.)
PROPDMG & PROPDMGEXP: quantify the property damage caused by an extreme weather event in USD
CROPDMG & CROPDMGEXP: quantify the crop damage caused by an extreme weather event in USD

2. Subsetting Part 1 - Removing all observations prior to January 1996

The NOAA started collecting data for all 48 event names (alphabetically ranging from Astronomical Low Tide to Winter Weather) only from 1996 onward - from 1955 till 1995, data was only collected for tornadoes, thunderstorms and hail¹. Since our objective is to compare damages per event type, we removed data from before 1996 from the dataset.

###remove weather events beginning before the year 1996

#remove the hours:minutes:seconds signifier from the BGN_DATE variable
stormData$BGN_DATE <- str_replace(stormData$BGN_DATE, " 0:00:00", "")

#convert the BGN_DATE variable from character to date
stormData$BGN_DATE <- parse_date(stormData$BGN_DATE, "%m/%d/%Y")

#subset stormData to only include events beginning in 1996 or later
stormData <- stormData %>%
        filter(year(BGN_DATE) > 1995)
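The same parse-and-filter step can be sketched on a few made-up date strings using only base R. The sample values are illustrative and are not rows from the dataset. The sketch only shows where the cutoff falls: an event beginning on 31 December 1995 is dropped, while one beginning on 1 January 1996 is kept.

```r
# Illustrative BGN_DATE strings in the raw "m/d/Y 0:00:00" layout (made-up values)
raw_dates <- c("4/18/1950 0:00:00", "12/31/1995 0:00:00",
               "1/1/1996 0:00:00", "8/29/2005 0:00:00")

# Strip the time suffix, then parse the remaining text as a Date
parsed <- as.Date(sub(" 0:00:00", "", raw_dates, fixed = TRUE),
                  format = "%m/%d/%Y")

# Keep only events that began in 1996 or later
keep <- as.integer(format(parsed, "%Y")) > 1995
kept_dates <- parsed[keep]
print(kept_dates)
```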

3. Subsetting Part 2 - Removing unnecessary variables

Since we used only a few variables in our data analysis, we removed columns that seemed extraneous to our objective. This had the benefit of making analysis faster by reducing the size of our dataset.

stormData <- stormData %>%
        select(-STATEOFFIC, -ZONENAMES, -REMARKS, -LATITUDE:-LONGITUDE_, 
               -TIME_ZONE, -BGN_AZI, -BGN_LOCATI, -contains("END"))

We now have 653530 rows and 20 columns in stormData.

4. Computing exact values for Property and Crop Damage

The values in the variables CROPDMG and PROPDMG are meant to be multiplied by the magnitudes encoded in CROPDMGEXP and PROPDMGEXP, respectively. However, the latter variables contain symbols such as K, M, and B, or digits from 0 to 9. We used an online resource to establish that K stands for 1,000, M for 1,000,000, B for 1,000,000,000, and any digit from 0 to 9 for a multiplier of ten². We then multiplied the values by the corresponding multipliers.

###calculate exact values for CROPDMG and PROPDMG variables

#multiply the value in CROPDMG by the deduced value of the exponent symbols in CROPDMGEXP
stormData <- stormData %>%
        mutate(CROPDMG = ifelse(CROPDMGEXP == "K", CROPDMG * 1000,
                                ifelse(CROPDMGEXP == "M", CROPDMG * 10^6,
                                       ifelse(CROPDMGEXP == "B", 
                                              CROPDMG * 10^9, CROPDMG))))

#multiply the value in PROPDMG by the deduced value of the exponent symbols in PROPDMGEXP
stormData <- stormData %>%
        mutate(PROPDMG = ifelse(PROPDMGEXP == "K", PROPDMG * 1000,
                                ifelse(PROPDMGEXP == "M", PROPDMG * 10^6,
                                       ifelse(PROPDMGEXP == "B", 
                                              PROPDMG * 10^9, 
                                              ifelse(PROPDMGEXP == "0",
                                                     PROPDMG * 10, PROPDMG)))))
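As a compact alternative to the nested ifelse() calls, the symbol-to-multiplier mapping can be sketched as a named lookup vector. This is a minimal base R illustration under the mapping described above (K, M, B, and "0"); the helper name exp_multiplier and the toy data frame are hypothetical and are not part of our pipeline.

```r
# Hypothetical helper: translate an exponent symbol into its multiplier.
# Symbols that are not in the lookup (e.g. "") fall back to a multiplier of 1.
exp_multiplier <- function(sym) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9, "0" = 10)
  mult <- unname(lookup[sym])
  mult[is.na(mult)] <- 1
  mult
}

# Toy data (made-up values) covering each symbol once
toy <- data.frame(PROPDMG = c(2.5, 1, 3, 7, 4),
                  PROPDMGEXP = c("K", "M", "B", "", "0"),
                  stringsAsFactors = FALSE)

toy$PROPDMG_USD <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
print(toy)
```

Keeping the mapping in one vector means the same helper could be applied to both the crop and the property columns.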

5. Reducing “EVTYPE” labels to 48 - Part 1

The variable EVTYPE specifies the weather event for each observation. The NOAA recognizes 48 different extreme weather events³. We first looked at whether the “EVTYPE” variable also listed 48 events.

###compute the total number of event names in stormData
stormData %>% count(EVTYPE) %>% dim() %>% .[1]

  [1] 516

There turned out to be as many as 516 different event names! This indicated the possibility of typographical errors. It also meant that we would have to perform a comprehensive cleanup of the EVTYPE variable to reduce the number of event names from 516 to 48.

Before proceeding with the cleanup, we looked through the event names in stormData to gain an idea of how to proceed with correcting them. While we only show the first 10 entries below, you can use the View() function to see the complete list.

###view the event names in the EVTYPE variable along with their frequency
kable(
        stormData %>% count(EVTYPE) %>% rename(FREQUENCY = n) %>% head(10),
        align = "cc",
        col.names = c("EVTYPE", "Frequency")
) %>%
        kable_styling(full_width = TRUE)

EVTYPE	Frequency
HIGH SURF ADVISORY	1
COASTAL FLOOD	1
FLASH FLOOD	1
LIGHTNING	1
TSTM WIND	4
TSTM WIND (G45)	1
WATERSPOUT	1
WIND	1
ABNORMAL WARMTH	4
ABNORMALLY DRY	2

We noticed the following issues in the event names:

Some event names are listed in lower case, others in upper case
There are extra spaces at the beginning or end of some of the event names
Some event names are given in plural
There are parentheses after some of the event names
Some event names are abbreviated
A significant number of entries do not have event names at all - they are in the format “Summary of xyz” (where xyz is usually a date)

In the first part of our cleanup operation, we looked to reduce the event names on the basis of the above observations.

A. Convert all “EVTYPE” entries to upper case:

stormData <- stormData %>%
        mutate(EVTYPE = str_to_upper(EVTYPE))

B. Replace the abbreviation TSTM with THUNDERSTORM:

stormData <- stormData %>%
        mutate(EVTYPE = str_replace(EVTYPE, "TSTM", "THUNDERSTORM"))

C. Remove parentheses and the characters inside them:

stormData <- stormData %>%
        mutate(EVTYPE = str_replace(EVTYPE, "\\(.{0,}$", ""))

D. Remove whitespaces from the beginning and end of a string:

stormData <- stormData %>%
        mutate(EVTYPE = str_trim(EVTYPE))

E. Replace 2 or more consecutive whitespaces with a single space:

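#note: str_replace() only replaces the first run in each string; str_replace_all() would replace every run
stormData <- stormData %>%
        mutate(EVTYPE = str_replace(EVTYPE, "\\s{2,}", " "))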

F. Remove plurals from the ends of “EVTYPE” entries:

stormData <- stormData %>%
        mutate(EVTYPE = str_replace(EVTYPE, "S$", ""),
               EVTYPE = str_trim(EVTYPE))

G. Remove observations in the format “Summary of xyz”:

stormData <- stormData %>%
        filter(!str_detect(EVTYPE, "SUMMARY[^\\d]+[\\d]+"))
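Steps A to G can be sketched end to end on a handful of made-up labels. The version below uses base R equivalents of the stringr calls (toupper, sub, gsub, trimws, grepl) so that it runs without extra packages; the function name clean_evtype and the sample strings are hypothetical.

```r
# Hypothetical helper chaining steps A-F in the same order as above
clean_evtype <- function(x) {
  x <- toupper(x)                                    # A. upper case
  x <- sub("TSTM", "THUNDERSTORM", x, fixed = TRUE)  # B. expand abbreviation
  x <- sub("\\(.*$", "", x)                          # C. drop parentheses and what follows
  x <- trimws(x)                                     # D. trim both ends
  x <- gsub("[[:space:]]{2,}", " ", x)               # E. collapse runs of whitespace
  x <- sub("S$", "", x)                              # F. drop a trailing plural "S"
  trimws(x)
}

# Made-up raw labels showing each kind of issue
raw <- c(" TSTM WIND (G45)", "Thunderstorm  Winds", "rip currents",
         "   HIGH SURF ADVISORY", "Summary of May 22")

cleaned <- clean_evtype(raw)

# G. drop "Summary of xyz" entries
keep <- !grepl("SUMMARY[^0-9]+[0-9]+", cleaned)
result <- unique(cleaned[keep])
print(result)
```

In this toy input, the first two labels collapse into the same name, which is the effect that reduces the number of distinct event names.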

Then, we computed the number of EVTYPE event names again.

###compute the total number of event names in the EVTYPE variable
stormData %>% count(EVTYPE) %>% dim() %>% .[1]

  [1] 332

Thus, we managed to reduce the event names from 516 to 332 - a significant reduction.

We also note that our filtering out of “Summary of xyz” event names leaves stormData with 653455 rows.

6. Create the “events” file with the 48 correct event names

As we have remarked before, the NOAA recognizes 48 event names. However, there are still as many as 332 in the stormData dataset.

We created a file that contained only the 48 event names. Since there was no ready dataset or file available, we copied and pasted all 48 names from page 6 of the NCDC Storm Data Preparation PDF. We then converted these names to upper case, so that we could match them to the “EVTYPE” variable in stormData later.

#Enter all 48 event names as recognized by the NOAA as a tibble
events <- tibble(name = c("Astronomical Low Tide","Avalanche","Blizzard","Coastal Flood",
                             "Cold/Wind Chill","Debris Flow","Dense Fog","Dense Smoke",
                             "Drought","Dust Devil","Dust Storm","Excessive Heat",
                             "Extreme Cold/Wind Chill","Flash Flood","Flood","Frost/Freeze",
                             "Funnel Cloud","Freezing Fog","Hail","Heat","Heavy Rain",
                             "Heavy Snow","High Surf","High Wind","Hurricane (Typhoon)","Ice Storm",
                             "Lake-Effect Snow","Lakeshore Flood","Lightning","Marine Hail",
                             "Marine High Wind","Marine Strong Wind","Marine Thunderstorm Wind",
                             "Rip Current","Seiche","Sleet","Storm Surge/Tide","Strong Wind",
                             "Thunderstorm Wind","Tornado","Tropical Depression","Tropical Storm","Tsunami",
                             "Volcanic Ash","Waterspout","Wildfire","Winter Storm","Winter Weather"))

#convert all event names to upper case
events <- events %>% mutate(name = str_to_upper(name))

7. Reducing “EVTYPE” labels to 48 - Part 2

Reducing the EVTYPE labels further from 332 to 48 required us to extract the observations with incorrect “EVTYPE” labels from stormData into a new dataset, assign the correct event names in the new dataset using various techniques, and then merge the datasets back together. The following sub-sections illustrate this process step-by-step.

A. Add a new column to stormData indicating whether the EVTYPE matches with the 48 event names:

#add a new column EVlogical which indicates whether the EVTYPE is "correct"
stormData <- stormData %>%
        mutate(EVlogical = EVTYPE %in% events$name)

#how many "correct" and "incorrect" event names in stormData?
stormData %>% count(EVlogical) %>% arrange(desc(n))

  # A tibble: 2 x 2
    EVlogical      n
    <lgl>      <int>
  1 TRUE      640033
  2 FALSE      13422

There are 13422 entries with event names that do not match the 48 event names in the NOAA classifications.

B. Create a new dataset (stormWIP) by extracting the “incorrect” EVTYPE entries:

#REFNUM is a unique key in stormData - this allows us to merge the datasets back later
stormWIP <- stormData %>%
        filter(!EVlogical) %>%
        select(EVTYPE, EVlogical, REFNUM)

C. METHOD 1: Manually label the top 10 entries in stormWIP:

None of the string matching methods we used in this section gave us perfect results - there are lots of similar event names in the NOAA dataset (ex. “COLD/WIND CHILL” and “EXTREME COLD/WIND CHILL”, or “FLASH FLOOD” and “FLOOD”), which make string matching functions somewhat unreliable. Hence, we started out by manually naming the 10 most common event names in the stormWIP dataset.

First, we computed the top 10 most common event names, and their frequency.

knitr::kable(
        stormWIP %>% count(EVTYPE) %>% arrange(desc(n)) %>% filter(row_number() <=10),
        align = "cc",
        col.names = c("EVTYPE", "Frequency")
) %>%
        kable_styling(full_width = TRUE)

EVTYPE	Frequency
URBAN/SML STREAM FLD	3392
WILD/FOREST FIRE	1443
WINTER WEATHER/MIX	1104
THUNDERSTORM WIND/HAIL	1028
EXTREME COLD	617
LANDSLIDE	590
FOG	532
SNOW	425
WIND	330
STORM SURGE	253

We manually changed these names by looking for the most likely event from the 48 in the events dataset for each entry. In some cases, that meant doing a little bit of research.

For example, we categorized “URBAN/SML STREAM FLD” as a “FLASH FLOOD” rather than a “FLOOD”. Flash floods are the result of quick flooding due to heavy rainfall, and are normally day-long events at most, while floods are a slower, more long-lasting phenomenon. Normally, flash floods are more likely in urban areas⁴.

We added the new names to a new column in the stormWIP dataset, EVNEW1.

stormWIP$EVNEW1 <- NA
stormWIP$EVNEW1[stormWIP$EVTYPE == "URBAN/SML STREAM FLD"] <- "FLASH FLOOD"
stormWIP$EVNEW1[stormWIP$EVTYPE == "WILD/FOREST FIRE"] <- "WILDFIRE"
stormWIP$EVNEW1[stormWIP$EVTYPE == "WINTER WEATHER/MIX"] <- "WINTER WEATHER"
stormWIP$EVNEW1[stormWIP$EVTYPE == "THUNDERSTORM WIND/HAIL"] <- "THUNDERSTORM WIND"
stormWIP$EVNEW1[stormWIP$EVTYPE == "EXTREME COLD"] <- "EXTREME COLD/WIND CHILL"
stormWIP$EVNEW1[stormWIP$EVTYPE == "LANDSLIDE"] <- "DEBRIS FLOW"
stormWIP$EVNEW1[stormWIP$EVTYPE == "FOG"] <- "DENSE FOG"
stormWIP$EVNEW1[stormWIP$EVTYPE == "SNOW"] <- "HEAVY SNOW"
stormWIP$EVNEW1[stormWIP$EVTYPE == "WIND"] <- "STRONG WIND"
stormWIP$EVNEW1[stormWIP$EVTYPE == "STORM SURGE"] <- "STORM SURGE/TIDE"

D. METHOD 2: Match EVTYPE in stormWIP with unique last names from the “events” file:

The logic of this method is based on an observation - for event names with more than one word in the events file, the last word usually describes the event, while the other words are adjectives (ex. “EXCESSIVE HEAT”, “HEAVY RAIN”, “HIGH SURF”). We took the following steps.

We extracted the last word from each event in the events file, and displayed the result as a separate column (lastWord) in the same file (first 5 results shown below).

#extract the last word from each event name in the "events" file
events <- events %>% mutate(lastWord = str_extract(name, "[^\\s/]+$"))

#display the first 5 results 
knitr::kable(
        head(events, 5),
        align = "cc",
        col.names = c("Event Name", "lastWord")
) %>%
        kable_styling(full_width = TRUE)

Event Name	lastWord
ASTRONOMICAL LOW TIDE	TIDE
AVALANCHE	AVALANCHE
BLIZZARD	BLIZZARD
COASTAL FLOOD	FLOOD
COLD/WIND CHILL	CHILL

Then, we counted how often each word occurs in the lastWord column of events, and kept only the words that occur exactly once. We named the new file uniqueLastName.

uniqueLastName <- events %>% count(lastWord) %>% filter(n==1)

Finally, we created a new file, eventsLast, by merging the events and uniqueLastName files on the common lastWord column, and filtered out the observations with missing values (“NA”) in the same column.

eventsLast <- left_join(events, uniqueLastName, by = "lastWord") %>%
        filter(!is.na(n))

The eventsLast dataset has 22 observations. Each observation is an event type with a unique last word.

We used a for() loop and the str_detect() function as follows - if an EVTYPE in stormWIP matched with only 1 word from the lastWord column of eventsLast, then we entered the corresponding full event name from eventsLast (“eventsLast$name”) into the newly created variable EVNEW2 in stormWIP.

stormWIP$EVNEW2 <- NA
a <- vector("numeric", 1)

for (i in 1:nrow(stormWIP)){
        a <- sum(str_detect(stormWIP$EVTYPE[i], 
                            eventsLast$lastWord))
        if(a == 1){
                stormWIP$EVNEW2[i] <- eventsLast$name[str_detect(stormWIP$EVTYPE[i], 
                                                                     eventsLast$lastWord)]
        }
}
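The logic of Method 2 can be sketched on a small subset of event names using base R. The five event names and the candidate labels below are chosen for illustration only, and the helper match_last_word is hypothetical. The sketch assigns a name only when exactly one unique last word is found in the label.

```r
# Small illustrative subset of official event names
events_toy <- c("HEAVY RAIN", "HIGH SURF", "FLASH FLOOD", "FLOOD", "WILDFIRE")

# Last word of each name (text after the final space or slash)
last_word <- sub("^.*[ /]", "", events_toy)

# Keep only the last words that occur exactly once
is_unique <- !(last_word %in% last_word[duplicated(last_word)])
lookup <- setNames(events_toy[is_unique], last_word[is_unique])

# Hypothetical helper: return the full event name if exactly one last word matches
match_last_word <- function(evtype) {
  found <- vapply(names(lookup), grepl, logical(1), x = evtype, fixed = TRUE)
  hits <- names(lookup)[found]
  if (length(hits) == 1) lookup[[hits]] else NA_character_
}

candidates <- c("FREEZING RAIN", "ROUGH SURF", "RIVER FLOOD", "HEAVY RAIN/HIGH SURF")
matched <- vapply(candidates, match_last_word, character(1), USE.NAMES = FALSE)
print(data.frame(candidates, matched))
```

"RIVER FLOOD" stays unmatched because FLOOD is not a unique last word in this subset, and the last label stays unmatched because it contains two unique last words. The first row also shows the limits of the heuristic: "FREEZING RAIN" is mapped to "HEAVY RAIN" purely because of the shared last word.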

E. METHOD 3: Using the amatch() function in the “stringdist” package:

The stringdist package features a range of approximate string matching methods to match character strings to each other. The amatch() function takes a string and a lookup table (of strings), and indicates the position of the lookup table with the closest string match⁵.

We used the amatch() function with the “jaccard” method and q-grams of size 4 (q = 4). This picks, as the closest match, the string in the lookup table that shares the highest proportion of 4-character substrings with the input string. Candidates whose distance exceeds maxDist (0.7 here) are not matched, and amatch() returns NA for them.

We mapped the “name” column from the events table to the new EVNEW3 variable in the stormWIP table using the amatch() function.

stormWIP$EVNEW3 <- NA
stormWIP <- stormWIP %>%
        mutate(EVNEW3 = events$name[amatch(EVTYPE, 
                                                 events$name,
                                                 method = "jaccard", 
                                                 maxDist = 0.7,
                                                 q = 4)])
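To make the distance concrete, the sketch below computes a Jaccard distance on sets of 4-character substrings in base R. It follows our understanding of the measure (one minus the share of distinct q-grams the two strings have in common) and is not the stringdist implementation; the helper names qgrams and jaccard_dist are hypothetical.

```r
# Hypothetical helper: the set of distinct q-character substrings of a string
qgrams <- function(s, q = 4) {
  n <- nchar(s)
  if (n < q) return(character(0))
  unique(substring(s, 1:(n - q + 1), q:n))
}

# Hypothetical helper: Jaccard distance between two q-gram sets
# (assumes at least one of the strings has q or more characters)
jaccard_dist <- function(a, b, q = 4) {
  ga <- qgrams(a, q)
  gb <- qgrams(b, q)
  1 - length(intersect(ga, gb)) / length(union(ga, gb))
}

# Small illustrative lookup table
table_toy <- c("COLD/WIND CHILL", "EXTREME COLD/WIND CHILL", "FLASH FLOOD")

dists <- vapply(table_toy, jaccard_dist, numeric(1),
                a = "EXTREME WINDCHILL", USE.NAMES = FALSE)
best <- table_toy[which.min(dists)]
print(round(dists, 3))
print(best)
```

Here the input shares 8 of 26 distinct 4-grams with "EXTREME COLD/WIND CHILL", giving a distance of about 0.69, just inside the 0.7 cutoff, while the other two candidates are farther away.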

F. Collate the final event names in a single column:

The first 10 rows of the stormWIP dataset are as follows.

knitr::kable(
        head(stormWIP, 10),
        align = "cc"
) %>%
        kable_styling(full_width = TRUE)

EVTYPE	EVlogical	REFNUM	EVNEW1	EVNEW2	EVNEW3
FREEZING RAIN	FALSE	248800	NA	HEAVY RAIN	FREEZING FOG
EXTREME COLD	FALSE	248801	EXTREME COLD/WIND CHILL	NA	EXTREME COLD/WIND CHILL
EXTREME COLD	FALSE	248802	EXTREME COLD/WIND CHILL	NA	EXTREME COLD/WIND CHILL
EXTREME COLD	FALSE	248803	EXTREME COLD/WIND CHILL	NA	EXTREME COLD/WIND CHILL
EXTREME WINDCHILL	FALSE	249414	NA	NA	EXTREME COLD/WIND CHILL
EXTREME COLD	FALSE	249416	EXTREME COLD/WIND CHILL	NA	EXTREME COLD/WIND CHILL
URBAN/SML STREAM FLD	FALSE	249831	FLASH FLOOD	NA	NA
URBAN/SML STREAM FLD	FALSE	249832	FLASH FLOOD	NA	NA
EXTREME COLD	FALSE	248889	EXTREME COLD/WIND CHILL	NA	EXTREME COLD/WIND CHILL
EXTREME COLD	FALSE	248892	EXTREME COLD/WIND CHILL	NA	EXTREME COLD/WIND CHILL

All 3 columns - EVNEW1, EVNEW2 and EVNEW3 - contain matches from the events file to the “EVTYPE” column in stormWIP.

We created a new column in stormWIP, EVNEW, where we mapped the final event name. EVNEW1 had the highest priority, since it was the result of manual matching. EVNEW3 was the result of an approximate string match algorithm, and thus had the lowest priority.

After creating the new variable, we removed all variables from stormWIP apart from REFNUM and EVNEW.

stormWIP <- stormWIP %>%
        mutate(EVNEW = ifelse(!is.na(EVNEW1), EVNEW1,
                              ifelse(!is.na(EVNEW2), EVNEW2, EVNEW3))) %>%
        select(REFNUM, EVNEW)
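The priority rule (EVNEW1 first, then EVNEW2, then EVNEW3) amounts to taking the first non-missing value across the three columns. A minimal base R sketch on made-up values follows; the helper first_non_na is hypothetical. In dplyr, coalesce() serves the same purpose.

```r
# Hypothetical helper: element-wise first non-NA value across several vectors
first_non_na <- function(...) {
  Reduce(function(a, b) ifelse(is.na(a), b, a), list(...))
}

# Made-up candidate names from the three methods
EVNEW1 <- c("FLASH FLOOD", NA, NA, NA)
EVNEW2 <- c(NA, "HEAVY RAIN", NA, NA)
EVNEW3 <- c(NA, "FREEZING FOG", "EXTREME COLD/WIND CHILL", NA)

EVNEW <- first_non_na(EVNEW1, EVNEW2, EVNEW3)
print(EVNEW)
```

The second element shows the priority at work: Method 2 and Method 3 disagree, and the Method 2 result is kept. The last element stays NA because no method produced a name.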

We then checked how many of the 13422 “incorrect” event names in stormWIP we managed to change.

sum(is.na(stormWIP$EVNEW))

  [1] 2023

Thus, only 2023 observations remain unresolved. Given that the total observations in our stormData dataset are 653455, we managed to name close to 99.7% of the total events.

G. Add the correct event names to the stormData dataset:

Finally, we took the following 3 steps:

We used the left_join() function and the REFNUM column to merge the stormData and stormWIP datasets.

stormData <- left_join(stormData, stormWIP, by = "REFNUM")

We changed the EVTYPE labels for the incorrect event names with the new names from the EVNEW variable. We then removed the 2023 observations whose correct event names we were not able to determine.

stormData <- stormData %>%
        mutate(EVTYPE = ifelse(EVlogical, EVTYPE, EVNEW)) %>%
        filter(!is.na(EVTYPE))
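The merge-and-replace step can be sketched on toy data in base R, with match() on the REFNUM key standing in for left_join(). The two small data frames below are made up for illustration.

```r
# Toy version of stormData: REFNUM is the unique key
storm_toy <- data.frame(REFNUM = 1:4,
                        EVTYPE = c("TORNADO", "FOG", "XYZ", "HAIL"),
                        stringsAsFactors = FALSE)
storm_toy$EVlogical <- storm_toy$EVTYPE %in% c("TORNADO", "HAIL", "DENSE FOG")

# Toy version of stormWIP: one resolved name and one unresolved (NA)
wip_toy <- data.frame(REFNUM = 2:3,
                      EVNEW = c("DENSE FOG", NA),
                      stringsAsFactors = FALSE)

# Look up each row's replacement by key (NA where there is no entry)
idx <- match(storm_toy$REFNUM, wip_toy$REFNUM)

# Keep correct names, substitute resolved ones, then drop unresolved rows
storm_toy$EVTYPE <- ifelse(storm_toy$EVlogical, storm_toy$EVTYPE, wip_toy$EVNEW[idx])
storm_toy <- storm_toy[!is.na(storm_toy$EVTYPE), ]
print(storm_toy)
```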

We confirmed that the total number of labels for the EVTYPE variable was now, finally, 48.

stormData %>%
        count(EVTYPE) %>%
        dim() %>%
        .[1]

  [1] 48

Analysis

1. Which events are most harmful to population health?

We defined harm to population health as the sum of deaths and injuries suffered. We looked at both the total harm that each event caused in the period under study (1996-2011), and also the average harm caused per individual instance of each event.

Total harm to population health:

We looked at the 5 events that caused the most harm to population health.

knitr::kable(
        stormData %>%
        rename(Event_Name = EVTYPE) %>%
        group_by(Event_Name) %>%
        summarize(Total_Events = as.double(n()),
                Harm_To_Population_Health = sum(FATALITIES, INJURIES, na.rm = TRUE)) %>%
        arrange(desc(Harm_To_Population_Health)) %>%
        head(5),
        align = "cc",
        col.names = c("Event Name", "Total Events", "Harm to Population Health")
) %>%
        kable_styling(full_width = TRUE)

Event Name	Total Events	Harm to Population Health
TORNADO	23155	22178
EXCESSIVE HEAT	1684	8190
FLOOD	24248	7172
THUNDERSTORM WIND	211210	5509
LIGHTNING	13205	4792

The top 5 events were tornadoes, excessive heat, floods, thunderstorm winds and lightning. We assessed the population harm caused by these 5 events over 4-year periods from 1996-2011 below.

stormData %>%
        filter(EVTYPE %in% c("TORNADO","EXCESSIVE HEAT","FLOOD",
                             "THUNDERSTORM WIND","LIGHTNING")) %>%
        mutate(YEAR = year(BGN_DATE),
               FOURYEAR = ifelse(YEAR < 2000, "1996-99",
                                 ifelse(YEAR < 2004, "2000-03", 
                                        ifelse(YEAR < 2008, "2004-07",
                                               "2008-11")))) %>%
        group_by(FOURYEAR, EVTYPE) %>%
        summarize(populationDamage = sum(FATALITIES, INJURIES, 
                                         na.rm = TRUE)) %>%
        ggplot(aes(FOURYEAR, populationDamage, fill = EVTYPE)) +
        geom_bar(stat ="identity", position = "dodge") +
        scale_fill_brewer(palette = "Set1",
                            labels = c("Ex. Heat", "Flood", "Lightning",
                                       "T'storm Wind", "Tornado"),
                            name = "EVENT TYPE") +
        theme(legend.text = element_text(size = 10)) +
        labs(title = "Impact of extreme weather events on human lives",
             subtitle = "Deaths + Injuries for top 5 weather events",
             x = "4-Year Periods", 
             y = "Fatalities + Injuries")
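The 4-year binning can also be sketched with base R's cut(), which avoids the nested ifelse() calls. This is an illustrative alternative with labels that match the years each bin covers; it is not the code used to produce the chart.

```r
years <- 1996:2011

# Intervals are right-closed: (1995, 1999], (1999, 2003], (2003, 2007], (2007, 2011]
period <- cut(years,
              breaks = c(1995, 1999, 2003, 2007, 2011),
              labels = c("1996-99", "2000-03", "2004-07", "2008-11"))

period_counts <- table(period)
print(period_counts)
```

Each bin contains exactly four calendar years, so totals are comparable across periods.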

Tornadoes caused harm consistently across all four periods from 1996 to 2011. On the other hand, the impact of floods was highest in the 1996-1999 period.

In the latest period under consideration (2008-2011), tornadoes caused more deaths and injuries than the rest of the top 5 events combined.

Average population harm per event:

We looked at the 5 events that caused the highest average harm per individual instance of each event.

knitr::kable(
        stormData %>%
        rename(Event_Name = EVTYPE) %>%
        group_by(Event_Name) %>%
        summarize(Total_Events = as.double(n()),
                  Harm_To_Population_Health = sum(FATALITIES, INJURIES, na.rm = TRUE),
                  Average_Population_Harm = round(Harm_To_Population_Health/Total_Events, 2)) %>%
        arrange(desc(Average_Population_Harm)) %>%
        head(5),
        align = "cc",
        col.names = c("Event Name", "Total Events", "Harm to Population Health", "Average Population Harm")
) %>%
        kable_styling(full_width = TRUE)

Event Name	Total Events	Harm to Population Health	Average Population Harm
TSUNAMI	20	162	8.10
HURRICANE (TYPHOON)	271	1453	5.36
EXCESSIVE HEAT	1684	8190	4.86
HEAT	716	1459	2.04
RIP CURRENT	734	1045	1.42

Tsunamis caused the greatest average population harm (at 8.1 deaths/injuries per tsunami), followed by hurricanes, excessive heat, heat, and rip currents. It is interesting to note that tornadoes, which caused the highest total deaths/injuries, do not feature in this category.
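The difference between the two rankings can be sketched on made-up numbers in base R: a frequent event can rank below a rare one on average harm even when its individual instances add up to a large total. The toy data below is purely illustrative.

```r
# Made-up records: three tornadoes and one tsunami
toy <- data.frame(EVTYPE = c("TORNADO", "TORNADO", "TORNADO", "TSUNAMI"),
                  FATALITIES = c(1, 0, 2, 3),
                  INJURIES = c(10, 0, 5, 20),
                  stringsAsFactors = FALSE)
toy$HARM <- toy$FATALITIES + toy$INJURIES

# Total harm and number of events per event type (rows sorted by EVTYPE)
totals <- aggregate(HARM ~ EVTYPE, data = toy, FUN = sum)
counts <- aggregate(HARM ~ EVTYPE, data = toy, FUN = length)

summary_tbl <- data.frame(EVTYPE = totals$EVTYPE,
                          Total_Events = counts$HARM,
                          Total_Harm = totals$HARM,
                          Average_Harm = round(totals$HARM / counts$HARM, 2),
                          stringsAsFactors = FALSE)
print(summary_tbl)
```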

2. Which events have the greatest economic consequences?

We defined economic consequences as the total of crop and property damages (in USD million). We looked at both the total economic damage that each event caused in the period under study (1996-2011), and also the average damage caused per individual instance of each event.

Total economic damages:

We looked at the 5 events that caused the most economic damage.

knitr::kable(
        stormData %>%
        mutate(PROPDMG_M = round(PROPDMG/1000000),
               CROPDMG_M = round(CROPDMG/1000000)) %>%
        rename(Event_Name = EVTYPE) %>%
        group_by(Event_Name) %>%
        summarize(Total_Events = as.double(n()),
                  Total_Economic_Damage = sum(PROPDMG_M, CROPDMG_M, na.rm = TRUE)) %>%
        arrange(desc(Total_Economic_Damage)) %>%
        head(5),
        align = "cc",
        col.names = c("Event Name", "Total Events", "Total Economic Damage (USD million)")
) %>%
        kable_styling(full_width = TRUE)

Event Name	Total Events	Total Economic Damage (USD million)
FLOOD	24248	148344
HURRICANE (TYPHOON)	271	87064
STORM SURGE/TIDE	401	47819
TORNADO	23155	24054
HAIL	207716	16209

The top 5 events are floods, hurricanes, storm surges, tornadoes and hail.
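One design choice in the code above is that PROPDMG and CROPDMG are rounded to whole millions for each observation before they are summed. The sketch below, using made-up damage amounts, illustrates how that order of operations can differ from summing first and converting to millions afterwards. It is only an illustration and not a recomputation of the table.

```r
# Made-up damage amounts in USD for three events of the same type
damage_usd <- c(400000, 300000, 2600000)

# Order used above: round each event to whole millions, then sum
rounded_then_summed <- sum(round(damage_usd / 1e6))

# Alternative order: sum in USD, then convert to millions
summed_then_converted <- sum(damage_usd) / 1e6

print(c(rounded_then_summed, summed_then_converted))
```

In this toy case the two smaller events round to zero individually, so the first total is 3 while the second is 3.3.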

We assessed the economic damage caused by these 5 events over 4-year periods from 1996-2011 below.

stormData %>%
        filter(EVTYPE %in% c("FLOOD","HURRICANE (TYPHOON)","STORM SURGE/TIDE",
                             "TORNADO","HAIL")) %>%
        mutate(YEAR = year(BGN_DATE),
               FOURYEAR = ifelse(YEAR < 2000, "1996-99",
                                 ifelse(YEAR < 2004, "2000-03", 
                                        ifelse(YEAR < 2008, "2004-07",
                                               "2008-11"))),
               PROPDMG_M = round(PROPDMG/1000000),
               CROPDMG_M = round(CROPDMG/1000000)) %>%
        group_by(FOURYEAR, EVTYPE) %>%
        summarize(propertyDamage = sum(PROPDMG_M, CROPDMG_M, na.rm = TRUE)) %>%
        ggplot(aes(FOURYEAR, propertyDamage, fill = EVTYPE)) +
        geom_bar(stat ="identity", position = "dodge") +
        scale_fill_brewer(palette = "Dark2",
                          labels = c("Flood", "Hail", "Hurricane",
                                     "Storm Surge", "Tornado"),
                          name = "EVENT TYPE") +
        theme(legend.text = element_text(size = 10)) +
        labs(title = "Economic impact of top 5 extreme weather events",
             subtitle = "Total damages recorded in USD million",
             x = "4-Year Periods", 
             y = "Property & Crop Damage (million USD)")

Most of the economic damage caused by the top 5 events took place in the 2004-2007 period, with floods alone causing damages of close to 125,000 million USD.

Average economic damage per individual event:

knitr::kable(
        stormData %>%
        mutate(PROPDMG_M = round(PROPDMG/1000000),
               CROPDMG_M = round(CROPDMG/1000000)) %>%
        rename(Event_Name = EVTYPE) %>%
        group_by(Event_Name) %>%
        summarize(Total_Events = as.double(n()),
                  Total_Economic_Damage = sum(PROPDMG_M, CROPDMG_M, na.rm = TRUE),
                  Average_Economic_Damage = round(Total_Economic_Damage/Total_Events, 2)) %>%
        arrange(desc(Average_Economic_Damage)) %>%
        head(5),
        align = "cc",
        col.names = c("Event Name", "Total Events", "Total Economic Damage (USD million)", 
                      "Average Economic Damage (USD million)")
) %>%
        kable_styling(full_width = TRUE)

Event Name	Total Events	Total Economic Damage (USD million)	Average Economic Damage (USD million)
HURRICANE (TYPHOON)	271	87064	321.27
STORM SURGE/TIDE	401	47819	119.25
TROPICAL STORM	682	8288	12.15
TSUNAMI	20	144	7.20
FLOOD	24248	148344	6.12

Hurricanes caused by far the greatest average economic damage (321 million USD per hurricane), followed by storm surges at 119 million USD. Tropical storms, tsunamis, and floods also came with considerable economic consequences.

Results

1. Harm to population health

Tornadoes, Excessive Heat, Floods, Thunderstorm Winds and Lightning caused the greatest overall harm to population health.

Tsunamis, Hurricanes, Excessive Heat, Heat, and Rip Currents caused the greatest average harm to population health.

2. Economic Damages

Floods, Hurricanes, Storm Surges, Tornadoes and Hail caused the greatest economic damage.

Hurricanes, Storm Surges, Tropical Storms, Tsunamis, and Floods caused the greatest average economic damage.