SYNOPSIS

DATA PROCESSING

The data were downloaded into the report repository:

fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if(!file.exists("StormData.bz2")) {
    download.file(fileURL, destfile = "StormData.bz2", method = "curl")
}

The data were cleaned in the following steps:

  1. Reading data: the storm data were read into R version 4.3.0 (2023-04-21 ucrt) using the RStudio IDE
dat <- read.csv("StormData.bz2", header = TRUE, sep = ",")

This data set has 902,297 rows and 37 columns and contains the following variables:

str(dat)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
  2. Converting all column names to lowercase for easier typing
names(dat) <- tolower(names(dat))
  3. Subsetting data: the data set was restricted to events from January 1996 onwards, since this is when NOAA started recording all types of events. Only the columns relevant to event type, beginning date, state FIPS code, fatalities, injuries, property damage and crop damage were kept
if(system.file(package = "dplyr") == "") install.packages("dplyr")
library(dplyr, quietly = TRUE, warn.conflicts = FALSE)
stormdat <- dat %>%
    mutate(bgn_date = as.Date(bgn_date, format = "%m/%d/%Y")) %>%
    filter(bgn_date >= as.Date("1996-01-01")) %>%
    # Converting all event types to lower case for easier comparison later
    mutate(evtype = tolower(evtype)) %>%
    select(refnum, evtype, bgn_date, state__,
           fatalities, injuries,
           propdmg, propdmgexp, cropdmg, cropdmgexp)

After this step the data set has 653,530 rows and 10 columns.
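The date filter above leans on two base R behaviours that are easy to miss: as.Date() stops reading once its format is satisfied, so the trailing 0:00:00 time in BGN_DATE is ignored, and a Date compared with a character string converts the string to a Date first. A minimal check on two values in the BGN_DATE format (the second value is hypothetical), using an explicit as.Date() on the cutoff rather than the implicit conversion:

```r
# Two values in the raw BGN_DATE format; the 1996 value is a made-up example
raw <- c("4/18/1950 0:00:00", "1/6/1996 0:00:00")

# The time part is ignored because the format only covers month/day/year
parsed <- as.Date(raw, format = "%m/%d/%Y")
print(parsed)

# Comparing against an explicit Date avoids relying on implicit conversion
keep <- parsed >= as.Date("1996-01-01")
print(keep)
```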

  4. Converting both damage types to US dollars: property and crop damage are each recorded as a number plus an exponent code (for example “K” for thousands), and the two were combined into the actual damage in US dollars
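# Creating the exponential to number conversion code
# (the keys are paired with the codes by position, so they depend on the order
# in which unique() returns the codes; check unique() before reusing this)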
propdamagecode <- data.frame(code = unique(stormdat$propdmgexp),
                             key = c(10^3, 0, 10^6, 10^9, 0))
cropdamagecode <- data.frame(code = unique(stormdat$cropdmgexp),
                             key = c(10^3, 0, 10^6, 10^9))

# Changing exponential columns to numbers
stormdat$propdmgexp <- sapply(stormdat$propdmgexp, function(exp, code) {
    code$key[which(code$code == exp)]
}, propdamagecode)

stormdat$cropdmgexp <- sapply(stormdat$cropdmgexp, function(exp, code) {
    code$key[which(code$code == exp)]
}, cropdamagecode)

# Creating new columns for actual damage burden in dollars and removing the old ones
stormdat <- stormdat %>%
    mutate(propertyDamageBurden = propdmg*propdmgexp) %>%
    mutate(cropDamageBurden = cropdmg*cropdmgexp) %>%
    select(refnum, evtype, bgn_date, state__,
           fatalities, injuries,
           propertyDamageBurden, cropDamageBurden)
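The conversion tables above pair codes and multipliers by position, so they depend on the order returned by unique(). A sketch of an order-independent alternative, assuming the only meaningful codes are “K”, “M” and “B” and that every other code (including "" and "0") maps to 0, as in the original tables. The helper name expToMultiplier is hypothetical:

```r
# Hypothetical helper: map exponent codes to multipliers by name, not position.
# Assumption: only "K", "M" and "B" carry a multiplier; any other code
# (e.g. "" or "0") is treated as 0, matching the original conversion tables.
expToMultiplier <- function(exp) {
    multipliers <- c(K = 10^3, M = 10^6, B = 10^9)
    key <- unname(multipliers[exp])  # unknown codes give NA here
    key[is.na(key)] <- 0
    key
}

# Toy rows in the same shape as the storm data
toy <- data.frame(propdmg = c(25, 2.5, 0, 1.2),
                  propdmgexp = c("K", "M", "", "B"))
toy$propertyDamageBurden <- toy$propdmg * expToMultiplier(toy$propdmgexp)
print(toy)
```

Because the lookup is by name, the result no longer depends on the order in which the codes first appear in the data.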
  5. Subsetting again: the data were further restricted to events with at least one fatality or injury, or with non-zero property or crop damage
stormdat <- stormdat %>% filter(fatalities != 0 |
                                    injuries != 0 |
                                    propertyDamageBurden != 0 |
                                    cropDamageBurden != 0)

The resulting data set has 201,318 rows and 8 columns.

  6. Correcting typos and inconsistencies in the event type column: this was done in the following steps:

    if(system.file(package = "rJava") == "") {install.packages("rJava")}
    library(rJava)
    
    # Downloading and loading the tabulizer package, which extracts
    # tables from PDF files (it is installed from GitHub with remotes)
    if(system.file(package = "remotes") == "") {install.packages("remotes")}
    if(system.file(package = "tabulizer") == "") {
        remotes::install_github(c("ropensci/tabulizerjars", "ropensci/tabulizer"),
                                INSTALL_opts = "--no-multiarch")
    }
    library(tabulizer)
    
    # Extracting the table containing the official event list
    PDFfile <- "https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf"
    tabs <- extract_tables(PDFfile, pages = 6)
    eventlist <- as.vector(tabs[[1]][-(1:3),c(1,3)])
    • Getting the unique values from the event type column
    uniqueEvents <- unique(stormdat$evtype)
    • Replacing the term “tstm” with “thunderstorm” and removing the wind strength that was typed into some thunderstorm wind event names
    correctedUniqueEvents <- gsub("tstm", "thunderstorm", uniqueEvents)
    
    # Getting a vector of corrected event names with only "thunderstorm wind"
    thunderstormWindList <- strsplit(correctedUniqueEvents[grep("thunderstorm wind",
                                                            correctedUniqueEvents)], " ")
    
    # Removing the wind strength included with the event name
    # (lapply always returns a list; sapply could simplify to a matrix
    # if every name had the same length, which would break the next step)
    thunderstormWindCorrected <- lapply(thunderstormWindList, function(name) {
        if(name[1] == "thunderstorm") {
            name[1:2]
        } else if(name[1] == "") {
            name[2:3]
        } else {
            name
        }
    })
    if(system.file(package = "stringr") == "") {install.packages("stringr")}
    library(stringr)
    thunderstormWindCorrected <- sapply(thunderstormWindCorrected, function(name) {
        paste(name[1], name[2], name[3], sep = " ")
    })
    thunderstormWindCorrected <- sub(" NA", "", thunderstormWindCorrected)
    
    # Replacing with the corrected names
    correctedUniqueEvents[grep("thunderstorm wind",
                           correctedUniqueEvents)] <- thunderstormWindCorrected
    • Replacing the term “heavy surf” with “high surf”
    correctedUniqueEvents <- gsub("heavy surf", "high surf", correctedUniqueEvents)
    • Replacing the term “hurricane edouard” with “hurricane” so that this data is not missed
    correctedUniqueEvents <- gsub("hurricane edouard", "hurricane", correctedUniqueEvents)
    • Using the amatch() function from the stringdist package to correct remaining typos and mismatches; amatch() returns NA when no match is found within maxDist
    if(system.file(package = "stringdist") == "") {install.packages("stringdist")}
    library(stringdist)
    
    # Creating an index which will be used to replace values of corrected event
    # names based on the official event type list
    matchingIndex <- amatch(correctedUniqueEvents, eventlist, method = "osa", maxDist = 8)
    
    # Replacing the event names with the approximately matched names from
    # the official event type list
    # (an NA index gives NA, so unmatched names stay NA here)
    correctedUniqueEvents <- eventlist[matchingIndex]
    
    # Replacing the NA values with the uncorrected raw values to not lose any data
    indNA <- which(is.na(correctedUniqueEvents))
    correctedUniqueEvents[indNA] <- uniqueEvents[indNA]

    Several values of the maxDist argument were tried; a value of 8 gave a good balance between the share of unmatched names (22.58% NA) and the number of corrections.

    • Replacing the event type values with the corrected event type matching the official list
    # Creating a replacement code
    eventsCode <- data.frame(code = uniqueEvents, key = correctedUniqueEvents)
    
    # Replacing the values by looking up each raw event type in the code table
    stormdat$correctedEventType <- eventsCode$key[match(stormdat$evtype, eventsCode$code)]

    As a result, the number of unique event types fell from 186 to 90, which is much closer to, but still well above, the 48 event types in the official list.
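amatch(method = "osa") scores candidates with the optimal string alignment distance: the number of insertions, deletions, substitutions and adjacent transpositions needed to turn one string into the other, with no substring edited more than once. The base R sketch below is an illustration, not the stringdist implementation; the function names osaDist and closestMatch are hypothetical, and closestMatch mimics amatch() returning the first closest candidate within maxDist, or NA. It shows why a maxDist of 8 is permissive. The comparison is also case-sensitive, so lowercase names matched against the title-case official list already pay one substitution per capital letter:

```r
# Optimal string alignment (restricted Damerau-Levenshtein) distance
# between two strings, with unit cost for every edit operation
osaDist <- function(a, b) {
    x <- substring(a, seq_len(nchar(a)), seq_len(nchar(a)))
    y <- substring(b, seq_len(nchar(b)), seq_len(nchar(b)))
    n <- length(x)
    m <- length(y)
    # d[i + 1, j + 1] is the distance between the first i characters of a
    # and the first j characters of b
    d <- matrix(0, n + 1, m + 1)
    d[, 1] <- 0:n
    d[1, ] <- 0:m
    for (i in seq_len(n)) {
        for (j in seq_len(m)) {
            cost <- if (x[i] == y[j]) 0 else 1
            d[i + 1, j + 1] <- min(d[i, j + 1] + 1,  # deletion
                                   d[i + 1, j] + 1,  # insertion
                                   d[i, j] + cost)   # substitution
            # Transposition of two adjacent characters
            if (i > 1 && j > 1 && x[i] == y[j - 1] && x[i - 1] == y[j]) {
                d[i + 1, j + 1] <- min(d[i + 1, j + 1], d[i - 1, j - 1] + 1)
            }
        }
    }
    d[n + 1, m + 1]
}

# Index of the closest candidate within maxDist (first one on ties), else NA
closestMatch <- function(x, table, maxDist) {
    dists <- vapply(table, function(candidate) osaDist(x, candidate), numeric(1))
    best <- which.min(dists)
    if (length(best) == 0 || dists[best] > maxDist) NA_integer_ else unname(best)
}

osaDist("thunderstorm wind", "Thunderstorm Wind")  # 2: only the capitals differ
closestMatch("thunderstorm wind",
             c("Tornado", "Thunderstorm Wind", "Marine Thunderstorm Wind"),
             maxDist = 8)
```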

  7. Converting the state FIPS column into regions of the country: the state FIPS codes were used to generate a column giving the region where each event took place. The regions used are South, West, Northeast, Midwest, Territories and Maritime Areas. The state FIPS to state name conversion was obtained from the Census website, and the state name to region conversion from a GitHub repository (cphalpert/census-regions) that is based on the Census Region and Division map

# Downloading and reading in the state fips to state name conversion file
url <- "https://www2.census.gov/geo/docs/reference/state.txt"
if(!file.exists("state_fips.txt")) {
    download.file(url, "state_fips.txt", quiet = TRUE)
}
stateFips <- read.table("state_fips.txt", header = TRUE, sep = "|")
stateFips <- stateFips %>%
    select(STATE, STATE_NAME)
names(stateFips) <- tolower(names(stateFips))

# Reading in the state name to region conversion file
url <- "https://raw.githubusercontent.com/cphalpert/census-regions/master/us%20census%20bureau%20regions%20and%20divisions.csv"
if(!file.exists("regions.csv")) {
    download.file(url, "regions.csv", quiet = TRUE)
}
regionsbyState <- read.csv("regions.csv", header = TRUE)
regionsbyState <- regionsbyState %>%
    select(State, Region)

# Extracting the unique state fips in the data set
uniqueStateFips <- unique(stormdat$state__)

# Matching the state fips in our data set with the official list
states <- match(uniqueStateFips, stateFips$state)

# The NAs in this represent the maritime areas
isMaritime <- is.na(states)

# Replacing the non NA values with the state names
# (a logical index is used because x[-which(...)] selects nothing when
# there are no NAs)
states[!isMaritime] <- stateFips$state_name[states[!isMaritime]]

# Convert all the values into regions
regions <- sapply(states, function(state) {
    ifelse(is.na(state), "Maritime Areas",
           regionsbyState$Region[match(state, regionsbyState$State)])
})

# Now the NA values in this belong to the territories
regions[which(is.na(regions))] <- "Territories"

# Adding a region column to each event
regionCode <- data.frame(code = uniqueStateFips, key = as.vector(regions))
index <- match(stormdat$state__, regionCode$code)
stormdat$region <- regionCode$key[index]
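The FIPS-to-region logic above is a two-stage lookup: FIPS code to state name, then state name to Census region. Codes missing from the FIPS file are labelled as maritime areas, and states missing from the region file as territories. A compact, vectorised sketch of the same rules, where the helper fipsToRegion and the small toy tables are hypothetical (they are not the downloaded files):

```r
# Hypothetical helper mirroring the two-stage lookup above:
# FIPS code -> state name -> Census region
fipsToRegion <- function(fips, stateFips, regionsByState) {
    # Stage 1: NA when the code is not in the FIPS file (maritime areas)
    stateName <- stateFips$state_name[match(fips, stateFips$state)]
    # Stage 2: NA when the state has no Census region (territories)
    region <- regionsByState$Region[match(stateName, regionsByState$State)]
    ifelse(is.na(stateName), "Maritime Areas",
           ifelse(is.na(region), "Territories", region))
}

# Small toy lookup tables with the same column names as the real ones
toyFips <- data.frame(state = c(1, 6, 72),
                      state_name = c("Alabama", "California", "Puerto Rico"))
toyRegions <- data.frame(State = c("Alabama", "California"),
                         Region = c("South", "West"))

fipsToRegion(c(1, 6, 72, 99), toyFips, toyRegions)
# expected: "South" "West" "Territories" "Maritime Areas"
```

Because match() works on whole vectors, the same function should apply directly to stormdat$state__ with the real stateFips and regionsbyState tables, without the intermediate regionCode table.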
  8. Final processing step: the old event type and state FIPS columns were removed, keeping only the corrected event types and regions; refnum was kept as a reference back to the original data set, and the beginning date was converted to the year in which the event took place

    stormdat <- stormdat %>%
        mutate(year = format(bgn_date, "%Y")) %>%
        select(correctedEventType, year, region,
               fatalities, injuries,
               propertyDamageBurden, cropDamageBurden, refnum)

The cleaned data set on which the analysis in this report is based:

stormdat

str(stormdat)
## 'data.frame':    201318 obs. of  8 variables:
##  $ correctedEventType  : chr  "Winter Storm" "Tornado" "Thunderstorm Wind" "Thunderstorm Wind" ...
##  $ year                : chr  "1996" "1996" "1996" "1996" ...
##  $ region              : chr  "South" "South" "South" "South" ...
##  $ fatalities          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ injuries            : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ propertyDamageBurden: num  380000 100000 3000 5000 2000 400000 12000 8000 12000 75000 ...
##  $ cropDamageBurden    : num  38000 0 0 0 0 0 0 0 0 0 ...
##  $ refnum              : num  248768 248769 248770 248771 248772 ...

Code Book for this data set

ANALYSIS

The following analyses were carried out to arrive at the results:

Summarizing data by event type

A summary data frame was created giving, for each event type, the total number of fatalities and injuries, the total property and crop damage (in billions of US dollars) and the event frequency. The mean numbers of fatalities and injuries per event were also added, to highlight events that occur rarely but have devastating effects, along with the total burden on population health (fatalities plus injuries) and on the economy (property plus crop damage, in billions of US dollars).

stormDataSummary <- stormdat %>%
    summarise(totalFatalities = sum(fatalities),
              meanFatalities = round(mean(fatalities), 2),
              totalInjuries = sum(injuries),
              meanInjuries = round(mean(injuries), 2),
              totalHealthBurden = totalFatalities + totalInjuries,
              totalPropertyDamageBurden = round(sum(propertyDamageBurden)/10^9, 2),
              totalCropDamageBurden = round(sum(cropDamageBurden)/10^9, 2),
              totalEconomicBurden = totalPropertyDamageBurden + totalCropDamageBurden,
              eventFrequency = length(correctedEventType),
              .by = correctedEventType)
# This will be used to figure out the top events

The event types with the greatest effect on population health and on the economy were selected by first taking the top 5 event types for each sub-category of effect (fatalities, injuries, property damage expenses and crop damage expenses) and then combining the lists for each area, so that the selection is not driven by a single sub-category.
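The selection rule (top 5 for each sub-category, then the union of the lists) can be written as one reusable function. A sketch, where the helper topEventsByMetrics and the toy totals are hypothetical:

```r
# Hypothetical helper: union of the top-n event types for each metric column
topEventsByMetrics <- function(summary, metrics, n = 5) {
    tops <- lapply(metrics, function(metric) {
        # Rank rows by this metric, largest first, and keep the first n names
        idx <- order(summary[[metric]], decreasing = TRUE)
        summary$correctedEventType[head(idx, n)]
    })
    # Combine the per-metric lists, keeping each event type once
    unique(unlist(tops))
}

# Toy summary with made-up totals
toySummary <- data.frame(
    correctedEventType = c("A", "B", "C", "D"),
    totalFatalities    = c(10, 50, 5, 1),
    totalInjuries      = c(200, 20, 300, 2)
)
topEventsByMetrics(toySummary, c("totalFatalities", "totalInjuries"), n = 2)
# expected: "B" "A" "C"
```

With the real summary, topEventsByMetrics(stormDataSummary, c("totalFatalities", "totalInjuries")) should give the same list as topEventsbyHealth before Tsunami is appended. Using head() also avoids the NA entries that [1:5] would produce if fewer than five event types were present.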

Analysis for the effects on population health

  • Top 5 Events with maximum fatalities
fatalityIndex <- order(stormDataSummary$totalFatalities, decreasing = TRUE)[1:5]
top5Fatalities <- stormDataSummary$correctedEventType[fatalityIndex]
  • Top 5 Events with maximum injuries
injuryIndex <- order(stormDataSummary$totalInjuries, decreasing = TRUE)[1:5]
top5Injuries <- stormDataSummary$correctedEventType[injuryIndex]
  • Most harmful events with respect to population health
topEventsbyHealth <- unique(c(top5Fatalities, top5Injuries))
  • Special addition: the Tsunami event type was also added because it has among the highest mean fatalities and injuries per event (2.36 and 9.21, see the table below), but it did not make either top 5 list because it occurs rarely and the totals are strongly influenced by event frequency
topEventsbyHealth <- c(topEventsbyHealth, "Tsunami")
  • Creating the index for these events
healthIndex <- which(stormdat$correctedEventType %in% topEventsbyHealth)
  • Creating a table for these top events
ind1 <- which(stormDataSummary$correctedEventType %in% topEventsbyHealth)
topHealthEventsSummary <- stormDataSummary[ind1,] %>%
    select(correctedEventType, totalHealthBurden,
           totalFatalities, totalInjuries,
           meanFatalities, meanInjuries, eventFrequency) %>%
    arrange(desc(totalHealthBurden))
# This will be used later to create a table
  • Summarizing the data only for these events by event type for each year
# Calculating the total fatalities and injuries and creating a factor column
# to separate the 2 statistics
totalFatalities <- stormdat[healthIndex,] %>%
    summarise(total = sum(fatalities), .by = c("correctedEventType", "year")) %>%
    mutate(healthEffect = "Fatalities")
totalInjuries <- stormdat[healthIndex,] %>%
    summarise(total = sum(injuries), .by = c("correctedEventType", "year")) %>%
    mutate(healthEffect = "Direct Injuries")

# Combine the summaries
totalHealthEffect <- rbind(totalFatalities, totalInjuries)

# Changing the levels of event type to arrange in highest effect to lowest effect
totalHealthEffect$correctedEventType <- factor(
    totalHealthEffect$correctedEventType,
    levels = topHealthEventsSummary$correctedEventType
)
# This will be used later to create a plot

Analysis for the effects on economy

  • Top 5 Events with maximum property damage
propertyIndex <- order(stormDataSummary$totalPropertyDamageBurden, decreasing = TRUE)[1:5]
top5propertyDamage <- stormDataSummary$correctedEventType[propertyIndex]
  • Top 5 Events with maximum crop damage
cropIndex <- order(stormDataSummary$totalCropDamageBurden, decreasing = TRUE)[1:5]
top5cropDamage <- stormDataSummary$correctedEventType[cropIndex]
  • Most harmful events with respect to economy
topEventsbyEconomy <- unique(c(top5propertyDamage, top5cropDamage))
  • Creating the index for these events
economyIndex <- which(stormdat$correctedEventType %in% topEventsbyEconomy)
  • Creating a table for these top events
ind2 <- which(stormDataSummary$correctedEventType %in% topEventsbyEconomy)
topEconomyEventsSummary <- stormDataSummary[ind2,] %>%
    select(correctedEventType, totalPropertyDamageBurden,
           totalCropDamageBurden, totalEconomicBurden,
           eventFrequency) %>%
    arrange(desc(totalEconomicBurden))
# This will be used later to create a table
  • Summarizing the data only for these events by event type for each year
# Calculating the total property and crop damage in millions of dollars and creating a factor column
# to separate the 2 statistics
totalpropertyDamage <- stormdat[economyIndex,] %>%
    summarise(total = round(sum(propertyDamageBurden)/10^6, 2), .by = c("correctedEventType", "year")) %>%
    mutate(economyEffect = "Property Damage")
totalcropDamage <- stormdat[economyIndex,] %>%
    summarise(total = round(sum(cropDamageBurden)/10^6, 2), .by = c("correctedEventType", "year")) %>%
    mutate(economyEffect = "Crop Damage")

# Combine the summaries
totalEconomyEffect <- rbind(totalpropertyDamage, totalcropDamage)

# Changing the levels of event type to arrange in highest effect to lowest effect
totalEconomyEffect$correctedEventType <- factor(
    totalEconomyEffect$correctedEventType,
    levels = topEconomyEventsSummary$correctedEventType
)
# This will be used later to create a plot

Analysis of the top events by frequency and distribution across the United States

  • Most harmful events with respect to both health and economy
topEvents <- unique(c(topEventsbyHealth, topEventsbyEconomy))
  • Frequency of these top events for each year and their distribution across the country
topIndex <- which(stormdat$correctedEventType %in% topEvents)
frequencyByRegionYear <- stormdat[topIndex,] %>%
    summarise(frequency = length(correctedEventType),
              .by = c("correctedEventType", "year", "region"))
# This will be used later to create a plot

frequencyByYear <- summarise(stormdat[topIndex, ],
                             frequency = length(correctedEventType),
                             .by = c("year", "correctedEventType"))
# This will be used to draw some conclusions later

RESULTS

Effects on Population Health

# Installing if necessary and loading required packages
if(system.file(package = "tibble") == "") install.packages("tibble", quiet = TRUE)
if(system.file(package = "gt") == "") install.packages("gt", quiet = TRUE)
suppressPackageStartupMessages(library(gt, warn.conflicts = FALSE, quietly = TRUE))
## Warning: package 'gt' was built under R version 4.3.1
library(tibble)

# Creating a table object for top type of events most harmful for population health
table1 <- gt(as_tibble(topHealthEventsSummary)) %>%
    tab_header(
        title = md("**Top types of Events most harmful with respect to Population Health**"),
        subtitle = md("*January 1996 to 2011*")
    ) %>%
    tab_source_note(
        source_note = md("Reference : Storm Data collected by *U.S. National Oceanic and Atmospheric Administration (NOAA)*")
    ) %>%
    tab_spanner(
        label = md("**Fatalities**"),
        columns = c(totalFatalities, meanFatalities)
    ) %>%
    tab_spanner(
        label = md("**Direct Injuries**"),
        columns = c(totalInjuries, meanInjuries)
    ) %>%
    cols_label(
        correctedEventType = md("**Event Type**"),
        totalHealthBurden = md("**Total Health Burden**"),
        totalFatalities = md("*Total*"),
        totalInjuries = md("*Total*"),
        meanFatalities = md("*Mean*"),
        meanInjuries = md("*Mean*"),
        eventFrequency = md("**Frequency**")
    ) %>%
    tab_style(
        style = cell_borders(sides = "all", style = "solid"),
        locations = list(cells_body(columns = everything(), rows = everything()),
                      cells_column_spanners(spanners = everything()),
                      cells_column_labels(columns = everything()))
    )

# Printing the table
table1
Top types of Events most harmful with respect to Population Health
January 1996 to 2011

                                       Fatalities   Direct Injuries
Event Type         Total Health Burden  Total  Mean     Total  Mean  Frequency
Tornado                          22204   1515  0.12     20689  1.67      12384
Flood                             8309    514  0.05      7795  0.80       9743
Excessive Heat                    8190   1797  2.53      6393  9.00        710
Thunderstorm Wind                 5509    379  0.00      5130  0.05     105374
Lightning                         4796    653  0.06      4143  0.37      11294
Flash Flood                       2562    888  0.05      1674  0.09      19094
Rip Current                       1045    542  0.90       503  0.83        603
Tsunami                            162     33  2.36       129  9.21         14

Reference: Storm Data collected by U.S. National Oceanic and Atmospheric Administration (NOAA)

Conclusions :

  • Tornado events top the list, accounting for 33.29% of the burden on population health from all event types; 93.18% of this burden comes from direct injuries
  • Thunderstorm Wind events, on the other hand, occur about 8.5 times as often as Tornado events but account for only 8.26% of the burden on population health, 93.12% of which comes from direct injuries
  • All the top event types cause more direct injuries than fatalities, and most of them average fewer than 2 fatalities and fewer than 2 injuries per event. The exceptions are Tsunami and Excessive Heat events, possibly because these 2 event types are much less frequent (see the plot of the frequency of top events by year and region below). Nonetheless, they should not be ignored, since both average more than 2 fatalities and about 9 or more injuries per event
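The within-row percentages and the frequency comparison in these conclusions follow directly from the table above. For example, for Tornado and Thunderstorm Wind (values copied from the table; the helper injuryShare is hypothetical):

```r
# Values copied from the population health table above
tornado  <- c(totalHealthBurden = 22204, totalInjuries = 20689, eventFrequency = 12384)
tstmWind <- c(totalHealthBurden = 5509,  totalInjuries = 5130,  eventFrequency = 105374)

# Share of an event type's health burden that comes from direct injuries (%)
injuryShare <- function(x) round(100 * x[["totalInjuries"]] / x[["totalHealthBurden"]], 2)
injuryShare(tornado)   # 93.18
injuryShare(tstmWind)  # 93.12

# How many times as often Thunderstorm Wind events occur as Tornado events
round(tstmWind[["eventFrequency"]] / tornado[["eventFrequency"]], 1)  # 8.5
```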

Summary of these top types of events across the years from January 1996 to 2011

if(system.file(package = "ggplot2") == "") install.packages("ggplot2", quiet = TRUE)
if(system.file(package = "lemon") == "") install.packages("lemon", quiet = TRUE)
library(ggplot2)
suppressPackageStartupMessages(library(lemon))
## Warning: package 'lemon' was built under R version 4.3.1
# Defining the first plot with years on x axis, total effect on health on y axis,
# color fill of the bar plots to fatalities or injuries
plot1 <- ggplot(totalHealthEffect, aes(year, total, fill = healthEffect)) +
    # Plotting the bar plot
    geom_col() +
    # Dividing the plot in panels for each type of event
    facet_wrap("correctedEventType", scales = "free") +
    # Swapping the x and y axes so that the years run down the vertical axis
    coord_flip() +
    # Changing the theme to black and white and increasing the text size
    theme_bw(base_size = 40) +
    # Adds descriptive titles and axes labels
    labs(title = "Fatalities and Injuries for each top types of events for each year",
         x = "Years", y = "Counts", fill = "",
         caption = "Reference : Storm Data collected by U.S. National Oceanic and Atmospheric Administration (NOAA)") +
    # Changing the size of title, axes labels and legend
    theme(title = element_text(size = 60), legend.text = element_text(size = 60),
          plot.caption = element_text(face = "italic"))


# Printing the plot, with the legend moved into one of the empty panels
reposition_legend(plot1, panel = "panel-3-3", position = "center")

Conclusions :

  • Most of the top event types show a general decrease in their effect on population health from 1996 to 2011. The exception is Tornado events, whose effect rose sharply in 2011 because of the 2011 Super Outbreak, when the frequency of Tornado events also increased (see the plot of the frequency of top events by year and region below)
  • For most of the top event types, direct injuries far outnumber fatalities. The exception is Rip Current events, which have fairly equal numbers of fatalities and direct injuries, with fatalities sometimes exceeding direct injuries (see the years 2000, 2005 and 2007 to 2009 in the plot above)
  • As mentioned earlier, the data set contains very little data on Tsunami events, most likely because of their low frequency: the year axis of the Tsunami panel shows only 4 years (2006, 2009, 2010 and 2011). The health burden decreased over these years despite an increase in frequency (see the plot of the frequency of top events by year and region below)

Economic Consequences

# Creating a table object for top type of events most harmful for economy
table2 <- gt(as_tibble(topEconomyEventsSummary)) %>%
    tab_header(
        title = md("**Top types of Events with greatest Economic Consequences**"),
        subtitle = md("*January 1996 to 2011*")
    ) %>%
    tab_source_note(
        source_note = md("Reference : Storm Data collected by *U.S. National Oceanic and Atmospheric Administration (NOAA)*")
    ) %>%
    tab_spanner(
        label = md("*(Expenses in billions of U.S. Dollars)*"),
        columns = c(totalPropertyDamageBurden, totalCropDamageBurden,
                    totalEconomicBurden)
    ) %>%
    cols_label(
        correctedEventType = md("**Event Type**"),
        totalPropertyDamageBurden = md("**Property Damage**"),
        totalCropDamageBurden = md("**Crop Damage**"),
        totalEconomicBurden = md("**Total**"),
        eventFrequency = md("**Frequency**")
    ) %>%
    tab_style(
        style = cell_borders(sides = "all", style = "solid"),
        locations = list(cells_body(columns = everything(), rows = everything()),
                      cells_column_spanners(spanners = everything()),
                      cells_column_labels(columns = everything()))
    )

# Printing the table
table2
Top types of Events with greatest Economic Consequences
January 1996 to 2011

                    (Expenses in billions of U.S. Dollars)
Event Type           Property Damage  Crop Damage   Total  Frequency
Flood                         144.56         4.98  149.54       9743
Hurricane (Typhoon)            69.31         2.61   71.92         72
Storm Surge/Tide               47.83         0.00   47.83        216
Tornado                        24.62         0.28   24.90      12384
Hail                           14.60         2.48   17.08      22683
Flash Flood                    15.24         1.34   16.58      19094
Seiche                         11.81         2.74   14.55        136
Drought                         1.05        13.37   14.42        265

Reference: Storm Data collected by U.S. National Oceanic and Atmospheric Administration (NOAA)

Conclusions :

  • Flood events top the list, accounting for 37.24% of the economic burden from all event types; 96.67% of this burden comes from property damage
  • No other event type comes close to Flood: the next highest, Hurricane (Typhoon), totals about 72 billion US dollars, and Drought, the last in this table, about 14 billion US dollars, compared with almost 150 billion US dollars for Flood
  • Hail events are the most frequent of these event types and contribute about 4.25% of the economic burden from all event types, 85.48% of which comes from property damage

Summary of these top types of events across the years from January 1996 to 2011

# Defining the second plot with years on x axis, total effect on economy on y axis,
# color fill of the bar plots to property or crop damage
plot2 <- ggplot(totalEconomyEffect, aes(year, total, fill = economyEffect)) +
    # Plotting the bar plot
    geom_col() +
    # Dividing the plot in panels for each type of event
    facet_wrap("correctedEventType", scales = "free") +
    # Swapping the x and y axes so that the years run down the vertical axis
    coord_flip() +
    # Changing the theme to black and white and increasing the text size
    theme_bw(base_size = 40) +
    # Adds descriptive titles and axes labels
    labs(title = "Expenses on Property and Crop Damage for each top types of events for each year",
         x = "Years", y = "Expenses (in Millions of U.S. Dollars)", fill = "",
         caption = "Reference : Storm Data collected by U.S. National Oceanic and Atmospheric Administration (NOAA)") +
    # Changing the size of title, axes labels and legend
    theme(title = element_text(size = 60), legend.text = element_text(size = 60),
          plot.caption = element_text(face = "italic"))

# Printing the plot, with the legend moved into one of the empty panels
reposition_legend(plot2, panel = "panel-3-3", position = "center")

Conclusions :

  • Most of the event types show a general increase in economic consequences over the years, except Seiche and Storm Surge/Tide events
  • For most of the top event types, property damage expenses are much larger than crop damage expenses; Drought events are the exception, with crop damage dominating. Drought expenses dropped after 2008 despite an increase in the frequency of such events (see the plot of the frequency of top events by year and region below)
  • The combined, massive increase in expenses from Hurricane (Typhoon) and Storm Surge/Tide events in 2005 reflects Hurricane Katrina. These expenses are among the highest for any event type and year, which likely explains why these event types are among the top events affecting the economy

Frequency and Distribution of the top types of events with the greatest effect on population health and the economy

# Defining the third plot with year on the x axis, frequency of events on the y axis,
# fill color of the bar plots set to region
if(system.file(package = "RColorBrewer") == "") install.packages("RColorBrewer")
library(RColorBrewer)

plot3 <- ggplot(frequencyByRegionYear, aes(year, frequency, fill = region)) +
    # Plotting the bar plot
    geom_col() +
    # Dividing the plot in panels for each type of event
    facet_wrap("correctedEventType", scales = "free") +
    # Adds descriptive titles and axes labels
    labs(title = "Frequency of each top types of events by year and region of the country",
         x = "Years", y = "Counts", fill = "Region",
         caption = "Reference : Storm Data collected by U.S. National Oceanic and Atmospheric Administration (NOAA)") +
    # Changing the palette for the fill color
    scale_fill_brewer(palette = "Dark2") +
    # Changing the theme to black and white and increasing the text size
    theme_bw(base_size = 45) +
    # Swapping the x and y axes so that the years run down the vertical axis
    coord_flip() +
    # Changing the size of title, axes labels and legend
    theme(title = element_text(size = 65), legend.text = element_text(size = 65),
          plot.caption = element_text(face = "italic"))

# Printing the plot, with the legend moved into one of the empty panels
reposition_legend(plot3, panel = "panel-4-4", position = "center")

Conclusions :

The regions used to divide the areas in this plot were taken from the Census Regions and Divisions of the United States

  • All the event types are most frequent in the South, except Tsunami events, which are most frequent in the West, and Hail, which is much more common in the Midwest; this pattern held throughout January 1996 to 2011
  • Hurricane (Typhoon) and Tsunami events are much less frequent than the others but still made it to the list of top events affecting population health and the economy
  • All event types except Lightning show large changes in frequency over the years

LIMITATIONS

The limitations of this report include,

REFERENCES

The U.S. National Oceanic and Atmospheric Administration (NOAA) maintains a storm database that records the location and timing of storm events, their “magnitude” (measured differently for different types of events), estimates of fatalities, injuries, and property and crop damage, and any other remarks unique to the event.

Written as an R Markdown file in R version 4.3.0 (2023-04-21 ucrt) using the RStudio IDE
Packages used for this report,

Creation Date of Rmarkdown file : 2023-06-27 22:42:11.726139
Last Modified Date of Rmarkdown file : 2023-07-01 17:26:01.310686