Synopsis

This document analyzes weather events within the United States in relation to population health and economic impact, with the goal of determining which events pose the most severe risk. The names of weather events were chosen based on a list provided by the National Weather Service (NWS).

The data included weather event names that did not map correctly to the NWS list, as well as damage statistics for property and crops that required cleaning prior to analysis. This document describes the remapping procedure for the event names, and the transformation process for the damage statistics.

Finally, this document analyzes the cleaned data, generates three related plots, and identifies the most severe weather events. The final sections of the document discuss some limitations of the analysis and restate the conclusions.

The analysis determined that Tornadoes were the most harmful weather event in relation to human population health, while Floods were the most harmful weather event in relation to economic impact. Note that Floods and Flash Floods are distinct events.

Thanks in advance for reading.

Package Dependencies

This code chunk installs and loads the necessary packages.

# install packages required for data processing and analysis

install.packages('dplyr', repos="http://cran.us.r-project.org")
library(dplyr)
install.packages('stringdist', repos="http://cran.us.r-project.org")
library(stringdist)
install.packages('ggplot2', repos="http://cran.us.r-project.org")
library(ggplot2)
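
The chunk above reinstalls every package each time the document is run. As a minimal sketch of an alternative, the helper below only reports packages that are not yet installed. The name missing_packages is hypothetical, and the install call is left commented out so that the sketch runs without network access.

```r
# Hypothetical helper: return the requested packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("dplyr", "stringdist", "ggplot2")
to_install <- missing_packages(needed)
to_install # character(0) when everything is already available

# Uncomment to install only what is missing:
# install.packages(to_install, repos = "https://cloud.r-project.org")
```
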

Data Processing

The code chunk below downloads the data into the working directory (if necessary). Additionally, it stores the measured data in the data frame storm_data and extracts the columns needed for processing and analysis. The analysis concerns variables related to event type, population health and economic impact. The columns FATALITIES and INJURIES correspond to population health, while the columns PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP correspond to economic impact. The column EVTYPE corresponds to event type, which is a description of the weather event.

if (!file.exists('./repdata_data_StormData.csv')){ # download file if necessary
  
  url <-'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
  download.file(url, destfile = './repdata_data_StormData.csv')
  
}

storm_data <- read.csv('repdata_data_StormData.csv') # load data
storm_data <- storm_data[,c('EVTYPE','FATALITIES','INJURIES', # get relevant vars
                            'PROPDMG','PROPDMGEXP','CROPDMG','CROPDMGEXP')] 
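
Note that the downloaded file is bzip2-compressed even though it is saved with a .csv name. This works because read.csv() opens the file through a connection that detects the compression from the file contents rather than from the extension. The sketch below illustrates this on a small temporary file; the toy rows are made up for the example and are not taken from the storm data.

```r
# Write a tiny CSV through a bzip2 connection (stand-in for the real download).
tmp <- tempfile(fileext = ".csv") # .csv name, bzip2 contents
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,0", "HAIL,1"), con)
close(con)

# read.csv() reads the compressed file directly.
toy <- read.csv(tmp)
toy
```
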

Population health


The fatality and injury data are numeric and contain no NAs, so they were not transformed. See the code below for confirmation.

fatality_class <- class(storm_data$FATALITIES) # confirm that fatality data is numeric
injury_class <- class(storm_data$INJURIES) # confirm that injury data is numeric
NA_fatality <- sum(is.na(storm_data$FATALITIES)) # check for NAs in fatality column
NA_injury <- sum(is.na(storm_data$INJURIES)) # check for NAs in injury column

The FATALITIES variable is numeric and has 0 NA values. The INJURIES variable is numeric and has 0 NA values.


Economic Impact


The columns relating to economic impact require considerable cleaning and transformation. The property damage data is split into two columns, ‘PROPDMG’ and ‘PROPDMGEXP’. The ‘PROPDMG’ column is a number, while ‘PROPDMGEXP’ holds either a letter multiplier (K = thousands, M = millions, B = billions) or a number indicating the number of significant digits, which the code below treats as a power of ten. This marker is referred to as a label within this document. Crop damage data is stored in the same way, in ‘CROPDMG’ and ‘CROPDMGEXP’.

Based on a great deal of helpful work done by others, the meanings of all damage labels are known. The NWS documentation provides the meanings of the ‘K’,‘M’, and ‘B’ labels (thousands, millions, and billions respectively). See the Storm Data Event Table in section 2.1.1.
The values of other labels are explained here.
See the code chunk below for a quick look at these columns. Note that displayed crop damage labels are blank, but the labels do exist.

head(storm_data,5)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0           
## 2 TORNADO          0        0     2.5          K       0           
## 3 TORNADO          0        2    25.0          K       0           
## 4 TORNADO          0        2     2.5          K       0           
## 5 TORNADO          0        2     2.5          K       0

The code below creates two new columns containing multipliers for the property damage and crop damage labels. Additionally, it computes the “true” damage amounts for property damage, crop damage, and combined property and crop damage. The code also displays a few rows of the new data frame with the computed damage values visible.

labeled_storm_data <- storm_data[,c('EVTYPE','PROPDMG','PROPDMGEXP','CROPDMG',
                                   'CROPDMGEXP')] #subset relevant variables

labeled_storm_data$prop_dmg_mult <- # create property damage multipliers
                      case_when(labeled_storm_data$PROPDMGEXP == 'K' ~ 10^3, 
                            labeled_storm_data$PROPDMGEXP == 'M' ~ 10^6, 
                            labeled_storm_data$PROPDMGEXP == 'B' ~ 10^9,
                            labeled_storm_data$PROPDMGEXP == 'm' ~ 10^6,
                            labeled_storm_data$PROPDMGEXP == '0' ~ 10^0,
                            labeled_storm_data$PROPDMGEXP == '5' ~ 10^5,
                            labeled_storm_data$PROPDMGEXP == '6' ~ 10^6,
                            labeled_storm_data$PROPDMGEXP == '4' ~ 10^4,
                            labeled_storm_data$PROPDMGEXP == '2' ~ 10^2,
                            labeled_storm_data$PROPDMGEXP == '3' ~ 10^3,
                            labeled_storm_data$PROPDMGEXP == 'h' ~ 10^2,
                            labeled_storm_data$PROPDMGEXP == '7' ~ 10^7,
                            labeled_storm_data$PROPDMGEXP == 'H' ~ 10^2,
                            labeled_storm_data$PROPDMGEXP == '1' ~ 10^1,
                            labeled_storm_data$PROPDMGEXP == '8' ~ 10^8,
                            labeled_storm_data$PROPDMGEXP == '' ~ 10^0,
                            labeled_storm_data$PROPDMGEXP == '+' ~ 10^1,
                            labeled_storm_data$PROPDMGEXP == '-' ~ 10^0,
                            labeled_storm_data$PROPDMGEXP == '?' ~ 10^0)

labeled_storm_data$crop_dmg_mult <- # create crop damage multipliers
                      case_when(labeled_storm_data$CROPDMGEXP == 'K' ~ 10^3, 
                            labeled_storm_data$CROPDMGEXP == 'M' ~ 10^6, 
                            labeled_storm_data$CROPDMGEXP == 'B' ~ 10^9,
                            labeled_storm_data$CROPDMGEXP == 'm' ~ 10^6,
                            labeled_storm_data$CROPDMGEXP == '0' ~ 10^0,
                            labeled_storm_data$CROPDMGEXP == 'k' ~ 10^3,
                            labeled_storm_data$CROPDMGEXP == '2' ~ 10^2,
                            labeled_storm_data$CROPDMGEXP == '' ~ 10^0,
                            labeled_storm_data$CROPDMGEXP == '?' ~ 10^0)




labeled_storm_data$true_prop_dmg <- with(labeled_storm_data, # compute property damage
                                         PROPDMG*prop_dmg_mult)

labeled_storm_data$true_crop_dmg <- with(labeled_storm_data, # compute crop damage
                                         CROPDMG*crop_dmg_mult)

labeled_storm_data$true_total_dmg <-with(labeled_storm_data, # compute combined damage
                                         true_prop_dmg + true_crop_dmg)

head(labeled_storm_data, 5)
##    EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP prop_dmg_mult crop_dmg_mult
## 1 TORNADO    25.0          K       0                     1000             1
## 2 TORNADO     2.5          K       0                     1000             1
## 3 TORNADO    25.0          K       0                     1000             1
## 4 TORNADO     2.5          K       0                     1000             1
## 5 TORNADO     2.5          K       0                     1000             1
##   true_prop_dmg true_crop_dmg true_total_dmg
## 1         25000             0          25000
## 2          2500             0           2500
## 3         25000             0          25000
## 4          2500             0           2500
## 5          2500             0           2500
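
The two case_when() calls above encode the same label rules twice. As a rough sketch of a more compact alternative, the helper below applies one set of rules to either column. The name label_to_multiplier is hypothetical, and the sketch assumes both columns follow the same rules (for example, '+' maps to 10, which the code above applies only to property damage).

```r
# Hypothetical helper: convert damage labels to numeric multipliers.
label_to_multiplier <- function(labels) {
  letters_lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  labels <- toupper(as.character(labels))
  mult <- unname(letters_lookup[labels]) # NA for anything that is not H/K/M/B
  is_digit <- grepl("^[0-9]$", labels)
  mult[is_digit] <- 10^as.numeric(labels[is_digit]) # digit d means 10^d
  mult[labels %in% c("", "-", "?")] <- 1 # blank and unknown markers: no scaling
  mult[labels == "+"] <- 10
  mult
}

label_to_multiplier(c("K", "m", "5", "", "+", "?", "h"))
```

Labels outside these rules come back as NA, which makes unexpected values easy to spot with is.na().
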


Event Type


The variable EVTYPE includes a much larger number of event types (985) than those listed by the National Weather Service (48). The additional types are a result of typos as well as a lack of systematization in recording event types. It was not realistic to manually convert each EVTYPE into the appropriate NWS event. Instead, the names were cleaned using the stringdist package.

In particular, the function amatch() was used to map strings in EVTYPE to the 48 weather events listed by the NWS. The ‘lcs’ distance metric was used because it compares strings based on their longest common substring. Many EVTYPE events contain keywords that correspond to words in the official list of NWS events. The hope is that these keywords produced correct, or at least reasonable, mappings, but there were some instances where the method failed (e.g. it mapped AVALANCE to HAIL despite the NWS event list containing AVALANCHE).
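
To make the metric concrete, here is a base-R sketch of the lcs distance as I understand stringdist to define it: the number of characters left unpaired after matching the longest in-order run of shared characters. The function lcs_distance is a hypothetical stand-in written for illustration, not the package's implementation. The third comparison uses the merged 'AVALANCHE, BLIZZARD' entry that appears in the lookup table later in this document.

```r
# Hypothetical stand-in for the 'lcs' distance (not stringdist's own code).
lcs_distance <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  # len[i + 1, j + 1] = length of the longest in-order match of x[1:i] and y[1:j]
  len <- matrix(0, nrow = length(x) + 1, ncol = length(y) + 1)
  for (i in seq_along(x)) {
    for (j in seq_along(y)) {
      len[i + 1, j + 1] <- if (x[i] == y[j]) {
        len[i, j] + 1
      } else {
        max(len[i, j + 1], len[i + 1, j])
      }
    }
  }
  # every character not in the shared match counts as one unit of distance
  length(x) + length(y) - 2 * len[length(x) + 1, length(y) + 1]
}

lcs_distance("AVALANCE", "AVALANCHE")           # 1
lcs_distance("AVALANCE", "HAIL")                # 8
lcs_distance("AVALANCE", "AVALANCHE, BLIZZARD") # 11
```

Under this sketch, a short unrelated name such as HAIL can end up closer than a long entry that contains the intended word, because every extra character in the longer string adds to the distance.
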

Furthermore, because the original data set is large, I filtered both the population health data frame and the economic impact data frame before remapping the events in EVTYPE. This was necessary because the remapping involved a for loop that my computer could not run on the full data in a reasonable amount of time.


For the population health data, the additional processing consisted of keeping observations with at least 1 casualty. For the economic data, observations with 0 total damage were excluded. These filters removed low impact observations that do not contribute to the highest risk, highest damage weather events. They also greatly reduced the size of both data sets, which permitted the remapping of the EVTYPE events in a reasonable time frame. The additional processing of the data frames can be found later in the document.

The code below creates a data frame, casualty_data, for the casualty variables (fatalities and injuries). The fatality and injury data is numeric, so it is sufficient to check for NAs. The code again verifies that the data has no NA values. A few rows of the new data frame are displayed.

casualty_data <- storm_data[,c('EVTYPE','FATALITIES','INJURIES')] #subset casualty data
sum(!complete.cases(casualty_data)) # check for NAs.
## [1] 0
head(casualty_data,5) # display a few rows
##    EVTYPE FATALITIES INJURIES
## 1 TORNADO          0       15
## 2 TORNADO          0        0
## 3 TORNADO          0        2
## 4 TORNADO          0        2
## 5 TORNADO          0        2

The code below runs various summary statistics on casualty_data (population health) and the labeled_storm_data (economic impact).

casualty_summary <- sapply(casualty_data[,c('FATALITIES',
                                                 'INJURIES')], summary)

damage_summary <- sapply(labeled_storm_data[,c('true_prop_dmg',
                                                'true_crop_dmg',
                                               'true_total_dmg')], summary)
casualty_summary
##           FATALITIES     INJURIES
## Min.      0.00000000    0.0000000
## 1st Qu.   0.00000000    0.0000000
## Median    0.00000000    0.0000000
## Mean      0.01678494    0.1557447
## 3rd Qu.   0.00000000    0.0000000
## Max.    583.00000000 1700.0000000
damage_summary
##         true_prop_dmg true_crop_dmg true_total_dmg
## Min.     0.000000e+00  0.000000e+00   0.000000e+00
## 1st Qu.  0.000000e+00  0.000000e+00   0.000000e+00
## Median   0.000000e+00  0.000000e+00   0.000000e+00
## Mean     4.745941e+05  5.442132e+04   5.290155e+05
## 3rd Qu.  5.000000e+02  0.000000e+00   1.000000e+03
## Max.     1.150000e+11  5.000000e+09   1.150325e+11

The casualty and damage summaries indicate the following:

  • Casualties were uncommon in this data
  • Crop damage was uncommon in this data
  • Property damage was uncommon but not rare: the median is $0, while the third quartile is $500.
  • The data is heavily right skewed in each case. This is reasonable as the effects generally result from infrequent weather phenomena.

 

The code below extracts data with at least one casualty and data involving some form of damage.


casualty_data_1 <- with(casualty_data,casualty_data[FATALITIES > 0 |
                                                      INJURIES > 0,])
damage_data <- labeled_storm_data[labeled_storm_data$true_total_dmg > 0,]

dim(casualty_data_1)
## [1] 21929     3
dim(damage_data)
## [1] 245031     10

At this point both data frames are small enough to remap the EVTYPE events to their “closest” NWS event. The code below performs the remapping using the stringdist package. I’ll note that the damage data was ~10x larger than the casualty data and the remapping still took a few minutes.


The code below remaps the events by

  1. Setting up a lookup table of NWS events
  2. Using amatch() from stringdist to find the “closest” NWS event for each EVTYPE event

Note that maxDist = 30 in amatch(). The max distance parameter essentially corresponds to the maximum number of permitted character mismatches when comparing two strings. The function amatch() returns NA if the distance between two strings exceeds the max distance. The longest string in EVTYPE contains 30 characters. Setting maxDist = 30 was intended to allow every EVTYPE value to be mapped to some event in the official NWS list, though it is possible a smaller value could have achieved the same mappings.

max(nchar(storm_data$EVTYPE)) # identify longest string in original data
## [1] 30
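# Character vector of NWS events used as the lookup table.
# NOTE: as written, 'Avalanche, Blizzard' is a single entry and 'Marine Thunderstorm Wind'
# is split into 'Marine Thunderstorm' and 'Wind'; the results below reflect this table.
NWS_events <- c('Astronomical Low Tide', 'Avalanche, Blizzard' ,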
                'Coastal Flood', 'Cold/Wind Chill', 'Debris Flow', 'Dense Fog', 
                'Dense Smoke', 'Drought','Dust Devil',' Dust Storm', 
                'Excessive Heat', 'Extreme Cold/Wind Chill', 'Flash Flood', 
                'Flood', 'Frost/Freeze', 'Funnel Cloud', 'Freezing Fog', 'Hail', 
                'Heat', 'Heavy Rain', 'Heavy Snow', 'High Surf', 'High Wind', 
                'Hurricane (Typhoon)' , 'Ice Storm', 'Lake-Effect Snow', 
                'Lakeshore Flood', 'Lightning', 'Marine Hail', 
                'Marine High Wind', 'Marine Strong Wind', 'Marine Thunderstorm', 
                'Wind', 'Rip Current', 'Seiche',' Sleet', 'Storm Surge/Tide', 
                'Strong Wind', 'Thunderstorm Wind', 'Tornado', 
                'Tropical Depression', 'Tropical Storm', 'Tsunami',
                'Volcanic Ash', 'Waterspout', 'Wildfire', 'Winter Storm', 
                'Winter Weather')


upper_nws <- toupper(NWS_events) # make all uppercase to improve matching
NWS_events <- data.frame(upper_nws) # convert to data frame for lookups

casualty_data_1$NWS_index <- NULL # clear any existing index column
casualty_data_1$NWS_index <- amatch(casualty_data_1$EVTYPE, # find index of closest name in NWS events
                                    table = NWS_events$upper_nws, 
                                    maxDist = 30, method = 'lcs')

casualty_data_1$NWS_event <- NULL # clear any existing event name column
for (i in 1:nrow(casualty_data_1)){casualty_data_1$NWS_event[i] <- # map current event to 'closest' name in NWS events
  NWS_events[as.numeric(casualty_data_1[i,"NWS_index"]),"upper_nws"]}



damage_data$NWS_index <- NULL # clear any existing index column
damage_data$NWS_index <- amatch(damage_data$EVTYPE, # find index of closest name in NWS events
                                    table = NWS_events$upper_nws, 
                                    maxDist = 30, method = 'lcs')

damage_data$NWS_event <- NULL # clear any existing event name column
for (i in 1:nrow(damage_data)){damage_data$NWS_event[i] <- # map current name to 'closest' name in NWS events
  NWS_events[as.numeric(damage_data[i,"NWS_index"]),"upper_nws"]} # note this loop still took 4 minutes :(
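
The row-by-row loops above can likely be replaced by a single vectorized lookup, since indexing a character vector with a vector of indices returns all matching names at once. The sketch below shows the idea on toy data; nws_lookup and toy are made-up stand-ins for NWS_events$upper_nws and the filtered data frames, and the NWS_index values are assumed to have come from amatch().

```r
# Toy stand-ins for the lookup table and a data frame with matched indices.
nws_lookup <- c("FLOOD", "HAIL", "TORNADO")
toy <- data.frame(EVTYPE = c("TORNADOES", "HAIL 75", "FLOODING", "???"),
                  NWS_index = c(3L, 2L, 1L, NA),
                  stringsAsFactors = FALSE)

# One vectorized lookup instead of a for loop; NA indices yield NA names.
toy$NWS_event <- nws_lookup[toy$NWS_index]
toy
```
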


At this stage the relevant data has been cleaned and the observations have been extracted. The code below creates new ‘clean’ data frames ready for data analysis. A few rows of each data frame are also displayed.

clean_casualty_data <- with(casualty_data_1, data.frame(NWS_event, FATALITIES,
                                                        INJURIES)) # create clean casualty data set
clean_damage_data <- with(damage_data, data.frame(NWS_event, true_prop_dmg,
                                                  true_crop_dmg, true_total_dmg)) # create clean damage data set

head(clean_casualty_data, 5)
##   NWS_event FATALITIES INJURIES
## 1   TORNADO          0       15
## 2   TORNADO          0        2
## 3   TORNADO          0        2
## 4   TORNADO          0        2
## 5   TORNADO          0        6
head(clean_damage_data, 5)
##   NWS_event true_prop_dmg true_crop_dmg true_total_dmg
## 1   TORNADO         25000             0          25000
## 2   TORNADO          2500             0           2500
## 3   TORNADO         25000             0          25000
## 4   TORNADO          2500             0           2500
## 5   TORNADO          2500             0           2500


Results

The code below analyzes the fatality, injury and total damage data. For fatalities and for injuries, the totals are calculated for each event. The values are converted to proportions for relative comparison. Additionally, the data is sorted in descending order and the cumulative proportions are also provided. The total damage was scaled to billions of dollars and plotted on that scale in order to demonstrate the raw scale of economic damage. Some of these events are very expensive!


The data was also summarized to get a sense of the distributions, after which the events within the highest quartile were subset for plotting. The choice to plot the top quartile is somewhat arbitrary, but shows the most extreme event(s) and their relative size in comparison to other extreme events.
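
The steps described above (total by event, convert to proportions, sort, accumulate, keep the top quartile) can be sketched in base R on made-up numbers. The event names and counts below are illustrative only and are not results from the storm data.

```r
# Made-up observations: one row per event occurrence.
events <- c("TORNADO", "TORNADO", "FLOOD", "HEAT", "HAIL")
deaths <- c(30, 20, 25, 20, 5)

totals <- vapply(split(deaths, events), sum, numeric(1)) # total deaths per event
props <- sort(totals / sum(totals), decreasing = TRUE)   # proportions, largest first
cum_props <- cumsum(props)                               # running share of all deaths

q3 <- quantile(props, 0.75) # third quartile of the proportions
top_quartile <- props[props > q3]
top_quartile
```

With these toy numbers only TORNADO exceeds the third quartile, which mirrors how the real analysis isolates the most extreme events.
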


The code below creates three plots to visualize relationships between weather event type and population health or economic impact:

  1. Fatalities vs Weather Event
  2. Injuries vs Weather Event
  3. Total Property & Crop Damage vs Weather Event

# fatality analysis below

death_events <- clean_casualty_data %>% 
      group_by(NWS_event) %>% #total deaths by event sorted descending
      summarize(Fatalities = sum(FATALITIES)) %>% arrange(desc(Fatalities)) %>%
      mutate(fatality_proportion = Fatalities/sum(Fatalities))# convert to proportion


death_events$fatality_proportion <- round(death_events$fatality_proportion,3) # round to 3 decimal places

death_events$cum_fatality_proportion <- cumsum(death_events$fatality_proportion) # cumulative sum of the rounded proportions

summary(death_events$fatality_proportion) # summary statistics for fatality data
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00000 0.00100 0.00550 0.02490 0.01925 0.37200
Q3_death_event <-summary(death_events$fatality_proportion)['3rd Qu.'] # get 3rd quartile death proportion

top_quart_death <- death_events %>%  # subset top quartile of deaths
    filter(fatality_proportion > as.numeric(Q3_death_event)) 

top_quart_death[,c('NWS_event','fatality_proportion')]
## # A tibble: 10 × 2
##    NWS_event               fatality_proportion
##    <chr>                                 <dbl>
##  1 TORNADO                               0.372
##  2 EXCESSIVE HEAT                        0.133
##  3 HEAT                                  0.075
##  4 FLASH FLOOD                           0.068
##  5 LIGHTNING                             0.054
##  6 WIND                                  0.043
##  7 FLOOD                                 0.039
##  8 RIP CURRENT                           0.038
##  9 HAIL                                  0.025
## 10 EXTREME COLD/WIND CHILL               0.02
# fatality plot

fatality_events_fact <- factor(top_quart_death$NWS_event, # create ordered factor for plotting
                              levels = c('TORNADO','EXCESSIVE HEAT','HEAT',
                                         'FLASH FLOOD', 'LIGHTNING','WIND','FLOOD',
                                         'RIP CURRENT','HAIL',
                                         'EXTREME COLD/WIND CHILL'))

top_quart_death$NWS_event <- fatality_events_fact # convert NWS events to factors

top_quart_death_plot <- ggplot(top_quart_death, # create plot object
                               aes(x = NWS_event, y =  fatality_proportion)) 

death_plot <- top_quart_death_plot + geom_col() + # create and customize plot
  theme(axis.text.x = element_text(angle = 90, 
                                   vjust = 0.5, hjust=1), axis.text =element_text(size=8),
        plot.title = element_text(hjust = 0.5),
        axis.title = element_text(size = 10)) +
  labs(x = 'Weather Event', y = 'Proportion of Weather Related Deaths')+
  ggtitle('Death Statistics for Weather Events') + 
  scale_x_discrete(limits = fatality_events_fact)

death_plot
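
The level order above is typed out by hand, which has to be kept in sync with the table. Because the table is already sorted in descending order, the levels could instead be taken from the rows themselves, as the damage plot code later in this document does. A small sketch on toy rows:

```r
# Toy stand-in for a table that is already sorted in descending order.
toy <- data.frame(NWS_event = c("TORNADO", "EXCESSIVE HEAT", "HEAT"),
                  fatality_proportion = c(0.372, 0.133, 0.075),
                  stringsAsFactors = FALSE)

# Use the row order as the level order so the bars follow the sort.
toy$NWS_event <- factor(toy$NWS_event, levels = toy$NWS_event)
levels(toy$NWS_event)
```
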

#injury analysis below

injury_events <- clean_casualty_data %>% group_by(NWS_event) %>% # injuries by event, sorted descending 
  summarize(Injuries = sum(INJURIES)) %>% arrange(desc(Injuries)) %>%
  mutate(injury_proportion = Injuries/sum(Injuries))# convert to proportion

injury_events$injury_proportion <- round(injury_events$injury_proportion,3) # round to 3 decimal places

injury_events$cum_injury_proportion <- cumsum(injury_events$injury_proportion) # cumulative sum of the rounded proportions

summary(injury_events$injury_proportion) # summary statistics for injury data
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00000 0.00000 0.00200 0.02498 0.01150 0.65000
Q3_injury_event <-summary(injury_events$injury_proportion)['3rd Qu.'] # get 3rd quartile injury proportion

top_quart_injury <- injury_events %>% 
  filter(injury_proportion > as.numeric(Q3_injury_event)) # subset top quartile of injuries

top_quart_injury[,c('NWS_event','injury_proportion')]
## # A tibble: 10 × 2
##    NWS_event         injury_proportion
##    <chr>                         <dbl>
##  1 TORNADO                       0.65 
##  2 WIND                          0.057
##  3 FLOOD                         0.054
##  4 EXCESSIVE HEAT                0.048
##  5 LIGHTNING                     0.037
##  6 HEAT                          0.018
##  7 THUNDERSTORM WIND             0.017
##  8 HAIL                          0.015
##  9 ICE STORM                     0.014
## 10 FLASH FLOOD                   0.013
# injury plot

injury_events_fact <- factor(top_quart_injury$NWS_event, # create ordered factors from the injury table
                               levels = c('TORNADO','WIND','FLOOD','EXCESSIVE HEAT',
                                          'LIGHTNING', 'HEAT', 'THUNDERSTORM WIND',
                                          'HAIL', 'ICE STORM', 'FLASH FLOOD'))

top_quart_injury$NWS_event <- injury_events_fact # convert NWS events to factors





top_quart_injury_plot <- ggplot(top_quart_injury, 
                               aes(x = NWS_event, y = injury_proportion)) 


injury_plot <- top_quart_injury_plot + geom_col() +
  theme(axis.text.x = element_text(angle = 90, 
                                   vjust = 0.5, hjust=1), axis.text =element_text(size=8),
        plot.title = element_text(hjust = 0.5),
        axis.title = element_text(size = 10)) +
  labs(x = 'Weather Event', y = 'Proportion of Weather Related Injuries')+
  ggtitle('Injury Statistics for Weather Events') + 
  scale_x_discrete(limits = injury_events_fact)

injury_plot

# total damage analysis below

damage_events <- clean_damage_data %>% group_by(NWS_event) %>% # total dmg by event, sorted descending 
  summarize(Damage = sum(true_total_dmg)) %>% arrange(desc(Damage)) %>%
  mutate(Damage_billions = Damage/10^9) %>% # scale damage to billions of dollars
  mutate(Damage_proportion = Damage/sum(Damage))# convert to proportion

damage_events$Damage_proportion <- round(damage_events$Damage_proportion,3) # round to 3 decimal places

damage_events$cum_damage_proportion <- cumsum(damage_events$Damage_proportion) # cumulative sum of the rounded proportions

summary(damage_events$Damage_billions) # summary statistics for damage data
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##   0.00000   0.00971   0.25291  10.15594   6.72997 161.71251
Q3_damage_event <-summary(damage_events$Damage_billions)['3rd Qu.'] # get 3rd quartile of damage

top_quart_damage <- damage_events %>% 
  filter(Damage_billions > as.numeric(Q3_damage_event)) # subset top quartile of damage

top_quart_damage[,c('NWS_event','Damage_billions')]
## # A tibble: 12 × 2
##    NWS_event           Damage_billions
##    <chr>                         <dbl>
##  1 FLOOD                        162.  
##  2 HURRICANE (TYPHOON)           75.5 
##  3 TORNADO                       57.4 
##  4 STORM SURGE/TIDE              48.0 
##  5 HAIL                          34.2 
##  6 FLASH FLOOD                   19.1 
##  7 DROUGHT                       15.1 
##  8 ICE STORM                      8.97
##  9 WILDFIRE                       8.89
## 10 TROPICAL STORM                 8.41
## 11 THUNDERSTORM WIND              7.67
## 12 WINTER STORM                   6.78
# damage plot

damage_events_fact <- factor(top_quart_damage$NWS_event, # turn top NWS events into ordered factors
                             levels = as.character(top_quart_damage$NWS_event))


top_quart_damage$NWS_event <- damage_events_fact # convert NWS events to factors




top_quart_damage_plot <- ggplot(top_quart_damage, 
                                aes(x = NWS_event, y = Damage_billions)) 


damage_plot <- top_quart_damage_plot + geom_col() +
  theme(axis.text.x = element_text(angle = 90, 
                                   vjust = 0.5, hjust=1), axis.text =element_text(size=8),
        plot.title = element_text(hjust = 0.5),
        axis.title = element_text(size = 10)) +
  labs(x = 'Weather Event', y = 'Total Damage ($ Billions)')+
  ggtitle('Economic Impact of Weather Events') + 
  scale_x_discrete(limits = damage_events_fact)

damage_plot

Population Health

The analysis and plots indicate that Tornadoes pose the greatest risk to population health with respect to both weather related fatalities and injuries. Tornadoes account for ~37% of weather related deaths and a whopping 65% of weather related injuries (per this method of analysis). Excessive Heat and Heat were the next two largest contributors to fatalities (~13% and ~7.5%), while Wind and Flood ranked second and third for injuries (~5.7% and ~5.4%). Note that Excessive Heat and Heat are distinct categories per the NWS event list.

 

Economic Impact

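Floods contribute the most to weather related economic damage in terms of combined property and crop damage, at roughly $162 billion. This is more than double the next largest category, Hurricane (Typhoon), at roughly $75.5 billion.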

Limitations

This section discusses limitations of the preceding analysis. The limitations mostly concern the effects of the data cleaning process and the quality of the data. I consider further analysis of these ideas outside the scope of this project, but they remain noteworthy.

Ultimately, a nontrivial proportion of EVTYPE events were mapped to different events, and it is unclear whether they were mapped to reasonable NWS events. As such, the mapping strategy is a current limitation of this analysis. I will note that ~77% of the observations in the casualty data kept their original event name, meaning roughly one quarter were remapped.

with(casualty_data_1, sum(EVTYPE == NWS_event))/nrow(casualty_data_1) # check % of unchanged names
## [1] 0.7690273

 

A deeper investigation of the maps could examine the distribution of remapped events and check whether high value events were mapped to reasonable categories.

It is also possible that numerical data was recorded incorrectly. Much of the data processing was necessary due to record keeping errors, but these errors were identifiable. The quantitative analysis was conducted assuming that the numerical values are correct. However, this may not be the case.

Conclusion

Overall, this method of analysis indicates that Tornadoes and Floods are the most dangerous weather events in relation to population health and economic impact, respectively. However, I also acknowledge potential issues with the original data and with the processing method that may bias the results.

Thank you for taking the time to read this analysis and to review my project :).