This document analyzes weather events within the United States in relation to population health and economic impact, with the goal of determining which events pose the most severe risk. The names of weather events were chosen based on a list provided by the National Weather Service (NWS).
The data included weather event names that did not map correctly to the NWS list, as well as damage statistics for property and crops that required cleaning prior to analysis. This document describes the remapping procedure for the event names, and the transformation process for the damage statistics.
Finally, this document analyzes the cleaned data, generates three related plots, and identifies the most severe weather events. The final sections of the document discuss some limitations of the analysis and restate the conclusions.
The analysis determined that Tornadoes were the most harmful weather event in relation to human population health, while Floods were the most harmful weather event in relation to economic impact. Note that Floods and Flash Floods are distinct events.
Thanks in advance for reading.
This code chunk installs and loads the necessary packages.
# install packages required for data processing and analysis
install.packages('dplyr', repos="http://cran.us.r-project.org")
library(dplyr)
install.packages('stringdist', repos="http://cran.us.r-project.org")
library(stringdist)
install.packages('ggplot2', repos="http://cran.us.r-project.org")
library(ggplot2)
The code chunk below downloads the data into the working directory (if
necessary). Additionally, it stores the measured data in the data frame
storm_data and extracts the columns needed for processing and analysis.
The analysis concerns variables related to event type, population health,
and economic impact. The columns FATALITIES and INJURIES correspond to
population health, while the columns PROPDMG, PROPDMGEXP, CROPDMG, and
CROPDMGEXP correspond to economic impact. The column EVTYPE corresponds
to event type, which is a description of the weather event.
if (!file.exists('./repdata_data_StormData.csv')){ # download file if necessary
url <-'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
download.file(url, destfile = './repdata_data_StormData.csv')
}
storm_data <- read.csv('repdata_data_StormData.csv') # load data
storm_data <- storm_data[,c('EVTYPE','FATALITIES','INJURIES', # get relevant vars
'PROPDMG','PROPDMGEXP','CROPDMG','CROPDMGEXP')]
The data related to fatalities and injuries is numeric and has no NAs,
so the data has not been transformed. See the code below for
confirmation.
fatality_class <- class(storm_data$FATALITIES) # confirm that fatality data is numeric
injury_class <- class(storm_data$INJURIES) # confirm that injury data is numeric
NA_fatality <- sum(is.na(storm_data$FATALITIES)) # check for NAs in fatality column
NA_injury <- sum(is.na(storm_data$INJURIES)) # check for NAs in injury column
The FATALITIES variable is numeric and has 0 NA values.
The INJURIES variable is numeric and has 0 NA values.
The columns relating to economic impact require considerable cleaning
and transformation. The property damage data is split into two columns,
‘PROPDMG’ and ‘PROPDMGEXP’. The ‘PROPDMG’ column is a number, while
‘PROPDMGEXP’ holds either a letter multiplier (K = thousands, M =
millions, B = billions) or a digit indicating the number of significant
digits (treated in this analysis as a power of ten). These markers are
referred to as labels within this document. Crop damage data is stored
in a similar fashion.
Based on a great deal of helpful work done by others, the meanings of
all damage labels are known. The NWS
documentation provides the meanings of the ‘K’,‘M’, and ‘B’ labels
(thousands, millions, and billions respectively). See the Storm Data
Event Table in section 2.1.1.
The values of other labels are explained here.
See the code chunk below for a quick look at these columns. Note that
displayed crop damage labels are blank, but the labels do exist.
head(storm_data,5)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 0 15 25.0 K 0
## 2 TORNADO 0 0 2.5 K 0
## 3 TORNADO 0 2 25.0 K 0
## 4 TORNADO 0 2 2.5 K 0
## 5 TORNADO 0 2 2.5 K 0
The code below creates two new columns containing multipliers for property damage and crop damage labels. Additionally it computes the “true” damage amounts for property damage, crop damage, and combined property & crop damage. The code also displays a few rows in the new data frame with the correct damage values visible.
labeled_storm_data <- storm_data[,c('EVTYPE','PROPDMG','PROPDMGEXP','CROPDMG',
'CROPDMGEXP')] #subset relevant variables
labeled_storm_data$prop_dmg_mult <- # create property damage multipliers
case_when(labeled_storm_data$PROPDMGEXP == 'K' ~ 10^3,
labeled_storm_data$PROPDMGEXP == 'M' ~ 10^6,
labeled_storm_data$PROPDMGEXP == 'B' ~ 10^9,
labeled_storm_data$PROPDMGEXP == 'm' ~ 10^6,
labeled_storm_data$PROPDMGEXP == '0' ~ 10^0,
labeled_storm_data$PROPDMGEXP == '5' ~ 10^5,
labeled_storm_data$PROPDMGEXP == '6' ~ 10^6,
labeled_storm_data$PROPDMGEXP == '4' ~ 10^4,
labeled_storm_data$PROPDMGEXP == '2' ~ 10^2,
labeled_storm_data$PROPDMGEXP == '3' ~ 10^3,
labeled_storm_data$PROPDMGEXP == 'h' ~ 10^2,
labeled_storm_data$PROPDMGEXP == '7' ~ 10^7,
labeled_storm_data$PROPDMGEXP == 'H' ~ 10^2,
labeled_storm_data$PROPDMGEXP == '1' ~ 10^1,
labeled_storm_data$PROPDMGEXP == '8' ~ 10^8,
labeled_storm_data$PROPDMGEXP == '' ~ 10^0,
labeled_storm_data$PROPDMGEXP == '+' ~ 10^1,
labeled_storm_data$PROPDMGEXP == '-' ~ 10^0,
labeled_storm_data$PROPDMGEXP == '?' ~ 10^0)
labeled_storm_data$crop_dmg_mult <- # create crop damage multipliers
case_when(labeled_storm_data$CROPDMGEXP == 'K' ~ 10^3,
labeled_storm_data$CROPDMGEXP == 'M' ~ 10^6,
labeled_storm_data$CROPDMGEXP == 'B' ~ 10^9,
labeled_storm_data$CROPDMGEXP == 'm' ~ 10^6,
labeled_storm_data$CROPDMGEXP == '0' ~ 10^0,
labeled_storm_data$CROPDMGEXP == 'k' ~ 10^3,
labeled_storm_data$CROPDMGEXP == '2' ~ 10^2,
labeled_storm_data$CROPDMGEXP == '' ~ 10^0,
labeled_storm_data$CROPDMGEXP == '?' ~ 10^0)
labeled_storm_data$true_prop_dmg <- with(labeled_storm_data, # compute property damage
PROPDMG*prop_dmg_mult)
labeled_storm_data$true_crop_dmg <- with(labeled_storm_data, # compute crop damage
CROPDMG*crop_dmg_mult)
labeled_storm_data$true_total_dmg <-with(labeled_storm_data, # compute combined damage
true_prop_dmg + true_crop_dmg)
head(labeled_storm_data, 5)
## EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP prop_dmg_mult crop_dmg_mult
## 1 TORNADO 25.0 K 0 1000 1
## 2 TORNADO 2.5 K 0 1000 1
## 3 TORNADO 25.0 K 0 1000 1
## 4 TORNADO 2.5 K 0 1000 1
## 5 TORNADO 2.5 K 0 1000 1
## true_prop_dmg true_crop_dmg true_total_dmg
## 1 25000 0 25000
## 2 2500 0 2500
## 3 25000 0 25000
## 4 2500 0 2500
## 5 2500 0 2500
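As a compact alternative to the two case_when() calls above, the label-to-multiplier mapping could be written once as a helper and applied to both columns. The following is a minimal base R sketch, not the code used for the results. The helper name exp_to_multiplier is hypothetical, and it assumes the label meanings used in the case_when() calls (letters as named multipliers, digits as powers of ten, '+' as 10, and '', '-', '?' as 1). It also assumes lowercase 'k' means thousands for both columns. Unrecognised labels stay NA, as they do with case_when().

```r
# Hypothetical helper: convert damage labels (PROPDMGEXP / CROPDMGEXP) to multipliers.
exp_to_multiplier <- function(labels) {
  labels <- as.character(labels)
  # Assumed meanings, copied from the case_when() calls in this document
  lookup <- c(K = 1e3, k = 1e3, M = 1e6, m = 1e6, B = 1e9,
              H = 1e2, h = 1e2, "+" = 10, "-" = 1, "?" = 1)
  mult <- unname(lookup[labels])          # NA for anything not in the lookup
  is_digit <- grepl("^[0-9]$", labels)    # single digits are powers of ten
  mult[is_digit] <- 10^as.numeric(labels[is_digit])
  mult[labels %in% ""] <- 1               # blank label: use the amount as recorded
  mult
}

# Toy rows (invented for illustration, not taken from the storm data)
toy <- data.frame(PROPDMG = c(25, 2.5, 1, 3),
                  PROPDMGEXP = c("K", "M", "5", ""))
toy$true_prop_dmg <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
toy$true_prop_dmg
```

Keeping the mapping in one place means the property and crop columns cannot drift apart, at the cost of hiding the per-column differences that the explicit case_when() calls make visible.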
The variable EVTYPE includes a much larger number of event types (985)
than those listed by the National Weather Service (48). The additional
types are a result of typos as well as a lack of systematization in
recording event types. It was not realistic to convert each EVTYPE into
the appropriate NWS event by hand. Instead, the names were cleaned using
the stringdist package. In particular, the function amatch() was used to
map strings in EVTYPE to the 48 weather events listed by the NWS. The
‘lcs’ distance metric was used because it compares strings based on
their longest common substring. Many EVTYPE events contain keywords that
correspond to words in the official list of NWS events. The hope is that
these keywords produced correct, or at least reasonable, mappings, but
there were some instances where the method failed (e.g. it mapped
AVALANCE to HAIL despite the NWS event list containing AVALANCHE).
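To make the ‘lcs’ metric concrete: as I understand the stringdist documentation, the lcs distance between two strings is the number of characters left unpaired once their longest common subsequence has been matched, i.e. nchar(a) + nchar(b) minus twice that length. The base R sketch below re-implements this for illustration only. The helper names lcs_length and lcs_distance are hypothetical, and this is not how stringdist computes the distance internally. It suggests why a short name such as HAIL can beat a much longer candidate.

```r
# Length of the longest common subsequence of two strings (dynamic programming).
lcs_length <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  m <- matrix(0L, nrow = length(x) + 1, ncol = length(y) + 1)
  for (i in seq_along(x)) {
    for (j in seq_along(y)) {
      m[i + 1, j + 1] <- if (x[i] == y[j]) {
        m[i, j] + 1L                       # characters pair up: extend the match
      } else {
        max(m[i, j + 1], m[i + 1, j])      # otherwise carry the best so far
      }
    }
  }
  m[length(x) + 1, length(y) + 1]
}

# lcs distance: characters in either string that are not part of the common subsequence.
lcs_distance <- function(a, b) nchar(a) + nchar(b) - 2 * lcs_length(a, b)

lcs_distance("AVALANCE", "AVALANCHE")            # 1
lcs_distance("AVALANCE", "HAIL")                 # 8
lcs_distance("AVALANCE", "AVALANCHE, BLIZZARD")  # 11
```

The third candidate is the combined string that appears in the event vector defined later in this document. Because every unpaired character counts, the extra characters in that string push its distance above that of HAIL.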
Furthermore, due to the large size of the original data set I further
processed both the population health data frame and the economic impact
data frame before remapping the events in EVTYPE. This was necessary as
the remapping involved a for loop that my computer could not run in a
reasonable amount of time.
For the population health data, the additional processing consisted of
keeping observations with at least 1 casualty. For the economic data,
observations with 0 total damage were excluded. These transformations
removed low-impact observations that do not contribute to the
highest-risk, highest-damage weather events. They also greatly reduced
the size of both data sets, which permitted the remapping of the EVTYPE
events in a reasonable time frame. The additional processing of the data
frames can be found later in the document.
The code below creates a data frame, casualty_data, for casualty
variables (fatalities and injuries). The fatality and injury data is
numeric, so it is sufficient to check for NAs. The code again verifies
that the data has no NA values. A few rows of the new data frame are
displayed.
casualty_data <- storm_data[,c('EVTYPE','FATALITIES','INJURIES')] #subset casualty data
sum(!complete.cases(casualty_data)) # check for NAs.
## [1] 0
head(casualty_data,5) # display a few rows
## EVTYPE FATALITIES INJURIES
## 1 TORNADO 0 15
## 2 TORNADO 0 0
## 3 TORNADO 0 2
## 4 TORNADO 0 2
## 5 TORNADO 0 2
The code below runs various summary statistics on casualty_data
(population health) and labeled_storm_data (economic impact).
casualty_summary <- sapply(casualty_data[,c('FATALITIES',
'INJURIES')], summary)
damage_summary <- sapply(labeled_storm_data[,c('true_prop_dmg',
'true_crop_dmg',
'true_total_dmg')], summary)
casualty_summary
## FATALITIES INJURIES
## Min. 0.00000000 0.0000000
## 1st Qu. 0.00000000 0.0000000
## Median 0.00000000 0.0000000
## Mean 0.01678494 0.1557447
## 3rd Qu. 0.00000000 0.0000000
## Max. 583.00000000 1700.0000000
damage_summary
## true_prop_dmg true_crop_dmg true_total_dmg
## Min. 0.000000e+00 0.000000e+00 0.000000e+00
## 1st Qu. 0.000000e+00 0.000000e+00 0.000000e+00
## Median 0.000000e+00 0.000000e+00 0.000000e+00
## Mean 4.745941e+05 5.442132e+04 5.290155e+05
## 3rd Qu. 5.000000e+02 0.000000e+00 1.000000e+03
## Max. 1.150000e+11 5.000000e+09 1.150325e+11
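The casualty and damage summaries indicate that most observations record no casualties and little or no damage: the median is 0 for every variable, and the third quartile is 0 for fatalities, injuries, and crop damage. The maxima, by contrast, are extreme (583 fatalities, 1,700 injuries, and about $115 billion in total damage).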
The code below extracts data with at least one casualty and data involving some form of damage.
casualty_data_1 <- with(casualty_data,casualty_data[FATALITIES > 0 |
INJURIES > 0,])
damage_data <- labeled_storm_data[labeled_storm_data$true_total_dmg > 0,]
dim(casualty_data_1)
## [1] 21929 3
dim(damage_data)
## [1] 245031 10
At this point both data frames are small enough to remap the EVTYPE
events to their “closest” NWS event. The code below performs the
remapping using the stringdist package. Note that the damage data was
~10x larger than the casualty data, and the remapping still took a few
minutes.
The code below remaps the events by using amatch() from stringdist to
find the “closest” NWS event for each EVTYPE event. Note that
maxDist = 30 in amatch(). The max distance parameter essentially
corresponds to the maximum number of permitted character mismatches when
comparing two strings. The function amatch() returns NA if the distance
between two strings exceeds the max distance. The longest string in
EVTYPE contained 30 characters. Setting maxDist = 30 allows for all
EVTYPE values to be mapped to some event in the official NWS list,
though it is possible a smaller value could have achieved the same
mappings.
max(nchar(storm_data$EVTYPE)) # identify longest string in original data
## [1] 30
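# Character vector of NWS event names, kept exactly as used for the results below.
# Note: 'Avalanche, Blizzard' is one string and 'Marine Thunderstorm', 'Wind' are two,
# so the list differs from the NWS names Avalanche, Blizzard and Marine Thunderstorm Wind.
NWS_events <- c('Astronomical Low Tide', 'Avalanche, Blizzard' ,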
'Coastal Flood', 'Cold/Wind Chill', 'Debris Flow', 'Dense Fog',
'Dense Smoke', 'Drought','Dust Devil',' Dust Storm',
'Excessive Heat', 'Extreme Cold/Wind Chill', 'Flash Flood',
'Flood', 'Frost/Freeze', 'Funnel Cloud', 'Freezing Fog', 'Hail',
'Heat', 'Heavy Rain', 'Heavy Snow', 'High Surf', 'High Wind',
'Hurricane (Typhoon)' , 'Ice Storm', 'Lake-Effect Snow',
'Lakeshore Flood', 'Lightning', 'Marine Hail',
'Marine High Wind', 'Marine Strong Wind', 'Marine Thunderstorm',
'Wind', 'Rip Current', 'Seiche',' Sleet', 'Storm Surge/Tide',
'Strong Wind', 'Thunderstorm Wind', 'Tornado',
'Tropical Depression', 'Tropical Storm', 'Tsunami',
'Volcanic Ash', 'Waterspout', 'Wildfire', 'Winter Storm',
'Winter Weather')
upper_nws <- toupper(NWS_events) # make all uppercase to improve matching
NWS_events <- data.frame(upper_nws) # convert to data frame for lookups
casualty_data_1$NWS_index <- NULL #initialize empty vector to store NWS indices
casualty_data_1$NWS_index <- amatch(casualty_data_1$EVTYPE, # find index for closest name from NWS events
table = NWS_events$upper_nws,
maxDist = 30, method = 'lcs')
casualty_data_1$NWS_event <- NULL # initialize empty vector to store NWS indices
for (i in 1:nrow(casualty_data_1)){casualty_data_1$NWS_event[i] <- # map current event to 'closest' name in NWS events
NWS_events[as.numeric(casualty_data_1[i,"NWS_index"]),"upper_nws"]}
damage_data$NWS_index <- NULL #initialize empty vector
damage_data$NWS_index <- amatch(damage_data$EVTYPE, # find index for closest name from NWS events
table = NWS_events$upper_nws,
maxDist = 30, method = 'lcs')
damage_data$NWS_event <- NULL # initialize empty vector
for (i in 1:nrow(damage_data)){damage_data$NWS_event[i] <- # map current name to 'closest' name in NWS events
NWS_events[as.numeric(damage_data[i,"NWS_index"]),"upper_nws"]} # note this loop still took 4 minutes :(
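The row-by-row loops above can likely be avoided, because indexing a vector with a vector of positions is already vectorised in R, and the matcher only needs to see each distinct EVTYPE once. A minimal sketch under those assumptions follows. The helper remap_events is hypothetical, and base match() (exact matching) stands in for amatch() so that the example runs without extra packages.

```r
# Hypothetical helper: map each event name to its entry in nws_names.
# 'matcher' is any function(x, table) returning positions in 'table' (NA if no match).
remap_events <- function(evtype, nws_names, matcher = match) {
  unique_types <- unique(evtype)                     # match each distinct name once
  unique_index <- matcher(unique_types, nws_names)   # positions in nws_names
  row_index <- unique_index[match(evtype, unique_types)]  # expand back to all rows
  nws_names[row_index]                               # vectorised lookup, no loop
}

# Toy inputs (invented for illustration)
events <- c("TORNADO", "HAIL", "TORNADO", "FLOOD", "UNKNOWN")
nws <- c("FLOOD", "HAIL", "TORNADO")
remapped <- remap_events(events, nws)
remapped
```

With stringdist loaded, passing matcher = function(x, table) amatch(x, table, method = 'lcs', maxDist = 30) should apply the same fuzzy matching used in this document. I have not timed it on the full data set.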
At this stage the relevant data has been cleaned and the observations have been extracted. The code below creates new ‘clean’ data frames ready for data analysis. A few rows of each data frame are also displayed.
clean_casualty_data <- with(casualty_data_1, data.frame(NWS_event, FATALITIES,
INJURIES)) # create clean casualty data set
clean_damage_data <- with(damage_data, data.frame(NWS_event, true_prop_dmg,
true_crop_dmg, true_total_dmg)) # create clean damage data set
head(clean_casualty_data, 5)
## NWS_event FATALITIES INJURIES
## 1 TORNADO 0 15
## 2 TORNADO 0 2
## 3 TORNADO 0 2
## 4 TORNADO 0 2
## 5 TORNADO 0 6
head(clean_damage_data, 5)
## NWS_event true_prop_dmg true_crop_dmg true_total_dmg
## 1 TORNADO 25000 0 25000
## 2 TORNADO 2500 0 2500
## 3 TORNADO 25000 0 25000
## 4 TORNADO 2500 0 2500
## 5 TORNADO 2500 0 2500
The code below analyzes the fatality, injury, and total damage data. For fatalities and injuries, the totals are calculated for each event. The values are converted to proportions for relative comparison. Additionally, the data is sorted in descending order and the cumulative proportions are also provided. The total damage was also scaled to billions of dollars and plotted on that scale. This was done in order to demonstrate the raw scale of economic damage. Some of these events are very expensive!
The data was also summarized to get a sense of the distributions, after which the events within the highest quartile were subset for plotting. The choice to plot the top quartile is somewhat arbitrary, but shows the most extreme event(s) and their relative size in comparison to other extreme events.
The code below creates three plots to visualize the relationships between weather event type and fatalities, injuries, and total economic damage.
# fatality analysis below
death_events <- clean_casualty_data %>%
group_by(NWS_event) %>% #total deaths by event sorted descending
summarize(Fatalities = sum(FATALITIES)) %>% arrange(desc(Fatalities)) %>%
mutate(fatality_proportion = Fatalities/sum(Fatalities))# convert to proportion
death_events$fatality_proportion <- round(death_events$fatality_proportion,3) # write using 3 decimal places
death_events$cum_fatality_proportion <- NULL # empty vector to hold cumulative proportions
for (i in 1:nrow(death_events)){death_events$cum_fatality_proportion[i] <-
sum(death_events$fatality_proportion[1:i])} # compute cumulative proportions
summary(death_events$fatality_proportion) # summary statistics for fatality data
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.00100 0.00550 0.02490 0.01925 0.37200
Q3_death_event <-summary(death_events$fatality_proportion)['3rd Qu.'] # get 3rd quartile death proportion
top_quart_death <- death_events %>% # subset top quartile of deaths
filter(fatality_proportion > as.numeric(Q3_death_event))
top_quart_death[,c('NWS_event','fatality_proportion')]
## # A tibble: 10 × 2
## NWS_event fatality_proportion
## <chr> <dbl>
## 1 TORNADO 0.372
## 2 EXCESSIVE HEAT 0.133
## 3 HEAT 0.075
## 4 FLASH FLOOD 0.068
## 5 LIGHTNING 0.054
## 6 WIND 0.043
## 7 FLOOD 0.039
## 8 RIP CURRENT 0.038
## 9 HAIL 0.025
## 10 EXTREME COLD/WIND CHILL 0.02
# fatality plot
fatality_events_fact <- factor(top_quart_death$NWS_event, # create ordered factor for plotting
levels = c('TORNADO','EXCESSIVE HEAT','HEAT',
'FLASH FLOOD', 'LIGHTNING','WIND','FLOOD',
'RIP CURRENT','HAIL',
'EXTREME COLD/WIND CHILL'))
top_quart_death$NWS_event <- fatality_events_fact # convert NWS events to factors
top_quart_death_plot <- ggplot(top_quart_death, # create plot object
aes(x = NWS_event, y = fatality_proportion))
death_plot <- top_quart_death_plot + geom_col() + # create and customize plot
theme(axis.text.x = element_text(angle = 90,
vjust = 0.5, hjust=1), axis.text =element_text(size=8),
plot.title = element_text(hjust = 0.5),
axis.title = element_text(size = 10)) +
labs(x = 'Weather Event', y = 'Proportion of Weather Related Deaths')+
ggtitle('Death Statistics for Weather Events') +
scale_x_discrete(limits = fatality_events_fact)
death_plot
#injury analysis below
injury_events <- clean_casualty_data %>% group_by(NWS_event) %>% # injuries by event, sorted descending
summarize(Injuries = sum(INJURIES)) %>% arrange(desc(Injuries)) %>%
mutate(injury_proportion = Injuries/sum(Injuries))# convert to proportion
injury_events$injury_proportion <- round(injury_events$injury_proportion,3) # write using 3 decimal places
injury_events$cum_injury_proportion <- NULL # empty vector to hold cumulative proportions
for (i in 1:nrow(injury_events)){injury_events$cum_injury_proportion[i] <-
sum(injury_events$injury_proportion[1:i])} # compute cumulative proportions
summary(injury_events$injury_proportion) # summary statistics for injury data
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.00000 0.00200 0.02498 0.01150 0.65000
Q3_injury_event <-summary(injury_events$injury_proportion)['3rd Qu.'] # get 3rd quartile injury proportion
top_quart_injury <- injury_events %>%
filter(injury_proportion > as.numeric(Q3_injury_event)) # subset top quartile of injuries
top_quart_injury[,c('NWS_event','injury_proportion')]
## # A tibble: 10 × 2
## NWS_event injury_proportion
## <chr> <dbl>
## 1 TORNADO 0.65
## 2 WIND 0.057
## 3 FLOOD 0.054
## 4 EXCESSIVE HEAT 0.048
## 5 LIGHTNING 0.037
## 6 HEAT 0.018
## 7 THUNDERSTORM WIND 0.017
## 8 HAIL 0.015
## 9 ICE STORM 0.014
## 10 FLASH FLOOD 0.013
# injury plot
injury_events_fact <- factor(top_quart_injury$NWS_event, # create ordered factor from the injury events
levels = c('TORNADO','WIND','FLOOD','EXCESSIVE HEAT',
'LIGHTNING', 'HEAT', 'THUNDERSTORM WIND',
'HAIL', 'ICE STORM', 'FLASH FLOOD'))
top_quart_injury$NWS_event <- injury_events_fact # convert NWS events to factors
top_quart_injury_plot <- ggplot(top_quart_injury,
aes(x = NWS_event, y = injury_proportion))
injury_plot <- top_quart_injury_plot + geom_col() +
theme(axis.text.x = element_text(angle = 90,
vjust = 0.5, hjust=1), axis.text =element_text(size=8),
plot.title = element_text(hjust = 0.5),
axis.title = element_text(size = 10)) +
labs(x = 'Weather Event', y = 'Proportion of Weather Related Injuries')+
ggtitle('Injury Statistics for Weather Events') +
scale_x_discrete(limits = injury_events_fact)
injury_plot
# total damage analysis below
damage_events <- clean_damage_data %>% group_by(NWS_event) %>% # total dmg by event, sorted descending
summarize(Damage = sum(true_total_dmg)) %>% arrange(desc(Damage)) %>%
mutate(Damage_billions = Damage/10^9) %>% # scale damage to billions of dollars
mutate(Damage_proportion = Damage/sum(Damage))# convert to proportion
damage_events$Damage_proportion <- round(damage_events$Damage_proportion,3) # write using 3 decimal places
damage_events$cum_damage_proportion <- NULL # empty vector to hold cumulative proportions
for (i in 1:nrow(damage_events)){damage_events$cum_damage_proportion[i] <-
sum(damage_events$Damage_proportion[1:i])} # compute cumulative proportions
summary(damage_events$Damage_billions) # summary statistics for damage data
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.00971 0.25291 10.15594 6.72997 161.71251
Q3_damage_event <-summary(damage_events$Damage_billions)['3rd Qu.'] # get 3rd quartile of damage
top_quart_damage <- damage_events %>%
filter(Damage_billions > as.numeric(Q3_damage_event)) # subset top quartile of damage
top_quart_damage[,c('NWS_event','Damage_billions')]
## # A tibble: 12 × 2
## NWS_event Damage_billions
## <chr> <dbl>
## 1 FLOOD 162.
## 2 HURRICANE (TYPHOON) 75.5
## 3 TORNADO 57.4
## 4 STORM SURGE/TIDE 48.0
## 5 HAIL 34.2
## 6 FLASH FLOOD 19.1
## 7 DROUGHT 15.1
## 8 ICE STORM 8.97
## 9 WILDFIRE 8.89
## 10 TROPICAL STORM 8.41
## 11 THUNDERSTORM WIND 7.67
## 12 WINTER STORM 6.78
# damage plot
damage_events_fact <- factor(top_quart_damage$NWS_event, # turn top NWS events into ordered factors
levels = as.character(top_quart_damage$NWS_event))
top_quart_damage$NWS_event <- damage_events_fact # convert NWS events to factors
top_quart_damage_plot <- ggplot(top_quart_damage,
aes(x = NWS_event, y = Damage_billions))
damage_plot <- top_quart_damage_plot + geom_col() +
theme(axis.text.x = element_text(angle = 90,
vjust = 0.5, hjust=1), axis.text =element_text(size=8),
plot.title = element_text(hjust = 0.5),
axis.title = element_text(size = 10)) +
labs(x = 'Weather Event', y = 'Total Damage ($ Billions)')+
ggtitle('Economic Impact of Weather Events') +
scale_x_discrete(limits = damage_events_fact)
damage_plot
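The three analyses above share one pattern: total a variable by event, convert the totals to proportions, accumulate them, and keep the events above the third quartile. The base R sketch below shows that pattern on made-up numbers (the toy data frame and its values are invented for illustration and are not taken from the storm data). It uses cumsum() in place of the explicit loops, which should give the same cumulative proportions.

```r
# Toy casualty data (invented values)
toy <- data.frame(
  event  = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT", "TORNADO"),
  deaths = c(5, 2, 3, 1, 1, 8)
)

# Total deaths per event, sorted from largest to smallest
totals <- sort(tapply(toy$deaths, toy$event, sum), decreasing = TRUE)

summary_tbl <- data.frame(
  event      = names(totals),
  deaths     = as.numeric(totals),
  proportion = as.numeric(totals) / sum(totals)
)

# Running share of all deaths, replacing the for loop
summary_tbl$cum_proportion <- cumsum(summary_tbl$proportion)

# Keep events whose share exceeds the third quartile of shares
q3 <- quantile(summary_tbl$proportion, 0.75)
top_quartile <- summary_tbl[summary_tbl$proportion > q3, ]
top_quartile
```

Computing the cumulative proportions from the unrounded shares, and rounding only for display, avoids small rounding drift in the running total.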
The analysis and plots indicate that Tornadoes pose the greatest risk to population health with respect to both weather-related fatalities and injuries. Tornadoes account for ~37% of weather-related deaths and a whopping 65% of weather-related injuries (per this method of analysis). Excessive Heat and Heat were the next two largest contributors to fatalities, while Wind and Flood were the next two largest contributors to injuries, per the tables above. Note that Excessive Heat and Heat are distinct categories per the NWS event list.
Floods contribute the most to weather-related economic damage with regard to combined property and crop damage.
This section discusses limitations of the preceding analysis. The limitations mostly concern the effects of the data cleaning process and the quality of the data. I consider further analysis of these ideas outside the scope of this project, but they remain noteworthy.
Ultimately a nontrivial proportion of EVTYPE events were mapped to
different events, and it is unclear whether they were mapped to
reasonable NWS events. As such, the mapping strategy is a current
limitation of this analysis. I will note that ~77% of the events in the
casualty data mapped to themselves, meaning roughly a quarter of the
names were remapped.
with(casualty_data_1, sum(EVTYPE == NWS_event))/nrow(casualty_data_1) # check % of unchanged names
## [1] 0.7690273
A deeper investigation of the mappings could examine the distribution of remapped events and check whether high-value events were mapped to reasonable categories.
It is also possible that numerical data was recorded incorrectly. Much of the data processing was necessary due to record-keeping errors, but these errors were identifiable. The quantitative analysis was conducted assuming that numerical values are correct. However, this may not be the case.
Overall, this method of analysis indicates that Tornadoes and Floods are the most dangerous weather events in relation to population health and economic impact, respectively. However, I also acknowledge potential issues with the original data and with the processing method that may bias the results.
Thank you for taking the time to read this analysis and to review my project :).