In this study I investigate the consequences of hazardous weather for the economy and public health of the U.S. To do so, I analyzed the NOAA Storm Database for the most severe event types with respect to injuries, fatalities and financial damage. For the public health analysis I first looked at the 10 event types that caused the most injuries and fatalities, and then at the distribution of the top event types in both categories. For the economic consequences I first calculated the total financial damage caused by every event and then proceeded the same way as with the injury and fatality numbers.
First I loaded the libraries used in this analysis:
library(tidyverse)
library(ggplot2)
library(data.table)
To load the data I first downloaded the database file on 02.07.2022 (the URL is given in the code).
#loading data
if(!file.exists("./FStormData.csv.bz2")){
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "./FStormData.csv.bz2")
}
I read in the downloaded file with read_csv() from the tidyverse
and converted the resulting tibble to a data.table object
for faster processing.
stormdata = as.data.table(read_csv("./FStormData.csv.bz2"))
I processed the table by selecting only the columns necessary for my
analysis. Then, from the columns PROPDMG and PROPDMGEXP, as well as
the corresponding columns for crop damage, I calculated the actual
value of the financial damage for the exponents “K”, “M” and “B”.
I disregarded values with other exponents (such as “+”, “?” or “h”),
since I cannot interpret these exponents and they are not explained in
the Storm Data Documentation. Then I added the new column DAMAGE, which
is the total financial damage (sum of property and crop damage). Lastly
I removed the columns describing financial damage that were no longer
needed.
#processing data
eventdata <- stormdata %>%
  select(EVTYPE, FATALITIES:CROPDMGEXP) %>%
  mutate(prop = case_when(
    PROPDMGEXP == "K" ~ PROPDMG * 10^3,
    PROPDMGEXP == "M" ~ PROPDMG * 10^6,
    PROPDMGEXP == "B" ~ PROPDMG * 10^9,
    # all other abbreviations become NA since I cannot interpret them
  )) %>%
  mutate(crop = case_when(
    CROPDMGEXP == "K" ~ CROPDMG * 10^3,
    CROPDMGEXP == "M" ~ CROPDMG * 10^6,
    CROPDMGEXP == "B" ~ CROPDMG * 10^9,
    # all other abbreviations become NA since I cannot interpret them
  )) %>%
  mutate(DAMAGE = prop + crop) %>%
  select(EVTYPE:INJURIES, DAMAGE)
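One property of the `DAMAGE = prop + crop` step is worth checking: in R, adding `NA` to a number gives `NA`, so a row whose property damage is interpretable but whose crop exponent is missing or unknown ends up with `DAMAGE = NA` and is later ignored by `sum(..., na.rm = TRUE)`. The toy table below (made-up values, not taken from the NOAA data) illustrates this, together with one possible alternative that counts an uninterpretable component as 0 via `dplyr::coalesce()`. The column name `DAMAGE_ALT` is hypothetical, and the figures reported in this analysis were computed with the original definition.

```r
library(dplyr)

# Toy rows (illustrative values only, not from the NOAA data)
toy <- tibble(
  EVTYPE = c("FLOOD", "TORNADO", "HAIL"),
  prop   = c(2e6, 5e5, NA),
  crop   = c(1e6, NA, 3e4)
)

toy <- toy %>%
  mutate(
    # original definition: NA as soon as one component is NA
    DAMAGE = prop + crop,
    # hypothetical alternative: count a missing component as 0
    DAMAGE_ALT = coalesce(prop, 0) + coalesce(crop, 0)
  )

print(toy)
```

Whether treating a missing component as 0 is appropriate is a judgment call: it keeps more events in the totals, but it also assumes that an unreadable exponent means no damage.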
I looked at injuries and fatalities separately since I think it would not be reasonable to try and weigh injuries vs fatalities for a combined metric for public health consequences.
So first I will take a look at the top 10 events for most injuries
and most fatalities. To achieve this I grouped my data by
EVTYPE and then summed up all injuries and fatalities.
#public health analysis
phdata <- eventdata %>%
  group_by(EVTYPE) %>%
  summarise(
    TOTAL_INJURIES = sum(INJURIES, na.rm = TRUE),
    TOTAL_FATALITIES = sum(FATALITIES, na.rm = TRUE)
  )
Here are the top 10 events for most injuries:
arrange(phdata, desc(TOTAL_INJURIES))[1:10, c("EVTYPE", "TOTAL_INJURIES")]
## # A tibble: 10 × 2
## EVTYPE TOTAL_INJURIES
## <chr> <dbl>
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 HAIL 1361
We can see that tornadoes cause by far the most injuries, and the second-ranked event type is also wind/storm related. Floods and flash floods, which describe similar events, are both in the top 10 and together would rank 2nd overall, but since I am not an expert on meteorology and don’t know the exact difference, I treat them separately. After the first 5 event types there is quite a big gap, so I decided to look at the distribution of injuries for those 5 only. I also only included events that actually caused injuries, since the many events without injuries (for example in non-populated areas) would drag these distributions down a lot.
topinjuryevents = arrange(phdata, desc(TOTAL_INJURIES))[1:5, c("EVTYPE")][[1]]
ggplot(
  data = filter(eventdata, EVTYPE %in% topinjuryevents & INJURIES > 0),
  aes(x = EVTYPE, y = INJURIES)
) + geom_boxplot() + scale_y_log10()
We can see that the high number of injuries from tornadoes is mostly caused by high statistical outliers, and the typical (median) excessive heat event actually causes more injuries than the typical tornado. I would assume this is because heat waves cover larger areas than a tornado, even though they are probably considered less dangerous intuitively. Floods seem to have a similar typical effect to tornadoes, just with fewer high outliers. Lightning behaves similarly to tornadoes: the median lightning event causes only one injury, but the high outliers produce the high total. The same goes for TSTM WIND.
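The reading of the boxplot can be cross-checked numerically. The following sketch computes the number of events and the median, mean and maximum of injuries per event type, restricted to events with at least one injury as in the plot. It uses a small made-up table in place of `eventdata`; the column names follow the ones used above, while `toy_events`, `injury_stats` and all values are assumptions for illustration.

```r
library(dplyr)

# Made-up events standing in for eventdata (not NOAA figures)
toy_events <- tibble(
  EVTYPE   = c("TORNADO", "TORNADO", "TORNADO",
               "EXCESSIVE HEAT", "EXCESSIVE HEAT", "LIGHTNING"),
  INJURIES = c(1, 2, 150, 10, 20, 0)
)

injury_stats <- toy_events %>%
  filter(INJURIES > 0) %>%          # same restriction as in the boxplot
  group_by(EVTYPE) %>%
  summarise(
    n_events        = n(),
    median_injuries = median(INJURIES),
    mean_injuries   = mean(INJURIES),
    max_injuries    = max(INJURIES)
  ) %>%
  arrange(desc(median_injuries))

print(injury_stats)
```

In this toy example the tornado rows have the higher mean because of a single large outlier, while the heat rows have the higher median, which is the pattern described for the real data.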
Now here are the top 10 events for most fatalities:
arrange(phdata, desc(TOTAL_FATALITIES))[1:10, c("EVTYPE", "TOTAL_FATALITIES")]
## # A tibble: 10 × 2
## EVTYPE TOTAL_FATALITIES
## <chr> <dbl>
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
Again tornadoes are ranked highest, even though the gap to second place is smaller here. We can also see that the 2nd and 4th ranked event types are both heat (just entered differently in the database). Combined they would still not cause more deaths than tornadoes. Flash floods and floods also describe similar events. Combined they would still rank third, the same as flash floods alone, so I treat them separately again. I also decided that the gap between rank 5 and 6 is again big enough to only look at the distributions of the 5 highest-ranking event types.
topfatalityevents = arrange(phdata, desc(TOTAL_FATALITIES))[1:5, c("EVTYPE")][[1]]
ggplot(
  data = filter(eventdata, EVTYPE %in% topfatalityevents & FATALITIES > 0),
  aes(x = EVTYPE, y = FATALITIES)
) + geom_boxplot() + scale_y_log10()
What surprises me most here is that none of these event types differ in median fatalities: the median of 1 is the lowest value possible, since we only looked at events that actually caused fatalities. The differences in total fatalities are therefore driven by the upper part of the distributions. The 75th percentile is highest for tornadoes and about equal across both heat event types and flash floods. Lightning had only 4 events that caused more than the minimum of 1 fatality. Even though tornadoes cause more deaths in total, the single deadliest event shown here was actually a heat event, and I want to point out again that two of the event types here describe heat.
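Since some event types appear under more than one label (for example `HEAT` and `EXCESSIVE HEAT`, or `TSTM WIND` and `THUNDERSTORM WIND`), one option would be to recode them before grouping. The sketch below shows the idea on made-up numbers; the mapping and the column `EVGROUP` are my own assumptions rather than an official NOAA classification, and the results in this report were computed without such merging.

```r
library(dplyr)

# Made-up fatality counts (not NOAA figures)
toy_totals <- tibble(
  EVTYPE     = c("HEAT", "EXCESSIVE HEAT", "TSTM WIND",
                 "THUNDERSTORM WIND", "TORNADO"),
  FATALITIES = c(3, 5, 1, 2, 7)
)

merged <- toy_totals %>%
  mutate(EVGROUP = case_when(
    # hypothetical grouping of labels that describe the same kind of event
    EVTYPE %in% c("HEAT", "EXCESSIVE HEAT")         ~ "HEAT (ALL)",
    EVTYPE %in% c("TSTM WIND", "THUNDERSTORM WIND") ~ "THUNDERSTORM WIND (ALL)",
    TRUE                                            ~ EVTYPE
  )) %>%
  group_by(EVGROUP) %>%
  summarise(TOTAL_FATALITIES = sum(FATALITIES, na.rm = TRUE)) %>%
  arrange(desc(TOTAL_FATALITIES))

print(merged)
```

An explicit mapping like this keeps every merge decision visible, which matters here because I cannot judge from the labels alone whether two event types are truly the same phenomenon.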
The methodology I used after summing up the total financial damage caused by the events does not differ from what I did in the public health analysis.
Here are the top 10 event types for most financial damage:
#economic analysis
econdata <- eventdata %>%
group_by(EVTYPE) %>%
summarise(TOTAL_DAMAGE = sum(DAMAGE, na.rm = TRUE))
arrange(econdata, desc(TOTAL_DAMAGE))[1:10, c("EVTYPE", "TOTAL_DAMAGE")]
## # A tibble: 10 × 2
## EVTYPE TOTAL_DAMAGE
## <chr> <dbl>
## 1 FLOOD 138007444500
## 2 HURRICANE/TYPHOON 29348167800
## 3 TORNADO 16520148150
## 4 HURRICANE 12405268000
## 5 RIVER FLOOD 10108369000
## 6 HAIL 10019978590
## 7 FLASH FLOOD 8715295130
## 8 ICE STORM 5925147300
## 9 STORM SURGE/TIDE 4641493000
## 10 THUNDERSTORM WIND 3813647990
We can see that floods are causing by far the highest financial damage out of all event types. In addition 3 of the top 10 event types are flood related. Since there are big gaps in total damage in between all of the first 4 events, I decided to only look at the distribution of the 3 highest ranking event types.
topdamageevents = arrange(econdata, desc(TOTAL_DAMAGE))[1:3, c("EVTYPE")][[1]]
ggplot(
  data = filter(eventdata, EVTYPE %in% topdamageevents & DAMAGE > 0),
  aes(x = EVTYPE, y = DAMAGE)
) + geom_boxplot() + scale_y_log10()
We can see that the typical (median) hurricane/typhoon actually causes more damage than the typical flood or tornado, which are very similar to each other, but both floods and tornadoes have many high statistical outliers. In fact, there apparently was a single flood event that caused by far the most financial damage of all the events we looked at. I did not repeat the analysis without this maximum outlier, since I think it would not be logical to disregard it in the hope that another flood like this has a low probability, based only on this exploratory data analysis. I would also suspect that the higher frequency of floods compared to hurricanes/typhoons contributes to the high total financial damage caused by floods.
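The suspicion about frequency could be checked by putting the number of damaging events next to the median and total damage per event type. The sketch below does this on made-up data standing in for `eventdata`; `toy_damage`, `damage_profile` and all values are assumptions for illustration only.

```r
library(dplyr)

# Made-up damage values standing in for eventdata (not NOAA figures)
toy_damage <- tibble(
  EVTYPE = c(rep("FLOOD", 4), "HURRICANE/TYPHOON", "TORNADO"),
  DAMAGE = c(1e5, 2e5, 3e5, 4e5, 5e5, NA)
)

damage_profile <- toy_damage %>%
  filter(DAMAGE > 0) %>%            # also drops rows where DAMAGE is NA
  group_by(EVTYPE) %>%
  summarise(
    n_events      = n(),
    median_damage = median(DAMAGE),
    total_damage  = sum(DAMAGE)
  ) %>%
  arrange(desc(total_damage))

print(damage_profile)
```

In the toy data the flood rows have the lower median but the higher total simply because there are more of them, which is the effect I suspect in the real data.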
With regard to public health, I would, based on this exploratory data analysis, describe tornadoes and heat waves as the most severe weather hazards. For economic consequences, I would claim that floods are the most severe, but hurricanes, typhoons and tornadoes cannot be disregarded as economic dangers. I would also like to mention that some resulting costs are not tracked in the database I accessed: for example, the public health consequences probably also affect the economy indirectly, and the destruction of working facilities could cause higher economic damage than “just” the financial value of what was destroyed.