This study examines the impact of severe weather events on public health and the economy in the United States, using NOAA storm database records. Tornadoes, Excessive Heat, and TSTM Wind are the most harmful events by combined fatalities and injuries, each with a distinct temporal pattern. Floods are the largest contributor to economic losses, with more than $144 billion in total property damage. These findings underscore the importance of disaster preparedness and mitigation measures in reducing the impact on both human lives and the economy.
The data were processed starting from the raw CSV file containing the NOAA storm database. The following steps were taken to prepare the data for analysis:
First, all necessary libraries are loaded.
Then, the bz2 file is downloaded locally and the CSV data it contains are read for this project.
Finally, the date and time fields are converted to the correct formats and the whole data set is converted to a data frame for easier handling.
library(ggplot2)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(gridExtra)
library(kableExtra)
library(knitr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:kableExtra':
##
## group_rows
## The following object is masked from 'package:gridExtra':
##
## combine
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(viridisLite)
library(cowplot)
##
## Attaching package: 'cowplot'
## The following object is masked from 'package:lubridate':
##
## stamp
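# Download the bz2 file locally (skipped if it is already present)
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
dest_file <- "StormData.csv.bz2"
if (!file.exists(dest_file)) {
  download.file(url, dest_file, mode = "wb")  # binary mode keeps the archive intact on Windows
}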
# Read the file
dt <- readr::read_csv(dest_file)
## Rows: 902297 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (18): BGN_DATE, BGN_TIME, TIME_ZONE, COUNTYNAME, STATE, EVTYPE, BGN_AZI,...
## dbl (18): STATE__, COUNTY, BGN_RANGE, COUNTY_END, END_RANGE, LENGTH, WIDTH, ...
## lgl (1): COUNTYENDN
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Convert BGN_DATE to Date format
dt$BGN_DATE <- as.Date(dt$BGN_DATE, format = "%m/%d/%Y")
# Convert BGN_TIME to Hour format (HH:MM)
dt$BGN_TIME <- strptime(dt$BGN_TIME, format = "%H%M")
dt$BGN_TIME <- format(dt$BGN_TIME, format = "%H:%M")
# Create dataframe df
df <- as.data.frame(dt)
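The date and time conversions above can be illustrated on a single record. This is a minimal sketch in base R; the two raw strings are assumed example values chosen to match the formats used in the code, not values copied from the dataset.

```r
# Illustrative raw values (assumed formats, not copied from the dataset)
raw_date <- "4/18/1950 0:00:00"
raw_time <- "0130"

# Parsing stops once the format is consumed, so the trailing time part is ignored
parsed_date <- as.Date(raw_date, format = "%m/%d/%Y")

# Parse a four-digit 24-hour time, then render it as HH:MM
parsed_time <- format(strptime(raw_time, format = "%H%M"), format = "%H:%M")

print(parsed_date)
print(parsed_time)
```

Time strings that do not follow the four-digit pattern would likely come back as NA under this format, so it may be worth checking `sum(is.na(dt$BGN_TIME))` after the conversion.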
A summary of the database is performed by grouping events and their total count of deaths and injuries. Subsequently, the top 10 most catastrophic events are displayed.
# Calculate total fatalities and injuries for each event type
events_health_impact <- df %>%
group_by(EVTYPE) %>%
summarise(total_fatalities = sum(FATALITIES, na.rm = TRUE),
total_injuries = sum(INJURIES, na.rm = TRUE)) %>%
filter(!is.na(total_fatalities) & !is.na(total_injuries)) %>%
arrange(desc(total_fatalities + total_injuries))
# Display table of most harmful events to population health
head(events_health_impact,10)
## # A tibble: 10 × 3
## EVTYPE total_fatalities total_injuries
## <chr> <dbl> <dbl>
## 1 TORNADO 5633 91346
## 2 EXCESSIVE HEAT 1903 6525
## 3 TSTM WIND 504 6957
## 4 FLOOD 470 6789
## 5 LIGHTNING 816 5230
## 6 HEAT 937 2100
## 7 FLASH FLOOD 978 1777
## 8 ICE STORM 89 1975
## 9 THUNDERSTORM WIND 133 1488
## 10 WINTER STORM 206 1321
The table above lists the 10 most harmful events and their consequences. Events are ordered by the sum of both indicators, which gives the total impact on the population. By this measure, Tornado, Excessive Heat, and TSTM Wind are the most harmful events, together accounting for 104,828 injuries and 8,040 fatalities. Ranking by fatalities alone would differ: Flash Flood, Heat, and Lightning each caused more deaths than TSTM Wind.
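The combined figures for the three leading events can be reproduced from the table. The sketch below re-enters the first three rows by hand (values copied from the printed output) and recomputes the totals; the column `total_harm` is a name introduced here for illustration only.

```r
# Totals copied from the first three rows of the top-10 table
top3 <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND"),
  total_fatalities = c(5633, 1903, 504),
  total_injuries = c(91346, 6525, 6957)
)

# Combined harm per event, the quantity used to order the table
top3$total_harm <- top3$total_fatalities + top3$total_injuries

# Aggregate figures across the three events
sum_injuries <- sum(top3$total_injuries)      # 104828
sum_fatalities <- sum(top3$total_fatalities)  # 8040

print(top3)
print(c(injuries = sum_injuries, fatalities = sum_fatalities))
```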
Next, the three most harmful events are compared year by year on each health indicator (injuries and fatalities), as shown below:
top_3_events <- head(events_health_impact, 3)$EVTYPE
df_top3 <- df %>%
filter(EVTYPE %in% top_3_events)
df_top3$BGN_DATE <- as.Date(df_top3$BGN_DATE, format = "%m/%d/%Y")
df_top3$Year <- year(df_top3$BGN_DATE)
injuries_by_year <- df_top3 %>%
group_by(Year, EVTYPE) %>%
summarise(total_injuries = sum(INJURIES, na.rm = TRUE),
total_fatalities = sum(FATALITIES, na.rm = TRUE)) %>%
ungroup()
## `summarise()` has grouped output by 'Year'. You can override using the
## `.groups` argument.
# Injuries plot over the years for Top 3 Events
plot1 <- ggplot(injuries_by_year, aes(x = Year, y = total_injuries , color = EVTYPE)) +
geom_line() +
labs(title = "Total Injuries by Year for Top 3 Most Harmful Events",
x = "Year",
y = "Total Injuries",
color = "Event Type") +
scale_color_manual(values = c("black", "blue", "red"))+
theme_minimal()
# Fatalities plot over the years for Top 3 Events
plot2 <- ggplot(injuries_by_year, aes(x = Year, y = total_fatalities , color = EVTYPE)) +
geom_line() +
labs(title = "Total Fatalities by Year for Top 3 Most Harmful Events",
x = "Year",
y = "Total Fatalities",
color = "Event Type") +
scale_color_manual(values = c("black", "blue", "red"))+
theme_minimal()
combined_plot <- plot_grid(plot1, plot2, ncol = 1)
combined_plot
From Figure 1, it can be observed that over the past 40 years, injuries and fatalities remained relatively stable for Tornadoes and TSTM Wind until the beginning of the 2010s, when a significant increase in tornado-related consequences is evident. Conversely, fatalities from Excessive Heat rose towards the end of the 1990s and subsequently declined, while injuries from Excessive Heat have remained relatively constant and low since the 1990s.
As with the health impact analysis, a summary of the total damage in dollars caused by each event type is computed. Two columns describe property damage: PROPDMG gives the magnitude of the damage without units, and PROPDMGEXP gives the unit, which is thousands (K), millions (M), or billions (B) of dollars. The damage in dollars must therefore be calculated by multiplying each magnitude by its unit before events can be compared. In the code below, any other PROPDMGEXP value is treated as a multiplier of 1.
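The unit conversion can be sketched on a few toy rows. The helper `unit_multiplier` and the sample values are hypothetical, introduced only for illustration; the mapping itself (K, M, B in either case, everything else treated as 1) follows the description above.

```r
# Hypothetical helper: map a PROPDMGEXP code to its dollar multiplier
unit_multiplier <- function(exp_code) {
  lookup <- c(K = 1e3, k = 1e3, M = 1e6, m = 1e6, B = 1e9, b = 1e9)
  mult <- unname(lookup[match(exp_code, names(lookup))])
  mult[is.na(mult)] <- 1  # blank, missing, or unrecognised codes count as plain dollars
  mult
}

# Toy records (assumed values, not taken from the dataset)
damage <- data.frame(
  PROPDMG = c(25, 3, 5, 10),
  PROPDMGEXP = c("K", "m", "B", "")
)

# Damage in dollars = magnitude times unit multiplier
damage$dollars <- damage$PROPDMG * unit_multiplier(damage$PROPDMGEXP)
print(damage)
```

A lookup vector keeps the mapping in one place, which may be easier to extend than nested `ifelse()` calls if further exponent codes need handling.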
# Calculate total property damage for each event type
events_economic_impact <- df %>%
mutate(prop_damage = PROPDMG * ifelse(PROPDMGEXP %in% c("K", "k"), 1e3,
ifelse(PROPDMGEXP %in% c("M", "m"), 1e6,
ifelse(PROPDMGEXP %in% c("B", "b"), 1e9, 1)))) %>%
group_by(EVTYPE) %>%
summarise(total_prop_damage = sum(prop_damage, na.rm = TRUE)) %>%
filter(!is.na(total_prop_damage)) %>%
arrange(desc(total_prop_damage))
# Display table of events with the greatest economic consequences
head(events_economic_impact,10)
## # A tibble: 10 × 2
## EVTYPE total_prop_damage
## <chr> <dbl>
## 1 FLOOD 144657709807
## 2 HURRICANE/TYPHOON 69305840000
## 3 TORNADO 56937160779.
## 4 STORM SURGE 43323536000
## 5 FLASH FLOOD 16140862067.
## 6 HAIL 15732267048.
## 7 HURRICANE 11868319010
## 8 TROPICAL STORM 7703890550
## 9 WINTER STORM 6688497251
## 10 HIGH WIND 5270046295
Then, it is possible to create a bar plot for easier comparison using the following code:
# Plot the top 10 events with the greatest economic consequences (reversed order)
dmge_plot_top10 <- ggplot(head(events_economic_impact, 10), aes(x = reorder(EVTYPE, -total_prop_damage), y = total_prop_damage)) +
geom_bar(stat = "identity", fill = "skyblue") +
labs(title = "Top 10 Events by Total Property Damage in US Dollars",
x = "Event Type",
y = "Total Property Damage") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
dmge_plot_top10
This figure shows that Flood has by far the worst consequences in monetary terms of property damage, exceeding the damage caused by Hurricane/Typhoon and Tornado combined. It is also notable that the top 4 event types are closely related and commonly occur at similar times of year and in similar places. The following are some points in common among the phenomena with the greatest material losses:
These events are closely linked to extreme weather phenomena such as low-pressure systems, strong winds, and intense precipitation. For instance, hurricanes and typhoons can produce heavy rainfall leading to flash floods, while tornadoes may form in conjunction with severe storms, including hurricanes. Storm surges, on the other hand, primarily occur during hurricane events and can significantly contribute to coastal flooding.
These events tend to occur in regions prone to extreme weather conditions, such as coastal areas, river basins, and regions with high storm activity. Therefore, it is common for a region affected by a hurricane to experience flooding due to heavy rainfall and storm surge, along with the possibility of associated tornadoes.
The presence of one extreme weather event can increase the risk of occurrence or intensify the impact of others. For example, a hurricane striking a coastal region can lead to significant flooding due to heavy rainfall and storm surge, and it can also increase the likelihood of tornado formation in affected areas.
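The comparison between Flood and the next two event types can be checked directly from the printed totals. This is a small sketch using the values as displayed in the property damage table; the Tornado total is printed with a trailing dot, so its fractional part is not shown and the result is approximate to that extent.

```r
# Totals copied from the property damage table (printed values, USD)
flood <- 144657709807
hurricane_typhoon <- 69305840000
tornado <- 56937160779  # fractional part not shown in the printed table

# Compare Flood against the next two event types combined
combined <- hurricane_typhoon + tornado
flood_exceeds <- flood > combined
difference_billion <- (flood - combined) / 1e9

print(flood_exceeds)
print(round(difference_billion, 2))
```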