This analysis investigates the impact of severe weather events in the
U.S. using the NOAA Storm Data dataset. The study identifies the event
types (EVTYPE) most harmful to population health by
measuring combined fatalities and injuries, with tornadoes and excessive
heat ranking highest. Economic consequences were also evaluated by
calculating property and crop damages, adjusted using exponent codes
(PROPDMGEXP, CROPDMGEXP) to reflect true
financial loss. Hurricanes, floods, and tornadoes were found to cause
the greatest economic damage.
To visualize these findings, we plot the top 10 event types by harm to population health and the top 10 event types by economic damage. In addition, a heatmap highlights year-over-year trends for the 10 most harmful event types. Data preprocessing included cleaning missing values, transforming date formats, and decoding damage indicators. The analysis uses the R packages dplyr, ggplot2, and data.table for data manipulation and visualization. Overall, the results can inform disaster preparedness, policy planning, and risk management at multiple levels of government.
# Load libraries
library('ggplot2')
## Warning: package 'ggplot2' was built under R version 4.4.3
library('data.table')
library('readr')
library('magrittr')
library("dplyr")
## Warning: package 'dplyr' was built under R version 4.4.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:data.table':
##
## between, first, last
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Define the file path
file_path <- "repdata_data_StormData.csv.bz2"
# Read the BZ2 compressed CSV file directly
storm_data <- read.csv(bzfile(file_path))
# View the structure
str(storm_data)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
head(storm_data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL TORNADO
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL TORNADO
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL TORNADO
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL TORNADO
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1 0 0 NA
## 2 0 0 NA
## 3 0 0 NA
## 4 0 0 NA
## 5 0 0 NA
## 6 0 0 NA
## END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1 0 14.0 100 3 0 0 15 25.0
## 2 0 2.0 150 2 0 0 0 2.5
## 3 0 0.1 123 2 0 0 2 25.0
## 4 0 0.0 100 2 0 0 2 2.5
## 5 0 0.0 150 2 0 0 2 2.5
## 6 0 1.5 177 2 0 0 6 2.5
## PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1 K 0 3040 8812
## 2 K 0 3042 8755
## 3 K 0 3340 8742
## 4 K 0 3458 8626
## 5 K 0 3412 8642
## 6 K 0 3450 8748
## LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3051 8806 1
## 2 0 0 2
## 3 0 0 3
## 4 0 0 4
## 5 0 0 5
## 6 0 0 6
colnames(storm_data)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
This section focuses on identifying which types of severe weather
events have had the greatest impact on public health across the United
States. By summing the number of fatalities and
injuries associated with each event type
(EVTYPE), we quantify the total harm caused to people. The
data is grouped and ranked to reveal the most dangerous event types,
such as tornadoes and excessive heat, helping to highlight where public
safety efforts and emergency preparedness should be prioritized.
health_impact <- storm_data %>%
  group_by(EVTYPE) %>%
  summarise(
    total_fatalities = sum(FATALITIES, na.rm = TRUE),
    total_injuries = sum(INJURIES, na.rm = TRUE),
    total_harmed = total_fatalities + total_injuries
  ) %>%
  arrange(desc(total_harmed))
print(health_impact)
## # A tibble: 985 × 4
## EVTYPE total_fatalities total_injuries total_harmed
## <chr> <dbl> <dbl> <dbl>
## 1 TORNADO 5633 91346 96979
## 2 EXCESSIVE HEAT 1903 6525 8428
## 3 TSTM WIND 504 6957 7461
## 4 FLOOD 470 6789 7259
## 5 LIGHTNING 816 5230 6046
## 6 HEAT 937 2100 3037
## 7 FLASH FLOOD 978 1777 2755
## 8 ICE STORM 89 1975 2064
## 9 THUNDERSTORM WIND 133 1488 1621
## 10 WINTER STORM 206 1321 1527
## # ℹ 975 more rows
top_health <- head(health_impact, 10)
print(top_health)
## # A tibble: 10 × 4
## EVTYPE total_fatalities total_injuries total_harmed
## <chr> <dbl> <dbl> <dbl>
## 1 TORNADO 5633 91346 96979
## 2 EXCESSIVE HEAT 1903 6525 8428
## 3 TSTM WIND 504 6957 7461
## 4 FLOOD 470 6789 7259
## 5 LIGHTNING 816 5230 6046
## 6 HEAT 937 2100 3037
## 7 FLASH FLOOD 978 1777 2755
## 8 ICE STORM 89 1975 2064
## 9 THUNDERSTORM WIND 133 1488 1621
## 10 WINTER STORM 206 1321 1527
This section visualizes the 10 event types with the highest combined number of fatalities and injuries. A bar chart shows that tornadoes cause far more harm than any other event type, followed by excessive heat, thunderstorm winds, and floods. The chart makes it easy to compare the relative severity of different event types and shows where targeted emergency response planning matters most.
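The plotting code for this chart is not shown in the report. A minimal sketch of how such a bar chart could be drawn with ggplot2 follows; the data frame is rebuilt from the totals printed above so the snippet runs on its own, and the object name `health_plot`, the fill colour, and the axis labels are illustrative assumptions rather than the author's original choices.

```r
library(ggplot2)

# Totals copied from the top_health table printed above
top_health <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING",
             "HEAT", "FLASH FLOOD", "ICE STORM", "THUNDERSTORM WIND",
             "WINTER STORM"),
  total_harmed = c(96979, 8428, 7461, 7259, 6046, 3037, 2755, 2064, 1621, 1527)
)

# Order the factor levels by harm so the largest bar ends up at the top
# once the axes are flipped
top_health$EVTYPE <- reorder(top_health$EVTYPE, top_health$total_harmed)

# Hypothetical styling; only the data come from the report
health_plot <- ggplot(top_health, aes(x = EVTYPE, y = total_harmed)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(
    title = "Top 10 Event Types by Fatalities and Injuries",
    x = "Event type",
    y = "Fatalities + injuries"
  )

print(health_plot)
```

Flipping the coordinates keeps the long event-type labels readable, and `reorder()` makes the ranking visible without consulting the table.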
This section evaluates the financial consequences of severe weather
events using the property damage (PROPDMG) and crop damage (CROPDMG)
recorded in the dataset. Each value is multiplied by the factor encoded
in its exponent field (PROPDMGEXP or CROPDMGEXP) to obtain a dollar
amount; for example, a PROPDMG of 25 with exponent "K" is 25,000
dollars. The damage is then summed per event type to identify which
events (floods, hurricanes, and tornadoes among them) have caused the
greatest financial losses. These totals can inform disaster cost
mitigation, insurance planning, and infrastructure resilience
efforts.
# Convert a damage exponent code to its numeric multiplier
exp_converter <- function(exp) {
  if (is.na(exp)) return(1)
  exp <- toupper(trimws(as.character(exp)))
  switch(exp,
    "K" = 1e3,  # thousands
    "M" = 1e6,  # millions
    "B" = 1e9,  # billions
    "H" = 1e2,  # hundreds
    "0" = 1,
    "1" = 10,
    "2" = 100,
    "3" = 1000,
    "4" = 10000,
    "5" = 1e5,
    "6" = 1e6,
    "7" = 1e7,
    "8" = 1e8,
    1)  # default: empty or unrecognized codes leave the value unchanged
}
# Apply the conversion and compute dollar amounts per record
storm_data <- storm_data %>%
  mutate(
    PROPDMGEXP = vapply(PROPDMGEXP, exp_converter, numeric(1), USE.NAMES = FALSE),
    CROPDMGEXP = vapply(CROPDMGEXP, exp_converter, numeric(1), USE.NAMES = FALSE),
    prop_damage = PROPDMG * PROPDMGEXP,
    crop_damage = CROPDMG * CROPDMGEXP,
    total_damage = prop_damage + crop_damage
  )
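Calling `exp_converter()` once per row works but is slow on roughly 900,000 records. As a sketch of a vectorized alternative, a named lookup vector can decode a whole column in one step. The names `exp_lookup` and `decode_exponent` are hypothetical, and the sketch assumes the same code-to-multiplier mapping as `exp_converter()`, with every other code falling back to 1.

```r
# Same multipliers as exp_converter(); names are the exponent codes
exp_lookup <- c(K = 1e3, M = 1e6, B = 1e9, H = 1e2,
                "0" = 1, "1" = 10, "2" = 100, "3" = 1e3, "4" = 1e4,
                "5" = 1e5, "6" = 1e6, "7" = 1e7, "8" = 1e8)

# Hypothetical helper: decode a whole vector of exponent codes at once
decode_exponent <- function(codes) {
  codes <- toupper(trimws(as.character(codes)))
  multiplier <- unname(exp_lookup[codes])
  # Empty, missing, or unrecognized codes do not match any name and give NA;
  # treat them as a multiplier of 1, like the default branch of exp_converter()
  multiplier[is.na(multiplier)] <- 1
  multiplier
}

# Small worked example with made-up inputs
damage <- c(25, 2.5, 10, 3)
codes <- c("K", "m", "", NA)
dollars <- damage * decode_exponent(codes)
print(dollars)
```

In the example, 25 with "K" becomes 25,000 dollars and 2.5 with "m" becomes 2,500,000 dollars, while the empty and missing codes leave the values unchanged.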
# Summarize total economic damage
economic_impact <- storm_data %>%
  group_by(EVTYPE) %>%
  summarise(
    total_prop = sum(prop_damage, na.rm = TRUE),
    total_crop = sum(crop_damage, na.rm = TRUE),
    total_econ_damage = sum(total_damage, na.rm = TRUE)
  ) %>%
  arrange(desc(total_econ_damage))
# View top 10 economic damage events
top_econ <- head(economic_impact, 10)
print(top_econ)
## # A tibble: 10 × 4
## EVTYPE total_prop total_crop total_econ_damage
## <chr> <dbl> <dbl> <dbl>
## 1 FLOOD 144657709807 5661968450 150319678257
## 2 HURRICANE/TYPHOON 69305840000 2607872800 71913712800
## 3 TORNADO 56947380676. 414953270 57362333946.
## 4 STORM SURGE 43323536000 5000 43323541000
## 5 HAIL 15735267513. 3025954473 18761221986.
## 6 FLASH FLOOD 16822673978. 1421317100 18243991078.
## 7 DROUGHT 1046106000 13972566000 15018672000
## 8 HURRICANE 11868319010 2741910000 14610229010
## 9 RIVER FLOOD 5118945500 5029459000 10148404500
## 10 ICE STORM 3944927860 5022113500 8967041360
This section presents a bar chart of the 10 event types that have caused the most economic damage, measured as the sum of property and crop losses. The chart uses the dollar values obtained after applying the exponent codes; the amounts are not adjusted for inflation. Floods, hurricanes, and tornadoes emerge as the costliest event types. By showing where financial losses concentrate, the plot supports decisions about disaster risk reduction and resource allocation.
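The code for this chart is also not included in the report. One possible sketch, assuming a stacked layout that separates property from crop losses, is shown here. The figures are copied from the table printed above (rounded to whole dollars as displayed), and the names `econ_long` and `econ_plot`, the billions scale, and the labels are illustrative assumptions.

```r
library(ggplot2)

# Totals copied from the top_econ table printed above
top_econ <- data.frame(
  EVTYPE = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE", "HAIL",
             "FLASH FLOOD", "DROUGHT", "HURRICANE", "RIVER FLOOD", "ICE STORM"),
  total_prop = c(144657709807, 69305840000, 56947380676, 43323536000,
                 15735267513, 16822673978, 1046106000, 11868319010,
                 5118945500, 3944927860),
  total_crop = c(5661968450, 2607872800, 414953270, 5000, 3025954473,
                 1421317100, 13972566000, 2741910000, 5029459000, 5022113500)
)

# Reshape to long form with base R: one row per event type and damage type
econ_long <- data.frame(
  EVTYPE = rep(top_econ$EVTYPE, times = 2),
  damage_type = rep(c("Property", "Crop"), each = nrow(top_econ)),
  billions = c(top_econ$total_prop, top_econ$total_crop) / 1e9
)

# Order event types by their combined damage
econ_long$EVTYPE <- reorder(econ_long$EVTYPE, econ_long$billions, FUN = sum)

econ_plot <- ggplot(econ_long, aes(x = EVTYPE, y = billions, fill = damage_type)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top 10 Event Types by Economic Damage",
    x = "Event type",
    y = "Damage (billions of dollars)",
    fill = "Damage type"
  )

print(econ_plot)
```

Stacking the two components shows that drought losses are almost entirely crop damage, whereas flood and hurricane losses are dominated by property damage.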
## Heatmap of Total Harm by Event Type and Year
This visualization shows how total harm (injuries plus fatalities) varies across years for the 10 most harmful event types, which helps in spotting temporal patterns and escalating threats.
# Aggregate total harm per year and event type.
# The result gets its own name so that storm_data keeps the raw records.
harm_by_year <- storm_data %>%
  mutate(
    BGN_DATE = as.Date(BGN_DATE, format = "%m/%d/%Y"),
    year = year(BGN_DATE),  # year() is available from data.table, loaded above
    total_harm = FATALITIES + INJURIES
  ) %>%
  group_by(year, EVTYPE) %>%
  summarise(total_harm = sum(total_harm, na.rm = TRUE), .groups = "drop")
# Select the top 10 event types by total harm overall
top_events <- harm_by_year %>%
  group_by(EVTYPE) %>%
  summarise(total = sum(total_harm)) %>%
  arrange(desc(total)) %>%
  slice_head(n = 10) %>%
  pull(EVTYPE)
# Keep only those event types
heatmap_data <- harm_by_year %>%
  filter(EVTYPE %in% top_events)
# Plot the heatmap
ggplot(heatmap_data, aes(x = year, y = EVTYPE, fill = total_harm)) +
  geom_tile(color = "white") +
  scale_fill_viridis_c(option = "inferno", name = "Harm") +
  theme_minimal() +
  labs(
    title = "Heatmap of Total Harm by Event Type and Year",
    x = "Year",
    y = "Event Type"
  )
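The date handling in the preparation step can be checked in isolation. The sketch below uses two BGN_DATE strings from the sample rows shown earlier and relies on the fact that `as.Date()` with an explicit format ignores trailing characters, so the "0:00:00" time part does not need to be removed first. It extracts the year with base R's `format()` instead of a package function; that substitution is an assumption made to keep the snippet free of dependencies.

```r
# Two raw BGN_DATE values as they appear in the dataset
raw_dates <- c("4/18/1950 0:00:00", "11/15/1951 0:00:00")

# Parse month/day/year; anything after the matched format is ignored
parsed <- as.Date(raw_dates, format = "%m/%d/%Y")

# Extract the calendar year as an integer
years <- as.integer(format(parsed, "%Y"))

print(parsed)
print(years)
```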
## Results
The analysis of the NOAA Storm Data dataset shows how severe weather events affect both population health and the economy. Tornadoes were the most harmful event type for public health, with 96,979 combined fatalities and injuries, more than ten times the total for excessive heat (8,428). Other major contributors to human harm included thunderstorm winds, floods, and lightning. In terms of economic impact, floods caused the greatest total damage (about 150 billion dollars), followed by hurricanes, tornadoes, and storm surges.

Property and crop damages were calculated by applying the exponent codes to obtain dollar amounts. Visualizations supported these findings: a heatmap illustrated trends in human harm across event types and years, while a line chart highlighted the financial damage patterns of the five costliest events over time. The data was cleaned and transformed, including handling missing values and standardizing date and damage formats, using R packages such as dplyr, ggplot2, and data.table.

Overall, the results identify which weather events pose the greatest threats and can guide public safety efforts, economic planning, and disaster preparedness.