This project examines storm data obtained from the U.S. National Oceanic and Atmospheric Administration (NOAA), focusing on variables related to disaster type, public health effects, and economic losses. It seeks to answer two primary questions: (1) Which event types (as classified under the EVTYPE variable) pose the greatest threat to population health across the United States? (2) Which event types incur the most significant economic costs nationwide? Our analysis reveals that tornadoes are the most detrimental to population health, while floods result in the highest economic damage.
Loading necessary libraries and setting options
options(digits = 1) # Suggest 1 significant digit when printing (significant digits, not decimal places)
options(scipen = 999) # Turn off scientific notation for numbers
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
To load the data file, we employ the read.csv() function in conjunction with bzfile() to handle the compressed file format.
filename <- "~/R Projects/RepData_PeerAssessment2/repdata_data_StormData.csv.bz2"
data <- read.csv(bzfile(filename))
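As a quick check of how bzfile() and read.csv() work together, the following minimal sketch writes a tiny bzip2-compressed CSV to a temporary file and reads it back. The toy data frame and the temporary file are assumptions made purely for illustration; they are not part of the NOAA dataset.

```r
# Hypothetical toy data standing in for the NOAA storm file
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))

# Write a bzip2-compressed CSV to a temporary file
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back the same way the report reads the real file
roundtrip <- read.csv(bzfile(tmp))
print(roundtrip)

# Remove the temporary file
unlink(tmp)
```

The connection returned by bzfile() handles decompression, so read.csv() sees ordinary CSV text.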
For the population health measure, we aggregate fatalities and injuries into a single total and assign this value to a newly created variable/column.
data$populationHealth <- data$FATALITIES + data$INJURIES
For economic impact, two additional variables, PROPDMGEXP and CROPDMGEXP, give the multiplier for property and crop damage, respectively. In these columns, “K” denotes thousands, “M” represents millions, and “B” signifies billions. To convert these alphabetic codes into their corresponding numeric values, we apply mutate() with case_when() from the dplyr package. Any other value, including blanks, is assigned a multiplier of 1. The match is exact and case-sensitive, so only uppercase “K”, “M”, and “B” are converted.
data <- data %>%
mutate(PROPDMGEXP = case_when(
PROPDMGEXP == "K" ~ 1000,
PROPDMGEXP == "M" ~ 1000000,
PROPDMGEXP == "B" ~ 1000000000,
TRUE ~ 1
))
data <- data %>%
mutate(CROPDMGEXP = case_when(
CROPDMGEXP == "K" ~ 1000,
CROPDMGEXP == "M" ~ 1000000,
CROPDMGEXP == "B" ~ 1000000000,
TRUE ~ 1
))
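To see how this mapping treats different codes, here is a minimal sketch on a toy vector. The codes below (a blank, a lowercase "m", and a digit) are hypothetical inputs chosen for illustration, and the result is stored in a new column named multiplier, which is also an assumption, so that input and output can be compared side by side.

```r
library(dplyr)

# Hypothetical exponent codes, including a blank, a lowercase letter and a digit
toy <- data.frame(PROPDMGEXP = c("K", "M", "B", "", "m", "5"),
                  stringsAsFactors = FALSE)

# Same mapping as in the report, written to a separate column
toy <- toy %>%
  mutate(multiplier = case_when(
    PROPDMGEXP == "K" ~ 1000,
    PROPDMGEXP == "M" ~ 1000000,
    PROPDMGEXP == "B" ~ 1000000000,
    TRUE ~ 1  # every code not matched above falls through to 1
  ))

print(toy)
```

Only the exact uppercase codes receive a multiplier greater than 1. If the raw data contained lowercase variants, one option would be to wrap the column in toupper() before matching, though that would change the totals reported here.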
With the numeric multipliers in place, we compute the total economic impact by multiplying the property and crop damage values by their respective factors and then summing the two products.
data$economicImpact <- data$PROPDMG * data$PROPDMGEXP + data$CROPDMG * data$CROPDMGEXP
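As a worked example of this formula, consider a single hypothetical record with a property damage value of 2.5 coded in thousands and a crop damage value of 1 coded in millions. The numbers are invented for illustration only.

```r
# Hypothetical single record (values are not taken from the dataset)
PROPDMG <- 2.5
PROPDMGEXP <- 1000      # "K" after conversion
CROPDMG <- 1
CROPDMGEXP <- 1000000   # "M" after conversion

# Same formula as in the report: 2.5 * 1000 + 1 * 1000000
economicImpact <- PROPDMG * PROPDMGEXP + CROPDMG * CROPDMGEXP
print(economicImpact)
```

The record contributes 2,500 dollars of property damage and 1,000,000 dollars of crop damage, for a total of 1,002,500 dollars.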
We then build two summary tables, one per question: the total population health impact by event type and the total economic damage by event type, each sorted from highest to lowest.
# Calculate the total of fatalities plus injuries by event type
health <- data %>%
group_by(EVTYPE) %>%
summarize(health = sum(populationHealth, na.rm = TRUE))
# Sort the event types from highest to lowest combined fatalities and injuries
health <- arrange(health, desc(health))
# Calculate the total economic damage by event type
economy <- data %>%
group_by(EVTYPE) %>%
summarize(economy = sum(economicImpact, na.rm = TRUE))
# Sorting the event types from highest to lowest cost of damage
economy <- arrange(economy, desc(economy))
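The group, sum, and sort pattern used above can be sketched on a small toy table. The rows below, including the missing value, are hypothetical and serve only to show what each step does.

```r
library(dplyr)

# Hypothetical records; one value is missing to show the effect of na.rm = TRUE
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  populationHealth = c(10, 4, 5, NA),
  stringsAsFactors = FALSE
)

# Sum by event type, then sort from highest to lowest
toy_health <- toy %>%
  group_by(EVTYPE) %>%
  summarize(health = sum(populationHealth, na.rm = TRUE)) %>%
  arrange(desc(health))

print(toy_health)
```

With na.rm = TRUE, a group whose only value is missing sums to 0 rather than NA, so it still appears in the table, at the bottom.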
View the top 10 event types with the highest impact on population health.
head(health, n=10)
## # A tibble: 10 × 2
## EVTYPE health
## <chr> <dbl>
## 1 TORNADO 96979
## 2 EXCESSIVE HEAT 8428
## 3 TSTM WIND 7461
## 4 FLOOD 7259
## 5 LIGHTNING 6046
## 6 HEAT 3037
## 7 FLASH FLOOD 2755
## 8 ICE STORM 2064
## 9 THUNDERSTORM WIND 1621
## 10 WINTER STORM 1527
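Tornadoes have by far the greatest impact on population health, with nearly 100,000 fatalities and injuries combined (96,979). They are followed, in descending order, by excessive heat, thunderstorm wind (recorded as TSTM WIND), floods, lightning, heat, flash floods, ice storms, thunderstorm wind (recorded as THUNDERSTORM WIND), and winter storms (refer to Figure 1). EVTYPE labels are not standardized, which is why thunderstorm wind appears twice under different spellings.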
ggplot(health %>% head(n = 10), aes(x = reorder(EVTYPE, -health), y = health, fill = EVTYPE)) +
geom_bar(stat = "identity") +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) +
xlab("") +
ylab("Fatalities and Injuries") +
ggtitle("Top Ten Event Types by Population Health Impact")
View the top 10 event types with the highest economic damage.
head(economy, n=10)
## # A tibble: 10 × 2
## EVTYPE economy
## <chr> <dbl>
## 1 FLOOD 150319678257
## 2 HURRICANE/TYPHOON 71913712800
## 3 TORNADO 57340614060.
## 4 STORM SURGE 43323541000
## 5 HAIL 18752904943.
## 6 FLASH FLOOD 17562129167.
## 7 DROUGHT 15018672000
## 8 HURRICANE 14610229010
## 9 RIVER FLOOD 10148404500
## 10 ICE STORM 8967041360
Floods cause the highest economic losses, accounting for over $150 billion in damages. The remaining top event types, in descending order, are hurricanes/typhoons (a single combined category), tornadoes, storm surges, hail, flash floods, drought, hurricanes (recorded separately as HURRICANE), river floods, and ice storms (refer to Figure 2).
ggplot(economy %>% head(n = 10), aes(x = reorder(EVTYPE, -economy), y = economy, fill = EVTYPE)) +
geom_bar(stat = "identity") +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) +
xlab("") +
ylab("Damage Cost") +
ggtitle("Top Ten Event Types by Damage Cost in Dollars")