In this analysis I found that the event type "TORNADO" in the storm database of the NOAA Satellite and Information Service is the most harmful event with respect to population health in the US, based on the weighted number of fatalities and injuries it caused between 1950 and 2011. Furthermore, I found that the event type "FLOOD" has the greatest economic consequences in the US: it was responsible for combined property and crop damage of about 150 billion USD.
In this document we evaluate which types of events in the storm database (NOAA Satellite and Information Service, 1950 to 2011) are most harmful with respect to population health across the United States. Second, we show which types of events have the greatest economic consequences.
To perform the necessary analysis, the following libraries were used:
library(R.utils)   # lets fread() read the compressed .bz2 file
library(data.table)
library(dplyr)
library(ggplot2)
library(scales)
The dataset used for this analysis was provided in the Coursera course project (week 4). It was loaded with the fread function from the data.table package, which reads large files efficiently:
stormdata <- fread("repdata_data_StormData.csv.bz2")
To assess the impact of weather events on population health and the economy, we first examine the column names of the dataset to identify relevant variables:
names(stormdata)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
From the column names, we identify the following variables of interest:
- For the impact on population health: FATALITIES and INJURIES.
- For the economic consequences: PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP.
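The PROPDMGEXP and CROPDMGEXP columns encode the magnitude of the damage amounts (for example "K" for thousands, "M" for millions and "B" for billions). We convert these codes into numeric multipliers and multiply them with PROPDMG and CROPDMG to obtain the actual damage values in USD. Any other code (such as an empty string, "+", "-", "?" or a digit) is treated as a multiplier of 1, which is a simplification.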
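# Convert the PROPDMGEXP and CROPDMGEXP codes to numeric multipliers:
# h/H = hundreds, k/K = thousands, m/M = millions, b/B = billions;
# every other code (including empty strings and NA) becomes 1
convert_exp <- function(exp) {
ifelse(exp %in% c('h', 'H'), 100,
ifelse(exp %in% c('k', 'K'), 1000,
ifelse(exp %in% c('m', 'M'), 1e6,
ifelse(exp %in% c('b', 'B'), 1e9, 1))))
}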
# Applying the conversion function to the data
stormdata <- stormdata %>%
mutate(PROPDMGEXP = convert_exp(PROPDMGEXP),
CROPDMGEXP = convert_exp(CROPDMGEXP),
TOTAL_PROPDMG = PROPDMG * PROPDMGEXP,
TOTAL_CROPDMG = CROPDMG * CROPDMGEXP,
TOTAL_ECONOMIC_DAMAGE = TOTAL_PROPDMG + TOTAL_CROPDMG)
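As a sanity check on the conversion, the sketch below expresses the same mapping as a lookup table and applies it to a few hand-made rows instead of the real dataset. The helper name exp_multiplier and the toy values are hypothetical and not part of the analysis; only base R is assumed.

```r
# Hypothetical helper: same mapping as convert_exp(), written as a lookup table.
# Codes not in the table ("", "+", "-", "?", digits, NA) fall back to 1.
exp_multiplier <- function(exp) {
  lookup <- c(H = 100, K = 1e3, M = 1e6, B = 1e9)
  m <- unname(lookup[toupper(exp)])  # unmatched codes give NA here
  m[is.na(m)] <- 1
  m
}

# Hand-made example rows (hypothetical, not taken from the dataset)
toy_damage <- data.frame(
  PROPDMG    = c(25, 2.5, 10, 5),
  PROPDMGEXP = c("K", "m", "", "?"),
  CROPDMG    = c(0, 1, 0, 3),
  CROPDMGEXP = c("", "B", "", "h"),
  stringsAsFactors = FALSE
)

# Same arithmetic as the mutate() step: amount * multiplier, then add both parts
toy_damage$TOTAL_PROPDMG <- toy_damage$PROPDMG * exp_multiplier(toy_damage$PROPDMGEXP)
toy_damage$TOTAL_CROPDMG <- toy_damage$CROPDMG * exp_multiplier(toy_damage$CROPDMGEXP)
toy_damage$TOTAL_ECONOMIC_DAMAGE <- toy_damage$TOTAL_PROPDMG + toy_damage$TOTAL_CROPDMG
print(toy_damage$TOTAL_ECONOMIC_DAMAGE)
```

For these rows the totals are 25,000, 1,002,500,000, 10 and 305 USD. A lookup table is easier to extend than nested ifelse calls if further codes ever need to be mapped.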
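To determine which types of weather events are most harmful to population health, we calculate a weighted sum of fatalities and injuries for each event type. Each fatality is weighted ten times as heavily as an injury to reflect its greater severity; this factor is our own choice rather than an established standard, so the ranking depends on it.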
The following code selects the relevant variables, applies the weighting, and summarizes the data by event type:
health <- stormdata %>%
select(EVTYPE, FATALITIES, INJURIES) %>%
mutate(fat_weight = FATALITIES * 10, inj_weight = INJURIES) %>%
mutate(fat_inj_sum = fat_weight + inj_weight) %>%
group_by(EVTYPE) %>%
summarise(fat_count = sum(fat_inj_sum)) %>%
arrange(desc(fat_count))
# Display the top 10 most harmful events
health_top10 <- head(health, 10)
health_top10
## # A tibble: 10 × 2
## EVTYPE fat_count
## <chr> <dbl>
## 1 TORNADO 147676
## 2 EXCESSIVE HEAT 25555
## 3 LIGHTNING 13390
## 4 TSTM WIND 11997
## 5 FLASH FLOOD 11557
## 6 FLOOD 11489
## 7 HEAT 11470
## 8 RIP CURRENT 3912
## 9 HIGH WIND 3617
## 10 WINTER STORM 3381
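Because the ranking depends on the weight of 10, it can help to see how a different weight reorders events. The following base R sketch uses three made-up event types with hypothetical counts (not values from the dataset); rank_by_harm is an illustrative helper, not part of the analysis above.

```r
# Hypothetical counts, one row per (made-up) event type
toy_health <- data.frame(
  EVTYPE     = c("EVENT_A", "EVENT_B", "EVENT_C"),
  FATALITIES = c(2, 40, 0),
  INJURIES   = c(300, 10, 150),
  stringsAsFactors = FALSE
)

# Rank event types by the weighted score FATALITIES * w + INJURIES
rank_by_harm <- function(df, w) {
  score <- df$FATALITIES * w + df$INJURIES
  df$EVTYPE[order(score, decreasing = TRUE)]
}

print(rank_by_harm(toy_health, w = 1))   # unweighted
print(rank_by_harm(toy_health, w = 10))  # weight used in this analysis
```

With w = 1 the order is EVENT_A, EVENT_C, EVENT_B; with w = 10, EVENT_B moves to the top because its harm is dominated by fatalities.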
The following plot visualizes the top 10 most harmful weather events in terms of their impact on population health:
ggplot(health_top10, aes(x = reorder(EVTYPE, fat_count), y = fat_count)) +
geom_bar(stat = "identity", fill = "steelblue") +
coord_flip() +
labs(title = "Top 10 Weather Events by Health Impact",
x = "Event Type",
y = "Weighted Fatalities and Injuries") +
scale_y_continuous(labels = scales::comma) +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5))
To evaluate the economic impact of weather events, we sum the converted property and crop damage (TOTAL_ECONOMIC_DAMAGE) by event type.
economic <- stormdata %>%
group_by(EVTYPE) %>%
summarise(total_damage = sum(TOTAL_ECONOMIC_DAMAGE)) %>%
arrange(desc(total_damage))
# Display the top 10 events by economic damage
economic_top10 <- head(economic, 10)
economic_top10
## # A tibble: 10 × 2
## EVTYPE total_damage
## <chr> <dbl>
## 1 FLOOD 150319678257
## 2 HURRICANE/TYPHOON 71913712800
## 3 TORNADO 57352114049.
## 4 STORM SURGE 43323541000
## 5 HAIL 18758222016.
## 6 FLASH FLOOD 17562129167.
## 7 DROUGHT 15018672000
## 8 HURRICANE 14610229010
## 9 RIVER FLOOD 10148404500
## 10 ICE STORM 8967041360
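The damage totals are easier to read in billions of USD. This short base R sketch takes the three largest totals as printed in the table above (the printout drops decimals) and divides them by 1e9.

```r
# Top three totals copied from the table above (USD)
damage_usd <- c(
  FLOOD               = 150319678257,
  "HURRICANE/TYPHOON" = 71913712800,
  TORNADO             = 57352114049
)

# Express in billions of USD (1 billion = 1e9), rounded to one decimal
damage_bn <- round(damage_usd / 1e9, 1)
print(damage_bn)
```

This gives about 150.3, 71.9 and 57.4 billion USD, so the flood total is on the order of 150 billion USD, not trillion.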
The following plot visualizes the top 10 weather events in terms of economic damage:
ggplot(economic_top10, aes(x = reorder(EVTYPE, total_damage), y = total_damage)) +
geom_bar(stat = "identity", fill = "darkred") +
coord_flip() +
labs(title = "Top 10 Weather Events by Economic Damage",
x = "Event Type",
y = "Total Economic Damage (USD)") +
scale_y_continuous(labels = scales::comma) +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5))
The analysis identified the weather events with the largest impact on population health and on the economy. Tornadoes have by far the highest weighted score of fatalities and injuries, followed by excessive heat, lightning, thunderstorm winds (TSTM WIND) and floods. Floods (about 150 billion USD) and hurricanes/typhoons (about 72 billion USD) account for the largest economic losses. The findings underscore the importance of targeted disaster preparedness and mitigation efforts.