This project analyzes the storm dataset from the U.S. National Oceanic and Atmospheric Administration (NOAA), using its event type, population health (fatalities and injuries), and economic damage (property and crop) variables. We address two questions:
1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?
Tornadoes account for the highest combined number of fatalities and injuries, and floods account for the highest total economic damage.
We first use read.csv() with the bzfile() function to load the dataset.
storm_data <- read.csv(bzfile("repdata_data_StormData.csv.bz2"))
To measure the impact on population health, we add the FATALITIES and INJURIES values for each record and store the sum in a new column, POP_HEALTH.
storm_data$POP_HEALTH <- storm_data$FATALITIES + storm_data$INJURIES
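To make the new column concrete, here is a minimal sketch on a made-up three-row data frame. The object `toy` and its values are illustrative and not taken from the NOAA data. Like the code above, it assumes that fatalities and injuries are weighted equally.

```r
# Toy data: hypothetical values, not from the NOAA dataset
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(2, 0, 0),
  INJURIES   = c(15, 3, 0)
)

# Same operation as in the report: one combined casualty count per record
toy$POP_HEALTH <- toy$FATALITIES + toy$INJURIES

print(toy$POP_HEALTH)
```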
For economic impact, note that two additional columns, PROPDMGEXP and CROPDMGEXP, hold the multipliers for the property damage (PROPDMG) and crop damage (CROPDMG) values. Here, “K” represents thousands, “M” represents millions, and “B” represents billions. To replace these letters with their numerical values, we use mutate() and case_when() from the dplyr package. Every other value, including blanks and missing values, is mapped to 0, so those records contribute nothing to the damage totals.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
storm_data <- storm_data %>%
  mutate(PROPDMGEXP = case_when(
    PROPDMGEXP == "K" ~ 1000,
    PROPDMGEXP == "M" ~ 1000000,
    PROPDMGEXP == "B" ~ 1000000000,
    TRUE ~ 0
  ))
storm_data <- storm_data %>%
  mutate(CROPDMGEXP = case_when(
    CROPDMGEXP == "K" ~ 1000,
    CROPDMGEXP == "M" ~ 1000000,
    CROPDMGEXP == "B" ~ 1000000000,
    TRUE ~ 0
  ))
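The recoding above matches only the upper-case codes "K", "M", and "B". As a hedged alternative sketch, a named lookup vector can hold the same mapping in one place and, if desired, treat lower-case codes like their upper-case counterparts via toupper(). The helper name `exp_to_multiplier` and the toy input are hypothetical. Whether lower-case or other codes should count is a decision for the analyst, and this is not the method used in this report.

```r
# Hypothetical helper: map exponent codes to numeric multipliers.
# Codes not in the lookup (blank, "?", NA, ...) become 0, mirroring
# the TRUE ~ 0 fallback in the case_when() calls of the report.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  # Assumption: "k"/"m"/"b" mean the same as "K"/"M"/"B".
  # This differs from the exact-match recoding in the report.
  key <- toupper(as.character(code))
  mult <- unname(lookup[match(key, names(lookup))])
  mult[is.na(mult)] <- 0
  mult
}

toy_codes <- c("K", "m", "B", "", "?", NA)
print(exp_to_multiplier(toy_codes))
```

Running table() on the raw PROPDMGEXP and CROPDMGEXP columns before recoding would show how many records fall outside the three recognized codes.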
Now we can compute the total economic impact of each record by multiplying the property damage and crop damage values by their respective multipliers and adding the two products.
storm_data$ECON_CONSEQ <- storm_data$PROPDMG * storm_data$PROPDMGEXP + storm_data$CROPDMG * storm_data$CROPDMGEXP
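As a worked example with made-up numbers: a record with PROPDMG = 25 and a multiplier of 1,000, plus CROPDMG = 2 and a multiplier of 1,000,000, gives 25 * 1,000 + 2 * 1,000,000 = 2,025,000. The sketch below reproduces this on a toy data frame (all values hypothetical) and also shows that a multiplier of 0 drops that component from the total.

```r
# Toy records with already-recoded multipliers (hypothetical values)
toy <- data.frame(
  PROPDMG    = c(25, 10),
  PROPDMGEXP = c(1e3, 0),    # 0 means the original code was not K, M, or B
  CROPDMG    = c(2, 5),
  CROPDMGEXP = c(1e6, 1e3)
)

# Same formula as in the report
toy$ECON_CONSEQ <- toy$PROPDMG * toy$PROPDMGEXP + toy$CROPDMG * toy$CROPDMGEXP

print(toy$ECON_CONSEQ)
```

In the second row, the property damage of 10 contributes nothing because its multiplier is 0, so only the crop damage of 5,000 remains.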
We then create a summary of the data that only considers the disaster type, the total population health impact, and the total economic impact.
tot_pop <- storm_data %>%
  group_by(EVTYPE) %>%
  summarize(TOT_POP_HEALTH = sum(POP_HEALTH, na.rm = TRUE))
tot_pop <- arrange(tot_pop, desc(TOT_POP_HEALTH))
tot_econ <- storm_data %>%
  group_by(EVTYPE) %>%
  summarize(TOT_ECON_CONSEQ = sum(ECON_CONSEQ, na.rm = TRUE))
tot_econ <- arrange(tot_econ, desc(TOT_ECON_CONSEQ))
tot_summary <- full_join(tot_pop, tot_econ)
## Joining with `by = join_by(EVTYPE)`
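The group, sum, and sort pattern used above can be cross-checked without dplyr. The following is a minimal base R sketch on a toy data frame (values hypothetical, no missing values). It uses aggregate() and order() in place of group_by(), summarize(), and arrange(), so it is an equivalent formulation for illustration rather than the code used in this report.

```r
# Toy records: hypothetical values, not from the NOAA dataset
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  POP_HEALTH = c(17, 3, 5, 0, 1)
)

# Sum POP_HEALTH within each event type
tot <- aggregate(POP_HEALTH ~ EVTYPE, data = toy, FUN = sum)
names(tot)[2] <- "TOT_POP_HEALTH"

# Sort from largest to smallest total, then keep the top two rows
tot <- tot[order(-tot$TOT_POP_HEALTH), ]
top2 <- head(tot, 2)

print(top2)
```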
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.5.2
ten_highest_pop <- tot_pop[1:10, ]
ggplot(ten_highest_pop, aes(x = EVTYPE, y = TOT_POP_HEALTH)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) +
  xlab("") +
  ylab("Fatalities and injuries") +
  ggtitle("Ten Most Impactful Disaster Types on Population Health")
For population health, tornadoes account for the highest combined number of fatalities and injuries.
ten_highest_econ <- tot_econ[1:10, ]
ggplot(ten_highest_econ, aes(x = EVTYPE, y = TOT_ECON_CONSEQ)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) +
  xlab("") +
  ylab("Economic damage (USD)") +
  ggtitle("Ten Most Impactful Disaster Types on Economic Damages")
For economic impact, floods account for the highest total damage (property and crop damage combined).
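Because ggplot2 places a character x variable in alphabetical order, the bars in the plots above are not sorted by size. One possible refinement, sketched here on hypothetical totals, is to convert EVTYPE into a factor whose levels follow the totals using reorder(). The data frame `toy` and its values are made up, and the plotting step is built only if ggplot2 is installed.

```r
# Hypothetical totals, not from the NOAA dataset
toy <- data.frame(
  EVTYPE          = c("FLOOD", "HAIL", "TORNADO"),
  TOT_ECON_CONSEQ = c(4e9, 1e9, 2e9)
)

# reorder() sorts factor levels by the given numeric values (ascending);
# negating the totals puts the largest total first
toy$EVTYPE <- reorder(toy$EVTYPE, -toy$TOT_ECON_CONSEQ)
print(levels(toy$EVTYPE))

# Build the bar chart only when ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = EVTYPE, y = TOT_ECON_CONSEQ)) +
    geom_col() +
    xlab("") +
    ylab("Economic damage (USD)")
  # print(p) would draw the chart with bars in descending order
}
```

With the levels ordered this way, the tallest bar appears at the left, which makes the leading event type visible at a glance.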