This report analyzes NOAA’s Storm Data to evaluate the effects of weather events on population health and economic loss in the United States. The data are aggregated by event type to calculate total fatalities, injuries, and monetary damage.
Bar plots show that tornadoes are the deadliest events (fatalities) and also have the largest effect on population health (fatalities plus injuries), while floods cause the greatest economic damage. Percentage analyses underscore how large a share of the overall impact these events account for. The findings offer insights for improving disaster management and policy decisions.
The data were downloaded as a compressed CSV file from the web, read into a data frame, and inspected to guide the later processing steps.
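#load the packages used throughout the report (pipes, grouping, plotting)
library(dplyr)
library(ggplot2)
#download and read in data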
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destfile <- "StormData.csv.bz2" # fixed file name
#download the file if it doesn't already exist
if(!file.exists(destfile)) {
download.file(url, destfile, mode = "wb")
}
#read the compressed CSV file
stormdata <- read.csv(bzfile(destfile))
#inspect data to aid later working
str(stormdata)
names(stormdata)
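To show how `bzfile()` is used above, the sketch below writes a tiny, made-up data frame through a bzip2 connection and reads it back with `read.csv(bzfile(...))`. The temporary file and the toy columns are hypothetical stand-ins for `StormData.csv.bz2`, and the sketch uses only base R.

```r
# Hypothetical toy data standing in for the NOAA file
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                  FATALITIES = c(3, 1),
                  stringsAsFactors = FALSE)

# Temporary path with the same .csv.bz2 extension as destfile
tmp <- tempfile(fileext = ".csv.bz2")

# Write the CSV through a bzip2-compressing connection
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back the same way the report does: read.csv() on a bzfile() connection
roundtrip <- read.csv(bzfile(tmp), stringsAsFactors = FALSE)
str(roundtrip)

# Clean up the temporary file
unlink(tmp)
```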
The original data were summarized in three ways.
First, the data are aggregated into total fatalities per weather event type (EVTYPE), and the 5 deadliest event types are selected.
#group fatalities by event type
fatalities_by_event <- stormdata %>%
group_by(EVTYPE) %>%
summarise(total_fatalities = sum(FATALITIES, na.rm = TRUE)) %>%
filter(total_fatalities > 0) %>%
arrange(desc(total_fatalities))
#select the top 5
top_fatalities <- fatalities_by_event %>% top_n(5, total_fatalities)
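For readers less familiar with dplyr, the same group, sum, filter, and sort pattern can be sketched in base R with `aggregate()`. The toy data frame below is hypothetical and is not NOAA data. To keep the example small, it selects the top 2 rows instead of the top 5.

```r
# Hypothetical toy data; the NA entry stands in for a missing count
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(5, 2, 3, 4, NA, 0),
  stringsAsFactors = FALSE
)

# Total fatalities per event type. The formula interface drops rows with a
# missing FATALITIES value (na.action = na.omit), which here has the same
# effect as sum(..., na.rm = TRUE).
by_event <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)
names(by_event)[2] <- "total_fatalities"

# Keep event types with at least one fatality, largest totals first
by_event <- by_event[by_event$total_fatalities > 0, ]
by_event <- by_event[order(by_event$total_fatalities, decreasing = TRUE), ]

# Top 2 rows of the toy data (the report keeps the top 5)
top_toy <- head(by_event, 2)
print(top_toy)
```

The two approaches differ in one design detail. `head()` keeps exactly n rows, whereas dplyr's `top_n()` keeps extra rows when several values tie at the cutoff.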
Second, the data are aggregated into combined fatalities and injuries per event type, and the 5 most harmful event types are selected.
#find most dangerous events by combining fatalities and injuries
health_impact <- stormdata %>%
group_by(EVTYPE) %>%
summarise(total_health = sum(FATALITIES + INJURIES, na.rm = TRUE)) %>%
filter(total_health > 0) %>%
arrange(desc(total_health))
#select the top 5
top_health <- health_impact %>% top_n(5, total_health)
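The expression `sum(FATALITIES + INJURIES, na.rm = TRUE)` has one subtlety. R adds the two columns row by row before `na.rm` is applied, so if either column were missing in a row, that row's known value would be dropped as well. The toy vectors below are hypothetical and only illustrate the difference. This sketch does not check whether the Storm Data columns actually contain `NA` values.

```r
# Hypothetical toy counts; the second event has an unknown injury count
fatalities <- c(2, 1, 0)
injuries   <- c(10, NA, 3)

# Pattern used in the report: add row by row, then sum with na.rm = TRUE.
# 1 + NA is NA, so the second row's known fatality is dropped too.
combined_first <- sum(fatalities + injuries, na.rm = TRUE)

# Alternative: remove NAs within each column, then add the column totals.
separate_sums <- sum(fatalities, na.rm = TRUE) + sum(injuries, na.rm = TRUE)

c(combined_first = combined_first, separate_sums = separate_sums)
```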
This data will be used to answer which types of events are most harmful to population health.
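Lastly, the data are summarized by total economic damage, defined as property damage plus crop damage.
Property and crop damage are first converted to dollar amounts, because the original data do not store them directly. Each damage value (PROPDMG, CROPDMG) is paired with a magnitude code (PROPDMGEXP, CROPDMGEXP), such as K for thousands, M for millions, or B for billions.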
#convert magnitude codes to multipliers; any unrecognized code defaults to 1
exp_to_multiplier <- function(exp) {
if(exp %in% c("", "0")) {
1
} else if(exp %in% c("K", "k")) {
1e3
} else if(exp %in% c("M", "m")) {
1e6
} else if(exp %in% c("B", "b")) {
1e9
} else {
1
}
}
#create multiplier columns for property and crop damage
stormdata$prop_multiplier <- sapply(stormdata$PROPDMGEXP, exp_to_multiplier)
stormdata$crop_multiplier <- sapply(stormdata$CROPDMGEXP, exp_to_multiplier)
#calculate property and crop damages in dollars
stormdata$prop_dollars <- stormdata$PROPDMG * stormdata$prop_multiplier
stormdata$crop_dollars <- stormdata$CROPDMG * stormdata$crop_multiplier
#find total economic damage and create column for this
stormdata$total_damage <- stormdata$prop_dollars + stormdata$crop_dollars
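`exp_to_multiplier()` is applied one element at a time with `sapply()`, which can be slow on a data set of this size. The sketch below is a vectorized alternative that uses a named lookup vector. It assumes the same mapping is wanted: K, M, and B in either case, with everything else treated as 1. The names `multipliers` and `exp_to_multiplier_vec` are hypothetical.

```r
# Named lookup vector: magnitude code -> multiplier
# (same mapping as exp_to_multiplier(); other codes fall back to 1)
multipliers <- c(K = 1e3, M = 1e6, B = 1e9)

exp_to_multiplier_vec <- function(exp) {
  code <- toupper(as.character(exp))  # handles factors and lower-case codes
  out <- unname(multipliers[code])    # NA for codes not in the lookup vector
  out[is.na(out)] <- 1                # "", "0", NA, and unknown codes -> 1
  out
}

codes <- c("K", "m", "", "0", "B", "H", "?")
exp_to_multiplier_vec(codes)
```

It would be used as `stormdata$prop_multiplier <- exp_to_multiplier_vec(stormdata$PROPDMGEXP)`, and likewise for `CROPDMGEXP`. Like the original function, it treats codes such as H (often read as hundreds) as 1.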
Then, the total property and crop damage per event type is calculated, and the 10 most costly event types are selected.
#find total damage by event type
damage_by_event <- stormdata %>%
group_by(EVTYPE) %>%
summarise(total_damage = sum(total_damage, na.rm = TRUE)) %>%
filter(total_damage > 0) %>%
arrange(desc(total_damage))
#select top 10 events
top_damage <- damage_by_event %>% top_n(10, total_damage)
This data will be used to answer which types of events have the greatest economic consequence.
We address population health in two parts.
First, we look only at fatalities from each weather event type to determine which events are the deadliest overall.
ggplot(top_fatalities, aes(x = reorder(EVTYPE, total_fatalities), y = total_fatalities)) +
geom_bar(stat = "identity", fill = "red") +
coord_flip() +
labs(title = "Top 5 Events by Fatalities",
x = "Weather Event",
y = "Total Fatalities")
Tornadoes cause the most deaths overall, and excessive heat is second.
Knowing each event type's percentage of all storm-related fatalities helps put its impact in perspective.
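The percentages in the tables can be written as follows. Here $F_e$ is the total for event type $e$: fatalities in this table, fatalities plus injuries or dollar damage in the later ones. The denominator sums over every event type in the data, not only the event types shown, which is why the listed percentages do not add up to 100.

$$
p_e = 100 \times \frac{F_e}{\sum_{e'} F_{e'}}
$$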
#find overall fatalities from the dataset
overall_fatalities <- sum(stormdata$FATALITIES, na.rm = TRUE)
#find the percentage contribution for each top event
top_fatalities <- top_fatalities %>%
mutate(percentage = total_fatalities / overall_fatalities * 100)
#make a pretty table for the report
fatalities_table <- top_fatalities %>%
mutate(Percentage = round(percentage, 2)) %>%
select(Weather_Event = EVTYPE, Percentage) %>%
head(3)
#print table
knitr::kable(fatalities_table)
| Weather_Event | Percentage |
|---|---|
| TORNADO | 37.19 |
| EXCESSIVE HEAT | 12.57 |
| FLASH FLOOD | 6.46 |
This confirms that tornadoes account for roughly 37% of all weather-related fatalities, nearly three times the share of excessive heat.
Next, we look at fatalities and injuries combined to see which weather events have the largest impact on population health.
ggplot(top_health, aes(x = reorder(EVTYPE, total_health), y = total_health)) +
geom_bar(stat = "identity", fill = "blue") +
coord_flip() +
labs(title = "Top 5 Events by Total Health Impact",
x = "Weather Event",
y = "Total Health Impact (Fatalities + Injuries)")
This shows that tornadoes are the most harmful to overall health. Excessive heat is next, but it accounts for a much smaller share of the impact on population health.
To further demonstrate the effect these events have, the percentage of overall health impact is calculated.
#find overall fatalities plus injuries from the dataset
overall_health <- sum(stormdata$FATALITIES + stormdata$INJURIES, na.rm = TRUE)
#find the percentage contribution for each top event
top_health <- top_health %>%
mutate(percentage = total_health / overall_health * 100)
#make a pretty table for the report
health_table <- top_health %>%
mutate(Percentage = round(percentage, 2)) %>%
select(Weather_Event = EVTYPE, Percentage) %>%
head(5)
#print table
knitr::kable(health_table)
| Weather_Event | Percentage |
|---|---|
| TORNADO | 62.30 |
| EXCESSIVE HEAT | 5.41 |
| TSTM WIND | 4.79 |
| FLOOD | 4.66 |
| LIGHTNING | 3.88 |
Tornadoes account for about 62% of all storm-related injuries and deaths, while no other single event type accounts for more than about 5.4%.
This confirms that tornadoes have the largest impact to population health of all storm related events.
Now we address economic consequences by looking at which weather events cause the most economic damage to property and crops.
# Create a horizontal bar plot of the top events by economic damage
ggplot(top_damage, aes(x = reorder(EVTYPE, total_damage), y = total_damage)) +
geom_bar(stat = "identity", fill = "orange") +
coord_flip() +
labs(title = "Top 10 Events by Economic Consequences",
x = "Weather Event Type",
y = "Total Damage (USD)") +
scale_y_continuous(labels = scales::dollar_format())
Floods cause the largest economic damage, followed by hurricanes/typhoons and then tornadoes.
Notably, flood damage appears to be more than double the damage caused by any other single event type.
We will confirm this with a percentage analysis of overall damage.
#find overall economic damage across all events
overall_damage <- sum(stormdata$total_damage, na.rm = TRUE)
#find percentage contribution for each top event
top_damage <- top_damage %>%
mutate(percentage = total_damage / overall_damage * 100)
#make it look pretty in a table
damage_table <- top_damage %>%
mutate(Percentage = round(percentage, 2)) %>%
select(Weather_Event = EVTYPE, Percentage) %>%
head(5)
#print table
knitr::kable(damage_table)
| Weather_Event | Percentage |
|---|---|
| FLOOD | 31.55 |
| HURRICANE/TYPHOON | 15.09 |
| TORNADO | 12.04 |
| STORM SURGE | 9.09 |
| HAIL | 3.94 |
This confirms that floods have the largest economic impact, about 32% of the total, more than double that of any other weather event.
Hurricanes/typhoons and tornadoes combined account for about 27%, almost as large an economic impact as floods.
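As a quick arithmetic check of these two claims, the sketch below uses the rounded percentages copied from the economic-damage table. It does not recompute anything from the raw data.

```r
# Rounded percentages copied from the economic-damage table
flood     <- 31.55
hurricane <- 15.09
tornado   <- 12.04

# Flood share relative to the next-largest event type
flood / hurricane

# Combined share of hurricanes/typhoons and tornadoes
hurricane + tornado
```

The ratio comes out at about 2.09, and the combined share is 27.13%.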