In this analysis, we look at NOAA data to draw generalizations about the destructiveness of weather incidents by type. We use fatality and injury counts as metrics for impact on population health, and property and crop damage as metrics for economic consequences. We total these measures by NOAA event type to find and visualize the 10 most fatal and the 10 most damaging types of weather event.
When deciding how best to prepare for harmful weather events with limited resources, it is important to understand which events are the most destructive so that they can be prioritized appropriately. Within that question are two separate but related questions: which weather events are most harmful with respect to population health, and which have the greatest economic consequences. In this analysis, we use NOAA data to visualize the ten most destructive weather events for each of these questions.
The data we use are from the National Weather Service and are publicly available as a compressed CSV, along with supporting documentation and an FAQ. We read the data in using the typical read.csv method, which reads the bzip2-compressed file directly.
stormData <- read.csv("StormData.csv.bz2")
The NOAA weather incident data uses two fields to describe each estimate of financial damage: a DMG value giving the quantity and a DMGEXP value giving the unit that quantity is measured in. For example, if an incident's PROPDMG is 123 and its PROPDMGEXP is K, the incident did an estimated $123,000 in property damage.
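The arithmetic in that example can be sketched in base R as follows. The `multiplier` vector is a hypothetical stand-in that covers only a few codes; it is not the full look-up table used in this analysis.

```r
# Hypothetical mini look-up: unit code -> dollar multiplier (illustration only)
multiplier <- c(H = 100, K = 1000, M = 1e6, B = 1e9)

propdmg    <- 123   # quantity reported in PROPDMG
propdmgexp <- "K"   # unit code reported in PROPDMGEXP

# Dollars = quantity * multiplier for the unit code
propcash <- propdmg * multiplier[[propdmgexp]]
propcash
```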
The DMGEXP codes are not fully documented. They were deciphered by David Hood, whose findings Eddie Song published on RPubs, and that document forms the basis of our handling of these values. In this analysis, we use a hardcoded look-up table to convert the property and crop damage measures into dollars.
library(tibble)
library(dplyr)
# tribble() is the current name for the deprecated frame_data()
expToMultiplier <- tribble(
~DMGEXP, ~Multiplier,
"H", 100,
"h", 100,
"K", 1000,
"k", 1000,
"M", 1000000,
"m", 1000000,
"B", 1000000000,
"b", 1000000000,
"+", 1,
"-", 0,
"?", 0,
"0", 10,
"1", 10,
"2", 10,
"3", 10,
"4", 10,
"5", 10,
"6", 10,
"7", 10,
"8", 10,
"9", 10,
"", 0
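)
# Look up the multiplier once per damage column. A single join on a table
# keyed by both PROPDMGEXP and CROPDMGEXP would keep only the incidents
# whose two codes are identical.
stormData <- stormData %>%
inner_join(expToMultiplier, by = c("PROPDMGEXP" = "DMGEXP")) %>%
rename(PROPMULT = Multiplier) %>%
inner_join(expToMultiplier, by = c("CROPDMGEXP" = "DMGEXP")) %>%
rename(CROPMULT = Multiplier) %>%
mutate(
PROPCASH = PROPDMG * PROPMULT,
CROPCASH = CROPDMG * CROPMULT
)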
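One point worth checking in a step like this is how the multiplier look-up treats incidents whose property and crop codes differ. The sketch below uses made-up rows and base R (`merge` and `match`) rather than the dplyr pipeline of this analysis; `incidents` and `lookup` are hypothetical. Like a join with no `by` argument, `merge` matches on every shared column by default.

```r
# Made-up incidents; the second has different property and crop codes
incidents <- data.frame(
  PROPDMG = c(5, 123), PROPDMGEXP = c("K", "K"),
  CROPDMG = c(2, 0),   CROPDMGEXP = c("K", ""),
  stringsAsFactors = FALSE
)
# Hypothetical three-row look-up (illustration only)
lookup <- data.frame(
  DMGEXP = c("K", "M", ""), Multiplier = c(1000, 1e6, 0),
  stringsAsFactors = FALSE
)

# Matching on both code columns at once keeps only rows where they are equal
keyedOnBoth <- data.frame(
  PROPDMGEXP = lookup$DMGEXP, CROPDMGEXP = lookup$DMGEXP,
  stringsAsFactors = FALSE
)
both <- merge(incidents, keyedOnBoth)
nrow(both)

# Looking up each column separately keeps every incident
propMult <- lookup$Multiplier[match(incidents$PROPDMGEXP, lookup$DMGEXP)]
cropMult <- lookup$Multiplier[match(incidents$CROPDMGEXP, lookup$DMGEXP)]
incidents$PROPCASH <- incidents$PROPDMG * propMult
incidents$CROPCASH <- incidents$CROPDMG * cropMult
incidents
```

Comparing the row count before and after the look-up is a cheap way to confirm that no incidents were lost.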
To answer the question of which types of events are most harmful to population health and which have the greatest economic consequences, we need metrics available or derivable from the measures in the NOAA data. We’ve decided in this analysis to look at casualties as measured by the NOAA FATALITIES and INJURIES fields and financial damage to properties and crops as measured in the NOAA PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP fields that we’ve already standardized.
stormData <- stormData %>%
select(
EVTYPE,
FATALITIES,
INJURIES,
PROPCASH,
CROPCASH
)
Because some event types occur more frequently than others, and because we want to analyze which event types are the most destructive as a whole, we group the incidents by event type and sum their casualties and financial damages here.
destructionTotalsByEvent <- stormData %>%
group_by(EVTYPE) %>%
summarize(
Fatality = sum(FATALITIES),
Injury = sum(INJURIES),
PropertyDamage = sum(PROPCASH),
CropDamage = sum(CROPCASH)
)
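The grouping and summation above can be illustrated on a few made-up rows. This sketch uses base R's `aggregate()` rather than the dplyr pipeline of this analysis, and the `toy` data frame is invented for illustration.

```r
# Made-up incidents: two floods and one heat event (illustration only)
toy <- data.frame(
  EVTYPE     = c("FLOOD", "HEAT", "FLOOD"),
  FATALITIES = c(1, 4, 2),
  INJURIES   = c(0, 10, 3)
)

# One row per event type, with casualties summed across its incidents
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
totals
```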
Because there are hundreds of different event types in the NOAA data and we’re interested only in the most harmful of them, we take only the 10 event types with the highest total fatalities to visualize. Although the 10 events with the most fatalities are not quite the same as the 10 events with the most injuries, we use fatalities as the threshold because we interpret fatalities as the more significant metric for overall population health.
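# Take the first 10 rows after sorting by Fatality. top_n(10) with no wt
# argument ranks by the last column, which here would be Injury.
top10FatalEvents <- destructionTotalsByEvent %>%
select(EVTYPE, Fatality, Injury) %>%
arrange(desc(Fatality)) %>%
head(10)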
To order the fatal events by fatality in our visualization, we need a second field that carries each event's total fatalities on every record, even after we break the data down by casualty type.
top10FatalEvents <- top10FatalEvents %>%
mutate(TotalFatalities = Fatality)
To give a clearer picture of an event's overall impact on population health than fatalities or total casualties alone, we reshape the data so that each event type is broken down by casualty type and fatalities and injuries can be compared side by side.
require(reshape2)
top10FatalEventsByCasualtyType <-
melt(
top10FatalEvents,
id.vars = c("EVTYPE", "TotalFatalities")
) %>%
rename(
CasualtyType = variable,
Count = value
)
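To show the shape this reshaping produces, here is a sketch on made-up totals. It builds the long format by hand in base R rather than with `melt`, and the values in `wide` are invented for illustration.

```r
# Made-up totals for two event types (illustration only)
wide <- data.frame(
  EVTYPE          = c("HEAT", "FLOOD"),
  TotalFatalities = c(4, 3),
  Fatality        = c(4, 3),
  Injury          = c(10, 3)
)
ids <- wide[c("EVTYPE", "TotalFatalities")]

# One row per event type and casualty type; the id columns repeat on each row
long <- rbind(
  data.frame(ids, CasualtyType = "Fatality", Count = wide$Fatality),
  data.frame(ids, CasualtyType = "Injury",   Count = wide$Injury)
)
long
```

Each event type now appears once per casualty type, and `TotalFatalities` is still available on every row for ordering.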
As we did with fatal events, we take only the 10 event types with the highest total damages to visualize. In this analysis, we treat property damage and crop damage as equally detrimental to the economy (at least for the purpose of picking events to visualize), and so we use the combined total of property and crop damage as our threshold.
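# Take the first 10 rows after sorting by combined damage. top_n(10) with no
# wt argument ranks by the last column, which here would be CropDamage.
top10DamageEvents <- destructionTotalsByEvent %>%
select(EVTYPE, PropertyDamage, CropDamage) %>%
arrange(desc(PropertyDamage + CropDamage)) %>%
head(10)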
To give a clearer picture of an event's economic impact than the overall damage alone, we reshape the data so that each event type is broken down by damage type and property damage and crop damage can be compared side by side.
top10DamageEventsByDamageType <- melt(
top10DamageEvents,
id.vars = c("EVTYPE")
) %>%
rename(
DamageType = variable,
Damage = value
)
To make the scale in our visualization more readable, we convert total damage from dollars to billions of dollars by dividing by one billion.
top10DamageEventsByDamageType <- top10DamageEventsByDamageType %>%
mutate(
DamageBillions = Damage / 1000000000
)
Displayed here are the 10 most fatal event types, ranked by fatalities, along with their injury counts (which are much larger).
require(ggplot2)
ggplot(
top10FatalEventsByCasualtyType,
aes(x = reorder(EVTYPE, TotalFatalities), y = Count)
) +
geom_bar(stat = "identity", position = position_dodge()) +
facet_grid(. ~ CasualtyType, scales = "free") +
coord_flip() +
labs(
title = "Top 10 Most Fatal Event Types, Fatality and Injury Counts",
x = "NOAA Event Type",
y = "Casualty Count"
)
A bar plot of the 10 events with the most total fatalities, and their number of total fatalities and injuries.
Displayed here are the top 10 most damaging event types ranked, and broken down by property and crop damage.
ggplot(
top10DamageEventsByDamageType,
aes(
x = reorder(EVTYPE, DamageBillions),
y = DamageBillions,
fill = factor(DamageType, levels = c("CropDamage", "PropertyDamage"))
)
) +
geom_bar(stat = "identity") +
coord_flip() +
labs(
title = "Top 10 Most Overall Damage by Event Type",
x = "NOAA Event Type",
y = "Total Overall Damages (Billions of Dollars)",
fill = "Type of Damage"
) +
# The bars are mapped to fill, not color, so relabel the fill legend
scale_fill_discrete(labels = c("Crop", "Property")) +
theme(legend.position = "bottom")
A bar plot of the 10 events with the most total damage, for both property damage and crop damage.
As can be seen in the above visualizations, the weather events which cause the most fatalities are not the same as the events that cause the most financial damage. Excessive Heat is by far the most fatal event type, roughly twice as fatal as the Heat designation. Heat and Lightning are in turn each more than twice as fatal as the event types below them. In terms of financial damage, on the other hand, the Hurricane/Typhoon and River Flood types are the most damaging, followed by Flood and Hurricane. As for an event type that is destructive in terms of both fatalities and financial damage, Flood is 6th in fatalities and 3rd in financial damage.