Storms and other severe weather events increasingly cause both public health and economic problems for communities and municipalities. Using the U.S. National Oceanic and Atmospheric Administration’s storm database, which tracks characteristics of major storms and weather events in the United States, we analyzed the public health and economic impacts of severe weather from 1950 through November 2011. We measured public health impact in two ways. The first is the total number of fatalities and injuries for each type of weather event. The second is the event's fatality rate, which is the ratio of its number of fatalities to its total number of fatalities and injuries. From these data, we found that tornadoes caused by far the most fatalities and injuries, but that cold and wind chill had the highest fatality rate among event types with a rate below 1.0. For economic impact, we measured the total property and crop damage for each type of weather event. Tornadoes again had by far the highest impact.
For this analysis, we used the following additional R libraries.
suppressMessages({
  suppressWarnings({
    library(dplyr)    # easier data frame manipulation
    library(stringr)  # string manipulation
    library(ggplot2)  # plotting charts
    library(reshape2) # reshaping data frames between wide and long form
  })
})
The data for this analysis were provided in the form of a
comma-separated-value file compressed via the bzip2 algorithm. After
extracting the file, we loaded the data into a variable called
data.
data <- read.csv('repdata_data_StormData.csv')
head(data)
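Because R's file connections decompress bzip2 files transparently when reading, the extraction step is optional: `read.csv` opens its path with `file()`, which detects the compression. The sketch below demonstrates this on a small temporary file rather than on the real download. The column values are made up for illustration.

```r
# Write a tiny bzip2-compressed CSV to a temporary file (toy data, not NOAA data)
tmp_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp_path, "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"),
                     FATALITIES = c(1, 0)),
          con, row.names = FALSE)
close(con)

# read.csv decompresses the file on the fly; no manual extraction needed
small <- read.csv(tmp_path, stringsAsFactors = FALSE)
print(small)
```

With the real data, this would look like `read.csv('repdata_data_StormData.csv.bz2')`, assuming (hypothetically) that this is the name of the compressed file.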
For the health data, we pulled out the EVTYPE (Event
Type), FATALITIES, and INJURIES columns from
the data set. To reduce inconsistencies in the naming conventions used
for the events, we also converted all event type names to title case.
health_data <- data[c("EVTYPE", "FATALITIES", "INJURIES")]
health_data$EVTYPE <- str_to_title(health_data$EVTYPE)
# Display first few rows
head(health_data)
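Title case alone does not merge names that differ only in whitespace. The following minimal sketch uses made-up event names. It shows how stringr's `str_squish`, which trims and collapses internal whitespace, can be applied before converting to title case:

```r
library(stringr)

# Made-up variants of event names
raw_types <- c("TSTM WIND", "Tstm Wind ", "  tstm  wind", "THUNDERSTORM WIND")

# Title case only: whitespace differences survive
title_only <- str_to_title(raw_types)

# Trim and collapse whitespace first, then apply title case
squished <- str_to_title(str_squish(raw_types))

print(unique(title_only))  # 4 distinct names
print(unique(squished))    # "Tstm Wind" "Thunderstorm Wind"
```

Abbreviations such as "Tstm" and "Thunderstorm" still remain distinct after this step. Merging synonyms like these would require a manual mapping.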
For the economic data, we pulled out the EVTYPE column
again, as well as the PROPDMG (Property Damage) and
CROPDMG (Crop Damage) columns. We also converted the event
types to title case.
econ_data <- data[c("EVTYPE", "PROPDMG", "CROPDMG")]
econ_data$EVTYPE <- str_to_title(econ_data$EVTYPE)
# Display first few rows
head(econ_data)
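In the NOAA storm data, PROPDMG and CROPDMG are recorded together with magnitude codes in separate PROPDMGEXP and CROPDMGEXP columns. In these codes, "K" denotes thousands, "M" millions, and "B" billions of dollars. The sums in this analysis use PROPDMG and CROPDMG as recorded, without these multipliers, so the totals mix amounts recorded at different magnitudes. Applying the multipliers would change the damage totals and could change the rankings reported below.

The sketch below shows one way the codes could be applied. The helper `exp_to_multiplier` is hypothetical, and treating every other code (including blanks) as a multiplier of 1 is an assumption. The rows are made up.

```r
# Hypothetical helper: convert NOAA magnitude codes to numeric multipliers.
# Assumption: codes other than K, M, and B (including blanks) count as 1.
exp_to_multiplier <- function(exp_code) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  code <- toupper(trimws(as.character(exp_code)))
  result <- unname(multipliers[code])  # unknown codes look up as NA
  result[is.na(result)] <- 1
  result
}

# Made-up example rows
demo <- data.frame(PROPDMG = c(25, 2.5, 1.5, 10),
                   PROPDMGEXP = c("K", "m", "B", ""),
                   stringsAsFactors = FALSE)
demo$PROPDMG_USD <- demo$PROPDMG * exp_to_multiplier(demo$PROPDMGEXP)
print(demo)
```

To use this with the full data set, the PROPDMGEXP and CROPDMGEXP columns would also need to be selected from `data` when building `econ_data`.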
For public health impacts, we grouped the health data by event type
and added columns for the total number of injuries, the total number of
fatalities, and their combined total. We then added a fatality rate
column. This value was calculated by dividing the total number of
fatalities for each event type by the combined total of fatalities and
injuries. A number of event types have no fatalities or injuries, so for
them this calculation divided zero by zero and produced NaN
(not-a-number) values. We removed these entries from the analysis.
# Group data by weather events
event_health_data <- health_data %>%
  group_by(EVTYPE) %>%
  summarize(Injuries = sum(INJURIES),
            Fatalities = sum(FATALITIES),
            Total = sum(INJURIES) + sum(FATALITIES))
# Calculate fatality rates
event_health_data$Fatality_Rate <- event_health_data$Fatalities /
  event_health_data$Total
# Remove NaN fatality rates produced by 0 / 0 for events with no casualties
event_health_data <- event_health_data[complete.cases(event_health_data), ]
# Display the first few rows
head(event_health_data)
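In R, `0 / 0` evaluates to `NaN`, which `complete.cases` treats as missing. The same steps can also be written as a single dplyr pipeline. The sketch below uses made-up counts and filters on `Total > 0` instead of using `complete.cases`. This assumes the only missing values come from event types with no casualties.

```r
library(dplyr)

# Made-up counts: event "B" has no casualties at all
toy <- data.frame(EVTYPE = c("A", "A", "B", "C"),
                  FATALITIES = c(1, 0, 0, 2),
                  INJURIES = c(3, 0, 0, 0),
                  stringsAsFactors = FALSE)

rates <- toy %>%
  group_by(EVTYPE) %>%
  summarize(Injuries = sum(INJURIES),
            Fatalities = sum(FATALITIES),
            .groups = "drop") %>%
  mutate(Total = Injuries + Fatalities,
         Fatality_Rate = Fatalities / Total)

# Event "B" gets 0 / 0, which is NaN
nan_rows <- sum(is.nan(rates$Fatality_Rate))

# Drop event types without any fatalities or injuries
rates <- rates %>% filter(Total > 0)
print(rates)
```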
The first public health impact we analyzed was which severe weather
events caused the most fatalities and injuries. Because we wanted to use
visualization techniques in the analysis, we selected just the
EVTYPE (Event Type), Fatalities, and
Injuries columns from the health data. We then sorted the
entries by the number of fatalities in descending order and selected the
first ten of them.
# Get top 10 weather events, ranked by number of fatalities
top_10_health_data <- event_health_data[c('EVTYPE', 'Injuries', 'Fatalities')]
top_10_health_data <- top_10_health_data[
  order(-top_10_health_data$Fatalities), ][1:10, ]
head(top_10_health_data)
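Negating a numeric column inside `order()` is a common base R idiom for sorting in descending order. The following small sketch uses made-up counts and compares this idiom with dplyr's `arrange(desc())`:

```r
library(dplyr)

# Made-up counts for illustration only
toy <- data.frame(EVTYPE = c("A", "B", "C", "D"),
                  Fatalities = c(3, 40, 12, 25),
                  stringsAsFactors = FALSE)

# Base R: order() on the negated column sorts from largest to smallest
top_base <- toy[order(-toy$Fatalities), ][1:2, ]

# dplyr equivalent
top_dplyr <- toy %>% arrange(desc(Fatalities)) %>% head(2)

print(top_base$EVTYPE)  # "B" "D"
```

One design note: row indexing such as `[1:10, ]` pads the result with NA rows when fewer rows exist. `head(x, 10)` simply returns the rows that are available.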
To visualize the total impact of the weather events with a stacked bar chart, we converted the table to long form. This melts the Injuries and Fatalities columns into a single health outcome column. To improve the readability of the plot, we also renamed the columns with more descriptive labels.
# Convert table to long form to make plotting possible
long_top_10_health_data <- melt(top_10_health_data, id='EVTYPE')
# Rename columns to meaningful labels
colnames(long_top_10_health_data) <- c("Event", "Health_Outcome", "Count")
head(long_top_10_health_data)
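reshape2's `melt` can also name the new columns directly through its `variable.name` and `value.name` arguments, which replaces most of the separate renaming step. The following sketch uses a made-up two-row table:

```r
library(reshape2)

# Made-up wide table: one row per event, one column per outcome
wide <- data.frame(EVTYPE = c("A", "B"),
                   Injuries = c(10, 4),
                   Fatalities = c(2, 1),
                   stringsAsFactors = FALSE)

# Each (event, outcome) pair becomes its own row
long <- melt(wide, id.vars = "EVTYPE",
             variable.name = "Health_Outcome",
             value.name = "Count")
print(long)
```

The id column keeps its original name (EVTYPE here), so it would still need renaming if the label "Event" is wanted.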
To improve the readability of the plot, we adjusted the overall size of the figure and added appropriate labels and titles.
# Adjust plot size
options(repr.plot.width = 10, repr.plot.height = 10)
# Plot bar chart
p <- ggplot(long_top_10_health_data,
            aes(x = Event, y = Count, fill = Health_Outcome)) +
  geom_bar(stat = 'identity') +
  labs(x = "Weather Event",
       y = "Totals",
       fill = '',
       title = "Ten Weather Events with Most Fatalities and Injuries") +
  scale_x_discrete(guide = guide_axis(angle = 20)) +
  scale_fill_manual(values = c("deeppink3", "darkorchid3"))
print(p)
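The `repr.plot.width` and `repr.plot.height` options are specific to Jupyter notebooks running the R kernel. Outside a notebook, one option is to set the figure size when saving the plot with ggplot2's `ggsave`. The minimal sketch below uses made-up data and writes a PDF to a temporary file.

```r
library(ggplot2)

# Made-up long-form data in the same shape as long_top_10_health_data
toy_long <- data.frame(Event = c("A", "A", "B", "B"),
                       Health_Outcome = c("Injuries", "Fatalities",
                                          "Injuries", "Fatalities"),
                       Count = c(10, 2, 4, 1))

p_toy <- ggplot(toy_long, aes(x = Event, y = Count, fill = Health_Outcome)) +
  geom_col()  # shorthand for geom_bar(stat = "identity")

# Width and height are in inches by default
out_path <- tempfile(fileext = ".pdf")
ggsave(out_path, plot = p_toy, width = 10, height = 10)
```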
Using the total number of fatalities and injuries as the measure, tornadoes are the weather event type with the greatest health impact. Tornadoes account for 11.5 times as many fatalities and injuries combined as the next highest event type, excessive heat. Even if we consider only fatalities, tornadoes caused almost 3 times as many as the next most common cause, again excessive heat. However, the total number of casualties may not be the best measure of public health impact. Fatality rates may be more appropriate.
To evaluate which weather events have the highest fatality rates, we first compared the fatality rates of the ten events with the most fatalities. We recalculated the fatality rates for these events and sorted them in descending order.
# Calculate Fatality Rate for Top 10
top_10_health_data$Total <- top_10_health_data$Fatalities +
  top_10_health_data$Injuries
top_10_health_data$Fatality_Rate <- top_10_health_data$Fatalities /
  top_10_health_data$Total
# Sort by Fatality Rates
top_10_health_data <- top_10_health_data[
  order(-top_10_health_data$Fatality_Rate), ]
# Display results
top_10_health_data
As the table above shows, the only weather event types in this group with fatality rates above 50% are rip currents and avalanches, at 61.3% and 56.9%, respectively. Interestingly, tornadoes had the lowest fatality rate among these ten event types. So although tornadoes cause the most fatalities and injuries overall, a smaller share of their casualties are fatal than for any other event in this group.
However, these are only the ten weather events with the most fatalities. To broaden the analysis, we then included all of the weather events in the data set.
Several weather events have a fatality rate of 1.0; that is, every recorded casualty from these events was a fatality. To find them, we selected only the weather events with a fatality rate of 1.0 and sorted them in descending order by the total number of fatalities and injuries.
# Events with fatality rate of 1.0
fatality_rate_1_health_data <- event_health_data[
  event_health_data$Fatality_Rate == 1.0, ]
# Sort by total number of fatalities and injuries in descending order
fatality_rate_1_health_data <- fatality_rate_1_health_data[
  order(-fatality_rate_1_health_data$Total), ]
# Get first few rows
head(fatality_rate_1_health_data)
Although these weather events warrant further exploration, we excluded them from our analysis because they caused so few fatalities. Unseasonably warm and dry weather caused the most fatalities among the events with a fatality rate of 1.0, yet it accounted for only 29 fatalities in the data set. Because we are looking for the weather events with the greatest impact, these events are too minor to include.
Next, we examined the weather events with fatality rates of at least 0.5 but less than 1.0, and sorted them by fatality rate in descending order.
# Events with fatality rates of at least 0.5 and below 1.0
upper_half_fatality_rates <- event_health_data[
  (event_health_data$Fatality_Rate >= 0.5) &
    (event_health_data$Fatality_Rate < 1.0), ]
# Sort by fatality rate in descending order
upper_half_fatality_rates <- upper_half_fatality_rates[
  order(-upper_half_fatality_rates$Fatality_Rate), ]
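The filter above keeps rates from 0.5 (inclusive) up to 1.0 (exclusive). The following small sketch uses made-up rates and rewrites the filter with dplyr's `filter` and `arrange` to make those boundaries explicit:

```r
library(dplyr)

# Made-up rates, including both boundary values
toy <- data.frame(EVTYPE = c("A", "B", "C", "D"),
                  Fatality_Rate = c(0.25, 0.5, 0.8, 1.0),
                  stringsAsFactors = FALSE)

# 0.5 is kept, 1.0 is excluded; highest rates first
upper_half <- toy %>%
  filter(Fatality_Rate >= 0.5, Fatality_Rate < 1.0) %>%
  arrange(desc(Fatality_Rate))
print(upper_half$EVTYPE)  # "C" "B"
```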
We also plotted the ten events with the highest fatality rates in this range.
# Plot 10 highest fatality rates
p <- ggplot(data = upper_half_fatality_rates[1:10, ],
            aes(x = EVTYPE, y = Fatality_Rate)) +
  geom_bar(stat = "identity", fill = "deepskyblue3") +
  labs(x = "Weather Event",
       y = "Fatality Rates",
       title = "Ten Weather Events with Highest Fatality Rates") +
  scale_x_discrete(guide = guide_axis(angle = 20))
print(p)
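ggplot2 places a character x-axis in alphabetical order, so the bars above are not arranged by rate. One way to sort them is to reorder the factor levels by the negated rate with base R's `reorder`, as in this sketch with made-up rates:

```r
library(ggplot2)

# Made-up rates for illustration only
toy <- data.frame(EVTYPE = c("A", "B", "C"),
                  Fatality_Rate = c(0.5, 0.9, 0.7),
                  stringsAsFactors = FALSE)

# Levels sorted by increasing -Fatality_Rate, i.e. by decreasing rate
ordered_events <- reorder(toy$EVTYPE, -toy$Fatality_Rate)
print(levels(ordered_events))  # "B" "C" "A"

# The same reordering applied directly inside aes()
p_sorted <- ggplot(toy, aes(x = reorder(EVTYPE, -Fatality_Rate),
                            y = Fatality_Rate)) +
  geom_col(fill = "deepskyblue3") +
  labs(x = "Weather Event", y = "Fatality Rates")
```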
The bar chart shows that cold and wind chill had the highest fatality rate, which we calculated as 0.89. Second is Hurricane Erin at 0.88, and, interestingly, extreme cold and wind chill is third at 0.84. However, these event types accounted for only 263 fatalities and injuries in total. Despite their high fatality rates, their public health impact is small next to that of tornadoes, with 96,979 total fatalities and injuries.
To determine which weather event had the greatest economic impact, we first grouped the data by event type and calculated the total damage for each. We then sorted the event types in descending order by these totals.
# Group data by weather events
event_econ_data <- econ_data %>%
  group_by(EVTYPE) %>%
  summarize(Property_Damage = sum(PROPDMG),
            Crop_Damage = sum(CROPDMG),
            Total = sum(PROPDMG) + sum(CROPDMG))
# Order by total damages in descending order
event_econ_data <- event_econ_data[order(-event_econ_data$Total), ]
# Display the first few rows of the data
head(event_econ_data)
To help visualize the total impact of the most damaging weather events, we selected the ten events with the highest totals and divided their damage values by one million. To plot these data, we then converted the table to long form by adding a damage type column and renamed the columns with more descriptive labels.
# Get top 10 weather events with most economic damages
top_10_econ_data <- event_econ_data[c('EVTYPE', 'Property_Damage',
                                      'Crop_Damage')][1:10, ]
# Rescale for plotting
top_10_econ_data$Crop_Damage <- top_10_econ_data$Crop_Damage / 1000000
top_10_econ_data$Property_Damage <- top_10_econ_data$Property_Damage / 1000000
# Convert table to long form to make plotting possible
long_top_10_econ_data <- melt(top_10_econ_data, id='EVTYPE')
# Rename columns to meaningful labels
colnames(long_top_10_econ_data) <- c("Event", "Damage_Type", "Damages")
# Display the table
long_top_10_econ_data
As with the public health impacts above, we plotted the data as a stacked bar chart, this time stacking property and crop damage.
# Adjust plot size
options(repr.plot.width = 10, repr.plot.height = 10)
# Plot bar chart
p <- ggplot(long_top_10_econ_data,
            aes(x = Event, y = Damages, fill = Damage_Type)) +
  geom_bar(stat = 'identity') +
  labs(x = "Weather Event",
       y = "Damages (millions)",
       fill = '',
       title = "Ten Weather Events with Most Economic Impact") +
  scale_x_discrete(guide = guide_axis(angle = 20)) +
  scale_fill_manual(values = c("olivedrab3", "sienna3"))
print(p)
Although hail is responsible for the most crop damage, tornadoes cause by far the most total damage: more than twice as much as the next most damaging weather event, flash floods.