Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. This report explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, a database that tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
This report analyzes which types of events are most harmful with respect to population health and which have the greatest economic impact. All events tracked by NOAA from 1950 to November 2011 are considered.
The compressed data file is downloaded if it does not already exist locally, and the storm data are decompressed and read into a data frame. Some transformations are then performed: event types recorded under different names are merged, and property and crop damage values are scaled by their exponent columns to obtain the right magnitude.
library(dplyr)
library(tidyr)
library(ggplot2)
options(scipen = 2, digits = 2)
# download data if it doesn't already exist
if (!file.exists("repdata-data-StormData.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                "repdata-data-StormData.csv.bz2", method = "curl")
}
# read in storms data; read.csv() decompresses bzip2 files on the fly, so no
# separate extraction step is needed (unzip() cannot handle .bz2 archives)
storms <- read.csv("repdata-data-StormData.csv.bz2")
# combine three different namings of thunderstorm wind
storms$EVTYPE[storms$EVTYPE == "TSTM WIND"] <- "THUNDERSTORM WIND"
storms$EVTYPE[storms$EVTYPE == "THUNDERSTORM WINDS"] <- "THUNDERSTORM WIND"
# adjust PROPDMG for PROPDMGEXP and CROPDMG for CROPDMGEXP
storms$PROPDMG[storms$PROPDMGEXP == "B"] <- storms$PROPDMG[storms$PROPDMGEXP == "B"] * 1e9
storms$PROPDMG[storms$PROPDMGEXP == "M"] <- storms$PROPDMG[storms$PROPDMGEXP == "M"] * 1e6
storms$PROPDMG[storms$PROPDMGEXP == "K"] <- storms$PROPDMG[storms$PROPDMGEXP == "K"] * 1e3
storms$CROPDMG[storms$CROPDMGEXP == "B"] <- storms$CROPDMG[storms$CROPDMGEXP == "B"] * 1e9
storms$CROPDMG[storms$CROPDMGEXP == "M"] <- storms$CROPDMG[storms$CROPDMGEXP == "M"] * 1e6
storms$CROPDMG[storms$CROPDMGEXP == "K"] <- storms$CROPDMG[storms$CROPDMGEXP == "K"] * 1e3
rows <- nrow(storms)
cols <- ncol(storms)
# count distinct event types (works for both factor and character columns)
event_types <- length(unique(storms$EVTYPE))
total_fatalities <- sum(storms$FATALITIES)
total_injuries <- sum(storms$INJURIES)
total_prop_dmg <- sum(storms$PROPDMG)
total_crop_dmg <- sum(storms$CROPDMG)
The data set contains 902297 events with 37 variables. Severe weather events caused a total of 15145 fatalities and 140528 injuries. Total property damage amounted to 427.28 billion dollars and total crop damage to 49.09 billion dollars.
First, the events are processed to analyze how harmful they are with respect to population health. The total fatalities and injuries are calculated per event type (983 types after merging the thunderstorm wind names) and sorted by magnitude. For a graphical analysis, a tidy data set in long format is created containing the 10 most harmful event types.
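The exponent adjustment above can also be expressed as a lookup table. The following is only a minimal sketch, not the code used for this report: the helper `exp_multiplier` and the toy data frame are hypothetical, and, like the assignments above, it treats only the codes "K", "M" and "B" as multipliers and leaves every other code unchanged.

```r
# Hypothetical helper (not part of the original analysis): map the exponent
# codes handled above to numeric multipliers.
exp_multiplier <- function(exp_code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[as.character(exp_code)])
  # Any code other than "K", "M" or "B" yields NA in the lookup; treat it as
  # a multiplier of 1, which matches the behaviour of the assignments above.
  mult[is.na(mult)] <- 1
  mult
}

# Toy data, made up purely for illustration
toy <- data.frame(PROPDMG = c(2.5, 10, 1, 7),
                  PROPDMGEXP = c("K", "M", "B", ""),
                  stringsAsFactors = FALSE)
toy$PROPDMG_USD <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
toy
```

A lookup of this kind keeps the multipliers in one place, so handling further codes would only require extending the `lookup` vector.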
# get total and relative fatalities and injuries grouped by event type
storm_harm <- group_by(storms, EVTYPE) %>%
  summarize(total_fat = sum(FATALITIES), total_inj = sum(INJURIES),
            rel_fat = sum(FATALITIES) * 100 / total_fatalities,
            rel_inj = sum(INJURIES) * 100 / total_injuries,
            rel_harm = (sum(FATALITIES) + sum(INJURIES)) * 100 / (total_fatalities + total_injuries)) %>%
  arrange(desc(total_fat), desc(total_inj))
# get tidy data set of 10 most harmful event types
storm_harm_tidy <- filter(storm_harm, row_number() <= 10) %>%
  gather(harm_type, total, c(total_fat, total_inj))
Next, the events are processed to analyze their economic consequences. The total property and crop damage is calculated per event type (983 types after merging the thunderstorm wind names) and sorted by magnitude. For a graphical analysis, a tidy data set in long format is created containing the 10 most costly event types.
# get total property and crop damage grouped by event type
storm_damages <- group_by(storms, EVTYPE) %>%
  summarize(total_prop = sum(PROPDMG), total_crop = sum(CROPDMG),
            rel_prop = round(sum(PROPDMG) * 100 / total_prop_dmg, 2),
            rel_crop = round(sum(CROPDMG) * 100 / total_crop_dmg, 2),
            rel_damage = (sum(PROPDMG) + sum(CROPDMG)) * 100 / (total_prop_dmg + total_crop_dmg)) %>%
  arrange(desc(total_prop), desc(total_crop))
# get tidy data set of 10 most costly event types
storm_damages_tidy <- filter(storm_damages, row_number() <= 10) %>%
  gather(dmg_type, total, c(total_prop, total_crop)) %>%
  mutate(total_bil = total / 1e9)
The processed data are presented in two ways for both health and economic impact. First, a table lists the 10 events with the greatest impact along with their impact values. These events are then displayed in a plot to make their impact easier to compare.
These are the 10 most harmful events with respect to population health.
## Source: local data frame [983 x 6]
##
## EVTYPE total_fat total_inj rel_fat rel_inj rel_harm
## 1 TORNADO 5633 91346 37.2 65.00 62.30
## 2 EXCESSIVE HEAT 1903 6525 12.6 4.64 5.41
## 3 FLASH FLOOD 978 1777 6.5 1.26 1.77
## 4 HEAT 937 2100 6.2 1.49 1.95
## 5 LIGHTNING 816 5230 5.4 3.72 3.88
## 6 THUNDERSTORM WIND 701 9353 4.6 6.66 6.46
## 7 FLOOD 470 6789 3.1 4.83 4.66
## 8 RIP CURRENT 368 232 2.4 0.17 0.39
## 9 HIGH WIND 248 1137 1.6 0.81 0.89
## 10 AVALANCHE 224 170 1.5 0.12 0.25
## .. ... ... ... ... ... ...
The column EVTYPE shows the type of event that occurred. total_fat and total_inj show the total fatalities and injuries caused by the event type. rel_fat and rel_inj show the percentages of all fatalities and of all injuries that the event type was responsible for. rel_harm shows the event type's percentage of all fatalities and injuries combined.
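As a worked example of how the relative columns are derived, the tornado row can be recomputed from the totals reported earlier. This is a small sketch using only numbers printed in this report; the variable names are chosen here for illustration.

```r
# Values copied from the TORNADO row of the table and from the totals above
tornado_fat <- 5633
tornado_inj <- 91346
all_fat <- 15145
all_inj <- 140528

# Share of all fatalities, of all injuries, and of both combined (in percent)
rel_fat  <- tornado_fat * 100 / all_fat
rel_inj  <- tornado_inj * 100 / all_inj
rel_harm <- (tornado_fat + tornado_inj) * 100 / (all_fat + all_inj)

round(c(rel_fat = rel_fat, rel_inj = rel_inj, rel_harm = rel_harm), 2)
```

Note that rel_harm is not the average of rel_fat and rel_inj: because injuries far outnumber fatalities, the combined share lies much closer to the injury share.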
For better visualization, the following graph also shows the total and relative fatalities and injuries of these 10 events in a bar plot.
# bar plot of 10 most harmful events types
ggplot(storm_harm_tidy, aes(EVTYPE, total, fill = harm_type)) +
  geom_bar(stat = "identity") +
  stat_identity(aes(y = -max(total) / 20, label = sprintf("%.02f%%", rel_harm)), geom = "text") +
  labs(title = "Weather events most harmful to human health", x = "Event type", y = "People harmed") +
  scale_fill_discrete("Type of harm", labels = c("Fatality", "Injury")) +
  theme(plot.title = element_text(face = "bold", size = 15), axis.text.x = element_text(angle = 90))
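Tornadoes are responsible for by far the most fatalities and injuries, accounting for 62.30% of all people harmed. Ranked by fatalities, excessive heat and flash flood follow with 12.6% and 6.5% of all fatalities. Ranked by combined fatalities and injuries among the events shown, thunderstorm wind (6.46%) and excessive heat (5.41%) follow.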
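Because the table is sorted by fatalities, the order by combined harm can differ. The following sketch re-sorts the ten rows printed above by rel_harm; it uses only the values shown in the table, so event types outside this top 10 are not considered, and the data frame name is chosen here for illustration.

```r
# rel_harm values copied from the ten rows of the health table above
top10_health <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING",
             "THUNDERSTORM WIND", "FLOOD", "RIP CURRENT", "HIGH WIND", "AVALANCHE"),
  rel_harm = c(62.30, 5.41, 1.77, 1.95, 3.88, 6.46, 4.66, 0.39, 0.89, 0.25),
  stringsAsFactors = FALSE
)

# Sort by combined share of fatalities and injuries, largest first
by_harm <- top10_health[order(-top10_health$rel_harm), ]
head(by_harm, 3)
```

Among these ten rows, thunderstorm wind moves to second place and excessive heat to third when fatalities and injuries are combined.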
These are the 10 events with the greatest economic consequences.
## Source: local data frame [983 x 6]
##
## EVTYPE total_prop total_crop rel_prop rel_crop rel_damage
## 1 FLOOD 1.4e+11 5.7e+09 33.9 11.53 31.6
## 2 HURRICANE/TYPHOON 6.9e+10 2.6e+09 16.2 5.31 15.1
## 3 TORNADO 5.7e+10 4.1e+08 13.3 0.85 12.0
## 4 STORM SURGE 4.3e+10 5.0e+03 10.1 0.00 9.1
## 5 FLASH FLOOD 1.6e+10 1.4e+09 3.8 2.90 3.7
## 6 HAIL 1.6e+10 3.0e+09 3.7 6.16 3.9
## 7 HURRICANE 1.2e+10 2.7e+09 2.8 5.59 3.1
## 8 THUNDERSTORM WIND 9.7e+09 1.2e+09 2.3 2.36 2.3
## 9 TROPICAL STORM 7.7e+09 6.8e+08 1.8 1.38 1.8
## 10 WINTER STORM 6.7e+09 2.7e+07 1.6 0.05 1.4
## .. ... ... ... ... ... ...
The column EVTYPE shows the type of event that occurred. total_prop and total_crop show the total property and crop damage in dollars caused by the event type. rel_prop and rel_crop show the percentages of all property damage and of all crop damage that the event type was responsible for. rel_damage shows the event type's percentage of property and crop damage combined.
For better visualization, the following graph also shows the total and relative damage for property and crop of these 10 events in a bar plot.
# bar plot of 10 most costly events types
ggplot(storm_damages_tidy, aes(EVTYPE, total_bil, fill = dmg_type)) +
  geom_bar(stat = "identity") +
  stat_identity(aes(y = -max(total_bil) / 20, label = sprintf("%.02f%%", rel_damage)), geom = "text") +
  labs(title = "Weather events with greatest economic impact", x = "Event type", y = "Damage in billion $") +
  scale_fill_discrete("Type of damage", labels = c("Property", "Crop")) +
  theme(plot.title = element_text(face = "bold", size = 15), axis.text.x = element_text(angle = 90))
Flood causes the most property damage (33.9%) and the most total damage (31.56%), followed by different types of storms. Among the 10 event types shown, flood also has the highest crop damage (11.53%) and hail the second highest (6.16%), despite hail's relatively low share of total damage (3.9%). Because the ranking is by property damage, event types that mainly damage crops may not appear in this list.
The analysis of the severe weather events database shows that tornadoes are by a large margin the most dangerous events for population health. Thunderstorm wind is the second most harmful event type when fatalities and injuries are combined, but it accounts for only about a tenth of the harm caused by tornadoes. In terms of economic damage, flood ranks first, followed by different kinds of storms, which together have an even higher impact than flood. It is also worth noting that, among the 10 most costly event types, hail causes the second most crop damage.
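The comparison between flood and the combined storm types can be checked against the rel_damage column of the damage table. This is a rough sketch only: the values are the rounded percentages printed above, and which event types count as "storms" is an assumption made here for illustration (tornado and hail are deliberately left out).

```r
# rel_damage values copied from the ten rows of the damage table above
top10_damage <- data.frame(
  EVTYPE = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE", "FLASH FLOOD",
             "HAIL", "HURRICANE", "THUNDERSTORM WIND", "TROPICAL STORM", "WINTER STORM"),
  rel_damage = c(31.6, 15.1, 12.0, 9.1, 3.7, 3.9, 3.1, 2.3, 1.8, 1.4),
  stringsAsFactors = FALSE
)

# Assumption: this grouping of "storm" event types is our own choice
storm_types <- c("HURRICANE/TYPHOON", "STORM SURGE", "HURRICANE",
                 "THUNDERSTORM WIND", "TROPICAL STORM", "WINTER STORM")

storm_share <- sum(top10_damage$rel_damage[top10_damage$EVTYPE %in% storm_types])
flood_share <- top10_damage$rel_damage[top10_damage$EVTYPE == "FLOOD"]

# TRUE if the combined storm share exceeds the flood share
storm_share > flood_share
```

Even under this narrow grouping the combined storm share is larger than the flood share; adding tornado to the group would widen the gap further.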