The following analysis examines the damage caused by environmental events in the USA that were recorded between 1950 and 2011. It groups the harm caused into human damage and material damage. Human damage summarises the number of fatalities and injuries by distinct environmental event. In the same manner, material damage summarises the dollar value of the property and crop damage caused by the events. Finally, the damage values are normalised in order to compare the magnitude of human and material damage.
The results indicate that tornadoes are clearly the most damaging environmental events. With regard to economic damage, tornadoes are followed by flash flooding and regular flooding, hail and thunderstorm winds, while excessive heat, regular flooding, lightning and thunderstorm winds are the most damaging for human health.
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(ggplot2)
library(knitr)
# set global chunk options
opts_chunk$set(cache = TRUE, fig.align='center')
Download the data from the internet and create a tbl for easier handling.
con <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url = con, destfile = "storm_data.csv.bz2", method = "curl", mode = "wb")
data <- read.csv(bzfile("storm_data.csv.bz2", open = "r"))
unlink("storm_data.csv.bz2")
data <- tbl_df(data)
According to the documentation[1], the variables that describe an event’s harmfulness “with respect to population health” are FATALITIES and INJURIES. Together, these two variables make up the human damage caused by environmental events. Furthermore, PROPDMG records property damage, while CROPDMG records crop damage. Together, these two variables make up the material damage caused by environmental events.
Hence, FATALITIES and INJURIES are summed by event type to calculate the total human damage caused by each distinct environmental event. PROPDMG and CROPDMG are summarised in the same way to calculate the total material damage.
# Summarise the data by human damage.
data_sum_civil <- data %>%
  group_by(EVTYPE) %>%
  summarise(fatal = sum(FATALITIES),
            injury = sum(INJURIES)) %>%
  gather(H_Harm, H_Damage, -EVTYPE)
# Summarise the data by material damage.
data_sum_property <- data %>%
  group_by(EVTYPE) %>%
  summarise(property = sum(PROPDMG),
            crop = sum(CROPDMG)) %>%
  gather(M_Harm, M_Damage, -EVTYPE)
# Join the summarised datasets.
data_sum <- full_join(data_sum_civil, data_sum_property, "EVTYPE")
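The sums above treat PROPDMG and CROPDMG as plain numbers. In the storm data these columns are normally read together with the magnitude codes in PROPDMGEXP and CROPDMGEXP (for example K, M and B), so a dollar figure would need the code applied first. The following is a minimal sketch of such a conversion and is not part of the analysis above. It assumes that only the codes K, M and B are interpreted and that every other code means a multiplier of 1. The helper name `exp_to_multiplier` and the toy values are hypothetical.

```r
# Hypothetical helper: map a magnitude code to a numeric multiplier.
# Assumption: K = thousand, M = million, B = billion; any other code -> 1.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1   # unknown or empty codes leave the value unchanged
  unname(mult)
}

# Toy rows standing in for the real data (values are made up).
toy <- data.frame(PROPDMG = c(25, 2.5, 10),
                  PROPDMGEXP = c("K", "M", ""),
                  stringsAsFactors = FALSE)
toy$prop_dollars <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
toy
```

Applied to the full dataset, the converted column would take the place of PROPDMG (and its counterpart the place of CROPDMG) in the `summarise()` calls, which could change the ranking of events.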
Because the analysis looks for the most damaging events, observations with no recorded damage would skew it. Therefore, rows in which both the human and the material damage are 0 are removed from the dataset.
data_sum <- data_sum[data_sum$H_Damage > 0 | data_sum$M_Damage > 0, ]
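Both inputs to the `full_join()` are in long form with two rows per event, and the join key is EVTYPE alone. The joined table `data_sum` therefore holds every pairing of a human-damage row with a material-damage row. The toy sketch below, with made-up numbers and a single hypothetical event "A", illustrates the resulting shape. Recent dplyr versions may warn about a many-to-many relationship here.

```r
library(dplyr)

# Made-up long tables with two rows per event, mirroring the shape of
# data_sum_civil and data_sum_property.
civil <- data.frame(EVTYPE = c("A", "A"),
                    H_Harm = c("fatal", "injury"),
                    H_Damage = c(1, 10),
                    stringsAsFactors = FALSE)
property <- data.frame(EVTYPE = c("A", "A"),
                       M_Harm = c("property", "crop"),
                       M_Damage = c(100, 5),
                       stringsAsFactors = FALSE)

joined <- full_join(civil, property, by = "EVTYPE")
nrow(joined)          # 2 x 2 pairings for event "A"
sum(joined$H_Damage)  # each human value appears once per material row
```

This is worth keeping in mind when summing or plotting from the joined table, since each damage value can appear more than once per event.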
In order to compare the magnitude of human and material damage caused, their values are normalised.
# Normalise the damage values for comparison
normalise <- function(x) {
  a <- (x - min(x)) / (max(x) - min(x))
  return(a)
}
## First subset only the most harmful 5% of the events
data_sum_norm <- data_sum[data_sum$H_Damage > quantile(data_sum$H_Damage, 0.95) |
                            data_sum$M_Damage > quantile(data_sum$M_Damage, 0.95), ]
human <- data_sum_norm %>%
  group_by(EVTYPE) %>%
  summarise(human = sum(H_Damage))
material <- data_sum_norm %>%
  group_by(EVTYPE) %>%
  summarise(material = sum(M_Damage))
data_sum_norm <- full_join(human, material, "EVTYPE")
## Then normalise these values between 0-1
data_sum_norm <- data_sum_norm %>%
  mutate(human = normalise(human)) %>%
  mutate(material = normalise(material)) %>%
  gather(Harm, Damage, -EVTYPE)
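The `normalise()` function above is a min-max rescaling: the smallest value maps to 0, the largest to 1, and everything else falls proportionally in between. A small worked example on a made-up vector (the name `totals` is hypothetical) shows the effect:

```r
# Same min-max rescaling as in the analysis, repeated here so the
# example runs on its own.
normalise <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

totals <- c(10, 20, 50)   # made-up damage totals
normalise(totals)         # 0.00 0.25 1.00
```

Because each column is rescaled separately, the normalised human and material values are comparable only as positions within their own range, not as absolute amounts. Note also that a vector whose values are all equal would produce NaN, since the denominator becomes 0.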
The most harmful 5% of the environmental events with respect to human health.
ggplot(data_sum[data_sum$H_Damage > quantile(data_sum$H_Damage, 0.95), ],
       aes(x = EVTYPE, y = H_Damage, colour = H_Harm)) +
  geom_point(stat = "identity",
             shape = 95,
             size = 10) +
  geom_text(aes(label = H_Damage),
            size = 3,
            vjust = -1) +
  scale_colour_manual(values = c("#ca0020", "#404040")) +
  labs(x = "Environmental event",
       y = "Total number of victims",
       colour = "Type of damage",
       title = "Human damage caused by the most damaging 5% of\nenvironmental events") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 35,
                                   hjust = 1,
                                   vjust = 1,
                                   colour = "#404040"))
The most harmful 5% of the environmental events with respect to economic damage.
ggplot(data_sum[data_sum$M_Damage > quantile(data_sum$M_Damage, 0.95), ],
       aes(x = EVTYPE, y = M_Damage, colour = M_Harm)) +
  geom_point(stat = "identity",
             shape = 95,
             size = 10) +
  geom_text(aes(label = round(M_Damage, digits = 0)),
            size = 3,
            vjust = -1) +
  scale_colour_manual(values = c("#a6611a", "#018571")) +
  labs(x = "Environmental event",
       y = "Total damage in $",
       colour = "Type of damage",
       title = "Material damage caused by the most damaging 5% of\nenvironmental events") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 35,
                                   hjust = 1,
                                   vjust = 1,
                                   colour = "#404040"))
Comparison of human and material damage.
ggplot(data_sum_norm, aes(x = EVTYPE, y = Damage, fill = Harm)) +
  # facet_grid(Harm ~ .) +
  geom_bar(stat = "identity",
           position = "dodge") +
  scale_fill_manual(values = c("#d01c8b", "#4dac26")) +
  labs(x = "Environmental event",
       y = "Normalised damage",
       fill = "Type of damage",
       title = "Comparison of human and material damage caused by the most\ndamaging 5% of environmental events") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 35,
                                   hjust = 1,
                                   vjust = 1,
                                   colour = "#404040"))
sessionInfo()
## R version 3.2.0 (2015-04-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.2 LTS
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=de_CH.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=de_CH.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=de_CH.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=de_CH.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] knitr_1.10 ggplot2_1.0.1 tidyr_0.2.0 dplyr_0.4.1
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.11.5 magrittr_1.5 MASS_7.3-39 munsell_0.4.2
## [5] colorspace_1.2-6 stringr_0.6.2 plyr_1.8.1 tools_3.2.0
## [9] parallel_3.2.0 grid_3.2.0 gtable_0.1.2 DBI_0.3.1
## [13] htmltools_0.2.6 lazyeval_0.1.10 yaml_2.1.13 assertthat_0.1
## [17] digest_0.6.8 reshape2_1.4.1 formatR_1.2 codetools_0.2-11
## [21] evaluate_0.7 rmarkdown_0.7 labeling_0.3 scales_0.2.4
## [25] proto_0.3-10