This report aims to answer the questions of which types of weather events are the most harmful to people's health and which are the most ruinous from an economic point of view. It is based on the US National Oceanic and Atmospheric Administration's (NOAA) storm database, which records major weather events in the US between 1950 and 2011. We focus on its estimates of fatalities, injuries, property damage, and crop damage.
First of all, we load the packages we are going to use in this report.
library(dplyr)
library(ggplot2)
We downloaded the data from this link and loaded it into R with the read.csv function. It is stored as a data frame in an object called data.
data <- read.csv("repdata_data_StormData.csv")
Since we are going to investigate which kinds of events are the most harmful and ruinous, we first need to identify the variable that contains the names of the event types.
event_types <- length(unique(data$EVTYPE))
The variable is EVTYPE and there are 985 types of weather events in this dataset.
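As a side note on how this count behaves: length(unique(...)) compares labels as exact strings, so two spellings of the same phenomenon count as two types. The top-ten table later in this report, for instance, lists TSTM WIND and THUNDERSTORM WIND as separate rows. The sketch below uses invented labels (not taken from the dataset) to show the effect, and how normalising case and padding would change the count. The report itself does not apply this normalisation.

```r
# Hypothetical labels, invented to show how near-duplicates are counted.
labels <- c("TSTM WIND", "tstm wind", " TSTM WIND", "THUNDERSTORM WIND")

# Exact string comparison: every spelling counts as its own type.
n_raw <- length(unique(labels))

# After upper-casing and trimming whitespace, only true differences remain.
n_clean <- length(unique(toupper(trimws(labels))))

n_raw
n_clean
```

Abbreviations such as TSTM versus THUNDERSTORM would still need a manual mapping; simple string normalisation does not merge them.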
Now we create a new data frame called harmfulness from the original dataset. First we group the rows by EVTYPE, then sum the numeric columns FATALITIES and INJURIES (dropping any NA values) into a column called HARM. Finally we sort by HARM in descending order, so that the head function returns the most harmful types of events.
harmfulness <- data %>%
  group_by(EVTYPE) %>%
  summarise(HARM = sum(FATALITIES, INJURIES, na.rm = TRUE)) %>%
  arrange(desc(HARM))
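To make the grouping step concrete, here is a minimal sketch of the same pipeline on a toy data frame. The object names toy and toy_harm and all values are invented for illustration; only the pipeline mirrors the code above.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration only.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(2, 1, NA, 4),
  INJURIES   = c(10, 0, 5, 1)
)

toy_harm <- toy %>%
  group_by(EVTYPE) %>%
  # sum() pools both columns within each group; na.rm = TRUE skips the NA
  summarise(HARM = sum(FATALITIES, INJURIES, na.rm = TRUE)) %>%
  arrange(desc(HARM))

toy_harm
```

Here TORNADO totals 17 (2 + 10 + 5, with the NA skipped), HEAT totals 5 and FLOOD totals 1, so the rows come out in that order.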
To estimate the economic losses for each type of event, we need some extra manipulation of the available data. Two kinds of economic damage are reported: property damage and crop damage. Both have estimates that need to be multiplied by the multipliers encoded in the columns PROPDMGEXP and CROPDMGEXP. First, let's see what their class and unique values are.
class(data$PROPDMGEXP)
## [1] "character"
unique(data$PROPDMGEXP)
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
class(data$CROPDMGEXP)
## [1] "character"
unique(data$CROPDMGEXP)
## [1] "" "M" "K" "m" "B" "?" "0" "k" "2"
Basically, these codes map to the following multipliers:
blank/empty character = 0
(?) = 0
(-) = 0
(+) = 1
numeric 0:8 = 10
H,h = hundreds = 100
K,k = thousands = 1,000
M,m = millions = 1,000,000
B,b = billions = 1,000,000,000
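The mapping above can also be written as a lookup table. The following is a minimal sketch in base R, assuming exactly the multipliers listed; exp_lookup and to_multiplier are hypothetical names and are not part of this report's code, which uses case_when instead.

```r
# Multipliers copied from the list above; digits 0-8 all map to 10.
exp_lookup <- c(
  "?" = 0, "-" = 0, "+" = 1,
  setNames(rep(10, 9), as.character(0:8)),
  "H" = 100, "K" = 1e3, "M" = 1e6, "B" = 1e9
)

# to_multiplier() is a hypothetical helper, not part of the report's code.
to_multiplier <- function(code) {
  code <- toupper(code)              # treat "k" and "K" alike
  mult <- unname(exp_lookup[code])   # codes missing from the table give NA
  mult[code %in% ""] <- 0            # a blank code means a multiplier of 0
  mult
}

to_multiplier(c("K", "m", "", "5", "+"))
```

For the five codes in the last line, the sketch returns 1000, 1000000, 0, 10 and 1. A code that is not in the table yields NA, which makes unexpected values easy to spot.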
The following code builds a data frame called damage. It first derives a multiplier for each exponent code, then multiplies the estimates in the PROPDMG and CROPDMG columns by their respective multipliers.
damage <- data %>%
  mutate(PROP_MULT = case_when(
    PROPDMGEXP %in% c("", "-", "?") ~ 0,
    PROPDMGEXP %in% c("+") ~ 1,
    PROPDMGEXP %in% c("0", "1", "2", "3", "4",
                      "5", "6", "7", "8") ~ 10,
    PROPDMGEXP %in% c("h", "H") ~ 100,
    PROPDMGEXP %in% c("k", "K") ~ 1000,
    PROPDMGEXP %in% c("m", "M") ~ 1000000,
    PROPDMGEXP %in% c("b", "B") ~ 1000000000
    )
  ) %>%
  mutate(CROP_MULT = case_when(
    CROPDMGEXP %in% c("", "-", "?") ~ 0,
    CROPDMGEXP %in% c("+") ~ 1,
    CROPDMGEXP %in% c("0", "1", "2", "3", "4",
                      "5", "6", "7", "8") ~ 10,
    CROPDMGEXP %in% c("h", "H") ~ 100,
    CROPDMGEXP %in% c("k", "K") ~ 1000,
    CROPDMGEXP %in% c("m", "M") ~ 1000000,
    CROPDMGEXP %in% c("b", "B") ~ 1000000000
    )
  )
damage <- damage %>%
  mutate(PROP_LOSS = PROPDMG * PROP_MULT) %>%
  mutate(CROP_LOSS = CROPDMG * CROP_MULT)
damage <- damage %>%
  group_by(EVTYPE) %>%
  summarise(DMG = sum(PROP_LOSS, CROP_LOSS, na.rm = TRUE)) %>%
  arrange(desc(DMG))
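One detail of the code above is worth a small sketch: case_when returns NA for any value that matches none of its conditions, and sum(..., na.rm = TRUE) then drops that row silently. Every unique value printed earlier for PROPDMGEXP and CROPDMGEXP is covered by a branch, so this should not happen in this dataset. The example below uses an invented code "x" to show what would happen with an unmapped value.

```r
library(dplyr)

# Hypothetical codes: "K" is mapped, "x" is deliberately not covered.
codes <- c("K", "x")

mult <- case_when(
  codes %in% c("k", "K") ~ 1000,
  codes %in% c("m", "M") ~ 1e6
)

mult                                            # the second element is NA

# With na.rm = TRUE the unmapped row contributes nothing to the total.
total <- sum(c(25, 3) * mult, na.rm = TRUE)
total
```

The total is 25000, coming only from the "K" row. Checking anyNA() on the multiplier columns before summing would be one way to confirm that no code slipped through.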
Like the harmfulness data frame, damage is grouped by the EVTYPE variable and then sorted in descending order by the newly created variable DMG, which is the sum of property and crop losses.
Let's see what we got from the data handled so far and make some plots.
With the next code we get the ten most harmful types of event.
head_HARM <- head(harmfulness, 10) %>%
  mutate(EVTYPE = factor(EVTYPE,
                         levels = EVTYPE[order(HARM)]))
head_HARM
## # A tibble: 10 × 2
## EVTYPE HARM
## <fct> <dbl>
## 1 TORNADO 96979
## 2 EXCESSIVE HEAT 8428
## 3 TSTM WIND 7461
## 4 FLOOD 7259
## 5 LIGHTNING 6046
## 6 HEAT 3037
## 7 FLASH FLOOD 2755
## 8 ICE STORM 2064
## 9 THUNDERSTORM WIND 1621
## 10 WINTER STORM 1527
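The factor() call in the code above sets the order in which the event types will later appear on the plot axis. ggplot2 places the first factor level at the bottom of a discrete y axis, so ordering the levels by ascending HARM puts the most harmful event at the top. A minimal sketch with invented values:

```r
# Hypothetical values, already sorted in descending order as in the report.
toy <- data.frame(
  EVTYPE = c("TORNADO", "HEAT", "FLOOD"),
  HARM   = c(17, 5, 1)
)

# order(HARM) sorts ascending, so the largest event becomes the last level.
toy$EVTYPE <- factor(toy$EVTYPE, levels = toy$EVTYPE[order(toy$HARM)])

levels(toy$EVTYPE)
```

The levels come out as FLOOD, HEAT, TORNADO, so TORNADO would be drawn on the top row of the chart.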
Next are the ten most economically ruinous types of event (losses converted to billions of dollars).
head_DMG <- head(damage, 10) %>%
  mutate(EVTYPE = factor(EVTYPE,
                         levels = EVTYPE[order(DMG)])) %>%
  mutate(DMG = DMG/1e+09)
head_DMG
## # A tibble: 10 × 2
## EVTYPE DMG
## <fct> <dbl>
## 1 FLOOD 150.
## 2 HURRICANE/TYPHOON 71.9
## 3 TORNADO 57.4
## 4 STORM SURGE 43.3
## 5 HAIL 18.8
## 6 FLASH FLOOD 17.6
## 7 DROUGHT 15.0
## 8 HURRICANE 14.6
## 9 RIVER FLOOD 10.1
## 10 ICE STORM 8.97
In both tables there is an outlier type of event that far exceeds the second-ranked one. Because of this, we zoom the plots in on the other nine types of events, without removing the outlier's line.
ggplot(head_HARM, aes(y = EVTYPE)) +
  geom_segment(aes(
    x = 0,
    xend = HARM,
    yend = EVTYPE),
    linewidth = 2,
    color = "darkred") +
  geom_point(aes(x = HARM), size = 3, color = "steelblue") +
  coord_cartesian(xlim = c(0, 10000)) +
  labs(x = "fatalities/injuries",
       y = "types of event",
       title = "Top 10 harmful types of event 1950-2011")
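The choice of coord_cartesian for the zoom matters here. It only changes the visible window, so the outlier's segment is still drawn and simply runs off the panel edge. Setting limits on the scale instead treats out-of-range values as missing, which would drop the outlier's segment. The sketch below contrasts the two on invented data; toy, base_plot, p_zoom and p_limit are hypothetical names.

```r
library(ggplot2)

# Hypothetical values with one dominant row, mimicking the outlier pattern.
toy <- data.frame(
  EVTYPE = factor(c("A", "B", "C"), levels = c("C", "B", "A")),
  HARM   = c(100, 8, 5)
)

base_plot <- ggplot(toy, aes(y = EVTYPE)) +
  geom_segment(aes(x = 0, xend = HARM, yend = EVTYPE), linewidth = 2)

# Zoom only: the long segment is still drawn, but cut at the panel edge.
p_zoom <- base_plot + coord_cartesian(xlim = c(0, 10))

# Scale limits instead: values beyond 10 are treated as missing, so the
# long segment is dropped (ggplot2 warns about removed rows when printing).
p_limit <- base_plot + scale_x_continuous(limits = c(0, 10))
```

Printing p_zoom and p_limit side by side shows the difference: only the first keeps a line for event A.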
When it comes to harmfulness, tornadoes greatly surpass the other types of weather events, with an estimated 96,979 fatalities and injuries over the years considered. In terms of economic damage, on the other hand, flood events are the most ruinous, accounting for 150.3197 billion dollars in losses.
ggplot(head_DMG, aes(y = EVTYPE)) +
  geom_segment(aes(
    x = 0,
    xend = DMG,
    yend = EVTYPE),
    linewidth = 2,
    color = "darkred") +
  geom_point(aes(x = DMG), size = 3, color = "steelblue") +
  coord_cartesian(xlim = c(0, 75)) +
  labs(x = "property & crop losses in $ billions",
       y = "types of event",
       title = "Top 10 ruinous types of event 1950-2011")