Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
The objective of this research is to identify which types of weather events are most harmful to population health and which cause the greatest economic damage.
The results show that tornadoes are the most harmful events to population health and that floods are the most economically damaging.
The data analysed comes from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property and crop damage. The numbers presented here are therefore estimates, made according to the NOAA instructions.
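Not all columns in the data are relevant to this study. When the data is loaded and processed, only the following columns are kept: EVTYPE (event type), FATALITIES, INJURIES, PROPDMG and PROPDMGEXP (property damage and its exponent), and CROPDMG and CROPDMGEXP (crop damage and its exponent).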
library(memisc)
library(dplyr)
library(stringr)
library(ggplot2)
# Initializes variables:
raw.data.dir <- "data/raw"
tidy.data.dir <- "data/tidy"
data.file <- "data/raw/StormData.csv.bz2"
url.file <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# Remove any existing data directory so the run starts from a clean state:
if (file.exists("data")) {
  unlink("data", recursive = TRUE) # Erases the data directory and its files
}
# Create new, empty data directories:
dir.create("data")
dir.create("data/tidy")
dir.create("data/raw")
# Download the storm data:
download.file(url = url.file, destfile = data.file, method = "wget")
# Load the raw data file:
if (file.exists(data.file)) {
  data <- read.csv(data.file, stringsAsFactors = FALSE, strip.white = TRUE)
} else {
  stop("Data file not found.")
}
# Creates a function to convert PROPDMGEXP and CROPDMGEXP to numeric exponents:
convert.to.exp <- function(expn = "character") {
  cases(
    (expn == "B" | expn == "b") -> 9, # billions
    (expn == "M" | expn == "m") -> 6, # millions
    (expn == "K" | expn == "k") -> 3, # thousands
    (expn == "H" | expn == "h") -> 2, # hundreds
    (!is.na(as.numeric(expn))) -> as.numeric(expn), # digits are used as the exponent
    (is.na(as.numeric(expn))) -> 0 # any other code means no multiplier
  )
}
# Removes unused variables and transforms variables, using a dplyr chain:
data <- tbl_df(data) %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG,
         CROPDMGEXP) %>% # Selects variables
  mutate(EVTYPE = toupper(EVTYPE)) %>% # Converts EVTYPE to upper case
  mutate(EVTYPE = str_trim(EVTYPE)) %>% # Removes surrounding spaces
  mutate(PROPDMGEXP2 = convert.to.exp(PROPDMGEXP)) %>% # Inserts temp variables
  mutate(CROPDMGEXP2 = convert.to.exp(CROPDMGEXP)) %>%
  mutate(prop.damage = PROPDMG * 10 ^ PROPDMGEXP2) %>%
  mutate(crop.damage = CROPDMG * 10 ^ CROPDMGEXP2) %>%
  select(-PROPDMG, -PROPDMGEXP, -PROPDMGEXP2, -CROPDMG, -CROPDMGEXP,
         -CROPDMGEXP2) %>% # Removes variables not needed for the analysis
  rename(event.type = EVTYPE) %>% # Renames variables
  rename(fatalities = FATALITIES) %>%
  rename(injuries = INJURIES)
# Writes tidy data in a file:
write.csv(data, file = "data/tidy/severe-wheather-events.csv", row.names = FALSE)
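The exponent conversion used above can also be sketched in base R with a named lookup vector, which avoids the overlapping conditions passed to `cases()`. This is only an illustrative alternative under stated assumptions: `exp.lookup`, `convert.exp.sketch`, and the `toy` data frame are hypothetical names and values that do not appear in the original analysis.

```r
# Hypothetical helper (not part of the original analysis): maps a damage
# exponent code to a power of ten using a named lookup vector.
exp.lookup <- c(B = 9, M = 6, K = 3, H = 2)

convert.exp.sketch <- function(expn) {
  code <- toupper(trimws(expn))
  # Letter codes come from the lookup vector; unmatched codes yield NA here
  power <- unname(exp.lookup[code])
  # Purely numeric codes are used as the exponent itself
  is.digit <- grepl("^[0-9]+$", code)
  power[is.digit] <- as.numeric(code[is.digit])
  # Anything else (blank, "+", "-", "?") is treated as 10^0
  power[is.na(power)] <- 0
  power
}

# Toy values for illustration only (not taken from the storm database)
toy <- data.frame(dmg = c(25, 2.5, 1, 7), expn = c("K", "M", "", "2"),
                  stringsAsFactors = FALSE)
toy$damage <- toy$dmg * 10 ^ convert.exp.sketch(toy$expn)
print(toy)
```

For example, a record with `dmg = 25` and `expn = "K"` is read as 25 * 10^3 = 25000 dollars, which matches how `prop.damage` is built in the chain above.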
The cleaned data is shown below:
# Prints the tidy data, ready for analysis:
print(data)
## Source: local data frame [902,297 x 5]
##
## event.type fatalities injuries prop.damage crop.damage
## 1 TORNADO 0 15 25000 0
## 2 TORNADO 0 0 2500 0
## 3 TORNADO 0 2 25000 0
## 4 TORNADO 0 2 2500 0
## 5 TORNADO 0 2 2500 0
## 6 TORNADO 0 6 2500 0
## 7 TORNADO 0 1 2500 0
## 8 TORNADO 0 0 2500 0
## 9 TORNADO 1 14 25000 0
## 10 TORNADO 0 0 25000 0
## .. ... ... ... ... ...
# Summarizes data by event type, creating the data for analysis:
analisys.data <- data %>%
  group_by(event.type) %>%
  summarize(total.fatalities = sum(fatalities),
            total.injuries = sum(injuries),
            total.prop.damage = sum(prop.damage),
            total.crop.damage = sum(crop.damage)) %>%
  mutate(total.economic.damage = total.prop.damage + total.crop.damage)
# Creates a dataset of the 10 most fatal events:
most.fatal <- analisys.data %>%
  select(event.type, total.fatalities) %>%
  arrange(desc(total.fatalities)) %>%
  slice(1:10) %>%
  # Creates a new variable event.type2 as a factor, ordered by total.fatalities:
  mutate(event.type2 = reorder(event.type, total.fatalities))
# Creates a dataset of the 10 most injurious events:
most.injurious <- analisys.data %>%
  select(event.type, total.injuries) %>%
  arrange(desc(total.injuries)) %>%
  slice(1:10) %>%
  # Creates a new variable event.type2 as a factor, ordered by total.injuries:
  mutate(event.type2 = reorder(event.type, total.injuries))
# Creates a dataset of the 10 most property damaging events:
most.prop.damaging <- analisys.data %>%
  select(event.type, total.prop.damage) %>%
  arrange(desc(total.prop.damage)) %>%
  slice(1:10) %>%
  # Converts currency to billions of dollars:
  mutate(total.prop.damage = total.prop.damage / 10 ^ 9) %>%
  # Creates a new variable event.type2 as a factor, ordered by total.prop.damage:
  mutate(event.type2 = reorder(event.type, total.prop.damage))
# Creates a dataset of the 10 most crop damaging events:
most.crop.damaging <- analisys.data %>%
  select(event.type, total.crop.damage) %>%
  arrange(desc(total.crop.damage)) %>%
  slice(1:10) %>%
  # Converts currency to billions of dollars:
  mutate(total.crop.damage = total.crop.damage / 10 ^ 9) %>%
  # Creates a new variable event.type2 as a factor, ordered by total.crop.damage:
  mutate(event.type2 = reorder(event.type, total.crop.damage))
# Creates a dataset of the 10 most economically damaging events:
most.economic.damaging <- analisys.data %>%
  select(event.type, total.economic.damage) %>%
  arrange(desc(total.economic.damage)) %>%
  slice(1:10) %>%
  # Converts currency to billions of dollars:
  mutate(total.economic.damage = total.economic.damage / 10 ^ 9) %>%
  # Creates a new variable event.type2 as a factor, ordered by total.economic.damage:
  mutate(event.type2 = reorder(event.type, total.economic.damage))
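The five top-10 datasets above follow the same pattern: sort by one column, keep the first rows, optionally rescale, and build an ordered factor for plotting. A minimal base R sketch of that pattern is shown here. The function `top.events`, its arguments, and the `toy.summary` table are hypothetical and not part of the original analysis.

```r
# Hypothetical helper (an assumption, not the author's code): returns the n
# rows of a summary table with the largest values in one column, with the
# event type also stored as a factor ordered by that column for plotting.
top.events <- function(summary.data, column, n = 10, scale = 1) {
  # Sort rows from the largest to the smallest value of the chosen column
  ranked <- summary.data[order(summary.data[[column]], decreasing = TRUE), ]
  ranked <- head(ranked[, c("event.type", column)], n)
  # Optional rescaling, e.g. scale = 10 ^ 9 to express dollars in billions
  ranked[[column]] <- ranked[[column]] / scale
  ranked$event.type2 <- reorder(ranked$event.type, ranked[[column]])
  ranked
}

# Toy summary table for illustration only (made-up numbers)
toy.summary <- data.frame(
  event.type = c("A", "B", "C", "D"),
  total.fatalities = c(5, 40, 12, 0),
  stringsAsFactors = FALSE
)
toy.top <- top.events(toy.summary, "total.fatalities", n = 3)
print(toy.top)
```

Under these assumptions, a call such as `top.events(analisys.data, "total.prop.damage", scale = 10 ^ 9)` would be expected to give the same ranking as `most.prop.damaging`, although that call has not been run against the storm data here.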
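The results are divided into five subsections: the most fatal events, the most injurious events, the most property damaging events, the most crop damaging events, and the most economically damaging events.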
According to the results found, the 10 most fatal weather events are:
print(most.fatal[, c("event.type", "total.fatalities")])
## Source: local data frame [10 x 2]
##
## event.type total.fatalities
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
The bar graph of the 10 most fatal events is presented below:
# Creates and prints the graph of the 10 most fatal events:
graph.most.fatal <- ggplot(most.fatal, aes(x = event.type2,
                                           y = total.fatalities)) +
  geom_bar(stat = "identity", colour = "black", fill = "red") +
  xlab("Events") + ylab("Estimated number of fatalities") + ylim(0, 6000) +
  ggtitle("10 Most Fatal Weather Events") + coord_flip()
print(graph.most.fatal)
According to the results found, the 10 most injurious weather events are:
print(most.injurious[, c("event.type", "total.injuries")])
## Source: local data frame [10 x 2]
##
## event.type total.injuries
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 HAIL 1361
The bar graph of the 10 most injurious events is presented below:
# Creates and prints the graph of the 10 most injurious events:
graph.most.injurious <- ggplot(most.injurious, aes(x = event.type2,
                                                   y = total.injuries)) +
  geom_bar(stat = "identity", colour = "black", fill = "orange") +
  xlab("Events") + ylab("Estimated number of injuries") +
  ggtitle("10 Most Injurious Weather Events") + coord_flip()
print(graph.most.injurious)
According to the results found, the 10 most property damaging weather events are:
print(most.prop.damaging[, c("event.type", "total.prop.damage")])
## Source: local data frame [10 x 2]
##
## event.type total.prop.damage
## 1 FLOOD 144.657710
## 2 HURRICANE/TYPHOON 69.305840
## 3 TORNADO 56.947381
## 4 STORM SURGE 43.323536
## 5 FLASH FLOOD 16.822724
## 6 HAIL 15.735268
## 7 HURRICANE 11.868319
## 8 TROPICAL STORM 7.703891
## 9 WINTER STORM 6.688497
## 10 HIGH WIND 5.270046
Values are presented in billions of US Dollars.
According to the results found, the 10 most crop damaging weather events are:
print(most.crop.damaging[, c("event.type", "total.crop.damage")])
## Source: local data frame [10 x 2]
##
## event.type total.crop.damage
## 1 DROUGHT 13.972566
## 2 FLOOD 5.661968
## 3 RIVER FLOOD 5.029459
## 4 ICE STORM 5.022113
## 5 HAIL 3.025954
## 6 HURRICANE 2.741910
## 7 HURRICANE/TYPHOON 2.607873
## 8 FLASH FLOOD 1.421317
## 9 EXTREME COLD 1.312973
## 10 FROST/FREEZE 1.094186
Values are presented in billions of US Dollars.
According to the results found, the 10 most economically damaging weather events are:
print(most.economic.damaging[, c("event.type", "total.economic.damage")])
## Source: local data frame [10 x 2]
##
## event.type total.economic.damage
## 1 FLOOD 150.319678
## 2 HURRICANE/TYPHOON 71.913713
## 3 TORNADO 57.362334
## 4 STORM SURGE 43.323541
## 5 HAIL 18.761222
## 6 FLASH FLOOD 18.244041
## 7 DROUGHT 15.018672
## 8 HURRICANE 14.610229
## 9 RIVER FLOOD 10.148404
## 10 ICE STORM 8.967041
Values are presented in billions of US Dollars. Note that economic damage values are the sum of property and crop damages.
The bar graph of the 10 most economically damaging events is presented below:
# Creates and prints the graph of the 10 most economically damaging events:
graph.most.economic.damaging <- ggplot(most.economic.damaging,
                                       aes(x = event.type2,
                                           y = total.economic.damage)) +
  geom_bar(stat = "identity", colour = "black", fill = "yellow2") +
  xlab("Events") +
  ylab("Estimated economic damage (in billions of US Dollars)") +
  ggtitle("10 Most Economically Damaging Weather Events") + coord_flip()
print(graph.most.economic.damaging)
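The accuracy of the data imposes limitations on the results. Event types are not standardized, so the same kind of event is recorded under several names. Heat events, for instance, appear under the following names: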
heat.names <- grep("HEAT", analisys.data$event.type, value = TRUE)
print(heat.names)
## [1] "DROUGHT/EXCESSIVE HEAT" "EXCESSIVE HEAT"
## [3] "EXCESSIVE HEAT/DROUGHT" "EXTREME HEAT"
## [5] "HEAT" "HEAT DROUGHT"
## [7] "HEAT WAVE" "HEAT WAVE DROUGHT"
## [9] "HEAT WAVES" "HEAT/DROUGHT"
## [11] "HEATBURST" "RECORD HEAT"
## [13] "RECORD HEAT WAVE" "RECORD/EXCESSIVE HEAT"
The same occurs with other weather events, such as “WIND” and “THUNDERSTORM”.
Some events are categorized with two or more event types, such as “WINTER STORM/HIGH WINDS” and “HEAVY SNOW/HIGH WINDS/FREEZING”.
The estimates have been recorded since 1950, possibly using different procedures over time.
The database does not take inflation into account in the economic damage estimates.
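The naming inconsistency described above could be reduced by collapsing spelling variants into a single label before summarizing. The sketch below shows one possible approach. The function `normalize.event.type` and its regular expressions are illustrative assumptions, not an official NOAA mapping and not something applied in this analysis. In particular, merging "EXCESSIVE HEAT" into "HEAT" is a choice that would change the rankings reported above.

```r
# Hypothetical sketch: collapse spelling variants of an event type into one
# label. The patterns are illustrative assumptions, not an official mapping.
normalize.event.type <- function(event.type) {
  normalized <- toupper(trimws(event.type))
  # Treat "TSTM WIND..." and "THUNDERSTORM WIND..." as the same event
  is.tstm <- grepl("^TSTM WIND|^THUNDERSTORM WIND", normalized)
  normalized[is.tstm] <- "THUNDERSTORM WIND"
  # Merge heat variants, leaving compound drought entries and heatbursts alone
  is.heat <- grepl("HEAT", normalized) & !grepl("DROUGHT|HEATBURST", normalized)
  normalized[is.heat] <- "HEAT"
  normalized
}

examples <- c("HEAT WAVE", "EXCESSIVE HEAT", "Record Heat", "TSTM WIND",
              "THUNDERSTORM WINDS", "HEAT/DROUGHT", "FLOOD")
print(data.frame(original = examples,
                 normalized = normalize.event.type(examples)))
```

Compound entries such as "HEAT/DROUGHT" are deliberately left unchanged in this sketch, because assigning them to a single category would require a judgment that the data alone does not support.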