We analyze the NOAA storm dataset, which records severe weather events from 1950 to 2011, and examine the human and economic consequences of these events. Event types that do not differ materially are grouped together, and only the top 10 are presented here for brevity. In particular, we identify which events are the most deadly and the most economically damaging, to help policymakers plan accordingly.
library(tidyverse)
library(glue)
library(magrittr)
setwd("//chws3092/PPM Admin File/Pricing Product General/Users/Carey C/Coursera/Reproducible Research")
activity <- read.csv("data/stormdata.csv.bz2", stringsAsFactors = FALSE)
Now let’s clean up the event types and tally fatalities and injuries. The data contain many event categories, some with nearly identical names. We normalize capitalization and remove whitespace so that events that are really the same are not tallied separately, and we merge a few common synonyms. We keep only the most prominent event types, since we feel that only the most prevalent events should really be a concern. We also convert property and crop damage into dollars by applying the multiplier encoded in the corresponding exponent column (K for thousands, M for millions, B for billions). Rows whose exponent code is not one of these keep their raw, unscaled value.
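The name clean-up described above can be tried out on a toy vector before running it on the full dataset. The following is only a minimal sketch: the helper `normalize_evtype` and its small synonym table are hypothetical and are not part of the original analysis, and it strips every whitespace character rather than spaces only.

```r
# Hypothetical helper (an assumption, not the original pipeline): collapse
# spelling variants of event labels into a single category.
normalize_evtype <- function(x) {
  x <- toupper(x)                   # ignore capitalization
  x <- gsub("[[:space:]]+", "", x)  # remove all whitespace
  # Illustrative synonym table: names are variants, values are the targets.
  synonyms <- c(TSTMWIND = "TSTORM",
                THUNDERSTORMWIND = "TSTORM",
                EXCESSIVEHEAT = "HEAT")
  hit <- x %in% names(synonyms)
  x[hit] <- unname(synonyms[x[hit]])
  x
}

raw <- c("Tstm Wind", " THUNDERSTORM WIND", "excessive heat", "Tornado")
cleaned <- normalize_evtype(raw)
print(table(cleaned))
```

Keeping the synonyms in one named vector makes it easy to review which categories are being merged, at the cost of handling only exact matches.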
new_activity <- activity %>%
  # Drop the "Summary ..." rows, which are not individual events
  filter(!grepl("Summary", EVTYPE)) %>%
  # Normalize capitalization and remove spaces
  mutate(EVTYPE = toupper(EVTYPE)) %>%
  mutate(EVTYPE = gsub(" ", "", EVTYPE)) %>%
  # Merge event types that are effectively the same
  mutate(EVTYPE = gsub("HURRICANE/TYPHOON", "HURRICANE", EVTYPE)) %>%
  # FLASHFLOODING must be renamed before FLASHFLOOD is folded into FLOOD,
  # otherwise the exact match below can never succeed
  mutate(EVTYPE = ifelse(EVTYPE == "FLASHFLOODING", "FLASHFLOOD", EVTYPE)) %>%
  mutate(EVTYPE = gsub("FLASHFLOOD", "FLOOD", EVTYPE)) %>%
  mutate(EVTYPE = ifelse(EVTYPE == "TSTMWIND", "TSTORM", EVTYPE)) %>%
  mutate(EVTYPE = ifelse(EVTYPE == "THUNDERSTORM", "TSTORM", EVTYPE)) %>%
  mutate(EVTYPE = ifelse(EVTYPE == "THUNDERSTORMWIND", "TSTORM", EVTYPE)) %>%
  mutate(EVTYPE = ifelse(EVTYPE == "EXCESSIVEHEAT", "HEAT", EVTYPE)) %>%
  # Convert property damage to dollars; other exponent codes stay unscaled
  mutate(PROPDMG = ifelse(PROPDMGEXP == "K", PROPDMG * 10^3, PROPDMG)) %>%
  mutate(PROPDMG = ifelse(PROPDMGEXP == "m", PROPDMG * 10^6, PROPDMG)) %>%
  mutate(PROPDMG = ifelse(PROPDMGEXP == "M", PROPDMG * 10^6, PROPDMG)) %>%
  mutate(PROPDMG = ifelse(PROPDMGEXP == "B", PROPDMG * 10^9, PROPDMG)) %>%
  # Convert crop damage to dollars in the same way
  mutate(CROPDMG = ifelse(CROPDMGEXP == "K", CROPDMG * 10^3, CROPDMG)) %>%
  mutate(CROPDMG = ifelse(CROPDMGEXP == "m", CROPDMG * 10^6, CROPDMG)) %>%
  mutate(CROPDMG = ifelse(CROPDMGEXP == "M", CROPDMG * 10^6, CROPDMG)) %>%
  mutate(CROPDMG = ifelse(CROPDMGEXP == "B", CROPDMG * 10^9, CROPDMG))
# Total fatalities and injuries per event type
health <- new_activity %>%
  group_by(EVTYPE) %>%
  summarise(FATALITIES = sum(as.numeric(FATALITIES)),
            INJURIES = sum(as.numeric(INJURIES)))
# Total property and crop damage (in dollars) per event type
damage <- new_activity %>%
  group_by(EVTYPE) %>%
  summarise(PROPDMG = sum(as.numeric(PROPDMG)),
            CROPDMG = sum(as.numeric(CROPDMG)))
# Top 10 event types by injuries, reshaped to long format for plotting
casualties <- health[order(-health$INJURIES), ] %>%
  head(10) %>%
  pivot_longer(cols = c("FATALITIES", "INJURIES"), values_to = "Statistic")
# Top 10 event types by property damage, reshaped to long format for plotting
economic <- damage[order(-damage$PROPDMG), ] %>%
  head(10) %>%
  pivot_longer(cols = c("PROPDMG", "CROPDMG"), values_to = "Statistic")
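The exponent conversion used in the pipeline above can also be written as a single lookup, which may be easier to audit. This is a sketch under stated assumptions rather than the method used in this report: the helper `scale_damage` is hypothetical, and it handles the same four codes (K, m, M, B) while leaving every other code unscaled.

```r
# Hypothetical helper (an assumption, not part of the original analysis):
# convert a damage figure and its exponent code into dollars.
scale_damage <- function(value, exp_code) {
  # Same codes as the pipeline: thousands, millions (either case), billions.
  multipliers <- c(K = 1e3, m = 1e6, M = 1e6, B = 1e9)
  mult <- multipliers[exp_code]  # NA wherever the code is not in the table
  mult[is.na(mult)] <- 1         # unknown or empty codes leave the value as-is
  value * unname(mult)
}

dollars <- scale_damage(c(25, 2.5, 3, 7), c("K", "m", "B", ""))
print(dollars)
```

Applied to the dataset, this would be called once per damage column, for example `scale_damage(PROPDMG, PROPDMGEXP)` inside a single `mutate()`.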
First we look at the causes of the most casualties:
casualties %>%
  ggplot(aes(fill = name, y = Statistic, x = reorder(EVTYPE, Statistic))) +
  geom_bar(stat = "identity") +   # stacked bars: fatalities plus injuries
  coord_flip() +
  labs(x = "Event", y = "Number of people",
       title = "Casualties (fatalities and injuries)",
       caption = paste0("Top 10 most dangerous weather events.\n",
                        "Tornadoes are a leading cause of injuries.\n",
                        "To better address this, residents should be advised to take tornado warnings seriously\n",
                        "and take proper precautions.\n",
                        "Investments in tornado shelters in high-risk areas may also be wise."))
We now look into economic damages:
economic %>%
  ggplot(aes(fill = name, y = Statistic / 10^9, x = reorder(EVTYPE, Statistic))) +
  geom_bar(stat = "identity") +   # stacked bars: property plus crop damage
  coord_flip() +
  labs(x = "Event", y = "Damage (billions of $)",
       title = "Property and crop damage",
       caption = paste0("Top 10 most costly weather events.\n",
                        "Flooding is a leading cause of property damage.\n",
                        "To better address this, residents should be advised to ensure flood insurance,\n",
                        "especially in high-risk areas."))