I explored the U.S. NOAA’storm database, focusing in the health, and economic impact of weather events during the 1996 - 2011 period. I found that excessive heat is responsible for more deaths that any other weather event; besides, tornadoes provoque more injuries that any other event, by far. In terms of economic impact, the hurricanes lead, by a lot, to more economic damage than any other event, specially in crops.
knitr::opts_chunk$set(echo = TRUE, cache = TRUE)
library(R.utils) # function to unzip data
library(tidyverse) # useful packages for data science
library(janitor) # clean variable names
library(lubridate) # makes easy handle dates
library(gridExtra) # help for plotting
url_nooa <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
path_compressed <- "Storm_data.csv.bz2"
path_uncompressed <- "Storm_data.csv"
# download data if they are not in the working directory, and uncompress them
if (!file.exists(path_compressed)) {
download.file(url = url_nooa, destfile = path_compressed)
R.utils::bunzip2(path_compressed, destname = path_uncompressed, remove = FALSE)
}
# loading, then cleaning names
storm_raw <- readr::read_csv(path_uncompressed) %>% clean_names()
glimpse(storm_raw)
## Observations: 902,297
## Variables: 37
## $ state <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ bgn_date <chr> "4/18/1950 0:00:00", "4/18/1950 0:00:00", "2/20/19...
## $ bgn_time <chr> "0130", "0145", "1600", "0900", "1500", "2000", "0...
## $ time_zone <chr> "CST", "CST", "CST", "CST", "CST", "CST", "CST", "...
## $ county <dbl> 97, 3, 57, 89, 43, 77, 9, 123, 125, 57, 43, 9, 73,...
## $ countyname <chr> "MOBILE", "BALDWIN", "FAYETTE", "MADISON", "CULLMA...
## $ state_2 <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "A...
## $ evtype <chr> "TORNADO", "TORNADO", "TORNADO", "TORNADO", "TORNA...
## $ bgn_range <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ bgn_azi <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ bgn_locati <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ end_date <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ end_time <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ county_end <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ countyendn <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ end_range <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ end_azi <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ end_locati <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ length <dbl> 14.0, 2.0, 0.1, 0.0, 0.0, 1.5, 1.5, 0.0, 3.3, 2.3,...
## $ width <dbl> 100, 150, 123, 100, 150, 177, 33, 33, 100, 100, 40...
## $ f <int> 3, 2, 2, 2, 2, 2, 2, 1, 3, 3, 1, 1, 3, 3, 3, 4, 1,...
## $ mag <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ fatalities <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 4, 0,...
## $ injuries <dbl> 15, 0, 2, 2, 2, 6, 1, 0, 14, 0, 3, 3, 26, 12, 6, 5...
## $ propdmg <dbl> 25.0, 2.5, 25.0, 2.5, 2.5, 2.5, 2.5, 2.5, 25.0, 25...
## $ propdmgexp <chr> "K", "K", "K", "K", "K", "K", "K", "K", "K", "K", ...
## $ cropdmg <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ cropdmgexp <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ wfo <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ stateoffic <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ zonenames <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ latitude <dbl> 3040, 3042, 3340, 3458, 3412, 3450, 3405, 3255, 33...
## $ longitude <dbl> 8812, 8755, 8742, 8626, 8642, 8748, 8631, 8558, 87...
## $ latitude_e <dbl> 3051, 0, 0, 0, 0, 0, 0, 0, 3336, 3337, 3402, 3404,...
## $ longitude_2 <dbl> 8806, 0, 0, 0, 0, 0, 0, 0, 8738, 8737, 8644, 8640,...
## $ remarks <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ refnum <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,...
The variables with information of the effect are bgn_date, the year the event started, evtype, the event type, injuries, the amount of injured people, fatalities, the toll caused by the weather event, propdmg, property damage rounded up to three significative digits, propdmgexp, magnitude of the property damage, cropdmg, crop damage up to three signifivative digits, and cropdmgexp, magnitude of the crop damage.
# select useful variables, extract year of the events, and transform character variables
storm_all <- storm_raw %>%
select(bgn_date, evtype, injuries, fatalities, propdmg:cropdmgexp) %>%
mutate(year = year(mdy_hms(bgn_date)), evtype = str_to_title(evtype),
propdmgexp = str_to_lower(propdmgexp), cropdmgexp = str_to_lower(cropdmgexp)) %>%
select(-bgn_date)
The yearly numbers of events is shown in the next plot.
storm_all %>% count(year) %>% plot(type = "b", pch = 19, cex = .5, ann = FALSE)
title(xlab = "Year", ylab = "Number of events", main = "Yearly number of events")
abline(v = 1996, col = "forestgreen")
grid()
The earlier years have fewer events recorded, likely due to a lack of recording; I will do the analysis with events since 1996, and with values bigger than zero.
storm_recent <- filter(storm_all, year >= 1996,
propdmg > 0 | cropdmg > 0 | injuries > 0 | fatalities > 0)
The next table displays the top ten weather events in injuries.
storm_recent %>%
count(evtype, wt = injuries, sort = T) %>% rename(events = n) %>%
head(10) %>% knitr::kable()
| evtype | events |
|---|---|
| tornado | 20667 |
| flood | 6758 |
| excessive heat | 6391 |
| lightning | 4141 |
| tstm wind | 3629 |
| flash flood | 1674 |
| thunderstorm wind | 1400 |
| winter storm | 1292 |
| hurricane/typhoon | 1275 |
| heat | 1222 |
Ttsm Wind and Thunderstorm Wind correspond to the same event type; nevertheles, their records are splited in two different evtype. This behaviour is repeated in other cases, e. g. Hurricane and Hurricane/Typhoon. In this step I gather the several forms of each evtype.
patterns <- c("hurrican.+" = "hurricane", "typhoon" = "hurricane", "wind " = "winds ",
"heavy rain.+" = "heavy rain", "rains " = "rain", "hvy" = "heavy",
"tstm" = "thunderstorm", "thundertorm" = "thunderstorm",
"thunderstormw" = "thunderstorm", "thunderstorms" = "thunderstorm",
"thundeerstorm" = "thunderstorm", "thunderestorm" = "thunderstorm",
"\\s+" = " ", "\\." = "", "(|)" = "", "^\\s+" = "", "\\s+$" = "")
# lowercase, them replace patterns, finally coerse evtype to title
storm_final <- storm_recent %>%
mutate(evtype = str_to_lower(evtype) %>% str_replace_all(patterns) %>% str_to_title())
Reshape storm events to tidy format:
# reshaping, tidy format
storm_economic_tidy <- storm_final %>%
select(-injuries, -fatalities) %>%
gather(type_damage, cost, propdmg, cropdmg) %>%
gather(type_mag, magnitude, propdmgexp, cropdmgexp)
In order to obtain the actual economic effect of the weather events we have to multiply the cost by \(10 ^ {magnitude}\), the damage is displayed in 2011 dollars. First, we coerse the values in magnitude to the appropiate exponent: \(0 = 0\), \(k = 3\), \(m = 6\), \(b = 9\).
storm_economic_tidy$magnitude %>% unique
## [1] "k" NA "m" "b" "0"
# replace NA with zeros, coerse variables to exponents
storm_economic_middle <- storm_economic_tidy %>%
mutate(magnitude = coalesce(magnitude, "0"),
magnitude = str_replace_all(magnitude, c("k" = "3", "m" = "6", b = "9")),
magnitude = as.integer(magnitude))
Now we calculate the actual economic damage by event.
# function to calculate actual damage
actual_damage <- function(cost, mag) {
if (near(cost, 0)) 0 else cost * 10 ^ mag
}
# actual damage by event
storm_economic_final <- storm_economic_middle %>%
mutate(damage = map2_dbl(cost, magnitude, .f = actual_damage))
Aggregate total injuries, and fatalities by event type.
storm_health <- storm_final %>%
group_by(evtype) %>%
summarise(injuries = sum(injuries, na.rm = TRUE), fatalities = sum(fatalities, na.rm = TRUE)) %>%
arrange(desc(fatalities))
Total economic damage by event type and kind of damage, either property or crop.
storm_economic <- storm_economic_final %>%
group_by(evtype, type_damage) %>% summarise(damage = sum(damage, na.rm = TRUE)) %>%
arrange(desc(damage)) %>% ungroup()
The following panel displays the top ten weather events by the number of deaths, and by fatalities they caused since 1996 until 2011.
# ordering
storm_health_ordered <- storm_health %>%
mutate(evtype = factor(evtype, levels = evtype))
# fatalities
gg_fatalities <- storm_health_ordered %>% head(10) %>%
ggplot(aes(evtype, fatalities, fill = evtype)) +
geom_bar(stat = "identity") +
labs(x = NULL, y = "Fatalities",
title = "Total of fatalities by Event Type during 1996 - 2011") +
scale_fill_brewer(palette = "Paired") +
theme(legend.position = "none")
# injuries
gg_injuries <- storm_health_ordered %>%
arrange(desc(injuries)) %>% head(10) %>%
ggplot(aes(evtype, injuries, fill = evtype)) +
geom_bar(stat = "identity") +
labs(x = "Event type", y = "Injuries") +
ggtitle("Total of injuries by Event Type during 1996 - 2011") +
scale_fill_brewer(palette = "Paired") +
theme(legend.position = "none")
# arrange both barplots
grid.arrange(gg_fatalities, gg_injuries, nrow = 2)
In one hand Excessive Heat is responsible of more deaths than any other type of weather event; on the other hand, Tornado lead to more injuires than any other weather event.
# top 10, and ordering
top_10_damage <- storm_economic %>% count(evtype, wt = damage, sort = T) %>% head(10)
# subseting
storm_economic_ordered <- storm_economic %>%
semi_join(top_10_damage, by = "evtype") %>%
mutate(evtype = factor(evtype, levels = top_10_damage$evtype),
type_damage = ifelse(type_damage == "cropdmg", "Crop", "Property"))
# plotting
storm_economic_ordered %>%
ggplot(aes(evtype, damage, fill = type_damage)) +
geom_bar(stat = "identity", position = "stack") +
scale_fill_brewer(palette = "Dark2") +
labs(x = "Event type", y = "Damage in dollars", fill = "Affected") +
ggtitle("Total economic damage by Event Type during 1996 - 2011") +
theme(legend.position = c(.92, .87))
Globally, hurricanes is the most harmful weather event in terms of economic damage, specially to crops; however, floods produced more damage to properties than any other weather event.
devtools::session_info()
## Session info -------------------------------------------------------------
## setting value
## version R version 3.4.3 (2017-11-30)
## system x86_64, mingw32
## ui RTerm
## language (EN)
## collate Spanish_Mexico.1252
## tz America/Mexico_City
## date 2018-01-08
## Packages -----------------------------------------------------------------
## package * version date source
## backports 1.1.2 2017-12-13 CRAN (R 3.4.3)
## base * 3.4.3 2017-12-06 local
## compiler 3.4.3 2017-12-06 local
## datasets * 3.4.3 2017-12-06 local
## devtools 1.13.4 2017-11-09 CRAN (R 3.4.2)
## digest 0.6.13 2017-12-14 CRAN (R 3.4.3)
## evaluate 0.10.1 2017-06-24 CRAN (R 3.4.1)
## graphics * 3.4.3 2017-12-06 local
## grDevices * 3.4.3 2017-12-06 local
## htmltools 0.3.6 2017-04-28 CRAN (R 3.4.0)
## knitr 1.18 2017-12-27 CRAN (R 3.4.3)
## magrittr 1.5 2014-11-22 CRAN (R 3.4.0)
## memoise 1.1.0 2017-04-21 CRAN (R 3.4.1)
## methods * 3.4.3 2017-12-06 local
## Rcpp 0.12.14 2017-11-23 CRAN (R 3.4.3)
## rmarkdown 1.8 2017-11-17 CRAN (R 3.4.3)
## rprojroot 1.3-2 2018-01-03 CRAN (R 3.4.3)
## stats * 3.4.3 2017-12-06 local
## stringi 1.1.6 2017-11-17 CRAN (R 3.4.2)
## stringr 1.2.0 2017-02-18 CRAN (R 3.4.0)
## tools 3.4.3 2017-12-06 local
## utils * 3.4.3 2017-12-06 local
## withr 2.1.1 2017-12-19 CRAN (R 3.4.3)
## yaml 2.1.16 2017-12-12 CRAN (R 3.4.3)