Sypnosis

I explored the U.S. NOAA’storm database, focusing in the health, and economic impact of weather events during the 1996 - 2011 period. I found that excessive heat is responsible for more deaths that any other weather event; besides, tornadoes provoque more injuries that any other event, by far. In terms of economic impact, the hurricanes lead, by a lot, to more economic damage than any other event, specially in crops.

Data Processing

Settings

knitr::opts_chunk$set(echo = TRUE, cache = TRUE) 
library(R.utils)   # function to unzip data
library(tidyverse) # useful packages for data science     
library(janitor)   # clean variable names
library(lubridate) # makes easy handle dates
library(gridExtra) # help for plotting

Data downloading

url_nooa <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
path_compressed   <- "Storm_data.csv.bz2"
path_uncompressed <- "Storm_data.csv"
# download data if they are not in the working directory, and uncompress them
if (!file.exists(path_compressed)) {
  download.file(url = url_nooa, destfile = path_compressed)
  R.utils::bunzip2(path_compressed, destname = path_uncompressed, remove = FALSE)
}

Data loading

# loading, then cleaning names
storm_raw <- readr::read_csv(path_uncompressed) %>% clean_names()

Data structure

glimpse(storm_raw)
## Observations: 902,297
## Variables: 37
## $ state       <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ bgn_date    <chr> "4/18/1950 0:00:00", "4/18/1950 0:00:00", "2/20/19...
## $ bgn_time    <chr> "0130", "0145", "1600", "0900", "1500", "2000", "0...
## $ time_zone   <chr> "CST", "CST", "CST", "CST", "CST", "CST", "CST", "...
## $ county      <dbl> 97, 3, 57, 89, 43, 77, 9, 123, 125, 57, 43, 9, 73,...
## $ countyname  <chr> "MOBILE", "BALDWIN", "FAYETTE", "MADISON", "CULLMA...
## $ state_2     <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "A...
## $ evtype      <chr> "TORNADO", "TORNADO", "TORNADO", "TORNADO", "TORNA...
## $ bgn_range   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ bgn_azi     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ bgn_locati  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ end_date    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ end_time    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ county_end  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ countyendn  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ end_range   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ end_azi     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ end_locati  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ length      <dbl> 14.0, 2.0, 0.1, 0.0, 0.0, 1.5, 1.5, 0.0, 3.3, 2.3,...
## $ width       <dbl> 100, 150, 123, 100, 150, 177, 33, 33, 100, 100, 40...
## $ f           <int> 3, 2, 2, 2, 2, 2, 2, 1, 3, 3, 1, 1, 3, 3, 3, 4, 1,...
## $ mag         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ fatalities  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 4, 0,...
## $ injuries    <dbl> 15, 0, 2, 2, 2, 6, 1, 0, 14, 0, 3, 3, 26, 12, 6, 5...
## $ propdmg     <dbl> 25.0, 2.5, 25.0, 2.5, 2.5, 2.5, 2.5, 2.5, 25.0, 25...
## $ propdmgexp  <chr> "K", "K", "K", "K", "K", "K", "K", "K", "K", "K", ...
## $ cropdmg     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ cropdmgexp  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ wfo         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ stateoffic  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ zonenames   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ latitude    <dbl> 3040, 3042, 3340, 3458, 3412, 3450, 3405, 3255, 33...
## $ longitude   <dbl> 8812, 8755, 8742, 8626, 8642, 8748, 8631, 8558, 87...
## $ latitude_e  <dbl> 3051, 0, 0, 0, 0, 0, 0, 0, 3336, 3337, 3402, 3404,...
## $ longitude_2 <dbl> 8806, 0, 0, 0, 0, 0, 0, 0, 8738, 8737, 8644, 8640,...
## $ remarks     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ refnum      <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,...

Selecting useful variables

The variables with information of the effect are bgn_date, the year the event started, evtype, the event type, injuries, the amount of injured people, fatalities, the toll caused by the weather event, propdmg, property damage rounded up to three significative digits, propdmgexp, magnitude of the property damage, cropdmg, crop damage up to three signifivative digits, and cropdmgexp, magnitude of the crop damage.

# select useful variables, extract year of the events, and transform character variables
storm_all <- storm_raw %>% 
  select(bgn_date, evtype, injuries, fatalities, propdmg:cropdmgexp) %>% 
  mutate(year = year(mdy_hms(bgn_date)), evtype = str_to_title(evtype), 
         propdmgexp = str_to_lower(propdmgexp), cropdmgexp = str_to_lower(cropdmgexp)) %>%
  select(-bgn_date) 

Filter observations

The yearly numbers of events is shown in the next plot.

storm_all %>% count(year) %>% plot(type = "b", pch = 19, cex = .5, ann = FALSE)
title(xlab = "Year", ylab = "Number of events", main = "Yearly number of events")
abline(v = 1996, col = "forestgreen")
grid()

The earlier years have fewer events recorded, likely due to a lack of recording; I will do the analysis with events since 1996, and with values bigger than zero.

storm_recent <- filter(storm_all, year >= 1996, 
                       propdmg > 0 | cropdmg > 0 | injuries > 0 | fatalities > 0)

Event Type Preprocessing

The next table displays the top ten weather events in injuries.

storm_recent %>%
  count(evtype, wt = injuries, sort = T) %>% rename(events = n) %>% 
  head(10) %>% knitr::kable()
evtype events
tornado 20667
flood 6758
excessive heat 6391
lightning 4141
tstm wind 3629
flash flood 1674
thunderstorm wind 1400
winter storm 1292
hurricane/typhoon 1275
heat 1222

Ttsm Wind and Thunderstorm Wind correspond to the same event type; nevertheles, their records are splited in two different evtype. This behaviour is repeated in other cases, e. g. Hurricane and Hurricane/Typhoon. In this step I gather the several forms of each evtype.

patterns <- c("hurrican.+" = "hurricane", "typhoon" = "hurricane", "wind " = "winds ",
              "heavy rain.+" = "heavy rain", "rains " = "rain", "hvy" = "heavy",
              "tstm" = "thunderstorm", "thundertorm" = "thunderstorm", 
              "thunderstormw" = "thunderstorm", "thunderstorms" = "thunderstorm",
              "thundeerstorm" = "thunderstorm", "thunderestorm" = "thunderstorm",
              "\\s+" = " ", "\\." = "", "(|)" = "", "^\\s+" = "", "\\s+$" = "")
# lowercase, them replace patterns, finally coerse evtype to title
storm_final <- storm_recent %>%
  mutate(evtype = str_to_lower(evtype) %>% str_replace_all(patterns) %>% str_to_title())

Preprocess to economic effect

Reshape storm events to tidy format:

# reshaping, tidy format
storm_economic_tidy <- storm_final %>% 
  select(-injuries, -fatalities) %>% 
  gather(type_damage, cost, propdmg, cropdmg) %>% 
  gather(type_mag, magnitude, propdmgexp, cropdmgexp)

In order to obtain the actual economic effect of the weather events we have to multiply the cost by \(10 ^ {magnitude}\), the damage is displayed in 2011 dollars. First, we coerse the values in magnitude to the appropiate exponent: \(0 = 0\), \(k = 3\), \(m = 6\), \(b = 9\).

storm_economic_tidy$magnitude %>% unique
## [1] "k" NA  "m" "b" "0"
# replace NA with zeros, coerse variables to exponents
storm_economic_middle <- storm_economic_tidy %>% 
  mutate(magnitude = coalesce(magnitude, "0"),
         magnitude = str_replace_all(magnitude, c("k" = "3", "m" = "6", b = "9")),
         magnitude = as.integer(magnitude))

Now we calculate the actual economic damage by event.

# function to calculate actual damage
actual_damage <- function(cost, mag) {
  if (near(cost, 0)) 0 else cost * 10 ^ mag
}
# actual damage by event
storm_economic_final <- storm_economic_middle %>% 
  mutate(damage = map2_dbl(cost, magnitude, .f = actual_damage))

Aggregation of effect on health

Aggregate total injuries, and fatalities by event type.

storm_health <- storm_final %>% 
  group_by(evtype) %>% 
  summarise(injuries = sum(injuries, na.rm = TRUE), fatalities = sum(fatalities, na.rm = TRUE)) %>% 
  arrange(desc(fatalities))

Aggregation of economic effect

Total economic damage by event type and kind of damage, either property or crop.

storm_economic <- storm_economic_final %>% 
  group_by(evtype, type_damage) %>% summarise(damage = sum(damage, na.rm = TRUE)) %>% 
  arrange(desc(damage)) %>% ungroup()

Results

Effect on health

The following panel displays the top ten weather events by the number of deaths, and by fatalities they caused since 1996 until 2011.

# ordering
storm_health_ordered <- storm_health %>%
  mutate(evtype = factor(evtype, levels = evtype))
# fatalities
gg_fatalities <- storm_health_ordered %>%  head(10) %>% 
  ggplot(aes(evtype, fatalities, fill = evtype)) +
  geom_bar(stat = "identity") +
  labs(x = NULL, y = "Fatalities",
       title = "Total of fatalities by Event Type during 1996 - 2011") + 
  scale_fill_brewer(palette = "Paired") +
  theme(legend.position = "none")
# injuries
gg_injuries <- storm_health_ordered %>%
  arrange(desc(injuries)) %>% head(10) %>% 
  ggplot(aes(evtype, injuries, fill = evtype)) +
  geom_bar(stat = "identity") +
  labs(x = "Event type", y = "Injuries") +
  ggtitle("Total of injuries by Event Type during 1996 - 2011") + 
  scale_fill_brewer(palette = "Paired") +
  theme(legend.position = "none")
# arrange both barplots  
grid.arrange(gg_fatalities, gg_injuries, nrow = 2)     

In one hand Excessive Heat is responsible of more deaths than any other type of weather event; on the other hand, Tornado lead to more injuires than any other weather event.

Across the U.S, which types of events have the greatest economic consequences?

Economic Effect

# top 10, and ordering
top_10_damage <- storm_economic %>% count(evtype, wt = damage, sort = T) %>% head(10)
# subseting  
storm_economic_ordered <- storm_economic %>%    
  semi_join(top_10_damage, by = "evtype") %>% 
  mutate(evtype = factor(evtype, levels = top_10_damage$evtype),
         type_damage = ifelse(type_damage == "cropdmg", "Crop", "Property"))
# plotting
storm_economic_ordered %>%
  ggplot(aes(evtype, damage, fill = type_damage)) +
  geom_bar(stat = "identity", position = "stack") +
  scale_fill_brewer(palette = "Dark2") +
  labs(x = "Event type", y = "Damage in dollars", fill = "Affected") +
  ggtitle("Total economic damage by Event Type during 1996 - 2011") +
  theme(legend.position = c(.92, .87))

Globally, hurricanes is the most harmful weather event in terms of economic damage, specially to crops; however, floods produced more damage to properties than any other weather event.

Session info

devtools::session_info()
## Session info -------------------------------------------------------------
##  setting  value                       
##  version  R version 3.4.3 (2017-11-30)
##  system   x86_64, mingw32             
##  ui       RTerm                       
##  language (EN)                        
##  collate  Spanish_Mexico.1252         
##  tz       America/Mexico_City         
##  date     2018-01-08
## Packages -----------------------------------------------------------------
##  package   * version date       source        
##  backports   1.1.2   2017-12-13 CRAN (R 3.4.3)
##  base      * 3.4.3   2017-12-06 local         
##  compiler    3.4.3   2017-12-06 local         
##  datasets  * 3.4.3   2017-12-06 local         
##  devtools    1.13.4  2017-11-09 CRAN (R 3.4.2)
##  digest      0.6.13  2017-12-14 CRAN (R 3.4.3)
##  evaluate    0.10.1  2017-06-24 CRAN (R 3.4.1)
##  graphics  * 3.4.3   2017-12-06 local         
##  grDevices * 3.4.3   2017-12-06 local         
##  htmltools   0.3.6   2017-04-28 CRAN (R 3.4.0)
##  knitr       1.18    2017-12-27 CRAN (R 3.4.3)
##  magrittr    1.5     2014-11-22 CRAN (R 3.4.0)
##  memoise     1.1.0   2017-04-21 CRAN (R 3.4.1)
##  methods   * 3.4.3   2017-12-06 local         
##  Rcpp        0.12.14 2017-11-23 CRAN (R 3.4.3)
##  rmarkdown   1.8     2017-11-17 CRAN (R 3.4.3)
##  rprojroot   1.3-2   2018-01-03 CRAN (R 3.4.3)
##  stats     * 3.4.3   2017-12-06 local         
##  stringi     1.1.6   2017-11-17 CRAN (R 3.4.2)
##  stringr     1.2.0   2017-02-18 CRAN (R 3.4.0)
##  tools       3.4.3   2017-12-06 local         
##  utils     * 3.4.3   2017-12-06 local         
##  withr       2.1.1   2017-12-19 CRAN (R 3.4.3)
##  yaml        2.1.16  2017-12-12 CRAN (R 3.4.3)