This is a brief exploration of the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, looking for the weather events most harmful to population health and to the economy. The database documentation is here.

Data Processing

We start by making sure that all code is shown and that the needed packages are loaded.

knitr::opts_chunk$set(echo = TRUE)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(gsubfn)
## Loading required package: proto
library(readr)
library(reshape2)

The database can be read with a single read_csv() call. The call is cached because the database is large, and rereading it on every run would be slow.
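The report does not show which caching mechanism was used (knitr's `cache = TRUE` chunk option is one common choice). As an illustration of the idea only, here is a minimal base R sketch that parses a file once and reuses a saved copy afterwards. The helper name `cached_read` and the temporary toy file are assumptions made for this example, not part of the original analysis.

```r
# Hypothetical helper: read a file once, then reuse a saved copy on later runs.
cached_read <- function(path, cache_path, reader = read.csv) {
  if (file.exists(cache_path)) {
    return(readRDS(cache_path))   # fast path: load the saved object
  }
  data <- reader(path)            # slow path: parse the raw file
  saveRDS(data, cache_path)       # store the parsed result for next time
  data
}

# Toy usage with a small temporary CSV standing in for the storm database.
csv_path   <- tempfile(fileext = ".csv")
cache_path <- tempfile(fileext = ".rds")
write.csv(data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1)),
          csv_path, row.names = FALSE)

first  <- cached_read(csv_path, cache_path)  # parses the CSV and writes the cache
second <- cached_read(csv_path, cache_path)  # reads the cache instead
```

Passing `reader = readr::read_csv` would apply the same pattern to the call used in this report. The cache file has to be deleted by hand if the raw data ever changes.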

noaa <- read_csv("repdata%2Fdata%2FStormData.csv.bz2")
## Parsed with column specification:
## cols(
##   .default = col_character(),
##   STATE__ = col_double(),
##   COUNTY = col_double(),
##   BGN_RANGE = col_double(),
##   COUNTY_END = col_double(),
##   END_RANGE = col_double(),
##   LENGTH = col_double(),
##   WIDTH = col_double(),
##   F = col_integer(),
##   MAG = col_double(),
##   FATALITIES = col_double(),
##   INJURIES = col_double(),
##   PROPDMG = col_double(),
##   CROPDMG = col_double(),
##   LATITUDE = col_double(),
##   LONGITUDE = col_double(),
##   LATITUDE_E = col_double(),
##   LONGITUDE_ = col_double(),
##   REFNUM = col_double()
## )
## See spec(...) for full column specifications.
head(noaa)
## # A tibble: 6 x 37
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
##     <dbl>              <chr>    <chr>     <chr>  <dbl>      <chr> <chr>
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
## # ... with 30 more variables: EVTYPE <chr>, BGN_RANGE <dbl>,
## #   BGN_AZI <chr>, BGN_LOCATI <chr>, END_DATE <chr>, END_TIME <chr>,
## #   COUNTY_END <dbl>, COUNTYENDN <chr>, END_RANGE <dbl>, END_AZI <chr>,
## #   END_LOCATI <chr>, LENGTH <dbl>, WIDTH <dbl>, F <int>, MAG <dbl>,
## #   FATALITIES <dbl>, INJURIES <dbl>, PROPDMG <dbl>, PROPDMGEXP <chr>,
## #   CROPDMG <dbl>, CROPDMGEXP <chr>, WFO <chr>, STATEOFFIC <chr>,
## #   ZONENAMES <chr>, LATITUDE <dbl>, LONGITUDE <dbl>, LATITUDE_E <dbl>,
## #   LONGITUDE_ <dbl>, REMARKS <chr>, REFNUM <dbl>

Results

To find the weather events most harmful to population health, we look at fatalities and injuries per event type, both in total and on average per recorded event. Tornado-related events cause the most deaths.

pop_harm_tot <- noaa %>% group_by(EVTYPE) %>% 
  summarise(tot_fal = sum(FATALITIES),
            tot_inj = sum(INJURIES)) %>%
  filter(tot_fal >= 500 | tot_inj > 2000) %>%
  arrange(tot_fal) %>%
  melt(id = "EVTYPE")
ggplot(pop_harm_tot, aes(x = EVTYPE, y = value, fill = variable)) + 
  geom_col(position = "dodge") + 
  theme_bw() + 
  labs(x = "Event", y = "", title = "Total Harm to Population Health", 
       fill = "Harm") + 
  scale_fill_manual(labels = c("Fatality", "Injury"), values = c("blue", "red"))
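A note on bar order: as far as I can tell, `arrange(tot_fal)` sorts the rows but does not change the order of the bars, because a character column on a discrete axis is laid out alphabetically. One possible way to order the events by a value is to turn EVTYPE into a factor with `reorder()`. The sketch below uses made-up toy totals, not numbers from the database.

```r
# Toy totals standing in for the summarised storm data (values are made up).
totals <- data.frame(
  EVTYPE  = c("FLOOD", "HEAT", "TORNADO"),
  tot_fal = c(20, 10, 30),
  stringsAsFactors = FALSE
)

# As plain character values, the events would be plotted in alphabetical order.
alphabetical <- sort(unique(totals$EVTYPE))

# reorder() turns EVTYPE into a factor whose levels follow tot_fal (ascending).
totals$EVTYPE <- reorder(totals$EVTYPE, totals$tot_fal)
by_fatalities <- levels(totals$EVTYPE)
```

Here `by_fatalities` lists HEAT first and TORNADO last, and a plot built from `totals` would place the bars in that order.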

pop_harm_mean <- noaa %>% group_by(EVTYPE) %>%
  summarise(mean_fal = mean(FATALITIES),
            mean_inj = mean(INJURIES)) %>%
  filter(mean_fal >= 10 | mean_inj >= 40) %>%
  melt(id = "EVTYPE")
ggplot(pop_harm_mean, aes(x = EVTYPE, y = value, fill = variable)) + 
  geom_col(position = "dodge") + 
  theme_bw() + 
  labs(x = "Event", y = "", title = "Mean Harm to Population Health", 
       fill = "Harm") + 
  scale_fill_manual(labels = c("Fatality", "Injury"), values = c("blue", "red"))

For the weather events with the greatest economic consequences, we look for the records with the largest property and crop losses, keeping only damage recorded in billions of dollars (exponent code "B").

# Records with property damage in billions ("B"), largest first.
econ_harm_prop <- noaa %>% 
  filter(PROPDMGEXP == "B") %>% arrange(desc(PROPDMG))
# Records with crop damage in billions ("B"), largest first.
econ_harm_crop <- noaa %>% 
  filter(CROPDMGEXP == "B") %>% arrange(desc(CROPDMG))

The event with the biggest property loss is FLOOD, and the biggest crop loss was recorded for RIVER FLOOD and ICE STORM.
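Filtering on the "B" exponent picks out the largest single records, but it ignores losses recorded in thousands or millions. A rough sketch of an alternative is to convert every damage value to dollars and sum by event type. The mapping below (K, M, B) and the choice to leave values with any other code unscaled are assumptions made for this example, and the toy rows are invented; the helper name `damage_in_dollars` is hypothetical.

```r
# Assumed mapping from exponent code to multiplier (only K, M and B are handled).
exp_multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert a damage value and its exponent code to dollars.
damage_in_dollars <- function(value, exp_code) {
  mult <- exp_multiplier[toupper(exp_code)]
  mult[is.na(mult)] <- 1          # assumption: unknown codes keep the raw value
  unname(value * mult)
}

# Invented toy rows using the same column names as the storm database.
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL"),
  PROPDMG    = c(2, 500, 25),
  PROPDMGEXP = c("B", "M", "K"),
  stringsAsFactors = FALSE
)

toy$prop_dollars <- damage_in_dollars(toy$PROPDMG, toy$PROPDMGEXP)

# Total property damage in dollars per event type.
total_by_event <- tapply(toy$prop_dollars, toy$EVTYPE, sum)
```

Ranking event types by such totals answers a slightly different question from the one above: which event type cost the most overall, rather than which single record was the most expensive.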