This project explores an influence of weather events on USA economy. We will check out impact on population heath and also the economic consequences of weather events. The analysis will answer the question of which event type has the highest impact. As a data source serves the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database.

Data Processing

Read the the NOAA Storm Database from the downloaded archive.

library(dplyr)
data <- read.table("repdata_data_StormData.csv.bz2", header=T, quote="\"", sep=",")

To analyze the impact of weather event types we will explore the following measurements:

EVTYPE – event type
FATALITIES – number of fatal cases
INJURIES – number of injury cases
CROPDMGEXP – measurement unit for CROPDMG
CROPDMG – amount of damage on crops
PROPDMGEXP – measurement unit for PROPDMG
PROPDMG – amount of damage on properties

Measurement units

To assess the impact, we need to compare damages using the same base, i.e. to transform values of CROPDMG/PROPDMG using the measurement units:

B stands for billions
M stands for millions
K/k stands for thousands (Kilo)
0 stands for nominal unit, i.e. no multiplier

Since economic impact from the weather events can be quite high, we’ll use millions as the base unit.

measurement_unit_multiplier <- function(unit) {
  # default
  multiplier <- 1/1000000
  # has to use ifelse, because the input is vectorized
  multiplier <- ifelse(unit == 'B', 1000, multiplier)
  multiplier <- ifelse(unit == 'M', 1, multiplier)
  multiplier <- ifelse(unit == 'K', 1/1000, multiplier)
  multiplier <- ifelse(unit == 'k', 1/1000, multiplier)
  
  return (multiplier)
}


analysis_data <- data %>% 
  select(EVTYPE, FATALITIES, INJURIES,CROPDMG, CROPDMGEXP, PROPDMG, PROPDMGEXP) %>%
  mutate(
    CROPDMG = CROPDMG * measurement_unit_multiplier(CROPDMGEXP),
    PROPDMG = PROPDMG * measurement_unit_multiplier(PROPDMGEXP)
    ) %>%
  select(-CROPDMGEXP, -PROPDMGEXP) %>%
  group_by(EVTYPE)

summary(analysis_data)

##     EVTYPE            FATALITIES          INJURIES            CROPDMG       
##  Length:902297      Min.   :  0.0000   Min.   :   0.0000   Min.   :0.0e+00  
##  Class :character   1st Qu.:  0.0000   1st Qu.:   0.0000   1st Qu.:0.0e+00  
##  Mode  :character   Median :  0.0000   Median :   0.0000   Median :0.0e+00  
##                     Mean   :  0.0168   Mean   :   0.1557   Mean   :5.4e-02  
##                     3rd Qu.:  0.0000   3rd Qu.:   0.0000   3rd Qu.:0.0e+00  
##                     Max.   :583.0000   Max.   :1700.0000   Max.   :5.0e+03  
##     PROPDMG        
##  Min.   :0.00e+00  
##  1st Qu.:0.00e+00  
##  Median :0.00e+00  
##  Mean   :4.70e-01  
##  3rd Qu.:0.00e+00  
##  Max.   :1.15e+05

Exploration

Let’s check which event types have the most impact and what’s the intersection across different impact types:

library(dplyr)

top_by <- function(column) {
  total_column <- paste(column, "TOTAL", sep="_")
  top <- analysis_data %>%
    summarise(!!total_column := sum(get(column))) %>%
    arrange(desc(get(total_column)))
  return (top)
}

print_top_by <- function(column) {
  head(top_by(column))
}

print_top_by("FATALITIES")

## # A tibble: 6 × 2
##   EVTYPE         FATALITIES_TOTAL
##   <chr>                     <dbl>
## 1 TORNADO                    5633
## 2 EXCESSIVE HEAT             1903
## 3 FLASH FLOOD                 978
## 4 HEAT                        937
## 5 LIGHTNING                   816
## 6 TSTM WIND                   504

print_top_by("INJURIES")

## # A tibble: 6 × 2
##   EVTYPE         INJURIES_TOTAL
##   <chr>                   <dbl>
## 1 TORNADO                 91346
## 2 TSTM WIND                6957
## 3 FLOOD                    6789
## 4 EXCESSIVE HEAT           6525
## 5 LIGHTNING                5230
## 6 HEAT                     2100

print_top_by("CROPDMG")

## # A tibble: 6 × 2
##   EVTYPE      CROPDMG_TOTAL
##   <chr>               <dbl>
## 1 DROUGHT            13973.
## 2 FLOOD               5662.
## 3 RIVER FLOOD         5029.
## 4 ICE STORM           5022.
## 5 HAIL                3026.
## 6 HURRICANE           2742.

print_top_by("PROPDMG")

## # A tibble: 6 × 2
##   EVTYPE            PROPDMG_TOTAL
##   <chr>                     <dbl>
## 1 FLOOD                   144658.
## 2 HURRICANE/TYPHOON        69306.
## 3 TORNADO                  56926.
## 4 STORM SURGE              43324.
## 5 FLASH FLOOD              16141.
## 6 HAIL                     15727.

Results

Across the United States, which types of events are most harmful with respect to population health?

For both fatalities and injuries, Tornado is the most harmful event, and way ahead of other event types by the injuries number. Let’s explore this on the plot graph:

library(tidyr)
library(ggplot2)

# to be sure both top injuries and top fatalities events be present at the plot
top_events <- union(
  top_by("INJURIES") %>% top_n(7) %>% pull(EVTYPE),
  top_by("FATALITIES") %>% top_n(7) %>% pull(EVTYPE)
)

analysis_data %>%
  summarise(FATALITIES_TOTAL = sum(FATALITIES),
            INJURIES_TOTAL = sum(INJURIES)) %>%
  filter(EVTYPE %in% top_events) %>%
  ggplot(aes(EVTYPE, group = 1)) +
  geom_line(aes(y = FATALITIES_TOTAL, colour = "Fatalities")) +
  geom_line(aes(y = INJURIES_TOTAL, colour = "Injuries")) + 
  labs(title = "Most harmful events across US with respect to population health", x = "Event type", y = "People impacted", color=element_blank()) + 
  scale_x_discrete(guide = guide_axis(n.dodge = 2))

Across the United States, which types of events have the greatest economic consequences?

The greatest economic consequences have:

Drought for crop damages
Flood for properties damages Let’s explore this on the faceted bar charts:

library(gridExtra)

# to be sure both top injuries and top fatalities events be present at the plot

top_events <- union(
  top_by("CROPDMG") %>% top_n(7) %>% pull(EVTYPE),
  top_by("PROPDMG") %>% top_n(7) %>% pull(EVTYPE)
)

damage_labels <- c(`CROPDMG` = "Crops Damage", `PROPDMG` = "Property Damage")

analysis_data %>%
  summarise(
    CROPDMG = sum(CROPDMG),
    PROPDMG = sum(PROPDMG)
  ) %>%
  filter(EVTYPE %in% top_events) %>%
  gather(key, value, 2:3) %>%
  ggplot(aes(x=EVTYPE, y = value, fill = factor(key))) + 
  facet_wrap(~key, labeller = as_labeller(damage_labels)) + 
  geom_bar(stat = "identity", position = position_dodge2(preserve = "single")) + 
  coord_flip() + 
  labs(title = "Most harmful events across US\nwith respect to economic consequences", x = element_blank(), y = "Million $") + 
  theme(legend.position="none")

Exploring the NOAA Storm Database - an impact of weather event types

Data Processing

Exploration

Results

Across the United States, which types of events are most harmful with respect to population health?

Across the United States, which types of events have the greatest economic consequences?