Introduction

This report is based on the National Storm Data. It compares the impact of different types of extreme weather conditions on population health and the economy. Fatalities and injuries are used as measures of the impact on population health. Property damage and crop damage are used to identify which types of extreme weather conditions are most harmful to the economy.

Data Processing

The data is publicly available and can be obtained using the following shell script:

wget https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2
bzip2 -d repdata%2Fdata%2FStormData.csv.bz2
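
The same download step can also be done from within R, which keeps the whole analysis in one language. The following is a minimal sketch under stated assumptions: fetch_storm_data and the destination file name are hypothetical and not part of the original report, and the helper skips the download when the file already exists.

```r
# Hypothetical helper (not in the original report): download the archive
# only when it is not already present, then return the local path.
fetch_storm_data <- function(url, dest = "StormData.csv.bz2") {
  if (!file.exists(dest)) {
    # mode = "wb" keeps the compressed file intact on every platform
    download.file(url, destfile = dest, mode = "wb")
  }
  dest
}

# Example call, left commented out so that nothing is downloaded by accident:
# url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# df <- readr::read_csv(fetch_storm_data(url))
```

readr decompresses files ending in .bz2 automatically, so with this approach the separate bzip2 step is not needed.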

This report uses readr to load the CSV data because its behavior does not depend on the operating system, which makes the analysis more reproducible.

library(tidyverse)  # attaches readr, dplyr, tidyr and ggplot2, which this report uses
df <- read_csv("repdata%2Fdata%2FStormData.csv")
## Parsed with column specification:
## cols(
##   .default = col_character(),
##   STATE__ = col_double(),
##   COUNTY = col_double(),
##   BGN_RANGE = col_double(),
##   COUNTY_END = col_double(),
##   END_RANGE = col_double(),
##   LENGTH = col_double(),
##   WIDTH = col_double(),
##   F = col_integer(),
##   MAG = col_double(),
##   FATALITIES = col_double(),
##   INJURIES = col_double(),
##   PROPDMG = col_double(),
##   CROPDMG = col_double(),
##   LATITUDE = col_double(),
##   LONGITUDE = col_double(),
##   LATITUDE_E = col_double(),
##   LONGITUDE_ = col_double(),
##   REFNUM = col_double()
## )
## See spec(...) for full column specifications.

To determine which types of extreme weather conditions are most harmful to population health, fatalities and injuries are summed for each type of weather condition. For presentation purposes, only the top 10 types, ranked by combined fatalities and injuries, are kept.

df.health <- df %>%
  select(EVTYPE, FATALITIES, INJURIES) %>%
  mutate(EVTYPE=as.factor(EVTYPE)) %>%
  group_by(EVTYPE) %>%
  summarise(FATALITIES=sum(FATALITIES), INJURIES=sum(INJURIES)) %>%
  arrange(desc(FATALITIES + INJURIES)) %>%
  head(n=10) %>%
  gather('TYPE', 'COUNT', c(FATALITIES, INJURIES)) %>%
  mutate(TYPE=ifelse(TYPE=='FATALITIES', 'Fatality', 'Injury'))
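
To make the grouping and reshaping steps easier to follow, here is a small sketch of the same pipeline on a toy table. The records are made up for illustration and are not taken from the storm data; toy and toy.health are hypothetical names, and only the top 2 types are kept instead of 10.

```r
library(dplyr)
library(tidyr)

# Made-up records, for illustration only
toy <- tibble(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 1, 3, 4, 0),
  INJURIES   = c(10, 0, 5, 1, 2)
)

toy.health <- toy %>%
  group_by(EVTYPE) %>%
  # one row per event type with the summed counts
  summarise(FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES)) %>%
  # rank by combined harm and keep the top 2
  arrange(desc(FATALITIES + INJURIES)) %>%
  head(n = 2) %>%
  # long format: one row per (event type, measure) pair, ready for plotting
  gather('TYPE', 'COUNT', c(FATALITIES, INJURIES))

print(toy.health)
```

In this toy table TORNADO (20 combined) and HEAT (5 combined) outrank FLOOD (3 combined), so the long-format result has four rows, two for each remaining event type.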

According to the documentation of the data, all damage is measured in US dollars. The documentation also describes the scale of each figure using the codes ‘K’, ‘M’ and ‘B’, which correspond to thousands, millions and billions.

The following function was designed to normalize the scale of damage to US dollars. Any scale code other than ‘K’, ‘M’ and ‘B’ is treated as 1.

normalize <- Vectorize(function(u) {
  switch(u, 'K'=1000, 'M'=1000000, 'B'=1000000000, 1)
})
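
The rule stated above (‘K’, ‘M’ and ‘B’ map to their multipliers, everything else maps to 1) can also be written as a table lookup, which makes it easy to check on a few sample codes. This is a minimal sketch, not the function used in the report: normalize_lookup is a hypothetical name, and the sample codes are made up.

```r
# Hypothetical lookup-based variant of the scale rule described above
normalize_lookup <- function(u) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  # names that are missing or not in the table come back as NA ...
  out <- unname(multipliers[as.character(u)])
  # ... and are treated as a multiplier of 1
  out[is.na(out)] <- 1
  out
}

# Made-up sample codes: three known ones, a lowercase one, a missing one, a digit
print(normalize_lookup(c("K", "M", "B", "k", NA, "5")))

# A recorded damage of 25 with code 'K' is 25,000 dollars, i.e. 0.025 million
print(25 * normalize_lookup("K") / 1e6)
```

Because the match is case-sensitive, a lowercase code such as ‘k’ falls into the "treated as 1" case, in line with the rule as stated.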

First, all damage figures are normalized to US dollars. They are then grouped by type of weather condition and summed. For presentation purposes, only the ten most harmful types of weather conditions are kept, and all damage is converted to millions of dollars.

df.economy <- df %>%
  select(EVTYPE, contains('DMG')) %>%
  mutate(PROPDMG=PROPDMG * normalize(PROPDMGEXP), 
         CROPDMG=CROPDMG * normalize(CROPDMGEXP)) %>%
  select(EVTYPE, PROPDMG, CROPDMG) %>%
  group_by(EVTYPE) %>%
  summarise(PROPDMG=sum(PROPDMG), CROPDMG=sum(CROPDMG)) %>%
  arrange(desc(PROPDMG + CROPDMG)) %>%
  head(n=10) %>%
  gather('TYPE', 'DMG', contains('DMG')) %>%
  mutate(TYPE=ifelse(TYPE=='PROPDMG', 'Property Damage', 'Crop Damage'), 
         DMG=DMG / 1000000)

Results

The most harmful types of extreme weather conditions for population health:

ggplot(data=df.health, aes(x=EVTYPE, y=COUNT, fill=TYPE)) +
  geom_bar(stat='identity') +
  labs(x='Types of Extreme Weather Condition', 
       y='Fatalities/Injuries',
       fill='Damage Type',
       caption='US 1950 - 2011 Types of Storm and Fatalities/Injuries') +
  coord_flip()
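
With a character EVTYPE column, ggplot2 orders the bars alphabetically rather than by size. If a ranked chart is preferred, the factor levels can be sorted by the total count first. The following is a sketch on made-up data, assuming the same long format as df.health; toy.health and p are hypothetical names, and geom_col() is used as the equivalent of geom_bar(stat='identity').

```r
library(ggplot2)

# Made-up long-format data with the same columns as df.health
toy.health <- data.frame(
  EVTYPE = c("TORNADO", "TORNADO", "HEAT", "HEAT", "FLOOD", "FLOOD"),
  TYPE   = rep(c("Fatality", "Injury"), times = 3),
  COUNT  = c(5, 15, 4, 1, 1, 2)
)

# reorder() sorts the factor levels by the summed COUNT of each event type
toy.health$EVTYPE <- reorder(toy.health$EVTYPE, toy.health$COUNT, FUN = sum)

p <- ggplot(data = toy.health, aes(x = EVTYPE, y = COUNT, fill = TYPE)) +
  geom_col() +
  coord_flip()
print(p)
```

After coord_flip() the last factor level is drawn at the top, so the type with the largest total appears first when reading from top to bottom.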

The most harmful types of extreme weather conditions for the economy:

ggplot(data=df.economy, aes(x=EVTYPE, y=DMG, fill=TYPE)) +
  geom_bar(stat='identity') +
  labs(x='Types of Extreme Weather Condition', 
       y='Damage (Million Dollars)',
       fill='Damage Type',
       caption='US 1950 - 2011 Types of Storm and Property/Crop Damage') +
  coord_flip()