Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage. From U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database we can see that most harmful for economic weather event is flood. There is about 138 bln dollars damage from floods in 2001-2011 period. But most harmful weather event for public health is tornado with more than thousand fatalities and more than 14 thousands injuries during 2001-2011 period.

Data Processing

We use U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. Base was downloaded from URL below at 24.10.2014.

options(stringsAsFactors = FALSE)
#
# you should change path according to your system
setwd("c:/Users/gregory/Documents/!Projects/Trainings/Coursera - Reproducible Research/RepData_PA2/")
# download.file(url = "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",destfile = "stormdata.csv.bz2")

storm = read.csv("stormdata.csv.bz2",header=TRUE)

We need to determine the events most harmful for health and for economic. For this questions there is no need to keep original database. So we aggregate data by year and event type. We will keep only necessary variables: FATALITIES (number of fatalities), INJURIES (number of injuries), PROPDAMAGE (property damage, dollars) and CROPDMG (crop damage, dollars).

invisible(library(dplyr))
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(knitr)
library(ggplot2)

# aggregation

# it seems that TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS are the same thing
# so we recode them all to THUNDERSTORM WINDS
# the same about HURRICANE/TYPHOON and HURRICANE

aggr_storm = storm %>% 
    mutate(EVTYPE = ifelse(EVTYPE %in% c("TSTM WIND", "THUNDERSTORM WIND"),"THUNDERSTORM WINDS",EVTYPE)) %>%
    mutate(EVTYPE = ifelse(EVTYPE %in% c("HURRICANE/TYPHOON"),"HURRICANE",EVTYPE)) %>%
    mutate(event = factor(EVTYPE)) %>% # convert event to factor
    mutate(year = as.numeric(substr(as.Date(BGN_DATE,format = "%m/%d/%Y %H:%M:%S"),1,4))) %>% # extract year
    # multipliers for damage - thousands, millions, billions
    mutate(prop_mult = ifelse(PROPDMGEXP %in% c("B"),1e9, # billions
                              ifelse(PROPDMGEXP %in% c("m","M"),1e6,  # millions
                                     ifelse(PROPDMGEXP %in% c("k","K"),1e3,0)))) %>% # multiplier for damage
    mutate(crop_mult = ifelse(CROPDMGEXP %in% c("B"),1e9, # billions
                              ifelse(CROPDMGEXP %in% c("m","M"),1e6, # millions
                                     ifelse(CROPDMGEXP %in% c("k","K"),1e3,0)))) %>% # multiplier for damage
    group_by(year,event) %>%
    summarize(fatalities = sum(FATALITIES,na.rm = TRUE),
              injuries = sum(INJURIES,na.rm = TRUE),
              propdamage = sum(PROPDMG*prop_mult,na.rm = TRUE),
              cropdamage = sum(CROPDMG*crop_mult,na.rm = TRUE),
              count = length(CROPDMG) # count number of cases
              ) %>%
    mutate(total_dmg = propdamage + cropdamage) # total damage

Types of events which have the greatest economic consequences

Under economic consequences we will understand total damage (property damage + crop damage) from the weather event. Available data covers rather long period of time. To correctly estimate economic damage we need adjustments by inflation but there is no such data in this base. To avoid this problem we will pick events with max economic damage (total_dmg) in each year (there is very small inflation during one year).

# ranking events by damage
aggr_storm = aggr_storm %>%
    group_by(year) %>%
    mutate(rank = min_rank(desc(total_dmg)))

# keep one event with maximum damage from each year 
max_dmg_events = filter(aggr_storm,rank==1) %>% 
    select(event) %>% 
    group_by(event) %>%
    summarize(count = length(event)) %>%
    arrange(desc(count))

###Frequency of most harmful events in descending order
kable(max_dmg_events)
event count
TORNADO 45
HURRICANE 6
FLOOD 3
DROUGHT 1
HAIL 1
HURRICANE OPAL 1
ICE STORM 1
RIVER FLOOD 1
STORM SURGE/TIDE 1
TROPICAL STORM 1
WILDFIRE 1

We can see that in 45 different years TORNADO was most harmful event. There are HURRICANE and FLOOD on the second and third place. But it is possible that HURRICANE or FLOOD sometimes is so strong that its damage exceeds damage from TORNADO from all others years. To clarify this point we plot damage in dollars for all these events vs time.

for_plot = aggr_storm %>%
    filter(event %in% c("TORNADO","HURRICANE","FLOOD"))

qplot(year,total_dmg,data=for_plot,
      geom="line", 
      facets = .~event, 
      log = "total_dmg",
      xlab ="Year", 
      ylab = "Damage, $",
      main = "Fig. 1. Damage from most harmful weather events during 1950-2011") +
    theme_bw()

plot of chunk unnamed-chunk-4

Surprisingly but there are no records for HURRICANE and FLOOD in early years. For existing records damage from HURRICANE/FLOOD exceeds that from TORNADO. Records for these events appears only from 1993 year. Let’s repeat our calculations for this period.

min_year = min(filter(for_plot,event=="FLOOD" & total_dmg>0)$year)
max_dmg_events = filter(aggr_storm,rank==1 & year>=min_year) %>% 
    select(event) %>% 
    group_by(event) %>%
    summarize(count = length(event)) %>%
    arrange(desc(count))

###Frequency of most harmful events in descending order
kable(max_dmg_events)
event count
HURRICANE 6
FLOOD 3
TORNADO 2
DROUGHT 1
HAIL 1
HURRICANE OPAL 1
ICE STORM 1
RIVER FLOOD 1
STORM SURGE/TIDE 1
TROPICAL STORM 1
WILDFIRE 1

It appears that HURRICANE is winner in this period. Further we compare damage in dollars for all events in 2001-2011. Inflation in this period is not so high so we can disregard it.

lst_years_damage = aggr_storm %>% 
    filter(year>=2001) %>%
    group_by (event) %>%
    summarize(damage=sum(total_dmg), events_number=sum(count)) %>%
    arrange(desc(damage))

# Top 10 events by damage in 2000-2001
kable (lst_years_damage[1:10,])
event damage events_number
FLOOD 1.376e+11 19034
HURRICANE 7.540e+10 133
STORM SURGE 4.317e+10 120
TORNADO 1.927e+10 16518
HAIL 1.320e+10 154458
FLASH FLOOD 1.218e+10 38412
TROPICAL STORM 7.607e+09 605
DROUGHT 7.543e+09 1933
THUNDERSTORM WINDS 5.903e+09 154257
HIGH WIND 5.395e+09 15569

Finally, FLOOD is clear winner with approximately 138 billion dollars of damage in 2001-2011.

Types of events wich are most harmful with respect to population health

For consistency with previews topic in this question we will consider period from 2001 year. We will count number of injuries and fatalities from different type of events during this period.

health_damage = aggr_storm %>% 
    filter(year>=2001) %>%
    group_by (event) %>%
    summarize(fatalities = sum(fatalities), injuries=sum(injuries), events_number=sum(count)) %>%
    arrange(desc(fatalities))

kable(health_damage[1:10,])
event fatalities injuries events_number
TORNADO 1152 14331 16518
EXCESSIVE HEAT 856 3242 1049
FLASH FLOOD 573 780 38412
LIGHTNING 414 2622 8779
RIP CURRENT 340 208 431
FLOOD 260 309 19034
HEAT 230 1222 710
THUNDERSTORM WINDS 222 2878 154257
AVALANCHE 163 109 300
EXTREME COLD/WIND CHILL 125 24 1002
max_fatal = health_damage[1,"fatalities"]
max_injur = health_damage[1,"injuries"]

Both maximum fatalities(1152) and maximum injuries(1.4331 × 104) are from tornado.

Results

Most harmful for economic weather event is flood. There is about 138 bln. dollars damage from floods in 2001-2011 period. But most harmful weather event for public health is tornado with more than 1’000 fatalities and more than 14’000 injuries during 2001-2011 period. Our conclusions mostly based on 2001-2011 years data because of some inconsistencies in data in early periods.

lst_years_damage[,"event"] = with(lst_years_damage,factor(event, levels=event))
qplot(event,
      damage/1e9,
      data=lst_years_damage[1:5,],
      geom="histogram", 
      stat = "identity",
      xlab ="Event", 
      ylab = "Damage, bln. $",
      main = "Fig. 2. Damage from top five most harmful events during 2001-2011") +
    theme_bw()

plot of chunk unnamed-chunk-8

health_damage[,"event"] = with(health_damage,factor(event, levels=event))
qplot(event,fatalities, 
      data = health_damage[1:5,], 
      geom = "histogram",
      stat = "identity",
      ylab = "Number of fatalities",
      xlab = "Event",
      main = "Fig. 3. Fatalities from top 5 weather events in 2001-2011") +
    theme_bw()

plot of chunk unnamed-chunk-8

Thank you for your attention!