Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events result in fatalities, injuries, and property damage. Using the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, we find that the weather event most harmful to the economy is flood, which caused about 138 billion dollars of damage in 2001-2011. The event most harmful to public health is tornado, with more than a thousand fatalities and more than 14 thousand injuries over the same period.
We use the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The data were downloaded from the URL below on 24 October 2014.
options(stringsAsFactors = FALSE)
#
# you should change path according to your system
setwd("c:/Users/gregory/Documents/!Projects/Trainings/Coursera - Reproducible Research/RepData_PA2/")
# download.file(url = "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",destfile = "stormdata.csv.bz2")
storm = read.csv("stormdata.csv.bz2",header=TRUE)
We need to determine the events most harmful to health and to the economy. These questions do not require the full original database, so we aggregate the data by year and event type, keeping only the necessary variables: FATALITIES (number of fatalities), INJURIES (number of injuries), PROPDMG (property damage, dollars) and CROPDMG (crop damage, dollars). The damage figures are scaled by the magnitude codes in PROPDMGEXP and CROPDMGEXP (thousands, millions, billions).
# invisible() does not silence attach messages; suppressPackageStartupMessages() does
suppressPackageStartupMessages(library(dplyr))
library(knitr)
library(ggplot2)
# aggregation
# it seems that TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS are the same thing
# so we recode them all to THUNDERSTORM WINDS
# the same about HURRICANE/TYPHOON and HURRICANE
aggr_storm = storm %>%
    mutate(EVTYPE = ifelse(EVTYPE %in% c("TSTM WIND", "THUNDERSTORM WIND"),"THUNDERSTORM WINDS",EVTYPE)) %>%
    mutate(EVTYPE = ifelse(EVTYPE %in% c("HURRICANE/TYPHOON"),"HURRICANE",EVTYPE)) %>%
    mutate(event = factor(EVTYPE)) %>% # convert event to factor
    mutate(year = as.numeric(substr(as.Date(BGN_DATE,format = "%m/%d/%Y %H:%M:%S"),1,4))) %>% # extract year
    # multipliers for damage - thousands, millions, billions
    # (any other exponent code gets multiplier 0, so that damage is not counted)
    mutate(prop_mult = ifelse(PROPDMGEXP %in% c("B"),1e9, # billions
                       ifelse(PROPDMGEXP %in% c("m","M"),1e6, # millions
                       ifelse(PROPDMGEXP %in% c("k","K"),1e3,0)))) %>% # thousands
    mutate(crop_mult = ifelse(CROPDMGEXP %in% c("B"),1e9, # billions
                       ifelse(CROPDMGEXP %in% c("m","M"),1e6, # millions
                       ifelse(CROPDMGEXP %in% c("k","K"),1e3,0)))) %>% # thousands
    group_by(year,event) %>%
    summarize(fatalities = sum(FATALITIES,na.rm = TRUE),
              injuries = sum(INJURIES,na.rm = TRUE),
              propdamage = sum(PROPDMG*prop_mult,na.rm = TRUE),
              cropdamage = sum(CROPDMG*crop_mult,na.rm = TRUE),
              count = length(CROPDMG) # count number of cases
    ) %>%
    mutate(total_dmg = propdamage + cropdamage) # total damage
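The exponent recoding in the chunk above can also be expressed as a named lookup vector. The following is only a sketch on made-up values: the helper name `exp_to_mult` is hypothetical and not part of the original analysis. It mirrors the same rules (lower- and upper-case "k" and "m", upper-case "B" only, everything else mapped to 0).

```r
# Hypothetical helper (an assumption, not from the original report):
# map an exponent code to a numeric multiplier.
exp_to_mult <- function(code) {
  lookup <- c(k = 1e3, K = 1e3, m = 1e6, M = 1e6, B = 1e9)
  mult <- unname(lookup[as.character(code)])  # codes not in the lookup give NA
  mult[is.na(mult)] <- 0                      # same fallback as the nested ifelse()
  mult
}

# Toy values, not taken from the NOAA file
dmg  <- c(25, 2.5, 1, 10)
code <- c("K", "m", "B", "")
scaled <- dmg * exp_to_mult(code)
print(scaled)
```

A lookup vector keeps the code-to-multiplier mapping in one place, which may make it easier to extend if other exponent codes need handling.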
By economic consequences we mean the total damage (property damage + crop damage) from a weather event. The available data cover a rather long period. Correctly comparing economic damage across years would require adjusting for inflation, but the database has no such data. To sidestep this problem, we pick the event with the maximum economic damage (total_dmg) in each year, since inflation within a single year is very small.
# ranking events by damage
aggr_storm = aggr_storm %>%
    group_by(year) %>%
    mutate(rank = min_rank(desc(total_dmg)))
# keep one event with maximum damage from each year
max_dmg_events = filter(aggr_storm,rank==1) %>%
    select(event) %>%
    group_by(event) %>%
    summarize(count = length(event)) %>%
    arrange(desc(count))
###Frequency of most harmful events in descending order
kable(max_dmg_events)
| event | count |
|---|---|
| TORNADO | 45 |
| HURRICANE | 6 |
| FLOOD | 3 |
| DROUGHT | 1 |
| HAIL | 1 |
| HURRICANE OPAL | 1 |
| ICE STORM | 1 |
| RIVER FLOOD | 1 |
| STORM SURGE/TIDE | 1 |
| TROPICAL STORM | 1 |
| WILDFIRE | 1 |
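To make the "top event per year" counting concrete, here is a small sketch on made-up yearly totals (the numbers are illustrative and not from the NOAA data). It assumes a dplyr version that supports the `name` argument of `count()`. Note that `min_rank()` keeps every tied event, so a year with a tie would contribute more than one row.

```r
library(dplyr)

# Toy yearly totals (made-up numbers)
toy <- data.frame(
  year      = c(1990, 1990, 1991, 1991, 1992, 1992),
  event     = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "TORNADO", "FLOOD"),
  total_dmg = c(5, 2, 1, 9, 7, 3)
)

top_per_year <- toy %>%
  group_by(year) %>%
  filter(min_rank(desc(total_dmg)) == 1) %>%  # most damaging event(s) in each year
  ungroup() %>%
  count(event, name = "count", sort = TRUE)   # in how many years each event led

print(top_per_year)
```

In this toy data TORNADO leads in two years and FLOOD in one, even though the single largest yearly total belongs to FLOOD, which is the same caveat the analysis examines next.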
We can see that TORNADO was the most harmful event in 45 different years, with HURRICANE and FLOOD in second and third place. However, a single HURRICANE or FLOOD may be so severe that its damage exceeds the damage from TORNADO in all the other years. To clarify this point, we plot the damage in dollars for these three events against time.
for_plot = aggr_storm %>%
    filter(event %in% c("TORNADO","HURRICANE","FLOOD"))
# qplot's log argument names the axis ("x", "y" or "xy"), not the variable
qplot(year,total_dmg,data=for_plot,
      geom="line",
      facets = .~event,
      log = "y",
      xlab ="Year",
      ylab = "Damage, $",
      main = "Fig. 1. Damage from most harmful weather events during 1950-2011") +
    theme_bw()
Surprisingly, there are no records for HURRICANE and FLOOD in the early years; records for these events appear only from 1993 onward. Where records exist, the damage from HURRICANE and FLOOD exceeds that from TORNADO. Let’s repeat our calculations for this period.
min_year = min(filter(for_plot,event=="FLOOD" & total_dmg>0)$year)
max_dmg_events = filter(aggr_storm,rank==1 & year>=min_year) %>%
    select(event) %>%
    group_by(event) %>%
    summarize(count = length(event)) %>%
    arrange(desc(count))
###Frequency of most harmful events in descending order
kable(max_dmg_events)
| event | count |
|---|---|
| HURRICANE | 6 |
| FLOOD | 3 |
| TORNADO | 2 |
| DROUGHT | 1 |
| HAIL | 1 |
| HURRICANE OPAL | 1 |
| ICE STORM | 1 |
| RIVER FLOOD | 1 |
| STORM SURGE/TIDE | 1 |
| TROPICAL STORM | 1 |
| WILDFIRE | 1 |
HURRICANE comes out on top in this period. Next we compare the damage in dollars for all events in 2001-2011. Inflation over this period is modest, so we disregard it.
lst_years_damage = aggr_storm %>%
    filter(year>=2001) %>%
    group_by (event) %>%
    summarize(damage=sum(total_dmg), events_number=sum(count)) %>%
    arrange(desc(damage))
# Top 10 events by damage in 2001-2011
kable (lst_years_damage[1:10,])
| event | damage | events_number |
|---|---|---|
| FLOOD | 1.376e+11 | 19034 |
| HURRICANE | 7.540e+10 | 133 |
| STORM SURGE | 4.317e+10 | 120 |
| TORNADO | 1.927e+10 | 16518 |
| HAIL | 1.320e+10 | 154458 |
| FLASH FLOOD | 1.218e+10 | 38412 |
| TROPICAL STORM | 7.607e+09 | 605 |
| DROUGHT | 7.543e+09 | 1933 |
| THUNDERSTORM WINDS | 5.903e+09 | 154257 |
| HIGH WIND | 5.395e+09 | 15569 |
Finally, FLOOD is the clear winner, with approximately 138 billion dollars of damage in 2001-2011.
For consistency with the previous section, we again consider the period from 2001 onward and count the injuries and fatalities caused by each type of event during this period.
health_damage = aggr_storm %>%
    filter(year>=2001) %>%
    group_by (event) %>%
    summarize(fatalities = sum(fatalities), injuries=sum(injuries), events_number=sum(count)) %>%
    arrange(desc(fatalities))
kable(health_damage[1:10,])
| event | fatalities | injuries | events_number |
|---|---|---|---|
| TORNADO | 1152 | 14331 | 16518 |
| EXCESSIVE HEAT | 856 | 3242 | 1049 |
| FLASH FLOOD | 573 | 780 | 38412 |
| LIGHTNING | 414 | 2622 | 8779 |
| RIP CURRENT | 340 | 208 | 431 |
| FLOOD | 260 | 309 | 19034 |
| HEAT | 230 | 1222 | 710 |
| THUNDERSTORM WINDS | 222 | 2878 | 154257 |
| AVALANCHE | 163 | 109 | 300 |
| EXTREME COLD/WIND CHILL | 125 | 24 | 1002 |
max_fatal = health_damage[1,"fatalities"]
max_injur = health_damage[1,"injuries"]
Both the maximum number of fatalities (1152) and the maximum number of injuries (14331) are from tornadoes.
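Large inline numbers in a knitr report may be rendered in scientific notation. One way to avoid this, sketched here under the assumption that the value is available as a plain numeric, is to format it explicitly before printing. The variable name `max_injur_txt` is introduced only for this illustration.

```r
# Value taken from the fatalities/injuries table in this report
max_injur <- 14331

# Format with a thousands separator and without scientific notation
max_injur_txt <- format(max_injur, big.mark = ",", scientific = FALSE)
print(max_injur_txt)
```

The formatted string can then be used in the inline expression in place of the raw number.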
The weather event most harmful to the economy is flood, with about 138 billion dollars of damage in 2001-2011. The event most harmful to public health is tornado, with more than 1,000 fatalities and more than 14,000 injuries over the same period. Our conclusions rest mostly on the 2001-2011 data because of inconsistencies in the records for earlier periods.
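# fix the factor levels so bars keep the damage ordering instead of alphabetical order
lst_years_damage[,"event"] = with(lst_years_damage,factor(event, levels=event))
# geom = "histogram" with stat = "identity" fails in newer ggplot2 releases,
# so the bars are drawn with geom_col() instead
ggplot(lst_years_damage[1:5,], aes(x = event, y = damage/1e9)) +
    geom_col() +
    labs(x = "Event",
         y = "Damage, bln. $",
         title = "Fig. 2. Damage from top five most harmful events during 2001-2011") +
    theme_bw()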
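# fix the factor levels so bars keep the fatalities ordering instead of alphabetical order
health_damage[,"event"] = with(health_damage,factor(event, levels=event))
# geom = "histogram" with stat = "identity" fails in newer ggplot2 releases,
# so the bars are drawn with geom_col() instead
ggplot(health_damage[1:5,], aes(x = event, y = fatalities)) +
    geom_col() +
    labs(x = "Event",
         y = "Number of fatalities",
         title = "Fig. 3. Fatalities from top 5 weather events in 2001-2011") +
    theme_bw()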