Synopsis

From the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, about 900,000 U.S. weather events recorded since 1950 were analyzed. The goal of the analysis was to find the weather events with a) the largest impact on population health and b) the greatest economic consequences. Tornadoes are by far the weather event with the largest number of fatalities and injuries, followed by excessive heat, thunderstorm winds, floods and lightning strikes. The biggest economic consequences in terms of property and crop damage are caused by floods, tornadoes, hurricanes, storm surges and hail.

Data Processing

The data for the analysis comes from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database and can be obtained as a bzip2-compressed CSV file from the course website.

First, the data is downloaded and read into a data frame:

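noaa_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# Download the bzip2-compressed CSV only if no local copy exists yet
if (!file.exists("stormdata.bz2")) {
        download.file(noaa_url, "stormdata.bz2", method="curl")
}
# Read the compressed CSV directly; as.is=TRUE keeps strings as characters
full_data <- read.csv(bzfile("stormdata.bz2"), as.is=TRUE)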

The analysis uses the dplyr package and converts the loaded data into a tbl_df data frame with tbl_df() for further processing:

library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
full_data <- tbl_df(full_data)

The variables are reduced to those needed to analyze fatalities, injuries and damages (property and crop damage, each split into a number and a magnitude code), along with the event type and the event begin date.

The variables are renamed, the event type is converted to a factor, and the date field is converted to POSIXct using the lubridate package.

The full data set is no longer needed and removed from memory.

library(lubridate)

an_data <- full_data %>%
        select(evtype = EVTYPE, date = BGN_DATE, fatalities = FATALITIES,
               injuries = INJURIES, propdmg = PROPDMG, propdmgexp = PROPDMGEXP,
               cropdmg = CROPDMG, cropdmgexp = CROPDMGEXP) %>%
        mutate(evtype=as.factor(evtype)) %>%
        mutate(date=mdy_hms(date))
rm(full_data)

head(an_data)
## Source: local data frame [6 x 8]
## 
##    evtype       date fatalities injuries propdmg propdmgexp cropdmg
## 1 TORNADO 1950-04-18          0       15    25.0          K       0
## 2 TORNADO 1950-04-18          0        0     2.5          K       0
## 3 TORNADO 1951-02-20          0        2    25.0          K       0
## 4 TORNADO 1951-06-08          0        2     2.5          K       0
## 5 TORNADO 1951-11-15          0        2     2.5          K       0
## 6 TORNADO 1951-11-15          0        6     2.5          K       0
## Variables not shown: cropdmgexp (chr)
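For readers who prefer to avoid the lubridate dependency, the mdy_hms() conversion can be sketched in base R. This assumes that BGN_DATE strings look like "4/18/1950 0:00:00" (month/day/year hours:minutes:seconds), which is the layout mdy_hms() parses; parse_bgn_date() is a hypothetical helper name. Like mdy_hms(), it returns POSIXct values in UTC.

```r
# Hypothetical helper: base-R alternative to lubridate::mdy_hms().
# Assumption: input strings look like "4/18/1950 0:00:00" (m/d/Y H:M:S).
parse_bgn_date <- function(x) {
  as.POSIXct(x, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
}

dates <- parse_bgn_date(c("4/18/1950 0:00:00", "11/15/1951 0:00:00"))
format(dates, "%Y-%m-%d")          # "1950-04-18" "1951-11-15"
as.integer(format(dates, "%Y"))    # event years, as used later for inflation
```

Strings that do not match the format become NA rather than raising an error, so it is worth counting NAs after such a conversion.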

Since the damage data are split into two variables, the magnitude codes need to be translated into multipliers. According to the NOAA data documentation, only an empty code (plain USD), K (thousands), M (millions) and B (billions) are valid, so only these codes are processed. Values with any other code are not discarded by the code below; they are kept unscaled, which is a simplification. The magnitude variables are no longer needed and are removed.

an_data <- an_data %>%
        mutate(propdmg = ifelse(propdmgexp == "K", propdmg*1000, propdmg)) %>%
        mutate(propdmg = ifelse(propdmgexp == "M", propdmg*1e6, propdmg)) %>%
        mutate(propdmg = ifelse(propdmgexp == "B", propdmg*1e9, propdmg)) %>%
        mutate(cropdmg = ifelse(cropdmgexp == "K", cropdmg*1000, cropdmg)) %>%
        mutate(cropdmg = ifelse(cropdmgexp == "M", cropdmg*1e6, cropdmg)) %>%
        mutate(cropdmg = ifelse(cropdmgexp == "B", cropdmg*1e9, cropdmg)) %>%
        select(-propdmgexp, -cropdmgexp)

head(an_data)
## Source: local data frame [6 x 6]
## 
##    evtype       date fatalities injuries propdmg cropdmg
## 1 TORNADO 1950-04-18          0       15   25000       0
## 2 TORNADO 1950-04-18          0        0    2500       0
## 3 TORNADO 1951-02-20          0        2   25000       0
## 4 TORNADO 1951-06-08          0        2    2500       0
## 5 TORNADO 1951-11-15          0        2    2500       0
## 6 TORNADO 1951-11-15          0        6    2500       0
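The chained ifelse() calls can also be expressed as a lookup table. The sketch below is an alternative, not what this analysis ran: exp_to_multiplier() and scale_damage() are hypothetical helpers, codes are upper-cased first (an assumption, so a lowercase "m" would count as millions), and any code outside the four valid ones maps to NA instead of being left unscaled, so such records could be dropped explicitly.

```r
# Hypothetical helper: translate NOAA magnitude codes into multipliers.
# Assumptions: "" means plain USD; codes are upper-cased first;
# any code other than "", K, M, B maps to NA.
exp_to_multiplier <- function(code) {
  valid_codes <- c("", "K", "M", "B")
  multipliers <- c(1, 1e3, 1e6, 1e9)
  multipliers[match(toupper(code), valid_codes)]
}

# Hypothetical helper: apply the multiplier to the raw damage number
scale_damage <- function(value, code) {
  value * exp_to_multiplier(code)
}

scale_damage(c(25, 2.5, 10, 3), c("K", "K", "m", "+"))
# gives 25000, 2500, 1e7 and NA ("+" is not a valid magnitude code)
```

Inside the dplyr pipeline this would read mutate(propdmg = scale_damage(propdmg, propdmgexp)). match() is used instead of name-based indexing because R never matches the empty string "" against names.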

The damage data can be over 60 years old and is not adjusted for inflation. A rough adjustment is therefore made:

Using Wolfram Alpha’s web interface for the query “inflation usa 1950-2014”, an inflation value of 912.9% was obtained (query date: 2014-10-25, 22:30). It is used here as a price-level factor of 9.129 between 1950 and 2014, which gives an annual inflation rate of i = (9.129)^(1/(2014-1950)) - 1 = 3.5158%. This is only a rough estimate and does not take actual year-specific inflation rates into account.
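Written out, the rough adjustment treats 9.129 as the total 1950 to 2014 price-level factor, spreads it evenly over the 64 years, and scales a damage value D_y recorded in year y to 2014 dollars:

$$
(1+i)^{2014-1950} = 9.129 \quad\Longrightarrow\quad i = 9.129^{1/64} - 1 \approx 0.03516
$$

$$
D_{2014} = D_y \,(1+i)^{2014-y}
$$

For y = 1950 the factor is exactly 9.129, so a damage of 25,000 USD becomes 228,225 USD.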

The historical damage values are adjusted by compounding this annual rate over the years between the event and 2014.

curyear <- 2014
infl <- 9.129^(1/(curyear - 1950))   # annual inflation factor (1 + i)

an_data <- an_data %>%
        mutate(propdmg = propdmg * infl ^ (curyear - year(date))) %>%
        mutate(cropdmg = cropdmg * infl ^ (curyear - year(date)))

head(an_data)
## Source: local data frame [6 x 6]
## 
##    evtype       date fatalities injuries propdmg cropdmg
## 1 TORNADO 1950-04-18          0       15  228225       0
## 2 TORNADO 1950-04-18          0        0   22822       0
## 3 TORNADO 1951-02-20          0        2  220474       0
## 4 TORNADO 1951-06-08          0        2   22047       0
## 5 TORNADO 1951-11-15          0        2   22047       0
## 6 TORNADO 1951-11-15          0        6   22047       0
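As a sanity check, the same compounding can be reproduced outside the pipeline. adjust_to_2014() is a hypothetical helper that applies the rough rate described above (total factor 9.129 from 1950 to 2014, compounded annually); applied to the raw values of the first three rows, it gives approximately the adjusted values shown in the table.

```r
# Hypothetical helper mirroring the rough adjustment used in this report.
# Assumption: 9.129 is the 1950 -> 2014 price-level factor, spread evenly per year.
adjust_to_2014 <- function(value, year, total_factor = 9.129,
                           base_year = 1950, cur_year = 2014) {
  annual_factor <- total_factor^(1 / (cur_year - base_year))  # 1 + i
  value * annual_factor^(cur_year - year)
}

# Raw damages (USD) and event years of the first three rows above
adjusted <- adjust_to_2014(c(25000, 2500, 25000), c(1950, 1950, 1951))
adjusted  # roughly 228225, 22822.5 and 220474
```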

The data is now ready for analysis. Since the research questions concern the types of events, the data set is grouped by the evtype field.

an_data <- group_by(an_data, evtype)

For the analysis of effects on human health, fatalities and injuries are summed by evtype, sorted in descending order by total number of cases, and reduced to the top 5:

health <- an_data %>%
        summarize(Fatalities = sum(fatalities),
                  Injuries = sum(injuries),
                  Cases = sum(fatalities)+sum(injuries)) %>%
        arrange(-Cases) %>%
        head(n=5L)

health
## Source: local data frame [5 x 4]
## 
##           evtype Fatalities Injuries Cases
## 1        TORNADO       5633    91346 96979
## 2 EXCESSIVE HEAT       1903     6525  8428
## 3      TSTM WIND        504     6957  7461
## 4          FLOOD        470     6789  7259
## 5      LIGHTNING        816     5230  6046
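The group, summarize, sort and truncate steps can also be sketched in base R with aggregate(). The events below are made-up toy values for illustration only (not NOAA data), and top_by_cases() is a hypothetical helper name.

```r
# Made-up toy events for illustration only (NOT NOAA data)
events <- data.frame(
  evtype     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD", "LIGHTNING"),
  fatalities = c(2, 0, 1, 0, 1, 1),
  injuries   = c(10, 3, 4, 1, 0, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sum per event type, add a total, sort descending, keep top n
top_by_cases <- function(df, n = 5) {
  sums <- aggregate(cbind(fatalities, injuries) ~ evtype, data = df, FUN = sum)
  sums$cases <- sums$fatalities + sums$injuries
  head(sums[order(-sums$cases), ], n)
}

top_by_cases(events, n = 3)  # TORNADO (17), FLOOD (4), LIGHTNING (3)
```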

For the analysis of economic consequences, property and crop damage values are summed by evtype, sorted in descending order by total damage, and reduced to the top 5:

eccon <- an_data %>%
        summarize(Property = sum(propdmg),
                  Crops = sum(cropdmg),
                  Total = sum(propdmg)+sum(cropdmg)) %>%
        arrange(-Total) %>%
        head(n=5L)
eccon
## Source: local data frame [5 x 4]
## 
##              evtype  Property     Crops     Total
## 1             FLOOD 1.938e+11 8.186e+09 2.020e+11
## 2           TORNADO 1.461e+11 6.566e+08 1.467e+11
## 3 HURRICANE/TYPHOON 9.559e+10 3.589e+09 9.918e+10
## 4       STORM SURGE 5.924e+10 9.313e+03 5.924e+10
## 5              HAIL 2.267e+10 4.728e+09 2.740e+10

The result tables are currently in the “wide” format (i.e. the case or damage type is encoded in the column names). To create the bar graphs, the data needs to be transformed into the long format (i.e. with the case or damage type as a separate variable). This is done with the gather() function from the tidyr package:

library(tidyr)

health2 <- gather(health, casetype, incidents, Fatalities:Injuries)

head(health2)
## Source: local data frame [6 x 4]
## 
##           evtype Cases   casetype incidents
## 1        TORNADO 96979 Fatalities      5633
## 2 EXCESSIVE HEAT  8428 Fatalities      1903
## 3      TSTM WIND  7461 Fatalities       504
## 4          FLOOD  7259 Fatalities       470
## 5      LIGHTNING  6046 Fatalities       816
## 6        TORNADO 96979   Injuries     91346
eccon2 <- gather(eccon, dmgtype, damage, Property:Crops)

head(eccon2)
## Source: local data frame [6 x 4]
## 
##              evtype     Total  dmgtype    damage
## 1             FLOOD 2.020e+11 Property 1.938e+11
## 2           TORNADO 1.467e+11 Property 1.461e+11
## 3 HURRICANE/TYPHOON 9.918e+10 Property 9.559e+10
## 4       STORM SURGE 5.924e+10 Property 5.924e+10
## 5              HAIL 2.740e+10 Property 2.267e+10
## 6             FLOOD 2.020e+11    Crops 8.186e+09
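In tidyr 1.0 and later, gather() is superseded by pivot_longer(), e.g. pivot_longer(health, Fatalities:Injuries, names_to = "casetype", values_to = "incidents"). The same reshaping can also be sketched in base R; wide_to_long() is a hypothetical helper, and unlike the gather() call above this sketch keeps only the id column (not Cases).

```r
# Top-5 health table from above, in wide format (one column per case type)
health <- data.frame(
  evtype     = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING"),
  Fatalities = c(5633, 1903, 504, 470, 816),
  Injuries   = c(91346, 6525, 6957, 6789, 5230),
  stringsAsFactors = FALSE
)

# Hypothetical helper: stack the value columns into key/value pairs,
# keeping the id column, similar to tidyr::gather()
wide_to_long <- function(df, id, cols, key, value) {
  pieces <- lapply(cols, function(col) {
    piece <- data.frame(df[[id]], col, df[[col]], stringsAsFactors = FALSE)
    names(piece) <- c(id, key, value)
    piece
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

health2 <- wide_to_long(health, "evtype", c("Fatalities", "Injuries"),
                        key = "casetype", value = "incidents")
head(health2)
```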

Results

Most harmful events for population health

The most harmful events for population health are as follows:

library(xtable)

healthtable <- xtable(health, digits=0)
print(healthtable, type="html")
  evtype         Fatalities Injuries Cases
1 TORNADO              5633    91346 96979
2 EXCESSIVE HEAT       1903     6525  8428
3 TSTM WIND             504     6957  7461
4 FLOOD                 470     6789  7259
5 LIGHTNING             816     5230  6046

library(ggplot2)

ggplot(health2, aes(x=evtype, y=incidents)) +
        geom_bar(stat="identity") +
        facet_grid(casetype ~ .) +
        theme(axis.text.x = element_text(angle=30, hjust=1, vjust=1)) +
        geom_text(aes(label=incidents), vjust=-0.2) +
        ylim(c(0,110000)) +
        xlab("Event Type") + ylab("Number of incidents") +
        ggtitle("Most Harmful Weather Events for Population Health")

Figure: Most Harmful Weather Events for Population Health (number of fatalities and injuries by event type)

Tornadoes have by far the highest impact on population health, both in terms of fatalities and injuries. Excessive heat ranks second because of its high number of fatalities. Thunderstorm winds and floods cause slightly more injuries than excessive heat but far fewer fatalities, followed by lightning strikes. Not surprisingly, injuries far outnumber fatalities in all top 5 event types.
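The ranking of excessive heat becomes clearer when looking at the share of cases that were fatal. The sketch below only uses the counts from the health table above; health_counts is an illustrative name.

```r
# Fatality and total case counts copied from the top-5 health table above
health_counts <- data.frame(
  evtype     = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING"),
  fatalities = c(5633, 1903, 504, 470, 816),
  cases      = c(96979, 8428, 7461, 7259, 6046),
  stringsAsFactors = FALSE
)

# Percentage of all cases (fatalities + injuries) that were fatal
health_counts$fatal_pct <- round(100 * health_counts$fatalities / health_counts$cases, 1)
health_counts[order(-health_counts$fatal_pct), c("evtype", "fatal_pct")]
```

Excessive heat has the highest fatal share (about 22.6%), roughly four times that of tornadoes (about 5.8%).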

Events with the greatest economic consequences

The events with the greatest economic consequences are as follows:

library(xtable)

eccontable <- xtable(eccon, digits=-2)
print(eccontable, type="html")
  evtype            Property    Crops    Total
1 FLOOD             1.94E+11 8.19E+09 2.02E+11
2 TORNADO           1.46E+11 6.57E+08 1.47E+11
3 HURRICANE/TYPHOON 9.56E+10 3.59E+09 9.92E+10
4 STORM SURGE       5.92E+10 9.31E+03 5.92E+10
5 HAIL              2.27E+10 4.73E+09 2.74E+10

ggplot(eccon2, aes(x=evtype, y=damage/1e9, fill=dmgtype)) +
        geom_bar(stat="identity") +
        theme(axis.text.x = element_text(angle=30, hjust=1, vjust=1)) +
        xlab("Event Type") + ylab("Damage in Billion USD") +
        ggtitle("Weather Events with the Biggest Economic Consequences")

Figure: Weather Events with the Biggest Economic Consequences (property and crop damage by event type, in billion USD)

The biggest economic consequences are caused by floods, with an estimated 200 billion USD in inflation-adjusted damage, followed by tornadoes, hurricanes/typhoons, storm surges and hail. Property damage is much higher than crop damage for all five event types.
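The size of the totals and the property/crop split can be quantified from the rounded values in the table above. The sketch below only rearranges those numbers; eccon_totals is an illustrative name.

```r
# Crop and total damage (inflation-adjusted USD) copied from the table above
eccon_totals <- data.frame(
  evtype = c("FLOOD", "TORNADO", "HURRICANE/TYPHOON", "STORM SURGE", "HAIL"),
  crops  = c(8.186e9, 6.566e8, 3.589e9, 9.313e3, 4.728e9),
  total  = c(2.020e11, 1.467e11, 9.918e10, 5.924e10, 2.740e10),
  stringsAsFactors = FALSE
)

# Total damage in billion USD, and crop damage as a percentage of the total
eccon_totals$total_bn <- round(eccon_totals$total / 1e9, 1)
eccon_totals$crop_pct <- round(100 * eccon_totals$crops / eccon_totals$total, 1)
eccon_totals[, c("evtype", "total_bn", "crop_pct")]
```

With these rounded values, crop damage is at most about 17% of the total (hail) and below 5% for the other four event types.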