Analysis of Health and Economic Impact From Severe Weather Events

Synopsis

This analysis looks at the human and economic damage caused by severe weather events, using the NOAA Storm Database.
It was completed for Reproducible Research: Peer Assessment 2.
This report answers two questions:

  1. Across the United States, which types of events are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

I load and explore the data first.
Because the event variable (EVTYPE) is recorded so inconsistently, I group events based on specific words and complete the analysis with these groups.

Data Processing

Get libraries

library(plyr)
library(rCharts)
library(reshape2)

Load Data

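## Download the compressed data file once, if it is not already present
if(!file.exists("repdata-data-StormData.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                 "repdata-data-StormData.csv.bz2",
                 method = "curl")
}

## read.csv reads the bzip2-compressed file directly
df <- read.csv("repdata-data-StormData.csv.bz2")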

Explore Event Types

The event types are incredibly variable.
For example, there are 985 distinct event types,
and the word 'wind' occurs in 220 of them.
Below I show only the 20 most and 20 least frequent of those 220.

## Total event types
length(unique(df$EVTYPE))
[1] 985
## Find occurrences of 'wind' (case-insensitive)
tmp <- grep('wind', tolower(df$EVTYPE), value = TRUE)
length(table(tmp))
[1] 220
head(arrange(as.data.frame(table(tmp)),desc(Freq)),20)
                        tmp   Freq
1                 tstm wind 219942
2         thunderstorm wind  82564
3        thunderstorm winds  20843
4                 high wind  20214
5          marine tstm wind   6175
6  marine thunderstorm wind   5812
7               strong wind   3569
8                high winds   1533
9            tstm wind/hail   1028
10  extreme cold/wind chill   1002
11          cold/wind chill    539
12                     wind    346
13        extreme windchill    204
14             strong winds    204
15         marine high wind    135
16              gusty winds     65
17  thunderstorm winds hail     61
18      thunderstorm windss     51
19       marine strong wind     48
20          tstm wind (g45)     39
tail(arrange(as.data.frame(table(tmp)),desc(Freq)),20)
                           tmp Freq
201 tornadoes, tstm wind, hail    1
202           tstm wind  (g45)    1
203             tstm wind (41)    1
204            tstm wind (g35)    1
205               tstm wind 40    1
206               tstm wind 45    1
207               tstm wind 50    1
208              tstm wind 65)    1
209    tstm wind and lightning    1
210           tstm wind damage    1
211              tstm wind g45    1
212              tstm wind g58    1
213           tunderstorm wind    1
214              wind and wave    1
215       wind chill/high wind    1
216                 wind storm    1
217                  wind/hail    1
218    winter storm high winds    1
219     winter storm/high wind    1
220    winter storm/high winds    1

This inconsistent labeling is a serious data-integrity problem.
For this analysis, I scan the 100 most frequent event types and sort them into logical groups.
The 20 most frequent are shown here.

## Top 20 event types by number of occurrences
tmp <- arrange(as.data.frame(table(df$EVTYPE)),desc(Freq))
head(tmp,20)
                       Var1   Freq
1                      HAIL 288661
2                 TSTM WIND 219940
3         THUNDERSTORM WIND  82563
4                   TORNADO  60652
5               FLASH FLOOD  54277
6                     FLOOD  25326
7        THUNDERSTORM WINDS  20843
8                 HIGH WIND  20212
9                 LIGHTNING  15754
10               HEAVY SNOW  15708
11               HEAVY RAIN  11723
12             WINTER STORM  11433
13           WINTER WEATHER   7026
14             FUNNEL CLOUD   6839
15         MARINE TSTM WIND   6175
16 MARINE THUNDERSTORM WIND   5812
17               WATERSPOUT   3796
18              STRONG WIND   3566
19     URBAN/SML STREAM FLD   3392
20                 WILDFIRE   2761
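The grouping plan relies on the most frequent labels accounting for most of the records. A minimal sketch for checking that is below. `top_n_coverage` is a hypothetical helper, and lower-casing and trimming the labels first is an assumption (the top-20 table above tabulates raw EVTYPE values). With the real data it could be called as `top_n_coverage(df$EVTYPE, 100)`.

```r
## Hypothetical helper: share of records covered by the n most frequent labels.
## Labels are lower-cased and stripped of surrounding spaces first (an assumption),
## so "HAIL" and " hail" count as the same label.
top_n_coverage <- function(evtype, n) {
  labels <- trimws(tolower(as.character(evtype)))
  counts <- sort(table(labels), decreasing = TRUE)
  sum(head(counts, n)) / sum(counts)
}

## Toy labels for illustration, not taken from the data set
toy <- c("HAIL", " hail", "TSTM WIND", "tstm wind", "TORNADO", "FOG")
top_n_coverage(toy, 2)   # the two most frequent labels cover 4 of 6 records
```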

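Now I create a new variable, event_type, and assign each event to a group.
The assignments go from general groups to more specific ones; because each assignment overwrites any earlier match, the more specific groups take precedence.
Each event ends up in exactly one group, and anything left unmatched is labeled 'other'.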

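## Assign groups from general to specific. Each later line overwrites
## earlier matches, so the more specific patterns below take precedence.
df$event_type <- NA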
df$event_type[grep('heat|warm', tolower(df$EVTYPE))] <- 'heat'
df$event_type[grep('cold', tolower(df$EVTYPE))] <- 'cold'
df$event_type[grep('wind', tolower(df$EVTYPE))] <- 'wind'
df$event_type[grep('surf|current|tide', tolower(df$EVTYPE))] <- 'ocean'

df$event_type[grep('snow|winter|wintry|sleet|blizzard|ice|freeze|avalanche', tolower(df$EVTYPE))] <- 'snow'
df$event_type[grep('rain', tolower(df$EVTYPE))] <- 'rain'
df$event_type[grep('hail', tolower(df$EVTYPE))] <- 'hail'
df$event_type[grep('flood|fld', tolower(df$EVTYPE))] <- 'flood'
df$event_type[grep('tornado|funnel|waterspout|devil', tolower(df$EVTYPE))] <- 'tornado'
df$event_type[grep('hurricane|depression', tolower(df$EVTYPE))] <- 'hurricane'
df$event_type[grep('lightning', tolower(df$EVTYPE))] <- 'lightning'

df$event_type[grep('fog', tolower(df$EVTYPE))] <- 'fog'
df$event_type[grep('fire', tolower(df$EVTYPE))] <- 'fire'
df$event_type[grep('drought', tolower(df$EVTYPE))] <- 'drought'
df$event_type[grep('landslide', tolower(df$EVTYPE))] <- 'landslide'

df$event_type[is.na(df$event_type)] <- 'other'
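Because each assignment overwrites earlier matches, the order of the lines above decides the final group: 'TSTM WIND/HAIL' is first labeled wind and then hail, and 'EXTREME COLD/WIND CHILL' is first labeled cold and then wind. The same rule can be sketched as an ordered list of patterns. `classify_events` and the shortened `rules` vector below are hypothetical and cover only a few of the groups used above.

```r
## Hypothetical helper: apply regex rules in order; a later match overwrites an
## earlier one, mirroring the sequence of assignments above.
classify_events <- function(evtype, rules, default = "other") {
  x <- tolower(as.character(evtype))
  out <- rep(NA_character_, length(x))
  for (group in names(rules)) {
    out[grepl(rules[[group]], x)] <- group
  }
  out[is.na(out)] <- default   # unmatched labels fall into the default group
  out
}

## A shortened, illustrative subset of the patterns used above
rules <- c(cold    = "cold",
           wind    = "wind",
           hail    = "hail",
           tornado = "tornado|funnel|waterspout|devil")

classify_events(c("TSTM WIND/HAIL", "EXTREME COLD/WIND CHILL", "DUST DEVIL", "FOG"), rules)
```

With the full pattern list in the same order, `classify_events(df$EVTYPE, rules)` should reproduce df$event_type, and reordering the rules is an easy way to see how sensitive the group counts are to precedence.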

Let's look at the number of events in each group.

df2 <- ddply(df, .(event_type), summarise,
             count = length(EVTYPE)
             )

df2 <- arrange(df2, desc(count))

df2
   event_type  count
1        wind 363686
2        hail 290398
3       flood  86127
4     tornado  71686
5        snow  44080
6   lightning  15775
7        rain  12210
8        fire   4240
9       other   3233
10       heat   2958
11    drought   2512
12      ocean   2269
13        fog   1883
14       cold    892
15  hurricane    348

Wind and hail have far more occurrences than any other group: 363,686 and 290,398, respectively.
Flood, tornado, snow, lightning, and rain each have between 10,000 and 90,000 occurrences.
The remaining groups each have fewer than 5,000 occurrences.
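These comparisons can be expressed as shares of all records. Every record receives exactly one group (unmatched records become 'other'), so the group counts sum to the number of rows in the data. A small sketch, with the counts copied by hand from the table above:

```r
## Event counts per group, copied from the table above
counts <- c(wind = 363686, hail = 290398, flood = 86127, tornado = 71686,
            snow = 44080, lightning = 15775, rain = 12210, fire = 4240,
            other = 3233, heat = 2958, drought = 2512, ocean = 2269,
            fog = 1883, cold = 892, hurricane = 348)

total <- sum(counts)                      # one group per record, so this is the row count
share <- round(100 * counts / total, 1)   # percentage of all records per group
share[c("wind", "hail")]
sum(counts[c("wind", "hail")]) / total    # combined share of wind and hail
```

On these counts, wind and hail together account for roughly 72% of all records.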

Now we can break down human and economic damage by major groups.


Results

Across the United States, which types of events are most harmful with respect to population health?

Let's look at total human damage (fatalities and injuries) for each group.

df2 <- ddply(df, .(event_type), summarise,
             fatalities = sum(FATALITIES), 
             injuries = sum(INJURIES)    
            )

df2 <- arrange(df2, desc(injuries))

df3 <- melt(df2, id.vars = c('event_type'))

p1 <- nPlot(value ~ event_type, group = 'variable', data = df3, type = 'multiBarHorizontalChart')
p1$chart(stacked = TRUE)
p1$show('inline', include_assets = TRUE, cdn = TRUE)

Figure 1: Total human fatalities and injuries by event type

Tornadoes have caused the most injuries and fatalities.
Wind, heat, flood, snow, and lightning are next.
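For readers who prefer base R over plyr, the same per-group totals can be computed with aggregate(). A minimal sketch on a made-up toy data frame; only the column names mirror event_type, FATALITIES, and INJURIES from the analysis.

```r
## Toy data for illustration, not taken from the data set
toy <- data.frame(event_type = c("tornado", "tornado", "heat", "wind"),
                  FATALITIES = c(2, 0, 3, 0),
                  INJURIES   = c(10, 5, 1, 2),
                  stringsAsFactors = FALSE)

## Sum both columns within each event_type, like the ddply(..., summarise, sum) call
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ event_type, data = toy, FUN = sum)

## Sort by injuries, largest first, like arrange(df2, desc(injuries))
totals <- totals[order(-totals$INJURIES), ]
totals
```

Unlike the ddply() version, aggregate() keeps the original upper-case column names, so the melt() and nPlot() steps would need to refer to FATALITIES and INJURIES instead.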


Let's look at the average human damage per event for each group.

df2 <- ddply(df, .(event_type), summarise,
             fatalities = mean(FATALITIES), 
             injuries = mean(INJURIES)    
            )

df2 <- arrange(df2, desc(injuries))

df3 <- melt(df2, id.vars = c('event_type'))

p1 <- nPlot(value ~ event_type, group = 'variable', data = df3, type = 'multiBarHorizontalChart')
p1$chart(stacked = TRUE)
p1$show('inline', include_assets = TRUE, cdn = TRUE)

Figure 2: Human fatalities and injuries per event by event type

Per event, hurricanes and heat events are far more dangerous than the other groups.
Tornadoes, ocean events, and cold events come next.
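A per-event value is the group total divided by the number of events in the group, so a small group can rank high on this measure even when its total is modest; the hurricane group, for instance, has only 348 events. A minimal base-R sketch on toy data (not from the data set) confirms the identity behind the mean() calls above.

```r
## Toy data for illustration, not taken from the data set
toy <- data.frame(event_type = c("tornado", "tornado", "heat", "wind"),
                  FATALITIES = c(2, 0, 3, 0),
                  stringsAsFactors = FALSE)

totals <- tapply(toy$FATALITIES, toy$event_type, sum)     # total per group
counts <- tapply(toy$FATALITIES, toy$event_type, length)  # events per group
means  <- tapply(toy$FATALITIES, toy$event_type, mean)    # mean per event

## The per-event mean equals total / count for every group
all.equal(means, totals / counts)
```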


Across the United States, which types of events have the greatest economic consequences?

Let's look at property damage (the PROPDMG column), both in total and per event.
Use the chart legend to choose which series to view.

df2 <- ddply(df, .(event_type), summarise,
             property_damage_total = sum(PROPDMG),
             property_damage_mean = mean(PROPDMG)
            )

df2 <- arrange(df2, desc(property_damage_total))

df3 <- melt(df2, id.vars = c('event_type'))

p1 <- nPlot(value ~ event_type, group = 'variable', data = df3, type = 'multiBarHorizontalChart')
p1$chart(stacked = FALSE)
p1$show('inline', include_assets = TRUE, cdn = TRUE)

Figure 3: Total and mean property damage by event type

Tornadoes, wind, and floods have caused the most property damage in total. Per event, hurricanes, tornadoes, and lightning are the most damaging.
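The property damage figures above use PROPDMG as recorded. In the NOAA data, PROPDMG is paired with a magnitude code in PROPDMGEXP (for example K, M, B), and crop damage is recorded separately in CROPDMG and CROPDMGEXP. A minimal sketch for converting a value and its code to dollars is below; `damage_in_dollars` is a hypothetical helper, and the mapping of codes, including treating every unrecognized code as 1, is an assumption.

```r
## Hypothetical helper: turn a damage value and its magnitude code into dollars.
## Assumption: H, K, M, B (any case) mean hundreds, thousands, millions, billions;
## any other code (blank, digits, "+", "-", "?") is treated as a multiplier of 1.
damage_in_dollars <- function(value, exp_code) {
  multipliers <- c(h = 1e2, k = 1e3, m = 1e6, b = 1e9)
  code <- tolower(trimws(as.character(exp_code)))
  mult <- unname(multipliers[code])   # unknown or blank codes give NA here
  mult[is.na(mult)] <- 1
  value * mult
}

damage_in_dollars(c(25, 2.5, 1.2, 10), c("K", "M", "B", ""))
```

Applied to the data, `damage_in_dollars(df$PROPDMG, df$PROPDMGEXP)` (and the same call on CROPDMG and CROPDMGEXP) would give dollar amounts that could replace PROPDMG in the ddply() call above; the rankings may change once the multipliers are applied.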