Population Health & Economic Impact of Severe Weather Events since 1950

Prepared by: Colin Pistell

Synopsis:

In this report we examine the public health and economic impact of severe weather events since 1950. Using data from the NOAA Storm Database, we determine that tornadoes have had the greatest impact on public health, while flooding has had the greatest impact on the economy. We then examine which parts of the country are most likely to be affected by tornadoes and flooding, and conclude that Texas, Florida, and much of the South and Midwest should take steps to be well prepared for the impact of tornadoes, while the Midwest should also be prepared for flooding.

Data Processing:

First, we load some required packages and take a snapshot of our Session Info:

require("dplyr") 
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
require("ggplot2") 
## Loading required package: ggplot2
require("reshape2")
## Loading required package: reshape2
require("maps")
## Loading required package: maps
sessionInfo()
## R version 3.1.1 (2014-07-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] maps_2.3-9    reshape2_1.4  ggplot2_1.0.0 dplyr_0.3.0.2
## 
## loaded via a namespace (and not attached):
##  [1] assertthat_0.1   colorspace_1.2-4 DBI_0.3.1        digest_0.6.4    
##  [5] evaluate_0.5.5   formatR_1.0      grid_3.1.1       gtable_0.1.2    
##  [9] htmltools_0.2.6  knitr_1.7        magrittr_1.0.1   MASS_7.3-33     
## [13] munsell_0.4.2    parallel_3.1.1   plyr_1.8.1       proto_0.3-10    
## [17] Rcpp_0.11.3      rmarkdown_0.3.8  scales_0.2.4     stringr_0.6.2   
## [21] tools_3.1.1      yaml_2.1.13

Next, we load the dataset into R. The dataset must be downloaded from here (approximately 47 MB) and unzipped into our working directory.

stormdata <- read.csv("repdata-data-StormData.csv")
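For full reproducibility, the download step can also be scripted. The following is only a sketch: the URL is a placeholder rather than the actual link above, and since read.csv() can read bz2-compressed files directly, unzipping by hand is optional.

## Scripted download (sketch); the URL and compressed file name are placeholders
dataurl <- "https://example.com/repdata-data-StormData.csv.bz2"
download.file(dataurl, destfile = "repdata-data-StormData.csv.bz2", mode = "wb")
stormdata <- read.csv("repdata-data-StormData.csv.bz2")  # read.csv handles .bz2 directly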

Now we can get some basic info about the dataset:

dim(stormdata)
## [1] 902297     37
names(stormdata)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

We can see that there are 902297 rows and 37 variables. However, we only care about the variables associated with event type, casualties, economic damage, and perhaps state, so we can select those columns and group by Event Type:

strm <- select(stormdata, EVTYPE, STATE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>% group_by(EVTYPE)

Next we want to look at the recorded Event Types to see if we can eliminate some. First we summarize the data by counting how many times each Event Type appears:

tot <- summarize(strm, count = n()) %>% arrange(desc(count))
head(tot)
## Source: local data frame [6 x 2]
## 
##              EVTYPE  count
## 1              HAIL 288661
## 2         TSTM WIND 219940
## 3 THUNDERSTORM WIND  82563
## 4           TORNADO  60652
## 5       FLASH FLOOD  54277
## 6             FLOOD  25326
tail(tot)
## Source: local data frame [6 x 2]
## 
##                    EVTYPE count
## 1               WIND/HAIL     1
## 2 WINTER STORM HIGH WINDS     1
## 3  WINTER STORM/HIGH WIND     1
## 4 WINTER STORM/HIGH WINDS     1
## 5              Wintry Mix     1
## 6                     WND     1
table(tot$count)
## 
##      1      2      3      4      5      6      7      8      9     10 
##    489    140     60     43     28     23     13     14      7      9 
##     11     12     13     14     15     16     17     18     19     20 
##      6      4      5      3      1      2      3      2      5      3 
##     21     22     23     24     25     26     27     28     29     30 
##      3      3      5      2      2      1      1      1      1      2 
##     32     33     34     36     37     38     39     43     45     46 
##      3      1      1      2      1      1      1      1      2      1 
##     47     48     51     53     56     59     60     61     64     72 
##      1      2      1      2      1      1      1      2      1      1 
##     74     81     84     87     88     90     98    101    103    120 
##      2      1      1      1      1      1      1      1      1      1 
##    126    135    141    143    146    148    154    173    174    186 
##      1      1      1      1      1      1      1      1      2      1 
##    196    204    228    249    250    261    304    340    386    427 
##      1      1      1      1      1      1      1      1      1      1 
##    442    470    538    539    587    600    624    636    650    655 
##      1      1      1      1      1      1      1      1      1      1 
##    682    690    725    767   1002   1028   1104   1293   1342   1457 
##      1      1      1      1      1      1      1      1      1      1 
##   1533   1678   2006   2488   2719   2761   3392   3566   3796   5812 
##      1      1      1      1      1      1      1      1      1      1 
##   6175   6839   7026  11433  11723  15708  15754  20212  20843  25326 
##      1      1      1      1      1      1      1      1      1      1 
##  54277  60652  82563 219940 288661 
##      1      1      1      1      1
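Before deciding on a cutoff, a quick tally (a sketch, not part of the output above) confirms how many Event Types appear exactly once:

sum(tot$count == 1)  # per the table above, this should return 489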

We can see that nearly 500 Event Types only have 1 entry! To keep the data tidy we’ll eliminate any Event Type that has fewer than 100 entries:

tot2 <- filter(tot, count >= 100) %>% arrange(desc(count))
head(tot2)
## Source: local data frame [6 x 2]
## 
##              EVTYPE  count
## 1              HAIL 288661
## 2         TSTM WIND 219940
## 3 THUNDERSTORM WIND  82563
## 4           TORNADO  60652
## 5       FLASH FLOOD  54277
## 6             FLOOD  25326
nrow(tot2)/nrow(tot) #Checking % of original rows left
## [1] 0.07005076
sum(tot2$count)/sum(tot$count) #Checking what % of information we retained
## [1] 0.9947756

As seen above, by retaining only 7% of the Event Types we still capture over 99% of the records, which will make the remaining analysis much simpler. Let's filter our dataset accordingly and group by Event Type:

strm <- filter(strm, EVTYPE %in% tot2$EVTYPE) %>% group_by(EVTYPE)

Now we need to deal with the damage multipliers (the PROPDMGEXP and CROPDMGEXP columns). First, we'll remove any row with odd or nonsensical symbols, then convert the remaining symbols into numeric multipliers, and finally calculate the true economic impact.

## Filter out weird symbols
table(strm$PROPDMGEXP)
## 
##             -      ?      +      0      1      2      3      4      5 
## 462659      1      8      3    214     25     13      4      4     27 
##      6      7      8      B      h      H      K      m      M 
##      4      5      1     22      0      6 423429      6  11152
table(strm$CROPDMGEXP)
## 
##             ?      0      2      B      k      K      m      M 
## 614249      6     18      1      7     20 281379      0   1903
strm2 <- filter(strm, CROPDMGEXP %in% c("", "B", "k", "K", "m", "M"))
strm2 <- filter(strm2, PROPDMGEXP %in% c("", "B", "H", "K", "m", "M"))
strm2$CROPDMGEXP <- tolower(strm2$CROPDMGEXP)
strm2$PROPDMGEXP <- tolower(strm2$PROPDMGEXP)

## Convert crop damage multiplier
x <- as.character(strm2$CROPDMGEXP)
x <- as.factor(x)
levels(x)
## [1] ""  "b" "k" "m"
levels(x) <- c(1, 1000000000, 1000, 1000000)
x <- as.numeric(as.character(x))
strm2$CROPDMGEXP <- x

## Convert property damage multiplier
x <- as.character(strm2$PROPDMGEXP)
x <- as.factor(x)
levels(x)
## [1] ""  "b" "h" "k" "m"
levels(x) <- c(1, 1000000000, 100, 1000, 1000000)
x <- as.numeric(as.character(x))
strm2$PROPDMGEXP <- x

## Clean up
rm(x)

strm2$propertydamage <- strm2$PROPDMG * strm2$PROPDMGEXP
strm2$cropdamage <- strm2$CROPDMG * strm2$CROPDMGEXP
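The factor-level replacement above works because levels() returns the symbols in sorted order, so each multiplier can be assigned positionally. An equivalent, arguably more explicit approach is to look the symbols up in a small table; the sketch below applies it to the unfiltered strm data purely for illustration (unrecognized symbols simply become NA rather than being filtered out):

## Alternative conversion via an explicit symbol table (sketch)
symbols     <- c("",  "h",  "k",  "m",  "b")
multipliers <- c(1,   100,  1000, 1e6,  1e9)
propmult <- multipliers[match(tolower(as.character(strm$PROPDMGEXP)), symbols)]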

Finally, we can summarize our data into useful information for answering our questions:

dat <- summarize(strm2, Deaths = sum(FATALITIES), Injuries = sum(INJURIES), PropertyDmg = sum(propertydamage), CropDmg = sum(cropdamage))
colnames(dat)[1] <- "Event" #Give our Event Types a more human readable name

Results:

Public Health:

First, we’ll look at the public health impact of severe weather events. We’ll find the top 10 event types for deaths and for injuries and see how many they have in common:

deaths <- arrange(dat, desc(Deaths)) %>% select(Event, Deaths) %>% head(n = 10)
injuries <- arrange(dat, desc(Injuries)) %>% select(Event, Injuries) %>% head(n = 10)
deaths
## Source: local data frame [10 x 2]
## 
##             Event Deaths
## 1         TORNADO   5630
## 2  EXCESSIVE HEAT   1903
## 3     FLASH FLOOD    978
## 4            HEAT    937
## 5       LIGHTNING    816
## 6       TSTM WIND    504
## 7           FLOOD    470
## 8     RIP CURRENT    368
## 9       HIGH WIND    246
## 10      AVALANCHE    224
injuries
## Source: local data frame [10 x 2]
## 
##                Event Injuries
## 1            TORNADO    91321
## 2          TSTM WIND     6957
## 3              FLOOD     6789
## 4     EXCESSIVE HEAT     6525
## 5          LIGHTNING     5230
## 6               HEAT     2100
## 7          ICE STORM     1975
## 8        FLASH FLOOD     1777
## 9  THUNDERSTORM WIND     1488
## 10              HAIL     1360
intersect(deaths$Event, injuries$Event) #How many in common?
## [1] "TORNADO"        "EXCESSIVE HEAT" "FLASH FLOOD"    "HEAT"          
## [5] "LIGHTNING"      "TSTM WIND"      "FLOOD"

We can see that the two lists share seven event types and that tornadoes are the worst offender for both deaths and injuries. We’ll order by fatalities, since deaths are the more severe outcome, and create a graph to visualize the relative impact of each event type. First we reshape the data into a useful form for graphing:

cas <- arrange(dat, desc(Deaths), desc(Injuries)) %>% select(Event, Injuries, Deaths) %>% head(n = 10) %>% melt(variable.name = "type", value.name = "casualties")
## Using Event as id variables

And then we create the plot:

p <- ggplot(cas, aes(x = reorder(Event, casualties), y = casualties))
p + geom_bar(stat = "identity", colour = "black", fill = "lightblue") + facet_grid(type ~ ., scales = "free_y") + labs(x = "Event Type", y = "Casualties") + ggtitle("Public Health Impact of Severe Weather Events")

Economic Impact:

Now we will examine the economic impact of severe weather events. We’ll look at both property damage and crop damage, but since we care about overall economic impact we will be most interested in their combined total.

property <- arrange(dat, desc(PropertyDmg)) %>% select(Event, PropertyDmg) %>% head(n = 10)
crop <- arrange(dat, desc(CropDmg)) %>% select(Event, CropDmg) %>% head(n = 10)
property
## Source: local data frame [10 x 2]
## 
##             Event  PropertyDmg
## 1           FLOOD 144657709807
## 2         TORNADO  56936985483
## 3     STORM SURGE  43323536000
## 4     FLASH FLOOD  16140811717
## 5            HAIL  15732262277
## 6       HURRICANE  11868319010
## 7  TROPICAL STORM   7703890550
## 8    WINTER STORM   6688497250
## 9       HIGH WIND   5270046260
## 10    RIVER FLOOD   5118945500
crop
## Source: local data frame [10 x 2]
## 
##           Event     CropDmg
## 1       DROUGHT 13972566000
## 2         FLOOD  5661968450
## 3   RIVER FLOOD  5029459000
## 4     ICE STORM  5022110000
## 5          HAIL  3000954453
## 6     HURRICANE  2741910000
## 7   FLASH FLOOD  1420727100
## 8  EXTREME COLD  1292973000
## 9  FROST/FREEZE  1094086000
## 10   HEAVY RAIN   733399800
dat$TotalDmg <- dat$PropertyDmg + dat$CropDmg #Create a new column for total economic impact
total <- arrange(dat, desc(TotalDmg)) %>% select(Event, TotalDmg) %>% head(n = 10)
total
## Source: local data frame [10 x 2]
## 
##             Event     TotalDmg
## 1           FLOOD 150319678257
## 2         TORNADO  57301935593
## 3     STORM SURGE  43323541000
## 4            HAIL  18733216730
## 5     FLASH FLOOD  17561538817
## 6         DROUGHT  15018672000
## 7       HURRICANE  14610229010
## 8     RIVER FLOOD  10148404500
## 9       ICE STORM   8967037810
## 10 TROPICAL STORM   8382236550

Here we see that flooding has the greatest impact, followed by tornadoes. We can once again create a graph to visualize the impact of each event. First, we reshape the data into a suitable form:

econ <- arrange(dat, desc(PropertyDmg), desc(CropDmg)) %>% select(Event, PropertyDmg, CropDmg) %>% head(n = 10) %>% melt(variable.name = "type", value.name = "cost")
## Using Event as id variables

And then plot the graph:

q <- ggplot(econ, aes(x = reorder(Event, cost), y = cost, fill = type, order = desc(type)))
q + geom_bar(stat = "identity", colour = "black") + labs(x = "Event Type", y = "Total Cost") + scale_fill_brewer(palette = "Blues") + ggtitle("Economic Impact of Severe Weather Events") + theme(axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1))

The economic impact of flooding is clear. Now that we know the impact of tornadoes and flooding, it would be useful to understand which regions of the country are most at risk for these events. To do this, we first filter our data down to tornado and flood events and group by state. Then we can summarize to find out how many of each event each state has experienced:

tornado <- filter(strm2, EVTYPE == "TORNADO" | EVTYPE == "FLOOD") %>% select(STATE, EVTYPE) %>% group_by(STATE, EVTYPE) %>% summarize(Count = n())
head(tornado)
## Source: local data frame [6 x 3]
## Groups: STATE
## 
##   STATE  EVTYPE Count
## 1    AK   FLOOD   208
## 2    AK TORNADO     3
## 3    AL   FLOOD   170
## 4    AL TORNADO  2101
## 5    AR   FLOOD   525
## 6    AR TORNADO  1987

Next we create a vector of state abbreviations, ordered alphabetically by full state name, to use when relabeling the map regions and merging the datasets:

orderedstates <- c("AL", "AZ", "AR", "CA", "CO", "CT", "DE", "DC", "FL", "GA", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD", "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ", "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC", "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY")

We’ll only be looking at the contiguous United States (the lower 48 states plus the District of Columbia), so we must filter our tornado dataset appropriately:

tornado <- filter(tornado, STATE %in% orderedstates)

Next we load the map data, replace the region names with our state abbreviations, and then merge the datasets.

map <- map_data("state")
map$region <- as.factor(map$region)
levels(map$region) <- orderedstates

## Merge the two datasets

tormap <- merge(tornado, map, by.x = "STATE", by.y = "region")
tormap <- arrange(tormap, group, order) #Must re-sort so ggplot draws the polygons correctly
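Replacing the factor levels this way assumes that map_data("state") returns its 49 regions (the lower 48 states plus the District of Columbia) in the same alphabetical order as our abbreviation vector. A quick sanity check, sketched below with the built-in state.name and state.abb datasets, can confirm the pairing before we trust the merge:

## Sanity check (sketch): pair each alphabetical region name with its abbreviation
regions <- levels(as.factor(map_data("state")$region))
check   <- data.frame(region = regions, abb = orderedstates)
## Compare against the built-in state.name/state.abb pairing (Alaska and Hawaii drop out)
all(check$abb[match(tolower(state.name), check$region)] == state.abb, na.rm = TRUE)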

Finally, we can create the plot:

## Create a clean theme
theme_clean <- function(base_size = 12) {
        require(grid)
        theme_grey(base_size) %+replace%
                theme(
                        axis.title = element_blank(), 
                        axis.text = element_blank(), 
                        panel.background = element_blank(), 
                        panel.grid = element_blank(), 
                        axis.ticks.length = unit(0, "cm"), 
                        axis.ticks.margin = unit(0, "cm"), 
                        panel.margin = unit(0, "lines"), 
                        plot.margin = unit(c(0, 0, 0, 0), "lines"), 
                        complete = TRUE
                        )
}
## Create the plot
m <- ggplot(tormap, aes(x = long, y = lat, group = group, fill = Count))
m + geom_polygon(colour = "black") + scale_fill_gradient2(low = "#559999", mid = "grey90", high = "#D55E00", midpoint = median(tormap$Count)) + expand_limits(x = tormap$long, y = tormap$lat) + coord_map("mercator") + facet_grid(EVTYPE ~ .) + theme_clean() + ggtitle("Severe Event Locations")
## Loading required package: grid

Conclusion:

We can see that Texas, Florida, and much of the South and Midwest should be prepared to deal with the impact of tornadoes, while the Midwest should be more prepared for flooding.