Peer Assessment 2: Evaluation of the Impact of Severe Weather Events on Population Health and the Economy

Author: Nathan Smith

Synopsis

In this report I examine which types of severe weather events in the United States are most harmful with respect to population health and which have the greatest economic consequences. The analysis uses the National Weather Service Storm Data. Tornadoes cause by far the most fatalities and injuries, while floods cause the most property damage.

Data Processing

The file is large (902,297 observations of 37 variables), so it takes a minute or so to load into R. We then look at its structure.

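# Point this at the directory that holds the downloaded CSV
setwd("/Users/nathansmith/")
# stringsAsFactors = TRUE reproduces the factor columns shown below;
# since R 4.0.0, read.csv() reads strings as character by default
data <- read.csv("repdata_data_StormData.csv", header = TRUE, stringsAsFactors = TRUE)
str(data)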
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels ""," Christiansburg",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels ""," CANTON"," TULIA",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","%SD",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","\t","\t\t",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

How many unique event types are there?

length(unique(data$EVTYPE))
## [1] 985
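The count of 985 overstates the number of genuinely distinct event types: the structure listing above already shows a label with leading spaces ("   HIGH SURF ADVISORY"). As a minimal sketch, assuming that letter case and surrounding whitespace carry no meaning, normalizing labels with base R's toupper() and trimws() merges such trivial variants. The sample vector below is hypothetical and is not read from the file.

```r
# Hypothetical sample of raw EVTYPE labels (illustrative only, not read from the file)
evtype <- c("TORNADO", "TSTM WIND", "   HIGH SURF ADVISORY", "High Surf Advisory", "TORNADO")

# Raw labels: case and leading spaces make variants look distinct
length(unique(evtype))                   # 4

# Normalized labels: trim whitespace and upper-case before counting
clean <- toupper(trimws(evtype))
length(unique(clean))                    # 3
```

On the full data this would look like `data$EVTYPE <- toupper(trimws(data$EVTYPE))` before aggregating. It does not merge abbreviations such as TSTM WIND and THUNDERSTORM WIND, which would need an explicit mapping.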

Preprocessing the Data: Population Health Impact of Storms

The variables in the dataset that contain information regarding Population Health are:

* FATALITIES
* INJURIES

We create a new data table that contains only the event type and these two variables.

library(data.table)
PopHealth <- as.data.table(data[,c("EVTYPE", "FATALITIES", "INJURIES")])
head(PopHealth)
##     EVTYPE FATALITIES INJURIES
## 1: TORNADO          0       15
## 2: TORNADO          0        0
## 3: TORNADO          0        2
## 4: TORNADO          0        2
## 5: TORNADO          0        2
## 6: TORNADO          0        6

Next we total the fatalities and injuries for each event type over the entire record period and list the top ten in descending order.

fatals <- PopHealth[,sum(FATALITIES), by=EVTYPE][order(V1, decreasing=TRUE)]
head(fatals,10)
##             EVTYPE   V1
##  1:        TORNADO 5633
##  2: EXCESSIVE HEAT 1903
##  3:    FLASH FLOOD  978
##  4:           HEAT  937
##  5:      LIGHTNING  816
##  6:      TSTM WIND  504
##  7:          FLOOD  470
##  8:    RIP CURRENT  368
##  9:      HIGH WIND  248
## 10:      AVALANCHE  224
injur <- PopHealth[,sum(INJURIES), by=EVTYPE][order(V1, decreasing=TRUE)]
head(injur,10)
##                EVTYPE    V1
##  1:           TORNADO 91346
##  2:         TSTM WIND  6957
##  3:             FLOOD  6789
##  4:    EXCESSIVE HEAT  6525
##  5:         LIGHTNING  5230
##  6:              HEAT  2100
##  7:         ICE STORM  1975
##  8:       FLASH FLOOD  1777
##  9: THUNDERSTORM WIND  1488
## 10:              HAIL  1361
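The data.table calls above group by EVTYPE and sum one column at a time. As an alternative sketch using only base R, aggregate() with a cbind() formula totals both columns in one pass. The toy data frame below is hypothetical and simply stands in for PopHealth.

```r
# Hypothetical toy data standing in for PopHealth (values are made up)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "FLOOD", "HEAT"),
  FATALITIES = c(1, 0, 2, 0, 3),
  INJURIES   = c(15, 2, 0, 4, 1)
)

# Sum fatalities and injuries per event type in a single call
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Order by fatalities, largest first, and show at most the top ten rows
totals <- totals[order(totals$FATALITIES, decreasing = TRUE), ]
head(totals, 10)
```

One difference worth knowing: the formula interface drops rows with NA in any listed column by default (na.action = na.omit), whereas sum() inside the data.table call would return NA for the affected group.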

Preprocessing the Data: Economic Impact of Storms

The variables in the dataset that contain information regarding economic impact are:

* PROPDMG: property damage
* PROPDMGEXP: magnitude code for PROPDMG (K = thousands, M = millions, B = billions)
* CROPDMG: crop damage
* CROPDMGEXP: magnitude code for CROPDMG

EconImpact <- as.data.table(data[, c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")])
head(EconImpact)
##     EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1: TORNADO    25.0          K       0           
## 2: TORNADO     2.5          K       0           
## 3: TORNADO    25.0          K       0           
## 4: TORNADO     2.5          K       0           
## 5: TORNADO     2.5          K       0           
## 6: TORNADO     2.5          K       0

PROPDMGEXP is a magnitude code rather than a number, so before converting damage to dollars we check which codes occur and how often.

table(EconImpact$PROPDMGEXP)
## 
##             -      ?      +      0      1      2      3      4      5 
## 465934      1      8      5    216     25     13      4      4     28 
##      6      7      8      B      h      H      K      m      M 
##      4      5      1     40      1      6 424665      7  11330

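We then convert each record to a dollar amount by multiplying PROPDMG by the multiplier its PROPDMGEXP code represents: B (billions), M (millions) or K (thousands). Records with any other code, including a blank, are assigned a cost of 0.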

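# Convert PROPDMG to dollars using the magnitude code in PROPDMGEXP.
# Only B, M and K are converted; every other code (including blank) yields 0.
EconImpact$PROPCOST <- with(EconImpact, ifelse(PROPDMGEXP == "B", PROPDMG * 1e9,
                                ifelse(PROPDMGEXP == "M", PROPDMG * 1e6,
                                ifelse(PROPDMGEXP == "K", PROPDMG * 1e3, 0))))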
head(EconImpact)
##     EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP PROPCOST
## 1: TORNADO    25.0          K       0               25000
## 2: TORNADO     2.5          K       0                2500
## 3: TORNADO    25.0          K       0               25000
## 4: TORNADO     2.5          K       0                2500
## 5: TORNADO     2.5          K       0                2500
## 6: TORNADO     2.5          K       0                2500
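The nested ifelse() works, but it grows awkward as codes are added, and the same conversion is needed for CROPDMG and CROPDMGEXP, which are listed above but not converted. One alternative, sketched below, is a named lookup vector of multipliers wrapped in a small helper. The helper name to_dollars() is hypothetical; like the ifelse() above, it matches codes exactly and treats every code other than K, M and B as 0 dollars.

```r
# Hypothetical helper: convert a damage amount plus its magnitude code to dollars.
# Mirrors the ifelse() rule above: K, M, B are scaled; any other code yields 0.
to_dollars <- function(amount, code) {
  multiplier <- c(K = 1e3, M = 1e6, B = 1e9)
  m <- multiplier[as.character(code)]   # NA for codes not in the lookup
  m[is.na(m)] <- 0
  unname(amount * m)
}

# Toy rows (made up) in the same shape as EconImpact
amount <- c(25, 2.5, 1.5, 3, 7)
code   <- factor(c("K", "K", "B", "", "m"))
to_dollars(amount, code)
```

Applied to the table above, this would read `EconImpact$CROPCOST <- with(EconImpact, to_dollars(CROPDMG, CROPDMGEXP))`, after which crop totals can be summed by EVTYPE exactly as for PROPCOST.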
Property <- EconImpact[,sum(PROPCOST), by=EVTYPE][order(V1, decreasing=TRUE)]
head(Property,10)
##                EVTYPE           V1
##  1:             FLOOD 144657709800
##  2: HURRICANE/TYPHOON  69305840000
##  3:           TORNADO  56925660480
##  4:       STORM SURGE  43323536000
##  5:       FLASH FLOOD  16140811510
##  6:              HAIL  15727366720
##  7:         HURRICANE  11868319010
##  8:    TROPICAL STORM   7703890550
##  9:      WINTER STORM   6688497250
## 10:         HIGH WIND   5270046260

Results

Which types of events are the most harmful with respect to population health?

library(ggplot2)
library(plyr)

First we’ll look at fatality counts and then at injury counts.

# Bars sorted from most to fewest fatalities; geom_col() uses V1 as bar height
ggplot(fatals[1:10, ], aes(reorder(EVTYPE, -V1), V1)) +
    geom_col(colour = "black", fill = "red3", width = .7) +
    theme(axis.text.x = element_text(angle = 70, hjust = 1)) +
    xlab("Event Type") + ylab("Total Fatalities") + ggtitle("Top Ten Event Types for Fatalities")

# Bars sorted from most to fewest injuries
ggplot(injur[1:10, ], aes(reorder(EVTYPE, -V1), V1)) +
    geom_col(colour = "black", fill = "navyblue", width = .7) +
    theme(axis.text.x = element_text(angle = 70, hjust = 1)) +
    xlab("Event Type") + ylab("Total Injuries") + ggtitle("Top Ten Event Types for Injuries")

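Tornadoes cause by far the most fatalities and injuries: 5,633 deaths, about three times as many as excessive heat (1,903), and 91,346 injuries, about thirteen times as many as TSTM WIND (6,957).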

Which type of events have the greatest economic consequences?

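In terms of property damage, floods have the largest economic impact: about $144.7 billion, roughly twice the $69.3 billion attributed to HURRICANE/TYPHOON. Crop damage is not included in this ranking.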

# Bars sorted from most to least property damage (in dollars)
ggplot(Property[1:10, ], aes(reorder(EVTYPE, -V1), V1)) +
    geom_col(colour = "black", fill = "grey69", width = .7) +
    theme(axis.text.x = element_text(angle = 70, hjust = 1)) +
    xlab("Event Type") + ylab("Total Property Damage ($)") + ggtitle("Top Ten Event Types by Property Damage")