Top Sources of Damage from Environmental Events to Humans and the Economy

Synopsis

For this assessment we use the Storm Data dataset collected by the US National Oceanic and Atmospheric Administration (NOAA). We analyzed the data to answer two questions:

Across the United States, which types of events are most harmful with respect to population health?

We found that tornados cause by far the most harm to humans of any environmental event, accounting for 46% of all harm done to humans. Other top sources of harm (> 4% of all harm) were excessive heat, lightning, flash flood and heat.

Across the United States, which types of events have the greatest economic consequences?

We found that the largest sources of economic damage (> 4% of all damage) were: flood (31%), hurricane/typhoon (15%), tornado (12%), and storm surge (9%).

Data Processing

We begin by loading the packages we need.

library(dplyr)
library(ggplot2)

We download the dataset from: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2

Next we load the data into R.

df <- read.csv(bzfile("repdata-data-StormData.csv.bz2"))
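The download step itself is not shown above. A minimal sketch of it follows, assuming the file is saved under the name that `read.csv()` expects; `fetch_if_missing` is a hypothetical helper, not part of the original analysis.

```r
# Hypothetical helper: download `url` to `dest` only if `dest` does not exist yet.
fetch_if_missing <- function(url, dest) {
    if (!file.exists(dest)) {
        # mode = "wb" keeps the compressed file intact on Windows
        download.file(url, destfile = dest, mode = "wb")
    }
    invisible(dest)
}

storm_url  <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
storm_file <- "repdata-data-StormData.csv.bz2"

# Uncomment to fetch the file before calling read.csv():
# fetch_if_missing(storm_url, storm_file)
```

Note that on R 4.0 or later `read.csv()` no longer converts strings to factors by default, so the `levels()` calls in this report would need `stringsAsFactors = TRUE` in the `read.csv()` call.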

We do some preliminary inspection of the data.

dim(df)

## [1] 902297     37

df[1,]

##   STATE__          BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1 4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                        14   100 3   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15      25          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1

levels(df$EVTYPE) %>% length

## [1] 985

A once-over of the basics is revealing. The data set contains more variables than we are interested in, and there are too many event types (985) to analyze each one individually.

Population Health

For human health, the key variables are EVTYPE, FATALITIES and INJURIES. We measure how harmful an event is to human health by the number of fatalities and injuries it causes.

Injuries and fatalities are not equally bad for human health, so we use a completely subjective scaling factor that relates injuries to fatalities. We choose \(F = 20 I\), meaning one fatality counts as much as 20 injuries.

Please note that this ratio is a personal judgment, not a scientific result. I am open to different scaling factors for fatality vs injury.

With this, we find each event type's share of all harm to human health.
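Since the ratio is a judgment call, it can help to see how it enters the calculation. The sketch below wraps the weighting in a hypothetical helper, `harm_factor`, whose `ratio` argument is an assumption added here to make the scaling easy to vary; the example numbers come from the first record shown earlier (0 fatalities, 15 injuries).

```r
# Hypothetical helper: combine fatalities and injuries into one harm score.
# `ratio` is the number of injuries treated as equal to one fatality.
harm_factor <- function(fatalities, injuries, ratio = 20) {
    fatalities + injuries / ratio
}

# First record in the data set: 0 fatalities, 15 injuries
harm_factor(0, 15)              # 0.75
# The same record if one fatality counted as 10 injuries instead
harm_factor(0, 15, ratio = 10)  # 1.5
```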

health <- df %>% 
    select(c(EVTYPE, FATALITIES, INJURIES)) %>% 
    mutate(HARMFACTOR = FATALITIES + INJURIES/20)

health <- health %>% 
    mutate(HARMpcnt = HARMFACTOR / (HARMFACTOR %>% sum))

healthImpact <- group_by(health, EVTYPE) %>% 
    summarize(HARMsum = sum(HARMpcnt)) %>% 
    arrange(desc(HARMsum))

healthImpact %>% as.tbl

## Source: local data frame [985 x 2]
## 
##            EVTYPE    HARMsum
##            (fctr)      (dbl)
## 1         TORNADO 0.46006567
## 2  EXCESSIVE HEAT 0.10054620
## 3       LIGHTNING 0.04859865
## 4     FLASH FLOOD 0.04811830
## 5            HEAT 0.04699748
## 6       TSTM WIND 0.03842112
## 7           FLOOD 0.03650875
## 8     RIP CURRENT 0.01712116
## 9       HIGH WIND 0.01374970
## 10   WINTER STORM 0.01227031
## ..            ...        ...
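To make the share calculation concrete, here is a small sketch in base R on invented toy data (the values are for illustration only and do not come from Storm Data). It sums the harm per event type and divides by the grand total, which gives the same shares as normalizing each row first and then summing by group.

```r
# Toy data, invented for illustration only
toy <- data.frame(
    EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "FLOOD"),
    HARMFACTOR = c(3, 2, 3, 2)
)

# Sum harm per event type, then divide by the grand total
per_type <- vapply(split(toy$HARMFACTOR, toy$EVTYPE), sum, numeric(1))
share    <- sort(per_type / sum(toy$HARMFACTOR), decreasing = TRUE)

share
```

Here TORNADO accounts for 5 of the 10 units of harm, HEAT for 3 and FLOOD for 2, so the shares are 0.5, 0.3 and 0.2 and sum to 1.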

Economic Consequences

For economic consequences, the key variables are EVTYPE, PROPDMG and PROPDMGEXP, CROPDMG and CROPDMGEXP.

We treat property damage and crop damage as equally bad, but first we need to put all the damages onto a common scale. We start by looking at all the levels of our EXP variables.

econ <- df %>% 
    select(c(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP))

levels(econ$PROPDMGEXP)

##  [1] ""  "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"

levels(econ$CROPDMGEXP)

## [1] ""  "?" "0" "2" "B" "k" "K" "m" "M"

The EXP variables tell us which power of 10 to multiply the damage value by. Most EXP values make sense, but some values are confusing. We look at those EXP examples in closer detail.

filter(econ,PROPDMGEXP %in% c("?","+", "-"))

##                 EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1     BREAKUP FLOODING      20          +       0           
## 2            HIGH WIND      20          +       0           
## 3  FLOODING/HEAVY RAIN       2          +       0           
## 4   THUNDERSTORM WINDS       0          ?       0           
## 5           HIGH WINDS      15          +       0           
## 6              TORNADO      60          +       0           
## 7          FLASH FLOOD       0          ?       0           
## 8          FLASH FLOOD       0          ?       0           
## 9            HIGH WIND      15          -       0           
## 10   THUNDERSTORM WIND       0          ?       0           
## 11                HAIL       0          ?       0           
## 12                HAIL       0          ?       0           
## 13                HAIL       0          ?       0           
## 14  THUNDERSTORM WINDS       0          ?       0

filter(econ,CROPDMGEXP %in% c("?","+", "-"))

##               EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1  FLASH FLOOD WINDS    0.41                  0          ?
## 2 THUNDERSTORM WINDS    0.50          K       0          ?
## 3 THUNDERSTORM WINDS    0.50          K       0          ?
## 4 THUNDERSTORM WINDS    0.00                  0          ?
## 5  FLOOD/FLASH FLOOD  400.00          K       0          ?
## 6  FLOOD/FLASH FLOOD    0.50          M       0          ?
## 7 THUNDERSTORM WINDS   80.00          K       0          ?

filter(econ,PROPDMGEXP == "") %>% head

##      EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TSTM WIND       0                  0           
## 2      HAIL       0                  0           
## 3      HAIL       0                  0           
## 4 TSTM WIND       0                  0           
## 5      HAIL       0                  0           
## 6 TSTM WIND       0                  0

The three most confusing symbols are ?, + and -. Judging by the records above, for these symbols and for the empty symbol we should multiply by 1.

Now we’ll create a lookup table that lets us convert EXP values to real numbers.

lookup <- data.frame(EXP = 0:9) %>% 
    mutate(REPLACE = 10^EXP)

lookup <- rbind(lookup,
                c("",1),
                c("+",1),
                c("-",1),
                c("?",1),
                c("h",100),
                c("H",100),
                c("k",1000),
                c("K",1000),
                c("m",10^6),
                c("M",10^6),
                c("b",10^9),
                c("B",10^9))

lookup$REPLACE <- as.numeric(lookup$REPLACE)

We create new variables that hold the numeric multiplier for each property and crop damage value.

hash <- function(x) {
        sel <- which(lookup$EXP == x)
        lookup$REPLACE[sel]
        }

PFACTOR <- sapply(econ$PROPDMGEXP,hash)
CFACTOR <- sapply(econ$CROPDMGEXP,hash)

econ <- cbind(econ %>% select(-c(PROPDMGEXP,CROPDMGEXP)),PFACTOR,CFACTOR)
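The row-by-row `sapply()` lookup above works, but it can be slow on roughly 900,000 rows. A vectorized alternative is sketched below, assuming the same symbol-to-multiplier mapping as the lookup table; `exp_keys`, `exp_vals` and `exp_multiplier` are hypothetical names introduced for this sketch.

```r
# Keys and multipliers mirror the lookup table above; symbols are upper-cased
# before matching so that "k" and "K" share one entry.
exp_keys <- c("", "+", "-", "?", "H", "K", "M", "B", as.character(0:9))
exp_vals <- c(1, 1, 1, 1, 1e2, 1e3, 1e6, 1e9, 10^(0:9))

# Hypothetical helper: map a vector of EXP symbols to numeric multipliers.
# match() is used rather than name-based indexing because R never matches
# an empty-string name. Symbols not in exp_keys come back as NA.
exp_multiplier <- function(x) {
    exp_vals[match(toupper(as.character(x)), exp_keys)]
}

exp_multiplier(c("K", "m", "", "?", "B", "2"))

# First record in the data set: PROPDMG = 25, PROPDMGEXP = "K"
25 * exp_multiplier("K")
```

With this helper, `PFACTOR` and `CFACTOR` could be computed as `exp_multiplier(econ$PROPDMGEXP)` and `exp_multiplier(econ$CROPDMGEXP)`; the first record then works out to 25 × 1000 = 25,000.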

With these multipliers we compute the total economic damage for each record and then sum each event type's share of it.

econ <- mutate(econ, DMGtotal = PROPDMG*PFACTOR + CROPDMG*CFACTOR)
econ <- mutate(econ, DMGpcnt = DMGtotal / (sum(econ$DMGtotal)))

econImpact <- group_by(econ, EVTYPE) %>% 
    summarize(DMGsum = sum(DMGpcnt)) %>% 
    arrange(desc(DMGsum))

econImpact %>% as.tbl

## Source: local data frame [985 x 2]
## 
##               EVTYPE     DMGsum
##               (fctr)      (dbl)
## 1              FLOOD 0.31491835
## 2  HURRICANE/TYPHOON 0.15065857
## 3            TORNADO 0.12017356
## 4        STORM SURGE 0.09076242
## 5               HAIL 0.03930459
## 6        FLASH FLOOD 0.03822099
## 7            DROUGHT 0.03146398
## 8          HURRICANE 0.03060830
## 9        RIVER FLOOD 0.02126081
## 10         ICE STORM 0.01878587
## ..               ...        ...

Results

Population Health

Here we present the top sources of harm to humans. A source of harm is considered a top source if it contributed more than 4% of all harm caused to humans.

p <- ggplot(data = filter(healthImpact, HARMsum>.04), 
       aes(x=EVTYPE, y=HARMsum*100, fill = EVTYPE)) 

p + geom_bar(stat = "identity") + 
    labs(y = "Percent of Total Harm Caused") +
    labs(title = "Largest Sources of Environmental Harm") + 
    scale_fill_brewer(palette = "Set1")

We see that of the five top sources of harm to humans, the lion’s share comes from tornados at 46%, followed by excessive heat at 10%, with each of the remaining three sources contributing just under 5%.
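By default ggplot2 draws the bars in the order of the factor levels, which for EVTYPE is alphabetical. The sketch below shows one way to order the bars from largest to smallest share using `reorder()`; `top_harm` is a hypothetical data frame built from the five values printed in the healthImpact output, and the plot is only drawn if ggplot2 is installed.

```r
# Top five rows copied from the healthImpact output shown earlier
top_harm <- data.frame(
    EVTYPE  = c("TORNADO", "EXCESSIVE HEAT", "LIGHTNING", "FLASH FLOOD", "HEAT"),
    HARMsum = c(0.46006567, 0.10054620, 0.04859865, 0.04811830, 0.04699748)
)

# Order the factor levels by decreasing share so bars run largest to smallest
top_harm$EVTYPE <- reorder(top_harm$EVTYPE, -top_harm$HARMsum)

# Draw the chart only if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
    library(ggplot2)
    p <- ggplot(top_harm, aes(x = EVTYPE, y = HARMsum * 100, fill = EVTYPE)) +
        geom_bar(stat = "identity") +
        labs(y = "Percent of Total Harm Caused",
             title = "Largest Sources of Environmental Harm")
    print(p)
}
```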

Economic Damage

Here we present the top sources of economic damage. A source of damage is considered a top source if it contributed more than 4% of all economic damage.

p <- ggplot(data = filter(econImpact, DMGsum>.04), 
       aes(x=EVTYPE, y=DMGsum*100, fill = EVTYPE)) 

p + geom_bar(stat = "identity") + 
    labs(y = "Percent of Economic Damage") +
    labs(title = "Largest Sources of Economic Damage") + 
    scale_fill_brewer(palette = "Dark2")

We found that the largest sources of economic damage were: flood (31%), hurricane/typhoon (15%), tornado (12%), and storm surge (9%).

Kenneth Osborne

November 18, 2015
