Effects of weather events on population health and property

Synopsis

The goal of this analysis is to investigate what kind of damages extreme weather events caueses to both property and population health. We will use the a storm database provided by the U.S. National Oceanic and Atmospheric Administration (NOAA). Due to earlier years in the database having limited records of different kinds of weather events I decided to only use data starting from 1996. The event type-values did not contain consistent values so I had to resort to regex replacements of the varying values. The economic damages analysis did also emit all records with invalid exponents to the reported economic damages, wich caused a size reduction of the data set. The analysis showed that tornados are by a large margin causing the most personal injuries, however excessive heat was shown to be the most fatal. The results of the economic analysis shows that floods are cuasing the greatest economical damages, followed by hail. This shows that the economic incentives to prevent damage due to weather events is not at it’s greatest where there is the biggest risk of personal injurie or even death.

Obtaining the data

The data can be downloaded here and should be saved as data/StormData.csv.bz2 relative to the working directory.

Documentation of the dataset is available here.

Data processing

First we need to load in the data. Since read.table supports reading directly from the compressed file we don’t need to do any extraction.

stormdata <- read.csv('data/StormData.csv.bz2')

Next we’ll refine the dataset to emit all of the rows which does not have any fatalities, injuries, property damage or crop damage value above 0. We’ll also turn all beginning dates into Date-objects.

stormdmg <- stormdata[which(stormdata$INJURIES > 0 | stormdata$FATALITIES > 0 | stormdata$PROPDMG > 0 | stormdata$CROPDMG > 0),]
stormdmg$BGN_DATE <- as.Date(stormdmg$BGN_DATE, '%m/%d/%Y')

According to the data specification, we’re supposed to have 48 different event types, however if we look at the current data, that is not the case.

str(stormdmg$EVTYPE)
##  Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...

So we need to classify each type into it’s correct event type. We’ll also subset the data to only include records from 1996 or later since earlier records did not contain as complete data of different kinds of weather events.

library(lubridate)

# Make wrapper function to make it easier to set options for the gsub command
mgsub <- function (pattern, replace, x) {
        gsub(pattern, replace, x, ignore.case=TRUE)
}

name_correct <- function (values) {
        values <- mgsub('\\(.*?\\)', '', values)
        values <- mgsub('avalanche', 'Avalanche', values)
        values <- mgsub('low.*?tide', 'Astronomical Low Tide', values)
        values <- mgsub('blizzard', 'Blizzard', values)
        values <- mgsub('coastal.*?flood', 'Coastal Flood', values)
        values <- mgsub('^.*?Extreme.*?[(Cold).*?(Wind)].*$', 'Extreme Cold/Wind Chill', values)
        values <- mgsub('[(Cold).*?(Wind)].*?Chill', 'Cold/Wind Chill', values)
        values <- mgsub('(^.*?LANDSPOUT.*$)|(^.*?rock slide.*$)|(^.*?landslide.*$)|(^.*?mudslide.*$)|(^.*?mud slide.*$)', 'Debris Flow', values)
        values <- mgsub('(^FOG$)|(Dense.*?Fog)', 'Dense Fog', values)
        values <- mgsub('Dense.*?Smoke', 'Dense Smoke', values)
        values <- mgsub('Drought', 'Drought', values)
        values <- mgsub('(Dust.*?Devil)|(WHIRLWIND)', 'Dust Devil', values)
        values <- mgsub('(Dust.*?Storm)|(BLOWING DUST)', 'Dust Storm', values)
        values <- mgsub('(record heat)|(Excessive.*?Heat)|(warm weather)', 'Excessive Heat', values)
        values <- mgsub('(Frost.*?Freeze)|(^FROST$)|(^GLAZE$)|(HARD FREEZE)|(blowing snow)|(BLACK ICE)', 'Frost/Freeze', values)
        values <- mgsub('Funnel.*?Cloud', 'Funnel Cloud', values)
        values <- mgsub('Freezing.*?Fog', 'Freezing Fog', values)
        values <- mgsub('(^.*?Marine.*?High.*?Wind.*$)|(^.*?high seas.*$)|(HEAVY SEAS)', 'Marine High Wind', values)
        values <- mgsub('(WIND AND WAVE)|(^.*?Marine.*?Strong.*?Wind.*$)|(^.*?rough seas.*$)|(HIGH WATER)', 'Marine Strong Wind', values)
        values <- mgsub('(^.*?Marine.*?TSTM Wind.*?$)|(^Marine.*?Thunderstorm).*?Wind|(COASTALSTORM)|(COASTAL STORM)', 'Marine Thunderstorm Wind', values)
        values <- mgsub('(^.*?MICROBURST.*$)|(^TSTM WIND.*$)|(^ TSTM WIND.*?)|(NON-Marine TSTM.*$)|(^Thunderstorm.*?Wind$)|(^thunderstorm$)|(^.*?THUNDERSTORM WIND.*$)', 'Thunderstorm Wind', values)
        values <- mgsub('(NON-SEVERE WIND DAMAGE)|(NON-TSTM WIND)|(NON TSTM WIND)|(^Strong.*?Wind.*$)|(^wind$)|(^.*?Gusty Wind.*$)|(^.*?gradient wind.*$)', 'Strong Wind', values)
        values <- mgsub('Tropical.*?Depression', 'Tropical Depression', values)
        values <- mgsub('Tropical.*?Storm', 'Tropical Storm', values)
        values <- mgsub('.*?Tsunami.*$', 'Tsunami', values)
        values <- mgsub('(Volcanic.*?Ash)|(COASTAL EROSION)', 'Volcanic Ash', values)
        values <- mgsub('Rip.*?Current.*$', 'Rip Current', values)
        values <- mgsub('Seiche', 'Seiche', values)
        values <- mgsub('Sleet', 'Sleet', values)
        values <- mgsub('(Storm.*?Surge.*?Tide)|(storm surge)|(.*?tide.*)|(HIGH WATER)|(HIGH SWELLS)', 'Storm Surge/Tide', values)
        values <- mgsub('Winter.*?Storm', 'Winter Storm', values)
        values <- mgsub('Waterspout', 'Waterspout', values)
        values <- mgsub('(BRUSH FIRE)|(Wildfire)|(Wild.*?fire)|(forest.*?fire)', 'Wildfire', values)
        values <- mgsub('Marine.*?Hail', 'Marine Hail', values)
        values <- mgsub('(Hail)|(small hail)', 'Hail', values)
        values <- mgsub('(Heat)|(UNSEASONABLY WARM)', 'Heat', values)
        values <- mgsub('(^.*?Heavy Rain.*$)|(Heavy.*?Rain)|(^rain$)|(UNSEASONAL RAIN)', 'Heavy Rain', values)
        values <- mgsub('(EXCESSIVE SNOW)|(Heavy.*?Snow)|(snow squalls)|(snow squall)', 'Heavy Snow', values)
        values <- mgsub('(^.*?High.*?Surf.*$)|(^.*?HEAVY SURF.*$)|(^.*?HAZARDOUS SURF.*$)', 'High Surf', values)
        values <- mgsub('^.*?High.*?Wind.*$', 'High Wind', values)
        values <- mgsub('(^.*?Hurricane.*$)|(^.*?typhoon.*$)', 'Hurricane (Typhoon)', values)
        values <- mgsub('Ice.*?Storm', 'Ice Storm', values)
        values <- mgsub('Lake.*?Effect.*?Snow', 'Lake-Effect Snow', values)
        values <- mgsub('Lightning', 'Lightning', values)
        values <- mgsub('(HYPERTHERMIA/EXPOSURE)|(FREEZING RAIN)|(AGRICULTURAL FREEZE)|(FREEZING DRIZZLE)|(COLD AND SNOW)|(RAIN.*?SNOW)|(COLD WEATHER)|(FALLING SNOW.*?ICE)|(ICE ON ROAD)|(LATE SEASON SNOW)|(.*?winter storm.*)|(UNSEASONABLY COLD)|(.*?winter weather.*?mix)|(.*?[(icy)(ice)] roads)|(snow and ice)|(winter weather.*?mix)|(Winter weather mix)|(wintry mix)|(^cold$)|(^freeze$)|(^Snow$)|(Light.*?Snow)|(Light.*?Freezing.*?Rain)|(Winter.*?Weather)', 'Winter Weather', values)
        values <- mgsub('Winter Weather.*', 'Winter Weather', values)
        values <- mgsub('Flash.*?Flood', 'Flash Flood', values)
        values <- mgsub('Lakeshore.*?Flood', 'Lakeshore Flood', values)
        values <- mgsub('(DAM BREAK)|(ROGUE WAVE)|(DROWNING)|(.*?Flood.*)|(urban.*?sml stream fld)|(tidal flooding)', 'Flood', values)
        values <- mgsub('Tornado', 'Tornado', values)
        factor(mgsub('(.*?MIXED.*$)|(.*?Other.*$)', 'Other', values))
}

storm_bgn_1996 <- stormdmg[which(year(stormdmg$BGN_DATE) > 1996),]
storm_bgn_1996$EVTYPE <- name_correct(storm_bgn_1996$EVTYPE)

Now we prepare the data summaries to be plotted to answer which type of events has the greatest impact on population health and property damage. In the economical damages data set we will emit all records that has an exponential that does not conform to the specified K, M or B values. This since we can’t know for sure what for instance the numerical values are supposed to mean.

library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# First we prepare the health data
storm_health <- storm_bgn_1996 %>% group_by(EVTYPE) %>% 
        summarize(injuries=sum(INJURIES), deaths=sum(FATALITIES))

# tiny helper function to detect whether a string contains another string
string_contains <- function (look_for, in_string) {
        length(grep(look_for, in_string, ignore.case=TRUE)) > 0
}

# Per documentation we multiple by 10 ^ 3 for K, 10 ^ 6 for M and 10 ^ 9 for B
exp_multiply <- function (dmg, expo) {
        if (string_contains('K', expo)) {
                return(dmg*(10^3))
        }
        
        if (string_contains('M', expo)) {
                return(dmg*(10^6))
        }
        
        if (string_contains('B', expo)) {
                return(dmg*(10^9))
        }
        
        FALSE
}

storm_prop_dmg <- storm_bgn_1996 %>%
        filter(exp_multiply(PROPDMG, PROPDMGEXP), exp_multiply(CROPDMG, CROPDMGEXP)) %>%
        mutate(property.damages=exp_multiply(PROPDMG, PROPDMGEXP),
               crop.damages=exp_multiply(CROPDMG, CROPDMGEXP),
               economic.damages=property.damages + crop.damages) %>%
        group_by(EVTYPE) %>%
        summarise(total.property.damages=sum(property.damages),
                  total.crop.damages=sum(crop.damages),
                  total.economic.damages=sum(economic.damages))

Result

We’ll use the multiplot-function made by Winston Chang.

# Multiple plot function
#
# ggplot objects can be passed in ..., or to plotlist (as a list of ggplot objects)
# - cols:   Number of columns in layout
# - layout: A matrix specifying the layout. If present, 'cols' is ignored.
#
# If the layout is something like matrix(c(1,2,3,3), nrow=2, byrow=TRUE),
# then plot 1 will go in the upper left, 2 will go in the upper right, and
# 3 will go all the way across the bottom.
#
multiplot <- function(..., plotlist=NULL, cols) {
    require(grid)

    # Make a list from the ... arguments and plotlist
    plots <- c(list(...), plotlist)

    numPlots = length(plots)

    # Make the panel
    plotCols = cols                          # Number of columns of plots
    plotRows = ceiling(numPlots/plotCols) # Number of rows needed, calculated from # of cols

    # Set up the page
    grid.newpage()
    pushViewport(viewport(layout = grid.layout(plotRows, plotCols)))
    vplayout <- function(x, y)
        viewport(layout.pos.row = x, layout.pos.col = y)

    # Make each plot, in the correct location
    for (i in 1:numPlots) {
        curRow = ceiling(i/plotCols)
        curCol = (i-1) %% plotCols + 1
        print(plots[[i]], vp = vplayout(curRow, curCol ))
    }

}

Weather events and effects on population health

library(ggplot2)

g_health_fatalities <- qplot(x=storm_health$EVTYPE, y=storm_health$deaths, 
                             geom='bar', 
                             stat='identity',
                             ylab='Number of fatalities',
                             xlab='',
                             main='Deaths due to weather events in the US since 1996') +
        theme(axis.text.x = element_text(angle = 90, hjust = 1))

g_health_injuries <- qplot(x=storm_health$EVTYPE, y=storm_health$injuries, 
                             geom='bar', 
                             stat='identity',
                             ylab='Number of injuries',
                             xlab='',
                             main='Injuries due to weather events in the US since 1996') +
        theme(axis.text.x = element_text(angle = 90, hjust = 1))

multiplot(g_health_fatalities, g_health_injuries, cols=1)
## Loading required package: grid

Plot showing the number of fatalities and injuries related to weather events in the US since 1996.

As seen above it seems like tornados causes by far the most injuries while excessive heat causes the most deaths.

Weather events and their economic damage

# storm_prop_dmg
g_economic_damages <- qplot(x=storm_prop_dmg$EVTYPE, y=storm_prop_dmg$total.economic.damages, 
                             geom='bar', 
                             stat='identity',
                             ylab='Economic damages in $',
                             xlab='',
                             main='Economic damages due to weather events in the US since 1996') +
        theme(axis.text.x = element_text(angle = 90, hjust = 1))

g_economic_damages

Plot showing the economic damages due to weather events since 1996. Both property and crop damages are included. Values with invalid exponents have been emitted.

From the plot above we can tell that flooding is causing the most economical damages.