Analysis of the Effect of Weather Events on Public Health and the Economy

Synopsis

The purpose of this study is to determine the effect that weather events have on public health and economic activity. To that end, the National Weather Service's storm data was analyzed.

Data Processing

Reading Data

The National Weather Service storm data was originally downloaded from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2 on May 25, 2014. It is a CSV file compressed in bz2 format. The following code decompresses the file and reads the data into memory.

data = read.csv(bzfile("repdata-data-StormData.csv.bz2"))
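The read step above assumes the archive is already in the working directory. A minimal sketch of a more reproducible version follows: it downloads the archive only when it is missing, then reads it. The helper name `load_storm_data` and its argument names are hypothetical, and `stringsAsFactors = FALSE` is an assumption that keeps text columns as plain character vectors; the URL and file name are the ones given above.

```r
# Hypothetical helper: download the archive once, then read it.
load_storm_data <- function(path = "repdata-data-StormData.csv.bz2",
                            url = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2") {
    if (!file.exists(path)) {
        # mode = "wb" keeps the compressed file intact on Windows
        download.file(url, destfile = path, mode = "wb")
    }
    # bzfile() opens a decompressing connection that read.csv() consumes
    read.csv(bzfile(path), stringsAsFactors = FALSE)
}
```

Calling `data <- load_storm_data()` would then stand in for the single `read.csv()` line.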

Finding effect of weather events on public health

The following R code finds the 10 weather event types with the greatest effect on public health. The injuries and fatalities recorded for each event type are summed to form the total effect score (totalEffect) assigned to that event type.

healthData = aggregate(INJURIES ~ EVTYPE, data = data, sum)
healthData$deaths = aggregate(FATALITIES ~ EVTYPE, data = data, sum)$FATALITIES
healthData$totalEffect = healthData$INJURIES + healthData$deaths

top10HealthThreats <- healthData[with(healthData, order(totalEffect, decreasing = TRUE)), 
    ][1:10, ]
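The code above calls `aggregate()` once per column and relies on both results listing the event types in the same order. As an alternative sketch, both columns can be summed in one call by putting `cbind()` on the left-hand side of the formula, which keeps the two sums on the same row by construction. The `toy` data frame and the names `health` and `topHealth` are made up for illustration and are not part of the storm data.

```r
# Toy data standing in for the storm data (values are illustrative only)
toy <- data.frame(
    EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
    INJURIES   = c(10, 2, 5, 0),
    FATALITIES = c(1, 0, 2, 3)
)

# cbind() on the left-hand side sums both columns in a single call
health <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = toy, FUN = sum)
health$totalEffect <- health$INJURIES + health$FATALITIES

# Sort by the combined score and keep at most the first 10 rows
topHealth <- head(health[order(health$totalEffect, decreasing = TRUE), ], 10)
```

`head(..., 10)` is used instead of `[1:10, ]` so that a table with fewer than 10 event types does not gain rows of `NA`.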

Finding effect of weather events on economic indicators

The following R code finds the 10 weather event types that cause the most economic damage.

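First, the damage values were meant to be scaled by the magnitude code in the corresponding EXP field (PROPDMGEXP or CROPDMGEXP). The function below was written for that purpose, but its calls are commented out, so the totals computed afterwards use the unscaled values.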


inflateDamageValues <- function(data, expColname, colname) {
    if (sum(data[[expColname]] == "") > 0) {
        data[data[[expColname]] == "", ][[expColname]] = 1
    }

    if (sum(data[[expColname]] == "-") > 0) {
        data[data[[expColname]] == "-", ][[expColname]] = 1
    }

    if (sum(data[[expColname]] == "?") > 0) {
        data[data[[expColname]] == "?", ][[expColname]] = 1
    }

    if (sum(data[[expColname]] == "+") > 0) {
        data[data[[expColname]] == "+", ][[expColname]] = 1
    }

    if (sum(data[[expColname]] == "B") > 0) {
        data[data[[expColname]] == "B", ][[expColname]] = 9
    }

    if (sum(data[[expColname]] == "h" || data[[expColname]] == "H") > 0) {
        data[data[[expColname]] == "h" || data[[expColname]] == "H", ][[expColname]] = 2
    }

    if (sum(data[[expColname]] == "K") > 0) {
        data[data[[expColname]] == "K", ][[expColname]] = 3
    }

    if (sum(data[[expColname]] == "m" || data[[expColname]] == "M") > 0) {
        data[data[[expColname]] == "m" || data[[expColname]] == "M", ][[expColname]] = 6
    }

    data[[colname]] = data[[colname]]^as.numeric(data[[expColname]])
}

# inflateDamageValues(data, 'PROPDMGEXP', 'PROPDMG')
# inflateDamageValues(data, 'CROPDMGEXP', 'CROPDMG')
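Several details keep the function above from scaling the values as written: `||` compares single values rather than whole columns (the element-wise operator is `|`), `^` raises the damage figure to the exponent instead of multiplying by a power of ten, and the modified data frame is never returned to the caller. A minimal sketch of a vectorized version follows. The name `scaleDamageValues` is hypothetical, and leaving every code other than H, K, M, and B unscaled is an assumption based on the mappings in the original function. It was not used to produce the results in this report.

```r
# Sketch of a vectorized replacement; the name scaleDamageValues is hypothetical.
scaleDamageValues <- function(data, expColname, colname) {
    # Power of ten for each code handled by the original function. Codes not
    # listed here (including "", "-", "?", "+") fall back to 0, i.e. x1.
    exponents <- c(H = 2, K = 3, M = 6, B = 9)

    # Upper-cased character codes so that factors and "h"/"m" both match
    codes <- toupper(as.character(data[[expColname]]))
    power <- unname(exponents[codes])
    power[is.na(power)] <- 0

    # Multiply by 10^power rather than raising the value to the power
    data[[colname]] <- data[[colname]] * 10^power

    # Return the modified copy; the caller's data frame is left untouched
    data
}

# Usage on made-up values: the result has to be assigned back
toy <- data.frame(PROPDMG = c(25, 2.5, 1, 3), PROPDMGEXP = c("K", "m", "", "B"))
toy <- scaleDamageValues(toy, "PROPDMGEXP", "PROPDMG")
```

On the storm data the equivalent calls would be `data <- scaleDamageValues(data, "PROPDMGEXP", "PROPDMG")` and the same for `CROPDMGEXP` and `CROPDMG`.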

The following code computes the total economic effect of each weather event type by summing its crop and property damage values.


economicData = aggregate(CROPDMG ~ EVTYPE, data = data, sum)
economicData$PROPDMG = aggregate(PROPDMG ~ EVTYPE, data = data, sum)$PROPDMG
economicData$totalEffect = economicData$PROPDMG + economicData$CROPDMG

top10EconomicThreats <- economicData[with(economicData, order(totalEffect, decreasing = TRUE)), 
    ][1:10, ]

Results

The following are the top 10 weather health threats:

top10HealthThreats
##                EVTYPE INJURIES deaths totalEffect
## 834           TORNADO    91346   5633       96979
## 130    EXCESSIVE HEAT     6525   1903        8428
## 856         TSTM WIND     6957    504        7461
## 170             FLOOD     6789    470        7259
## 464         LIGHTNING     5230    816        6046
## 275              HEAT     2100    937        3037
## 153       FLASH FLOOD     1777    978        2755
## 427         ICE STORM     1975     89        2064
## 760 THUNDERSTORM WIND     1488    133        1621
## 972      WINTER STORM     1321    206        1527

Health Threat Table Legend

EVTYPE - Event Type (type of severe weather)

INJURIES - Total number of injuries

deaths - Total number of deaths

totalEffect - Total number of deaths and injuries

The following are the top 10 weather economic threats:

top10EconomicThreats
##                 EVTYPE CROPDMG PROPDMG totalEffect
## 834            TORNADO  100019 3212258     3312277
## 153        FLASH FLOOD  179200 1420125     1599325
## 856          TSTM WIND  109203 1335966     1445168
## 244               HAIL  579596  688693     1268290
## 170              FLOOD  168038  899938     1067976
## 760  THUNDERSTORM WIND   66791  876844      943636
## 464          LIGHTNING    3581  603352      606932
## 786 THUNDERSTORM WINDS   18685  446293      464978
## 359          HIGH WIND   17283  324732      342015
## 972       WINTER STORM    1979  132721      134700

Economic Threat Table Legend

EVTYPE - Event Type (type of severe weather)

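CROPDMG - Sum of the CROPDMG values for the weather event type (the CROPDMGEXP magnitude is not applied, so this is not a dollar total)

PROPDMG - Sum of the PROPDMG values for the weather event type (the PROPDMGEXP magnitude is not applied, so this is not a dollar total)

totalEffect - Sum of CROPDMG and PROPDMG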

The top ten public health impacts due to severe weather are charted below:

library(ggplot2)

qplot(1:10, top10HealthThreats$totalEffect, col = top10HealthThreats$EVTYPE, 
    ylab = "Deaths + Injuries", xlab = "", main = "Public Health impact of Weather Events (Top 10)")

[Figure: Public Health impact of Weather Events (Top 10); y-axis: Deaths + Injuries]

The top ten economic impacts due to severe weather are charted below:

qplot(1:10, top10EconomicThreats$totalEffect, col = top10EconomicThreats$EVTYPE, 
    ylab = "Property + Crop Damage", xlab = "", main = "Economic impact of Weather Events (Top 10)")

[Figure: Economic impact of Weather Events (Top 10); y-axis: Property + Crop Damage]

The data seems to indicate that tornadoes pose by far the largest health and economic threat nationwide for the years covered by the data. The impact of tornadoes is especially startling when you look at how many more injuries and deaths were caused by them than by the next highest weather type, Excessive Heat (3730 more deaths due to tornadoes: 5633 versus 1903).

Unfortunately, I was not able to get the scaling of monetary damage to work in time to submit with the report, so the economic totals are based on unscaled values and the ranking may be off. If these values were scaled, tornadoes might not be the leading cause of economic loss due to weather, or the difference might not be as large.

Another assumption being made is that injuries and deaths are equally weighted. In a more detailed analysis, it is probable that deaths and injuries would be handled separately, or weighted in some way so that the effect of injuries is less than the effect of fatalities.
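As a rough illustration of such a weighting, the sketch below recomputes a score for the ten event types in the health table with injuries counted at a fraction of a fatality. The weight of 0.1 is purely hypothetical, chosen only to show the mechanics, and the names `injuryWeight`, `weightedEffect`, and `ranked` are made up for this example.

```r
# Figures copied from the top 10 health threats table in the Results section
health <- data.frame(
    EVTYPE   = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING",
                 "HEAT", "FLASH FLOOD", "ICE STORM", "THUNDERSTORM WIND",
                 "WINTER STORM"),
    INJURIES = c(91346, 6525, 6957, 6789, 5230, 2100, 1777, 1975, 1488, 1321),
    deaths   = c(5633, 1903, 504, 470, 816, 937, 978, 89, 133, 206)
)

# Hypothetical weight: one injury counts as a tenth of a fatality
injuryWeight <- 0.1

# Weighted score: fatalities at full weight, injuries discounted
health$weightedEffect <- health$deaths + injuryWeight * health$INJURIES

# Re-rank the ten event types by the weighted score
ranked <- health[order(health$weightedEffect, decreasing = TRUE), ]
```

This only reorders the ten rows already selected by the unweighted score. A proper weighted top 10 would have to be computed from the full aggregated data, since event types outside the current table could move into it.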