Most harmful storm events in the US affecting population health and the economy

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database to identify event types which are most harmful to population health, and those which have the greatest economic consequences.

Data Processing

We download the storm database file from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2 and process it in R as follows -

Load data

# Read data from BZ2 file -
data <- read.csv(bzfile("repdata-data-StormData.csv.bz2"))
total_rows <- nrow(data)

After the CSV data is loaded into R, the dataset contains 902297 rows.
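The download step itself is not shown in the code above. A minimal sketch of one way to do it follows, assuming the archive should be fetched only when it is not already present locally. The helper name `fetch_storm_data` is hypothetical and not part of the original analysis.

```r
# Hypothetical helper (not part of the original analysis): download the
# archive only if it is not already present in the working directory.
fetch_storm_data <- function(url, dest) {
    if (!file.exists(dest)) {
        # mode = "wb" keeps the compressed file intact on Windows
        download.file(url, destfile = dest, mode = "wb")
    }
    invisible(dest)
}
```

Calling `fetch_storm_data()` with the URL given above and `"repdata-data-StormData.csv.bz2"` as `dest`, before the `read.csv()` call, would make the analysis reproducible from an empty working directory.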

Analyze events affecting population health

Each record in the dataset reports the number of injuries and fatalities for an event, along with its event type. We sum all injuries and all fatalities, grouped by event type, and then add the two sums to get the total number of health-related cases per event type.

# Sum on injuries & fatalities, grouped by event type
evt_health <- aggregate(cbind(data$INJURIES, data$FATALITIES), list(Event = data$EVTYPE), 
    sum)

names(evt_health) <- c("Event", "Injuries", "Fatalities")

# Get total health cases by adding injuries and fatalities
evt_health$Total <- evt_health$Injuries + evt_health$Fatalities

# Sort on the total health cases, descending
evt_health_top <- evt_health[order(-evt_health[, 4]), c(1:4)]
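To make the group-sum-sort pattern above easier to follow, here is a small sketch on made-up data. The `toy` data frame and its values are invented for illustration and are not taken from the storm database. The sketch uses the formula interface of `aggregate`, which is an alternative to the `cbind(...)`/`list(...)` call used above.

```r
# Toy data (made up for illustration; not taken from the storm database)
toy <- data.frame(
    EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
    INJURIES   = c(10, 2, 5, 0),
    FATALITIES = c(1, 0, 2, 0)
)

# Sum injuries and fatalities per event type
toy_health <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = toy, FUN = sum)

# Total health cases per event type, then sort descending on the total
toy_health$Total <- toy_health$INJURIES + toy_health$FATALITIES
toy_health <- toy_health[order(-toy_health$Total), ]
toy_health
```

In this toy example the two TORNADO rows collapse into a single row with 15 injuries and 3 fatalities, so TORNADO ranks first with a total of 18.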

Analyze events with greatest economic consequences

The dataset contains columns for property damage and crop damage, each with an accompanying unit column. The unit can be a digit (0-9), which represents a power-of-ten exponent (damage * 10^exp); “k/K”, which means thousand; “m/M”, which means million; or “b/B”, which means billion. Any other unit leaves the value unchanged. We compute the sum of all property and crop damage values, grouped by event type, and then add the two sums to get the total financial damage.

# Convert a (value, unit) pair into an absolute damage amount
computeAmount <- function(entry) {
    value <- as.numeric(entry[1])
    # Treat missing or non-numeric values as zero
    if (is.na(value)) {
        value <- 0
    }
    exp <- as.character(entry[2])
    # Default: unrecognized or empty units leave the value unchanged
    amount <- value
    if (exp %in% as.character(0:9)) {
        # A digit is a power-of-ten exponent
        amount <- value * (10^as.numeric(exp))
    } else if (exp %in% c("b", "B")) {
        amount <- value * (10^9)
    } else if (exp %in% c("m", "M")) {
        amount <- value * (10^6)
    } else if (exp %in% c("k", "K")) {
        amount <- value * (10^3)
    }
    amount
}

# Compute property damage amount per row -
data$PROPDMGAMOUNT <- apply(data[, c("PROPDMG", "PROPDMGEXP")], 1, computeAmount)

# Compute crop damage amount per row -
data$CROPDMGAMOUNT <- apply(data[, c("CROPDMG", "CROPDMGEXP")], 1, computeAmount)

# Sum on damage amounts, aggregated by event type
evt_finance <- aggregate(cbind(data$PROPDMGAMOUNT, data$CROPDMGAMOUNT), list(Event = data$EVTYPE), 
    sum)
names(evt_finance) <- c("Event", "PropertyDamageAmount", "CropDamageAmount")

# Get total damage by adding property and crop damage
evt_finance$TotalDamageAmount <- evt_finance$PropertyDamageAmount + evt_finance$CropDamageAmount

# Sort on the total damage, descending
evt_finance_top <- evt_finance[order(-evt_finance[, 4]), c(1:4)]
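Row-wise `apply()` over roughly 900,000 rows can be slow. As a hedged alternative, the unit rules described above could be expressed as a vectorized lookup. This is only a sketch: the helper name `unit_multiplier` and the sample data are hypothetical, and unlike `computeAmount` it does not replace missing damage values with zero.

```r
# Hypothetical vectorized alternative to a row-wise apply(); the helper name
# and lookup table are illustrative, not part of the original analysis.
unit_multiplier <- function(exp) {
    exp <- toupper(as.character(exp))
    # Digits 0-9 map to powers of ten; K, M, B map to thousand, million, billion
    lookup <- c(setNames(10^(0:9), as.character(0:9)), K = 1e3, M = 1e6, B = 1e9)
    mult <- unname(lookup[exp])
    # Unrecognized or empty units leave the value unchanged
    mult[is.na(mult)] <- 1
    mult
}

# Made-up sample rows to exercise each kind of unit
sample_dmg <- data.frame(DMG = c(25, 2.5, 1, 3, 7),
                         EXP = c("K", "m", "B", "2", ""),
                         stringsAsFactors = FALSE)
sample_dmg$AMOUNT <- sample_dmg$DMG * unit_multiplier(sample_dmg$EXP)
sample_dmg
```

On the real data, the same idea would amount to multiplying `data$PROPDMG` by `unit_multiplier(data$PROPDMGEXP)`, and likewise for the crop columns.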

Results

Events affecting population health

Here are the top five event types affecting population health -

rownames(evt_health_top) <- NULL
evt_health_top$Event <- as.factor(evt_health_top$Event)
evt_health_top <- evt_health_top[1:5, ]
evt_health_top
##            Event Injuries Fatalities Total
## 1        TORNADO    91346       5633 96979
## 2 EXCESSIVE HEAT     6525       1903  8428
## 3      TSTM WIND     6957        504  7461
## 4          FLOOD     6789        470  7259
## 5      LIGHTNING     5230        816  6046

The following chart shows the total health cases for the top five event types.

barplot(evt_health_top$Total, main = "Storm events affecting population health", 
    xlab = "Event Type", ylab = "Total number of health cases", col = c("lightblue", 
        "mistyrose", "lightcyan", "lavender", "cornsilk"), legend = evt_health_top$Event)

[Figure: bar chart of total health cases for the top five event types]

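TORNADO is by far the most harmful event type for population health, with 96979 total cases (91346 injuries and 5633 fatalities), more than eleven times the total for the next event type, EXCESSIVE HEAT (8428).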

Events with greatest economic consequences

Here are the top five event types causing maximum financial damage -

rownames(evt_finance_top) <- NULL
evt_finance_top$Event <- as.factor(evt_finance_top$Event)
evt_finance_top <- evt_finance_top[1:5, ]
evt_finance_top
##               Event PropertyDamageAmount CropDamageAmount TotalDamageAmount
## 1             FLOOD            1.447e+11        5.662e+09         1.503e+11
## 2 HURRICANE/TYPHOON            6.931e+10        2.608e+09         7.191e+10
## 3           TORNADO            5.695e+10        4.150e+08         5.736e+10
## 4       STORM SURGE            4.332e+10        5.000e+03         4.332e+10
## 5              HAIL            1.574e+10        3.026e+09         1.876e+10

The following chart shows the total damage for the top five event types.

barplot(evt_finance_top$TotalDamageAmount, main = "Storm events causing financial damage", 
    xlab = "Event Type", ylab = "Total financial damage", col = c("lightblue", 
        "mistyrose", "lightcyan", "lavender", "cornsilk"), legend = evt_finance_top$Event)

[Figure: bar chart of total financial damage for the top five event types]

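FLOOD causes the greatest financial damage, with about 1.503e+11 (roughly 150 billion US dollars) in combined property and crop damage, about twice the total for the next event type, HURRICANE/TYPHOON (7.191e+10).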