Top 10 Most Damaging Weather Events to Citizens and Economies

Synopsis

This study analyzes the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database to determine which types of weather events have shown the greatest potential to cause human and economic damage. Those events should receive the highest priority for attention and countermeasures from government officials. The database tracks characteristics of major storms and weather events in the United States from 1950 to November 2011. This study performs two analyses: one of human damage (deaths and injuries) and one of economic damage (property and crop damage).

Data Processing

The first step is to read our database into R:

# read.csv() decompresses the .bz2 file transparently; blank cells become NA
data <- read.csv("repdata_data_StormData.csv.bz2", na.strings = "", stringsAsFactors = FALSE)
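The `na.strings = ""` argument is what turns blank exponent cells (used later in PROPDMGEXP and CROPDMGEXP) into NA. The following is a minimal sketch. It reads a made-up two-row CSV from a string through the `text` argument instead of the real file:

```r
# Made-up CSV text: the HAIL row leaves PROPDMGEXP blank
csv_text <- "EVTYPE,PROPDMG,PROPDMGEXP\nTORNADO,25,K\nHAIL,0,\n"
toy <- read.csv(text = csv_text, na.strings = "", stringsAsFactors = FALSE)

# The blank cell is read as NA rather than as an empty string
# (expected: FALSE for the TORNADO row, TRUE for the HAIL row)
print(is.na(toy$PROPDMGEXP))
```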

Casualties

First we will analyze which events are most damaging to human health. The key variables here are “FATALITIES” and “INJURIES.” We will add those together to create a new variable called “CASUALTIES.”

We load the library “doBy” to create a table which summarizes the total fatalities and injuries for each event type (“EVTYPE”):

library("doBy")
# Total fatalities and injuries for each event type
DT <- summaryBy(FATALITIES + INJURIES ~ EVTYPE, data = data, keep.names = TRUE, 
    FUN = sum)
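Readers without doBy can compute the same per-EVTYPE totals with base R's `aggregate()`. The sketch below runs on a small made-up data frame, not on the NOAA data. One difference to keep in mind is that the formula method of `aggregate()` drops rows with an NA in any listed variable by default (`na.action = na.omit`).

```r
# Made-up records: two event types, two rows each
toy <- data.frame(
  EVTYPE = c("TORNADO", "HAIL", "TORNADO", "HAIL"),
  FATALITIES = c(2, 0, 1, 0),
  INJURIES = c(10, 1, 5, 0),
  stringsAsFactors = FALSE
)

# Sum both columns within each EVTYPE (the result is sorted by EVTYPE)
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
print(totals)
```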

Now create a “CASUALTIES” column containing the sum of “FATALITIES” and “INJURIES”:

CASUALTIES <- DT$FATALITIES + DT$INJURIES
DT2 <- cbind(DT, CASUALTIES)

Clean up “EVTYPE” by removing event types with 0 “CASUALTIES”:

DT3 <- subset(DT2, CASUALTIES > 0)

Sort by “CASUALTIES” to see the most relevant “EVTYPE”s:

DT4 <- DT3[order(-DT3$CASUALTIES), ]
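The add, filter, and sort steps can be sketched together on a made-up table. In this sketch the new column is added with `$<-` instead of `cbind()`. For a numeric key, `order(..., decreasing = TRUE)` produces the same ordering as negating the key.

```r
DT_toy <- data.frame(
  EVTYPE = c("HAIL", "TORNADO", "FOG"),
  FATALITIES = c(0, 3, 0),
  INJURIES = c(1, 15, 0),
  stringsAsFactors = FALSE
)

# Casualties = deaths + injuries, computed row by row
DT_toy$CASUALTIES <- DT_toy$FATALITIES + DT_toy$INJURIES
# Keep only event types that caused at least one casualty
DT_toy <- subset(DT_toy, CASUALTIES > 0)
# Largest totals first
DT_toy <- DT_toy[order(DT_toy$CASUALTIES, decreasing = TRUE), ]
print(DT_toy)
```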

Note that two high-ranking “EVTYPE”s, “TSTM Wind” and “Thunderstorm Wind,” appear to be the same event. We will therefore go back and combine all “Thunderstorm” and “TSTM” event types into one EVTYPE.

First we pick out all the rows whose EVTYPE contains “TSTM” or “THUNDERSTORM” and then remove them from the DT4 data frame.

# A single logical vector marks every thunderstorm-related event type,
# so a row matching both words is not counted twice
is_tstm <- grepl("TSTM|THUNDERSTORM", DT4$EVTYPE)
TSTM3 <- DT4[is_tstm, ]   # all thunderstorm rows
DT6 <- DT4[!is_tstm, ]    # all other rows
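If the data is split with two separate subsets, an EVTYPE containing both "TSTM" and "THUNDERSTORM" lands in both pieces and is counted twice after `rbind()`. A single alternation pattern avoids that. The toy check below uses made-up EVTYPE strings. It also shows that `grepl()` is case-sensitive by default: `ignore.case = TRUE` would catch mixed-case spellings, if any exist in the data.

```r
# Made-up event type strings
evtype <- c("TORNADO", "TSTM WIND", "THUNDERSTORM WIND",
            "THUNDERSTORM WIND/ TSTM", "Thunderstorm Wind")

# Two separate passes: the fourth string matches both patterns
two_pass <- sum(grepl("TSTM", evtype)) + sum(grepl("THUNDERSTORM", evtype))
# One pass with alternation: each string is counted at most once
one_pass <- sum(grepl("TSTM|THUNDERSTORM", evtype))
# Case-insensitive matching also picks up the mixed-case spelling
one_pass_ci <- sum(grepl("TSTM|THUNDERSTORM", evtype, ignore.case = TRUE))

print(c(two_pass = two_pass, one_pass = one_pass, one_pass_ci = one_pass_ci))
```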

The new variable “TSTM3” contains all the thunderstorm rows. Next we collapse those rows into a single row, insert it back into the main data frame, and sort again.

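# Sum FATALITIES, INJURIES and CASUALTIES into one "TSTM" row; building a
# one-row data frame (rather than using c()) keeps the counts numeric
TSTM4 <- data.frame(EVTYPE = "TSTM", t(colSums(TSTM3[, 2:4])), stringsAsFactors = FALSE)
DT7 <- rbind(TSTM4, DT6)
DT8 <- DT7[order(-DT7$CASUALTIES), ]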

Now create a plot showing the 10 most harmful EVTYPEs.

First keep the top 10 rows and shorten the labels (in mixed case) so they fit, then draw a bar plot:

DT9 <- DT8[1:10, ]
DT9$EVTYPE <- c("Tornado", "TStm", "ExcHeat", "Flood", "Lightning", "Heat", 
    "FlashFld", "IceStm", "WinStm", "HiWind")
barplot(as.numeric(DT9$CASUALTIES), main = "Top 10 Casualty-Inducing Weather Events", 
    ylab = "Total Casualties", xlab = "Event Type", density = 50, col = "red", 
    border = "red", names.arg = DT9$EVTYPE, width = 1, xlim = c(0, 12), cex.names = 0.82, 
    space = 0.1, axis.lty = 1, col.lab = "blue", ylim = c(0, 1e+05), yaxp = c(0, 
        1e+05, 4))
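Assigning the short labels by position, as above, silently mislabels the bars if the ranking ever changes, for example after a data update. One alternative is to look up each label by name. In the sketch below, the `short_names` vector is a hypothetical, partial mapping, and any event type without a short label keeps its full name.

```r
# Hypothetical lookup table: full EVTYPE -> short plot label
short_names <- c("TORNADO" = "Tornado", "TSTM" = "TStm",
                 "EXCESSIVE HEAT" = "ExcHeat", "FLOOD" = "Flood")

# Made-up ranking, in plot order
top_evtypes <- c("TORNADO", "EXCESSIVE HEAT", "TSTM", "LIGHTNING")

# Look labels up by name; unmatched names give NA
labels <- unname(short_names[top_evtypes])
# Fall back to the original name where no short label is defined
labels <- ifelse(is.na(labels), top_evtypes, labels)
print(labels)
```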

See the “Results” section below for the resulting plot and discussion.

Economic Damage

The key variables for economic damage are “PROPDMG” (property damage) and “PROPDMGEXP” (the unit of PROPDMG: thousands [K], millions [M], or billions [B]). “CROPDMG” and “CROPDMGEXP” are the crop-damage equivalents of PROPDMG and PROPDMGEXP.

PROPDMGEXP has unique values: “K” “M” NA “B” “m” “+” “0” “5” “6” “?” “4” “2” “3” “h” “7” “H” “-” “1” “8” and CROPDMGEXP has unique values: NA “M” “K” “m” “B” “?” “0” “k” “2”.

Our documentation explains only that M or m means millions, K or k means thousands, and B or b means billions. Few records use any other code: about 5 records for each nonstandard value, out of 902,297 total records. We will therefore ignore these records by treating them as NAs.
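To check the size of the nonstandard group, count the records that have a code other than K, M, or B (in either case). The sketch below runs on a made-up vector of codes. On the real data, the same expression would apply to `data$PROPDMGEXP` or `data$CROPDMGEXP`.

```r
# Made-up exponent codes, including NA and several nonstandard values
exp_codes <- c("K", "M", NA, "B", "m", "+", "0", "K", "?", "h")

# TRUE where a code is present but is not one of the documented units
is_nonstandard <- !is.na(exp_codes) & !(toupper(exp_codes) %in% c("K", "M", "B"))
n_nonstandard <- sum(is_nonstandard)

# Frequency of each nonstandard code
print(table(exp_codes[is_nonstandard]))
```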

Next, extract these columns as vectors.

Important: the loop below runs over a very large number of records. Indexing a standalone vector inside a loop is many times faster than repeatedly indexing (and assigning into) the columns of a large data frame.

PROPDMG1 <- data$PROPDMG
PROPDMGEXP1 <- data$PROPDMGEXP
PROPDMG2 <- PROPDMG1

Next, we multiply each “PROPDMG” value according to its “K,” “M,” or “B” code. The resulting “PROPDMG2” vector then holds every value in the same unit (dollars), so the values can be compared directly. A record is set to NA if its damage value or code is missing, or if its code is anything other than “K”/“k”, “M”/“m”, or “B”/“b”:

for (i in seq_along(PROPDMG1)) {
    if (!is.na(PROPDMGEXP1[i]) && !is.na(PROPDMG1[i])) {
        if (PROPDMGEXP1[i] %in% c("K", "k")) {
            PROPDMG2[i] <- PROPDMG1[i] * 1000
        } else if (PROPDMGEXP1[i] %in% c("M", "m")) {
            PROPDMG2[i] <- PROPDMG1[i] * 1e+06
        } else if (PROPDMGEXP1[i] %in% c("B", "b")) {
            PROPDMG2[i] <- PROPDMG1[i] * 1e+09
        } else {
            # Nonstandard code ("+", "?", "h", digits, ...): treat as NA
            PROPDMG2[i] <- NA
        }
    } else {
        # Missing code or missing damage value
        PROPDMG2[i] <- NA
    }
}
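The same conversion can be sketched without an explicit loop by looking up each code in a named vector of multipliers. This is an alternative, not the method used in this analysis. `toupper()` folds the lowercase codes. Any code that is missing, or that is not K, M, or B, indexes to NA, which follows the rules described above. The names `multiplier` and `scale_damage` are hypothetical.

```r
# Dollar multipliers for the documented unit codes
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Scale raw damage values by their unit codes.
# Codes that are NA or not among names(multiplier) produce NA,
# and an NA damage value also produces NA.
scale_damage <- function(dmg, exp_code) {
  dmg * unname(multiplier[toupper(exp_code)])
}

out <- scale_damage(c(2.5, 1, 3, 4, NA), c("K", "m", "+", NA, "B"))
print(out)
```

On the full data this would be `PROPDMG2 <- scale_damage(PROPDMG1, PROPDMGEXP1)` and `CROPDMG2 <- scale_damage(CROPDMG1, CROPDMGEXP1)`: one vectorized call per column instead of one loop each.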

Now, we repeat the above procedure for “CROPDMG”:

CROPDMG1 <- data$CROPDMG
CROPDMGEXP1 <- data$CROPDMGEXP
CROPDMG2 <- CROPDMG1
for (i in seq_along(CROPDMG1)) {
    if (!is.na(CROPDMGEXP1[i]) && !is.na(CROPDMG1[i])) {
        if (CROPDMGEXP1[i] %in% c("K", "k")) {
            CROPDMG2[i] <- CROPDMG1[i] * 1000
        } else if (CROPDMGEXP1[i] %in% c("M", "m")) {
            CROPDMG2[i] <- CROPDMG1[i] * 1e+06
        } else if (CROPDMGEXP1[i] %in% c("B", "b")) {
            CROPDMG2[i] <- CROPDMG1[i] * 1e+09
        } else {
            # Nonstandard code ("?", "0", "2", ...): treat as NA
            CROPDMG2[i] <- NA
        }
    } else {
        # Missing code or missing damage value
        CROPDMG2[i] <- NA
    }
}

Now, add these vectors to our data frame:

data2 <- cbind(data, PROPDMG2, CROPDMG2)

Create a summary table using “doBy” that sums “PROPDMG2” and “CROPDMG2” for each “EVTYPE”:

library("doBy")
# Total property and crop damage (in dollars) for each event type
DT <- summaryBy(PROPDMG2 + CROPDMG2 ~ EVTYPE, data = data2, keep.names = TRUE, 
    FUN = sum)
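Because PROPDMG2 and CROPDMG2 contain NAs, the choice of summing function matters. `sum()` returns NA for an event type as soon as one of its records is NA. The later `subset(..., TOTDMG > 0)` step then drops that event type, because `subset()` treats an NA condition as FALSE. Passing `na.rm = TRUE`, for example `FUN = function(x) sum(x, na.rm = TRUE)`, would instead total the non-missing records. The toy illustration below uses base R's `tapply()` and made-up values:

```r
toy <- data.frame(
  EVTYPE = c("TORNADO", "TORNADO", "HAIL"),
  PROPDMG2 = c(2.5e6, NA, 1e4),
  stringsAsFactors = FALSE
)

# Plain sum(): one NA record makes the whole TORNADO total NA
keep_na <- tapply(toy$PROPDMG2, toy$EVTYPE, sum)
# na.rm = TRUE: totals are computed over the non-missing records
drop_na <- tapply(toy$PROPDMG2, toy$EVTYPE, sum, na.rm = TRUE)

# subset() silently discards rows whose condition evaluates to NA
totals <- data.frame(EVTYPE = names(keep_na), TOTDMG = as.vector(keep_na),
                     stringsAsFactors = FALSE)
kept <- subset(totals, TOTDMG > 0)
print(kept)
```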

Create a “TOTDMG” column that adds “PROPDMG2” and “CROPDMG2”:

TOTDMG <- DT$PROPDMG2 + DT$CROPDMG2
DT2 <- cbind(DT, TOTDMG)

Clean up “EVTYPE” by removing event types with 0 “TOTDMG”:

DT3 <- subset(DT2, TOTDMG > 0)

Sort by “TOTDMG” to see the most relevant “EVTYPE”s:

DT4 <- DT3[order(-DT3$TOTDMG), ]

Now, create a plot showing the 10 most harmful “EVTYPE”s. First keep the top 10 rows and shorten the labels (in mixed case) so they fit:

DT5 <- DT4[1:10, ]
DT5$EVTYPE <- c("TndoTstmHl", "HiWind", "HurrOpal", "WintStm", "HvyRnHiSrf", 
    "LksFld", "HiWindRn", "FrFire", "FlshFld", "HvySnw")

Divide by 1,000,000 to express the damage in millions of dollars for readability:

# Vectorized division: no loop is needed
TOTDMGM <- DT5$TOTDMG/1e+06
DT6 <- cbind(DT5, TOTDMGM)

Finally, plot:

barplot(DT6$TOTDMGM, main = "Top 10 Economic-Damage-Inducing Weather Events", 
    ylab = "Total Damage ($Millions)", xlab = "Event Type", density = 50, col = "green", 
    border = "green", names.arg = DT5$EVTYPE, width = 1, xlim = c(0, 12), cex.names = 0.75, 
    space = 0.1, axis.lty = 1, col.lab = "blue", ylim = c(0, 1600), yaxp = c(0, 
        1600, 4))

See the “Results” section below for the resulting plot and discussion.

Results

Tornadoes take by far the largest human toll.


As the plot shows, all other event types pale in comparison with the devastating and deadly effects of tornadoes, which account for nearly 100,000 casualties in our database. Thunderstorms and excessive heat are the next most harmful types of weather events, each with roughly 10% of the casualties caused by tornadoes.

Tornadoes are also the most economically devastating.

Tornadoes are also by far the most economically devastating weather event, specifically when tornadoes, thunderstorms, and hail occur simultaneously. That combined event type caused $1.6 billion in damage during the study period. A distant second and third are high winds and Hurricane Opal, with around $100 million in damage each.