Storm Damage Analysis from National Weather Service Data

Synopsis

Storms can devastate both human well-being and property. The damage they cause can be measured along two dimensions: the cost to human life and health, and the economic cost of damage to property and crops. In this paper, we use data from the National Weather Service to examine the costs of various weather events to the US.

Data Processing

Loading

We begin by loading the data from our compressed CSV file. We use the option strip.white = TRUE to eliminate extraneous leading and trailing spaces in unquoted character fields.

data <- read.csv(bzfile("StormData.csv.bz2"), strip.white = TRUE)

Cleaning

Cleaning the data is challenging with this dataset as there are inconsistencies in labels due to data entry variations.

Before cleaning the data, we have 985 types of events.

Let's start by normalizing whitespace, as well as enforcing a consistent case.


# Remove extraneous whitespace
data$EVTYPE <- gsub("\\s+", " ", as.character(data$EVTYPE))
data$EVTYPE <- gsub("^\\s+|\\s+$", "", data$EVTYPE)

# Normalize case
data$EVTYPE <- tolower(data$EVTYPE)

After normalizing whitespace and case, we have 883 types of events.
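The effect of this kind of normalization on the number of distinct labels can be checked on a small example. The sketch below uses a made-up vector, `labels`, rather than the real data, and a hypothetical helper, `count_types`; it relies only on base R (`gsub`, `trimws`, `tolower`).

```r
# Hypothetical labels that differ only in spacing and case
labels <- c("Hurricane Opal", "HURRICANE  OPAL", " hurricane opal ", "Flash Flood")

# Illustrative helper: number of distinct labels in a vector
count_types <- function(x) length(unique(x))
before <- count_types(labels)

# Collapse runs of whitespace, trim both ends, then lower-case
cleaned <- tolower(trimws(gsub("\\s+", " ", labels)))
after <- count_types(cleaned)

print(c(before = before, after = after))
```

Here the four raw labels collapse to two distinct event types, which is the same kind of reduction seen on the full data set.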

We would also like to collapse minor variations in the labels, such as the variants of the word “hurricane”:

unique(data$EVTYPE[grep("hurricane", data$EVTYPE)])
##  [1] "hurricane opal/high winds"  "hurricane erin"            
##  [3] "hurricane opal"             "hurricane"                 
##  [5] "hurricane-generated swells" "hurricane emily"           
##  [7] "hurricane gordon"           "hurricane felix"           
##  [9] "hurricane edouard"          "hurricane/typhoon"

We can eliminate the variations by replacing all EVTYPE strings containing the word with the word itself.


words_to_search <- c("hurricane", "tornado", "flood", "wind", "heat")
for (word in words_to_search) {
    print(paste("Replacing", length(unique(data[grep(word, data$EVTYPE), ]$EVTYPE)), 
        "variations with", word, "."))
    data[grep(word, data$EVTYPE), ]$EVTYPE <- word
}
## [1] "Replacing 10 variations with hurricane ."
## [1] "Replacing 15 variations with tornado ."
## [1] "Replacing 97 variations with flood ."
## [1] "Replacing 200 variations with wind ."
## [1] "Replacing 14 variations with heat ."
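Because each pass matches on a substring, a label that contains two of the search words is assigned to whichever word is processed first, and a short word can also match inside a longer one. A minimal sketch on made-up labels (the vector `labels` is illustrative and not taken from the data set) shows both effects:

```r
# Made-up labels chosen to expose the two effects
labels <- c("hurricane opal/high winds", "high winds", "heat wave", "wheat frost")
words_to_search <- c("hurricane", "wind", "heat")

collapsed <- labels
for (word in words_to_search) {
    # Replace every label containing the word with the word itself
    collapsed[grepl(word, collapsed, fixed = TRUE)] <- word
}
print(data.frame(original = labels, collapsed = collapsed))
```

The first label ends up under "hurricane" and is never counted as "wind", and "wheat frost" is absorbed into "heat". Whether that matters depends on the labels actually present, so it may be worth inspecting the matches for each word before replacing them.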

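Now we must ensure the units and scale are consistent. The damage amounts in PROPDMG and CROPDMG are stored alongside a multiplier code in PROPDMGEXP and CROPDMGEXP (h for hundreds, k for thousands, m for millions, b for billions), so both property and crop damage must be scaled to plain dollar amounts.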

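scaleValue <- function(value, scaler) {
    # A missing multiplier code is treated as no recorded damage
    if (is.na(scaler)) {
        return(0)
    }
    code <- tolower(as.character(scaler))
    # Chain the tests with else so that exactly one branch sets the result
    if (code == "h") {
        scaled <- 100 * value
    } else if (code == "k") {
        scaled <- 1000 * value
    } else if (code == "m") {
        scaled <- 1e+06 * value
    } else if (code == "b") {
        scaled <- 1e+09 * value
    } else {
        # Unrecognized or empty codes leave the value unscaled
        scaled <- value
    }
    scaled
}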
data$property_damage <- mapply(data$PROPDMG, data$PROPDMGEXP, FUN = scaleValue)
data$crop_damage <- mapply(data$CROPDMG, data$CROPDMGEXP, FUN = scaleValue)
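As a possible alternative to calling a function once per row with mapply, the same scaling can be written as a vectorized lookup. This is only a sketch: the names `multipliers` and `scale_damage` are hypothetical, and the sample amounts and codes are made up for illustration.

```r
# Multipliers for the exponent codes handled in this report (h, k, m, b)
multipliers <- c(h = 100, k = 1000, m = 1e+06, b = 1e+09)

# Hypothetical helper: scale a vector of amounts by a vector of codes
scale_damage <- function(value, code) {
    mult <- multipliers[tolower(as.character(code))]
    # Codes missing from the table (including "") leave the value unscaled
    mult[is.na(mult)] <- 1
    scaled <- value * mult
    # A missing (NA) code is treated as no recorded damage
    scaled[is.na(code)] <- 0
    unname(scaled)
}

amounts <- c(25, 2.5, 1.2, 3, 7)
codes <- c("K", "M", "B", "h", "")
print(scale_damage(amounts, codes))
```

On a data set of this size, a single vectorized call such as `scale_damage(data$PROPDMG, data$PROPDMGEXP)` would typically be much faster than a row-by-row mapply, and the lookup table keeps all multipliers in one place.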

Aggregation

We can then aggregate to find totals for fatalities, injuries, property damage, and crop damage.

agg_data <- aggregate(cbind(FATALITIES, INJURIES, property_damage, crop_damage) ~ 
    EVTYPE, data, FUN = sum)

If an event type occurred with no injuries, fatalities, property damage, or crop damage, it is not interesting for this analysis.

interesting_data <- agg_data[agg_data$property_damage > 0 | agg_data$INJURIES > 
    0 | agg_data$FATALITIES > 0 | agg_data$crop_damage > 0, ]
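To make the aggregation and filtering steps concrete, here is a small sketch on a made-up data frame. The column names mirror those used in this report, but `toy`, `agg_toy`, and `interesting_toy` and all of the numbers are invented for illustration.

```r
# Made-up records; column names mirror those used in this report
toy <- data.frame(
    EVTYPE = c("flood", "flood", "tornado", "fog", "fog"),
    FATALITIES = c(1, 0, 3, 0, 0),
    INJURIES = c(2, 1, 10, 0, 0),
    property_damage = c(5e+05, 2e+06, 1e+06, 0, 0),
    crop_damage = c(0, 1e+05, 0, 0, 0)
)

# One row per event type, summing each measure
agg_toy <- aggregate(cbind(FATALITIES, INJURIES, property_damage, crop_damage) ~ 
    EVTYPE, toy, FUN = sum)

# Keep event types with at least one non-zero total
keep <- rowSums(agg_toy[, -1] > 0) > 0
interesting_toy <- agg_toy[keep, ]
print(interesting_toy)
```

The "fog" rows sum to zero in every column and are dropped, leaving "flood" and "tornado". Note that the formula interface of `aggregate` omits rows with an NA in any of the listed columns by default (`na.action = na.omit`), which is worth keeping in mind if any of the measures can be missing.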

top_n <- 10

Results

Population health

The cost to population health of storms can be measured with injuries and fatalities.

top_n_fatality <- interesting_data[order(-interesting_data$FATALITIES)[1:top_n], 
    ]
# dotchart(top_n_fatality$FATALITIES, top_n_fatality$EVTYPE)
chart_matrix <- as.matrix(rbind(top_n_fatality$FATALITIES, top_n_fatality$INJURIES))
colnames(chart_matrix) <- top_n_fatality$EVTYPE
rownames(chart_matrix) <- c("fatalities", "injuries")
dotchart(chart_matrix, col = rainbow(2), xlab = "incidents", main = paste("Top", 
    top_n, "event types in terms of human health costs"))
legend("bottomright", c("fatalities", "injuries"), fill = rainbow(2))

Top 10 weather events in terms of cost to human health

From this, we can see that tornadoes have the highest cost to human health, followed by heat and flooding.
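The ranking behind this chart can be sketched on a small, made-up table. The data frame `totals` and its numbers are invented for illustration; the example also shows one possible variation, using injuries as a tie-breaker and `head` to take the top rows.

```r
# Made-up totals per event type
totals <- data.frame(
    EVTYPE = c("tornado", "heat", "flood", "fog"),
    FATALITIES = c(30, 20, 10, 1),
    INJURIES = c(300, 50, 40, 5)
)
top_n <- 2

# Rank by fatalities, then by injuries to break ties, in decreasing order
ranked <- totals[order(-totals$FATALITIES, -totals$INJURIES), ]

# head() returns fewer rows, rather than NA rows, when fewer than top_n exist
top <- head(ranked, top_n)
print(top)
```

Indexing with `order(...)[1:top_n]` gives the same rows whenever at least `top_n` event types are available; `head` simply avoids NA-filled rows when there are fewer.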

Economic Consequences

The economic cost can be measured in the US dollar cost of the damage to property and crops.

top_n_crop <- interesting_data[order(-(interesting_data$property_damage + interesting_data$crop_damage))[1:top_n], 
    ]
chart_matrix <- as.matrix(rbind(top_n_crop$crop_damage, top_n_crop$property_damage))
colnames(chart_matrix) <- top_n_crop$EVTYPE
rownames(chart_matrix) <- c("crop", "property")
dotchart(chart_matrix, col = rainbow(2), xlab = "Damage in USD", main = paste("Top", 
    top_n, "event types in terms of economic costs"))
legend("bottomright", c("crop damage", "property damage"), fill = rainbow(2))

Top 10 weather events in terms of property damage

From this, we can see that floods have the highest economic cost followed by hurricanes and storm surges.

Conclusion

Tornadoes appear to carry the highest cost to human life and health, while floods carry the highest economic cost. One thing to keep in mind is that both types of event are concentrated in specific geographic regions, so a realistic risk model should take the location under consideration into account.