Health and Economic Effects of United States Weather Events

National Weather Service data on severe weather events in the United States were analyzed to identify the event types with the most harmful health and economic effects. Tornadoes are the most harmful to population health, as measured by fatalities and injuries, while floods cause the most property damage and therefore have the highest economic impact.

Data Processing

Reading data

csv_name = "repdata_data_StormData.csv"
stormdata = read.csv(csv_name)

Cleaning data

Counting the records for each event type reveals problematic labels that need to be corrected or normalized. If the chunk below is run with print(n=1000) appended, several issues become visible, including whitespace errors, inconsistent capitalization, redundant labels, and spelling errors.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
stormdata %>% group_by(EVTYPE) %>% tally()
## # A tibble: 985 × 2
##    EVTYPE                      n
##    <chr>                   <int>
##  1 "   HIGH SURF ADVISORY"     1
##  2 " COASTAL FLOOD"            1
##  3 " FLASH FLOOD"              1
##  4 " LIGHTNING"                1
##  5 " TSTM WIND"                4
##  6 " TSTM WIND (G45)"          1
##  7 " WATERSPOUT"               1
##  8 " WIND"                     1
##  9 "?"                         1
## 10 "ABNORMAL WARMTH"           4
## # ℹ 975 more rows
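The tibble printout above stops after ten rows. As a minimal sketch of another way to inspect every label, the base R snippet below counts labels with table() and prints them quoted so that stray whitespace is visible. The toy vector `evtype` is an assumption standing in for stormdata$EVTYPE; its labels are copied from the printout above.

```r
# Toy stand-in for stormdata$EVTYPE (labels copied from the tally above)
evtype <- c(" TSTM WIND", " TSTM WIND", "?", " LIGHTNING", "ABNORMAL WARMTH")

# table() counts each distinct label; as.data.frame() gives one row per label
counts <- as.data.frame(table(EVTYPE = evtype), stringsAsFactors = FALSE)

# Sort by descending count, breaking ties by label
counts <- counts[order(-counts$Freq, counts$EVTYPE), ]

# A plain data.frame is not cut off after ten rows the way a tibble is,
# and quote = TRUE makes leading spaces visible
print(counts, quote = TRUE, row.names = FALSE)
```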

We first attempt to normalize the event types by capitalizing all letters, trimming leading and trailing whitespace, collapsing repeated spaces, replacing certain patterns (“AND” is replaced with “/”), and merging common label variants.

library(stringr)
normalize_event_type <- function(ev) {
    # Uppercase and trim so labels differing only in case or padding agree
    ev <- toupper(ev)
    ev <- str_trim(ev)
    # gsub() rewrites every occurrence; sub() would stop after the first one
    ev <- gsub(" +AND +", "/", ev)
    ev <- gsub(" +", " ", ev)
    ev <- gsub(" +/ +", "/", ev)
    # Map common label variants onto a single spelling
    ev <- sub("EXTREME COLD$", "EXTREME COLD/WIND CHILL", ev)
    ev <- sub("EXTREME WINDCHILL$", "EXTREME COLD/WIND CHILL", ev)
    ev <- sub("^COLD$", "COLD/WIND CHILL", ev)
    ev <- sub("EXTREME HEAT$", "EXCESSIVE HEAT", ev)
    ev <- sub("^FROST$", "FROST/FREEZE", ev)
    ev <- sub("^HURRICANE$", "HURRICANE/TYPHOON", ev)
    ev <- sub("^LAKE EFFECT SNOW$", "LAKE-EFFECT SNOW", ev)
    ev <- sub("^STORM SURGE$", "STORM TIDE", ev)
    ev <- sub("^STORM SURGE/TIDE$", "STORM TIDE", ev)
    ev <- sub("TSTM", "THUNDERSTORM", ev)
    ev <- sub("^WILD/FOREST FIRE$", "WILDFIRE", ev)
    # Return the cleaned labels explicitly
    ev
}
stormdata <- mutate(
    stormdata, EVTYPE = normalize_event_type(EVTYPE))
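To illustrate the cleanup steps on a handful of labels, here is a minimal sketch in base R only. The first two labels are copied from the tally above; the third is invented for illustration. It uses gsub() for the whitespace and “AND” steps because sub() rewrites only the first match in each string.

```r
# Two labels from the tally above plus one invented label
labels <- c("   HIGH SURF ADVISORY", " TSTM WIND (G45)", "Heavy  Rain  and  Snow")

clean <- toupper(trimws(labels))            # uppercase, strip outer whitespace
clean <- gsub(" +AND +", "/", clean)        # "X AND Y" becomes "X/Y"
clean <- gsub(" +", " ", clean)             # collapse every run of spaces
clean <- sub("TSTM", "THUNDERSTORM", clean) # expand the abbreviation
print(clean)

# sub() versus gsub() on a string with two runs of spaces
first_only <- sub(" +", " ", "A  B  C")     # only the first run is collapsed
all_runs   <- gsub(" +", " ", "A  B  C")    # both runs are collapsed
print(c(first_only, all_runs))
```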

In a more rigorous analysis, we would manually identify and correct these discrepancies. However, for the purposes of this project, we simply attempt to assign “correct” event labels (only those listed in the NWS instruction documentation) by pattern matching. We also exclude labels containing the word “SUMMARY”.

true_event_type <- function(ev) {
    # Event types listed in the NWS instruction documentation
    true_event_types <- c(
        "ASTRONOMICAL LOW TIDE", "AVALANCHE", "BLIZZARD", "COASTAL FLOOD",
        "COLD/WIND CHILL", "DEBRIS FLOW", "DENSE FOG", "DENSE SMOKE",
        "DROUGHT", "DUST DEVIL", "DUST STORM", "EXCESSIVE HEAT",
        "EXTREME COLD/WIND CHILL", "FLASH FLOOD", "FLOOD",
        "FREEZING FOG", "FROST/FREEZE", "FUNNEL CLOUD",
        "HAIL", "HEAT", "HEAVY RAIN", "HEAVY SNOW", "HIGH SURF",
        "HIGH WIND", "HURRICANE/TYPHOON", "ICE STORM",
        "LAKESHORE FLOOD", "LAKE-EFFECT SNOW", "LIGHTNING",
        "MARINE HAIL", "MARINE HIGH WIND", "MARINE STRONG WIND",
        "MARINE THUNDERSTORM WIND", "RIP CURRENT", "SEICHE",
        "STORM TIDE", "STRONG WIND", "THUNDERSTORM WIND",
        "TORNADO", "TROPICAL DEPRESSION", "TROPICAL STORM", "TSUNAMI",
        "VOLCANIC ASH", "WATERSPOUT", "WILDFIRE", "WINTER STORM",
        "WINTER WEATHER"
    )
    # Literal substring match; the first hit wins, so list order matters
    # when a label contains more than one entry
    match <- sapply(true_event_types, function(e) { grepl(e, ev, fixed = TRUE) })
    match[is.na(match)] <- FALSE
    if (!any(match)) {
        NA_character_
    } else {
        names(which(match)[1])
    }
}
# Exclude labels containing "SUMMARY"
stormdata <- filter(stormdata, !grepl("SUMMARY", EVTYPE))
# true_event_type handles one label at a time, so apply it element-wise
stormdata <- mutate(stormdata,
    evtype = vapply(EVTYPE, true_event_type, character(1), USE.NAMES = FALSE))
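Because the matcher returns the first label found, a record such as “MARINE HAIL” is assigned to “HAIL”, which appears earlier in the list. The sketch below contrasts that first-match rule with one possible alternative that prefers the longest matching label. The helpers `find_hits`, `first_match`, and `longest_match` and the shortened `canonical` list are hypothetical and are not part of the analysis above.

```r
# Hypothetical subset of the canonical labels, for illustration only
canonical <- c("HAIL", "HIGH WIND", "MARINE HAIL", "MARINE HIGH WIND")

# All canonical labels that occur as literal substrings of `ev`
find_hits <- function(ev) {
    canonical[vapply(canonical, grepl, logical(1), x = ev, fixed = TRUE)]
}

# First-match rule: the result depends on the order of `canonical`
first_match <- function(ev) {
    hits <- find_hits(ev)
    if (length(hits) == 0) NA_character_ else hits[1]
}

# Alternative rule: prefer the most specific (longest) matching label
longest_match <- function(ev) {
    hits <- find_hits(ev)
    if (length(hits) == 0) NA_character_ else hits[which.max(nchar(hits))]
}

print(first_match("MARINE HAIL"))
print(longest_match("MARINE HAIL"))
print(longest_match("UNKNOWN EVENT"))
```

Whether the longest match is the better choice depends on how combined labels should be attributed, so this is a design option to consider, not a correction of the results.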

Which types of events are most harmful with respect to population health?

Given the available data, fatality and injury estimates are likely the best proxy for impacts of weather events on population health. Ignoring unmatched event types, we group by the “true event type” obtained above and accumulate fatalities and injuries for each group.

health_effects <-
    filter(stormdata, !is.na(evtype)) %>%   # drop rows with no matched event type
    group_by(evtype) %>%
    summarize(fatalities = sum(FATALITIES),
              injuries = sum(INJURIES))

Which types of events have the greatest economic consequences?

Given the available data, the estimated property damage in dollars is likely the best proxy for the economic impact of weather events. However, property damage is split across two columns: PROPDMG holds the amount and PROPDMGEXP holds a letter code for its order of magnitude (K for thousands, M for millions, B for billions). For ease of computation we combine them into a single dollar column, treating any other code as unknown (NA):

stormdata <- mutate(stormdata, prop_dmg_total = PROPDMG *
    sapply(PROPDMGEXP, function(x) {
        if (x == "K") {
            1e3
        } else if (x == "M") {
            1e6
        } else if (x == "B") {
            1e9
        } else {
            NA
        }
    }))
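The conversion can be checked by hand on a few rows. The sketch below uses made-up values (not taken from the storm data) and a named lookup vector, here called `exp_multiplier` as an assumption for illustration, which gives the same mapping as the if/else chain in vectorized form. Codes without an entry, including the empty string, index to NA.

```r
# Made-up rows for illustration; not taken from the storm data
toy <- data.frame(PROPDMG    = c(25, 2.5, 1.2, 10),
                  PROPDMGEXP = c("K", "M", "B", ""),
                  stringsAsFactors = FALSE)

# Multiplier for each magnitude code handled in the analysis
exp_multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Indexing by name returns NA for any code that has no entry
toy$prop_dmg_total <- toy$PROPDMG * unname(exp_multiplier[toy$PROPDMGEXP])
print(toy)
```

For example, an amount of 25 with code K becomes 25 * 1e3 = 25000 dollars, while the row with an empty code yields NA and is later skipped by na.rm=TRUE.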

Ignoring unmatched event types, we group by the “true event type” and accumulate known property damage for each group.

econ_effects <-
    filter(stormdata, !is.na(evtype)) %>%   # drop rows with no matched event type
    group_by(evtype) %>%
    summarize(prop_dmg = sum(prop_dmg_total, na.rm = TRUE))

Results

We plot the results of the analysis in the bar graphs below, sorting event types by decreasing impact in each category (fatalities, injuries, or property damage).

library(ggplot2)
plot1 <- ggplot(health_effects,
    aes(x=reorder(evtype, -fatalities, FUN=sum), y=fatalities)) +
     geom_col() + theme(axis.text.x = element_text(angle=45, hjust=1)) + labs(
        title="Fatalities by Event Type", x="Event Type", y="Fatalities"
     )

plot2 <- ggplot(health_effects,
    aes(x=reorder(evtype, -injuries, FUN=sum), y=injuries)) +
     geom_col() + theme(axis.text.x = element_text(angle=45, hjust=1)) + labs(
        title="Injuries by Event Type", x="Event Type", y="Injuries"
     )
print(plot1)

print(plot2)

According to the above plots, tornadoes cause the most fatalities and the most injuries of any event type and are therefore the most harmful to population health.

plot3 <- ggplot(econ_effects,
    aes(x=reorder(evtype, -prop_dmg, FUN=sum), y=prop_dmg)) +
     geom_col() + theme(axis.text.x = element_text(angle=45, hjust=1)) + labs(
        title="Property Damage by Event Type",
         x="Event Type", y="Property Damage ($)"
     )
print(plot3)

According to the above plot, floods have the highest recorded property damage of any event type and therefore the greatest economic consequences.