Synopsis

Here I analyze the Storm Data dataset available from the US National Weather Service. I aim to identify the most damaging types of severe weather event so that policy makers can decide where to prioritize their budgets.

I split damage into two areas: human and economic. I summarize human cost as the mean number of people killed or seriously injured per event, for each event type. For economic cost I add together the cost of damage to property and crops and then take the mean per event, for each event type.

I will summarize the results with graphs showing the top 5 event types for each type of damage.

Data Processing

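After initially loading and exploring the data I established the following variables of interest: EVTYPE (the event type), FATALITIES and INJURIES (the human cost), PROPDMG and CROPDMG (the property and crop damage amounts), and PROPDMGEXP and CROPDMGEXP (the damage signifiers for those amounts).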

The damage signifier will be one of K (thousands), M (millions), B (billions) or blank (actual amount), and the dollar amount needs to be multiplied accordingly to get the actual amount.
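The scaling rule above can be sketched as a small lookup. This is a minimal illustration rather than the code used later in the analysis: `multiplier` and `scale_damage` are hypothetical names, the amounts are made up, and treating lowercase letters like their uppercase forms is an assumption.

```r
# Hypothetical lookup table: signifier -> multiplier
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

scale_damage <- function(amount, signifier) {
  key  <- toupper(signifier)        # assumption: "k" means the same as "K"
  mult <- unname(multiplier[key])   # NA for anything not in the lookup table
  mult[key == ""] <- 1              # blank signifier: amount is already in dollars
  amount * mult
}

# Made-up amounts with one signifier of each kind, plus an unrecognised one
res <- scale_damage(c(25, 2.5, 1.6, 500, 10), c("K", "M", "B", "", "?"))
print(res)
```

Returning NA for an unrecognised signifier, rather than guessing a multiplier, keeps the affected rows visible so they can be counted or dropped deliberately.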

First let’s load some libraries.

library(dplyr)
library(tidyr)
library(ggplot2)
library(knitr)

Then download the file and read the data.

download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "StormData.csv.bz2")
data <- read.csv("StormData.csv.bz2")
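Since the archive is large, it may be worth skipping the download when the file is already on disk. A minimal sketch, assuming a hypothetical helper named `fetch_once` that is not part of the original analysis:

```r
# Hypothetical helper: download only when the file is not already present
fetch_once <- function(url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" writes the compressed archive in binary mode
    download.file(url, dest, mode = "wb")
  }
  dest  # return the local path so it can be passed straight to read.csv()
}
```

With this in place the read step could be written as `read.csv(fetch_once(url, "StormData.csv.bz2"))`, where `url` holds the address used above.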

For the population damage analysis I first obtain a KSI (killed or seriously injured) count by adding the FATALITIES and INJURIES variables, then group by event type, summarise with the mean and keep the top 5 rows.

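pop_dmg <- data %>%
           # KSI = people killed or seriously injured in each event
           mutate(KSI = FATALITIES + INJURIES) %>%
           group_by(EVTYPE) %>%
           # note: TOTALKSI holds the mean KSI per event, not a sum
           summarise(TOTALKSI = mean(KSI), 
                     .groups = "drop") %>%
           # rank explicitly by TOTALKSI rather than relying on the default column
           top_n(5, TOTALKSI)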
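To make the effect of ranking by a mean concrete, here is a small base R sketch on toy data. The data frame `toy` and its values are made up for illustration and are not taken from the storm dataset.

```r
# Toy data: event type A occurs ten times with little harm, B occurs once with a lot
toy <- data.frame(
  EVTYPE     = c(rep("A", 10), "B"),
  FATALITIES = c(rep(0, 10), 5),
  INJURIES   = c(rep(3, 10), 20)
)
toy$KSI <- toy$FATALITIES + toy$INJURIES

# Mean KSI per event type, largest first
mean_ksi <- aggregate(KSI ~ EVTYPE, data = toy, FUN = mean)
mean_ksi <- mean_ksi[order(-mean_ksi$KSI), ]
print(mean_ksi)

# Total KSI per event type, for comparison
total_ksi <- aggregate(KSI ~ EVTYPE, data = toy, FUN = sum)
print(total_ksi)
```

In this toy case B ranks first by mean even though A causes more harm in total, so a ranking by mean favours rare, severe event types over frequent, mild ones.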

When looking at the economic damage I noticed that not all of the signifiers are valid. Here is a count of each signifier for the property damage; the crop damage counts were similar.

# some EXP values can't be interpreted
kable(table(data$PROPDMGEXP), 
      col.names = c("signifier", "count"), 
      format = "simple")
signifier     count
----------  -------
(blank)      465934
-                 1
?                 8
+                 5
0               216
1                25
2                13
3                 4
4                 4
5                28
6                 4
7                 5
8                 1
B                40
h                 1
H                 6
K            424665
m                 7
M             11330

So I wondered how many rows are affected. Let's see…

# helper: the negation of %in%
`%not_in%` <- Negate(`%in%`)
# count the rows whose signifier is not one of the accepted values
kable(data.frame(
    prop_dmg_rows = sum(data$PROPDMGEXP %not_in% c(0, "K", "M", "m" , "B", "")),
    crop_dmg_rows = sum(data$CROPDMGEXP %not_in% c(0, "K", "M", "m" , "B", "")),
    total_data_rows = nrow(data)
))
 prop_dmg_rows   crop_dmg_rows   total_data_rows
--------------  --------------  ----------------
           105              29            902297

It turns out that an insignificant number of rows is affected, so let's carry on.

The economic data was more of a challenge to process. First I have to multiply each dollar amount by its signifier; only then can I add them together, group by event type, summarise with the mean and keep only the top 5 rows.

Because of the sums involved, I divide the mean by 1e09 to give an amount in billions of dollars.

econ_dmg <- data %>% 
            # scale each amount by its signifier; any other signifier gives NA
            mutate(PROPVALUE = 
                        case_when(
                                PROPDMGEXP %in% c("0", "")  ~ (PROPDMG * 1),
                                PROPDMGEXP %in% c("K", "k") ~ (PROPDMG * 1e03),
                                PROPDMGEXP %in% c("M", "m") ~ (PROPDMG * 1e06),
                                PROPDMGEXP %in% c("B", "b") ~ (PROPDMG * 1e09)
                                )
                ) %>%
            mutate(CROPVALUE = 
                        case_when(
                                CROPDMGEXP %in% c("0", "")  ~ (CROPDMG * 1),
                                CROPDMGEXP %in% c("K", "k") ~ (CROPDMG * 1e03),
                                CROPDMGEXP %in% c("M", "m") ~ (CROPDMG * 1e06),
                                CROPDMGEXP %in% c("B", "b") ~ (CROPDMG * 1e09)
                                )
                ) %>%
            # total damage per event, then the mean per event type in billions of dollars
            mutate(ECONDMG = PROPVALUE + CROPVALUE) %>%
            group_by(EVTYPE) %>%
            summarise(TOTALECONDMG = mean(ECONDMG)/1e09,
                      .groups = "drop") %>%
            # rank explicitly by TOTALECONDMG rather than relying on the default column
            top_n(5, TOTALECONDMG)
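One detail of the pipeline above is worth checking: `case_when()` returns NA when no condition matches, so a row with an unrecognised signifier carries NA into the sum and the mean. A small base R sketch of that behaviour, using made-up numbers:

```r
# NA stands in for a row whose signifier was not recognised
value <- c(1000, 2000, NA)

mean_with_na    <- mean(value)                # NA: a single NA makes the whole mean NA
mean_without_na <- mean(value, na.rm = TRUE)  # mean of the remaining values

print(mean_with_na)
print(mean_without_na)
```

Whether to pass `na.rm = TRUE`, or to filter such rows out first, is an analysis choice. This sketch only shows the difference and makes no claim about which option the results here reflect.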

Results

Here I present the two results: the top 5 severe weather event types for population harm and for economic harm.

ggplot(data = pop_dmg) +
        geom_bar(aes(reorder(EVTYPE, TOTALKSI), 
                      TOTALKSI), stat = "identity",
                 fill = "steelblue")+
        labs(title = "Top 5 Event Types by Population Damage") +
        xlab("") +
        ylab("Average Number of People Killed or Seriously injured") +
        coord_flip()

ggplot(data = econ_dmg) +
        geom_bar(aes(reorder(EVTYPE, TOTALECONDMG), 
                     TOTALECONDMG), stat = "identity",
                 fill = "steelblue") +
        labs(title = "Top 5 Event Types by Economic Damage") +
        xlab("") +
        ylab("Average Cost in Billions of Dollars") +
        coord_flip()

Conclusion

Measured by the mean number of people killed or seriously injured per event, the most damaging severe weather event type is heat wave. Measured by the mean economic cost per event, it is the event type recorded as "Tornadoes, TSTM wind, hail".