This report describes the relationship in the United States between severe weather events and their impact on population health and the economy. The data below is from the NOAA Storm Database, which contains data for 48 official weather events (as described here) and measures the number of fatalities, the amount of property damage, and the amount of crop damage inflicted by each event. This analysis found that certain events such as tornadoes and floods have a relatively low number of fatalities and property or crop damage per event, but the number of times they happen far exceeds others events. Meanwhile there are certain events, such as hurricanes especially, which happen a relatively low number of times but inflict a large number of fatalities and a large amount of property and crop damage per event.

Data Processing

if (!file.exists("stormdata.csv.bz2")) {
  datalocation <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(datalocation, "stormdata.csv.bz2")
  }
library(readr)
dat <- read_csv("stormdata.csv.bz2", col_types = paste(rep("c", 37), collapse = ""))

Here, we clean up the BGN_DATE to only show the date in yyyy-mm-dd format.

suppressMessages(library(lubridate))
dat$BGN_DATE <- mdy(gsub(" .*","",dat$BGN_DATE))

The PROPDMGEXP column represents exponent values for the PROPDMG column, but it is riddled with inconsistencies. The following removes rows with certain values, converts certain values to their numeric values, and creates a new PROPDMGLONG column that multiplies PROPDMG by the new PROPDMGEXP column. Credit to figuring out these values goes to the work done here. Does the same for CROPDMG and CROPDMGEXP, and creates a CROPDMGLONG column for the product of those columns once they’ve been tidied.

dat <- dat[!grepl("\\?|-", dat$PROPDMGEXP),]
newpropdmg <- numeric(length = nrow(dat))
newpropdmg[grep("[1-8]", dat$PROPDMGEXP)] <- 10
newpropdmg[grep("\\+", dat$PROPDMGEXP)] <- 1
newpropdmg[grep("[Bb]", dat$PROPDMGEXP)] <- 1000000000
newpropdmg[grep("[Mm]", dat$PROPDMGEXP)] <- 1000000
newpropdmg[grep("[Kk]", dat$PROPDMGEXP)] <- 1000
newpropdmg[grep("[Hh]", dat$PROPDMGEXP)] <- 100
newpropdmg[is.na(dat$PROPDMGEXP)] <- 0
dat$PROPDMGLONG <- as.numeric(dat$PROPDMG) * newpropdmg

dat <- dat[!grepl("\\?|-", dat$CROPDMGEXP),]
newcropdmg <- numeric(length = nrow(dat))
newcropdmg[grep("[1-8]", dat$CROPDMGEXP)] <- 10
newcropdmg[grep("\\+", dat$CROPDMGEXP)] <- 1
newcropdmg[grep("[Bb]", dat$CROPDMGEXP)] <- 1000000000
newcropdmg[grep("[Mm]", dat$CROPDMGEXP)] <- 1000000
newcropdmg[grep("[Kk]", dat$CROPDMGEXP)] <- 1000
newcropdmg[grep("[Hh]", dat$CROPDMGEXP)] <- 100
newcropdmg[is.na(dat$CROPDMGEXP)] <- 0
dat$CROPDMGLONG <- as.numeric(dat$CROPDMG) * newcropdmg
dat$ALLDMGLONG <- dat$PROPDMGLONG + dat$CROPDMGLONG

Now, to clean up the EVTYPE column. We’ll just look for anything that matches the official 48 event values. Keeping only these 48 values preserves 79.49% of all property damage, 75.85% of all crop damage, and gets rid of 29.59% of the rows.

evtype_vals <- tolower(paste("^Astronomical Low Tide$", "^Avalanche$", 
                             "^Blizzard$", "^Coastal Flood$", 
                             "^Cold/Wind Chill$", "^Debris Flow$", 
                             "^Dense Fog$", "^Dense Smoke$", "^Drought$", 
                             "^Dust Devil$", "^Dust Storm$", 
                             "^Excessive Heat$", "^Extreme Cold/Wind Chill$", 
                             "^Flash Flood$", "^Flood$", "^Freezing Fog$", 
                             "^Frost/Freeze$", "^Funnel Cloud$", "^Hail$", 
                             "^Heat$", "^Heavy Rain$", "^Heavy Snow$", 
                             "^High Surf$", "^High Wind$",
                             "^Hurricane/Typhoon$", "^Ice Storm$", 
                             "^Lakeshore Flood$", "^Lake-Effect Snow$", 
                             "^Lightning$", "^Marine Hail$", 
                             "^Marine High Wind$", "^Marine Strong Wind$", 
                             "^Marine Thunderstorm Wind$", "^Rip Current$", 
                             "^Seiche$", "^Sleet$", "^Storm Tide$", 
                             "^Strong Wind$", "^Thunderstorm Wind$", 
                             "^Tornado$", "^Tropical Depression$", 
                             "^Tropical Storm$", "^Tsunami$", 
                             "^Volcanic Ash$", "^Waterspout$", 
                             "^Wildfire$", "^Winter Storm$", 
                             "^Winter Weather$",
                             sep = "|"))
dat <- dat[grep(evtype_vals, tolower(dat$EVTYPE)),]
dat$EVTYPE <- tolower(dat$EVTYPE)

Finally, we use the dplyr package to create a new dat_grouped table that summarizes key metrics, such as the number of events and the number of fatalities, for each event type. One of these columns, alldmg_total, represents the sum of crop damage and property damage. Another column, alldmg_avg, represents the average amount of damage and property damage. The kableExtra package is used to print the entire dat_grouped table.

suppressMessages(library(dplyr))
suppressMessages(library(kableExtra))
dat$FATALITIES <- as.numeric(dat$FATALITIES)
dat$INJURIES <- as.numeric(dat$INJURIES)
dat_grouped <- dat %>% 
  group_by(EVTYPE) %>% 
  summarise(count = n(),
            fatalities_avg = mean(FATALITIES),
            alldmg_avg = mean(ALLDMGLONG))
kable(dat_grouped, format.args = list(big.mark = ","), 
      booktabs = T) %>%
  kable_styling(latex_options = "striped", full_width = F, 
                position = "left")
EVTYPE count fatalities_avg alldmg_avg
astronomical low tide 174 0.0000000 1.839080e+03
avalanche 386 0.5803109 9.641969e+03
blizzard 2,719 0.0371460 2.836609e+05
coastal flood 657 0.0045662 3.950846e+05
cold/wind chill 539 0.1762523 4.805195e+03
dense fog 1,293 0.0139211 7.481825e+03
dense smoke 10 0.0000000 1.000000e+04
drought 2,488 0.0000000 6.036444e+06
dust devil 149 0.0134228 4.823020e+03
dust storm 427 0.0515222 2.025527e+04
excessive heat 1,678 1.1340882 2.980666e+05
extreme cold/wind chill 1,002 0.1247505 8.680639e+03
flash flood 54,276 0.0180190 3.235717e+05
flood 25,327 0.0185573 5.935155e+06
freezing fog 46 0.0000000 4.743478e+04
frost/freeze 1,343 0.0000000 8.225361e+05
funnel cloud 6,844 0.0000000 2.843366e+01
hail 288,658 0.0000520 6.498424e+04
heat 767 1.2216428 5.257608e+05
heavy rain 11,742 0.0083461 1.215847e+05
heavy snow 15,708 0.0080851 6.794259e+04
high surf 734 0.1416894 1.225545e+05
high wind 20,213 0.0121704 2.923177e+05
hurricane/typhoon 88 0.7272727 8.172013e+08
ice storm 2,006 0.0443669 4.470110e+06
lake-effect snow 636 0.0000000 6.307390e+04
lakeshore flood 23 0.0000000 3.278261e+05
lightning 15,755 0.0517931 5.971130e+04
marine hail 442 0.0000000 9.049774e+00
marine high wind 135 0.0074074 9.607481e+03
marine strong wind 48 0.2916667 8.715208e+03
marine thunderstorm wind 5,812 0.0017206 8.368892e+01
rip current 470 0.7829787 2.127660e+00
seiche 21 0.0000000 4.666667e+04
sleet 59 0.0338983 0.000000e+00
strong wind 3,569 0.0288596 6.730539e+04
thunderstorm wind 82,563 0.0016109 4.721200e+04
tornado 60,652 0.0928741 9.455931e+05
tropical depression 60 0.0000000 2.895000e+04
tropical storm 690 0.0840580 1.214817e+07
tsunami 20 1.6500000 7.204100e+06
volcanic ash 23 0.0000000 2.173913e+04
waterspout 3,797 0.0007901 2.463445e+03
wildfire 2,761 0.0271641 1.832882e+06
winter storm 11,433 0.0180180 5.873735e+05
winter weather 7,045 0.0046842 5.090987e+03

Results

First, we look how much property and crop damage is inflicted by each event type. Instead of plotting all 48 event types, we use the quantile function to filter down to only the event types in the 75th percentile of average property/damage inflicted.

suppressMessages(library(ggplot2))
suppressMessages(library(ggrepel))
top_dmg_val <- quantile(dat_grouped$alldmg_avg, c(.75))
top_dmg <- dat_grouped[dat_grouped$alldmg_avg > top_dmg_val,] %>%
  select(-fatalities_avg)
kable(top_dmg, format.args = list(big.mark = ","), 
      booktabs = T) %>%
  kable_styling(latex_options = "striped", full_width = F, 
                position = "left")
EVTYPE count alldmg_avg
coastal flood 657 395,084.6
drought 2,488 6,036,443.7
flood 25,327 5,935,155.3
frost/freeze 1,343 822,536.1
heat 767 525,760.8
hurricane/typhoon 88 817,201,281.8
ice storm 2,006 4,470,110.3
tornado 60,652 945,593.1
tropical storm 690 12,148,168.9
tsunami 20 7,204,100.0
wildfire 2,761 1,832,881.9
winter storm 11,433 587,373.5

We use the ggplot2 and ggrepel package plot the average amount of property/crop damage per event type (y-axis) against the number of times each event type happened (x-axis). We transform each axis to be on a logarithmic scale, to better see the differences between the different event types.

plot1 <- ggplot(top_dmg, aes(count, alldmg_avg)) + 
  geom_point() + 
  scale_x_log10() + scale_y_log10() +
  labs(y = "Avg property/crop damage (log10)", 
       x = "Number of events since 1950 (log10") + 
  geom_text_repel(aes(label = EVTYPE))
plot1

We can see that tornadoes and floods don’t inflict on average a massive amount of property/crop damage, but they happen much more frequently. Meanwhile, hurricanes/typhoons don’t happen very frequently but on average they inflict the highest amount of property/crop damage.

Now, we look how many fatalities are inflicted by each event type. Again, instead of plotting all 48 event types, we use the quantile function to filter down to only the event types in the 75th percentile of average fatalities.

top_ftl_val <- quantile(dat_grouped$fatalities_avg, c(.75))
top_ftl <- dat_grouped[dat_grouped$fatalities_avg > top_ftl_val,] %>%
  select(-alldmg_avg)
kable(top_ftl, format.args = list(big.mark = ","), 
      booktabs = T) %>%
  kable_styling(latex_options = "striped", full_width = F, 
                position = "left")
EVTYPE count fatalities_avg
avalanche 386 0.5803109
cold/wind chill 539 0.1762523
excessive heat 1,678 1.1340882
extreme cold/wind chill 1,002 0.1247505
heat 767 1.2216428
high surf 734 0.1416894
hurricane/typhoon 88 0.7272727
marine strong wind 48 0.2916667
rip current 470 0.7829787
tornado 60,652 0.0928741
tropical storm 690 0.0840580
tsunami 20 1.6500000

We again use the ggplot2 and ggrepel package plot the average number of fatalities per event type (y-axis) against the number of times each event type happened (x-axis). We again transform each axis to be on a logarithmic scale, to better see the differences between the different event types.

plot2 <- ggplot(top_ftl, aes(count, fatalities_avg)) + 
  geom_point() +
  scale_x_log10() + scale_y_log10() +
  labs(y = "Avg fatalities (log10)", 
       x = "Number of events since 1950 (log10)") + 
  geom_text_repel(aes(label = EVTYPE))
plot2

We can see that tornadoe don’t inflict on average a large number of fatalities, but they happen much more frequently. Meanwhile, events like heat and excessive heat don’t happen a large number of times but they are among the event types with the highest fatalities per event.