Impact of Severe Weather Events on Population Health and the Economy in the United States

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project aims to explore the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States. These include when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

For a detailed description of the NOAA storm database, please refer to https://www.ncdc.noaa.gov/stormevents/. Our analysis of events recorded from 1996 to 2011 suggests that tornadoes are the most harmful to population health, while floods cause the greatest economic consequences in the United States.

Data Processing

Loading the required libraries

library(dplyr)
## Warning: package 'dplyr' was built under R version 3.6.3
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
library(ggplot2)

Loading the data

if(!file.exists("StormData.csv.bz2")){
  fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(fileUrl, destfile="StormData.csv.bz2", method="curl")
}
if(!file.exists("StormData.csv.bz2")){
  stop("Can't locate file 'StormData.csv.bz2'!")
}
stormDataRaw <- read.csv("StormData.csv.bz2", header = TRUE, stringsAsFactors = FALSE)

Show structure of the dataset

str(stormDataRaw)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

There are 902,297 observations of 37 variables in the dataset. Only a subset is required for our analysis.

Relevant variables include the date (BGN_DATE), event type (EVTYPE), health impact (FATALITIES and INJURIES), economic damages (PROPDMG and CROPDMG), as well as their corresponding exponents (PROPDMGEXP and CROPDMGEXP).

stormData <- select(stormDataRaw, BGN_DATE, EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP, FATALITIES, INJURIES)
stormData$BGN_DATE <- as.Date(stormData$BGN_DATE, "%m/%d/%Y")
stormData$YEAR <- year(stormData$BGN_DATE)
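
As a small illustration of the date handling above, the sketch below parses a few strings in the same layout as BGN_DATE. The vector names (bgn, parsed, yr) are made up for this example, and base R's format() is used in place of lubridate::year() so that the snippet runs without extra packages.

```r
# Toy input in the same "month/day/year time" layout as BGN_DATE
bgn <- c("4/18/1950 0:00:00", "6/8/1951 0:00:00", "1/6/1996 0:00:00")

# as.Date() reads only the part described by the format string,
# so the trailing " 0:00:00" is ignored
parsed <- as.Date(bgn, format = "%m/%d/%Y")

# Extract the calendar year as an integer
# (lubridate::year(parsed) should give the same values)
yr <- as.integer(format(parsed, "%Y"))

print(data.frame(bgn, parsed, yr))
```

Only the last of these three toy dates would pass a YEAR >= 1996 filter.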

According to the NOAA storm database, the full dataset of weather events (48 event types) is only available from the year 1996. From the years 1950 to 1995, only a subset of event types is available. For a fair comparison, we will limit our dataset to observations from the years 1996 to 2011.

stormData <- filter(stormData, YEAR >= 1996)

Observations that record no impact on population health or the economy (no property damage, crop damage, fatalities, or injuries) are excluded from our analysis.

stormData <- filter(stormData, PROPDMG > 0 | CROPDMG > 0 | FATALITIES > 0 | INJURIES > 0)
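
The filter above keeps an observation if at least one of the four measures is positive, and it removes rows rather than columns. A minimal sketch on made-up toy data (the data frame toy and its values are purely illustrative), written with base R subsetting instead of dplyr::filter():

```r
# Made-up observations: only TORNADO and FLOOD report any impact
toy <- data.frame(
  EVTYPE     = c("HAIL", "TORNADO", "FLOOD", "HIGH WIND"),
  PROPDMG    = c(0, 25, 0, 0),
  CROPDMG    = c(0, 0, 5, 0),
  FATALITIES = c(0, 1, 0, 0),
  INJURIES   = c(0, 14, 0, 0),
  stringsAsFactors = FALSE
)

# Same condition as in the filter() call: any positive measure keeps the row
keep <- with(toy, PROPDMG > 0 | CROPDMG > 0 | FATALITIES > 0 | INJURIES > 0)
kept <- toy[keep, ]

print(kept)
```

All five columns survive; only the two rows with no recorded impact are dropped.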

The variables for economic damages, PROPDMG and CROPDMG, require adjustments. Each has a separate exponent variable, PROPDMGEXP or CROPDMGEXP, which encodes the magnitude of the amount and needs to be converted into a numeric multiplier.

table(stormData$PROPDMGEXP)
## 
##             B      K      M 
##   8448     32 185474   7364
table(stormData$CROPDMGEXP)
## 
##             B      K      M 
## 102767      2  96787   1762

Both exponent variables, PROPDMGEXP and CROPDMGEXP, are converted to uppercase for translation to their corresponding factors:

"", "?", "+", "-" = 1
"0" = 1
"1" = 10
"2" = 100
"3" = 1,000
"4" = 10,000
"5" = 100,000
"6" = 1,000,000
"7" = 10,000,000
"8" = 100,000,000
"9" = 1,000,000,000
"H" = 100
"K" = 1,000
"M" = 1,000,000
"B" = 1,000,000,000

stormData$PROPDMGEXP <- toupper(stormData$PROPDMGEXP)
stormData$CROPDMGEXP <- toupper(stormData$CROPDMGEXP)

stormData$PROPDMGFACTOR[(stormData$PROPDMGEXP == "")] <- 10^0
stormData$PROPDMGFACTOR[(stormData$PROPDMGEXP == "-")] <- 10^0
stormData$PROPDMGFACTOR[(stormData$PROPDMGEXP == "?")] <- 10^0
stormData$PROPDMGFACTOR[(stormData$PROPDMGEXP == "+")] <- 10^0
stormData$PROPDMGFACTOR[(stormData$PROPDMGEXP == "0")] <- 10^0
stormData$PROPDMGFACTOR[(stormData$PROPDMGEXP == "1")] <- 10^1
stormData$PROPDMGFACTOR[(stormData$PROPDMGEXP == "2")] <- 10^2
stormData$PROPDMGFACTOR[(stormData$PROPDMGEXP == "3")] <- 10^3
stormData$PROPDMGFACTOR[(stormData$PROPDMGEXP == "4")] <- 10^4
stormData$PROPDMGFACTOR[(stormData$PROPDMGEXP == "5")] <- 10^5
stormData$PROPDMGFACTOR[(stormData$PROPDMGEXP == "6")] <- 10^6
stormData$PROPDMGFACTOR[(stormData$PROPDMGEXP == "7")] <- 10^7
stormData$PROPDMGFACTOR[(stormData$PROPDMGEXP == "8")] <- 10^8
stormData$PROPDMGFACTOR[(stormData$PROPDMGEXP == "H")] <- 10^2
stormData$PROPDMGFACTOR[(stormData$PROPDMGEXP == "K")] <- 10^3
stormData$PROPDMGFACTOR[(stormData$PROPDMGEXP == "M")] <- 10^6
stormData$PROPDMGFACTOR[(stormData$PROPDMGEXP == "B")] <- 10^9

stormData$CROPDMGFACTOR[(stormData$CROPDMGEXP == "")] <- 10^0
stormData$CROPDMGFACTOR[(stormData$CROPDMGEXP == "?")] <- 10^0
stormData$CROPDMGFACTOR[(stormData$CROPDMGEXP == "0")] <- 10^0
stormData$CROPDMGFACTOR[(stormData$CROPDMGEXP == "2")] <- 10^2
stormData$CROPDMGFACTOR[(stormData$CROPDMGEXP == "K")] <- 10^3
stormData$CROPDMGFACTOR[(stormData$CROPDMGEXP == "M")] <- 10^6
stormData$CROPDMGFACTOR[(stormData$CROPDMGEXP == "B")] <- 10^9
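
The same translation can also be expressed as a single lookup table. The sketch below is an alternative formulation, not the code used in this analysis; the names exp_codes, exp_factors and exp_to_factor are hypothetical. It follows the mapping listed in the text (including "9"), and uses match() because indexing a named vector by the empty string "" does not work in R.

```r
# Exponent codes and their multipliers, in matching order
exp_codes   <- c("", "-", "?", "+", as.character(0:9), "H", "K", "M", "B")
exp_factors <- c(1, 1, 1, 1, 10^(0:9), 1e2, 1e3, 1e6, 1e9)

# Hypothetical helper: translate a vector of exponent codes to multipliers
exp_to_factor <- function(codes) {
  # match() returns NA for codes missing from the table, so unknown
  # codes show up as NA instead of silently becoming 1
  exp_factors[match(toupper(codes), exp_codes)]
}

print(exp_to_factor(c("K", "m", "", "B", "x")))
```

With this helper, each factor column could be filled in one assignment, for example by calling exp_to_factor() on PROPDMGEXP.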

Our analysis does not distinguish between FATALITIES and INJURIES; each counts equally toward health impact. Therefore, both variables are summed to form a new variable, HEALTHIMPACT.

Likewise for economic damages, both PROPDMG and CROPDMG are multiplied by their corresponding factors and combined to form a new variable, ECONOMICIMPACT.

stormData <- mutate(stormData, HEALTHIMPACT = FATALITIES + INJURIES)
stormData <- mutate(stormData, ECONOMICIMPACT = PROPDMG * PROPDMGFACTOR + CROPDMG * CROPDMGFACTOR)
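
As a worked example of the two formulas above, take the first record shown in the str() output earlier: PROPDMG = 25 with PROPDMGEXP = "K", CROPDMG = 0 with an empty CROPDMGEXP, no fatalities and 15 injuries. That record dates from 1950 and is not part of the filtered dataset; it is used here only to illustrate the arithmetic.

```r
# Values taken from the first record in the str() output
PROPDMG    <- 25
CROPDMG    <- 0
FATALITIES <- 0
INJURIES   <- 15

# Multipliers from the exponent mapping: "K" = 1,000 and "" = 1
PROPDMGFACTOR <- 10^3
CROPDMGFACTOR <- 10^0

HEALTHIMPACT   <- FATALITIES + INJURIES
ECONOMICIMPACT <- PROPDMG * PROPDMGFACTOR + CROPDMG * CROPDMGFACTOR

print(c(HEALTHIMPACT = HEALTHIMPACT, ECONOMICIMPACT = ECONOMICIMPACT))
```

Here the health impact is 15 people and the economic impact is 25,000 USD.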

The event type variable (EVTYPE) also requires cleaning. Since our analysis looks only at the most impactful events, only the labels of those events need to be cleaned. We therefore focus on event types whose totals exceed the 95th percentile.

First, we look at event types above the 95th percentile for health impact (HEALTHIMPACT).

healthImpact <- with(stormData, aggregate(HEALTHIMPACT ~ EVTYPE, FUN = sum))
subset(healthImpact, HEALTHIMPACT > quantile(HEALTHIMPACT, probs = 0.95))
##                EVTYPE HEALTHIMPACT
## 45     EXCESSIVE HEAT         8188
## 53        FLASH FLOOD         2561
## 55              FLOOD         7172
## 86               HEAT         1459
## 102         HIGH WIND         1318
## 107 HURRICANE/TYPHOON         1339
## 130         LIGHTNING         4792
## 177 THUNDERSTORM WIND         1530
## 181           TORNADO        22178
## 186         TSTM WIND         3870
## 211          WILDFIRE          986
## 217      WINTER STORM         1483
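
To show how this cutoff behaves, here is a minimal sketch on made-up totals for 20 hypothetical event types (TYPE_1 to TYPE_20). The data frame totals and its values are illustrative only.

```r
# Made-up totals: 20 event types with impacts 10, 20, ..., 200
totals <- data.frame(
  EVTYPE       = paste0("TYPE_", 1:20),
  HEALTHIMPACT = (1:20) * 10,
  stringsAsFactors = FALSE
)

# 95th percentile of the per-type totals (R's default interpolation)
cutoff <- quantile(totals$HEALTHIMPACT, probs = 0.95)

# Strict ">" keeps only the event types whose total exceeds the cutoff
top <- subset(totals, HEALTHIMPACT > cutoff)

print(cutoff)
print(top)
```

In this toy case the cutoff is 190.5, so only TYPE_20 remains. Roughly the top 5% of event types pass the cutoff, which matches the 12 event types listed above.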

Two event types above the 95th percentile do not match the official event names defined in the storm data documentation (https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf). They are “TSTM WIND” (officially “THUNDERSTORM WIND”) and “HURRICANE/TYPHOON” (officially “HURRICANE (TYPHOON)”).

stormData$EVTYPE[(stormData$EVTYPE == "TSTM WIND")] <- "THUNDERSTORM WIND"
stormData$EVTYPE[(stormData$EVTYPE == "HURRICANE/TYPHOON")] <- "HURRICANE (TYPHOON)"

Next, we look at event types above the 95th percentile for economic impact (ECONOMICIMPACT).

economicImpact <- with(stormData, aggregate(ECONOMICIMPACT ~ EVTYPE, FUN = sum))
subset(economicImpact, ECONOMICIMPACT > quantile(ECONOMICIMPACT, probs = 0.95))
##                  EVTYPE ECONOMICIMPACT
## 37              DROUGHT    14413667000
## 53          FLASH FLOOD    16557105610
## 55                FLOOD   148919611950
## 83                 HAIL    17071172870
## 102           HIGH WIND     5881421660
## 105           HURRICANE    14554229010
## 106 HURRICANE (TYPHOON)    71913712800
## 170         STORM SURGE    43193541000
## 177   THUNDERSTORM WIND     8812927230
## 181             TORNADO    24900370720
## 184      TROPICAL STORM     8320186550

Two event types above the 95th percentile do not match the official event names defined in the storm data documentation (https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf). They are “HURRICANE” (officially “HURRICANE (TYPHOON)”) and “STORM SURGE” (officially “STORM SURGE/TIDE”).

stormData$EVTYPE[(stormData$EVTYPE == "HURRICANE")] <- "HURRICANE (TYPHOON)"
stormData$EVTYPE[(stormData$EVTYPE == "STORM SURGE")] <- "STORM SURGE/TIDE"
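
The four label corrections identified in this section could also be collected in one lookup vector and applied in a single pass. This is a sketch of an alternative formulation; the names official and recode_evtype are hypothetical and are not used elsewhere in this analysis.

```r
# Non-standard label -> official label, as identified in this section
official <- c(
  "TSTM WIND"         = "THUNDERSTORM WIND",
  "HURRICANE/TYPHOON" = "HURRICANE (TYPHOON)",
  "HURRICANE"         = "HURRICANE (TYPHOON)",
  "STORM SURGE"       = "STORM SURGE/TIDE"
)

# Hypothetical helper: replace known non-standard labels, leave the rest alone
recode_evtype <- function(evtype) {
  hit <- evtype %in% names(official)
  evtype[hit] <- unname(official[evtype[hit]])
  evtype
}

print(recode_evtype(c("TSTM WIND", "TORNADO", "HURRICANE", "STORM SURGE")))
```

Labels that are not in the lookup, such as "TORNADO", pass through unchanged.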

Results

First, we look at the 10 severe weather event types (EVTYPE) that have the greatest impact on population health (healthImpact) in the United States.

healthImpact <- stormData %>%
  group_by(EVTYPE) %>%
  summarise(HEALTHIMPACT = sum(HEALTHIMPACT)) %>%
  arrange(desc(HEALTHIMPACT))

g1 <- ggplot(healthImpact[1:10,], aes(x=reorder(EVTYPE, -HEALTHIMPACT), y=HEALTHIMPACT, color=EVTYPE)) +
  geom_bar(stat="identity", fill="white") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("Event Type") + ylab("Number of Fatalities and Injuries") +
  theme(legend.position="none") +
  ggtitle("Impact of Severe Weather Events on Population Health in the United States")
print(g1)
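
The bars appear in descending order because of the reorder() call in the plot's x aesthetic. A small sketch of what that call does, using three of the totals from the health impact table shown earlier (the vector names ev and impact are illustrative):

```r
# Three event types with their health impact totals from the earlier table
ev     <- c("FLOOD", "TORNADO", "LIGHTNING")
impact <- c(7172, 22178, 4792)

# reorder() sorts the factor levels by the second argument in ascending
# order, so negating the totals puts the largest total first
ordered_ev <- reorder(ev, -impact)

print(levels(ordered_ev))
```

ggplot2 draws discrete axes in level order, so the event type with the largest total ends up leftmost.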

The figure above shows that tornadoes have the greatest impact on population health in the United States.

Next, we look at the 10 severe weather event types (EVTYPE) that have the greatest impact on the economy (economicImpact) in the United States.

economicImpact <- stormData %>%
  group_by(EVTYPE) %>%
  summarise(ECONOMICIMPACT = sum(ECONOMICIMPACT)) %>%
  arrange(desc(ECONOMICIMPACT))

g2 <- ggplot(economicImpact[1:10,], aes(x=reorder(EVTYPE, -ECONOMICIMPACT), y=ECONOMICIMPACT, color=EVTYPE)) +
  geom_bar(stat="identity", fill="white") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("Event Type") + ylab("Economic Damages in USD") +
  theme(legend.position="none") +
  ggtitle("Impact of Severe Weather Events on the Economy in the United States")
print(g2)

The figure above shows that floods have the greatest impact on the economy in the United States.

In summary, our analysis suggests that tornadoes are the most harmful to population health, while floods cause the greatest economic consequences in the United States.