Exploring Health and Economic Impact of Weather Events Between 1950 and 2011

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

This analysis will be used to explore the NOAA Storm Database and answer some basic questions about severe weather events. The data set in this analysis consists of weather events from 1950 to November 2011.

Data Processing

I will use the Storm Data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to perform the analysis.

Reading the Data

Displaying Session Info

sessionInfo()

## R version 4.2.2 (2022-10-31)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Ventura 13.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.30   R6_2.5.1        jsonlite_1.8.3  magrittr_2.0.3 
##  [5] evaluate_0.18   stringi_1.7.8   rlang_1.0.6     cachem_1.0.6   
##  [9] cli_3.4.1       rstudioapi_0.14 jquerylib_0.1.4 bslib_0.4.1    
## [13] rmarkdown_2.18  tools_4.2.2     stringr_1.4.1   xfun_0.33      
## [17] fastmap_1.1.0   compiler_4.2.2  htmltools_0.5.3 knitr_1.40     
## [21] sass_0.4.2

data_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

if (!file.exists(".\\Storm_Data.csv.bz2")) {
    download.file(url = data_url, destfile = ".\\Storm_Data.csv.bz2")
    }

storm_data <- read.csv(".\\Storm_Data.csv.bz2")

After reading the data into storm_data dataframe, checking the number of variables and observations.

str(storm_data)

## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

head(storm_data)

##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6

Loading the needed packages

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)
library(reshape2)

Subsetting Data

Since not all the variables and observations necessary to perform the analysis, data will be subsetted where Event Type was captured and also damage is more than 0.

storm_data_subset <- subset(storm_data, EVTYPE != "?" & (FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0), select = c("BGN_DATE", "END_DATE", "STATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP"))
dim(storm_data_subset)

## [1] 254632     10

This subset data has only 254632 observations and 10 variables we need to use for the rest of the analysis. However, Event Type still needs to be cleaned since it has more than 487 unique values in it.

length(unique(storm_data_subset$EVTYPE))

## [1] 487

I will start cleaning every single event type by converting them to uppercase.

storm_data_subset$EVTYPE <- toupper(storm_data_subset$EVTYPE)

Next I will convert every event type to a unique value

# AVALANCHE
storm_data_subset$EVTYPE <- gsub('.*AVALANCE.*', 'AVALANCHE', storm_data_subset$EVTYPE)
# BLIZZARD
storm_data_subset$EVTYPE <- gsub('.*BLIZZARD.*', 'BLIZZARD', storm_data_subset$EVTYPE)
# CLOUD
storm_data_subset$EVTYPE <- gsub('.*CLOUD.*', 'CLOUD', storm_data_subset$EVTYPE)
# COLD
storm_data_subset$EVTYPE <- gsub('.*COLD.*', 'COLD', storm_data_subset$EVTYPE)
storm_data_subset$EVTYPE <- gsub('.*FREEZ.*', 'COLD', storm_data_subset$EVTYPE)
storm_data_subset$EVTYPE <- gsub('.*FROST.*', 'COLD', storm_data_subset$EVTYPE)
storm_data_subset$EVTYPE <- gsub('.*ICE.*', 'COLD', storm_data_subset$EVTYPE)
storm_data_subset$EVTYPE <- gsub('.*LOW TEMPERATURE RECORD.*', 'COLD', storm_data_subset$EVTYPE)
storm_data_subset$EVTYPE <- gsub('.*LO.*TEMP.*', 'COLD', storm_data_subset$EVTYPE)
# DRY
storm_data_subset$EVTYPE <- gsub('.*DRY.*', 'DRY', storm_data_subset$EVTYPE)
# DUST
storm_data_subset$EVTYPE <- gsub('.*DUST.*', 'DUST', storm_data_subset$EVTYPE)
# FIRE
storm_data_subset$EVTYPE <- gsub('.*FIRE.*', 'FIRE', storm_data_subset$EVTYPE)
# FLOOD
storm_data_subset$EVTYPE <- gsub('.*FLOOD.*', 'FLOOD', storm_data_subset$EVTYPE)
# FOG
storm_data_subset$EVTYPE <- gsub('.*FOG.*', 'FOG', storm_data_subset$EVTYPE)
# HAIL
storm_data_subset$EVTYPE <- gsub('.*HAIL.*', 'HAIL', storm_data_subset$EVTYPE)
# HEAT
storm_data_subset$EVTYPE <- gsub('.*HEAT.*', 'HEAT', storm_data_subset$EVTYPE)
storm_data_subset$EVTYPE <- gsub('.*WARM.*', 'HEAT', storm_data_subset$EVTYPE)
storm_data_subset$EVTYPE <- gsub('.*HIGH.*TEMP.*', 'HEAT', storm_data_subset$EVTYPE)
storm_data_subset$EVTYPE <- gsub('.*RECORD HIGH TEMPERATURES.*', 'HEAT', storm_data_subset$EVTYPE)
# HYPOTHERMIA/EXPOSURE
storm_data_subset$EVTYPE <- gsub('.*HYPOTHERMIA.*', 'HYPOTHERMIA/EXPOSURE', storm_data_subset$EVTYPE)
# LANDSLIDE
storm_data_subset$EVTYPE <- gsub('.*LANDSLIDE.*', 'LANDSLIDE', storm_data_subset$EVTYPE)
# LIGHTNING
storm_data_subset$EVTYPE <- gsub('^LIGHTNING.*', 'LIGHTNING', storm_data_subset$EVTYPE)
storm_data_subset$EVTYPE <- gsub('^LIGNTNING.*', 'LIGHTNING', storm_data_subset$EVTYPE)
storm_data_subset$EVTYPE <- gsub('^LIGHTING.*', 'LIGHTNING', storm_data_subset$EVTYPE)
# MICROBURST
storm_data_subset$EVTYPE <- gsub('.*MICROBURST.*', 'MICROBURST', storm_data_subset$EVTYPE)
# MUDSLIDE
storm_data_subset$EVTYPE <- gsub('.*MUDSLIDE.*', 'MUDSLIDE', storm_data_subset$EVTYPE)
storm_data_subset$EVTYPE <- gsub('.*MUD SLIDE.*', 'MUDSLIDE', storm_data_subset$EVTYPE)
# RAIN
storm_data_subset$EVTYPE <- gsub('.*RAIN.*', 'RAIN', storm_data_subset$EVTYPE)
# RIP CURRENT
storm_data_subset$EVTYPE <- gsub('.*RIP CURRENT.*', 'RIP CURRENT', storm_data_subset$EVTYPE)
# STORM
storm_data_subset$EVTYPE <- gsub('.*STORM.*', 'STORM', storm_data_subset$EVTYPE)
# SUMMARY
storm_data_subset$EVTYPE <- gsub('.*SUMMARY.*', 'SUMMARY', storm_data_subset$EVTYPE)
# TORNADO
storm_data_subset$EVTYPE <- gsub('.*TORNADO.*', 'TORNADO', storm_data_subset$EVTYPE)
storm_data_subset$EVTYPE <- gsub('.*TORNDAO.*', 'TORNADO', storm_data_subset$EVTYPE)
storm_data_subset$EVTYPE <- gsub('.*LANDSPOUT.*', 'TORNADO', storm_data_subset$EVTYPE)
storm_data_subset$EVTYPE <- gsub('.*WATERSPOUT.*', 'TORNADO', storm_data_subset$EVTYPE)
# SURF
storm_data_subset$EVTYPE <- gsub('.*SURF.*', 'SURF', storm_data_subset$EVTYPE)
# VOLCANIC
storm_data_subset$EVTYPE <- gsub('.*VOLCANIC.*', 'VOLCANIC', storm_data_subset$EVTYPE)
# WET
storm_data_subset$EVTYPE <- gsub('.*WET.*', 'WET', storm_data_subset$EVTYPE)
# WIND
storm_data_subset$EVTYPE <- gsub('.*WIND.*', 'WIND', storm_data_subset$EVTYPE)
# WINTER
storm_data_subset$EVTYPE <- gsub('.*WINTER.*', 'WINTER', storm_data_subset$EVTYPE)
storm_data_subset$EVTYPE <- gsub('.*WINTRY.*', 'WINTER', storm_data_subset$EVTYPE)
storm_data_subset$EVTYPE <- gsub('.*SNOW.*', 'WINTER', storm_data_subset$EVTYPE)

Next I will convert the damage values appropriately by using *EXP values. We will start with converting the EXP values to UPPER case and also trim any of the whitespace before or after the values.

storm_data_subset$PROPDMGEXP <- toupper(storm_data_subset$PROPDMGEXP)
storm_data_subset$PROPDMGEXP <- trimws(storm_data_subset$PROPDMGEXP)
storm_data_subset$CROPDMGEXP <- toupper(storm_data_subset$CROPDMGEXP)
storm_data_subset$CROPDMGEXP <- trimws(storm_data_subset$CROPDMGEXP)

We will then create a function to find the appropriate multiplier based on EXP values.

find_multiplier <- function(exp) {
  exp <- exp;
  if (exp == "") return (10^0);
  if (exp == "+") return (10^0);
  if (exp == "-") return (10^0);
  if (exp == "?") return (10^0);
  if (exp == "H") return (10^2);
  if (exp == "K") return (10^3);
  if (exp == "M") return (10^6);
  if (exp == "B") return (10^9);
  if (exp == "0") return (10^0);
  if (exp == "1") return (10^1);
  if (exp == "2") return (10^2);
  if (exp == "3") return (10^3);
  if (exp == "4") return (10^4);
  if (exp == "5") return (10^5);
  if (exp == "6") return (10^6);
  if (exp == "7") return (10^7);
  if (exp == "8") return (10^8);
  if (exp == "9") return (10^9);
  return (NA);
}

Now we can actually multiply the values in the DMG values with the multiplier to find the actual dollar amount. We will leverage the function we created to make this easier.

storm_data_subset$PROPDMG <- storm_data_subset$PROPDMG * sapply(storm_data_subset$PROPDMGEXP, find_multiplier)
storm_data_subset$CROPDMG <- storm_data_subset$CROPDMG * sapply(storm_data_subset$CROPDMGEXP, find_multiplier)

Next we will deal with date columns to convert them to appropriate date format and also add a new column for the year.

storm_data_subset$BGN_DATE <- as.Date(storm_data_subset$BGN_DATE, format = "%m/%d/%Y")
storm_data_subset$END_DATE <- as.Date(storm_data_subset$END_DATE, format = "%m/%d/%Y")
storm_data_subset$YEAR <- as.integer(format(storm_data_subset$BGN_DATE, "%Y"))

Results

Now that we have our data set prepared, we will proceed with answering some questions related to analysis.

In order to make the analysis easier, we will convert the damage numbers to millions and store them in new columns. We will use these 2 columns going forward in the rest of the analysis.

storm_data_subset$PROPDMG_MIL <- storm_data_subset$PROPDMG / 10^6
storm_data_subset$CROPDMG_MIL <- storm_data_subset$CROPDMG / 10^6

We will group our results next by event type and store this in our final dataframe.

final_data <- storm_data_subset %>%
  group_by(EVTYPE) %>%
  summarize(EVENTS = length(EVTYPE),
            FATALITIES = sum(FATALITIES),
            INJURIES = sum(INJURIES),
            PROP_DMG = sum(PROPDMG_MIL),
            CROP_DMG = sum(CROPDMG_MIL)) %>%
            arrange(desc(EVENTS))

Population Health Impact Analysis

We will add a new column to calculate the casualty total which will be adding of Fatalities and Injuries. We will then sort the final data by the casualty and pick Top 10 to plot it.

final_data$CASUALTY <- final_data$FATALITIES + final_data$INJURIES
final_data <- arrange(final_data, desc(CASUALTY))
top10_casualty_data <- head(final_data, n=10)

We will plot this data next.

plot1 <- top10_casualty_data %>%
     select(EVTYPE, FATALITIES, INJURIES)
plot1 <- melt(plot1, id = c("EVTYPE"))
names(plot1) <- c("EV_TYPE", "CAS_TYPE", "VALUE")
pl1 <-ggplot(plot1, aes(fill = CAS_TYPE, y = VALUE, x = reorder(EV_TYPE, -VALUE))) + 
  geom_bar(position = "stack", stat = "identity") + 
  ggtitle("Population Health Impact of Weather Events") +
  xlab("Event") + ylab("Number of Casualties")
print(pl1)

Conclusion 1 - Population Health Impact

As seen by the plot, Tornado events by far the biggest impact on the population health followed by Wind, Heat and Flood.

Economic Consequences of Weather Events Analysis

We will start with adding a new column to sum of crop damage and property damage so that we can sort the data by this total damage.

final_data$TOTAL_DAMAGE <- final_data$CROP_DMG + final_data$PROP_DMG
final_data <- arrange(final_data, desc(TOTAL_DAMAGE))
top10_damage_data <- head(final_data, n=10)

We will plot this data next.

plot2 <- top10_damage_data %>%
     select(EVTYPE, PROP_DMG, CROP_DMG)
plot2 <- melt(plot2, id = c("EVTYPE"))
names(plot2) <- c("EV_TYPE", "DMG_TYPE", "VALUE")
pl2 <-ggplot(plot2, aes(fill = DMG_TYPE, y = VALUE, x = reorder(EV_TYPE, -VALUE))) + 
  geom_bar(position = "stack", stat = "identity") + 
  ggtitle("Economic Consequences of Weather Events") +
  xlab("Event") + ylab("Damage Dollar Amount in Millions") +
  theme(axis.text.x = element_text(angle = 45), text = element_text(size=15))
print(pl2)

Conclusion 2 - Economic Consequence of Weather Events

As seen from the 2nd plot, Floods have the highest economic impact followed by Hurricane, Storm and Tornado.