Reproducible Research Course Project 02: Health and Economic Impact of US Severe Weather

Jane Nyandele

April 29, 2023

Synopsis

In this project, we analyze the storm database from the U.S. National Oceanic and Atmospheric Administration (NOAA). We estimate fatalities, injuries, property damage, and crop damage for each type of weather event (e.g., flood, typhoon, tornado, hail, hurricane). The goal is to determine which events are most harmful to US population health and which have the greatest economic consequences. Our analysis shows that tornadoes have the greatest health impact, while floods have the greatest economic impact.

Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data

The data for this assignment come in the form of a comma-separated-value (CSV) file compressed with the bzip2 algorithm to reduce its size. The file can be downloaded from the course web site.

There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

Questions

This analysis addresses the following questions:

1. Within the United States, which weather events are most harmful with respect to population health?
2. Within the United States, which weather events have the greatest economic consequences?

Data Processing

0. Load Libraries

library(data.table)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.1.2

1. Load the data into R

data <- read.csv("repdata_data_StormData.csv", header = TRUE, sep = ",")
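
The call above reads a copy of the file that has already been decompressed. As a minimal sketch, assuming the file were still in its bzip2-compressed form, read.csv() can read it directly because R detects the compression when it opens the file. The toy table and temporary path below are made up for illustration and are not part of the NOAA data.

```r
# Toy stand-in for the storm file; the values are invented for illustration.
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 0))

# Write the toy table as bzip2-compressed CSV to a temporary file.
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() decompresses on the fly; no manual unpacking step is needed.
roundtrip <- read.csv(path, header = TRUE, stringsAsFactors = FALSE)
unlink(path)

print(roundtrip)
```

Reading the compressed file directly avoids keeping a second, much larger copy of the data on disk.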

2. Inspect the data

Use colnames() to check the column names

colnames(data)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

3. Subset the data

Keep only the columns related to health and economic impacts: EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP

selection <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
data <- data[, selection]
summary(data)
##     EVTYPE            FATALITIES          INJURIES            PROPDMG       
##  Length:902297      Min.   :  0.0000   Min.   :   0.0000   Min.   :   0.00  
##  Class :character   1st Qu.:  0.0000   1st Qu.:   0.0000   1st Qu.:   0.00  
##  Mode  :character   Median :  0.0000   Median :   0.0000   Median :   0.00  
##                     Mean   :  0.0168   Mean   :   0.1557   Mean   :  12.06  
##                     3rd Qu.:  0.0000   3rd Qu.:   0.0000   3rd Qu.:   0.50  
##                     Max.   :583.0000   Max.   :1700.0000   Max.   :5000.00  
##   PROPDMGEXP           CROPDMG         CROPDMGEXP       
##  Length:902297      Min.   :  0.000   Length:902297     
##  Class :character   1st Qu.:  0.000   Class :character  
##  Mode  :character   Median :  0.000   Mode  :character  
##                     Mean   :  1.527                     
##                     3rd Qu.:  0.000                     
##                     Max.   :990.000

Keep only the rows in which at least one of fatalities, injuries, property damage, or crop damage is greater than zero, and drop rows whose event type is recorded as "?"

data <- as.data.table(data)
data <- data[(EVTYPE != "?" & (INJURIES > 0 | FATALITIES > 0 | PROPDMG > 0 | CROPDMG >0)),
             c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
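
To make the filter's logic concrete, here is a small sketch on hypothetical records (not taken from the NOAA file). A row is kept when any one of the four impact columns is positive, and a row with event type "?" is dropped even if it reports an impact.

```r
library(data.table)

# Hypothetical toy records, invented for illustration.
toy <- data.table(
  EVTYPE     = c("TORNADO", "HAIL", "?", "FLOOD"),
  FATALITIES = c(1, 0, 0, 0),
  INJURIES   = c(0, 0, 2, 0),
  PROPDMG    = c(0, 0, 0, 25),
  CROPDMG    = c(0, 0, 0, 0)
)

# Keep a row when ANY impact column is positive and the event type is known.
kept <- toy[EVTYPE != "?" &
              (INJURIES > 0 | FATALITIES > 0 | PROPDMG > 0 | CROPDMG > 0)]

# HAIL is dropped (all zeros); "?" is dropped despite its two injuries.
print(kept$EVTYPE)
```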

4. Convert the exponent columns (PROPDMGEXP and CROPDMGEXP)

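Convert the exponent codes in the above columns to numeric multipliers: H to 100, K to 1,000, M to 1,000,000, and B to 1,000,000,000. For property damage, a digit n is read as 10^n. Empty or unrecognized codes are treated as a multiplier of 1.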

cols <- c("PROPDMGEXP", "CROPDMGEXP")
data[, (cols) := c(lapply(.SD, toupper)), .SDcols = cols]

PROPDMGKey <-  c("\"\"" = 10^0, 
                 "-" = 10^0, "+" = 10^0, "0" = 10^0, "1" = 10^1, "2" = 10^2, "3" = 10^3,
                 "4" = 10^4, "5" = 10^5, "6" = 10^6, "7" = 10^7, "8" = 10^8, "9" = 10^9, 
                 "H" = 10^2, "K" = 10^3, "M" = 10^6, "B" = 10^9)
CROPDMGKey <-  c("\"\"" = 10^0, "?" = 10^0, "0" = 10^0, "K" = 10^3, "M" = 10^6, "B" = 10^9)

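# Look up each code's multiplier; codes absent from the key (including empty strings) return NA
data[, PROPDMGEXP := PROPDMGKey[as.character(data[,PROPDMGEXP])]]
# Treat missing or unrecognized codes as a multiplier of 1
data[is.na(PROPDMGEXP), PROPDMGEXP := 10^0 ]

data[, CROPDMGEXP := CROPDMGKey[as.character(data[,CROPDMGEXP])] ]
data[is.na(CROPDMGEXP), CROPDMGEXP := 10^0 ]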
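
The lookup step can be illustrated in base R with a shortened key and a few hypothetical codes. One detail worth noting: indexing a named vector with an empty string never matches any name, so empty codes come back as NA and receive their multiplier of 1 from the is.na() fallback rather than from an entry in the key. The vector names `key` and `raw` are illustrative only.

```r
# Shortened version of the property-damage key, for illustration only.
key <- c("H" = 10^2, "K" = 10^3, "M" = 10^6, "B" = 10^9, "0" = 10^0, "5" = 10^5)

# Hypothetical raw codes: mixed case, an empty string, and a stray symbol.
raw <- c("K", "m", "", "B", "?", "5")

# Upper-case the codes, then index the named vector by code.
multiplier <- unname(key[toupper(raw)])

# Codes that are not in the key (here "" and "?") come back as NA.
print(is.na(multiplier))

# Treat those as a multiplier of 1, as the analysis does.
multiplier[is.na(multiplier)] <- 1
print(multiplier)
```

With these multipliers, a recorded damage of 2.5 with code "K" corresponds to 2.5 * 1000 = 2,500 USD.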

5. Create new columns for Property Cost and Crop Cost

data <- data[, .(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, PROPCOST = PROPDMG * PROPDMGEXP,
                 CROPDMG, CROPDMGEXP, CROPCOST = CROPDMG * CROPDMGEXP)]

Data Analysis

Estimating the health impact (i.e. total fatalities and injuries)

Estimate the total fatalities and injuries for each event type, sorted by total health impact in descending order

Health_Impact <- data[, .(FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES), 
                          TOTAL_HEALTH_IMPACT = sum(FATALITIES) + sum(INJURIES)), by = .(EVTYPE)]
# Order by total health impact in descending order
Health_Impact <- Health_Impact[order(-TOTAL_HEALTH_IMPACT),]
# Extract the top 10 event types with the greatest health impact
Health_Impact <- Health_Impact[1:10,]
head(Health_Impact, 10)
##                EVTYPE FATALITIES INJURIES TOTAL_HEALTH_IMPACT
##  1:           TORNADO       5633    91346               96979
##  2:    EXCESSIVE HEAT       1903     6525                8428
##  3:         TSTM WIND        504     6957                7461
##  4:             FLOOD        470     6789                7259
##  5:         LIGHTNING        816     5230                6046
##  6:              HEAT        937     2100                3037
##  7:       FLASH FLOOD        978     1777                2755
##  8:         ICE STORM         89     1975                2064
##  9: THUNDERSTORM WIND        133     1488                1621
## 10:      WINTER STORM        206     1321                1527
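
As a worked illustration of the group-and-sort step, the sketch below applies the same pattern to a handful of hypothetical records. The numbers are invented so that the totals can be checked by hand.

```r
library(data.table)

# Hypothetical records, invented for illustration.
toy <- data.table(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 1, 3, 4, 0),
  INJURIES   = c(10, 0, 5, 1, 2)
)

# Sum each measure within an event type, then add the two sums.
totals <- toy[, .(FATALITIES = sum(FATALITIES),
                  INJURIES = sum(INJURIES),
                  TOTAL_HEALTH_IMPACT = sum(FATALITIES) + sum(INJURIES)),
              by = .(EVTYPE)]

# Largest combined impact first.
totals <- totals[order(-TOTAL_HEALTH_IMPACT)]
print(totals)
```

Here TORNADO totals 5 + 15 = 20, HEAT totals 4 + 1 = 5, and FLOOD totals 1 + 2 = 3, so the rows come out in that order.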

Estimating the economic impact (i.e. the property and crop costs)

Sum the property and crop costs for each event type to estimate its total economic impact

eco_impact <- data[, .(PROPCOST = sum(PROPCOST), CROPCOST = sum(CROPCOST), TOTAL_ECO_IMPACT = 
                         sum(PROPCOST) + sum(CROPCOST)), by = .(EVTYPE)]
# Order by total economic impact in descending order
eco_impact <- eco_impact[order(-TOTAL_ECO_IMPACT)]
# Extract the top ten weather events with the most economic impact
eco_impact <- eco_impact[1:10, ]
head(eco_impact, 10)
##                EVTYPE     PROPCOST    CROPCOST TOTAL_ECO_IMPACT
##  1:             FLOOD 144657709807  5661968450     150319678257
##  2: HURRICANE/TYPHOON  69305840000  2607872800      71913712800
##  3:           TORNADO  56947380676   414953270      57362333946
##  4:       STORM SURGE  43323536000        5000      43323541000
##  5:              HAIL  15735267513  3025954473      18761221986
##  6:       FLASH FLOOD  16822673978  1421317100      18243991078
##  7:           DROUGHT   1046106000 13972566000      15018672000
##  8:         HURRICANE  11868319010  2741910000      14610229010
##  9:       RIVER FLOOD   5118945500  5029459000      10148404500
## 10:         ICE STORM   3944927860  5022113500       8967041360

Results

Question 01: Events most harmful with respect to population health

Generate a bar chart of the top 10 weather events that most affect population health

# Reshape to long format: one row per event type and measure (fatalities, injuries, and their total)
health_consequences <- melt(Health_Impact, id.vars = "EVTYPE", variable.name = "Fatalities_or_Injuries")

#plot health_consequences
ggplot(health_consequences, aes(x = reorder(EVTYPE, -value), y = value)) + 
  geom_bar(stat = "identity", aes(fill = Fatalities_or_Injuries), position = "dodge") + 
  ylab("Total Injuries/Fatalities") + 
  xlab("Event Type") + 
  theme(axis.text.x = element_text(angle=65, hjust=1)) + 
  ggtitle("Top 10 US Weather Events Most Harmful to Population Health") + 
  theme(plot.title = element_text(hjust = 0.5))
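
The reshaping step determines how many bars the chart draws per event. As a sketch on hypothetical values, melt() with only id.vars set stacks every remaining column, so the total column becomes a third category next to fatalities and injuries. If only the two components were wanted, measure.vars could restrict the reshape; that variant is an illustration, not what the analysis above does.

```r
library(data.table)

# Hypothetical wide table, invented for illustration.
wide <- data.table(EVTYPE = c("TORNADO", "FLOOD"),
                   FATALITIES = c(5, 1),
                   INJURIES = c(15, 2),
                   TOTAL_HEALTH_IMPACT = c(20, 3))

# Same call shape as in the analysis: every non-id column is stacked.
long <- melt(wide, id.vars = "EVTYPE",
             variable.name = "Fatalities_or_Injuries")
print(levels(long$Fatalities_or_Injuries))

# Illustrative variant: stack only the two component columns.
long_two <- melt(wide, id.vars = "EVTYPE",
                 measure.vars = c("FATALITIES", "INJURIES"),
                 variable.name = "Fatalities_or_Injuries")
print(nrow(long_two))
```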

Question 02: Events with the greatest economic consequences

Generate a bar chart of the top 10 weather events with the greatest economic consequences

# Reshape to long format: one row per event type and cost type (property, crop, and their total)
eco_consequences <- melt(eco_impact, id.vars = "EVTYPE", variable.name = "Damage_Type")

#plot economic consequences
ggplot(eco_consequences, aes(x = reorder(EVTYPE, -value), y = value/1e9)) + 
  geom_bar(stat = "identity", aes(fill = Damage_Type), position = "dodge") + 
  ylab("Cost/Damage (in billion USD)") + 
  xlab("Event Type") + 
  theme(axis.text.x = element_text(angle=45, hjust=1)) + 
  ggtitle("Top 10 US Weather Events with the Greatest Economic Consequences") + 
  theme(plot.title = element_text(hjust = 0.5))