Impacts of Weather Events on Health and Economics

Synopsis

Severe weather events such as storms and floods can cause massive losses in both public health and the economy for communities and municipalities. To reduce the losses caused by these disasters, it is crucial to identify the most harmful weather events so that resources can be allocated strategically. This project aims to pinpoint the weather events that caused the most severe damage to population health or the economy between 1950 and November 2011, using the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The results show that tornadoes and floods were the weather events most damaging to public health and the economy, respectively.

Data Processing

Load the packages required for analysis.

library(dplyr)
library(tidyr)
library(ggplot2)

Download the dataset and save it as st.

if(!file.exists("repdata_data_StormData.csv.bz2")) {
        download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                      destfile = "repdata_data_StormData.csv.bz2")
}

st <- read.csv(bzfile("repdata_data_StormData.csv.bz2"))
print(names(st))
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

To determine the economic consequences, I will focus on two aspects:

  1. Property damage: It can be calculated from the columns PROPDMG and PROPDMGEXP. PROPDMG contains the numerical value indicating the magnitude of property damage (in dollars, as described in the Storm Data Documentation), and PROPDMGEXP holds the character-based exponent (e.g., ‘K’ for thousands, ‘M’ for millions, ‘B’ for billions) that scales the corresponding PROPDMG value.

  2. Crop damage: It can be calculated in the same way from the columns CROPDMG and CROPDMGEXP.

To calculate the actual damages on properties and crops, I investigated PROPDMGEXP and CROPDMGEXP columns.

unique(c(st$CROPDMGEXP, st$PROPDMGEXP))
##  [1] ""  "M" "K" "m" "B" "?" "0" "k" "2" "+" "5" "6" "4" "3" "h" "7" "H" "-" "1"
## [20] "8"

According to general concepts and guidelines listed in the Storm Data Documentation, I defined (and deduced) the rules for both PROPDMGEXP and CROPDMGEXP, as listed below:

  • B: 10^9
  • M or m: 10^6
  • K or k: 10^3
  • H or h: 10^2
  • 1, 2, 3, 4, 5, 6, 7, 8: 10^1, 10^2, 10^3, 10^4, 10^5, 10^6, 10^7, 10^8, respectively.
  • Blank, ?, +, -, 0 : 10^0 (or 1)
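
The rules above can be sketched as a small lookup function. This is only an illustration in base R: the helper name exp_multiplier is hypothetical and is not part of the original analysis, and it assumes that any code not listed in the rules should fall back to a multiplier of 1.

```r
# Hypothetical helper (not from the original analysis): convert an
# exponent code to a numeric multiplier using the rules listed above.
exp_multiplier <- function(code) {
        # Treat lower-case and upper-case letter codes alike.
        code <- toupper(as.character(code))
        # Letter codes plus the digit codes 1 to 8 (10^1 ... 10^8).
        lookup <- c(B = 1e9, M = 1e6, K = 1e3, H = 1e2,
                    setNames(10^(1:8), as.character(1:8)))
        mult <- unname(lookup[code])
        # Assumption: blank, "?", "+", "-", "0" and anything unlisted map to 1.
        mult[is.na(mult)] <- 1
        mult
}

codes <- c("K", "m", "B", "h", "5", "", "?", "0")
print(data.frame(code = codes, multiplier = exp_multiplier(codes)))
```

With such a helper, a damage value of 25 with exponent "K" would be converted as 25 * exp_multiplier("K"), i.e. 25,000 dollars.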

Based on these rules, I can now convert the data into actual monetary damages to property and crops, and save them in the new columns PropDol and CropDol, respectively, within the new data.frame st2.

st2 <- st %>%
        mutate(CropDol = ifelse(grepl("[Mm]", CROPDMGEXP), CROPDMG * 10^6, NA)) %>%
        mutate(CropDol = ifelse(grepl("[Kk]", CROPDMGEXP), CROPDMG * 10^3, CropDol)) %>%
        mutate(CropDol = ifelse(CROPDMGEXP == "B", CROPDMG * 10^9, CropDol)) %>%
        mutate(CropDol = ifelse(CROPDMGEXP == "2", CROPDMG * 10^2, CropDol)) %>%
        mutate(CropDol = ifelse(CROPDMGEXP %in% c("0", "?", ""), CROPDMG, CropDol)) %>%
        mutate(PropDol = ifelse(grepl("[Mm]|6", PROPDMGEXP), PROPDMG * 10^6, NA)) %>%
        mutate(PropDol = ifelse(grepl("[Kk]", PROPDMGEXP), PROPDMG * 10^3, PropDol)) %>%
        mutate(PropDol = ifelse(grepl("[Hh]|2", PROPDMGEXP), PROPDMG * 10^2, PropDol)) %>%
        mutate(PropDol = ifelse(PROPDMGEXP == "B", PROPDMG * 10^9, PropDol)) %>%
        mutate(PropDol = ifelse(PROPDMGEXP %in% c("", "+", "0", "?", "-"), PROPDMG, PropDol)) %>%
        mutate(PropDol = ifelse(PROPDMGEXP == "1", PROPDMG * 10, PropDol)) %>%
        mutate(PropDol = ifelse(PROPDMGEXP == "3", PROPDMG * 10^3, PropDol)) %>%
        mutate(PropDol = ifelse(PROPDMGEXP == "4", PROPDMG * 10^4, PropDol)) %>%
        mutate(PropDol = ifelse(PROPDMGEXP == "5", PROPDMG * 10^5, PropDol)) %>%
        mutate(PropDol = ifelse(PROPDMGEXP == "7", PROPDMG * 10^7, PropDol)) %>%
        mutate(PropDol = ifelse(PROPDMGEXP == "8", PROPDMG * 10^8, PropDol))
summary(st2$PropDol)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.000e+00 0.000e+00 0.000e+00 4.746e+05 5.000e+02 1.150e+11
summary(st2$CropDol)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.000e+00 0.000e+00 0.000e+00 5.442e+04 0.000e+00 5.000e+09
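
Because the conversion starts from NA and fills in values rule by rule, any exponent code not covered by a rule would leave an NA behind, and a single NA makes sum() return NA in the later group totals. A minimal sketch of a sanity check follows; the toy data frame and its values are made up for illustration (including the unmapped code "x") and do not come from the storm database.

```r
# Toy data standing in for st2; values are invented for illustration only.
toy <- data.frame(PROPDMGEXP = c("K", "M", "", "x"),
                  PropDol    = c(2500, 1e6, 10, NA))

# Count rows whose damage could not be converted, by exponent code.
unmapped <- table(toy$PROPDMGEXP[is.na(toy$PropDol)])
print(unmapped)

# One NA propagates through sum() unless na.rm = TRUE is set.
total_with_na    <- sum(toy$PropDol)
total_without_na <- sum(toy$PropDol, na.rm = TRUE)
print(c(total_with_na, total_without_na))
```

Running the same check on st2 (replacing toy with st2) should report no unmapped codes if every exponent value is covered by the rules.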

Results

Question 1: Across the United States, which types of events (as indicated in the EVTYPE) are most harmful with respect to population health?

To determine the influences on population health, I will focus on fatality (the FATALITIES column) and injury (the INJURIES column), which represent the number of people affected under these categories.

I compute the total fatalities and injuries caused by each weather event and store them in the columns FATALITIES and INJURIES of the new data.frame stH:

stH <- st2 %>%
        group_by(EVTYPE) %>%
        summarise(FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES))
print(summary(st2$FATALITIES))
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##   0.0000   0.0000   0.0000   0.0168   0.0000 583.0000
print(summary(st2$INJURIES))
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##    0.0000    0.0000    0.0000    0.1557    0.0000 1700.0000

I identify the top 10 weather events with the highest combined total of fatalities and injuries, representing those that most significantly impacted population health.

stH10 <- stH %>%
         mutate(sumIncidence = FATALITIES + INJURIES) %>%
         arrange(desc(sumIncidence)) %>%
         select(-sumIncidence) %>%
         head(10)
print(stH10$EVTYPE)
##  [1] "TORNADO"           "EXCESSIVE HEAT"    "TSTM WIND"        
##  [4] "FLOOD"             "LIGHTNING"         "HEAT"             
##  [7] "FLASH FLOOD"       "ICE STORM"         "THUNDERSTORM WIND"
## [10] "WINTER STORM"
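
The list above contains both "TSTM WIND" and "THUNDERSTORM WIND", which suggests that some EVTYPE labels are spelling variants of the same event and that their totals are split across rows. The sketch below shows one possible way to merge such variants before grouping. It is not part of the original analysis: the helper normalize_evtype is hypothetical, the toy counts are made up, and treating "TSTM" as an abbreviation of "THUNDERSTORM" is an assumption.

```r
# Toy counts, invented for illustration; not values from the storm database.
toy <- data.frame(EVTYPE   = c("TSTM WIND", "THUNDERSTORM WIND", "TORNADO", " tornado "),
                  INJURIES = c(3, 4, 10, 1))

# Hypothetical helper: harmonize case, whitespace, and one known abbreviation.
normalize_evtype <- function(x) {
        x <- toupper(trimws(x))
        # Assumption: a leading "TSTM" abbreviates "THUNDERSTORM".
        sub("^TSTM", "THUNDERSTORM", x)
}

toy$EVTYPE <- normalize_evtype(toy$EVTYPE)

# Sum injuries per harmonized event type.
agg <- aggregate(INJURIES ~ EVTYPE, data = toy, FUN = sum)
print(agg)
```

Whether merging labels changes the ranking in the real data is not established here; it would need to be checked against the full dataset.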

I then make a plot to illustrate the impact of these top 10 weather events on population health. Before plotting, I transform the data to a long format, which is easier to plot with ggplot2.

stH10l <- stH10 %>%
        pivot_longer(cols = c(FATALITIES, INJURIES), names_to = "HealthIssues", values_to = "Counts")
print(names(stH10l))
## [1] "EVTYPE"       "HealthIssues" "Counts"
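
To make the reshaping step concrete, here is a small sketch of what pivot_longer does to a wide table. The event names and counts are made up for illustration; only the column names mirror those used in this report.

```r
library(tidyr)

# Toy wide table: one row per event, one column per health outcome.
wide <- data.frame(EVTYPE     = c("A", "B"),
                   FATALITIES = c(1, 2),
                   INJURIES   = c(10, 20))

# Long format: one row per (event, outcome) pair.
long <- pivot_longer(wide, cols = c(FATALITIES, INJURIES),
                     names_to = "HealthIssues", values_to = "Counts")
print(long)
```

Each event now appears once per outcome, so ggplot2 can map HealthIssues to the fill colour and stack the counts within a single bar.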

Then I make a plot.

ggplot(stH10l, aes(x = reorder(EVTYPE, Counts, sum), y = Counts)) +
        geom_bar(stat = "identity", aes(fill = HealthIssues)) +
        coord_flip() +
        labs(title = "Top10 Weather Events Harmful to Health") +
        labs(x = "Weather Events", y = "Number of people affected") +
        labs(fill = "Health Impacts") +
        theme(axis.text.x = element_text(angle = 45, hjust = 1)) 

This plot shows the top 10 weather events with the highest combined total of fatalities and injuries, that is, those with the greatest impact on population health. FATALITIES and INJURIES are stacked for each weather event and shown in red and green, respectively. The plot clearly shows that tornadoes are the most harmful weather event for population health.

Question 2: Across the United States, which types of events have the greatest economic consequences?

As described previously, economic consequences are measured as the sum of property damage and crop damage, which were calculated in the Data Processing section and stored in the columns PropDol and CropDol, respectively, of the data.frame st2.

I compute the total property damage and crop damage caused by each weather event and store them in the new columns TotPropDol and TotCropDol. I then identify the top 10 weather events with the highest combined total of property and crop damage, that is, those with the greatest economic consequences.

st2g10 <- st2 %>%
        group_by(EVTYPE) %>%
        summarise(TotCropDol = sum(CropDol), TotPropDol = sum(PropDol)) %>%
        mutate(TotDamDol = TotCropDol + TotPropDol) %>%
        arrange(desc(TotDamDol)) %>%
        select(EVTYPE, TotCropDol, TotPropDol) %>%
        head(10)
print(st2g10$EVTYPE)
##  [1] "FLOOD"             "HURRICANE/TYPHOON" "TORNADO"          
##  [4] "STORM SURGE"       "HAIL"              "FLASH FLOOD"      
##  [7] "DROUGHT"           "HURRICANE"         "RIVER FLOOD"      
## [10] "ICE STORM"

I then make a plot to visualize the economic impact of these top 10 weather events. Before plotting, I transform the data to a long format, which is easier to plot with ggplot2.

st2g10l <- pivot_longer(st2g10, cols = c(TotCropDol, TotPropDol), names_to = "DamageTypes", values_to = "Dollars")
print(names(st2g10l))
## [1] "EVTYPE"      "DamageTypes" "Dollars"

Then I make a plot.

ggplot(st2g10l, aes(x = reorder(EVTYPE, Dollars, sum), y = Dollars, fill = DamageTypes)) +
        geom_bar(stat = "identity")  +
        coord_flip() +
        labs(x = "Weather Events", y = "Lost Dollars") +
        labs(title = "Top10 Weather Events Impacting Economics") +
        labs(fill = "Influences") +
        scale_fill_discrete(labels = c("TotCropDol" = "Crop Damage", "TotPropDol" = "Property Damage")) +
        theme(axis.text.x = element_text(angle = 45, hjust = 1))

This plot shows the top 10 weather events with the highest combined total of property and crop damage, that is, those with the greatest economic consequences. Crop and property damage are stacked for each weather event and shown in red and green, respectively. The plot shows that floods are the most economically damaging weather event.

Conclusion

My analysis of the NOAA Storm Data reveals that tornadoes are the most devastating weather event for population health (fatalities and injuries combined), while floods impose the greatest economic consequences (property and crop damage combined) across the United States.