Synopsis

Severe weather events can cause significant damage. In this study, we analyzed storm data from the U.S. National Oceanic and Atmospheric Administration (NOAA) to identify which weather events are most harmful to population health and which cause the greatest economic damage. Our findings indicate that tornadoes are the most harmful to population health, measured as combined fatalities and injuries. Floods caused the greatest economic damage, measured as combined property and crop damage.

Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The file is available from the course web site; the download URL appears in the code in the Data Processing section.

Some documentation of the database is also available, describing how some of the variables are constructed and defined.

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
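One way to check how coverage changes over time is to count records per year from the BGN_DATE column. The sketch below is only illustrative: it uses the four BGN_DATE values visible in the `str(storm)` output rather than the full file, and the variable names `bgn_date`, `year`, and `events_per_year` are hypothetical.

```r
# Four BGN_DATE strings as they appear in the str(storm) output.
bgn_date <- c("4/18/1950 0:00:00", "4/18/1950 0:00:00",
              "2/20/1951 0:00:00", "6/8/1951 0:00:00")

# Parse the month/day/year part; the trailing time text is ignored.
year <- as.integer(format(as.Date(bgn_date, format = "%m/%d/%Y"), "%Y"))

# Count events per year.
events_per_year <- table(year)
events_per_year
```

Applied to the full `storm$BGN_DATE` column, the same steps would give the yearly record counts.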

Data Processing

Load packages

library(R.utils)
library(tidyverse)

Unzip and read the file

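url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
bz2_file <- "repdata_data_StormData.csv.bz2"
# Download only once; mode = "wb" keeps the binary archive intact on Windows
if (!file.exists(bz2_file)) download.file(url, destfile = bz2_file, mode = "wb")
# Keep the archive and skip decompression if the CSV already exists, so the script can be re-run
bunzip2(bz2_file, remove = FALSE, skip = TRUE)
storm <- read.csv("repdata_data_StormData.csv")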
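As an aside, base R's read.csv() can read a bzip2-compressed CSV directly from its path, so the explicit decompression step is optional. A minimal sketch, using a small made-up data frame written to a temporary file instead of the real NOAA download (the `toy` rows and the temporary path are assumptions for illustration):

```r
# Hypothetical stand-in for the storm data: three made-up rows.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(0, 1, 0)
)

# Write the toy data as a bzip2-compressed CSV in a temporary location.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() detects the compression when given the file path.
toy_back <- read.csv(tmp)
toy_back
```

Reading the compressed file directly avoids keeping a second, much larger uncompressed copy on disk, at the cost of decompressing on every read.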

Inspect data

str(storm)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

Subset data

To explore the population health and economic impacts, we only need the “EVTYPE” column together with “FATALITIES”, “INJURIES”, “PROPDMG”, “PROPDMGEXP”, “CROPDMG”, and “CROPDMGEXP”.

storm2 <- storm %>%
    select("EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")
summary(storm2)
##     EVTYPE            FATALITIES          INJURIES            PROPDMG       
##  Length:902297      Min.   :  0.0000   Min.   :   0.0000   Min.   :   0.00  
##  Class :character   1st Qu.:  0.0000   1st Qu.:   0.0000   1st Qu.:   0.00  
##  Mode  :character   Median :  0.0000   Median :   0.0000   Median :   0.00  
##                     Mean   :  0.0168   Mean   :   0.1557   Mean   :  12.06  
##                     3rd Qu.:  0.0000   3rd Qu.:   0.0000   3rd Qu.:   0.50  
##                     Max.   :583.0000   Max.   :1700.0000   Max.   :5000.00  
##   PROPDMGEXP           CROPDMG         CROPDMGEXP       
##  Length:902297      Min.   :  0.000   Length:902297     
##  Class :character   1st Qu.:  0.000   Class :character  
##  Mode  :character   Median :  0.000   Mode  :character  
##                     Mean   :  1.527                     
##                     3rd Qu.:  0.000                     
##                     Max.   :990.000

Calculate population health

  1. To get a total population health value for each record, “FATALITIES” and “INJURIES” are added.
storm2 <- storm2 %>%
    mutate(TOTAL_HEALTH = FATALITIES + INJURIES)
  2. Get the top 10 weather events most harmful to population health.
health10 <- storm2 %>%
      select(EVTYPE, FATALITIES, INJURIES, TOTAL_HEALTH) %>%
      group_by(EVTYPE) %>%
      summarize(
        TOTAL_HEALTH = sum(TOTAL_HEALTH, na.rm = TRUE),
        FATALITIES = sum(FATALITIES, na.rm = TRUE),
        INJURIES = sum(INJURIES, na.rm = TRUE)
        ) %>%     
      arrange(desc(TOTAL_HEALTH)) %>%
      slice(1:10)
health10
## # A tibble: 10 × 4
##    EVTYPE            TOTAL_HEALTH FATALITIES INJURIES
##    <chr>                    <dbl>      <dbl>    <dbl>
##  1 TORNADO                  96979       5633    91346
##  2 EXCESSIVE HEAT            8428       1903     6525
##  3 TSTM WIND                 7461        504     6957
##  4 FLOOD                     7259        470     6789
##  5 LIGHTNING                 6046        816     5230
##  6 HEAT                      3037        937     2100
##  7 FLASH FLOOD               2755        978     1777
##  8 ICE STORM                 2064         89     1975
##  9 THUNDERSTORM WIND         1621        133     1488
## 10 WINTER STORM              1527        206     1321
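The group, sum, and rank pattern used for this table can be checked by hand on a few rows. The sketch below uses base R's aggregate() instead of the dplyr pipeline, on a made-up data frame; the `toy` rows are invented for illustration and are not taken from the NOAA file.

```r
# Hypothetical mini data set: five invented events.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT", "FLOOD"),
  FATALITIES = c(1, 0, 2, 3, 0),
  INJURIES   = c(14, 6, 1, 0, 4)
)

# Per-record health impact, as in the mutate() step.
toy$TOTAL_HEALTH <- toy$FATALITIES + toy$INJURIES

# Sum each measure within an event type.
by_event <- aggregate(cbind(TOTAL_HEALTH, FATALITIES, INJURIES) ~ EVTYPE,
                      data = toy, FUN = sum)

# Rank by total impact and keep the top two event types.
ranked <- by_event[order(-by_event$TOTAL_HEALTH), ]
top2 <- head(ranked, 2)
top2
```

Here TORNADO totals 21 (15 + 6) and FLOOD totals 7 (3 + 4), so they rank first and second, ahead of HEAT with 3.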

Calculate total economic damage

  1. Inspect the values in both property and crop damage exponent columns.
unique(storm2$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
unique(storm2$CROPDMGEXP)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"
  2. Create the function “convert_exp()” to convert the exponent characters into their corresponding numerical multipliers.
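convert_exp <- function(exp){
      # case_when() evaluates every right-hand side on the whole vector, so
      # as.numeric() would warn on letters such as "K"; convert once, quietly.
      digit <- suppressWarnings(as.numeric(exp))
      case_when(
        exp %in% c("+", "-", "", " ", "?") ~ 10^0,   # no usable exponent
        exp %in% as.character(0:8) ~ 10^digit,       # digit n is treated as 10^n
        exp %in% c("K","k") ~ 10^3,                  # thousands
        exp %in% c("M","m") ~ 10^6,                  # millions
        exp %in% c("B") ~ 10^9,                      # billions
        exp %in% c("H","h") ~ 10^2,                  # hundreds
        TRUE ~ 10^0                                  # anything else
      )
}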
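To see what this mapping produces, it helps to run it on a few exponent codes. The sketch below is a base R re-implementation of the same rules so that it runs without the tidyverse; `exp_multiplier()` is a hypothetical stand-in for illustration, not the function used in the analysis.

```r
# Base R stand-in for convert_exp(): same rules, no dplyr needed.
exp_multiplier <- function(exp) {
  mult <- rep(1, length(exp))               # default for "", "+", "-", "?" and unknowns
  is_digit <- exp %in% as.character(0:8)
  mult[is_digit] <- 10^as.numeric(exp[is_digit])  # digit n is treated as 10^n
  mult[exp %in% c("H", "h")] <- 1e2         # hundreds
  mult[exp %in% c("K", "k")] <- 1e3         # thousands
  mult[exp %in% c("M", "m")] <- 1e6         # millions
  mult[exp %in% c("B")] <- 1e9              # billions
  mult
}

# Multipliers for a handful of codes seen in PROPDMGEXP.
codes <- c("K", "m", "B", "5", "", "?")
multipliers <- exp_multiplier(codes)

# Worked example: PROPDMG = 25 with PROPDMGEXP = "K" is 25,000.
damage <- 25 * exp_multiplier("K")
```

The last line reproduces the first row of the data, where a PROPDMG of 25 with exponent "K" becomes a damage value of 25000.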
  3. Make new columns containing the complete property and crop damage values by multiplying the PROPDMG and CROPDMG columns by the multipliers derived from their corresponding exponent columns.
storm2 <- storm2 %>%
      mutate(cPROPDMG = PROPDMG * convert_exp(PROPDMGEXP),
             cCROPDMG = CROPDMG * convert_exp(CROPDMGEXP)) %>%
      select(-PROPDMG, -PROPDMGEXP, -CROPDMG, -CROPDMGEXP)
  4. Calculate total economic damage as the sum of cPROPDMG and cCROPDMG.
storm2 <- storm2 %>%
      mutate(TOTAL_ECONOMIC_IMPACT = cPROPDMG + cCROPDMG)
head(storm2)
##    EVTYPE FATALITIES INJURIES TOTAL_HEALTH cPROPDMG cCROPDMG
## 1 TORNADO          0       15           15    25000        0
## 2 TORNADO          0        0            0     2500        0
## 3 TORNADO          0        2            2    25000        0
## 4 TORNADO          0        2            2     2500        0
## 5 TORNADO          0        2            2     2500        0
## 6 TORNADO          0        6            6     2500        0
##   TOTAL_ECONOMIC_IMPACT
## 1                 25000
## 2                  2500
## 3                 25000
## 4                  2500
## 5                  2500
## 6                  2500
  5. Get the top 10 weather events with the greatest economic impact.
economic10 <- storm2 %>%
      select(EVTYPE, cPROPDMG, cCROPDMG, TOTAL_ECONOMIC_IMPACT) %>%
      group_by(EVTYPE) %>%
      summarize(
        TOTAL_ECONOMIC_IMPACT = sum(TOTAL_ECONOMIC_IMPACT, na.rm=TRUE),
        cPROPDMG = sum(cPROPDMG, na.rm=TRUE),
        cCROPDMG = sum(cCROPDMG, na.rm=TRUE)
        ) %>%
      arrange(desc(TOTAL_ECONOMIC_IMPACT)) %>%
      slice(1:10)
economic10
## # A tibble: 10 × 4
##    EVTYPE            TOTAL_ECONOMIC_IMPACT      cPROPDMG    cCROPDMG
##    <chr>                             <dbl>         <dbl>       <dbl>
##  1 FLOOD                     150319678257  144657709807   5661968450
##  2 HURRICANE/TYPHOON          71913712800   69305840000   2607872800
##  3 TORNADO                    57362333946.  56947380676.   414953270
##  4 STORM SURGE                43323541000   43323536000         5000
##  5 HAIL                       18761221986.  15735267513.  3025954473
##  6 FLASH FLOOD                18243991078.  16822673978.  1421317100
##  7 DROUGHT                    15018672000    1046106000  13972566000
##  8 HURRICANE                  14610229010   11868319010   2741910000
##  9 RIVER FLOOD                10148404500    5118945500   5029459000
## 10 ICE STORM                   8967041360    3944927860   5022113500

Results

1. Tornadoes are the most harmful weather events for population health across the U.S.

health10_tidy <- health10 %>%
      pivot_longer(cols = c(FATALITIES, INJURIES),
               names_to = "variable",
               values_to = "value")

ggplot(health10_tidy, aes(x=reorder(EVTYPE, -value), y=value, fill=variable)) +
      geom_bar(stat="identity") +
      labs(title = "TOP 10 Most Harmful Weather Events on Health",
           x = "Event Type", y= "Health Impact") +
      theme_classic() +
      guides(fill = guide_legend(title = NULL)) +
      theme(axis.text.x = element_text(angle = 45, hjust = 1))

Fig. 1. Top 10 most harmful weather events on health. The stacked bar plot shows the 10 weather events with the largest combined number of fatalities and injuries. Pink indicates fatalities; teal indicates injuries.

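According to Fig. 1, the most harmful weather events, in decreasing order of combined fatalities and injuries, are tornadoes, excessive heat, thunderstorm winds (recorded as “TSTM WIND”), floods, lightning, heat, flash floods, ice storms, thunderstorm winds (recorded as “THUNDERSTORM WIND”), and winter storms. Thunderstorm winds appear twice because the database records them under two EVTYPE labels, which this analysis does not merge. Tornadoes are by far the most dangerous, accounting for 5,633 fatalities and 91,346 injuries.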
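The top 10 table lists both “TSTM WIND” and “THUNDERSTORM WIND”. A possible refinement, not performed in this analysis, is to map such variants to one label before aggregating. The sketch below uses only three rows copied from the health10 table; the column name `EVTYPE_CLEAN` is hypothetical, and the full database may contain further spelling variants that this does not handle.

```r
# Three rows copied from the health10 table.
health <- data.frame(
  EVTYPE       = c("EXCESSIVE HEAT", "TSTM WIND", "THUNDERSTORM WIND"),
  TOTAL_HEALTH = c(8428, 7461, 1621)
)

# Map the abbreviated label onto the spelled-out one.
health$EVTYPE_CLEAN <- sub("^TSTM WIND$", "THUNDERSTORM WIND", health$EVTYPE)

# Re-aggregate on the cleaned label and rank.
merged <- aggregate(TOTAL_HEALTH ~ EVTYPE_CLEAN, data = health, FUN = sum)
merged <- merged[order(-merged$TOTAL_HEALTH), ]
merged
```

With these three rows, the merged thunderstorm wind total is 9082 (7461 + 1621), which would place it above excessive heat at 8428.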

2. Floods have the greatest economic consequences across the U.S.

economic10_tidy <- economic10 %>%
      pivot_longer(cols = c(cPROPDMG, cCROPDMG),
               names_to = "variable",
               values_to = "value")

ggplot(economic10_tidy, aes(x=reorder(EVTYPE, -value), y=value/10^9, fill=variable)) +
      geom_bar(stat="identity") +
      labs(title = "Top 10 weather events with the greatest economic impacts",
           x = "Event Type", y= "Economic Impact (in billion)") +
      theme_classic() +
      guides(fill = guide_legend(title = NULL)) +
      theme(axis.text.x = element_text(angle = 45, hjust = 1))

Fig. 2. Top 10 weather events with the greatest economic impact. The stacked bar plot shows the 10 weather events with the largest combined cost of property and crop damage. Pink indicates the cost of crop damage; teal indicates the cost of property damage.

Based on Fig. 2, the events that caused the most combined property and crop damage, in decreasing order, are floods, hurricanes/typhoons, tornadoes, storm surges, hail, flash floods, droughts, hurricanes, river floods, and ice storms. Floods have the greatest economic impact, at roughly 150 billion dollars in total damage. Most weather events damage mainly property, but drought losses fall almost entirely on crops (about 14.0 of 15.0 billion dollars). River floods and ice storms also cause substantial crop damage, about 5.0 billion dollars each.
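The contrast between property-dominated and crop-dominated events can be made explicit by computing the crop share of total damage. The sketch below uses four rows copied from the economic10 table; the column name `crop_share` is hypothetical and introduced only for this illustration.

```r
# Four rows copied from the economic10 table (values in dollars).
econ <- data.frame(
  EVTYPE   = c("FLOOD", "DROUGHT", "RIVER FLOOD", "ICE STORM"),
  cPROPDMG = c(144657709807, 1046106000, 5118945500, 3944927860),
  cCROPDMG = c(5661968450, 13972566000, 5029459000, 5022113500)
)

# Fraction of each event type's total damage that is crop damage.
econ$crop_share <- econ$cCROPDMG / (econ$cPROPDMG + econ$cCROPDMG)

# Sort from most to least crop-dominated.
econ <- econ[order(-econ$crop_share), c("EVTYPE", "crop_share")]
econ
```

For these rows, crops make up more than 90% of drought damage and roughly half of river flood and ice storm damage, but under 5% of flood damage.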