Analysis of the Impacts of Severe Weather Events on Public Health and the Economy

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
Data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database has been processed to highlight the most severe events. TORNADO is by far the most harmful type of event, and EXCESSIVE HEAT is the second most harmful. FLASH FLOOD, HEAT, and LIGHTNING also seem to cause some level of harm, while other event types seem to be much less harmful.
Property damage is spread across many types of events, but TORNADO alone seems to cause about 30% of it.

The dataset can be downloaded from the URL used in the download step below.

For more information about the dataset, please consult:

  • National Weather Service Storm Data Documentation
  • National Climatic Data Center Storm Events FAQ

Data Processing

Loading packages

library(dplyr)
library(ggplot2)

Getting and reading data

Downloading the data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database.

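# Download the compressed dataset only if it is not already present (binary mode keeps the archive intact on Windows)
if (!file.exists("FStormData.csv.bz2")) {
      download.file(url = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "FStormData.csv.bz2", mode = "wb")
}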

Reading bz2 file

# read.csv() reads the bz2-compressed file directly; skip if already loaded
if (!exists("rawData")) {
      rawData <- read.csv("FStormData.csv.bz2")
}

Overview of the raw data

str(rawData)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

Keeping only relevant variables for analysis

  • FATALITIES - Fatalities
  • INJURIES - Injuries
  • PROPDMG - Property damage
  • EVTYPE - Event type
data <- rawData %>% select("FATALITIES", "INJURIES", "PROPDMG", "EVTYPE")

Some exploratory analysis

Showing the distribution of the data

head(table(data$FATALITIES))
## 
##      0      1      2      3      4      5 
## 895323   5010    996    314    166    114
head(table(data$INJURIES))
## 
##      0      1      2      3      4      5 
## 884693   7756   3134   1552    931    709
head(table(data$PROPDMG))
## 
##      0   0.01   0.02   0.03   0.04   0.05 
## 663123    931     80   1610      4    588
head(sort(table(data$EVTYPE), decreasing = TRUE))
## 
##              HAIL         TSTM WIND THUNDERSTORM WIND           TORNADO 
##            288661            219940             82563             60652 
##       FLASH FLOOD             FLOOD 
##             54277             25326

More information on the variables

summary(data)
##    FATALITIES          INJURIES            PROPDMG       
##  Min.   :  0.0000   Min.   :   0.0000   Min.   :   0.00  
##  1st Qu.:  0.0000   1st Qu.:   0.0000   1st Qu.:   0.00  
##  Median :  0.0000   Median :   0.0000   Median :   0.00  
##  Mean   :  0.0168   Mean   :   0.1557   Mean   :  12.06  
##  3rd Qu.:  0.0000   3rd Qu.:   0.0000   3rd Qu.:   0.50  
##  Max.   :583.0000   Max.   :1700.0000   Max.   :5000.00  
##                                                          
##                EVTYPE      
##  HAIL             :288661  
##  TSTM WIND        :219940  
##  THUNDERSTORM WIND: 82563  
##  TORNADO          : 60652  
##  FLASH FLOOD      : 54277  
##  FLOOD            : 25326  
##  (Other)          :170878

We can already see that for fatalities, injuries, and property damage, the medians are all 0, meaning that a very large share of the events resulted in no fatalities, injuries, or property damage.
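
The zero counts in the tables above make this concrete. As a quick check, the share of events with a value of 0 can be computed from the printed counts and the 902297 observations reported by str(). The variable names below are illustrative only.

```r
# Total number of events, as reported by str(rawData)
total_events <- 902297

# Number of events with a value of 0, copied from the table() outputs above
zero_counts <- c(FATALITIES = 895323, INJURIES = 884693, PROPDMG = 663123)

# Percentage of events with no fatalities, no injuries, or no property damage
zero_share <- round(100 * zero_counts / total_events, 1)
print(zero_share)
```

About 99.2% of events have no fatalities, 98.0% have no injuries, and 73.5% have a recorded property damage of 0.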

HAIL and TSTM WIND (thunderstorm wind) are the two most frequently recorded event types.
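
The frequency table above counts TSTM WIND and THUNDERSTORM WIND as separate event types. The sketch below shows one way such label variants could be merged before grouping. It assumes that TSTM abbreviates THUNDERSTORM and that singular and plural WIND(S) labels describe the same kind of event. normalize_evtype is a hypothetical helper, it is not part of the analysis in this report, and all results shown here were computed on the raw labels.

```r
# Hypothetical helper: harmonize a few obvious EVTYPE label variants
normalize_evtype <- function(x) {
      x <- toupper(trimws(as.character(x)))   # drop padding, unify case
      x <- sub("^TSTM", "THUNDERSTORM", x)    # assumption: TSTM = THUNDERSTORM
      x <- sub("WINDS$", "WIND", x)           # assumption: plural = singular
      x
}

# Small illustrative sample of labels that occur in the dataset
labels <- c("TSTM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WINDS",
            "   HIGH SURF ADVISORY")
cleaned <- normalize_evtype(labels)
print(table(cleaned))
```

Applied to the full dataset, a step like this would change the per-type totals, so the rankings would need to be recomputed rather than read from the tables in this report.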

Processing for most harmful event types to the population health

Grouping the data by event type and summing fatalities and injuries for each type. An index of harm is then built as the sum of two normalized totals: each event type's fatalities divided by the largest fatality total, plus its injuries divided by the largest injury total. The results are ordered by this index and only the top 10 event types are kept for readability.

harmful <- data %>%
      group_by(EVTYPE) %>%
      summarize(sumFatalities = sum(FATALITIES, na.rm = TRUE),
                sumInjuries = sum(INJURIES, na.rm = TRUE))

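# Index = normalized fatalities + normalized injuries. The `+` must end the first
# line: a line starting with `+` is a separate statement, which leaves only the
# fatalities term in Index (the Index values printed in the Results match that case).
harmful$Index <- harmful$sumFatalities/max(harmful$sumFatalities) +
      harmful$sumInjuries/max(harmful$sumInjuries)

# Order by decreasing index and keep the top 10 event types
mostHarmful <- harmful[order(-harmful$Index),]
mostHarmful <- mostHarmful[1:10,]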
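
To make the index concrete, here is a minimal sketch on a made-up data frame. The event names and counts are invented for illustration, and base R's aggregate() is used instead of dplyr so the example runs without extra packages. It follows the definition given in the text: normalized fatalities plus normalized injuries.

```r
# Toy data: invented event types and counts, for illustration only
toy <- data.frame(
      EVTYPE     = c("STORM A", "STORM B", "STORM A", "STORM C"),
      FATALITIES = c(2, 3, 2, 0),
      INJURIES   = c(2, 40, 2, 5)
)

# Sum fatalities and injuries per event type
sums <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Index = fatalities / largest fatality total + injuries / largest injury total
sums$Index <- sums$FATALITIES / max(sums$FATALITIES) +
      sums$INJURIES / max(sums$INJURIES)

# Rank event types by decreasing index
ranked <- sums[order(-sums$Index), ]
print(ranked)
```

In this toy case STORM A has the most fatalities (4 against 3), but STORM B ranks first because its injury total is much larger (index 1.75 against 1.1). The index can therefore take values between 0 and 2.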

Processing for most damaging event types on the economy

Grouping the data by event type and summing the property damage. Each event type's share of the total is computed, the results are ordered, and only the top 10 event types are kept for readability.

damage <- data %>%
      group_by(EVTYPE) %>%
      summarize(sumPropdmg = sum(PROPDMG, na.rm = TRUE))

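# Order by decreasing damage, then compute each type's share of the total over
# all event types before truncating to the top 10
mostDamage <- damage[order(-damage$sumPropdmg),]
mostDamage$Percentage <- mostDamage$sumPropdmg/sum(mostDamage$sumPropdmg)
mostDamage <- mostDamage[1:10,]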
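
The order of the last two steps matters: the share is computed over all event types and the table is truncated afterwards, so the percentages of the kept rows do not have to add up to 1. A minimal sketch on invented data (hypothetical event names and values, base R only):

```r
# Toy data: invented event types and damage values, for illustration only
toy <- data.frame(
      EVTYPE  = c("STORM A", "STORM B", "STORM A", "STORM C", "STORM B"),
      PROPDMG = c(30, 5, 9, 1, 5)
)

# Total damage per event type, ordered by decreasing damage
damage_toy <- aggregate(PROPDMG ~ EVTYPE, data = toy, FUN = sum)
damage_toy <- damage_toy[order(-damage_toy$PROPDMG), ]

# Share of the overall total, computed before dropping any rows
damage_toy$Percentage <- damage_toy$PROPDMG / sum(damage_toy$PROPDMG)

# Keep only the top 2 event types
top2 <- damage_toy[1:2, ]
print(top2)
```

Here the two kept rows account for 0.78 and 0.20 of the total. The remaining 0.02 belongs to the dropped event type.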

Results

1. Most harmful types of event to the population health

The table and plot below show that

  • TORNADO is by far the most harmful type of event
  • EXCESSIVE HEAT is the second most harmful type of event
  • FLASH FLOOD, HEAT, and LIGHTNING seem to cause some level of harm
  • Other events seem to be much less harmful
  • TSTM WIND and FLOOD seem to be the second and third leading causes of injuries

Table of the most harmful event types

mostHarmful
## # A tibble: 10 x 4
##    EVTYPE         sumFatalities sumInjuries  Index
##    <fct>                  <dbl>       <dbl>  <dbl>
##  1 TORNADO                 5633       91346 1.00  
##  2 EXCESSIVE HEAT          1903        6525 0.338 
##  3 FLASH FLOOD              978        1777 0.174 
##  4 HEAT                     937        2100 0.166 
##  5 LIGHTNING                816        5230 0.145 
##  6 TSTM WIND                504        6957 0.0895
##  7 FLOOD                    470        6789 0.0834
##  8 RIP CURRENT              368         232 0.0653
##  9 HIGH WIND                248        1137 0.0440
## 10 AVALANCHE                224         170 0.0398

Barplot of the most harmful event types

### Making a clean dataset first
mostHarmfulBarPlot <- data.frame(EVTYPE = mostHarmful$EVTYPE, variable = "Fatalities", value = mostHarmful$sumFatalities, Index = mostHarmful$Index)
mostHarmfulBarPlot <- rbind(mostHarmfulBarPlot,
      data.frame(EVTYPE = mostHarmful$EVTYPE, variable = "Injuries", value = mostHarmful$sumInjuries, Index = mostHarmful$Index))

mostHarmfulBarPlot<- mostHarmfulBarPlot[order(-mostHarmfulBarPlot$Index),]

### Making the plot
ggplot(data = mostHarmfulBarPlot, aes(x=EVTYPE, y=value)) +
      geom_bar(stat="identity", position=position_dodge()) +
      coord_flip() +
      facet_grid(. ~ variable, scales = "free") +
      ggtitle("Event types causing the most fatalities and injuries") +
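      # each event type appears twice (fatalities and injuries), so deduplicate the axis order
      scale_x_discrete(limits=rev(unique(as.character(mostHarmfulBarPlot$EVTYPE)))) +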
      xlab("Type of event") +
      ylab("Number of fatalities and injuries")

2. Most damaging types of event on properties

The table and plot below show that

  • TORNADO seems to cause about 30% of property damage
  • Property damage seems to be spread across many types of events

Table of the most damaging event types on properties

mostDamage
## # A tibble: 10 x 3
##    EVTYPE             sumPropdmg Percentage
##    <fct>                   <dbl>      <dbl>
##  1 TORNADO               3212258     0.295 
##  2 FLASH FLOOD           1420125     0.130 
##  3 TSTM WIND             1335966     0.123 
##  4 FLOOD                  899938     0.0827
##  5 THUNDERSTORM WIND      876844     0.0806
##  6 HAIL                   688693     0.0633
##  7 LIGHTNING              603352     0.0554
##  8 THUNDERSTORM WINDS     446293     0.0410
##  9 HIGH WIND              324732     0.0298
## 10 WINTER STORM           132721     0.0122

Barplot of the most damaging event types on properties

### Making the plot
options(scipen = 10)
ggplot(data = mostDamage, aes(x=EVTYPE, y=sumPropdmg)) + 
      geom_bar(stat="identity", position=position_dodge()) +
      scale_x_discrete(limits=rev(mostDamage$EVTYPE)) +
      coord_flip() +
      ggtitle("Event types causing the most damage on properties") +
      xlab("Type of event") +
      ylab("Damage to properties")