The Economic and Population Health Effects of Storm Events in the United States

Synopsis

This project analyzes the effects of storm events in the United States and addresses two questions. First, across the United States, which types of storm events are most harmful with respect to population health? Second, across the United States, which types of events have the greatest economic consequences? The raw data come from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The events in the database start in 1950 and end in November 2011.

Raw Data

The raw data used to perform the following analysis can be found here: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2

Documentation on the raw data used to perform the following analysis can be found here: https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf

DATA PROCESSING

Loading the raw data file

sd <- read.csv("StormData.csv.bz2")
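The read.csv call above expects StormData.csv.bz2 to be in the working directory already. A minimal sketch of fetching it first, assuming the URL in the Raw Data section is still reachable; the helper name fetch_if_missing is hypothetical and not part of the original analysis:

```r
# Hypothetical helper: download a file only when no local copy exists.
fetch_if_missing <- function(url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" keeps the compressed file intact on Windows
    download.file(url, destfile = dest, mode = "wb")
  }
  dest
}

# Usage (not run here, needs network access):
# storm_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# sd <- read.csv(fetch_if_missing(storm_url, "StormData.csv.bz2"))
```

read.csv can read the bz2-compressed file directly, so no separate decompression step is needed.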

Loading the libraries used to process the data

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Pre-processing the data by taking only the relevant columns of the data frame

sda <- select(sd, EVTYPE, FATALITIES, 
                          INJURIES, 
                          PROPDMG,
                          CROPDMG)

Analysis Question 1 Data Processing:

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Take only the columns relevant to this question

sdpophealth <- select(sda, EVTYPE, FATALITIES, INJURIES)

Take averages and sums of deaths and injuries by event type

pophealthsum <- summarise(group_by(sdpophealth, EVTYPE), Total_Deaths = sum(FATALITIES), 
          Total_Injuries = sum(INJURIES), Average_Deaths = mean(FATALITIES), 
          Average_Injuries = mean(INJURIES))
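An event type that appears only once in the data has an average equal to that single record, so it helps to read averages next to the number of events behind them. The following is a sketch on made-up toy data, not the storm data itself; the Events column and the object names toy and toy_summary are additions for illustration:

```r
library(dplyr)

# Toy stand-in for sdpophealth; the values are made up for illustration.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "TORNADO", "RARE EVENT"),
  FATALITIES = c(2, 0, 1, 8),
  INJURIES   = c(10, 0, 5, 43)
)

# Same grouping as above, with a count of records per event type added.
toy_summary <- toy %>%
  group_by(EVTYPE) %>%
  summarise(
    Events         = n(),
    Total_Deaths   = sum(FATALITIES),
    Average_Deaths = mean(FATALITIES)
  )

print(toy_summary)
```

In this toy example the single "RARE EVENT" record has a higher average than "TORNADO", even though its total is driven by one observation.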

Keep only the top averages and totals of deaths and injuries, using relative thresholds to define “top”

Thresholds were determined by raising the threshold value and rerunning the plotting code until the plots were uncluttered and easy to read

phsd <- filter(pophealthsum, Total_Deaths > 500)
phsad <- filter(pophealthsum, Average_Deaths > 5)
phsi <- filter(pophealthsum, Total_Injuries > 1500)
phsai <- filter(pophealthsum, Average_Injuries > 30)

These are the values that made the threshold:

##Total Deaths
phsd
## # A tibble: 6 × 5
##           EVTYPE Total_Deaths Total_Injuries Average_Deaths
##           <fctr>        <dbl>          <dbl>          <dbl>
## 1 EXCESSIVE HEAT         1903           6525    1.134088200
## 2    FLASH FLOOD          978           1777    0.018018682
## 3           HEAT          937           2100    1.221642764
## 4      LIGHTNING          816           5230    0.051796369
## 5        TORNADO         5633          91346    0.092874101
## 6      TSTM WIND          504           6957    0.002291534
## # ... with 1 more variables: Average_Injuries <dbl>
##Average Deaths
phsad
## # A tibble: 4 × 5
##                       EVTYPE Total_Deaths Total_Injuries Average_Deaths
##                       <fctr>        <dbl>          <dbl>          <dbl>
## 1              COLD AND SNOW           14              0      14.000000
## 2      RECORD/EXCESSIVE HEAT           17              0       5.666667
## 3 TORNADOES, TSTM WIND, HAIL           25              0      25.000000
## 4      TROPICAL STORM GORDON            8             43       8.000000
## # ... with 1 more variables: Average_Injuries <dbl>
##Total Injuries
phsi
## # A tibble: 8 × 5
##           EVTYPE Total_Deaths Total_Injuries Average_Deaths
##           <fctr>        <dbl>          <dbl>          <dbl>
## 1 EXCESSIVE HEAT         1903           6525    1.134088200
## 2    FLASH FLOOD          978           1777    0.018018682
## 3          FLOOD          470           6789    0.018558004
## 4           HEAT          937           2100    1.221642764
## 5      ICE STORM           89           1975    0.044366899
## 6      LIGHTNING          816           5230    0.051796369
## 7        TORNADO         5633          91346    0.092874101
## 8      TSTM WIND          504           6957    0.002291534
## # ... with 1 more variables: Average_Injuries <dbl>
##Average Injuries
phsai
## # A tibble: 3 × 5
##                  EVTYPE Total_Deaths Total_Injuries Average_Deaths
##                  <fctr>        <dbl>          <dbl>          <dbl>
## 1             Heat Wave            0             70           0.00
## 2 TROPICAL STORM GORDON            8             43           8.00
## 3            WILD FIRES            3            150           0.75
## # ... with 1 more variables: Average_Injuries <dbl>
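
The thresholds above were tuned by hand. One possible alternative, sketched here on made-up toy data, is to sort a summary and keep a fixed number of rows; the object names toy_sum and top3 are hypothetical:

```r
library(dplyr)

# Toy stand-in for pophealthsum; the values are made up for illustration.
toy_sum <- data.frame(
  EVTYPE       = c("A", "B", "C", "D"),
  Total_Deaths = c(10, 500, 40, 120)
)

# Keep the 3 event types with the largest Total_Deaths, largest first.
top3 <- head(arrange(toy_sum, desc(Total_Deaths)), 3)
print(top3)
```

This trades a tunable cutoff for a fixed number of bars per plot, which may or may not suit the data.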

Analysis Question 2 Data Processing:

Across the United States, which types of events have the greatest economic consequences?

Take only the columns relevant to this question

sdecon <- select(sda, EVTYPE, PROPDMG,
                           CROPDMG)

Take averages and sums of property damage and crop damage by event type

econsum <- summarise(group_by(sdecon, EVTYPE), Total_Property_Damage = sum(PROPDMG), 
          Total_Crop_Damage = sum(CROPDMG), Average_Property_Damage = mean(PROPDMG), 
          Average_Crop_Damage = mean(CROPDMG))
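
The sums above add PROPDMG and CROPDMG as recorded. The storm data documentation describes alphabetical magnitude characters ("K" for thousands, "M" for millions, "B" for billions) that accompany damage amounts, so a dollar-denominated total would need them applied first. The following is a hedged sketch on made-up toy rows, assuming the raw data frame carries a PROPDMGEXP column holding those codes; the helper name exp_multiplier is hypothetical, and this step is not part of the analysis reported here:

```r
# Hypothetical helper: map a magnitude code to a numeric multiplier.
# Only K, M and B (either case) are handled; anything else is treated as 1,
# which is an assumption of this sketch, not a rule from the documentation.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Toy rows; the values are made up for illustration.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 10),
  PROPDMGEXP = c("K", "M", ""),
  stringsAsFactors = FALSE
)

# Damage expressed in dollars rather than in mixed units.
toy$Property_Dollars <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
print(toy)
```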

Keep only the top averages and totals of property and crop damage, using relative thresholds to define “top”

Thresholds were determined by raising the threshold value and rerunning the plotting code until the plots were uncluttered and easy to read

tpd <- filter(econsum, Total_Property_Damage > 750000)
apd <- filter(econsum, Average_Property_Damage > 500)
tcd <- filter(econsum, Total_Crop_Damage > 125000)
acd <- filter(econsum, Average_Crop_Damage > 250)

These are the values that made the threshold:

##Total Property Damage
tpd
## # A tibble: 5 × 5
##              EVTYPE Total_Property_Damage Total_Crop_Damage
##              <fctr>                 <dbl>             <dbl>
## 1       FLASH FLOOD             1420124.6         179200.46
## 2             FLOOD              899938.5         168037.88
## 3 THUNDERSTORM WIND              876844.2          66791.45
## 4           TORNADO             3212258.2         100018.52
## 5         TSTM WIND             1335965.6         109202.60
## # ... with 2 more variables: Average_Property_Damage <dbl>,
## #   Average_Crop_Damage <dbl>
##Average Property Damage
apd
## # A tibble: 4 × 5
##                   EVTYPE Total_Property_Damage Total_Crop_Damage
##                   <fctr>                 <dbl>             <dbl>
## 1        COASTAL EROSION                   766                 0
## 2   HEAVY RAIN AND FLOOD                   600                 0
## 3              Landslump                   570                 0
## 4 RIVER AND STREAM FLOOD                  1200                 0
## # ... with 2 more variables: Average_Property_Damage <dbl>,
## #   Average_Crop_Damage <dbl>
##Total Crop Damage
tcd
## # A tibble: 3 × 5
##        EVTYPE Total_Property_Damage Total_Crop_Damage
##        <fctr>                 <dbl>             <dbl>
## 1 FLASH FLOOD             1420124.6          179200.5
## 2       FLOOD              899938.5          168037.9
## 3        HAIL              688693.4          579596.3
## # ... with 2 more variables: Average_Property_Damage <dbl>,
## #   Average_Crop_Damage <dbl>
##Average Crop Damage
acd
## # A tibble: 4 × 5
##                  EVTYPE Total_Property_Damage Total_Crop_Damage
##                  <fctr>                 <dbl>             <dbl>
## 1 DUST STORM/HIGH WINDS                    50               500
## 2          FOREST FIRES                     5               500
## 3       HIGH WINDS/COLD                   610              2005
## 4 TROPICAL STORM GORDON                   500               500
## # ... with 2 more variables: Average_Property_Damage <dbl>,
## #   Average_Crop_Damage <dbl>

RESULTS

Analysis Question 1 Results:

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Plot of the top averages and sums of deaths and injuries

# 2 x 2 grid of bar plots: average and total injuries, average and total deaths
par(mfrow = c(2,2), mar = c(5,5,5,5), pch = 15, cex = 0.5, cex.axis = 0.8)
barplot(phsai$Average_Injuries, xlab = "Event", 
     ylab = "Average Injuries", main = "Top Average Injuries",
     names.arg = phsai$EVTYPE)
barplot(phsi$Total_Injuries, xlab = "Event", 
     ylab = "Total Injuries", main = "Top Total Injuries",
     names.arg = phsi$EVTYPE)
barplot(phsad$Average_Deaths, xlab = "Event", 
     ylab = "Average Deaths", main = "Top Average Deaths",
     names.arg = phsad$EVTYPE)
barplot(phsd$Total_Deaths, xlab = "Event", 
     ylab = "Total Deaths", main = "Top Total Deaths",
     names.arg = phsd$EVTYPE)

FIGURE 1

Measured by total deaths and total injuries, this plot shows that EVTYPE = TORNADO is the most harmful with respect to population health.

Analysis Question 2 Results:

Across the United States, which types of events have the greatest economic consequences?

Plot of the top averages and sums of property damage and crop damage

# 2 x 2 grid of bar plots: average and total crop damage, average and total property damage
par(mfrow = c(2,2), mar = c(5,5,5,5), pch = 15, cex = 0.5, cex.axis = 0.8)
barplot(acd$Average_Crop_Damage, xlab = "Event", 
     ylab = "Average Crop Damage", main = "Top Average Crop Damage",
     names.arg = acd$EVTYPE)
barplot(tcd$Total_Crop_Damage, xlab = "Event", 
     ylab = "Total Crop Damage", main = "Top Total Crop Damage",
     names.arg = tcd$EVTYPE)
barplot(apd$Average_Property_Damage, xlab = "Event", 
     ylab = "Average Property Damage", main = "Top Average Property Damage",
     names.arg = apd$EVTYPE)
barplot(tpd$Total_Property_Damage, xlab = "Event", 
     ylab = "Total Property Damage", main = "Top Total Property Damage",
     names.arg = tpd$EVTYPE)

FIGURE 2

This plot shows that EVTYPE = TORNADO and EVTYPE = HAIL have the greatest economic consequences: TORNADO has the greatest total property damage and HAIL has the greatest total crop damage.

CONCLUSIONS

Table of results:

Category of Devastation            Event Type
Greatest Total Deaths              TORNADO
Greatest Average Deaths            TORNADOES, TSTM WIND, HAIL
Greatest Total Injuries            TORNADO
Greatest Average Injuries          Heat Wave
Greatest Total Property Damage     TORNADO
Greatest Average Property Damage   COASTAL EROSION
Greatest Total Crop Damage         HAIL
Greatest Average Crop Damage       DUST STORM/HIGH WINDS + FOREST FIRES + TROPICAL STORM GORDON
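
A table like the one above can also be assembled in code instead of being read off the plots. The following is a minimal sketch on made-up toy data; the object names toy_sum, measures, and leaders are hypothetical, and which.max reports only the first row holding the maximum, so ties such as the one in the last row of the table would need separate handling:

```r
# Toy stand-in for a per-event summary; the values are made up for illustration.
toy_sum <- data.frame(
  EVTYPE         = c("A", "B", "C"),
  Total_Deaths   = c(10, 500, 40),
  Total_Injuries = c(900, 300, 50),
  stringsAsFactors = FALSE
)

measures <- c("Total_Deaths", "Total_Injuries")

# For each measure, look up the event type in the row with the largest value.
leaders <- data.frame(
  Category   = measures,
  Event_Type = vapply(
    measures,
    function(m) toy_sum$EVTYPE[which.max(toy_sum[[m]])],
    character(1),
    USE.NAMES = FALSE
  ),
  stringsAsFactors = FALSE
)
print(leaders)
```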

IN CONCLUSION

Tornadoes are the most harmful storm event to population health, based on the fact that they cause the greatest total number of fatalities. They also have the greatest economic consequences, based on the fact that they cause the greatest amount of total property damage. According to this analysis, tornadoes are the most catastrophic storm event in terms of population health and economic consequences.