Final Course Project: Analyzing Health and Economic Consequences of Storms

Synopsis

This document analyzes damage caused by different storm types in the United States from 1950 to 2011. It first analyzes the total damage to population health, defined as injuries plus fatalities, caused by each storm type, and then does the same for economic damage, defined as property damage plus crop damage.

Data Processing

library(dplyr)
library(ggplot2)
library(tidyr)
stormData <- read.csv("/Users/samistvan/Downloads/repdata_data_StormData.csv")

Now that the data is loaded, let’s take a look at what information it gives us.

head(stormData)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6

Each observation in this dataset records a single storm event, including its location, time, injuries, fatalities, property damage, and crop damage.
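Since only a handful of these columns matter for the analysis, it can help to narrow the data and check which years it covers. The snippet below is a minimal sketch on a few rows copied from the head() output above; the names sample_rows, slim, and year_range are illustrative and not part of the original analysis, and the date format is assumed from the BGN_DATE values shown.

```r
library(dplyr)

# Toy rows copied from the head() output above (rows 1, 3 and 5).
# In the real analysis the input would be stormData.
sample_rows <- data.frame(
  BGN_DATE   = c("4/18/1950 0:00:00", "2/20/1951 0:00:00", "11/15/1951 0:00:00"),
  EVTYPE     = c("TORNADO", "TORNADO", "TORNADO"),
  FATALITIES = c(0, 0, 0),
  INJURIES   = c(15, 2, 2),
  PROPDMG    = c(25.0, 25.0, 2.5),
  PROPDMGEXP = c("K", "K", "K"),
  CROPDMG    = c(0, 0, 0),
  CROPDMGEXP = c("", "", ""),
  stringsAsFactors = FALSE
)

# Extract the year from BGN_DATE (assumed month/day/year; the trailing time
# is ignored by the parser) and keep only the columns used later on.
slim <- sample_rows %>%
  mutate(year = as.integer(format(as.Date(BGN_DATE, format = "%m/%d/%Y"), "%Y"))) %>%
  select(year, EVTYPE, FATALITIES, INJURIES,
         PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

# First and last year present in the sample
year_range <- range(slim$year)
year_range
```

Running the same steps on the full stormData would show the actual period covered by the file.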

Results

To analyze the consequences of a storm on population health, we will combine fatalities and injuries to give us a sense of the number of individuals adversely impacted by the storm.

stormData <- stormData %>% mutate(pop_casualties = INJURIES + FATALITIES)

Next, we’ll group the data by event type, total the casualties, injuries, and fatalities that each type of storm has caused, and sort the result from most to fewest casualties.

casualties <- stormData %>%
  group_by(EVTYPE) %>%
  summarize(pop_casualties = sum(pop_casualties),
            FATALITIES = sum(FATALITIES),
            INJURIES = sum(INJURIES)) %>%
  select(EVTYPE, pop_casualties, INJURIES, FATALITIES) %>%
  arrange(desc(pop_casualties))

top10 <- casualties[c(1:10),]

top10
## # A tibble: 10 × 4
##    EVTYPE            pop_casualties INJURIES FATALITIES
##    <chr>                      <dbl>    <dbl>      <dbl>
##  1 TORNADO                    96979    91346       5633
##  2 EXCESSIVE HEAT              8428     6525       1903
##  3 TSTM WIND                   7461     6957        504
##  4 FLOOD                       7259     6789        470
##  5 LIGHTNING                   6046     5230        816
##  6 HEAT                        3037     2100        937
##  7 FLASH FLOOD                 2755     1777        978
##  8 ICE STORM                   2064     1975         89
##  9 THUNDERSTORM WIND           1621     1488        133
## 10 WINTER STORM                1527     1321        206

This data shows us that tornadoes are by far the most damaging type of storm to population health, causing far more casualties, both injuries and fatalities, than any other type of storm. After tornadoes, excessive heat is responsible for the second most casualties. The graph below, showing the 10 storm types with the most casualties, helps visualize this.
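One thing the table makes visible is that TSTM WIND and THUNDERSTORM WIND appear as separate rows even though they look like two spellings of the same event type. The snippet below is a rough sketch of how such variants could be collapsed before summarizing. The helper normalize_evtype and the choice of which labels to merge are our own assumptions, not part of the original analysis, and the totals are only the four rows copied from the table above.

```r
library(dplyr)

# Casualty totals copied from the top-10 table above.
casualties_sample <- data.frame(
  EVTYPE = c("TSTM WIND", "THUNDERSTORM WIND", "EXCESSIVE HEAT", "HEAT"),
  pop_casualties = c(7461, 1621, 8428, 3037),
  stringsAsFactors = FALSE
)

# Hypothetical helper: upper-case and trim the label, then map the
# thunderstorm wind spellings onto one name. HEAT and EXCESSIVE HEAT are
# left alone because they may be genuinely different categories.
normalize_evtype <- function(x) {
  x <- toupper(trimws(x))
  sub("^(TSTM WIND|THUNDERSTORM WINDS?)$", "THUNDERSTORM WIND", x)
}

# Re-aggregate after normalizing the labels
merged <- casualties_sample %>%
  mutate(EVTYPE = normalize_evtype(EVTYPE)) %>%
  group_by(EVTYPE) %>%
  summarize(pop_casualties = sum(pop_casualties)) %>%
  arrange(desc(pop_casualties))

merged
```

On the full dataset the same mutate step would go before the group_by(EVTYPE) call, and the list of variants would likely need to be longer.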

top10_pivoted <- pivot_longer(top10[,-2], cols = c(INJURIES, FATALITIES), names_to = "casualty_type")
p <- ggplot(top10_pivoted, aes(fill = casualty_type, x = EVTYPE, y = value)) + geom_bar(stat = "identity",position = "stack")

p + theme(axis.text.x = element_text(angle = 45, hjust = 0.75)) + labs(title = "Casualties By Storm Type", x = "Storm Type", y = "Casualties", fill = "Casualty Type") 
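By default ggplot2 places a character x variable in alphabetical order, so the bars in the plot above do not follow the ranking in the table. A possible tweak, sketched here on three event types copied from the table, is to order the x axis by total casualties with reorder(). The names top3_long, ordered_evtype, and p_sorted are illustrative only.

```r
library(ggplot2)

# Long-format sample: injuries and fatalities for three event types,
# copied from the top-10 table above.
top3_long <- data.frame(
  EVTYPE = rep(c("EXCESSIVE HEAT", "TORNADO", "TSTM WIND"), each = 2),
  casualty_type = rep(c("INJURIES", "FATALITIES"), times = 3),
  value = c(6525, 1903, 91346, 5633, 6957, 504),
  stringsAsFactors = FALSE
)

# reorder() sorts the factor levels by the summed (negated) value,
# so the event type with the most casualties comes first.
ordered_evtype <- reorder(top3_long$EVTYPE, -top3_long$value, FUN = sum)
levels(ordered_evtype)

# Same stacked bar chart as above, with the x axis ordered by total casualties
p_sorted <- ggplot(top3_long,
                   aes(x = reorder(EVTYPE, -value, FUN = sum),
                       y = value, fill = casualty_type)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 45, hjust = 0.75)) +
  labs(title = "Casualties By Storm Type", x = "Storm Type",
       y = "Casualties", fill = "Casualty Type")
```

The same reorder() call could be dropped into the aes() of either plot in this report.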

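Next we’ll look at the economic consequences of each type of storm. Similar to how we estimated population health consequences by combining injuries and fatalities, here we will combine property damage and crop damage to get an estimate of the total economic consequences of a storm. Note that PROPDMG and CROPDMG are added as recorded, without applying the magnitude codes in PROPDMGEXP and CROPDMGEXP (for example, K for thousands), so the totals are raw sums rather than dollar amounts.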

stormData <- stormData %>% mutate(econ_damage = PROPDMG + CROPDMG)
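The head() output shows that each damage figure comes with a magnitude code in PROPDMGEXP or CROPDMGEXP (for instance 25.0 with K), which the sum above does not use. The snippet below is one possible sketch of converting to dollars before adding. The helper exp_multiplier is hypothetical, the mapping assumes K, M, and B stand for thousands, millions, and billions, and treating every other code (including blanks) as a multiplier of 1 is a simplifying assumption. The third sample row is made up for illustration.

```r
library(dplyr)

# Hypothetical helper: translate a magnitude code into a multiplier.
# Assumption: K = 1e3, M = 1e6, B = 1e9; anything else counts as 1.
exp_multiplier <- function(code) {
  code <- toupper(trimws(code))
  case_when(
    code == "K" ~ 1e3,
    code == "M" ~ 1e6,
    code == "B" ~ 1e9,
    TRUE ~ 1
  )
}

# Rows 1 and 2 mirror the head() output; row 3 is an invented example
# that mixes an M and a K code.
damage_sample <- data.frame(
  PROPDMG    = c(25.0, 2.5, 1.5),
  PROPDMGEXP = c("K", "K", "M"),
  CROPDMG    = c(0, 0, 10),
  CROPDMGEXP = c("", "", "K"),
  stringsAsFactors = FALSE
)

# Scale each damage column by its code, then add the two
scaled <- damage_sample %>%
  mutate(prop_usd = PROPDMG * exp_multiplier(PROPDMGEXP),
         crop_usd = CROPDMG * exp_multiplier(CROPDMGEXP),
         econ_damage_usd = prop_usd + crop_usd)

scaled
```

Applying this to the full data would probably change the rankings, since a single event coded in billions outweighs many events coded in thousands.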

econ_damage <- stormData %>%
  group_by(EVTYPE) %>%
  summarize(econ_damage = sum(econ_damage),
            Property_Damage = sum(PROPDMG),
            Crop_Damage = sum(CROPDMG)) %>%
  select(EVTYPE, econ_damage, Property_Damage, Crop_Damage) %>%
  arrange(desc(econ_damage))

top10econ <- econ_damage[c(1:10),]

top10econ
## # A tibble: 10 × 4
##    EVTYPE             econ_damage Property_Damage Crop_Damage
##    <chr>                    <dbl>           <dbl>       <dbl>
##  1 TORNADO               3312277.        3212258.     100019.
##  2 FLASH FLOOD           1599325.        1420125.     179200.
##  3 TSTM WIND             1445168.        1335966.     109203.
##  4 HAIL                  1268290.         688693.     579596.
##  5 FLOOD                 1067976.         899938.     168038.
##  6 THUNDERSTORM WIND      943636.         876844.      66791.
##  7 LIGHTNING              606932.         603352.       3581.
##  8 THUNDERSTORM WINDS     464978.         446293.      18685.
##  9 HIGH WIND              342015.         324732.      17283.
## 10 WINTER STORM           134700.         132721.       1979.

Grouping the data by storm type and summing the economic damage shows us that tornadoes are once again at the top, causing more economic damage than any other storm type. This time, the second most damaging type of storm is flash flooding, which causes a significant amount of property damage and more crop damage than tornadoes. The largest crop damage total in the top 10, however, belongs to hail. The graph below depicts these findings.

top10econ <- rename(top10econ, Property = Property_Damage)
top10econ <- rename(top10econ, Crops = Crop_Damage)

top10econ_pivoted <- pivot_longer(top10econ[,-2], cols = c(Property, Crops), names_to = "damage_type")

top10econ_pivoted$value <- top10econ_pivoted$value/1000
top10econ_pivoted <- rename(top10econ_pivoted, value_thousands = value)

p <- ggplot(top10econ_pivoted, aes(fill = damage_type, x = EVTYPE, y = value_thousands)) + geom_bar(stat = "identity",position = "stack")

p + theme(axis.text.x = element_text(angle = 45, hjust = 0.75)) + labs(title = "Economic Damage By Storm Type", x = "Storm Type", y = "Damage (Thousands)", fill = "Damage Type")