This document analyzes the damage caused by different storm types in the United States between 1950 and 2011. It first examines harm to population health, measured as injuries and fatalities, for each storm type, and then examines economic damage, measured as property damage and crop damage.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(tidyr)
stormData <- read.csv("/Users/samistvan/Downloads/repdata_data_StormData.csv")
Now that the data is loaded, let’s look at what kind of information it contains.
head(stormData)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL TORNADO
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL TORNADO
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL TORNADO
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL TORNADO
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1 0 0 NA
## 2 0 0 NA
## 3 0 0 NA
## 4 0 0 NA
## 5 0 0 NA
## 6 0 0 NA
## END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1 0 14.0 100 3 0 0 15 25.0
## 2 0 2.0 150 2 0 0 0 2.5
## 3 0 0.1 123 2 0 0 2 25.0
## 4 0 0.0 100 2 0 0 2 2.5
## 5 0 0.0 150 2 0 0 2 2.5
## 6 0 1.5 177 2 0 0 6 2.5
## PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1 K 0 3040 8812
## 2 K 0 3042 8755
## 3 K 0 3340 8742
## 4 K 0 3458 8626
## 5 K 0 3412 8642
## 6 K 0 3450 8748
## LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3051 8806 1
## 2 0 0 2
## 3 0 0 3
## 4 0 0 4
## 5 0 0 5
## 6 0 0 6
Each observation in this dataset records one storm event and its effects, including location, time, injuries, fatalities, property damage, and crop damage.
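Only a handful of these columns are used in the analysis that follows. As a minimal sketch, the block below keeps just those columns. It uses a small hypothetical data frame (`toy`, with made-up values) in place of `stormData`, and the name `toy_small` is likewise only illustrative.

```r
library(dplyr)

# Hypothetical stand-in for stormData with a few of its columns (values are made up).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD"),
  FATALITIES = c(0, 2, 1),
  INJURIES   = c(15, 0, 3),
  PROPDMG    = c(25, 2.5, 10),
  PROPDMGEXP = c("K", "K", "M"),
  CROPDMG    = c(0, 0, 5),
  CROPDMGEXP = c("", "", "K"),
  REMARKS    = c("", "", ""),
  stringsAsFactors = FALSE
)

# Keep only the columns relevant to the health and economic summaries.
toy_small <- toy %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

str(toy_small)
```

Dropping unused columns early is optional, but it makes later `head()` output much easier to read.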
To analyze the consequences of a storm on population health, we will combine fatalities and injuries to give us a sense of the number of individuals adversely impacted by the storm.
stormData <- stormData %>% mutate(pop_casualties = INJURIES + FATALITIES)
Next, we’ll group the data by event type and sum the total casualties, injuries, and fatalities that each type of storm has caused, sorting the result by total casualties.
# Total casualties, injuries, and fatalities per event type, largest first
casualties <- stormData %>%
  group_by(EVTYPE) %>%
  summarize(pop_casualties = sum(pop_casualties), FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES)) %>%
  select(EVTYPE, pop_casualties, INJURIES, FATALITIES) %>%
  arrange(desc(pop_casualties))
top10 <- casualties[c(1:10),]
top10
## # A tibble: 10 × 4
## EVTYPE pop_casualties INJURIES FATALITIES
## <chr> <dbl> <dbl> <dbl>
## 1 TORNADO 96979 91346 5633
## 2 EXCESSIVE HEAT 8428 6525 1903
## 3 TSTM WIND 7461 6957 504
## 4 FLOOD 7259 6789 470
## 5 LIGHTNING 6046 5230 816
## 6 HEAT 3037 2100 937
## 7 FLASH FLOOD 2755 1777 978
## 8 ICE STORM 2064 1975 89
## 9 THUNDERSTORM WIND 1621 1488 133
## 10 WINTER STORM 1527 1321 206
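In the table above, TSTM WIND and THUNDERSTORM WIND appear as separate rows, and the dataset also contains the label THUNDERSTORM WINDS. If these are treated as the same kind of event (an assumption, not something the analysis above does), they could be merged before summarizing. The sketch below shows one way to do that on a hypothetical data frame with made-up counts; `toy`, `wind_labels`, and `merged` are illustrative names.

```r
library(dplyr)

# Hypothetical rows; the labels mimic variants found in the EVTYPE column.
toy <- data.frame(
  EVTYPE     = c("TSTM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WINDS", "TORNADO"),
  INJURIES   = c(10, 5, 2, 40),
  FATALITIES = c(1, 0, 1, 3),
  stringsAsFactors = FALSE
)

# Assumption: these three labels describe the same kind of event.
wind_labels <- c("TSTM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WINDS")

merged <- toy %>%
  # Map every wind variant onto a single label; leave other labels untouched.
  mutate(EVTYPE = if_else(toupper(trimws(EVTYPE)) %in% wind_labels,
                          "THUNDERSTORM WIND", EVTYPE)) %>%
  group_by(EVTYPE) %>%
  summarize(INJURIES = sum(INJURIES), FATALITIES = sum(FATALITIES)) %>%
  mutate(pop_casualties = INJURIES + FATALITIES) %>%
  arrange(desc(pop_casualties))

merged
```

Whether two labels really denote the same event type is a judgment call, so any such mapping is best kept explicit and short.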
This table shows that tornadoes are by far the most damaging storm type for population health. They account for 96,979 casualties, more than eleven times the 8,428 attributed to excessive heat, the second most harmful type, and they lead in both injuries and fatalities. The graph below, showing the 10 storm types with the most casualties, helps visualize this.
top10_pivoted <- pivot_longer(top10[,-2], cols = c(INJURIES, FATALITIES), names_to = "casualty_type")
p <- ggplot(top10_pivoted, aes(fill = casualty_type, x = EVTYPE, y = value)) + geom_bar(stat = "identity",position = "stack")
p + theme(axis.text.x = element_text(angle = 45, hjust = 0.75)) + labs(title = "Casualties By Storm Type", x = "Storm Type", y = "Casualties", fill = "Casualty Type")
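By default, ggplot2 orders a character x axis alphabetically, so the bars do not appear in order of size. As a rough sketch of one alternative, `reorder()` can sort the storm types by their total. The data frame `toy_long` below is hypothetical and only mimics the shape of `top10_pivoted`.

```r
library(ggplot2)

# Hypothetical long-format data shaped like top10_pivoted (values are made up).
toy_long <- data.frame(
  EVTYPE        = rep(c("FLOOD", "TORNADO", "LIGHTNING"), each = 2),
  casualty_type = rep(c("INJURIES", "FATALITIES"), times = 3),
  value         = c(30, 5, 90, 10, 20, 8)
)

# reorder() ranks EVTYPE by the summed value per storm type;
# negating value puts the largest total first.
p_sorted <- ggplot(toy_long,
                   aes(x = reorder(EVTYPE, -value, FUN = sum),
                       y = value, fill = casualty_type)) +
  geom_col() +  # geom_col() stacks the bars by default
  labs(x = "Storm Type", y = "Casualties", fill = "Casualty Type")

p_sorted
```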
Next, we’ll look at the economic consequences of each type of storm. Just as we estimated population health consequences by combining injuries and fatalities, here we will combine property damage and crop damage to estimate the total economic consequences of a storm. PROPDMG and CROPDMG are summed as recorded, without applying the magnitude codes stored in PROPDMGEXP and CROPDMGEXP.
stormData <- stormData %>% mutate(econ_damage = PROPDMG + CROPDMG)
# Total economic, property, and crop damage per event type, largest first
econ_damage <- stormData %>%
  group_by(EVTYPE) %>%
  summarize(econ_damage = sum(econ_damage), Property_Damage = sum(PROPDMG), Crop_Damage = sum(CROPDMG)) %>%
  select(EVTYPE, econ_damage, Property_Damage, Crop_Damage) %>%
  arrange(desc(econ_damage))
top10econ <- econ_damage[c(1:10),]
top10econ
## # A tibble: 10 × 4
## EVTYPE econ_damage Property_Damage Crop_Damage
## <chr> <dbl> <dbl> <dbl>
## 1 TORNADO 3312277. 3212258. 100019.
## 2 FLASH FLOOD 1599325. 1420125. 179200.
## 3 TSTM WIND 1445168. 1335966. 109203.
## 4 HAIL 1268290. 688693. 579596.
## 5 FLOOD 1067976. 899938. 168038.
## 6 THUNDERSTORM WIND 943636. 876844. 66791.
## 7 LIGHTNING 606932. 603352. 3581.
## 8 THUNDERSTORM WINDS 464978. 446293. 18685.
## 9 HIGH WIND 342015. 324732. 17283.
## 10 WINTER STORM 134700. 132721. 1979.
Grouping the data by storm type and summing the economic damage shows that tornadoes are once again at the top, causing more economic damage than any other storm type. This time, the second most damaging storm type is flash flooding, which causes substantial property damage and more crop damage than tornadoes. Hail, ranked fourth overall, causes the most crop damage of any storm type in the top 10. The graph below depicts these findings.
# Shorten the column names so they read well in the plot legend
top10econ <- rename(top10econ, Property = Property_Damage, Crops = Crop_Damage)
top10econ_pivoted <- pivot_longer(top10econ[,-2], cols = c(Property, Crops), names_to = "damage_type")
top10econ_pivoted$value <- top10econ_pivoted$value/1000
top10econ_pivoted <- rename(top10econ_pivoted, value_thousands = value)
p <- ggplot(top10econ_pivoted, aes(fill = damage_type, x = EVTYPE, y = value_thousands)) + geom_bar(stat = "identity",position = "stack")
p + theme(axis.text.x = element_text(angle = 45, hjust = 0.75)) + labs(title = "Economic Damage By Storm Type", x = "Storm Type", y = "Damage (Thousands)", fill = "Damage Type")
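The economic totals in this section add up the PROPDMG and CROPDMG values as recorded. The dataset also has PROPDMGEXP and CROPDMGEXP columns (for example, K in the rows shown by `head()`), which act as magnitude codes for those values. The following is a minimal sketch of how the codes could be applied before summing. It assumes K, M, and B mean thousands, millions, and billions, and it treats any other code as a multiplier of 1, which is a simplification. The helper `exp_to_multiplier` and the data frame `toy` are hypothetical.

```r
library(dplyr)

# Hypothetical rows (made-up values); the *EXP columns hold a magnitude code.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 2, 500),
  PROPDMGEXP = c("K", "B", ""),
  CROPDMG    = c(0, 10, 3),
  CROPDMGEXP = c("", "M", "k"),
  stringsAsFactors = FALSE
)

# Assumed mapping: K = thousands, M = millions, B = billions.
# Any other code (blank, digits, symbols) is treated as a multiplier of 1 here.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(code)])
  ifelse(is.na(mult), 1, mult)
}

toy_dollars <- toy %>%
  mutate(
    prop_dollars = PROPDMG * exp_to_multiplier(PROPDMGEXP),
    crop_dollars = CROPDMG * exp_to_multiplier(CROPDMGEXP),
    econ_dollars = prop_dollars + crop_dollars
  ) %>%
  arrange(desc(econ_dollars))

toy_dollars
```

Applied to the full dataset, this scaling could change the ranking, since a single event recorded in billions outweighs many events recorded in thousands.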