Storms and other severe weather events regularly cause public health disasters and economic hardships. These events often result in fatalities, injuries, and property damage.
This analysis is a brief exploration of the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks when and where events occur along with estimated impacts. The data can be found here: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2.
The primary goal of this analysis is to answer the following two questions:
1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?
The first step is to load the data. For this analysis I used the tidyverse family of packages: readr to read the data, dplyr and tidyr for processing, and ggplot2 for plotting.
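The loading code itself is not shown. A minimal sketch, assuming the file is saved as StormData.csv.bz2 in the working directory (the file name and the helper `load_storm_data` are hypothetical): base R's `read.csv()` decompresses `.bz2` files transparently, and readr's `read_csv()` also uncompresses `.bz2` files automatically, so no separate extraction step is needed.

```r
# Hypothetical helper: download the NOAA storm data once, then read it.
# read.csv() opens the .bz2 archive through file(), which decompresses it on the fly.
load_storm_data <- function(
    path = "StormData.csv.bz2",
    url = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2") {
  if (!file.exists(path)) {
    # Binary mode keeps the compressed archive intact on Windows
    download.file(url, destfile = path, mode = "wb")
  }
  read.csv(path, stringsAsFactors = FALSE)
}

# Usage (downloads the file on the first run only):
# storm <- load_storm_data()
```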
To measure harm to population health, fatalities and injuries were summed by event type using dplyr. Event types are sorted by total fatalities, with ties broken by total injuries, and only the top 10 are kept, so event types with no documented harm to people drop out.
This segment of code selects the relevant data columns, aggregates fatalities and injuries then sorts and selects the top 10.
library(dplyr)
##Selects the columns with the event type and the fatality and injury counts.
storm_h <- storm %>% select(EVTYPE, FATALITIES, INJURIES)
##Sums fatalities and injuries by event type, then sorts by fatality count and, for ties, by injury count (highest first).
health_agg <- storm_h %>%
  group_by(EVTYPE) %>%
  summarize(Total_Fatalities = sum(FATALITIES),
            Total_Injuries = sum(INJURIES)) %>%
  arrange(desc(Total_Fatalities), desc(Total_Injuries))
##Keeps only the 10 event types most harmful to human populations.
health_top10 <- health_agg %>%
  head(10)
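The text describes eliminating event types with no documented harm to people, but the dplyr pipeline does this only implicitly through `head(10)`. A base R sketch that makes the filter explicit and spells out the fatality-then-injury tie-break; the function name `rank_by_harm` is hypothetical, and base R's `aggregate()` stands in for `group_by()` plus `summarize()`:

```r
# Hypothetical helper: rank event types by harm to people.
rank_by_harm <- function(df, n = 10) {
  # Sum fatalities and injuries per event type
  agg <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = df, FUN = sum)
  names(agg) <- c("EVTYPE", "Total_Fatalities", "Total_Injuries")
  # Drop event types with no recorded fatalities or injuries
  agg <- agg[agg$Total_Fatalities + agg$Total_Injuries > 0, ]
  # Sort by fatalities (descending), breaking ties by injuries (descending)
  agg <- agg[order(-agg$Total_Fatalities, -agg$Total_Injuries), ]
  rownames(agg) <- NULL
  head(agg, n)
}

# Usage: rank_by_harm(storm, n = 10)
```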
This segment of code selects the relevant columns, sums property and crop damage per event type, sorts by property damage (ties broken by crop damage) and keeps the top 25.
##Selects the columns with the event type, property damage and crop damage.
storm_e <- storm %>% select(EVTYPE, PROPDMG, CROPDMG)
##Sums damage amounts by event type, then sorts by property damage and, for ties, by crop damage (highest first).
econ_agg <- storm_e %>%
  group_by(EVTYPE) %>%
  summarize(Total_Prop_Damage = sum(PROPDMG),
            Total_Crop_Damage = sum(CROPDMG)) %>%
  arrange(desc(Total_Prop_Damage), desc(Total_Crop_Damage))
##Keeps the 25 event types with the highest property damage.
econ_top25 <- econ_agg %>%
  head(25)
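In the NOAA data, PROPDMG and CROPDMG are not dollar amounts on their own: each is paired with an exponent code in PROPDMGEXP or CROPDMGEXP (for example "K" for thousands, "M" for millions, "B" for billions). The aggregation above sums the raw values without these multipliers. A minimal sketch of converting to dollars before summing; the helper name `damage_in_dollars`, the treatment of "H" as hundreds, and the choice to treat any unrecognised or missing code as a multiplier of 1 are all assumptions:

```r
# Hypothetical helper: combine a damage value with its exponent code.
# Assumptions: H = hundreds, K = thousands, M = millions, B = billions
# (case-insensitive); any other or missing code is treated as a multiplier of 1.
damage_in_dollars <- function(value, exp_code) {
  multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(multipliers[toupper(exp_code)])
  mult[is.na(mult)] <- 1
  value * mult
}

damage_in_dollars(c(2.5, 3), c("K", "m"))  # returns c(2500, 3e+06)
```

Inside the existing pipeline this could be applied with `mutate(PROPDMG = damage_in_dollars(PROPDMG, PROPDMGEXP), CROPDMG = damage_in_dollars(CROPDMG, CROPDMGEXP))` before `group_by()`, after adding the two EXP columns to the `select()` call. The totals, and possibly the ranking, would change accordingly.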
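The purpose of this code is to prepare the health data for graphing. It reshapes the table so that the injury/fatality distinction becomes a factor (the Type column) rather than two separate columns, and it turns EVTYPE into a factor whose levels follow the ranking of the 10 most dangerous events, so that plots keep that order.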
health_top10
## # A tibble: 10 x 3
## EVTYPE Total_Fatalities Total_Injuries
## <chr> <dbl> <dbl>
## 1 TORNADO 5633 91346
## 2 EXCESSIVE HEAT 1903 6525
## 3 FLASH FLOOD 978 1777
## 4 HEAT 937 2100
## 5 LIGHTNING 816 5230
## 6 TSTM WIND 504 6957
## 7 FLOOD 470 6789
## 8 RIP CURRENT 368 232
## 9 HIGH WIND 248 1137
## 10 AVALANCHE 224 170
library(tidyr)
library(knitr)
##Captures the event types in rank order for use as factor levels.
IncidentRank <- health_top10 %>% pull(EVTYPE)
##Reshapes the data to long format, so the injury/fatality distinction is a factor (Type) rather than two columns.
health_top10p <- health_top10 %>%
  pivot_longer(Total_Fatalities:Total_Injuries, names_to = "Type", values_to = "Value")
##Orders EVTYPE by rank instead of alphabetically.
health_top10p <- health_top10p %>% mutate(EVTYPE = factor(EVTYPE, levels = IncidentRank))
head(health_top10p) %>% kable()
| EVTYPE | Type | Value |
|---|---|---|
| TORNADO | Total_Fatalities | 5633 |
| TORNADO | Total_Injuries | 91346 |
| EXCESSIVE HEAT | Total_Fatalities | 1903 |
| EXCESSIVE HEAT | Total_Injuries | 6525 |
| FLASH FLOOD | Total_Fatalities | 978 |
| FLASH FLOOD | Total_Injuries | 1777 |
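ggplot2 draws a discrete axis in the order of the factor levels, and a plain character vector becomes a factor with alphabetically sorted levels. A small base R illustration, using three event types from the table, of why EVTYPE is converted with explicit levels:

```r
# Event types already in rank order (highest fatalities first)
ranked <- c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD")

# Default: levels are sorted alphabetically, so the ranking is lost
levels(factor(ranked))
# returns c("EXCESSIVE HEAT", "FLASH FLOOD", "TORNADO")

# Explicit levels keep the rank order, as in factor(EVTYPE, levels = IncidentRank)
levels(factor(ranked, levels = ranked))
# returns c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD")
```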
This code prepares the economic data for graphing in the same way. The first portion reshapes property and crop damage into a single Type factor; the second assigns an order to the 25 most costly events. For legibility, the damage values are then converted to millions of dollars.
##Captures the event types in rank order for use as factor levels.
IncidentRank <- econ_top25 %>% pull(EVTYPE)
##Reshapes property and crop damage into a single Type factor.
econ_top25p <- econ_top25 %>%
  pivot_longer(Total_Prop_Damage:Total_Crop_Damage, names_to = "Type", values_to = "Value")
econ_top25p <- econ_top25p %>% mutate(EVTYPE = factor(EVTYPE, levels = IncidentRank))
##Converts to millions.
econ_top25p <- econ_top25p %>% mutate(Value = Value / 1e6)
This section briefly presents the most dangerous and costly severe weather events based on the data set.
The following graph shows the 10 most dangerous disasters in terms of injuries and fatalities. The graph is ordered from left to right, starting with the highest fatality count.
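The plotting code is not shown. A minimal ggplot2 sketch of such a chart, assuming the long-format table health_top10p built earlier; the helper name `plot_health`, the side-by-side (dodged) layout and the axis labels are assumptions:

```r
library(ggplot2)

# Hypothetical helper: side-by-side bars of fatalities and injuries per event type.
# The x-axis follows the factor levels of EVTYPE, i.e. the fatality ranking.
plot_health <- function(long_df) {
  ggplot(long_df, aes(x = EVTYPE, y = Value, fill = Type)) +
    geom_col(position = "dodge") +
    labs(x = "Event type", y = "Number of people", fill = NULL,
         title = "Top 10 event types by fatalities") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
}

# Usage: plot_health(health_top10p)
```

Because tornado injuries are more than ten times larger than any other value in the table, adding `scale_y_log10()` is one option for keeping the smaller bars readable.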
The following graph shows the 25 most costly disasters in terms of property and crop damage. The graph is ordered from left to right, starting with the most property damage.
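A similar sketch for the damage chart, assuming econ_top25p with Value already in millions. Stacking the property and crop bars and rotating the 25 labels vertically are assumptions made for legibility, and `plot_econ` is a hypothetical name:

```r
library(ggplot2)

# Hypothetical helper: stacked bars (property + crop damage) per event type.
# geom_col() stacks bars that share an x position by default.
plot_econ <- function(long_df) {
  ggplot(long_df, aes(x = EVTYPE, y = Value, fill = Type)) +
    geom_col() +
    labs(x = "Event type", y = "Damage (millions of USD)", fill = NULL,
         title = "Top 25 event types by property damage") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
}

# Usage: plot_econ(econ_top25p)
```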
This analysis addressed two questions: which types of events are most harmful to population health, and which types of events have the greatest economic consequences.
Based on these results, tornadoes are the most serious cause for concern on both counts, dwarfing many other event types in both casualties and damage.
In terms of economic damage, floods (including flash floods) and wind damage are also of serious concern.