The data concerning the adverse impact of severe weather on the health and economic well-being of the US population between 1950 and 2011 paints a complex picture. Cumulatively, tornadoes have caused more deaths and injuries than any other event type by a wide margin: more than the next five weather event types combined. On a per-event basis, however, heat and excessive heat cause more deaths and injuries than any other of the ten deadliest event types. From an economic perspective, flooding has cumulatively caused far higher property damage costs than any other weather event, and drought has caused significantly higher crop damage costs than any other event. Per event, excluding rare event types (those recorded 10 or fewer times), the costs tell a different story: hurricanes/typhoons have caused the most property and crop damage per event.
The data for this analysis was sourced from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The compressed csv.bz2 file was downloaded via URL and read into R with the base R read.csv() function. Several packages were installed and loaded for data processing, transformation, and analysis: tidyverse (for data tidying and manipulation), ggplot2 (for data visualization), and forcats (for factor recoding).
# Download the bzip2-compressed storm data once; read.csv() decompresses it transparently
if (!file.exists("StormData.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile="StormData.csv.bz2", mode="wb")
}
df <- read.csv("StormData.csv.bz2")
list.of.packages <- c("tidyverse", "ggplot2", "forcats")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[, "Package"])]
if(length(new.packages)) install.packages(new.packages)
library(tidyverse); library(ggplot2); library(forcats)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag(): dplyr, stats
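Because read.csv() opens a file path through base R's file(), which detects gzip, bzip2, and xz compression when reading, the .bz2 download does not need to be unpacked first. The round trip below is a small sketch of that behavior; the two-row data frame is made-up toy data rather than NOAA records, and the temporary file is arbitrary.

```r
# Toy data (not NOAA records) used only to demonstrate the round trip
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(2L, 0L),
                  stringsAsFactors = FALSE)

# Write the toy data through a bzip2-compressed connection
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back by path: no explicit decompression step is needed
back <- read.csv(tmp, stringsAsFactors = FALSE)
print(back)
```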
To examine the effects of weather events on population health, the data was grouped by event type (e.g. Tornado, Flood) and the cumulative number of deaths and injuries for each event type was calculated. In addition, the average number of deaths and injuries per event was calculated to examine which events have the greatest impact on health on a per-event basis. The 10 event types with the highest cumulative death counts were retained for further analysis.
# Per event type: event count, total deaths/injuries, and averages per event; keep the 10 deadliest types
healthEffect <- df %>% group_by(type=EVTYPE) %>% summarize(events=n(), deaths=sum(FATALITIES), injuries=sum(INJURIES), deathByEvent=deaths/events, injuryByEvent=injuries/events) %>% arrange(desc(deaths)) %>% head(10)
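Cumulative and per-event rankings can disagree whenever an event type is common but usually mild. The sketch below shows this on six hypothetical records using base R (split(), sapply(), lengths()) instead of the dplyr pipeline above; the event type names are borrowed for readability, but the numbers are invented.

```r
# Six hypothetical event records (invented numbers, not NOAA data)
events <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "TORNADO", "TORNADO", "HEAT", "HEAT"),
  FATALITIES = c(1, 0, 3, 0, 2, 1),
  stringsAsFactors = FALSE
)

byType   <- split(events$FATALITIES, events$EVTYPE)  # fatalities grouped by event type
deaths   <- sapply(byType, sum)                      # cumulative deaths per type
n        <- lengths(byType)                          # number of events per type
perEvent <- deaths / n                               # average deaths per event

print(sort(deaths, decreasing = TRUE))    # cumulative ranking: TORNADO first
print(sort(perEvent, decreasing = TRUE))  # per-event ranking: HEAT first
```

Here the frequent type wins on totals while the rarer, deadlier type wins on averages, which is the same pattern the report finds for tornadoes and heat.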
Deaths and injuries were combined to examine cumulative casualties by event type.
# Reshape to long format: one row per event type and casualty type (deaths or injuries)
casualtyEffect <- healthEffect %>% gather(deaths, injuries, key="casualtyType", value="casualties") %>% arrange(desc(casualties), type)
The graph below shows cumulative casualties (deaths and injuries) for the 10 deadliest weather event types. By far, tornadoes have resulted in the greatest number of deaths and injuries.
ggplot(casualtyEffect, aes(x=reorder(type,-casualties), y=casualties,fill=casualtyType)) + geom_bar(stat="identity") + labs(x="Event", y="Casualties") + ggtitle("Casualties by Event (Deaths & Injuries)") + theme(axis.text.x=element_text(angle = -90, hjust = 0)) + scale_fill_discrete(name="", labels=c("Deaths","Injuries"))
While tornadoes have resulted in the highest total number of deaths and injuries, heat and excessive heat are more detrimental to public health on a per-event basis.
The table below ranks the same 10 event types by deaths per event. Heat, excessive heat, rip currents, and avalanches each average more than 0.5 deaths per event, compared with about 0.09 for tornadoes:
healthEffect %>% select(type, events, deathByEvent) %>% arrange(desc(deathByEvent)) %>% head(10)
## # A tibble: 10 x 3
##    type           events deathByEvent
##    <fctr>          <int>        <dbl>
##  1 HEAT              767  1.221642764
##  2 EXCESSIVE HEAT   1678  1.134088200
##  3 RIP CURRENT       470  0.782978723
##  4 AVALANCHE         386  0.580310881
##  5 TORNADO         60652  0.092874101
##  6 LIGHTNING       15754  0.051796369
##  7 FLOOD           25326  0.018558004
##  8 FLASH FLOOD     54277  0.018018682
##  9 HIGH WIND       20212  0.012269939
## 10 TSTM WIND      219940  0.002291534
The table below ranks the same 10 event types by injuries per event. Again, heat has the most adverse effect: excessive heat averages about 3.9 injuries per event and heat about 2.7, compared with about 1.5 for tornadoes:
healthEffect %>% select(type, events, injuryByEvent) %>% arrange(desc(injuryByEvent)) %>% head(10)
## # A tibble: 10 x 3
##    type           events injuryByEvent
##    <fctr>          <int>         <dbl>
##  1 EXCESSIVE HEAT   1678    3.88855781
##  2 HEAT              767    2.73794003
##  3 TORNADO         60652    1.50606740
##  4 RIP CURRENT       470    0.49361702
##  5 AVALANCHE         386    0.44041451
##  6 LIGHTNING       15754    0.33197918
##  7 FLOOD           25326    0.26806444
##  8 HIGH WIND       20212    0.05625371
##  9 FLASH FLOOD     54277    0.03273947
## 10 TSTM WIND      219940    0.03163135
To examine the effects of weather on US economic health, some data tidying was required prior to analysis. First, a subset of the data containing event types, property damage, and crop damage was extracted. Then, the property and crop damage exponent codes (e.g. “B” for billion, “5” for one hundred thousand, “K” for thousand) were recoded to their numerical values, and the property and crop damage figures were multiplied by these values to obtain damage in dollars. Exponent codes not covered by this mapping (in the property data, the “h”, “H”, and “m” codes) cannot be converted to numbers, which triggers the coercion warning below; those records are excluded from the damage totals. The data was then grouped by event type, and the cumulative property and crop damage costs for each event type were calculated. In addition, the average property and crop damage costs per event were calculated to examine which events have the greatest economic impact on a per-event basis. Subsets of the 10 event types with the highest property damage and crop damage costs were extracted for further analysis.
econEffect <- df %>% select(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
econEffect1 <- econEffect %>% mutate(propDmgExp=fct_collapse(PROPDMGEXP, "1000000000"="B", "100000000"="8", "10000000"="7", "1000000"=c("M","6"), "100000"="5", "10000"="4", "1000"=c("K","3"), "100"="2", "10"="1", "1"=c("?","-","+","0","")))
econEffect2 <- econEffect1 %>% mutate(cropDmgExp=fct_collapse(CROPDMGEXP, "1000000000"="B", "1000000"=c("M","m"), "1000"=c("K","k"), "100"="2", "1"=c("0","?","")))
econEffect3 <- econEffect2 %>% mutate(propDmgExp=as.numeric(as.character(propDmgExp)), cropDmgExp=as.numeric(as.character(cropDmgExp)), propDmg=PROPDMG * propDmgExp, cropDmg=CROPDMG * cropDmgExp)
## Warning in eval(substitute(expr), envir, enclos): NAs introduced by
## coercion
# Summarize damage once per event type, then take the top 10 by property and by crop damage
econByType <- econEffect3 %>% group_by(type=EVTYPE) %>% summarize(events=n(), propDmg=sum(propDmg, na.rm=TRUE), cropDmg=sum(cropDmg, na.rm=TRUE), propDmgByEvent=propDmg/events, cropDmgByEvent=cropDmg/events)
propEffect <- econByType %>% arrange(desc(propDmg)) %>% head(10)
cropEffect <- econByType %>% arrange(desc(cropDmg)) %>% head(10)
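The fct_collapse() recoding above is one way to turn exponent codes into multipliers. An alternative sketch is a plain lookup table queried with match(); match() is used rather than name indexing because R never matches the empty string "" when subsetting by name. The multipliers, including treating “h”/“H” as hundreds and “m” as millions, are assumptions about how the codes should be read, not an official NOAA definition, and toDollars() is a hypothetical helper.

```r
# Assumed meaning of each exponent code (an interpretation, not a NOAA spec):
# symbols and blanks count as 1, digits as powers of ten,
# h/H = hundreds, K/k = thousands, m/M = millions, B = billions.
codes  <- c("", "-", "?", "+", "0", "1", "2", "3", "4", "5", "6", "7", "8",
            "h", "H", "K", "k", "m", "M", "B")
values <- c(1, 1, 1, 1, 1, 10, 100, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8,
            100, 100, 1e3, 1e3, 1e6, 1e6, 1e9)

# Hypothetical helper: scale each amount by its code's multiplier.
# Codes missing from the table yield NA rather than a silent guess.
toDollars <- function(amount, code) {
  amount * values[match(code, codes)]
}

dmg <- toDollars(c(25, 2.5, 1.5, 3, 10), c("K", "M", "B", "h", ""))
print(dmg)
```

Keeping codes and multipliers in two parallel vectors makes every mapping explicit, so an unexpected code surfaces as NA instead of disappearing inside a factor recode.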
The first graph below shows the 10 highest cumulative property damage costs by weather event type. By far, floods have caused the most property damage. The second graph shows the 10 highest cumulative crop damage costs by event type. By far, droughts have caused the most crop damage.
ggplot(propEffect, aes(x=reorder(type, -propDmg), y=propDmg/1e9, fill=type)) + geom_bar(stat="identity") + labs(x="Event", y="Damage (in Billions)") + ggtitle("Property Damage by Event") + theme(axis.text.x=element_text(angle = -90, hjust = 0), legend.position ="none")
ggplot(cropEffect, aes(x=reorder(type, -cropDmg), y=cropDmg/1e9, fill=type)) + geom_bar(stat="identity") + labs(x="Event", y="Damage (in Billions)") + ggtitle("Crop Damage by Event") + theme(axis.text.x=element_text(angle = -90, hjust = 0), legend.position ="none")
As with the health effects data, the per-event figures tell a different story. The table below shows the 10 most costly weather event types per event in terms of property damage. To exclude rare events, the data has been restricted to event types recorded more than 10 times. Hurricanes/typhoons, storm surges, and severe thunderstorms, although relatively infrequent, have caused disproportionately high property damage per event.
propDamageByEvent <- econEffect3 %>% group_by(type=EVTYPE) %>% summarize(events=length(EVTYPE), propDmg=sum(propDmg, na.rm=TRUE), cropDmg=sum(cropDmg, na.rm=TRUE), propDmgByEvent=propDmg/events, cropDmgByEvent=cropDmg/events) %>% arrange(desc(propDmgByEvent))
propDamageByEvent %>% select(type, events, propDmgByEvent) %>% filter(events > 10) %>% arrange(desc(propDmgByEvent)) %>% head(10)
## # A tibble: 10 x 3
##    type                events propDmgByEvent
##    <fctr>               <int>          <dbl>
##  1 HURRICANE/TYPHOON       88      787566364
##  2 STORM SURGE            261      165990559
##  3 SEVERE THUNDERSTORM     13       92720000
##  4 HURRICANE              174       68208730
##  5 TYPHOON                 11       54566364
##  6 STORM SURGE/TIDE       148       31359378
##  7 RIVER FLOOD            173       29589280
##  8 FLASH FLOOD/FLOOD       22       12384091
##  9 TROPICAL STORM         690       11165059
## 10 TSUNAMI                 20        7203100
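The events > 10 filter matters because an average over one or two occurrences can dominate a per-event ranking. The sketch below uses three hypothetical event types with invented totals, and base R's subset() and order() in place of dplyr, to show the ranking before and after the filter.

```r
# Hypothetical per-type totals (invented numbers, not NOAA results)
byType <- data.frame(
  type    = c("TYPE A", "TYPE B", "TYPE C"),
  events  = c(2, 50, 400),
  propDmg = c(4e8, 5e8, 2e9),
  stringsAsFactors = FALSE
)
byType$propDmgByEvent <- byType$propDmg / byType$events

# Without the filter, the rarely recorded TYPE A tops the per-event ranking
print(byType[order(-byType$propDmgByEvent), "type"])

# Keep only types recorded more than 10 times, then rank by average cost
common <- subset(byType, events > 10)
ranked <- common[order(-common$propDmgByEvent), ]
print(ranked$type)
```

The threshold of 10 is a judgment call carried over from the report; a different cutoff would change which rare types are admitted.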
The table below shows the 10 most costly weather event types per event in terms of crop damage, again restricted to event types recorded more than 10 times. Hurricanes/typhoons, river floods, and hurricanes, although relatively infrequent, have caused disproportionately high crop damage per event.
cropDamageByEvent <- econEffect3 %>% group_by(type=EVTYPE) %>% summarize(events=length(EVTYPE), propDmg=sum(propDmg, na.rm=TRUE), cropDmg=sum(cropDmg, na.rm=TRUE), propDmgByEvent=propDmg/events, cropDmgByEvent=cropDmg/events) %>% arrange(desc(cropDmgByEvent))
cropDamageByEvent %>% select(type, events, cropDmgByEvent) %>% filter(events > 10) %>% arrange(desc(cropDmgByEvent)) %>% head(10)
## # A tibble: 10 x 3
##    type              events cropDmgByEvent
##    <fctr>             <int>          <dbl>
##  1 HURRICANE/TYPHOON     88       29634918
##  2 RIVER FLOOD          173       29072017
##  3 HURRICANE            174       15758103
##  4 FREEZE                74        6030068
##  5 DROUGHT             2488        5615983
##  6 ICE STORM           2006        2503546
##  7 HEAVY RAINS           26        2326923
##  8 EXTREME COLD         655        1974005
##  9 FROST                 53        1245283
## 10 UNSEASONABLY COLD     23        1088804