In this analysis we identified which weather events cause the most harm to people and property, to help officials judge the severity of severe weather events and prepare appropriate responses. We used the storm database of the U.S. National Oceanic and Atmospheric Administration (NOAA), which records major weather events and their impact. The dataset covers the years 1950 to 2011. We grouped the data by event type and summarised which events caused the most economic damage (crop and property damage) and the most harm to human health (fatalities and injuries).
The data is available online and a description can be found here.
The data show that floods, hurricanes/typhoons and tornados cause the most economic damage, while tornados, excessive heat and thunderstorm winds result in the most injuries and deaths.
To process and visualise the data we used the following libraries.
library(data.table)
library(tidyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:data.table':
##
## between, first, last
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
The data was downloaded and loaded into the R workspace.
# Download the data if it is not already present
if (!file.exists("StormData.csv.bz2")) {
  URL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(URL, destfile = "StormData.csv.bz2")
}
# Load the data as a data.table, reading the same file that was downloaded above
if (!exists("StormData")) {
  StormDataDF <- read.csv("StormData.csv.bz2")
  StormData <- as.data.table(StormDataDF)
  rm(StormDataDF)
}
To save memory we selected only the columns of interest. We also replaced the character codes in the exponent columns (PROPDMGEXP and CROPDMGEXP) with the numeric power of ten they stand for, to simplify the calculation of economic damage.
StormDataTidy <- StormData %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
  # Recode the exponent codes as powers of ten; unlisted codes become 0
  mutate(PROPDMGEXP = case_match(PROPDMGEXP,
                                 c("", "-", "+", "?") ~ 0,
                                 "B" ~ 9,
                                 c("h", "H") ~ 2,
                                 "K" ~ 3,
                                 c("m", "M") ~ 6,
                                 .default = 0)) %>%
  mutate(CROPDMGEXP = case_match(CROPDMGEXP,
                                 c("?") ~ 0,
                                 "B" ~ 9,
                                 "K" ~ 3,
                                 c("m", "M") ~ 6,
                                 .default = 0))
str(StormDataTidy)
## Classes 'data.table' and 'data.frame': 902297 obs. of 7 variables:
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: num 3 3 3 3 3 3 3 3 3 3 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: num 0 0 0 0 0 0 0 0 0 0 ...
## - attr(*, ".internal.selfref")=<externalptr>
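As a worked illustration of the recoding, the sketch below turns a few damage values and exponent codes into dollar amounts. The lookup vector `exp_lookup` and the helper `decode_damage()` are hypothetical names introduced here, not part of the analysis; the mapping assumes the same codes as the PROPDMGEXP recoding above, with every other code treated as exponent 0.

```r
# Hypothetical lookup mirroring the PROPDMGEXP recoding above
exp_lookup <- c(h = 2, H = 2, K = 3, m = 6, M = 6, B = 9)

# Hypothetical helper: damage value times ten to the decoded exponent
decode_damage <- function(value, code) {
  exponent <- exp_lookup[code]
  exponent[is.na(exponent)] <- 0   # unlisted codes (e.g. "" or "?") -> 10^0
  unname(value * 10^exponent)
}

# Toy inputs: 25 K, 2.5 M, 1.2 B and 5 with no code
decode_damage(c(25, 2.5, 1.2, 5), c("K", "M", "B", ""))
```

With this mapping a PROPDMG of 25 with the code "K" stands for 25 * 10^3 = 25,000 dollars, which matches the first rows of the str() output above.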
To calculate the total number of injuries and deaths caused by each type of weather event we created a new summary table. We kept only the ten event types with the greatest impact, sorted by total. We then reshaped the table into long format, with one row per event type and category (fatalities or injuries), to make plotting easier.
# Summarise the injury and fatality numbers by EVTYPE and keep the top ten
TotalInjury <- StormDataTidy %>%
  group_by(EVTYPE) %>%
  summarise(TOTAL = sum(FATALITIES, INJURIES),
            FATALITIES = sum(FATALITIES),
            INJURIES = sum(INJURIES)) %>%
  slice_max(n = 10, order_by = TOTAL)
TotalInjury
## # A tibble: 10 × 4
## EVTYPE TOTAL FATALITIES INJURIES
## <chr> <dbl> <dbl> <dbl>
## 1 TORNADO 96979 5633 91346
## 2 EXCESSIVE HEAT 8428 1903 6525
## 3 TSTM WIND 7461 504 6957
## 4 FLOOD 7259 470 6789
## 5 LIGHTNING 6046 816 5230
## 6 HEAT 3037 937 2100
## 7 FLASH FLOOD 2755 978 1777
## 8 ICE STORM 2064 89 1975
## 9 THUNDERSTORM WIND 1621 133 1488
## 10 WINTER STORM 1527 206 1321
TotalInjury <- TotalInjury %>% pivot_longer(cols = c(FATALITIES, INJURIES), names_to = "CATEGORY", values_to = "COUNT")
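The reshaping step can be illustrated on a small example. The sketch below uses made-up values (not taken from the storm database) to show how `pivot_longer()` turns one row per event type into one row per event type and category; the object names `wide` and `long` are only for illustration.

```r
library(tidyr)

# Toy data in wide format: one row per event type
wide <- data.frame(
  EVTYPE     = c("A", "B"),
  FATALITIES = c(1, 3),
  INJURIES   = c(10, 30)
)

# Long format: the two count columns are stacked into CATEGORY / COUNT
long <- pivot_longer(wide,
                     cols = c(FATALITIES, INJURIES),
                     names_to = "CATEGORY",
                     values_to = "COUNT")
long
```

The long table has four rows, which lets ggplot map CATEGORY to the fill colour of a stacked bar.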
To calculate the monetary damage we created a second summary table, multiplying each damage value by ten to the power of its exponent. We kept only the ten event types that caused the most damage, sorted by total. We then reshaped the table into long format, with one row per event type and category (crop or property damage), to make plotting easier.
# Summarise the damage data by event type and keep the top ten
TotalDamage <- StormDataTidy %>%
  group_by(EVTYPE) %>%
  # Convert value and exponent into dollar amounts
  mutate(CROPTOTAL = CROPDMG * 10^CROPDMGEXP,
         PROPTOTAL = PROPDMG * 10^PROPDMGEXP) %>%
  summarise(TOTAL = sum(PROPTOTAL, CROPTOTAL),
            CROPDMG = sum(CROPTOTAL),
            PROPDMG = sum(PROPTOTAL)) %>%
  slice_max(n = 10, order_by = TOTAL)
TotalDamage
## # A tibble: 10 × 4
## EVTYPE TOTAL CROPDMG PROPDMG
## <chr> <dbl> <dbl> <dbl>
## 1 FLOOD 150319678257 5661968450 144657709807
## 2 HURRICANE/TYPHOON 71913712800 2607872800 69305840000
## 3 TORNADO 57352114049. 414953270 56937160779.
## 4 STORM SURGE 43323541000 5000 43323536000
## 5 HAIL 18757805433. 3025537890 15732267543.
## 6 FLASH FLOOD 17562129167. 1421317100 16140812067.
## 7 DROUGHT 15018672000 13972566000 1046106000
## 8 HURRICANE 14610229010 2741910000 11868319010
## 9 RIVER FLOOD 10148404500 5029459000 5118945500
## 10 ICE STORM 8967041360 5022113500 3944927860
TotalDamage <- TotalDamage %>% pivot_longer(cols = c(CROPDMG, PROPDMG), names_to = "CATEGORY", values_to = "COUNT")
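The group, summarise and rank pattern used for both summaries can be sketched on toy data. The values below are invented for illustration, and the names `events` and `top2` are hypothetical. Note that `sum()` with two columns adds up all values of both columns within each group, and that TOTAL is computed first so that it still sees the original, unsummarised columns.

```r
library(dplyr)

# Toy events: damage already expressed in dollars
events <- data.frame(
  EVTYPE  = c("A", "A", "B", "C", "C"),
  PROPDMG = c(1, 2, 10, 3, 4),
  CROPDMG = c(0, 1, 5, 0, 0)
)

top2 <- events %>%
  group_by(EVTYPE) %>%
  summarise(TOTAL   = sum(PROPDMG, CROPDMG),  # both columns, whole group
            PROPDMG = sum(PROPDMG),
            CROPDMG = sum(CROPDMG)) %>%
  slice_max(n = 2, order_by = TOTAL)          # keep the two largest totals
top2
```

Here event type B has the largest total (15), followed by C (7), so A is dropped.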
To visualise the data we plotted both newly created summaries, TotalInjury and TotalDamage, as stacked bar plots using ggplot2.
First we looked at the impact on health.
ggplot(TotalInjury, aes(reorder(EVTYPE, -TOTAL), COUNT, fill = CATEGORY)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ylab("Casualties") +
  xlab("Event Type") +
  scale_fill_discrete(labels = c("Fatalities", "Injuries")) +
  labs(title = "Most dangerous weather events to human health")
The graph shows the number of casualties, split into fatalities and injuries, by event type. Tornados were by far the most dangerous event type between 1950 and 2011 with 96,979 injuries and deaths, followed by excessive heat (8,428) and thunderstorm winds (7,461). Injuries outnumbered fatalities for all ten event types.
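The plotting calls order the bars with `reorder()`. As a minimal sketch of how that ordering works, the toy example below (invented values, hypothetical names `ev`, `count` and `f`) shows that `reorder()` sorts the factor levels by the mean of the second argument within each level, in increasing order, so negating the values puts the largest group first.

```r
# Toy long-format data: two rows per event type
ev    <- c("A", "A", "B", "B")
count <- c(1, 2, 10, 20)

# Levels are sorted by increasing mean of -count, i.e. largest counts first
f <- reorder(ev, -count)
levels(f)
```

Because each event type has exactly two rows in the long tables, ordering by the mean of the counts gives the same order as ordering by their sum.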
Next we looked at the monetary damage.
ggplot(TotalDamage, aes(reorder(EVTYPE, -COUNT), COUNT, fill = CATEGORY)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ylab("Damage (Dollars)") +
  xlab("Event Type") +
  scale_fill_discrete(labels = c("Crop Damage", "Property Damage")) +
  labs(title = "Most costly weather events")
The graph shows that floods were the leading cause of economic damage from 1950 to 2011 with $150,319,678,257 (about $150 billion). They were followed by hurricanes/typhoons and tornados.
In this analysis we showed that, historically, tornados have been the most dangerous weather events to human life. They are also a major cause of economic damage, though they fall behind floods and hurricanes/typhoons. These results could help officials judge the severity of weather events appropriately.