The U.S. National Oceanic and Atmospheric Administration (NOAA) maintains a database that tracks severe weather events in the U.S., including where and when each event occurred, as well as estimates of injuries, fatalities and property damage.
The database used for this analysis contains 902,297 records of severe weather events, from 1950 through November 2011. The database is available at the following link:
NOAA’s documentation provides guidelines for entering event types and estimating property damage. It defines 48 Storm Data events and is available at the following link:
Useful information regarding how the data is collected and published is available in the following document:
The analysis addresses two questions about these events: which types are most harmful to the health of the population, and which have the greatest economic consequences.
library(ggplot2)
library(stringr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
my_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(my_url, destfile="StormData.csv.bz2")
my_df <- read.csv("StormData.csv.bz2")
Events are entered by different users, so the recorded event type (EVTYPE) often differs from the types defined by NOAA. For instance, “Thunderstorm Wind” appears as “Thunderstormw”, “TSTM Wind” and “Thundertorm”, to name a few.
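One quick way to see this kind of inconsistency is to count the distinct labels that match a pattern. The sketch below uses base R only and a small made-up vector (`evtype_sample` is hypothetical, not taken from the database); on the real data the same idea would be applied to the EVTYPE column.

```r
# Hypothetical sample of raw labels; the real values would come from my_df$EVTYPE.
evtype_sample <- c("Thunderstorm Wind", "THUNDERSTORMW", "TSTM WIND",
                   "Thundertorm Wind", "TSTM WIND", "Hail")

# Upper-case first so that case differences do not count as separate types.
labels_upper <- toupper(evtype_sample)

# Keep the labels that look like thunderstorm variants and count each one.
variants <- table(labels_upper[grepl("THUNDER|TSTM", labels_upper)])
print(variants)
```

Each distinct spelling shows up as its own entry, which helps decide which replacements are worth writing.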
To clean the data, specifically EVTYPE, some tidying was done according to the following criteria:
For property damage, the amount in US dollars is the product of the numeric value PROPDMG and the multiplier encoded in PROPDMGEXP, which can be thousands (K), millions (M, m) or billions (B) of dollars. Any other value was converted to a multiplier of 1.
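The arithmetic can be illustrated with a small lookup, shown here as a minimal sketch rather than the code used in this analysis. The helper `exp_to_multiplier` and the sample values are hypothetical; only the K, M/m and B codes and the default of 1 come from the description above.

```r
# Multipliers for the exponent codes named in the text; anything else maps to 1.
# exp_to_multiplier is a hypothetical helper, not part of the original analysis.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, m = 1e6, B = 1e9)
  out <- unname(lookup[as.character(code)])
  out[is.na(out)] <- 1  # unknown or empty codes fall back to 1
  out
}

# Hypothetical records, not taken from the database.
propdmg    <- c(25, 2.5, 1.2, 10)
propdmgexp <- c("K", "M", "B", "")

# Damage in US dollars is the value times its multiplier.
damage_usd <- propdmg * exp_to_multiplier(propdmgexp)
print(damage_usd)
```

A lookup vector keeps the mapping in one place; the pipeline that follows reaches the same multipliers through string replacement.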
ty_df <- my_df %>%
select(EVTYPE, INJURIES, FATALITIES, PROPDMG, PROPDMGEXP) %>%
mutate(EVTYPE=toupper(EVTYPE)) %>%
mutate(EVTYPE=str_replace_all(EVTYPE, c("AVALANCE"="AVALANCHE", "LIGHTNING."="LIGHTNING", "WINDSS"="WIND", "THUNDERSTORMW"="THUNDERSTORM WIND", "THUNDERTORM"="THUNDERSTORM"))) %>%
mutate(EVTYPE=str_replace_all(EVTYPE, c("WINDS"="WIND", "STORMS"="STORM", "FLOODS"="FLOOD", "RAINS"="RAIN", "SLIDES"="SLIDE", "THUNDERSTORMS"="THUNDERSTORM"))) %>%
mutate(EVTYPE=str_replace_all(EVTYPE, c("FLOODING"="FLOOD", "TSTM"="THUNDERSTORM"))) %>%
mutate(EVTYPE=str_replace_all(EVTYPE, c("G40"="", "G52"="", "13"="", "G35"="", "G45"="", "F2"="", "F3"=""))) %>%
mutate(EVTYPE=str_replace_all(EVTYPE, c("[[(]][[)]]"="", "[[ ]]$"="", "^[[ ]]"="", "[[ ]][[ ]]"=""))) %>%
mutate(EVTYPE=replace(EVTYPE, str_detect(EVTYPE, "^HURRICANE"), "HURRICANE (TYPHOON)")) %>%
mutate(EVTYPE=replace(EVTYPE, str_detect(EVTYPE, "^TYPHOON"), "HURRICANE (TYPHOON)")) %>%
mutate(EVTYPE=replace(EVTYPE, str_detect(EVTYPE, "FLOOD$"), "FLOOD")) %>%
mutate(PROPDMGEXP=str_replace_all(PROPDMGEXP, "[012345678Hh]", "1")) %>%
mutate(PROPDMGEXP=str_replace_all(PROPDMGEXP, c("[[+]]"="1", "[[-]]$"="1", "[[?]]$"="1", "^$"="1"))) %>%
mutate(PROPDMGEXP=str_replace_all(PROPDMGEXP, c("K"="1000", "M"="1000000", "B"="1000000000", "m"="1000000"))) %>%
mutate(PROPDMGEXP=as.numeric(PROPDMGEXP))
hh_df <- ty_df %>%
select(EVTYPE, INJURIES, FATALITIES) %>%
group_by(EVTYPE) %>%
summarize(SUM.INJURIES=sum(INJURIES), SUM.FATALITIES=sum(FATALITIES)) %>%
filter(SUM.INJURIES!=0 | SUM.FATALITIES!=0) %>%
arrange(desc(SUM.INJURIES), desc(SUM.FATALITIES))
head(hh_df)
## # A tibble: 6 x 3
## EVTYPE SUM.INJURIES SUM.FATALITIES
## <chr> <dbl> <dbl>
## 1 TORNADO 91364 5633
## 2 THUNDERSTORM WIND 9390 705
## 3 FLOOD 8599 1523
## 4 EXCESSIVE HEAT 6525 1903
## 5 LIGHTNING 5230 817
## 6 HEAT 2100 937
# Rank by injuries explicitly; without wt, top_n() selects by the last column (SUM.FATALITIES).
plot1 <- ggplot(top_n(hh_df, 10, SUM.INJURIES), aes(x=reorder(EVTYPE, SUM.INJURIES), y=SUM.INJURIES)) +
geom_bar(fill="blue", stat="identity") +
coord_flip() +
labs(title="Number of Injuries by Severe Weather Events") +
labs(subtitle="Top 10 Events") +
labs(x="Severe Weather Event", y="Injuries")
print(plot1)
plot2 <- ggplot(top_n(hh_df, 10), aes(x=reorder(EVTYPE, SUM.FATALITIES), y=SUM.FATALITIES)) +
geom_bar(fill="red", stat="identity") +
coord_flip() +
labs(title="Number of Fatalities by Severe Weather Events") +
labs(subtitle="Top 10 Events") +
labs(x="Severe Weather Event", y="Fatalities")
## Selecting by SUM.FATALITIES
print(plot2)
Tornado is the most harmful event for both injuries (91,364) and fatalities (5,633). Thunderstorm Wind (9,390) and Flood (8,599) also account for a substantial number of injuries, though far fewer than Tornado. Excessive Heat (1,903) and Flood (1,523), in turn, account for a substantial number of fatalities.
pd_df <- ty_df %>%
select(EVTYPE, PROPDMG, PROPDMGEXP) %>%
mutate(PROPDMGUSD=PROPDMG*PROPDMGEXP/10^9) %>%
group_by(EVTYPE) %>%
summarize(PROPDMG.BUSD=sum(PROPDMGUSD)) %>%
arrange(desc(PROPDMG.BUSD))
head(pd_df)
## # A tibble: 6 x 2
## EVTYPE PROPDMG.BUSD
## <chr> <dbl>
## 1 FLOOD 167
## 2 HURRICANE (TYPHOON) 85.4
## 3 TORNADO 56.9
## 4 STORM SURGE 43.3
## 5 HAIL 15.7
## 6 THUNDERSTORM WIND 9.72
plot3 <- ggplot(top_n(pd_df, 10), aes(x=reorder(EVTYPE, PROPDMG.BUSD), y=PROPDMG.BUSD)) +
geom_bar(fill="green", stat="identity") +
coord_flip() +
labs(title="Estimated Property Damage by Severe Weather Events") +
labs(subtitle="Top 10 Events") +
labs(x="Severe Weather Event", y="Property Damages (Billions USD)")
## Selecting by PROPDMG.BUSD
print(plot3)
Regarding the economic impact of severe weather events, Flood is the most costly in terms of property damage, with an estimated 167 billion USD, followed by Hurricane (Typhoon) with 85.4 billion and Tornado with 56.9 billion.