Weather events such as lightning, hurricanes, floods, and tornadoes can endanger life and property. This paper presents an easily reproducible analysis that answers the following questions:
1. Across the United States, which types of events are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?
The data used for this analysis were obtained from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database.
In this paper I conclude that the event type most harmful to population health in the United States is Tornado, and the event type with the greatest economic consequences is Flood.
library(readr, quietly = TRUE, warn.conflicts = FALSE)
library(dplyr, quietly = TRUE, warn.conflicts = FALSE)
library(tidyr, quietly = TRUE, warn.conflicts = FALSE)
library(ggplot2, quietly = TRUE, warn.conflicts = FALSE)
sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18362)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United Kingdom.1252
## [2] LC_CTYPE=English_United Kingdom.1252
## [3] LC_MONETARY=English_United Kingdom.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United Kingdom.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggplot2_3.2.1 tidyr_1.0.0 dplyr_0.8.3 readr_1.3.1
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.2 knitr_1.25 magrittr_1.5 hms_0.5.2
## [5] munsell_0.5.0 tidyselect_0.2.5 colorspace_1.4-1 R6_2.4.0
## [9] rlang_0.4.1 stringr_1.4.0 tools_3.6.1 grid_3.6.1
## [13] gtable_0.3.0 xfun_0.10 withr_2.1.2 htmltools_0.4.0
## [17] lazyeval_0.2.2 yaml_2.2.0 digest_0.6.22 assertthat_0.2.1
## [21] lifecycle_0.1.0 tibble_2.1.3 crayon_1.3.4 purrr_0.3.3
## [25] vctrs_0.2.0 zeallot_0.1.0 glue_1.3.1 evaluate_0.14
## [29] rmarkdown_1.16 stringi_1.4.3 compiler_3.6.1 pillar_1.4.2
## [33] scales_1.0.0 backports_1.1.5 pkgconfig_2.0.3
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. STORM DATASET
There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.
1. National Weather Service Storm Data Documentation.
2. National Climatic Data Center Storm Events FAQ.
dataset.url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
dataset.dir <- "data"
dataset.name <- "repdata_data_StormData.csv.bz2"
dataset.destfile <- file.path(dataset.dir, dataset.name)
if (!dir.exists(dataset.dir)) {
  dir.create(path = dataset.dir)
}
if (!file.exists(dataset.destfile)) {
  # request a binary transfer explicitly for the compressed file
  download.file(url = dataset.url,
                destfile = dataset.destfile,
                mode = "wb")
}
storm_df <- read_csv(dataset.destfile)
# keep only the variables needed for the analysis
storm_df <- storm_df %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
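read_csv() infers each column's type from an initial sample of rows (the guess_max argument, 1000 by default), so a column that is blank at the top of a file can be assigned an unintended type. The sketch below shows one way to guard against that by declaring the exponent column as text. The toy data frame, toy_path, and demo are invented for illustration and are not part of the analysis.

```r
library(readr)

# Toy file (illustrative only): the exponent column is blank for the
# first 1,200 rows and only carries codes at the very end.
toy <- data.frame(
  CROPDMG    = c(rep(0, 1200), 5, 2.5),
  CROPDMGEXP = c(rep(NA, 1200), "K", "M"),
  stringsAsFactors = FALSE
)
toy_path <- tempfile(fileext = ".csv")
write.csv(toy, toy_path, row.names = FALSE, na = "")

# Declare the column types instead of relying on type guessing.
demo <- read_csv(
  toy_path,
  col_types = cols(
    CROPDMG    = col_double(),
    CROPDMGEXP = col_character()
  )
)

# The late codes are kept as text rather than being lost to a wrong guess.
tail(demo$CROPDMGEXP, 2)
```

The same col_types argument could be passed when reading the storm file, naming PROPDMGEXP and CROPDMGEXP as character columns; whether that changes any result depends on how the columns were guessed in the original run.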
The event type variable (EVTYPE) contains input errors arising from misspellings, inconsistent case, and similar issues (e.g. “THUNDERSTORM WIND”, “THUNDERSTORM WINDS”, “TSTM”).
uniqueEvent <- length(unique(storm_df$EVTYPE))
There are currently 977 unique entries in the EVTYPE variable, whereas Sections 7.1 - 7.48 of the STORM DATASET documentation define 48 main event types.
storm_df$EVTYPE <- gsub("^(Astronomical Low Tide).*", "Astronomical Low Tide", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Avalanche).*", "Avalanche", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Blizzard).*", "Blizzard", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Coastal Flood).*", "Coastal Flood", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(CoastalFlood).*", "Coastal Flood", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Cold/Wind Chill).*", "Cold/Wind Chill", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Cold).*", "Cold/Wind Chill", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Wind Chill).*", "Cold/Wind Chill", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Debris Flow).*", "Debris Flow", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Dense Fog).*", "Dense Fog", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Fog).*", "Dense Fog", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Dense Smoke).*", "Dense Smoke", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Drought).*", "Drought", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Dust Devil).*", "Dust Devil", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Dust Storm).*", "Dust Storm", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Excessive Heat).*", "Excessive Heat", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Extreme Cold/Wind Chill).*", "Extreme Cold/Wind Chill", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Flash Flood).*", "Flash Flood", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Flood).*", "Flood", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Freezing Fog).*", "Freezing Fog", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Frost/Freeze).*", "Frost/Freeze", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Frost).*", "Frost/Freeze", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Freeze).*", "Frost/Freeze", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Funnel Cloud).*", "Funnel Cloud", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Hail).*", "Hail", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Heat).*", "Heat", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Heavy Rain).*", "Heavy Rain", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Heavy Snow).*", "Heavy Snow", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(High Surf).*", "High Surf", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(High Wind).*", "High Wind", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Hurricane).*", "Hurricane/Typhoon", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Typhoon).*", "Hurricane/Typhoon", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Ice Storm).*", "Ice Storm", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Lakeshore Flood).*", "Lakeshore Flood", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Lake-Effect Snow).*", "Lake-Effect Snow", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Lightning).*", "Lightning", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Marine Hail).*", "Marine Hail", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Marine High Wind).*", "Marine High Wind", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Marine Strong Wind).*", "Marine Strong Wind", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Marine Thunderstorm Wind).*", "Marine Thunderstorm Wind", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(MARINE TSTM WIND).*", "Marine Thunderstorm Wind", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Rip Current).*", "Rip Current", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Seiche).*", "Seiche", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Sleet).*", "Sleet", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Storm Tide).*", "Storm Tide", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Strong Wind).*", "Strong Wind", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Thunderstorm Wind).*", "Thunderstorm Wind", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(TSTM).*", "Thunderstorm Wind",x = storm_df$EVTYPE, ignore.case = TRUE) #926
storm_df$EVTYPE <- gsub("^(THUNDER*.STORM).*", "Thunderstorm Wind",x = storm_df$EVTYPE, ignore.case = TRUE) #853
storm_df$EVTYPE <- gsub("^(Tornado).*", "Tornado", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Tropical Depression).*", "Tropical Depression", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Tropical Storm).*", "Tropical Storm", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Tsunami).*", "Tsunami", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Volcanic Ash).*", "Volcanic Ash", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Waterspout).*", "Waterspout", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Wildfire).*", "Wildfire", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Winter Storm).*", "Winter Storm", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Winter Weather).*", "Winter Weather", x = storm_df$EVTYPE, ignore.case = TRUE)
# drop rows whose EVTYPE is a summary entry rather than an event
storm_df <- storm_df[!grepl(pattern = "^(Summary).*", x = storm_df$EVTYPE,
                            ignore.case = TRUE), ]
# update the number of unique events
uniqueEvent <- length(unique(storm_df$EVTYPE))
After the above corrections, there are now 544 unique event types.
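The long run of gsub() calls follows one pattern: match a prefix at the start of the string, ignoring case, and replace the whole entry with a canonical label. A minimal sketch of the same idea driven by a lookup table is shown below. The names canonical and normalise_evtype are hypothetical, only a handful of the 48 event types are listed, and the result has not been checked against the 544 count reported above.

```r
# Hypothetical lookup: prefix (treated as a regular expression) -> canonical label.
# Only a few entries are shown; a full table would cover all documented event types.
canonical <- c(
  "tstm"              = "Thunderstorm Wind",
  "thunderstorm wind" = "Thunderstorm Wind",
  "flash flood"       = "Flash Flood",
  "flood"             = "Flood",
  "hurricane"         = "Hurricane/Typhoon",
  "typhoon"           = "Hurricane/Typhoon"
)

normalise_evtype <- function(x, lookup = canonical) {
  for (prefix in names(lookup)) {
    # entries that start with the prefix, regardless of case
    hit <- grepl(paste0("^", prefix), x, ignore.case = TRUE)
    x[hit] <- lookup[[prefix]]
  }
  x
}

events <- c("TSTM WIND", "Thunderstorm Winds", "FLASH FLOODING",
            "FLOOD/RAIN", "Hurricane Opal", "HAIL")
cleaned <- normalise_evtype(events)
cleaned
```

Entries that match no prefix, such as "HAIL" here, pass through unchanged, so the table can be extended one event type at a time.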
A sample of the dataset, showing the top 10 EVTYPE values by frequency (n) in storm_df:
head(x = select(storm_df, EVTYPE) %>%
group_by(EVTYPE) %>%
summarise(n = n()) %>%
arrange(desc(n)),
n = 10)
## # A tibble: 10 x 2
## EVTYPE n
## <chr> <int>
## 1 Thunderstorm Wind 324765
## 2 Hail 288776
## 3 Tornado 60686
## 4 Flash Flood 55038
## 5 Flood 26098
## 6 High Wind 21801
## 7 Heavy Snow 15792
## 8 Lightning 15766
## 9 Marine Thunderstorm Wind 11987
## 10 Heavy Rain 11801
population.health <- select(storm_df, Event.Type = EVTYPE, FATALITIES, INJURIES) %>%
mutate(Event.Type = as.factor(Event.Type)) %>%
group_by(Event.Type) %>%
summarise(Fatalities = sum(FATALITIES),
Injuries = sum(INJURIES),
Total = sum(FATALITIES, INJURIES)) %>%
arrange(desc(Total))
head(population.health, 10)
## # A tibble: 10 x 4
## Event.Type Fatalities Injuries Total
## <fct> <dbl> <dbl> <dbl>
## 1 Tornado 5658 91364 97022
## 2 Thunderstorm Wind 710 9508 10218
## 3 Excessive Heat 1903 6525 8428
## 4 Flood 495 6806 7301
## 5 Lightning 817 5232 6049
## 6 Heat 1118 2494 3612
## 7 Flash Flood 1018 1785 2803
## 8 Ice Storm 89 1977 2066
## 9 High Wind 293 1471 1764
## 10 Winter Storm 217 1353 1570
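The group-and-sum logic behind this table can be checked by hand on a few rows. The sketch below uses base R only, so it runs without dplyr; the toy values are invented for illustration and do not come from the storm data.

```r
# Invented rows, for illustration only.
toy <- data.frame(
  EVTYPE     = c("Tornado", "Tornado", "Flood", "Heat", "Flood"),
  FATALITIES = c(2, 1, 0, 4, 1),
  INJURIES   = c(10, 5, 3, 0, 2)
)

# Sum fatalities and injuries within each event type.
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Total casualties per event type, then rank from highest to lowest.
health$Total <- health$FATALITIES + health$INJURIES
health <- health[order(-health$Total), ]
health
```

Ranking by Total mixes fatalities and injuries with equal weight, which is the choice made in the table above; ranking by FATALITIES alone could order the event types differently.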
# subset the first 5 event types, ranked by the Total variable
pivot_longer(data = population.health[1:5, 1:3], Fatalities:Injuries, names_to = "Type", values_to = "Total") %>%
  # plot
  ggplot() +
  aes(x = reorder(Event.Type, +Total), y = Total, fill = Type) +
  scale_fill_manual(values = c("#e41a11", "#fa975080")) +
  labs(title = "Graph Showing Number of Casualties of Weather Events in U.S.",
       subtitle = "With Distinction of Whether Event Led to Fatality or Injury.") +
  theme(plot.title = element_text(lineheight = 0.8, face = "bold")) +
  xlab("Event Type") +
  ylab("Total Casualties") +
  # geom_col() draws bars whose heights are the values in the data
  geom_col(alpha = 1) +
  coord_flip()
economic_effect <- select(storm_df, Event.Type = EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
  mutate(PROPDMGEXP = as.character(PROPDMGEXP), CROPDMGEXP = as.character(CROPDMGEXP))
# convert the exponent codes to numeric multipliers; unknown or missing codes become 1
economic_effect$PROPDMGEXP <- recode(economic_effect$PROPDMGEXP, "-" = 1, "?" = 1, "+" = 1, "0" = 1,
  "1" = 10, "2" = 10^2, "3" = 10^3, "4" = 10^4, "5" = 10^5, "6" = 10^6, "7" = 10^7, "8" = 10^8, "9" = 10^9,
  "h" = 10^2, "H" = 10^2, "k" = 10^3, "K" = 10^3, "m" = 10^6, "M" = 10^6, "b" = 10^9, "B" = 10^9, .default = 1, .missing = 1)
# every crop exponent is mapped to 1, so CROPDMG is left unscaled
economic_effect$CROPDMGEXP <- recode(economic_effect$CROPDMGEXP, "FALSE" = 1, .default = 1, .missing = 1)
economic_effect <- mutate(economic_effect,
Event.Type = Event.Type,
Property = PROPDMG * PROPDMGEXP,
Crop = CROPDMG * CROPDMGEXP,
Total.Damages = Property + Crop) %>%
mutate(Event.Type = as.factor(Event.Type)) %>%
group_by(Event.Type) %>%
summarise(Property = sum(Property),
Crop = sum(Crop),
Total.Damages = sum(Total.Damages)) %>%
arrange(desc(Total.Damages))
head(economic_effect, 10)
## # A tibble: 10 x 4
## Event.Type Property Crop Total.Damages
## <fct> <dbl> <dbl> <dbl>
## 1 Flood 144958136816 172989. 144958309805.
## 2 Hurricane/Typhoon 85356410010 11638. 85356421648.
## 3 Tornado 58552151876. 100029. 58552251906.
## 4 STORM SURGE 43323536000 5 43323536005
## 5 Flash Flood 17414731089. 185057. 17414916145.
## 6 Hail 15977470013. 579736. 15978049749.
## 7 Thunderstorm Wind 9970812300. 199294. 9971011594.
## 8 Tropical Storm 7714390550 6465. 7714397015.
## 9 Winter Storm 6748997251 2484. 6748999735.
## 10 High Wind 6003353043 21058. 6003374101.
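The exponent columns encode the magnitude of each damage figure: h/H for hundreds, k/K for thousands, m/M for millions, b/B for billions, and a digit for a power of ten, which is the mapping the PROPDMGEXP recode applies. The CROPDMGEXP recode maps every value to 1, and its "FALSE" key suggests the column was parsed as logical, so the Crop totals above are probably unscaled CROPDMG sums. The sketch below is one possible base R helper that applies a single mapping to either column. The name exp_multiplier is hypothetical, and using it on a CROPDMGEXP column read as text would change the Crop figures reported here.

```r
# Hypothetical helper: turn exponent codes into numeric multipliers.
# Mirrors the PROPDMGEXP mapping used in this analysis:
#   h/H = 1e2, k/K = 1e3, m/M = 1e6, b/B = 1e9, digit d = 10^d,
#   anything else (including NA, "", "-", "?", "+", "0") = 1.
exp_multiplier <- function(code) {
  code <- toupper(as.character(code))
  letter_map <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

  out <- rep(1, length(code))            # default multiplier

  is_letter <- code %in% names(letter_map)
  out[is_letter] <- letter_map[code[is_letter]]

  is_digit <- !is.na(code) & grepl("^[1-9]$", code)
  out[is_digit] <- 10^as.numeric(code[is_digit])

  out
}

codes <- c("K", "m", "B", "5", "?", NA)
multipliers <- exp_multiplier(codes)

# Example: 2.5 with exponent "M" is 2.5 million.
damage <- 2.5 * exp_multiplier("M")
```

Keeping the mapping in one function means property and crop damage are scaled by the same rules, which makes the two columns directly comparable.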
# subset the first 5 event types, ranked by Total.Damages
pivot_longer(data = economic_effect[1:5, 1:3], Property:Crop, names_to = "Damage", values_to = "Total") %>%
  # plot
  ggplot() +
  aes(x = reorder(Event.Type, +Total), y = Total, fill = Damage) +
  scale_fill_manual(values = c("#039e16", "#cf790b")) +
  theme(plot.title = element_text(lineheight = 0.8, face = "bold")) +
  xlab("Event Type") +
  ylab("Total Value ($)") +
  labs(title = "Graph Showing Economic Effect of Weather Events in U.S.",
       subtitle = "Focusing on the Total Property and Crop Damage Value.") +
  facet_wrap(. ~ Damage, scales = "free", ncol = 1) +
  # geom_col() draws bars whose heights are the values in the data
  geom_col(alpha = 1) +
  coord_flip()
Tornado is the event type most harmful to population health. Flood is the most hazardous event type for the economy in general.