This document addresses the analysis of the storm database made available by the U.S. National Oceanic and Atmospheric Administration (NOAA). The database comprises storm event data from 1950 to 2011, and the goal is a simple analysis of the damage storms cause to the population. The analysis gives an insight into the harm suffered by the population (injuries and deaths) and the economic losses derived from crop and property damage. The document covers loading the data from the available source and the transformations required to reach conclusions.
The analysis was carried out in R; the session information is as follows:
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19042)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] scales_1.1.1 forcats_0.5.1 ggplot2_3.3.5 dplyr_1.0.7
##
## loaded via a namespace (and not attached):
## [1] pillar_1.6.2 bslib_0.3.1 compiler_4.1.1 jquerylib_0.1.4
## [5] tools_4.1.1 digest_0.6.27 jsonlite_1.7.2 evaluate_0.14
## [9] lifecycle_1.0.0 tibble_3.1.3 gtable_0.3.0 pkgconfig_2.0.3
## [13] rlang_0.4.11 DBI_1.1.1 yaml_2.2.1 xfun_0.25
## [17] fastmap_1.1.0 withr_2.4.2 stringr_1.4.0 knitr_1.33
## [21] generics_0.1.0 vctrs_0.3.8 sass_0.4.0 grid_4.1.1
## [25] tidyselect_1.1.1 glue_1.4.2 R6_2.5.0 fansi_0.5.0
## [29] rmarkdown_2.10 purrr_0.3.4 magrittr_2.0.1 ellipsis_0.3.2
## [33] htmltools_0.5.2 assertthat_0.2.1 colorspace_2.0-2 utf8_1.2.2
## [37] stringi_1.7.3 munsell_0.5.0 crayon_1.4.1
The data is downloaded from the provided link and read into the data frame StormData:
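library(dplyr)    # group_by, summarise, arrange, mutate, %>%
library(ggplot2)  # plotting
library(forcats)  # fct_reorder
# Download the compressed file only if it is not already present
if (!file.exists("StormData.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                "StormData.bz2")
}
# read.csv reads the bz2-compressed file directly; skip if already loaded
if (!exists("StormData")) {
  StormData <- read.csv("StormData.bz2")
}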
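As a quick sanity check after loading, the number of rows and the number of distinct event types can be counted. The sketch below is only an illustration on a small made-up data frame: `toy` and its values are assumptions, not NOAA records. On the real data the same two calls would be applied to StormData.

```r
# Toy stand-in for StormData (hypothetical values, for illustration only)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(3, 0, 1, 0, 2),
  INJURIES   = c(10, 1, 4, 0, 0),
  stringsAsFactors = FALSE
)

n_events <- nrow(toy)                   # one row per recorded event
n_types  <- length(unique(toy$EVTYPE))  # distinct event type labels

print(c(events = n_events, types = n_types))
```

Only base R is used here, so the check does not depend on any attached package.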
The data set records 902297 events across 985 distinct event types, which is a large number of categories. The data is processed by totaling the fatalities and injuries for every event (storm) type:
StormAnalysis <- StormData %>%
  group_by(EVTYPE) %>%
  summarise(Fatalities = sum(FATALITIES), Injuries = sum(INJURIES))
This makes it possible to analyse both the fatality and injury counts for every recorded storm classification. Because the number of storm types is very large, only the top 10 event types are compared.
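The aggregate-then-rank idea can be sketched on a small made-up data frame. This is a base R illustration rather than the dplyr pipeline used in this document: `toy`, its values, and the choice of the top 2 instead of the top 10 are all assumptions made to keep the example short.

```r
# Toy stand-in for StormData (hypothetical values, for illustration only)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(3, 0, 1, 0, 2),
  INJURIES   = c(10, 1, 4, 0, 0),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries within each event type
totals <- aggregate(toy[c("FATALITIES", "INJURIES")],
                    by = list(EVTYPE = toy$EVTYPE), FUN = sum)

# Sort by total fatalities, largest first, and keep the first n rows
n   <- 2
top <- head(totals[order(-totals$FATALITIES), ], n)
print(top)
```

Ranking on the per-type totals rather than on individual events is what allows a few event types to stand in for the whole list.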
Starting with the fatalities, we process the data and obtain the following plot:
# Rank event types by total fatalities and keep the top 10
StormFatalities <- StormAnalysis %>% arrange(desc(Fatalities))
# Fatalities stays numeric so that bar lengths are proportional to the counts
TopStormFatalities <- StormFatalities[1:10, ]
TopStormFatalities %>%
  mutate(EVTYPE = fct_reorder(EVTYPE, desc(Fatalities))) %>%  # order bars by count
  ggplot(aes(x = EVTYPE, y = Fatalities)) +
  geom_bar(stat = "identity", fill = "orange", width = .6) +
  coord_flip() +
  theme_bw() +
  xlab("")
Tornadoes are clearly the main cause of fatalities in the US over the 1950 to 2011 period, totaling 5633 deaths. That may not be a surprise, but the second cause is: Excessive Heat, with 1903 deaths.
Moving on to injuries, again for the top 10 event types, we get:
# Rank event types by total injuries and keep the top 10
StormInjuries <- StormAnalysis %>% arrange(desc(Injuries))
# Injuries stays numeric so that bar lengths are proportional to the counts
TopStormInjuries <- StormInjuries[1:10, ]
TopStormInjuries %>%
  mutate(EVTYPE = fct_reorder(EVTYPE, desc(Injuries))) %>%  # order bars by count
  ggplot(aes(x = EVTYPE, y = Injuries)) +
  geom_bar(stat = "identity", fill = "orange", width = .6) +
  coord_flip() +
  theme_bw() +
  xlab("")
In the injuries analysis, tornadoes again lead, with 91346 injuries, followed by Thunderstorm Winds, with 6957.
The economic consequences are computed from the damage cost data available for each event. Each cost is expressed as a numeric column together with a companion column giving its order of magnitude. Since the records date back to 1950, this magnitude is not encoded consistently: it ranges from a power-of-ten exponent (e.g. 2 meaning 1e+2, or 100) to a letter (e.g. M meaning 1e+6, or one million). A new column is therefore created that converts the values into dollars for both available cost types: property losses and crop losses. The resulting values are divided by 1e+9 so that the analysis is in billions of dollars, and the total cost of each event is calculated:
# Map a damage-exponent code to its numeric multiplier.
# Letters are magnitudes, digits are powers of ten; any other code counts as 1.
EXP <- function(x) {
  if (x == "M" | x == "m") return(1e+6)
  if (x == "k" | x == "K") return(1e+3)
  if (x == "B") return(1e+9)
  if (x == "H" | x == "h") return(1e+2)
  if (x == "0") return(1)
  if (x == "1") return(1e+1)
  if (x == "2") return(1e+2)
  if (x == "3") return(1e+3)
  if (x == "4") return(1e+4)
  if (x == "5") return(1e+5)
  if (x == "6") return(1e+6)
  if (x == "7") return(1e+7)
  if (x == "8") return(1e+8)
  return(1)
}
StormData$PROPDMGTOTAL_BB <- sapply(StormData$PROPDMGEXP, EXP)*StormData$PROPDMG/1e+9
StormData$CROPDMGTOTAL_BB <- sapply(StormData$CROPDMGEXP, EXP)*StormData$CROPDMG/1e+9
StormData$TOTALCOST_BB <- StormData$PROPDMGTOTAL_BB + StormData$CROPDMGTOTAL_BB
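Calling EXP once per row through sapply works, but the same mapping can also be written as a vectorised lookup. The sketch below is one possible alternative, not the method used in this document: the names `exp_lookup` and `to_multiplier` are hypothetical, and the table simply mirrors the rules of EXP, with every unlisted code counting as 1.

```r
# Named lookup vector mirroring the EXP() rules (hypothetical helper names)
exp_lookup <- c(
  h = 1e2, H = 1e2, k = 1e3, K = 1e3, m = 1e6, M = 1e6, B = 1e9,
  setNames(10^(0:8), as.character(0:8))  # "0" -> 1, "1" -> 10, ..., "8" -> 1e8
)

to_multiplier <- function(codes) {
  # as.character() guards against factor input being used as integer positions
  m <- unname(exp_lookup[as.character(codes)])
  m[is.na(m)] <- 1  # blank or unrecognised codes count as 1, as in EXP()
  m
}

# Usage on made-up values: 2.5 with code "M" is 2.5 million dollars
codes  <- c("K", "m", "", "2", "?", "B", "0")
values <- c(2.5, 2.5, 10, 3, 7, 1.2, 40)
damage_bb <- to_multiplier(codes) * values / 1e9  # damage in billions of dollars
print(damage_bb)
```

The lookup processes the whole column in a single indexing operation, which tends to be noticeably faster than a per-row function call on a data set with roughly 900,000 rows.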
The cost is totaled per event type and plotted for the top 10 event types:
StormCostAnalysis <- StormData %>%
  group_by(EVTYPE) %>%
  summarise(EconomicCost_BB = sum(TOTALCOST_BB))
StormCost <- StormCostAnalysis %>% arrange(desc(EconomicCost_BB))
# EconomicCost_BB stays numeric so that bar lengths are proportional to the cost
TopStormCost <- StormCost[1:10, ]
TopStormCost %>%
  mutate(EVTYPE = fct_reorder(EVTYPE, desc(EconomicCost_BB))) %>%  # order bars by cost
  ggplot(aes(x = EVTYPE, y = EconomicCost_BB)) +
  geom_bar(stat = "identity", fill = "green", width = .6) +
  coord_flip() +
  theme_bw() +
  xlab("") +
  ylab("Economic Cost in Billions")
The bar plot shows that the economic impact ranks differently from the population impact. Floods are the event type with the greatest economic impact, followed by Hurricanes/Typhoons. Tornadoes, which have the greatest effect on the population, come third.