The U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database is an online database that stores data on storms in the US. This analysis evaluates the health and economic impacts recorded for all storm events to determine which event types generally cause the most harm.
In terms of health impact, tornado events were the most harmful for both fatalities and injuries, with 96,979 combined fatalities and injuries. Excessive heat events were the second most harmful, with a considerably lower 8,428 combined fatalities and injuries. For economic impact, floods caused the most damage, with a total of $150.32B, followed by hurricane/typhoon events with a total of $71.91B.
The analysis uses the tidyverse and knitr libraries and the NOAA storm data file from the online repository. The file is downloaded if it is not already present and is then read into a data frame called storm.
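csvin <- "Storm_data.csv"
# The download is bzip2-compressed (.csv.bz2); read.csv() decompresses it
# transparently, so it can be saved under a .csv name
if (!file.exists(csvin)) {
  csvurl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  # mode = "wb" stops the binary file being corrupted on Windows
  download.file(url = csvurl, destfile = csvin, mode = "wb")
}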
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(knitr)
storm <- read.csv(file="Storm_data.csv", header=TRUE)
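The downloaded file is bzip2-compressed even though it is saved as Storm_data.csv. The self-contained sketch below uses a small made-up table in a temporary file (not the NOAA data) to show why this still works: read.csv() opens a file path through base R's file() connection, which detects bzip2 compression when a file is opened for reading in text mode.

```r
# Write a tiny bzip2-compressed CSV to a temporary file named *.csv
# (the rows are made up for illustration)
path <- tempfile(fileext = ".csv")
con <- bzfile(path, open = "wt")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,1", "FLOOD,0"), con)
close(con)

# The file starts with the bzip2 magic bytes "BZh", not plain text
magic <- rawToChar(readBin(path, what = "raw", n = 3))

# read.csv() opens the path via file(), which decompresses it on the fly
toy <- read.csv(path, header = TRUE)
str(toy)
```

Naming the local copy Storm_data.csv.bz2 would describe its contents more accurately without changing how read.csv() reads it.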
The original database has 37 variables, so to save memory only the
variables needed for the analysis are selected first: event type,
fatalities, injuries, property damage (and its exponent) and crop damage
(and its exponent). The column names are converted to lower case for
convenience. The EVTYPE variable for event types has mixed case, so it
is converted to upper case; after this there are 898 unique event
types.

The property and crop damage exponents are listed so that a cost_factor
function can be written to turn each exponent code into a power-of-10
multiplier: H, K, M and B (in either case) stand for hundreds,
thousands, millions and billions, a single digit n gives 10^n, and a
blank, "-", "?" or "+" gives a multiplier of 1. Each damage value is
multiplied by its factor to give the prop_tot and crop_tot variables,
and because the amounts are so large they are expressed in billions of
dollars.

The data is then ready to be analysed. First, the pop_affect dataset
sums fatalities and injuries by event type, sorts the event types in
descending order of combined fatalities and injuries, and keeps the top
8. Similarly, the dmgval dataset sums property and crop damage by event
type, sorts them in descending order of combined economic damage, and
keeps the top 8. Both datasets are then reshaped to long format for
plotting.
names(storm)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
storma <- storm %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
  rename_with(tolower)
storma$evtype <- toupper(storma$evtype)
str(storma)
## 'data.frame': 902297 obs. of 7 variables:
## $ evtype : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ fatalities: num 0 0 0 0 0 0 0 0 1 0 ...
## $ injuries : num 15 0 2 2 2 6 1 0 14 0 ...
## $ propdmg : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ propdmgexp: chr "K" "K" "K" "K" ...
## $ cropdmg : num 0 0 0 0 0 0 0 0 0 0 ...
## $ cropdmgexp: chr "" "" "" "" ...
length(unique(storma$evtype))
## [1] 898
unique(storma$propdmgexp)
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
unique(storma$cropdmgexp)
## [1] "" "M" "K" "m" "B" "?" "0" "k" "2"
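Both the event types and the exponent codes come in mixed case ("m"/"M", "k"/"K", "h"/"H"), which is why EVTYPE is upper-cased and why the exponent lookup compares upper-case codes. The toy sketch below uses made-up labels to show how case folding reduces the number of distinct values. The extra trimws() step is only an assumption about further cleaning that this analysis does not perform: labels differing only by surrounding whitespace, if any exist, would still be counted separately after toupper().

```r
# Made-up event labels and exponent codes, not taken from the NOAA file
ev <- c("Tornado", "TORNADO", "tornado", " TORNADO", "Flood", "FLOOD")

n_raw   <- length(unique(ev))                  # 6 distinct raw labels
n_upper <- length(unique(toupper(ev)))         # 3: " TORNADO" still differs
n_trim  <- length(unique(trimws(toupper(ev)))) # 2: optional whitespace cleanup

# Mixed-case exponent codes collapse to one code per unit
codes <- c("k", "K", "m", "M", "h", "H")
sort(unique(toupper(codes)))
```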
# Convert a damage exponent code into a numeric multiplier.
# Placeholder codes are checked before any numeric conversion, so
# as.numeric() only ever sees digits and emits no coercion warnings.
cost_factor <- function(x) {
  x <- toupper(x)
  if (x %in% c("", "-", "?", "+")) return(1)        # no usable exponent
  if (x == "H") return(1e2)                         # hundreds
  if (x == "K") return(1e3)                         # thousands
  if (x == "M") return(1e6)                         # millions
  if (x == "B") return(1e9)                         # billions
  if (grepl("^[0-9]$", x)) return(10^as.numeric(x)) # digit n -> 10^n
  stop("Exponent to be added to list: ", x)
}
storma$prop_fact <- sapply(storma$propdmgexp, FUN = cost_factor, USE.NAMES = FALSE)
storma$crop_fact <- sapply(storma$cropdmgexp, FUN = cost_factor, USE.NAMES = FALSE)
storma$prop_tot <- (storma$propdmg * (storma$prop_fact))/1000000000
storma$crop_tot <- (storma$cropdmg * (storma$crop_fact))/1000000000
names(storma)
## [1] "evtype" "fatalities" "injuries" "propdmg" "propdmgexp"
## [6] "cropdmg" "cropdmgexp" "prop_fact" "crop_fact" "prop_tot"
## [11] "crop_tot"
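The lookup and the scaling to billions can also be written in vectorised form. The sketch below is a hypothetical cost_factor_vec(), not part of the original analysis; it assumes the same code meanings as cost_factor() and dplyr 1.1.0 or later (for the .default argument of case_when()). The worked example uses made-up values: 25 with "K" is $25,000, or 2.5e-5 billion.

```r
library(dplyr)

# Hypothetical vectorised equivalent of cost_factor(), assuming the same
# code meanings; unknown codes become NA instead of stopping with an error
cost_factor_vec <- function(x) {
  x <- toupper(x)
  case_when(
    x %in% c("", "-", "?", "+") ~ 1,
    x == "H" ~ 1e2,
    x == "K" ~ 1e3,
    x == "M" ~ 1e6,
    x == "B" ~ 1e9,
    # case_when() evaluates every right-hand side on the whole vector,
    # so silence the coercion warnings produced for non-digit codes
    grepl("^[0-9]$", x) ~ 10^suppressWarnings(as.numeric(x)),
    .default = NA_real_
  )
}

# Worked example with made-up damage values: value * factor / 1e9 = billions
toy <- data.frame(dmg = c(25, 2.5, 3), exp_code = c("K", "b", "5"))
toy$fact   <- cost_factor_vec(toy$exp_code)
toy$tot_bn <- toy$dmg * toy$fact / 1e9
toy
```

Because unknown codes become NA rather than raising an error, a check such as `stopifnot(!anyNA(cost_factor_vec(storma$propdmgexp)))` would be needed to keep the safety net that stop() provides in cost_factor().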
pop_affect <- storma %>%
  filter(fatalities > 0 | injuries > 0) %>%
  group_by(evtype) %>%
  summarise(totfat = sum(fatalities, na.rm = TRUE),
            totinj = sum(injuries, na.rm = TRUE),
            totboth = sum(injuries + fatalities, na.rm = TRUE)) %>%
  arrange(desc(totboth), evtype) %>%
  slice_head(n = 8) %>%
  pivot_longer(totfat:totinj, names_to = "Type")
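To make the shape of pop_affect concrete, the sketch below runs the same filter, group, summarise, rank, slice and pivot steps on a handful of made-up rows, keeping the top 2 instead of the top 8. After pivot_longer() each event type appears twice, once per impact Type, with totboth repeated on both rows; this long format is what lets ggplot() draw fatalities and injuries as side-by-side bars.

```r
library(dplyr)
library(tidyr)

# Made-up rows standing in for storma
toy <- data.frame(
  evtype     = c("TORNADO", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  fatalities = c(2, 0, 1, 0, 0),
  injuries   = c(5, 3, 4, 1, 0)
)

top <- toy %>%
  filter(fatalities > 0 | injuries > 0) %>%      # the HAIL row is dropped
  group_by(evtype) %>%
  summarise(totfat = sum(fatalities, na.rm = TRUE),
            totinj = sum(injuries, na.rm = TRUE),
            totboth = sum(injuries + fatalities, na.rm = TRUE)) %>%
  arrange(desc(totboth), evtype) %>%             # TORNADO 10, HEAT 5, FLOOD 1
  slice_head(n = 2) %>%
  pivot_longer(totfat:totinj, names_to = "Type") # one row per event and Type

top
```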
dmgval <- storma %>%
  filter(prop_tot > 0 | crop_tot > 0) %>%
  group_by(evtype) %>%
  summarise(prop_dmg_val = sum(prop_tot, na.rm = TRUE),
            crop_dmg_val = sum(crop_tot, na.rm = TRUE),
            bothdmgv = sum(crop_tot + prop_tot, na.rm = TRUE)) %>%
  arrange(desc(bothdmgv), evtype) %>%
  slice_head(n = 8) %>%
  pivot_longer(prop_dmg_val:crop_dmg_val, names_to = "Type")
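The pop_affect and dmgval pipelines differ only in which two columns are summed. One possible refactor, sketched below with a hypothetical helper top_impacts() and made-up rows, passes the column names as a character vector and selects them with all_of(); it assumes dplyr 1.1.0 or later for pick(). It names the combined column total rather than totboth or bothdmgv, and it omits the initial filter(): rows with zero impact add nothing to the sums, so the ranking only changes if fewer than n event types have any impact at all.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: sum the given impact columns by event type, rank by
# their combined total and return the top n event types in long format
top_impacts <- function(data, cols, n = 8) {
  data %>%
    group_by(evtype) %>%
    summarise(across(all_of(cols), ~ sum(.x, na.rm = TRUE))) %>%
    mutate(total = rowSums(pick(all_of(cols)))) %>%
    arrange(desc(total), evtype) %>%
    slice_head(n = n) %>%
    pivot_longer(all_of(cols), names_to = "Type")
}

# Made-up damage rows in billions of dollars
toy <- data.frame(
  evtype   = c("FLOOD", "FLOOD", "HURRICANE", "HAIL"),
  prop_tot = c(1.5, 0.5, 1.0, 0.1),
  crop_tot = c(0.2, 0.0, 0.3, 0.4)
)

res <- top_impacts(toy, c("prop_tot", "crop_tot"), n = 2)
res
```

With this helper, the two datasets would correspond to `top_impacts(storma, c("fatalities", "injuries"))` and `top_impacts(storma, c("prop_tot", "crop_tot"))`, apart from the column naming noted above.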
Plot the top 8 event types by health impact.
ggplot(data = pop_affect, aes(x = reorder(evtype, -value), y = value, fill = Type)) +
  geom_col(position = "dodge") +
  labs(x = "Event Type", y = "Number of people") +
  ggtitle("Total fatalities and injuries for the 8 most harmful event types") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 20, vjust = 0.7)) +
  # Named values keep the original colours; labels give readable legend text
  scale_fill_manual(values = c(totfat = "red", totinj = "yellow"),
                    breaks = c("totfat", "totinj"),
                    labels = c("Fatalities", "Injuries"))
The plot clearly shows that tornadoes are the event type with the most impact on people's health.
Plot the top 8 event types by economic impact.
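ggplot(data = dmgval, aes(x = reorder(evtype, -value), y = value, fill = Type)) +
  geom_col(position = "dodge") +
  labs(x = "Event Type", y = "Cost of damage (billions of $)") +
  ggtitle("Event types that caused the most economic damage") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 20, vjust = 0.7)) +
  # Named values keep the original colours; labels give readable legend text
  scale_fill_manual(values = c(crop_dmg_val = "blue", prop_dmg_val = "green"),
                    breaks = c("prop_dmg_val", "crop_dmg_val"),
                    labels = c("Property", "Crop"))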
The plot clearly shows that floods are the event type with the most economic damage.