The U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database is an online database that stores data on storms in the US. This analysis evaluates the health and economic impacts recorded for all storm events to determine which event types generally cause the most harm.

Synopsis

In terms of health impact, tornadoes were the most harmful event type for both fatalities and injuries, with 96979 combined fatalities and injuries. The second most harmful were excessive heat events, with a considerably lower 8428 combined fatalities and injuries. For economic impact, floods caused the most damage, with a total of $150.32B, followed by hurricane/typhoon events with a total of $71.91B.

Setup

The analysis uses the tidyverse and knitr libraries and the NOAA storm database, which is downloaded from the online repository if it is not already present locally. The data is then read into a data frame called storm.
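The downloaded file is bzip2-compressed even though it is saved under a .csv name. read.csv() can still read it because base R's file connections detect gzip, bzip2 and xz compression when a file is opened for reading. The sketch below illustrates this on a tiny made-up file in a temporary directory (the name demo_file and its contents are hypothetical, not the NOAA data):

```r
# Sketch: read.csv() transparently decompresses a bzip2-compressed file,
# which is why a .bz2 download saved as "Storm_data.csv" can be read directly.
# The rows written here are made up purely for illustration.
demo_file <- tempfile(fileext = ".csv")

# Write a tiny CSV through a bzip2-compressing connection
con <- bzfile(demo_file, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,2", "HAIL,0"), con)
close(con)

# read.csv() opens the file in text mode; base R detects the bzip2 header
demo <- read.csv(demo_file, header = TRUE)
print(demo)
```

The same mechanism would also read the file if it were saved with a .bz2 extension, which some readers may find less surprising.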

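# The download is bzip2-compressed; read.csv() decompresses it transparently
csvin <- "Storm_data.csv"
if (!file.exists(csvin)) {
  csvurl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  # mode = "wb" keeps the binary file intact on Windows
  download.file(url = csvurl, destfile = csvin, mode = "wb")
}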


library(tidyverse)
library(knitr)
storm <- read.csv(file="Storm_data.csv", header=TRUE)

Data Processing

The original database has 37 variables, so to save memory only the variables needed for the analysis are selected first: event type, fatalities, injuries, property damage (and its exponent) and crop damage (and its exponent). The column names are converted to lower case for convenience. Because the EVTYPE variable mixes upper- and lower-case spellings of the same event type, it is converted to upper case; after this there are 898 unique event types.
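As a base-R illustration of the selection and case-normalisation steps (the analysis itself uses dplyr's select() and rename_all()), the sketch below runs them on a small made-up data frame; raw, keep and small are hypothetical names. Note that upper-casing only merges case variants of a label; differently spelled or abbreviated labels remain distinct event types.

```r
# Toy stand-in for the NOAA data (values are made up): mixed-case event types
raw <- data.frame(
  EVTYPE     = c("Tornado", "TORNADO", "tornado", "Hail"),
  FATALITIES = c(1, 0, 2, 0),
  INJURIES   = c(5, 1, 0, 0),
  REMARKS    = c("a", "b", "c", "d")   # a column the analysis does not need
)

# Keep only the analysis columns and lower-case their names
keep  <- c("EVTYPE", "FATALITIES", "INJURIES")
small <- raw[, keep]
names(small) <- tolower(names(small))

length(unique(small$evtype))   # mixed case: 4 distinct labels
small$evtype <- toupper(small$evtype)
length(unique(small$evtype))   # after upper-casing: 2 ("TORNADO", "HAIL")
```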
The unique crop and property damage exponents are then listed so that a cost_factor function can be written to convert each exponent into a power-of-ten multiplier (H = hundreds, K = thousands, M = millions, B = billions, a digit n = 10^n, and a blank, -, ? or + = 1). Multiplying each damage value by its multiplier gives the prop_tot and crop_tot variables, which are expressed in billions of dollars because the amounts are so large.

With the data prepared, the pop_affect dataset is created by summing fatalities and injuries by event type; it is sorted in descending order of combined fatalities and injuries and the top 8 event types are kept. Similarly, the dmgval dataset sums property and crop damage by event type, is sorted in descending order of combined economic damage, and keeps the top 8 event types.
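The exponent decoding and top-n aggregation described above can be sketched in base R on a few made-up records. This is a vectorised variant, not the author's cost_factor(): the names letter_mult and decode_exponent are hypothetical, but the mapping follows the same rules (letters as named multipliers, a digit n as 10^n, blank or symbolic codes as 1, anything else an error).

```r
# Hypothetical lookup for the letter codes (case-insensitive)
letter_mult <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

decode_exponent <- function(codes) {
  codes     <- toupper(codes)
  is_letter <- codes %in% names(letter_mult)
  is_digit  <- grepl("^[0-9]$", codes)
  is_blank  <- codes %in% c("", "-", "?", "+")
  known     <- is_letter | is_digit | is_blank
  if (!all(known)) {
    stop("Unknown exponent(s): ", paste(unique(codes[!known]), collapse = ", "))
  }
  mult <- rep(1, length(codes))                     # blank/symbol codes stay at 1
  mult[is_letter] <- letter_mult[codes[is_letter]]  # H, K, M, B
  mult[is_digit]  <- 10^as.numeric(codes[is_digit]) # digit n -> 10^n
  mult
}

# Made-up toy records
events <- data.frame(
  evtype     = c("FLOOD", "FLOOD", "HAIL", "TORNADO"),
  propdmg    = c(25, 1.5, 2.5, 3),
  propdmgexp = c("K", "B", "m", "")
)

# Damage in billions of dollars, then summed per event type
events$prop_tot <- events$propdmg * decode_exponent(events$propdmgexp) / 1e9
totals <- aggregate(prop_tot ~ evtype, data = events, FUN = sum)

# Sort in descending order and keep the top n (n = 2 here; the analysis uses 8)
totals <- totals[order(-totals$prop_tot), ]
head(totals, 2)
```

Here FLOOD totals 1.500025 billion (25 thousand plus 1.5 billion) and ranks first, followed by HAIL at 0.0025 billion. Decoding the whole column at once avoids calling a function separately for each of the roughly 900,000 rows.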

names(storm)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
storma<-storm %>%
        select('EVTYPE','FATALITIES','INJURIES', 'PROPDMG', 'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP') %>%
        rename_all(tolower) 


storma$evtype <-toupper(storma$evtype)

str(storma)
## 'data.frame':    902297 obs. of  7 variables:
##  $ evtype    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ fatalities: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ injuries  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ propdmg   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ propdmgexp: chr  "K" "K" "K" "K" ...
##  $ cropdmg   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ cropdmgexp: chr  "" "" "" "" ...
length(unique(storma$evtype))
## [1] 898
unique(storma$propdmgexp)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
unique(storma$cropdmgexp)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"
cost_factor <- function(x) {
  x <- toupper(x)
  if (x == "H") {
    return(100)
  } else if (x == "K") {
    return(1000)
  } else if (x == "M") {
    return(1e6)
  } else if (x == "B") {
    return(1e9)
  } else if (grepl("^[0-9]$", x)) {
    # A single digit n is a multiplier of 10^n; checking the pattern first
    # avoids "NAs introduced by coercion" warnings for "-", "?" and "+"
    return(10^as.numeric(x))
  } else if (x %in% c("", "-", "?", "+")) {
    # Blank or symbolic exponents carry no multiplier
    return(1)
  } else {
    stop("Exponent to be added to list: ", x)
  }
}

# vapply() guarantees a numeric result; USE.NAMES = FALSE avoids attaching
# ~900,000 names to the result vector
storma$prop_fact <- vapply(storma$propdmgexp, cost_factor, numeric(1), USE.NAMES = FALSE)
storma$crop_fact <- vapply(storma$cropdmgexp, cost_factor, numeric(1), USE.NAMES = FALSE)
# Convert damage values to billions of US dollars
storma$prop_tot <- storma$propdmg * storma$prop_fact / 1e9
storma$crop_tot <- storma$cropdmg * storma$crop_fact / 1e9
names(storma)
##  [1] "evtype"     "fatalities" "injuries"   "propdmg"    "propdmgexp"
##  [6] "cropdmg"    "cropdmgexp" "prop_fact"  "crop_fact"  "prop_tot"  
## [11] "crop_tot"
pop_affect <- storma %>%
  filter(fatalities > 0 | injuries > 0) %>%
  group_by(evtype) %>%
  summarise(totfat = sum(fatalities, na.rm = TRUE),
            totinj = sum(injuries, na.rm = TRUE),
            totboth = sum(fatalities + injuries, na.rm = TRUE)) %>%
  arrange(desc(totboth), evtype) %>%
  slice_head(n = 8) %>%
  pivot_longer(totfat:totinj, names_to = "Type")

dmgval <- storma %>%
  filter(prop_tot > 0 | crop_tot > 0) %>%
  group_by(evtype) %>%
  summarise(prop_dmg_val = sum(prop_tot, na.rm = TRUE),
            crop_dmg_val = sum(crop_tot, na.rm = TRUE),
            bothdmgv = sum(prop_tot + crop_tot, na.rm = TRUE)) %>%
  arrange(desc(bothdmgv), evtype) %>%
  slice_head(n = 8) %>%
  pivot_longer(prop_dmg_val:crop_dmg_val, names_to = "Type")

Results

Health impact

Plot the 8 event types with the highest combined number of fatalities and injuries.

ggplot(data = pop_affect, aes(x = reorder(evtype, -value), y = value, fill = Type)) +
  geom_col(position = "dodge") +
  labs(x = "Event Type", y = "Number of people") +
  ggtitle("Total Number of Fatalities and Injuries for the Top 8 Event Types") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 20, vjust = 0.7)) +
  # Type levels sort alphabetically: totfat, then totinj
  scale_fill_manual(values = c("red", "yellow"), labels = c("Fatalities", "Injuries"))

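The plot shows that tornadoes are by far the event type with the greatest impact on people's health: their 96979 combined fatalities and injuries are more than ten times the 8428 caused by excessive heat, the next most harmful event type.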
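Both plots order their bars with reorder(evtype, -value). The small sketch below, on made-up long-format data shaped like the output of pivot_longer() (the names long and ord are hypothetical), shows why this sorts event types by their combined total: reorder() ranks levels by the mean of its second argument, and because every event type has exactly two rows, ranking by the mean is the same as ranking by the sum.

```r
# Made-up long-format data: two rows per event type, as pivot_longer() produces
long <- data.frame(
  evtype = c("HAIL", "HAIL", "TORNADO", "TORNADO", "FLOOD", "FLOOD"),
  Type   = rep(c("totfat", "totinj"), 3),
  value  = c(10, 20, 100, 200, 5, 5)
)

# reorder() sorts levels by increasing mean of the second argument (FUN = mean
# by default); negating value therefore gives descending order of the totals
ord <- reorder(long$evtype, -long$value)
levels(ord)   # "TORNADO" "HAIL" "FLOOD"
```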

Economic impact

Plot the 8 event types with the highest combined property and crop damage.

ggplot(data = dmgval, aes(x = reorder(evtype, -value), y = value, fill = Type)) +
  geom_col(position = "dodge") +
  labs(x = "Event Type", y = "Cost of damage (billions of $)") +
  ggtitle("Event Types That Caused the Most Economic Damage") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 20, vjust = 0.7)) +
  # Type levels sort alphabetically: crop_dmg_val, then prop_dmg_val
  scale_fill_manual(values = c("blue", "green"), labels = c("Crop damage", "Property damage"))

The plot shows that floods caused the most economic damage, with a total of $150.32B, roughly twice the $71.91B caused by hurricane/typhoon events.