Synopsis:

Weather events such as lightning, hurricanes, floods and tornadoes can endanger life and property. This paper presents a simple, easily reproducible analysis that answers the following questions:
1. Across the United States, which types of events are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?
The data used for this analysis were obtained from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database.
I conclude that, in the United States, tornadoes are the event type most harmful to population health and floods are the event type with the greatest economic consequences.

Data Processing

1. Required Libraries

library(readr, quietly = TRUE, warn.conflicts = FALSE)
library(dplyr, quietly = TRUE, warn.conflicts = FALSE)
library(tidyr, quietly = TRUE, warn.conflicts = FALSE)
library(ggplot2, quietly = TRUE, warn.conflicts = FALSE)

2. Hardware and Software description

sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18362)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United Kingdom.1252 
## [2] LC_CTYPE=English_United Kingdom.1252   
## [3] LC_MONETARY=English_United Kingdom.1252
## [4] LC_NUMERIC=C                           
## [5] LC_TIME=English_United Kingdom.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggplot2_3.2.1 tidyr_1.0.0   dplyr_0.8.3   readr_1.3.1  
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.2       knitr_1.25       magrittr_1.5     hms_0.5.2       
##  [5] munsell_0.5.0    tidyselect_0.2.5 colorspace_1.4-1 R6_2.4.0        
##  [9] rlang_0.4.1      stringr_1.4.0    tools_3.6.1      grid_3.6.1      
## [13] gtable_0.3.0     xfun_0.10        withr_2.1.2      htmltools_0.4.0 
## [17] lazyeval_0.2.2   yaml_2.2.0       digest_0.6.22    assertthat_0.2.1
## [21] lifecycle_0.1.0  tibble_2.1.3     crayon_1.3.4     purrr_0.3.3     
## [25] vctrs_0.2.0      zeallot_0.1.0    glue_1.3.1       evaluate_0.14   
## [29] rmarkdown_1.16   stringi_1.4.3    compiler_3.6.1   pillar_1.4.2    
## [33] scales_1.0.0     backports_1.1.5  pkgconfig_2.0.3

3. Dataset Sources:

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The data file is referred to here as the STORM DATASET.

There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.
1. National Weather Service Storm Data Documentation.
2. National Climatic Data Center Storm Events FAQ.

Downloading and reading the STORM Dataset:

dataset.url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
dataset.dir <- "data"
dataset.name <- "repdata_data_StormData.csv.bz2"
dataset.destfile <- file.path(dataset.dir, dataset.name)

if(!file.exists(dataset.dir)){
  dir.create(path = dataset.dir)
}

if(!file.exists(dataset.destfile)){
  download.file(url = dataset.url,
                destfile = dataset.destfile)
}

storm_df <- read_csv(dataset.destfile)
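read_csv infers column types from a sample of rows unless col_types is supplied, so a column that is empty near the top of the file can be guessed as something other than character. The sketch below pins the types of the seven columns used in this analysis. It runs on a tiny stand-in file with made-up rows (the OTHER column is a hypothetical placeholder for the unused columns), not on the real download, and it is one possible approach rather than the method used for the results in this paper.

```r
library(readr)

# Stand-in CSV with hypothetical rows so the sketch runs without the download.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP,OTHER",
  "TORNADO,0,15,25,K,0,,x",
  "FLOOD,1,0,5,M,2,K,y"
), tmp)

# cols_only() reads just the named columns and fixes their types,
# so the exponent columns stay character even when early rows are empty.
storm_small <- read_csv(
  tmp,
  col_types = cols_only(
    EVTYPE     = col_character(),
    FATALITIES = col_double(),
    INJURIES   = col_double(),
    PROPDMG    = col_double(),
    PROPDMGEXP = col_character(),
    CROPDMG    = col_double(),
    CROPDMGEXP = col_character()
  )
)

str(storm_small)
```

With the real file, the same col_types argument could be passed alongside dataset.destfile; that would also make the separate select() step unnecessary.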

Selecting the columns required for analysis

storm_df <- storm_df %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

Regularising Event type (EVTYPE) Variable

The event type variable (EVTYPE) contains input errors resulting from misspellings, inconsistent case, abbreviations, etc. (e.g. “THUNDERSTORM WIND”, “THUNDERSTORM WINDS”, “TSTM”).

uniqueEvent <- length(unique(storm_df$EVTYPE))

There are currently 977 unique entries in the EVTYPE variable, whereas according to Chapters 7.1 to 7.48 of the STORM DATASET documentation there are 48 main event types.

Regularising based on the expected 48 event types:
storm_df$EVTYPE <- gsub("^(Astronomical Low Tide).*", "Astronomical Low Tide", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Avalanche).*", "Avalanche", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Blizzard).*", "Blizzard", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Coastal Flood).*", "Coastal Flood", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(CoastalFlood).*", "Coastal Flood", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Cold/Wind Chill).*", "Cold/Wind Chill", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Cold).*", "Cold/Wind Chill", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Wind Chill).*", "Cold/Wind Chill", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Debris Flow).*", "Debris Flow", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Dense Fog).*", "Dense Fog", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Fog).*", "Dense Fog", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Dense Smoke).*", "Dense Smoke", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Drought).*", "Drought", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Dust Devil).*", "Dust Devil", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Dust Storm).*", "Dust Storm", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Excessive Heat).*", "Excessive Heat", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Extreme Cold/Wind Chill).*", "Extreme Cold/Wind Chill", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Flash Flood).*", "Flash Flood", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Flood).*", "Flood", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Freezing Fog).*", "Freezing Fog", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Frost/Freeze).*", "Frost/Freeze", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Frost).*", "Frost/Freeze", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Freeze).*", "Frost/Freeze", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Funnel Cloud).*", "Funnel Cloud", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Hail).*", "Hail", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Heat).*", "Heat", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Heavy Rain).*", "Heavy Rain", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Heavy Snow).*", "Heavy Snow", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(High Surf).*", "High Surf", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(High Wind).*", "High Wind", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Hurricane).*", "Hurricane/Typhoon", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Typhoon).*", "Hurricane/Typhoon", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Ice Storm).*", "Ice Storm", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Lakeshore Flood).*", "Lakeshore Flood", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Lake-Effect Snow).*", "Lake-Effect Snow", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Lightning).*", "Lightning", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Marine Hail).*", "Marine Hail", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Marine High Wind).*", "Marine High Wind", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Marine Strong Wind).*", "Marine Strong Wind", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Marine Thunderstorm Wind).*", "Marine Thunderstorm Wind", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(MARINE TSTM WIND).*", "Marine Thunderstorm Wind", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Rip Current).*", "Rip Current", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Seiche).*", "Seiche", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Sleet).*", "Sleet", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Storm Tide).*", "Storm Tide", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Strong Wind).*", "Strong Wind", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Thunderstorm Wind).*", "Thunderstorm Wind", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(TSTM).*", "Thunderstorm Wind",x = storm_df$EVTYPE, ignore.case = TRUE) #926
storm_df$EVTYPE <- gsub("^(THUNDER*.STORM).*", "Thunderstorm Wind",x = storm_df$EVTYPE, ignore.case = TRUE) #853
storm_df$EVTYPE <- gsub("^(Tornado).*", "Tornado", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Tropical Depression).*", "Tropical Depression", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Tropical Storm).*", "Tropical Storm", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Tsunami).*", "Tsunami", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Volcanic Ash).*", "Volcanic Ash", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Waterspout).*", "Waterspout", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Wildfire).*", "Wildfire", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Winter Storm).*", "Winter Storm", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df$EVTYPE <- gsub("^(Winter Weather).*", "Winter Weather", x = storm_df$EVTYPE, ignore.case = TRUE)
storm_df <- storm_df[!grepl(pattern = "^(Summary).*", x = storm_df$EVTYPE,
                            ignore.case = TRUE),]
# updating the number of unique events
uniqueEvent <- length(unique(storm_df$EVTYPE))

After the above corrections, the number of unique event types is 544.
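The prefix-based clean-up above can also be written as a small rule table applied in a loop, which keeps the rules in one place and preserves their order. The following is a minimal sketch in base R, not the code used for the results in this paper; normalise_evtype, rules and the sample labels are hypothetical names and values chosen for illustration, and only four of the rules are shown.

```r
# Hypothetical helper: map any label starting with `prefix` (case-insensitive)
# to its canonical name. Rules are applied in order, as with the gsub() calls.
normalise_evtype <- function(x, rules) {
  for (i in seq_len(nrow(rules))) {
    hit <- grepl(paste0("^", rules$prefix[i]), x, ignore.case = TRUE)
    x[hit] <- rules$canonical[i]
  }
  x
}

# A few illustrative rules; the full table would have one row per gsub() call.
rules <- data.frame(
  prefix    = c("TSTM", "Thunderstorm Wind", "Flash Flood", "Flood"),
  canonical = c("Thunderstorm Wind", "Thunderstorm Wind", "Flash Flood", "Flood"),
  stringsAsFactors = FALSE
)

# Illustrative raw labels (not taken from the dataset output).
raw <- c("TSTM WIND", "THUNDERSTORM WINDS", "FLASH FLOODING", "flood/rain", "HAIL")
cleaned <- normalise_evtype(raw, rules)
print(cleaned)
```

Labels that match no rule, such as "HAIL" in this sample, pass through unchanged, which mirrors the behaviour of the gsub() sequence.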

Regularising based on observed anomalies

A sample of the dataset, showing the top 10 EVTYPE values by frequency (n) in storm_df.

head(x = select(storm_df, EVTYPE) %>%
       group_by(EVTYPE) %>%
       summarise(n = n()) %>%
       arrange(desc(n)), 
     n = 10)
## # A tibble: 10 x 2
##    EVTYPE                        n
##    <chr>                     <int>
##  1 Thunderstorm Wind        324765
##  2 Hail                     288776
##  3 Tornado                   60686
##  4 Flash Flood               55038
##  5 Flood                     26098
##  6 High Wind                 21801
##  7 Heavy Snow                15792
##  8 Lightning                 15766
##  9 Marine Thunderstorm Wind  11987
## 10 Heavy Rain                11801
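To find the anomalies that remain, one option is to list the labels that are still outside the canonical set, most frequent first. The sketch below uses base R on a small made-up vector; canonical is only a subset of the 48 official names, "STORM SURGE" is a label that still appears unregularised in the economic table later in this paper, and "SOME OTHER LABEL" is a hypothetical placeholder.

```r
# Subset of the canonical event names (assumption: the full list has 48 entries).
canonical <- c("Thunderstorm Wind", "Hail", "Tornado", "Flash Flood", "Flood")

# Illustrative labels standing in for storm_df$EVTYPE.
evtype <- c("Tornado", "Hail", "STORM SURGE", "Hail",
            "SOME OTHER LABEL", "STORM SURGE", "Flood")

# Keep only labels outside the canonical set and count them.
leftover <- evtype[!evtype %in% canonical]
leftover_counts <- sort(table(leftover), decreasing = TRUE)
print(leftover_counts)
```

The most frequent leftovers are the natural candidates for additional rules.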

Analysis questions:

1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

population.health <- select(storm_df, Event.Type = EVTYPE, FATALITIES, INJURIES) %>%
  mutate(Event.Type = as.factor(Event.Type)) %>%
  group_by(Event.Type) %>%
  summarise(Fatalities = sum(FATALITIES),
            Injuries = sum(INJURIES),
            Total = sum(FATALITIES, INJURIES)) %>%
  arrange(desc(Total))

head(population.health, 10)
## # A tibble: 10 x 4
##    Event.Type        Fatalities Injuries Total
##    <fct>                  <dbl>    <dbl> <dbl>
##  1 Tornado                 5658    91364 97022
##  2 Thunderstorm Wind        710     9508 10218
##  3 Excessive Heat          1903     6525  8428
##  4 Flood                    495     6806  7301
##  5 Lightning                817     5232  6049
##  6 Heat                    1118     2494  3612
##  7 Flash Flood             1018     1785  2803
##  8 Ice Storm                 89     1977  2066
##  9 High Wind                293     1471  1764
## 10 Winter Storm             217     1353  1570
# subsetting the first 5 rows, ranked by the Total variable
pivot_longer(data = population.health[1:5,1:3], Fatalities:Injuries, names_to = "Type", values_to = "Total") %>%
  
  #plotting
  ggplot() + 
  aes(x=reorder(Event.Type, +Total), y=Total, fill=Type) +
  scale_fill_manual(values = c("#e41a11","#fa975080")) +
  labs(title = "Number of Casualties from Weather Events in the U.S.",
       subtitle = "Distinguishing fatalities from injuries.") +
  theme(plot.title = element_text(lineheight=0.8, face="bold")) +
  xlab("Event Type") +
  ylab("Total Casualties") +
  # geom_col() draws bars from precomputed values, so no stat override is needed
  geom_col(alpha=1) +
  coord_flip()
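The ranking depends on how harm is measured. As a worked check, the sketch below re-enters the ten rows printed in the table above, recomputes Total as Fatalities plus Injuries, and compares the order by Total with the order by Fatalities alone. The comparison covers only these ten rows, so event types outside the table are not considered.

```r
# Values copied from the head(population.health, 10) output above.
top10 <- data.frame(
  Event.Type = c("Tornado", "Thunderstorm Wind", "Excessive Heat", "Flood",
                 "Lightning", "Heat", "Flash Flood", "Ice Storm",
                 "High Wind", "Winter Storm"),
  Fatalities = c(5658, 710, 1903, 495, 817, 1118, 1018, 89, 293, 217),
  Injuries   = c(91364, 9508, 6525, 6806, 5232, 2494, 1785, 1977, 1471, 1353),
  stringsAsFactors = FALSE
)

# Total casualties per event type.
top10$Total <- top10$Fatalities + top10$Injuries

# Order the same ten rows by each measure.
by_total      <- top10$Event.Type[order(-top10$Total)]
by_fatalities <- top10$Event.Type[order(-top10$Fatalities)]

head(by_total, 3)       # Tornado, Thunderstorm Wind, Excessive Heat
head(by_fatalities, 3)  # Tornado, Excessive Heat, Heat
```

Tornado leads on both measures, while the events behind it change places depending on whether injuries are included.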

2: Across the United States, which types of events have the greatest economic consequences?

economic_effect <- select(storm_df, Event.Type = EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
  mutate(PROPDMGEXP = as.character(PROPDMGEXP), CROPDMGEXP = as.character(CROPDMGEXP))

economic_effect$PROPDMGEXP <- recode(economic_effect$PROPDMGEXP, "-" = 1, "?" = 1, "+" = 1, "0" = 1, "1" = 10, "2" = 10^2, "3" = 10^3, "4" = 10^4, "5" = 10^5, "6" = 10^6, "7" = 10^7, "8" = 10^8, "9" = 10^9, "h" = 10^2, "H" = 10^2, "k" = 10^3, "K" = 10^3, "m" = 10^6, "M" = 10^6, "b" = 10^9, "B" = 10^9, .default = 1, .missing = 1) 

economic_effect$CROPDMGEXP <- recode(economic_effect$CROPDMGEXP, "FALSE" = 1 , .default = 1, .missing = 1)
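Note that the CROPDMGEXP recode above maps every value to 1, so crop damage is summed as recorded in CROPDMG without any magnitude multiplier, whereas PROPDMGEXP is expanded. If both exponent columns are available as character, one lookup could serve both. The sketch below is a base R illustration under that assumption; exp_multiplier is a hypothetical helper name, and it follows the same mapping as the PROPDMGEXP recode (H, K, M, B in either case, digits as powers of ten, anything else as 1).

```r
# Hypothetical helper: convert an exponent code to a numeric multiplier.
exp_multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  code <- toupper(as.character(code))

  # Letter codes come from the lookup; unknown codes give NA for now.
  out <- unname(lookup[code])

  # Single digits are treated as powers of ten ("0" gives 1, "5" gives 1e5).
  digit <- !is.na(code) & grepl("^[0-9]$", code)
  out[digit] <- 10^as.numeric(code[digit])

  # Everything else ("-", "?", "+", empty, missing) falls back to 1.
  out[is.na(out)] <- 1
  out
}

# Illustrative codes, not taken from the dataset output.
codes <- c("K", "m", "B", "5", NA, "?", "h")
multipliers <- exp_multiplier(codes)
print(multipliers)
```

Applying such a helper to both columns would change the Crop figures, so the tables and conclusions in this paper would need to be regenerated before relying on it.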

economic_effect <- mutate(economic_effect, 
                          Event.Type = Event.Type,
                          Property = PROPDMG * PROPDMGEXP,
                          Crop = CROPDMG * CROPDMGEXP,
                          Total.Damages = Property + Crop) %>%
  mutate(Event.Type = as.factor(Event.Type)) %>%
  group_by(Event.Type) %>%
  summarise(Property = sum(Property),
            Crop = sum(Crop),
            Total.Damages = sum(Total.Damages)) %>%
  arrange(desc(Total.Damages))

head(economic_effect, 10)
## # A tibble: 10 x 4
##    Event.Type             Property    Crop Total.Damages
##    <fct>                     <dbl>   <dbl>         <dbl>
##  1 Flood             144958136816  172989. 144958309805.
##  2 Hurricane/Typhoon  85356410010   11638.  85356421648.
##  3 Tornado            58552151876. 100029.  58552251906.
##  4 STORM SURGE        43323536000       5   43323536005 
##  5 Flash Flood        17414731089. 185057.  17414916145.
##  6 Hail               15977470013. 579736.  15978049749.
##  7 Thunderstorm Wind   9970812300. 199294.   9971011594.
##  8 Tropical Storm      7714390550    6465.   7714397015.
##  9 Winter Storm        6748997251    2484.   6748999735.
## 10 High Wind           6003353043   21058.   6003374101.
# subsetting the first 5 rows, ranked by the Total.Damages variable
pivot_longer(data = economic_effect[1:5,1:3], Property:Crop, names_to = "Damage", values_to = "Total") %>%
  #plotting
  ggplot()+
  aes(x=reorder(Event.Type, +Total), y=Total, fill=Damage) +
  scale_fill_manual(values = c("#039e16","#cf790b")) +
  theme(plot.title = element_text(lineheight=0.8, face="bold")) +
  xlab("Event Type") +
  ylab("Total Value ($)")+
  labs(title = "Graph Showing Economic Effect of Weather Events in U.S." ,
       subtitle = "Focusing on the Total Property and Crop Damage Value.")+
  facet_wrap(. ~ Damage, scales = "free", ncol = 1) +
  # geom_col() draws bars from precomputed values, so no stat override is needed
  geom_col(alpha=1) +
  coord_flip()

Results

  1. From the above analysis I conclude that the tornado is the event type most harmful to population health in the United States.
  2. From the last graph and table, flood is the event type with the greatest overall economic consequences.
    The facets of the second graph also show how each of these weather events affects crops and property separately.