Synopsis

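The goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. Specifically, the analysis addresses two questions:

1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?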

Data Processing

Load packages necessary for the analysis and show session info

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)

sessionInfo()
## R version 4.0.5 (2021-03-31)
## Platform: i386-w64-mingw32/i386 (32-bit)
## Running under: Windows 10 x64 (build 19043)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252   
## [3] LC_MONETARY=English_Canada.1252 LC_NUMERIC=C                   
## [5] LC_TIME=English_Canada.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggplot2_3.3.5 dplyr_1.0.7  
## 
## loaded via a namespace (and not attached):
##  [1] knitr_1.32        magrittr_2.0.1    munsell_0.5.0     tidyselect_1.1.0 
##  [5] colorspace_2.0-0  R6_2.5.0          rlang_0.4.10      fansi_0.4.2      
##  [9] stringr_1.4.0     tools_4.0.5       grid_4.0.5        gtable_0.3.0     
## [13] xfun_0.22         utf8_1.2.1        DBI_1.1.1         withr_2.4.1      
## [17] jquerylib_0.1.4   htmltools_0.5.1.1 ellipsis_0.3.1    assertthat_0.2.1 
## [21] yaml_2.2.1        digest_0.6.27     tibble_3.1.0      lifecycle_1.0.0  
## [25] crayon_1.4.1      purrr_0.3.4       vctrs_0.3.7       glue_1.4.2       
## [29] evaluate_0.14     rmarkdown_2.11    stringi_1.7.6     compiler_4.0.5   
## [33] pillar_1.6.0      scales_1.1.1      generics_0.1.0    pkgconfig_2.0.3

Download, read and load file

url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

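# The file is bzip2-compressed, so keep the .bz2 extension and download it in binary mode
download.file(url, destfile = "dataset.csv.bz2", mode = "wb")

# read.csv() decompresses .bz2 files transparently
dataset_NOAA_original <- read.csv("dataset.csv.bz2", na.strings = "?")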

The variables in the dataset that are relevant to the analysis questions are:

EVTYPE: Event types e.g. Tornado, Flood.
FATALITIES: Number of fatalities caused.
INJURIES: Number of injuries caused.
PROPDMG: Property damage caused.
PROPDMGEXP: Alphabetical characters used to signify magnitude for property damage include “K” for thousands, “M” for millions, and “B” for billions.
CROPDMG: Crop damage caused.
CROPDMGEXP: Alphabetical characters used to signify magnitude for crop damage.
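
To make the role of the magnitude codes concrete, here is a minimal sketch on made-up records. The data frame `toy` and the lookup vector `multiplier` are illustrative names that are not part of the analysis, and the values are invented. It assumes that any code other than K, M or B means the figure is taken as is, which is also the assumption behind the recoding step used later in this report.

```r
# Hypothetical toy records; the values are invented for illustration only.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 10),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Named lookup vector: magnitude code -> multiplier.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up each code. Codes that are not in the table (here the empty string)
# come back as NA and are replaced by 1, so the raw figure is kept unchanged.
mult <- unname(multiplier[toy$PROPDMGEXP])
mult[is.na(mult)] <- 1

# Dollar value = reported figure * multiplier.
toy$PROPDMGVALUE <- toy$PROPDMG * mult
print(toy)
```

In this toy case, 25 with code "K" becomes 25,000 dollars and 2.5 with code "M" becomes 2,500,000 dollars.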

Therefore, I created a subset of the original data with these variables.

NOAA_clean <- dataset_NOAA_original[,c('EVTYPE','FATALITIES','INJURIES', 'PROPDMG', 'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP')]

Structure and cleaning

Check the structure of the data set and prepare it for analysis.

There are some inconsistencies in the EVTYPE variable; for example, the same event appears both as TORNADO and as the misspelling TORNDAO. I resolve some of these by adding a new variable, CATEGORY, so the original values remain available in EVTYPE.
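
The idea can be sketched on a handful of strings before applying it to the full data set. The vector below is a made-up sample (only TORNDAO is a misspelling mentioned in this report), and `category` is an illustrative name. Note that the rules are applied in sequence, so a later pattern can overwrite a label assigned by an earlier one.

```r
# Hypothetical sample of raw event type strings.
evtype <- c("TORNADO", "TORNDAO", " Tornado F1 ",
            "TSTM WIND", "THUNDERSTORM WINDS", "HAIL")

# Normalise case and strip surrounding whitespace.
evtype <- trimws(toupper(evtype))

# Start from the cleaned labels, then collapse variants with pattern rules.
category <- evtype
category[grepl("TSTM|THUNDERSTORM", category)] <- "THUNDERSTORM"
category[grepl("TORNADO|TORNDAO", category)]   <- "TORNADO"

# Original value next to its category; unmatched values keep their own label.
print(data.frame(EVTYPE = evtype, CATEGORY = category))
```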

str(NOAA_clean)
## 'data.frame':    902297 obs. of  7 variables:
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
# Convert to upper case so that pattern matching is not case-sensitive
NOAA_clean$EVTYPE <- toupper(NOAA_clean$EVTYPE)

# Remove extra spaces at the beginning and end of the event type strings
NOAA_clean$EVTYPE <- trimws(NOAA_clean$EVTYPE)

# Remove rows whose event type is a "SUMMARY" entry
# (!grepl() is used instead of -grep() so that no rows are dropped when nothing matches)
NOAA_clean <- NOAA_clean[!grepl("SUMMARY", NOAA_clean$EVTYPE), ]

# Add variable category
NOAA_clean$CATEGORY <- NOAA_clean$EVTYPE
NOAA_clean$CATEGORY[grepl("VOLCANIC", NOAA_clean$CATEGORY)] <- "VOLCANIC" 
NOAA_clean$CATEGORY[grepl("DROUGHT|DRY|DRIEST", NOAA_clean$CATEGORY)] <- "DROUGHT"
NOAA_clean$CATEGORY[grepl("TSTM|THUNDERSTORM", NOAA_clean$CATEGORY)] <- "THUNDERSTORM"
NOAA_clean$CATEGORY[grepl("TORNADO|TORNDAO", NOAA_clean$CATEGORY)] <- "TORNADO"
NOAA_clean$CATEGORY[grepl("WATERSPOUT|WAYTERSPOUT", NOAA_clean$CATEGORY)] <- "WATERSPOUT" 
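# Skip rows already labelled WATERSPOUT, which would otherwise match "WATER" and be relabelled
NOAA_clean$CATEGORY[grepl("FLO|RAIN|PRECIP|DROWNING|SHOWER|WATER", NOAA_clean$CATEGORY) & NOAA_clean$CATEGORY != "WATERSPOUT"] <- "FLOODING"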
NOAA_clean$CATEGORY[grepl("SLIDE|AVALANCE|AVALANCHE|LANDSLUMP", NOAA_clean$CATEGORY)] <- "LANDSLIDE"
NOAA_clean$CATEGORY[grepl("HEAT|HOT|EXCESSIVE|WARM|HIGH|RECORD|HYPERTHERMIA", NOAA_clean$CATEGORY)] <- "HEAT"
NOAA_clean$CATEGORY[grepl("LOW|SNOW|ICE|FROST|BLIZZARD|HAIL|COLD|COOL|ICY|HYPOTHERMIA|CHILL|FREEZE|WINTER|FREEZING", NOAA_clean$CATEGORY)] <- "COLD"
NOAA_clean$CATEGORY[grepl("WILDFIRE|FIRE", NOAA_clean$CATEGORY)] <- "WILDFIRE"
NOAA_clean$CATEGORY[grepl("LIGHTNING|LIGHTING|LIGNTNING", NOAA_clean$CATEGORY)] <- "LIGHTNING"
NOAA_clean$CATEGORY[grepl("FOG|VOG|DUST|SMOKE", NOAA_clean$CATEGORY)] <- "VISIBILITY"
NOAA_clean$CATEGORY <- factor(NOAA_clean$CATEGORY)

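# Calculation of property and crop damage in dollars
# (codes other than 'K', 'M' and 'B', including lower-case ones, fall back to a multiplier of 1)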
NOAA_clean$PROPDMGEXP <- recode(NOAA_clean$PROPDMGEXP,'K' = 1000, 'M' = 1000000, 'B' = 1000000000, .default=1)
NOAA_clean$CROPDMGEXP <- recode(NOAA_clean$CROPDMGEXP,'K' = 1000, 'M' = 1000000, 'B' = 1000000000, .default=1)
NOAA_clean$PROPDMGVALUE <- NOAA_clean$PROPDMG * NOAA_clean$PROPDMGEXP
NOAA_clean$CROPDMGVALUE <- NOAA_clean$CROPDMG * NOAA_clean$CROPDMGEXP

Calculations

Health related calculations

  • First, I calculate the total number of fatalities for each category, and then the total number of injuries.
  • Then, I keep the 10 categories with the highest totals in a separate data set.
  • Finally, I rename the FATALITIES/INJURIES variable to DAMAGE and add a new variable, TYPE, that records the kind of harm: fatalities or injuries.
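
Since the same three steps are repeated for each measure, they could be wrapped in a small helper. The sketch below is one possible way to do that in base R; `top_by_category` and the toy data are hypothetical and are not part of the original analysis.

```r
# Hypothetical helper: total a column per CATEGORY, keep the n largest totals,
# rename the total to DAMAGE and tag the rows with a TYPE label.
top_by_category <- function(data, value_col, type_label, n = 10) {
  totals <- aggregate(data[[value_col]],
                      by = list(CATEGORY = data$CATEGORY),
                      FUN = sum)
  names(totals)[2] <- "DAMAGE"
  totals <- head(totals[order(-totals$DAMAGE), ], n)
  totals$TYPE <- type_label
  rownames(totals) <- NULL
  totals
}

# Invented toy data with three categories.
toy <- data.frame(
  CATEGORY   = c("A", "B", "A", "C", "B"),
  FATALITIES = c(1, 5, 2, 0, 1),
  INJURIES   = c(10, 0, 3, 7, 2),
  stringsAsFactors = FALSE
)

deaths      <- top_by_category(toy, "FATALITIES", "FATALITIES", n = 2)
injuries    <- top_by_category(toy, "INJURIES", "INJURIES", n = 2)
harmful_toy <- rbind(deaths, injuries)
print(harmful_toy)
```

The stacked result has one row per category and type, which is the long layout that a `fill = TYPE` bar chart expects.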

Economic related calculations

  • First, I calculate the total crop damage for each category, and then the total property damage.
  • Then, I keep the 10 categories with the highest totals in a separate data set.
  • Finally, I rename the CROPDMGVALUE/PROPDMGVALUE variable to DAMAGE and add a new variable, TYPE, that records the kind of damage: crop damage or property damage.

# Top 10 categories by total fatalities
sum_death_by_category <- aggregate(FATALITIES ~ CATEGORY, NOAA_clean, sum) 
sum_death_by_category <- head(sum_death_by_category[order(-sum_death_by_category$FATALITIES), ], 10) %>% 
        rename(DAMAGE = FATALITIES) %>% 
        mutate(TYPE = "FATALITIES")

# Top 10 categories by total injuries
sum_injury_by_category <- aggregate(INJURIES ~ CATEGORY, NOAA_clean, sum)
sum_injury_by_category <- head(sum_injury_by_category[order(-sum_injury_by_category$INJURIES), ], 10) %>% 
        rename(DAMAGE = INJURIES) %>% 
        mutate(TYPE = "INJURIES")

# Combine fatalities and injuries in one data set
harmful_category <- bind_rows(sum_death_by_category, sum_injury_by_category)

# Top 10 categories by total crop damage
sum_crop_by_category <- aggregate(CROPDMGVALUE ~ CATEGORY, NOAA_clean, sum)
sum_crop_by_category <- head(sum_crop_by_category[order(-sum_crop_by_category$CROPDMGVALUE), ], 10) %>% 
        rename(DAMAGE = CROPDMGVALUE) %>% 
        mutate(TYPE = "CROP DAMAGE")

# Top 10 categories by total property damage
sum_property_by_category <- aggregate(PROPDMGVALUE ~ CATEGORY, NOAA_clean, sum)
sum_property_by_category <- head(sum_property_by_category[order(-sum_property_by_category$PROPDMGVALUE), ], 10) %>% 
        rename(DAMAGE = PROPDMGVALUE) %>% 
        mutate(TYPE = "PROPERTY DAMAGE")

# Combine property and crop damage in one data set
economic_category <- bind_rows(sum_property_by_category, sum_crop_by_category)

Results

Based on the calculations above and the plots below, we can conclude the following:

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

The top three types of events that are most harmful with respect to population health are:
1. TORNADO
2. HEAT
3. FLOODING

Across the United States, which types of events have the greatest economic consequences?

The top three types of events that have the greatest economic consequences are:
1. FLOODING
2. STORM SURGE
3. HURRICANE/TYPHOON

# Question 1 plot
# FUN = sum orders the bars by the stacked total (reorder() uses the mean by default)
ggplot(harmful_category, aes(x = reorder(CATEGORY, -DAMAGE, FUN = sum), y = DAMAGE, fill = TYPE)) +
        geom_bar(stat = "identity") +
        labs(title = "Most Harmful Events to Population Health", 
             x = "Category", 
             y = "Count") +
        theme(axis.text.x = element_text(angle = 90), plot.title = element_text(hjust = 0.5))

# Question 2 plot
# FUN = sum orders the bars by the stacked total (reorder() uses the mean by default)
ggplot(economic_category, aes(x = reorder(CATEGORY, -DAMAGE, FUN = sum), y = DAMAGE, fill = TYPE)) +
        geom_bar(stat = "identity") +
        labs(title = "Events with the Greatest Economic Consequences", 
             x = "Category", 
             y = "Damage (USD)") +
        theme(axis.text.x = element_text(angle = 90), plot.title = element_text(hjust = 0.5))
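
One detail worth checking in stacked bar charts like these is how the x-axis is ordered. `reorder()` summarises the ordering variable with `mean` unless told otherwise, so a category that appears under both types is ranked by the average of its two values rather than by the height of its stacked bar. The sketch below, on invented numbers, compares the two orderings; `toy`, `by_mean` and `by_sum` are illustrative names.

```r
# Invented toy data: categories A and C appear under two types, B under one.
toy <- data.frame(
  CATEGORY = c("A", "A", "B", "C", "C"),
  DAMAGE   = c(10, 2, 9, 1, 12),
  TYPE     = c("X", "Y", "X", "X", "Y"),
  stringsAsFactors = FALSE
)

# Default: levels sorted by the mean of -DAMAGE within each category.
by_mean <- levels(reorder(toy$CATEGORY, -toy$DAMAGE))

# FUN = sum: levels sorted by the total, i.e. by stacked bar height.
by_sum <- levels(reorder(toy$CATEGORY, -toy$DAMAGE, FUN = sum))

print(by_mean)
print(by_sum)
```

With these numbers the mean-based order puts B first, although C has the largest total, so passing `FUN = sum` is the safer choice when the bars are stacked.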