Synopsis

The goal of this analysis is to explore the NOAA Storm Database to address the following questions:
- What type of events are most harmful to population health across the U.S.?
- What type of events have the greatest economic consequences across the U.S.?

Data Processing

The data for this analysis were downloaded from Storm Data and the bzip2 csv file loaded as a data frame. Relevant documentation for the database are available:
    National Weather Service
    National Climatic Data Center Storm Events

knitr::opts_chunk$set(echo = TRUE, warning = F, message = F)
library(tidyverse)

if(!file.exists("./repdata_data_StormData.csv.bz2")){
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", 
                destfile = "./repdata_data_StormData.csv.bz2",
                method = "curl", mode = "wb")
}

storm <- read.csv("./repdata_data_StormData.csv.bz2", stringsAsFactors = T)

# inspect data
str(storm)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels ""," Christiansburg",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels ""," CANTON"," TULIA",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","%SD",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","\t","\t\t",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

Subset the data

For this analysis we are interested in the effects on population health and economic consequences of storm events. Therefore, we will subset the data for the following information: storm event types, fatalities, injuries, property damage, and crop damage.

Definition of variables used in this analysis:
EVTYPE: Factor. The original event type description as recorded in the NOAA Storm Database.
INJURIES: Num. The number of non-fatal injuries caused by the event. Includes physical injuries to people, not property.
FATALITIES: Num. The number of deaths directly caused by the death.
PROPDMG: Num. The numeric value of property damage caused by the event. Must be multipled by the PROPDMGEXP to obtain the U.S. Dollar amount.
PROPDMGEXP: Factor. The exponent/magnitude code for the property damage. Valid codes are: K: thousand, M: million, B: billion, H: hundred, 0-9: 10^n. Everything else will have no multiplier.
CROPDMG: Num. The numeric value of crop damage caused by the event. Must be multiplied by the CROPDMGEXP to obtain the U.S. Dollar amount.
CROPDMGEXP: Factor. The exponent/magnitude code for cop damage. Has the same valid codes as the PROPDMGEXP.

names(storm)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
storm <- storm %>% dplyr::select("EVTYPE", "INJURIES", "FATALITIES", "PROPDMG", "PROPDMGEXP",
                        "CROPDMG", "CROPDMGEXP")

Data Cleaning

The EVTYPE column contains the storm event types. There are currently 985 present. These events should be grouped according to the allowed events listed in the documentation. The EVTYPE_clean column was created to contain the cleaned storm events that match the allowed list and, in the case of no match, the event was assigned “OTHER”.

Following classification of events to match the allowed events list, the PROPDMGEXP and CROPDMGEXP columns were formatted to contain numeric multipliers according to their valid codes (see above). Two new columns were created, PROP_total and CROP_total, which contain the total property damage or crop damage in U.S. Dollars, respectively. These were derived from multiplying PROPDMG by the numeric PROPDMGEXP or CROPDMG by the numeric CROPEDMGEXP, respectively.

# cleaning the EVTYPE col
length(levels(storm$EVTYPE))
## [1] 985
storm <- storm %>% mutate(
  EVTYPE = toupper(EVTYPE) %>% trimws(), # make all uppercase and trim white space
  EVTYPE = str_replace_all(EVTYPE, " {2,}", " "), # replace multiple space with single space
  EVTYPE = str_replace_all(EVTYPE, "[[:punct:]]+", " ") # remove punctuation
) %>% 
  filter(!str_detect(pattern = "^SUMMARY", EVTYPE)) %>%  # remove summary rows
  mutate(EVTYPE = factor(EVTYPE))

length(levels(storm$EVTYPE)) # reduces factors to 792
## [1] 792
allowed_events <- toupper(c("Astronomical Low Tide", "Avalanche", "Blizzard",
                            "Coastal Flood", "Cold/Wind Chill", "Debris Flow",
                            "Dense Fog", "Dense Smoke", "Drought", "Dust Devil",
                            "Dust Storm", "Excessive Heat", 
                            "Extreme Cold/Wind Chill", "Flash Flood", "Flood", 
                            "Frost/Freeze", "Funnel Cloud", "Freezing Fog", 
                            "Hail", "Heat", "Heavy Rain", "Heavy Snow", 
                            "High Surf", "High Wind", "Hurricane (Typhoon)",
                            "Ice Storm", "Lake-Effect Snow", "Lakeshore Flood",
                            "Lightning", "Marine Hail", "Marine High Wind", 
                            "Marine Strong Wind", "Marine Thunderstorm Wind", 
                            "Rip Current", "Seiche", "Sleet", "Storm Surge/Tide",
                            "Strong Wind", "Thunderstorm Wind", "Tornado", 
                            "Tropical Depression", "Tropical Storm", "Tsunami", 
                            "Volcanic Ash", "Waterspout", "Wildfire", 
                            "Winter Storm", "Winter Weather")) %>% 
  .[order(nchar(.), decreasing = T)]
allowed_events2 <- allowed_events %>% str_replace_all("[[:punct:]]+", " ")
allowed_lookup <- setNames(allowed_events, allowed_events2)

# assign events which match the approved list exactly
storm <- storm %>% mutate(EVTYPE_clean = if_else(EVTYPE %in% allowed_events, EVTYPE, NA_character_))

# find the most common remaining events
remaining <- storm %>% filter(is.na(EVTYPE_clean)) %>% 
  count(EVTYPE, sort = T)
head(remaining, n=20)
##                     EVTYPE      n
## 1                TSTM WIND 219946
## 2       THUNDERSTORM WINDS  20850
## 3         MARINE TSTM WIND   6175
## 4     URBAN SML STREAM FLD   3392
## 5               HIGH WINDS   1534
## 6         WILD FOREST FIRE   1457
## 7             FROST FREEZE   1344
## 8       WINTER WEATHER MIX   1110
## 9           TSTM WIND HAIL   1028
## 10 EXTREME COLD WIND CHILL   1002
## 11          FLASH FLOODING    682
## 12        LAKE EFFECT SNOW    659
## 13            EXTREME COLD    657
## 14       FLOOD FLASH FLOOD    626
## 15                    SNOW    617
## 16               LANDSLIDE    600
## 17         COLD WIND CHILL    539
## 18                     FOG    538
## 19                    WIND    347
## 20            RIP CURRENTS    304
# assign some common events
storm <- storm %>% mutate(
  EVTYPE = as.character(EVTYPE),
  EVTYPE_clean = allowed_lookup[EVTYPE],
  EVTYPE_clean = case_when(
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"^(TSTM|THUNDER\\w*|TSTMW)")~"THUNDERSTORM WIND",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"MICROBURST|DOWNBURST|GUSTNADO")~"THUNDERSTORM WIND",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"^MARINE (TSTM|THUNDER)")~"MARINE THUNDERSTORM WIND",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"TORNADO|FUNNEL|LANDSPOUT|WALL CLOUD")~"TORNADO",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"WATER ?SPOUT|WAYTERSPOUT")~"WATERSPOUT",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"STORM SURGE|HIGH TIDE|COASTAL SURGE")~"STORM SURGE/TIDE",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"LAKE.*FLOOD")~"LAKESHORE FLOOD",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"RAPIDLY RISING WATER|FLASH")~"FLASH FLOOD",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"COAST\\w*\\s*FLOOD|BEACH EROS")~"COASTAL FLOOD",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"URBAN|SMALL STREAM|FLD|FLOOD")~"FLOOD",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"RAIN|TORRENTIAL|PRECIP")~"HEAVY RAIN",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"HAIL")~"HAIL",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"WINTER STORM")~"WINTER STORM",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"SNOW|WINT|GLAZE|HEAVY MIX")~"WINTER WEATHER",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"SLEET|MIXED PRECIP")~"SLEET",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"BLIZZARD")~"BLIZZARD",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"DENSE FOG|VOG")~"DENSE FOG",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"FOG")~"FREEZING FOG",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"SMOKE")~"DENSE SMOKE",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"SLIDE|SLUMP")~"DEBRIS FLOW",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"DUST STORM|SAHARAN DUST|BLOWING DUST")~"DUST STORM",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"DUST DEV")~"DUST DEVIL",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"FIRE")~"WILDFIRE",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"EXTREME COLD|BITTER|HYPOTHERM|LOW TEMP")~"EXTREME COLD/WIND CHILL",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"COLD|CHILL|COOL")~"COLD/WIND CHILL",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"EXCESSIVE HEAT|HEAT WAVE")~"EXCESSIVE HEAT",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"HEAT|HOT|WARM")~"HEAT",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"^HIGH WIND")~"HIGH WIND",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"GUSTY|STRONG WIND|GRADIENT WIND|WND")~"STRONG WIND",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"SURF|SEAS|SWELLS|WAVE")~"HIGH SURF",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"VOLCA")~"VOLCANIC ASH",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"TYPHOON")~"HURRICANE (TYPHOON)",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"TROPICAL STORM")~"TROPICAL STORM",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"REMNANTS")~"TROPICAL DEPRESSION",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"DRY|DROUGHT")~"DROUGHT",
    is.na(EVTYPE_clean) & str_detect(EVTYPE,"RIP")~"RIP CURRENT",
    is.na(EVTYPE_clean)~"OTHER",
    TRUE~EVTYPE_clean
    )
  ) %>% 
  mutate(EVTYPE_clean = factor(EVTYPE_clean))

# cleaning the PROPDMGEXP and CROPDMGEXP cols
storm <- storm %>% mutate(
  # make text upper case
  PROPDMGEXP = toupper(PROPDMGEXP),
  CROPDMGEXP = toupper(CROPDMGEXP),
  
  # multiply by the designated multiplier
  PROP_total = case_when(
    PROPDMGEXP == "H"~PROPDMG*1e2,
    PROPDMGEXP == "K"~PROPDMG*1e3,
    PROPDMGEXP == "M"~PROPDMG*1e6,
    PROPDMGEXP == "B"~PROPDMG*1e9,
    grepl("^[0-9]", PROPDMGEXP)~PROPDMG*10^(as.numeric(PROPDMGEXP)),
    TRUE~PROPDMG*1),
  
  CROP_total = case_when(
    CROPDMGEXP == "H"~CROPDMG*1e2,
    CROPDMGEXP == "K"~CROPDMG*1e3,
    CROPDMGEXP == "M"~CROPDMG*1e6,
    CROPDMGEXP == "B"~CROPDMG*1e9,
    grepl("^[0-9]", CROPDMGEXP)~CROPDMG*10^(as.numeric(CROPDMGEXP)),
    TRUE~CROPDMG*1),
)

Note: 1205 observations remained unclassified after cleaning and were designated as “OTHER”.

Results

Which types of events are most harmful to population health?

  • To investigate which types of storm events are most harmful to population health, we will look at both total injuries as a result of the event and total fatalities as a result of the event.
  • Data were filtered to retain the top 10 eventswith the highest total injuries and the top 10 events with the highest total fatalities.
  • Data are presented as a barchart of the total occurrences of either fatalities or injuries for each storm category. The bars are ordered by the highest total fatalities at the top.
health_impact <- storm %>%
  group_by(EVTYPE_clean) %>%
  summarise(total_injuries = sum(INJURIES, na.rm = TRUE),
            total_fatalities = sum(FATALITIES, na.rm = TRUE)) %>%
  filter(total_injuries>0 | total_fatalities>0) %>% 
  pivot_longer(-EVTYPE_clean, names_to = "type", values_to = "number") %>% 
  group_by(type) %>% 
  mutate(EVTYPE_reordered = reorder(EVTYPE_clean, number))

top_events <- health_impact %>% group_by(type) %>% 
  slice_max(number, n=10, with_ties = F) %>% 
  mutate(EVTYPE_clean = as.character(EVTYPE_clean)) %>% 
  pull(EVTYPE_clean)

health_impact %>% filter(EVTYPE_clean %in% top_events) %>% 
  ggplot(., aes(x = EVTYPE_reordered, y = number, group = type)) +
  geom_col() +
  facet_wrap(~type, scales = "free_x",
             labeller = labeller(
               type = c(total_fatalities = "Fatalities", 
                        total_injuries = "Injuries")
             ))+
  coord_flip() +
  labs(x = "Event Type", y = "Total Occurences")+
  theme_bw()

Which types of events have the greatest economic consequences?

  • To address this question, we looked at both property damage and crop damage caused by each event.
  • Data were filtered for the top 10 events with the highest property damage and the top 10 events with the highest crop damage.
  • Data are plotted as barchart ordered by top cost in crop damage. The x-axis is scaled in billions of U.S. dollars.
# calculate the total sum cost of damage to either crops or property
economic_impact <- storm %>%
  group_by(EVTYPE_clean) %>%
  summarise(total_prop_damage = sum(PROP_total, na.rm = TRUE),
            total_crop_damage = sum(CROP_total, na.rm = TRUE)) %>%
  filter(total_prop_damage>0 | total_crop_damage>0) %>% 
  pivot_longer(-EVTYPE_clean, names_to = "type", values_to = "number") %>% 
  group_by(type) %>% 
  mutate(EVTYPE_reordered = reorder(EVTYPE_clean, number))

# get the top 10 for each category
top_events <- economic_impact %>% group_by(type) %>% 
  slice_max(number, n=10, with_ties = F) %>% 
  mutate(EVTYPE_clean = as.character(EVTYPE_clean)) %>% 
  pull(EVTYPE_clean)

economic_impact %>% filter(EVTYPE_clean %in% top_events) %>% 
  ggplot(., aes(x = EVTYPE_reordered, y = number, group = type)) +
  geom_col() +
  facet_wrap(~type, scales = "free_x",
             labeller = labeller(
               type = c(total_crop_damage = "Crop Damage", 
                        total_prop_damage = "Property Damage")
             ))+
  coord_flip() +
  labs(x = "Event Type", y = "Total Cost (USD) in Billions (B)")+
  scale_y_continuous(labels = scales::label_number(scale = 1e-9, suffix="B"))+
  theme_bw()

Conclusion

The top events with highest number of injuries were tornadoes, thunderstorm wind, excessive heat, flood, and lightning. The top events with highest number of fatalities were tornado, excessive heat, heat, flash flood, and lightning.

The top events with highest impact on property damage cost were flood, hurricance, tornado, storm surge/tide, and flash flood. The top events with highest impact on crop damage cost were drought, flood, ice storm, other - which represents summation of several uncategorized events, and hail.

Flooding and tornadoes seem to have a significant impact to both human life and property.