The goal of this analysis is to explore the NOAA Storm Database to
address the following questions:
- What type of events are most harmful to population health across the
U.S.?
- What type of events have the greatest economic consequences across the
U.S.?
The data for this analysis were downloaded from Storm
Data and the bzip2 csv file loaded as a data frame. Relevant
documentation for the database are available:
National
Weather Service
National
Climatic Data Center Storm Events
knitr::opts_chunk$set(echo = TRUE, warning = F, message = F)
library(tidyverse)
if(!file.exists("./repdata_data_StormData.csv.bz2")){
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
destfile = "./repdata_data_StormData.csv.bz2",
method = "curl", mode = "wb")
}
storm <- read.csv("./repdata_data_StormData.csv.bz2", stringsAsFactors = T)
# inspect data
str(storm)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels ""," Christiansburg",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels ""," CANTON"," TULIA",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","%SD",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436781 levels "","\t","\t\t",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
For this analysis we are interested in the effects on population health and economic consequences of storm events. Therefore, we will subset the data for the following information: storm event types, fatalities, injuries, property damage, and crop damage.
Definition of variables used in this analysis:
EVTYPE: Factor. The original event type description as
recorded in the NOAA Storm Database.
INJURIES: Num. The number of non-fatal injuries caused by
the event. Includes physical injuries to people, not property.
FATALITIES: Num. The number of deaths directly caused by
the death.
PROPDMG: Num. The numeric value of property damage caused
by the event. Must be multipled by the PROPDMGEXP to obtain the U.S.
Dollar amount.
PROPDMGEXP: Factor. The exponent/magnitude code for the
property damage. Valid codes are: K: thousand, M: million, B: billion,
H: hundred, 0-9: 10^n. Everything else will have no multiplier.
CROPDMG: Num. The numeric value of crop damage caused by
the event. Must be multiplied by the CROPDMGEXP to obtain the U.S.
Dollar amount.
CROPDMGEXP: Factor. The exponent/magnitude code for cop
damage. Has the same valid codes as the PROPDMGEXP.
names(storm)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
storm <- storm %>% dplyr::select("EVTYPE", "INJURIES", "FATALITIES", "PROPDMG", "PROPDMGEXP",
"CROPDMG", "CROPDMGEXP")
The EVTYPE column contains the storm event types. There
are currently 985 present. These events should be grouped according to
the allowed events listed in the documentation.
The EVTYPE_clean column was created to contain the cleaned
storm events that match the allowed list and, in the case of no match,
the event was assigned “OTHER”.
Following classification of events to match the allowed events list,
the PROPDMGEXP and CROPDMGEXP columns were
formatted to contain numeric multipliers according to their valid codes
(see above). Two new columns were created, PROP_total and
CROP_total, which contain the total property damage or crop
damage in U.S. Dollars, respectively. These were derived from
multiplying PROPDMG by the numeric PROPDMGEXP
or CROPDMG by the numeric CROPEDMGEXP,
respectively.
# cleaning the EVTYPE col
length(levels(storm$EVTYPE))
## [1] 985
storm <- storm %>% mutate(
EVTYPE = toupper(EVTYPE) %>% trimws(), # make all uppercase and trim white space
EVTYPE = str_replace_all(EVTYPE, " {2,}", " "), # replace multiple space with single space
EVTYPE = str_replace_all(EVTYPE, "[[:punct:]]+", " ") # remove punctuation
) %>%
filter(!str_detect(pattern = "^SUMMARY", EVTYPE)) %>% # remove summary rows
mutate(EVTYPE = factor(EVTYPE))
length(levels(storm$EVTYPE)) # reduces factors to 792
## [1] 792
allowed_events <- toupper(c("Astronomical Low Tide", "Avalanche", "Blizzard",
"Coastal Flood", "Cold/Wind Chill", "Debris Flow",
"Dense Fog", "Dense Smoke", "Drought", "Dust Devil",
"Dust Storm", "Excessive Heat",
"Extreme Cold/Wind Chill", "Flash Flood", "Flood",
"Frost/Freeze", "Funnel Cloud", "Freezing Fog",
"Hail", "Heat", "Heavy Rain", "Heavy Snow",
"High Surf", "High Wind", "Hurricane (Typhoon)",
"Ice Storm", "Lake-Effect Snow", "Lakeshore Flood",
"Lightning", "Marine Hail", "Marine High Wind",
"Marine Strong Wind", "Marine Thunderstorm Wind",
"Rip Current", "Seiche", "Sleet", "Storm Surge/Tide",
"Strong Wind", "Thunderstorm Wind", "Tornado",
"Tropical Depression", "Tropical Storm", "Tsunami",
"Volcanic Ash", "Waterspout", "Wildfire",
"Winter Storm", "Winter Weather")) %>%
.[order(nchar(.), decreasing = T)]
allowed_events2 <- allowed_events %>% str_replace_all("[[:punct:]]+", " ")
allowed_lookup <- setNames(allowed_events, allowed_events2)
# assign events which match the approved list exactly
storm <- storm %>% mutate(EVTYPE_clean = if_else(EVTYPE %in% allowed_events, EVTYPE, NA_character_))
# find the most common remaining events
remaining <- storm %>% filter(is.na(EVTYPE_clean)) %>%
count(EVTYPE, sort = T)
head(remaining, n=20)
## EVTYPE n
## 1 TSTM WIND 219946
## 2 THUNDERSTORM WINDS 20850
## 3 MARINE TSTM WIND 6175
## 4 URBAN SML STREAM FLD 3392
## 5 HIGH WINDS 1534
## 6 WILD FOREST FIRE 1457
## 7 FROST FREEZE 1344
## 8 WINTER WEATHER MIX 1110
## 9 TSTM WIND HAIL 1028
## 10 EXTREME COLD WIND CHILL 1002
## 11 FLASH FLOODING 682
## 12 LAKE EFFECT SNOW 659
## 13 EXTREME COLD 657
## 14 FLOOD FLASH FLOOD 626
## 15 SNOW 617
## 16 LANDSLIDE 600
## 17 COLD WIND CHILL 539
## 18 FOG 538
## 19 WIND 347
## 20 RIP CURRENTS 304
# assign some common events
storm <- storm %>% mutate(
EVTYPE = as.character(EVTYPE),
EVTYPE_clean = allowed_lookup[EVTYPE],
EVTYPE_clean = case_when(
is.na(EVTYPE_clean) & str_detect(EVTYPE,"^(TSTM|THUNDER\\w*|TSTMW)")~"THUNDERSTORM WIND",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"MICROBURST|DOWNBURST|GUSTNADO")~"THUNDERSTORM WIND",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"^MARINE (TSTM|THUNDER)")~"MARINE THUNDERSTORM WIND",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"TORNADO|FUNNEL|LANDSPOUT|WALL CLOUD")~"TORNADO",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"WATER ?SPOUT|WAYTERSPOUT")~"WATERSPOUT",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"STORM SURGE|HIGH TIDE|COASTAL SURGE")~"STORM SURGE/TIDE",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"LAKE.*FLOOD")~"LAKESHORE FLOOD",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"RAPIDLY RISING WATER|FLASH")~"FLASH FLOOD",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"COAST\\w*\\s*FLOOD|BEACH EROS")~"COASTAL FLOOD",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"URBAN|SMALL STREAM|FLD|FLOOD")~"FLOOD",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"RAIN|TORRENTIAL|PRECIP")~"HEAVY RAIN",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"HAIL")~"HAIL",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"WINTER STORM")~"WINTER STORM",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"SNOW|WINT|GLAZE|HEAVY MIX")~"WINTER WEATHER",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"SLEET|MIXED PRECIP")~"SLEET",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"BLIZZARD")~"BLIZZARD",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"DENSE FOG|VOG")~"DENSE FOG",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"FOG")~"FREEZING FOG",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"SMOKE")~"DENSE SMOKE",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"SLIDE|SLUMP")~"DEBRIS FLOW",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"DUST STORM|SAHARAN DUST|BLOWING DUST")~"DUST STORM",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"DUST DEV")~"DUST DEVIL",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"FIRE")~"WILDFIRE",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"EXTREME COLD|BITTER|HYPOTHERM|LOW TEMP")~"EXTREME COLD/WIND CHILL",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"COLD|CHILL|COOL")~"COLD/WIND CHILL",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"EXCESSIVE HEAT|HEAT WAVE")~"EXCESSIVE HEAT",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"HEAT|HOT|WARM")~"HEAT",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"^HIGH WIND")~"HIGH WIND",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"GUSTY|STRONG WIND|GRADIENT WIND|WND")~"STRONG WIND",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"SURF|SEAS|SWELLS|WAVE")~"HIGH SURF",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"VOLCA")~"VOLCANIC ASH",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"TYPHOON")~"HURRICANE (TYPHOON)",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"TROPICAL STORM")~"TROPICAL STORM",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"REMNANTS")~"TROPICAL DEPRESSION",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"DRY|DROUGHT")~"DROUGHT",
is.na(EVTYPE_clean) & str_detect(EVTYPE,"RIP")~"RIP CURRENT",
is.na(EVTYPE_clean)~"OTHER",
TRUE~EVTYPE_clean
)
) %>%
mutate(EVTYPE_clean = factor(EVTYPE_clean))
# cleaning the PROPDMGEXP and CROPDMGEXP cols
storm <- storm %>% mutate(
# make text upper case
PROPDMGEXP = toupper(PROPDMGEXP),
CROPDMGEXP = toupper(CROPDMGEXP),
# multiply by the designated multiplier
PROP_total = case_when(
PROPDMGEXP == "H"~PROPDMG*1e2,
PROPDMGEXP == "K"~PROPDMG*1e3,
PROPDMGEXP == "M"~PROPDMG*1e6,
PROPDMGEXP == "B"~PROPDMG*1e9,
grepl("^[0-9]", PROPDMGEXP)~PROPDMG*10^(as.numeric(PROPDMGEXP)),
TRUE~PROPDMG*1),
CROP_total = case_when(
CROPDMGEXP == "H"~CROPDMG*1e2,
CROPDMGEXP == "K"~CROPDMG*1e3,
CROPDMGEXP == "M"~CROPDMG*1e6,
CROPDMGEXP == "B"~CROPDMG*1e9,
grepl("^[0-9]", CROPDMGEXP)~CROPDMG*10^(as.numeric(CROPDMGEXP)),
TRUE~CROPDMG*1),
)
Note: 1205 observations remained unclassified after cleaning and were designated as “OTHER”.
health_impact <- storm %>%
group_by(EVTYPE_clean) %>%
summarise(total_injuries = sum(INJURIES, na.rm = TRUE),
total_fatalities = sum(FATALITIES, na.rm = TRUE)) %>%
filter(total_injuries>0 | total_fatalities>0) %>%
pivot_longer(-EVTYPE_clean, names_to = "type", values_to = "number") %>%
group_by(type) %>%
mutate(EVTYPE_reordered = reorder(EVTYPE_clean, number))
top_events <- health_impact %>% group_by(type) %>%
slice_max(number, n=10, with_ties = F) %>%
mutate(EVTYPE_clean = as.character(EVTYPE_clean)) %>%
pull(EVTYPE_clean)
health_impact %>% filter(EVTYPE_clean %in% top_events) %>%
ggplot(., aes(x = EVTYPE_reordered, y = number, group = type)) +
geom_col() +
facet_wrap(~type, scales = "free_x",
labeller = labeller(
type = c(total_fatalities = "Fatalities",
total_injuries = "Injuries")
))+
coord_flip() +
labs(x = "Event Type", y = "Total Occurences")+
theme_bw()
# calculate the total sum cost of damage to either crops or property
economic_impact <- storm %>%
group_by(EVTYPE_clean) %>%
summarise(total_prop_damage = sum(PROP_total, na.rm = TRUE),
total_crop_damage = sum(CROP_total, na.rm = TRUE)) %>%
filter(total_prop_damage>0 | total_crop_damage>0) %>%
pivot_longer(-EVTYPE_clean, names_to = "type", values_to = "number") %>%
group_by(type) %>%
mutate(EVTYPE_reordered = reorder(EVTYPE_clean, number))
# get the top 10 for each category
top_events <- economic_impact %>% group_by(type) %>%
slice_max(number, n=10, with_ties = F) %>%
mutate(EVTYPE_clean = as.character(EVTYPE_clean)) %>%
pull(EVTYPE_clean)
economic_impact %>% filter(EVTYPE_clean %in% top_events) %>%
ggplot(., aes(x = EVTYPE_reordered, y = number, group = type)) +
geom_col() +
facet_wrap(~type, scales = "free_x",
labeller = labeller(
type = c(total_crop_damage = "Crop Damage",
total_prop_damage = "Property Damage")
))+
coord_flip() +
labs(x = "Event Type", y = "Total Cost (USD) in Billions (B)")+
scale_y_continuous(labels = scales::label_number(scale = 1e-9, suffix="B"))+
theme_bw()
The top events with highest number of injuries were tornadoes, thunderstorm wind, excessive heat, flood, and lightning. The top events with highest number of fatalities were tornado, excessive heat, heat, flash flood, and lightning.
The top events with highest impact on property damage cost were flood, hurricance, tornado, storm surge/tide, and flash flood. The top events with highest impact on crop damage cost were drought, flood, ice storm, other - which represents summation of several uncategorized events, and hail.
Flooding and tornadoes seem to have a significant impact to both human life and property.