require(knitr)
## Loading required package: knitr
opts_chunk$set(echo=TRUE)
Identification of Storm Event Types with the Most Devastating Public Health and Economic Consequences in the United States
This analysis processed storm data downloaded from the United States (US) National Oceanic and Atmospheric Administration’s (NOAA) database, evaluated public healthy and economical impact for each event type. The first part of the analysis involved vigorous data pre-processing, removing records that are irrelvant for the analysis and cleaning event types such that they all belonged to the 48 event types as specified in the storm data documentation. The second part of the analysis consists of 2 parts: 1) summing up fatalities and injuries for each event type and identifying the top 5 event types with the most impact to public health, and 2) summing up crop and property damages for each event type and identifying the top 5 event types with the most impact in terms of economic consequences. The analysis results have indicated that the most fatalities were caused by excessive heat and the most injuries were caused by tornados. Economy-wise, flood caused the most property damages while drought resulted in the most crop damages.
Download file, unzip and read data
link <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(link, "data.bz2", method="curl")
raw <- read.csv("data.bz2", stringsAsFactors = FALSE)
str(raw)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(lubridate)
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
# Filter out records prior to 1996
# Filter out records with 0 impact for economy and population health
# - Check to see if there are any NA entries in these four columns:
sum(is.na(raw$FATALITIES)) # no NA entries
## [1] 0
sum(is.na(raw$INJURIES)) # no NA entries
## [1] 0
sum(is.na(raw$PROPDMG)) # no NA entries
## [1] 0
sum(is.na(raw$CROPDMG)) # no NA entries
## [1] 0
a <- raw %>%
filter(year(mdy_hms(BGN_DATE)) > 1995) %>%
filter(FATALITIES+INJURIES+PROPDMG+CROPDMG > 0)
# Clean up EVTYPE variable - Officially there are only 48 types
# Step #1: Change all text to lower case letters and remove leading + trailing
# spaces
a$EVTYPE <- trimws(tolower(a$EVTYPE))
# Arrange records in alphabetical order of EVTYPE
a <- arrange(a, EVTYPE)
# Preliminary inspection of the unique event types
unique(a$EVTYPE) # 183 unique event types to clean up
## [1] "agricultural freeze" "astronomical high tide"
## [3] "astronomical low tide" "avalanche"
## [5] "beach erosion" "black ice"
## [7] "blizzard" "blowing dust"
## [9] "blowing snow" "brush fire"
## [11] "coastal flooding/erosion" "coastal erosion"
## [13] "coastal flood" "coastal flooding"
## [15] "coastal flooding/erosion" "coastal storm"
## [17] "coastalstorm" "cold"
## [19] "cold and snow" "cold temperature"
## [21] "cold weather" "cold/wind chill"
## [23] "dam break" "damaging freeze"
## [25] "dense fog" "dense smoke"
## [27] "downburst" "drought"
## [29] "drowning" "dry microburst"
## [31] "dust devil" "dust storm"
## [33] "early frost" "erosion/cstl flood"
## [35] "excessive heat" "excessive snow"
## [37] "extended cold" "extreme cold"
## [39] "extreme cold/wind chill" "extreme windchill"
## [41] "falling snow/ice" "flash flood"
## [43] "flash flood/flood" "flood"
## [45] "flood/flash/flood" "fog"
## [47] "freeze" "freezing drizzle"
## [49] "freezing fog" "freezing rain"
## [51] "freezing spray" "frost"
## [53] "frost/freeze" "funnel cloud"
## [55] "glaze" "gradient wind"
## [57] "gusty wind" "gusty wind/hail"
## [59] "gusty wind/hvy rain" "gusty wind/rain"
## [61] "gusty winds" "hail"
## [63] "hard freeze" "hazardous surf"
## [65] "heat" "heat wave"
## [67] "heavy rain" "heavy rain/high surf"
## [69] "heavy seas" "heavy snow"
## [71] "heavy snow shower" "heavy surf"
## [73] "heavy surf and wind" "heavy surf/high surf"
## [75] "high seas" "high surf"
## [77] "high surf advisory" "high swells"
## [79] "high water" "high wind"
## [81] "high wind (g40)" "high winds"
## [83] "hurricane" "hurricane edouard"
## [85] "hurricane/typhoon" "hyperthermia/exposure"
## [87] "hypothermia/exposure" "ice jam flood (minor"
## [89] "ice on road" "ice roads"
## [91] "ice storm" "icy roads"
## [93] "lake effect snow" "lake-effect snow"
## [95] "lakeshore flood" "landslide"
## [97] "landslides" "landslump"
## [99] "landspout" "late season snow"
## [101] "light freezing rain" "light snow"
## [103] "light snowfall" "lightning"
## [105] "marine accident" "marine hail"
## [107] "marine high wind" "marine strong wind"
## [109] "marine thunderstorm wind" "marine tstm wind"
## [111] "microburst" "mixed precip"
## [113] "mixed precipitation" "mud slide"
## [115] "mudslide" "mudslides"
## [117] "non tstm wind" "non-severe wind damage"
## [119] "non-tstm wind" "other"
## [121] "rain" "rain/snow"
## [123] "record heat" "rip current"
## [125] "rip currents" "river flood"
## [127] "river flooding" "rock slide"
## [129] "rogue wave" "rough seas"
## [131] "rough surf" "seiche"
## [133] "small hail" "snow"
## [135] "snow and ice" "snow squall"
## [137] "snow squalls" "storm surge"
## [139] "storm surge/tide" "strong wind"
## [141] "strong winds" "thunderstorm"
## [143] "thunderstorm wind" "thunderstorm wind (g40)"
## [145] "tidal flooding" "tornado"
## [147] "torrential rainfall" "tropical depression"
## [149] "tropical storm" "tstm wind"
## [151] "tstm wind (g45)" "tstm wind (41)"
## [153] "tstm wind (g35)" "tstm wind (g40)"
## [155] "tstm wind (g45)" "tstm wind 40"
## [157] "tstm wind 45" "tstm wind and lightning"
## [159] "tstm wind g45" "tstm wind/hail"
## [161] "tsunami" "typhoon"
## [163] "unseasonable cold" "unseasonably cold"
## [165] "unseasonably warm" "unseasonal rain"
## [167] "urban/sml stream fld" "volcanic ash"
## [169] "warm weather" "waterspout"
## [171] "wet microburst" "whirlwind"
## [173] "wild/forest fire" "wildfire"
## [175] "wind" "wind and wave"
## [177] "wind damage" "winds"
## [179] "winter storm" "winter weather"
## [181] "winter weather mix" "winter weather/mix"
## [183] "wintry mix"
# Step #2: Obvious substitutions to reduce number of unique event types
# Replace all occurrences of "tstm" with "thunderstorm"
a$EVTYPE <- gsub("tstm", "thunderstorm", a$EVTYPE)
table(a$EVTYPE[grep("tstm",a$EVTYPE)]) # 0 matches after replacement
## < table of extent 0 >
length(unique(a$EVTYPE)) # 180 unique event types
## [1] 180
# Replace all occurrences of " " with " "
a$EVTYPE <- gsub(" ", " ", a$EVTYPE)
length(unique(a$EVTYPE)) # 178 unique event types
## [1] 178
# Replace all occurrences of "-" with " "
a$EVTYPE <- gsub("-", " ", a$EVTYPE)
length(unique(a$EVTYPE)) # 176 unique event types
## [1] 176
# Replace all occurrences of "landslides" with "landslide"
a$EVTYPE <- gsub("landslides", "landslide", a$EVTYPE)
length(unique(a$EVTYPE)) # 175 unique event types
## [1] 175
# Replace all occurrences of "gusty " with "" (i.e.: "gusty wind" -> "wind")
a$EVTYPE <- gsub("gusty ", "", a$EVTYPE)
length(unique(a$EVTYPE)) # 173 unique event types
## [1] 173
# Replace all occurrences of "winds" with "wind"
a$EVTYPE <- gsub("winds", "wind", a$EVTYPE)
length(unique(a$EVTYPE)) # 170 unique event types
## [1] 170
# Replace all occurrences of "coastalstorm" with "coastal storm"
a$EVTYPE <- gsub("coastalstorm", "coastal storm", a$EVTYPE)
length(unique(a$EVTYPE)) # 169 unique event types
## [1] 169
# Step #3: Manual replacements to reduce number of unique event types
# Note: A lot of times REMARKS column has to be manually reviewed for
# event types (for the ones that do not obviously fall into 1 of the 48
# categories).
# Inspect REMARKS for "thunderstorm wind and lightning" events
a[a$EVTYPE == "thunderstorm wind and lightning",]
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME
## 196796 46 10/8/1997 0:00:00 06:25:00 PM CST 57 HAMLIN
## STATE EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI
## 196796 SD thunderstorm wind and lightning 3 W CASTLEWOOD
## END_DATE END_TIME COUNTY_END COUNTYENDN END_RANGE
## 196796 10/8/1997 0:00:00 06:30:00 PM 0 NA 3
## END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 196796 W CASTLEWOOD 0 0 NA 65 0 0 80
## PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC
## 196796 K 0 ABR SOUTH DAKOTA, Central and North
## ZONENAMES LATITUDE LONGITUDE LATITUDE_E LONGITUDE_
## 196796 0 0 0 0
## REMARKS
## 196796 A ration mixing wagon, a front end loading driver, and a shelter were damaged by the strong winds. \nLightning struck and burned some hay bales.\n\n
## REFNUM
## 196796 302755
# There is only one event and major cause of damage is from the wind
# Hence this will be classified as "thunderstorm winds"
# In addition, classify all "thunderstorm wind..." events as "thunderstorm wind"
a$EVTYPE[grep("^thunderstorm wind",a$EVTYPE)] <- "thunderstorm wind"
length(unique(a$EVTYPE)) # 160 unique event types
## [1] 160
# Inspect REMARKS of "mud slide", "mudslide", "mudslides" events to identify
# whether root cause is rainfall or volcanic activities. If they are caused
# by rainfall, document them as "flash flood". Otherwise, document them as
# "debris flow"
# Classify them all under "mudslide" for the time being
a$EVTYPE[grep("mud",a$EVTYPE)] <- "mudslide"
# Search for "rain" inside REMARKS - If there is "rain" mentioned, classify
# the event as "flash flood"
a$EVTYPE[(a$EVTYPE == "mudslide")&(grepl("rain",a$REMARKS))] <- "flash flood"
# Otherwise, classify the remaining as "debris flow"
a$EVTYPE[grep("mudslide",a$EVTYPE)] <- "debris flow"
length(unique(a$EVTYPE)) # 158 unique event types
## [1] 158
# EVTYPE containing "hurricane" to be grouped under "hurricane (tyhpoon)"
a$EVTYPE[grep("hurricane",a$EVTYPE)] <- "hurricane (typhoon)"
# EVTYPE starting with "typhoon" to be grouped under "hurricane (tyhpoon)"
a$EVTYPE[grep("^typhoon",a$EVTYPE)] <- "hurricane (typhoon)"
length(unique(a$EVTYPE)) # 155 unique event types
## [1] 155
# "winter weather mix" -> "winter weather"
a$EVTYPE[grep("winter weather mix",a$EVTYPE)] <- "winter weather"
# "wintry mix" events -> "winter weather"
a$EVTYPE[grep("wintry mix",a$EVTYPE)] <- "winter weather"
# "winter weather/mix" -> "winter weather"
a$EVTYPE[grep("winter weather/mix",a$EVTYPE)] <- "winter weather"
length(unique(a$EVTYPE)) # 152 unique event types
## [1] 152
# "wild/forest fire" -> "wildfire"
a$EVTYPE[grep("wild/forest fire",a$EVTYPE)] <- "wildfire"
length(unique(a$EVTYPE)) # 151 unique event types
## [1] 151
# "wind damage" -> "strong wind"
a$EVTYPE[grep("^wind damage",a$EVTYPE)] <- "strong wind"
length(unique(a$EVTYPE)) # 150 unique event types
## [1] 150
# "astronomical high tide" -> "coastal flood" or "storm surge/tide"
a$EVTYPE[(a$EVTYPE == "astronomical high tide")&(grepl("flood",a$REMARKS))] <-
"coastal flood"
# Otherwise, classify the remaining as "storm surge/tide"
a$EVTYPE[grep("^astronomical high tide",a$EVTYPE)] <- "storm surge/tide"
length(unique(a$EVTYPE)) # 149 unique event types
## [1] 149
# "coastal flooding" -> "coastal flood"
# "coastal flooding/erosion" -> "coastal flood"
# "coastal erosion" -> "coastal flood"
# "erosion/cstl flood" -> "coastal flood"
a$EVTYPE[grep("^coastal flood",a$EVTYPE)] <- "coastal flood"
a$EVTYPE[grep("^coastal erosion",a$EVTYPE)] <- "coastal flood"
a$EVTYPE[grep("^erosion/cstl flood",a$EVTYPE)] <- "coastal flood"
length(unique(a$EVTYPE)) # 145 unique event types
## [1] 145
# "beach erosion" -> "high surf"
a$EVTYPE[grep("^beach erosion",a$EVTYPE)] <- "high surf"
length(unique(a$EVTYPE)) # 144 unique event types
## [1] 144
# "black ice" -> "winter weather"
a$EVTYPE[grep("^black ice",a$EVTYPE)] <- "winter weather"
length(unique(a$EVTYPE)) # 143 unique event types
## [1] 143
# "blowing dust" -> "dust storm"
a$EVTYPE[grep("^blowing dust",a$EVTYPE)] <- "dust storm"
length(unique(a$EVTYPE)) # 142 unique event types
## [1] 142
# "blowing snow" -> "dust storm"
a$EVTYPE[grep("^blowing snow",a$EVTYPE)] <- "blizzard"
length(unique(a$EVTYPE)) # 141 unique event types
## [1] 141
# "brush fire" -> "wildfire"
a$EVTYPE[grep("^brush fire",a$EVTYPE)] <- "wildfire"
length(unique(a$EVTYPE)) # 140 unique event types
## [1] 140
# "agricultural freeze" -> "frost/freeze"
# "damaging freeze" -> "frost/freeze"
# "hard freeze" -> "frost/freeze"
# "freeze" -> "frost/freeze"
a$EVTYPE[grep("^agricultural freeze",a$EVTYPE)] <- "frost/freeze"
a$EVTYPE[grep("^damaging freeze",a$EVTYPE)] <- "frost/freeze"
a$EVTYPE[grep("^hard freeze",a$EVTYPE)] <- "frost/freeze"
a$EVTYPE[grep("^freeze",a$EVTYPE)] <- "frost/freeze"
length(unique(a$EVTYPE)) # 136 unique event types
## [1] 136
# "downburst", "microburst", "dry microburst", "wet microburst"
# -> "thunderstorm wind"
a$EVTYPE[grep("burst",a$EVTYPE)] <- "thunderstorm wind"
length(unique(a$EVTYPE)) # 132 unique event types
## [1] 132
# "extreme cold" -> "extreme cold/wind chill"
# "extreme windchill" -> "extreme cold/wind chill"
a$EVTYPE[grep("extreme",a$EVTYPE)] <- "extreme cold/wind chill"
length(unique(a$EVTYPE)) # 130 unique event types
## [1] 130
# "rip currents" -> "rip current"
a$EVTYPE[grep("rip",a$EVTYPE)] <- "rip current"
length(unique(a$EVTYPE)) # 129 unique event types
## [1] 129
# "river flood", "river flooding" -> "flood"
a$EVTYPE[grep("river",a$EVTYPE)] <- "flood"
length(unique(a$EVTYPE)) # 127 unique event types
## [1] 127
# "landslide" -> "debris flow"
# "landslump" -> "debris flow"
# "rock slide"
a$EVTYPE[grep("^landslide",a$EVTYPE)] <- "debris flow"
a$EVTYPE[grep("^landslump",a$EVTYPE)] <- "debris flow"
a$EVTYPE[grep("^rock slide",a$EVTYPE)] <- "debris flow"
length(unique(a$EVTYPE)) # 124 unique event types
## [1] 124
# "hypothermia/exposure" -> "cold/wind chill"
# Inspect REMARKS of these entries and relevant temperatures were not in the
# extreme cold/wind chill category
a$EVTYPE[grep("hypothermia",a$EVTYPE)] <- "cold/wind chill"
# "hyperthermia/exposure" -> "cold/wind chill"
# Inspect REMARKS of the only entry under this category and relevant
# temperatures, surprisingly, were on the low side (not "heat")
a$EVTYPE[grep("hyperthermia",a$EVTYPE)] <- "cold/wind chill"
length(unique(a$EVTYPE)) # 122 unique even types
## [1] 122
# "non thunderstorm wind" -> "strong wind"
# Inspect REMARKS of these entries and no exceptionally high wind speeds
a$EVTYPE[grep("non thunderstorm wind",a$EVTYPE)] <- "strong wind"
# "high wind (g40)" -> "high wind"
a$EVTYPE[grep("^high wind",a$EVTYPE)] <- "high wind"
# "ice jam flood (minor" -> "flash flood"
# This is based on event types documentation
a$EVTYPE[grep("^ice jam flood",a$EVTYPE)] <- "flash flood"
# "high swells"
# If REMARKS contains the word "current" -> "rip current"
# Otherwise -> "high surf"
a$EVTYPE[(a$EVTYPE == "high swells")&(grepl("current",a$REMARKS))] <-
"rip current"
a$EVTYPE[(a$EVTYPE == "high swells")] <- "high surf"
# "rogue wave" -> "high surf"
# "rough seas" -> "high surf"
# "rough surf" -> "high surf"
a$EVTYPE[(a$EVTYPE == "rogue wave")] <- "high surf"
a$EVTYPE[(a$EVTYPE == "rough seas")] <- "high surf"
a$EVTYPE[(a$EVTYPE == "rough surf")] <- "high surf"
length(unique(a$EVTYPE)) # 115 unique event types
## [1] 115
# "drowning" -> "heavy rain"
# Inspect REMARKS of the single event and the cause is rapid water level rise
# due to heavy rainfall
a$EVTYPE[(a$EVTYPE == "drowning")] <- "heavy rain"
# "whirlwind" -> "dust devil"
# Inspect REMARKS of the events and they are all related to "dust devils"
a$EVTYPE[(a$EVTYPE == "whirlwind")] <- "dust devil"
# "small hail" -> "hail"
a$EVTYPE[(a$EVTYPE == "small hail")] <- "hail"
length(unique(a$EVTYPE)) # 112 unique event types
## [1] 112
# "mixed precip"
# -> "winter weather" if REMARKS does not contain "storm",
# -> "winter storm" otherwise
# "mixed precipitation"
# -> "winter weather" if REMARKS does not contain "storm",
# -> "winter storm" otherwise
a$EVTYPE[(a$EVTYPE == "mixed precip")&(grepl("storm",a$REMARKS))] <-
"winter storm"
a$EVTYPE[(a$EVTYPE == "mixed precipitation")&(grepl("storm",a$REMARKS))] <-
"winter storm"
a$EVTYPE[grep("^mixed precip",a$EVTYPE)] <- "winter weather"
length(unique(a$EVTYPE)) # 110 unique event types
## [1] 110
# "dam break" -> "flash flood"
a$EVTYPE[grep("^dam break",a$EVTYPE)] <- "flash flood"
# "flash flood/flood"
# One event found
# Inspect REMARKs - If contains "flash" -> "flash flood", otherwise -> "flood"
a$EVTYPE[(a$EVTYPE == "flash flood/flood")&(grepl("flash",a$REMARKS))] <-
"flash flood"
a$EVTYPE[(a$EVTYPE == "flash flood/flood")&(!(grepl("flash",a$REMARKS)))] <-
"flood"
# "flood/flash/flood"
# One event found
# Inspect REMARKs - If contains "flash" -> "flash flood", otherwise -> "flood"
a$EVTYPE[(a$EVTYPE == "flood/flash/flood")&(grepl("flash",a$REMARKS))] <-
"flash flood"
a$EVTYPE[(a$EVTYPE == "flood/flash/flood")&(!(grepl("flash",a$REMARKS)))] <-
"flood"
length(unique(a$EVTYPE)) # 107 unique event types
## [1] 107
# "fog" -> "dense fog"
a$EVTYPE[grep("^fog",a$EVTYPE)] <- "dense fog"
# "glaze" -> "winter weather"
a$EVTYPE[grep("^glaze",a$EVTYPE)] <- "winter weather"
# "freezing drizzle" -> "winter weather"
# "freezing rain" -> "winter weather"
# "freezing spray" -> "winter weather"
# "light freezing rain" -> "winter weather"
a$EVTYPE[grep("^freezing drizzle",a$EVTYPE)] <- "winter weather"
a$EVTYPE[grep("^freezing rain",a$EVTYPE)] <- "winter weather"
a$EVTYPE[grep("^freezing spray",a$EVTYPE)] <- "winter weather"
a$EVTYPE[grep("^light freezing rain",a$EVTYPE)] <- "winter weather"
# "frost" -> "frost/freeze"
# "ice on road" -> "frost/freeze"
# "ice roads" -> "frost/freeze"
# "icy roads" -> "frost/freeze"
# "early frost" -> "frost/freeze"
a$EVTYPE[grep("^frost",a$EVTYPE)] <- "frost/freeze"
a$EVTYPE[grep("^ice on road",a$EVTYPE)] <- "frost/freeze"
a$EVTYPE[grep("^ice roads",a$EVTYPE)] <- "frost/freeze"
a$EVTYPE[grep("^icy roads",a$EVTYPE)] <- "frost/freeze"
a$EVTYPE[grep("^early frost",a$EVTYPE)] <- "frost/freeze"
# "snow squall", "snow squalls" -> "heavy snow"
a$EVTYPE[grep("^snow squall",a$EVTYPE)] <- "heavy snow"
length(unique(a$EVTYPE)) # 94 unique event types
## [1] 94
# cold/wind chill -> -18F temp 15F windchill
# extreme cold/wind chill -> -35F temp + windchill
# blizzard -> > 3 hours
# heavy snow -> 12 hours with 4,6,8 inches, 24 hours with 6,8,10 inches
# winter storm -> more than 1 preciptation event
# winter weather -> one or more winter precipitation event
# "cold" -> "cold/wind chill"
a$EVTYPE[a$EVTYPE=="cold"] <- "cold/wind chill"
# "cold and snow"
# Only 1 record and fatalities are due to cold temperatures
a$EVTYPE[a$EVTYPE=="cold and snow"] <- "cold/wind chill"
# "cold temperature"
# 2 records found and fatalities are due to cold temperatures
a$EVTYPE[a$EVTYPE=="cold temperature"] <- "cold/wind chill"
# "cold weather"
# Only 1 record and fatalities are due to cold temperatures and wind chills
a$EVTYPE[a$EVTYPE=="cold weather"] <- "cold/wind chill"
# "extended cold"
# Only 1 record and damages are due to prolonged period of cold temperatures
a$EVTYPE[a$EVTYPE=="extended cold"] <- "cold/wind chill"
# "excessive snow" -> "heavy snow"
a$EVTYPE[a$EVTYPE=="excessive snow"] <- "heavy snow"
# "falling snow/ice"
# 2 records found and damages are due to accumulated ice/snow falling off
# roofs with no further info on the severity of the snowfall
a$EVTYPE[a$EVTYPE=="falling snow/ice"] <- "winter weather"
# "snow" -> "winter weather"
a$EVTYPE[a$EVTYPE=="snow"] <- "winter weather"
# "snow and ice" -> "winter storm"
a$EVTYPE[a$EVTYPE=="snow and ice"] <- "winter storm"
# "heavy snow shower" -> "heavy snow"
a$EVTYPE[a$EVTYPE=="heavy snow shower"] <- "heavy snow"
# "late season snow"
# Only 1 event in this category -> "heavy snow"
a$EVTYPE[a$EVTYPE=="late season snow"] <- "heavy snow"
# "light snow" -> "winter weather"
a$EVTYPE[a$EVTYPE=="light snow"] <- "winter weather"
# "light snowfall" -> "winter weather"
a$EVTYPE[a$EVTYPE=="light snowfall"] <- "winter weather"
length(unique(a$EVTYPE)) # 81 unique event types
## [1] 81
# heat -> heat index < 100
# excessive heat -> heat index > 100
# "heat wave"
# 1 event found with heat index > 100
a$EVTYPE[a$EVTYPE=="heat wave"] <- "excessive heat"
# "record heat"
a$EVTYPE[a$EVTYPE=="record heat"] <- "excessive heat"
# "unseasonably warm"
a$EVTYPE[a$EVTYPE=="unseasonably warm"] <- "heat"
# "warm weather"
a$EVTYPE[a$EVTYPE=="warm weather"] <- "heat"
length(unique(a$EVTYPE)) # 77 unique event types
## [1] 77
# "high seas" -> "high surf"
a$EVTYPE[a$EVTYPE=="high seas"] <- "high surf"
# "high surf advisory" -> "high surf"
a$EVTYPE[a$EVTYPE=="high surf advisory"] <- "high surf"
# "heavy surf" -> "high surf"
a$EVTYPE[a$EVTYPE=="heavy surf"] <- "high surf"
# "heavy seas" -> "high surf"
a$EVTYPE[a$EVTYPE=="heavy seas"] <- "high surf"
# "heavy surf/high surf" -> "high surf"
a$EVTYPE[a$EVTYPE=="heavy surf/high surf"] <- "high surf"
# "hazardous surf" -> "high surf"
a$EVTYPE[a$EVTYPE=="hazardous surf"] <- "high surf"
# "marine accident" -> "high surf"
# Only 1 event found and it's due to high surf
a$EVTYPE[a$EVTYPE=="marine accident"] <- "high surf"
# "non severe wind damage" -> "strong wind"
# Only 1 event found and it's due to strong wind (note: wind speeds 20-30mph)
a$EVTYPE[a$EVTYPE=="non severe wind damage"] <- "strong wind"
# "high water" -> "flash flood"
a$EVTYPE[a$EVTYPE=="high water"] <- "flash flood"
length(unique(a$EVTYPE)) # 68 unique event types
## [1] 68
# "gradient wind" -> "strong wind"
# Inspect REMARKS column of 6 events under this category - No super high wind
# speeds
a$EVTYPE[a$EVTYPE=="gradient wind"] <- "strong wind"
# "wind/hail"
# 1 event due to thunderstorm activity, damages from high wind
a$EVTYPE[a$EVTYPE=="wind/hail"] <- "high wind"
# "wind/hvy rain"
# 1 event with damages from strong wind
a$EVTYPE[a$EVTYPE=="wind/hvy rain"] <- "strong wind"
# "wind/rain"
# 1 event with damages from thunderstorm wind
a$EVTYPE[a$EVTYPE=="wind/rain"] <- "thunderstorm wind"
# "heavy rain/high surf"
# 1 event -> "heavy rain"
a$EVTYPE[a$EVTYPE=="heavy rain/high surf"] <- "heavy rain"
# "heavy surf and wind"
# 1 event -> "high surf"
a$EVTYPE[a$EVTYPE=="heavy surf and wind"] <- "high surf"
# "unseasonal rain"
# 2 events -> "heavy rain"
a$EVTYPE[a$EVTYPE=="unseasonal rain"] <- "heavy rain"
# "urban/sml stream fld"
# many events - categorized as flood events
a$EVTYPE[a$EVTYPE=="urban/sml stream fld"] <- "flood"
length(unique(a$EVTYPE)) # 60 unique event types
## [1] 60
# "torrential rainfall"
# 1 event -> "heavy rain"
a$EVTYPE[a$EVTYPE=="torrential rainfall"] <- "heavy rain"
# "unseasonable cold"
# 1 event -> "cold/wind chill"
a$EVTYPE[a$EVTYPE=="unseasonable cold"] <- "cold/wind chill"
# "unseasonably cold"
# 3 events -> "cold/wind chill"
a$EVTYPE[a$EVTYPE=="unseasonably cold"] <- "cold/wind chill"
# "wind and wave"
# 1 event -> "high surf"
a$EVTYPE[a$EVTYPE=="wind and wave"] <- "high surf"
# "storm surge" -> "storm tide"
a$EVTYPE[a$EVTYPE=="storm surge"] <- "storm tide"
# "thunderstorm"
# 2 events -> "thunderstorm wind"
a$EVTYPE[a$EVTYPE=="thunderstorm"] <- "thunderstorm wind"
# "rain/snow"
# 2 events -> "winter storm"
a$EVTYPE[a$EVTYPE=="rain/snow"] <- "winter storm"
# "rain"
# 3 events: 1 with damages from lightning, the other two with damages from
# heavy rain
a$EVTYPE[(a$EVTYPE == "rain")&(grepl("lightning",a$REMARKS))] <- "lightning"
a$EVTYPE[a$EVTYPE=="rain"] <- "heavy rain"
# "storm surge/tide" -> "storm tide"
a$EVTYPE[a$EVTYPE=="storm surge/tide"] <- "storm tide"
# "wind" -> "strong wind" (too many events to be invidually accounted for)
a$EVTYPE[a$EVTYPE=="wind"] <- "strong wind"
# "landspout" -> "tornado"
a$EVTYPE[a$EVTYPE=="landspout"] <- "tornado"
# "tidal flooding" -> "flood"
a$EVTYPE[a$EVTYPE=="tidal flooding"] <- "flood"
# "other"
a$EVTYPE[(a$EVTYPE == "other")&(grepl("dust devil",a$REMARKS))] <- "dust devil"
a$EVTYPE[(a$EVTYPE == "other")&(grepl("dustdevil",a$REMARKS))] <- "dust devil"
a$EVTYPE[(a$EVTYPE == "other")&(grepl("excess rainfall",a$REMARKS))] <-
"heavy rain"
a$EVTYPE[(a$EVTYPE == "other")&(grepl("avalanche",a$REMARKS))] <- "avalanche"
# "coastal storm" -> "winter storm"
# Inspect REMARKS of the 5 events in this category
a$EVTYPE[a$EVTYPE=="coastal storm"] <- "winter storm"
length(unique(a$EVTYPE)) # 47 unique event types
## [1] 47
# Change all event types back to full upper case for formality purposes in
# later analysis steps
a$EVTYPE <- trimws(toupper(a$EVTYPE))
Analysis #1: 1) What type of events are most harmful with respect to population health? - To study impact to population health, we can take a look at the FATALITIES and INJURIES variables. - Sum up values of these two variables for each EVTYPE category. - Inspect which EVTYPE has the most FATALITIES and which has the most INJURIES. - Create panel plot displaying the FATALITIEs and INJURIES count with respect to EVTYPE as visual illustrations.
# Now we are at a stage to sum up fatalities and injuries for each of the
# even types, and arrange results in descending order according to total
# fatalities and then total injuries
total_fatalities <- sum(a$FATALITIES)
total_injuries <- sum(a$INJURIES)
b <- a %>%
select(EVTYPE, FATALITIES, INJURIES) %>%
aggregate(. ~ EVTYPE, . , FUN = sum, na.rm = TRUE)
# Quick inspection of events with the most fatalities first followed by injuries
# since fatalities are considered more severe than injuries
b1 <- arrange(b, desc(FATALITIES, INJURIES))
head(b1,5)
## EVTYPE FATALITIES INJURIES
## 1 EXCESSIVE HEAT 1799 6461
## 2 TORNADO 1511 20667
## 3 FLASH FLOOD 895 1676
## 4 LIGHTNING 651 4141
## 5 RIP CURRENT 543 503
# Subset out results for the top 5 events in terms of fatalities
b1 <- b1[1:5,]
# Calculate % of fatalities accounted for by these top 5 events
b1_top5 <- sum(b1$FATALITIES)/total_fatalities
# Another safety check of events with the most injuries first followed by
# fatalities as we would like to find out if there are events with an
# unusually large number of injuries but relatively small number of fatalities
b2 <- arrange(b, desc(INJURIES, FATALITIES))
head(b2,5)
## EVTYPE FATALITIES INJURIES
## 1 TORNADO 1511 20667
## 2 FLOOD 444 6839
## 3 EXCESSIVE HEAT 1799 6461
## 4 THUNDERSTORM WIND 382 5154
## 5 LIGHTNING 651 4141
# Subset out results for the top 5 events in terms of fatalities
b2 <- b2[1:5,]
# Calculate % of injuries accounted for by these top 5 events
b2_top5 <- sum(b2$INJURIES)/total_injuries
# Graphical representation of results
library(ggplot2) # For creating ggplots
library(gridExtra) # For organizing multiple ggplots
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
# Graphs look better with horizontal bars
# 1 - Top 5 Events by Fatalities
b1plot <- ggplot(b1[,1:2], aes(x = reorder(EVTYPE, -FATALITIES), y=FATALITIES)) +
geom_col() +
coord_flip() +
labs(title="Top 5 Event Types (by Fatalities)",
x ="Events",
y = "Fatalities") +
theme(text = element_text(size=8)) +
scale_y_continuous(breaks = seq(0, max(b1[,2]), by=500))
# 2 - Top 5 Events by Injuries
b2plot <- ggplot(b2[,1:3], aes(x = reorder(EVTYPE, -INJURIES), y=INJURIES)) +
geom_col() +
coord_flip() +
labs(title="Top 5 Event Types (by Injuries)",
x ="Events",
y = "Injuries") +
theme(text = element_text(size=8)) +
scale_y_continuous(breaks = seq(0, max(b2[,3]), by=10000))
# Put the two graphs together
grid.arrange(b1plot, b2plot, ncol=1)
From the above analysis:
Tornadoes have the 2nd highest fatalities and highest injuries. On the other hand, excessive heat causes the most fatalities and the 2nd highest injuries. Hence, among all event types, these two even types are therefore considered the most harmful with respect to population health.
In terms of fatalities, the top five event types are: 1) Excessive Heat 2) Tornado 3) Flash Flood 4) Lightning 5) Rip Current
In addition, the top 5 event types have accounted for 61.8300504% of the total fatalities.
In terms of injuries, the top five event types are: 1) Tornado 2) Excessive Heat 3) Lightning 4) Flash Flood 5) Rip Current
In addition, the top 5 event types have accounted for 74.6218197% of the total injuries.
Analysis #2: 2) What type of events have the greatest economic consequences? - To study impact to economy, we can take a look at the PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP variables. - Inspect what type of exponents are available. - Calculate actual property (PROPDMG) and crop (CROPDMG) damages with exponents (EXP) taken into account. - Inspect which EVTYPE has the most crop damages and which has the most property damages. - Create panel plot displaying the crop damages and property damages with respect to EVTYPE as visual illustrations.
# Determine values of exponent for property damages
table(raw$PROPDMGEXP)
##
## + - 0 1 2 3 4 5 6
## 465934 5 1 216 25 13 4 4 28 4
## 7 8 ? B H K M h m
## 5 1 8 40 6 424665 11330 1 7
# Determine values of exponent for crop damages
table(raw$CROPDMGEXP)
##
## 0 2 ? B K M k m
## 618413 19 1 7 9 281832 1994 21 1
# Notes on exponents:
# + / - / ? - Will be disregarded
# 0 to 8 - Powers of 10
# h / H - Hundreds (10^2)
# k / K - Thousands (10^3)
# m / M - Millions (10^6)
# b / B - Billions (10^9)
# Convert all non-numeric exponents to numeric (e.g.: k / K converted to 3)
# Set up a new column "prop_exp_num" for PROPDMGEXP
# Set up a new column "crop_exp_num" for CROPDMGEXP
# Set up a new column "prop_dmg_total" as property damages total for each record
# Set up a new column "crop_dmg_total" as crop damages total for each record
c <- a %>%
select(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
mutate(prop_exp_num = case_when(
PROPDMGEXP=="+" ~ 0,
PROPDMGEXP=="-" ~ 0,
PROPDMGEXP=="?" ~ 0,
PROPDMGEXP=="0" ~ 0,
PROPDMGEXP=="1" ~ 1,
PROPDMGEXP=="2" ~ 2,
PROPDMGEXP=="3" ~ 3,
PROPDMGEXP=="4" ~ 4,
PROPDMGEXP=="5" ~ 5,
PROPDMGEXP=="6" ~ 6,
PROPDMGEXP=="7" ~ 7,
PROPDMGEXP=="8" ~ 8,
PROPDMGEXP=="9" ~ 9,
PROPDMGEXP=="h" ~ 2,
PROPDMGEXP=="H" ~ 2,
PROPDMGEXP=="k" ~ 3,
PROPDMGEXP=="K" ~ 3,
PROPDMGEXP=="m" ~ 6,
PROPDMGEXP=="M" ~ 6,
PROPDMGEXP=="b" ~ 9,
PROPDMGEXP=="B" ~ 9,
TRUE ~ 0)) %>%
mutate(crop_exp_num = case_when(
CROPDMGEXP=="+" ~ 0,
CROPDMGEXP=="-" ~ 0,
CROPDMGEXP=="?" ~ 0,
CROPDMGEXP=="0" ~ 0,
CROPDMGEXP=="1" ~ 1,
CROPDMGEXP=="2" ~ 2,
CROPDMGEXP=="3" ~ 3,
CROPDMGEXP=="4" ~ 4,
CROPDMGEXP=="5" ~ 5,
CROPDMGEXP=="6" ~ 6,
CROPDMGEXP=="7" ~ 7,
CROPDMGEXP=="8" ~ 8,
CROPDMGEXP=="9" ~ 9,
CROPDMGEXP=="h" ~ 2,
CROPDMGEXP=="H" ~ 2,
CROPDMGEXP=="k" ~ 3,
CROPDMGEXP=="K" ~ 3,
CROPDMGEXP=="m" ~ 6,
CROPDMGEXP=="M" ~ 6,
CROPDMGEXP=="b" ~ 9,
CROPDMGEXP=="B" ~ 9,
TRUE ~ 0)) %>%
mutate(prop_dmg_total = PROPDMG*10^(prop_exp_num)) %>%
mutate(crop_dmg_total = CROPDMG*10^(crop_exp_num)) %>%
select(EVTYPE, prop_dmg_total, crop_dmg_total) %>%
aggregate(. ~ EVTYPE, . , FUN = sum, na.rm = TRUE)
# Total property damages from all records
total_prop_dmg <- sum(c$prop_dmg_total)
# Total crop damages from all records
total_crop_dmg <- sum(c$crop_dmg_total)
# Quick inspection of events with the most property damages first followed by
# crop damages
c1 <- arrange(c, desc(prop_dmg_total, crop_dmg_total))
head(c1,5)
## EVTYPE prop_dmg_total crop_dmg_total
## 1 FLOOD 144129608200 5013161500
## 2 HURRICANE (TYPHOON) 81718889010 5350107800
## 3 STORM TIDE 47835539000 855000
## 4 TORNADO 24616952710 283425010
## 5 FLASH FLOOD 15224381910 1334901700
# Subset out results for the top 5 events in terms of property damages
c1 <- c1[1:5,]
# Calculate % of property damages accounted for by these top 5 events
c1_top5 <- sum(c1$prop_dmg_total)/total_prop_dmg
# Next inspect events with most crop damages first followed by
# property damages
c2 <- arrange(c, desc(crop_dmg_total, prop_dmg_total))
head(c2,5)
## EVTYPE prop_dmg_total crop_dmg_total
## 1 DROUGHT 1046101000 13367566000
## 2 HURRICANE (TYPHOON) 81718889010 5350107800
## 3 FLOOD 144129608200 5013161500
## 4 HAIL 14595213420 2496822450
## 5 FROST/FREEZE 19038200 1368761000
# Subset out results for the top 5 events in terms of crop damages
c2 <- c2[1:5,]
# Calculate % of crop damages accounted for by these top 5 events
c2_top5 <- sum(c2$crop_dmg_total)/total_crop_dmg
# Graphical representation of results
# Graphs look better with horizontal bars
# 1 - Top 5 Events by Property Damages
c1plot <- ggplot(c1[,1:2], aes(x = reorder(EVTYPE, -prop_dmg_total),
y=prop_dmg_total)) +
geom_col() +
coord_flip() +
labs(title="Top 5 Event Types (by Property Damages)",
x ="Events",
y = "Property Damages") +
theme(text = element_text(size=8)) +
scale_y_continuous(breaks = seq(0, max(c1[,2]), by=10000000000))
# 2 - Top 5 Events by Crop Damages
c2plot <- ggplot(c2[,1:3], aes(x = reorder(EVTYPE, -crop_dmg_total), y=crop_dmg_total)) +
geom_col() +
coord_flip() +
labs(title="Top 5 Event Types (by Crop Damages)",
x ="Events",
y = "Crop Damages") +
theme(text = element_text(size=8)) +
scale_y_continuous(breaks = seq(0, max(c2[,3]), by=1000000000))
# Put the two graphs together
grid.arrange(c1plot, c2plot, ncol=1)
From the above analysis:
Flood has resulted the highest property damages while drought has resulted in the highest crop damages. These two event types are therefore considered the most harmful with respect to economic consequences.
In terms of property damages, the top five event types are: 1) Flood 2) Hurricane (Typhoon) 3) Storm Tide 4) Tornado 5) Flash Flood
In addition, the top 5 event types have accounted for 85.4833845% of the total property damages.
In terms of crop damages, the top five event types are: 1) Drought 2) Hurricane (Typhoon) 3) Flood 4) Hail 5) Frost/Freeze
In addition, the top 5 event types have accounted for 79.4079192% of the crop damages.
This analysis utilized storm data downloaded from the United States (US) National Oceanic and Atmospheric Administration’s (NOAA) database with the objective of identifying storm event types with the most impact to population health and the economy. Analysis results have indicated that the most fatalities were caused by excessive heat and the most injuries were caused by tornados. Economy-wise, flood caused the most property damages while drought resulted in the most crop damages.