Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This report explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage for the period from 1950 through to 2011.
Underlying storm events are aggregated and analyzed to identify those events which have the most significant impact on population health and the economy, with a view to providing appropriate data to assist a policy maker in arriving at decisions on resource allocation for prevention and mitigation activity.
The storm data is a copy of the NOAA storm database made available from the Reproducible Research course website. The data is provided as a bz2-compressed CSV file. The original data is provided by the NOAA National Centers for Environmental Information and the original raw data is available in the Storm Events Database.
We obtain the file and load it into a data table. Given the time taken to download and parse the file, we persist both the downloaded file and the parsed data after processing them for the first time.
library(data.table)

# Download, parse and cache the storm data; later runs reuse the cached RDS file.
storm.data <- (function() {
  parsed.file <- './raw-data/storm.data.rds'
  if (!file.exists(parsed.file)) {
    if (!file.exists('raw-data')) {
      dir.create(file.path(getwd(), 'raw-data'))
    }
    url <- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
    file <- './raw-data/repdata-data-StormData.csv.bz2'
    if (!file.exists(file)) {
      # Base R download; binary mode keeps the compressed archive intact
      download.file(url, file, mode = 'wb')
    }
    # Parse even when the archive was already downloaded on an earlier run
    storm.data <- data.table(read.csv(bzfile(file)))
    setkey(storm.data, 'REFNUM')
    saveRDS(storm.data, parsed.file)
  } else {
    storm.data <- readRDS(parsed.file)
  }
  storm.data
})()
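The caching pattern used here can be sketched in isolation. In the example below the helper name `cached`, the toy loader and the temporary file path are all illustrative assumptions rather than part of the original analysis; the point is only that the expensive step runs once and later calls read the saved RDS file.

```r
# Hypothetical helper: run an expensive loader once and reuse the saved result.
cached <- function(path, loader) {
  if (file.exists(path)) {
    return(readRDS(path))        # cache hit: skip the expensive step
  }
  result <- loader()             # cache miss: do the work
  saveRDS(result, path)          # persist for subsequent runs
  result
}

cache.file <- tempfile(fileext = ".rds")
calls <- 0
loader <- function() {
  calls <<- calls + 1            # count how often the slow path runs
  data.frame(REFNUM = 1:3)
}

first  <- cached(cache.file, loader)   # runs the loader and writes the cache
second <- cached(cache.file, loader)   # reads the cache, loader is not called
```

Deleting the cached file forces the loader to run again, which is the simplest way to refresh the data.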
Data on storm events is provided from 1950 through to November 2011. During this period 902297 observations of weather events were captured, which collectively represent `r length(unique(storm.data$EVTYPE))` different event types. There are, however, a number of inconsistencies in the data, so additional processing was performed to clean it prior to analysis.
Prior to 1996 the data is limited in scope and only captures Tornado Data (1950 - 1995, from the Storm Prediction Center) and Thunderstorm Wind and Hail Data (1959 - 1995, Storm Prediction Center). It was not until 1996 that the capture of storm data was standardized against the current Storm Data Event definitions, so data prior to 1996 was discarded to avoid skewing the analysis towards a narrow series of events.
storm.data <- storm.data[as.Date(BGN_DATE, "%m/%d/%Y") >= "1996-01-01"]
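As a minimal sketch of how this date filter behaves, the example below applies the same parsing to a few toy values. The sample strings assume that BGN_DATE is stored as month/day/year followed by a time component; that layout is an assumption for illustration.

```r
# Toy BGN_DATE values in the assumed "month/day/year time" layout.
bgn.date <- c("4/18/1950 0:00:00", "12/31/1995 0:00:00",
              "1/1/1996 0:00:00", "11/28/2011 0:00:00")

# as.Date() parses the leading date; trailing characters are ignored.
parsed <- as.Date(bgn.date, format = "%m/%d/%Y")

# Keep events on or after the 1996 cutoff.
keep <- parsed >= as.Date("1996-01-01")
bgn.date[keep]
```

Only the last two values survive the filter, since the first two predate 1996.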
As we are looking at the economic and health impact of events we can discard any events that do not meet these criteria. We filter out records where there is neither a health impact (0 fatalities and 0 injuries) nor economic damage (0 property and 0 crop damage).
storm.data <- storm.data[storm.data$FATALITIES > 0 | storm.data$INJURIES > 0 | storm.data$PROPDMG > 0 | storm.data$CROPDMG > 0 , ]
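There is a lack of standardization for event categorization. Equivalent events are described using various cases, punctuation and descriptions, so we cleaned them by converting to lower case, replacing punctuation with spaces, removing digits, trimming and collapsing whitespace, and dropping summary rows and entries that end in a stray one- or two-character token: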
storm.data$EVTYPE <- tolower(storm.data$EVTYPE)
storm.data$EVTYPE <- gsub("[[:blank:][:punct:]+]", " ", storm.data$EVTYPE)
storm.data$EVTYPE <- gsub("^\\s+", "", storm.data$EVTYPE)
storm.data$EVTYPE <- gsub("[0-9]+", "", storm.data$EVTYPE)
storm.data$EVTYPE <- gsub("\\s+$", "", storm.data$EVTYPE)
storm.data$EVTYPE <- gsub("\\s+", " ", storm.data$EVTYPE)
storm.data <- storm.data[!EVTYPE %like% "summary"]
storm.data <- storm.data[!EVTYPE %like% "none" | !EVTYPE %like% "other"]
storm.data <- storm.data[!grep("\\s.{1,2}$", EVTYPE)]
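The effect of these cleaning steps can be seen on a handful of toy labels. The sketch below is a condensed variant of the steps above (digits are removed before trimming, and both ends are trimmed in one pass); the function name `normalize.evtype` and the sample labels are assumptions for illustration.

```r
# Hypothetical helper bundling the normalization steps into one function.
normalize.evtype <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:blank:][:punct:]]", " ", x)  # punctuation and blanks to spaces
  x <- gsub("[0-9]+", "", x)                 # drop digits
  x <- gsub("^\\s+|\\s+$", "", x)            # trim both ends
  gsub("\\s+", " ", x)                       # collapse runs of whitespace
}

raw <- c("TSTM WIND/HAIL", " Coastal Flooding", "HIGH WIND (G40)", "Flash Flood")
cleaned <- normalize.evtype(raw)

# Labels ending in a one- or two-character token are flagged for removal.
stray <- grepl("\\s.{1,2}$", cleaned)
cleaned[!stray]
```

Here "HIGH WIND (G40)" reduces to "high wind g" and is then flagged by the final pattern, while the other three labels are kept.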
Subsequent to this event normalization there are
length(unique(storm.data$EVTYPE))
## [1] 163
distinct event types covering
nrow(storm.data)
## [1] 196024
observations. Prior to further analysis we need to normalize these to the 48 standardized event types listed in Table 1 (Storm Data Event Table, Section 2.1.1, p. 6) of NOAA’s NWS Documentation. We did this by producing a lookup table that maps the remaining event types to the standardized values. This was achieved using a technique described by Daniel Falster using lookup tables and implemented in addNewData.R.
The source for Daniel Falster’s implementation follows:
##' Modifies 'data' by adding new values supplied in newDataFileName
##'
##' newDataFileName is expected to have columns
##' c(lookupVariable,lookupValue,newVariable,newValue,source)
##'
##' Within the column 'newVariable', replace values that
##' match 'lookupValue' within column 'lookupVariable' with the value
##' newValue'. If 'lookupVariable' is NA, then replace *all* elements
##' of 'newVariable' with the value 'newValue'.
##'
##' Note that lookupVariable can be the same as newVariable.
##'
##' @param newDataFileName name of lookup table
##' @param data existing data.frame
##' @param allowedVars vector of permissible variable names for newVariable
##' @return modified data.frame
addNewData <- function(newDataFileName, data, allowedVars){
import <- readNewData(newDataFileName, allowedVars)
if( !is.null(import)){
for(i in seq_len(nrow(import))){ #Make replacements
col.to <- import$newVariable[i]
col.from <- import$lookupVariable[i]
if(is.na(col.from)){ # apply to whole column
data[col.to] <- import$newValue[i]
} else { # apply to subset
rows <- data[[col.from]] == import$lookupValue[i]
data[rows,col.to] <- import$newValue[i]
}
}
}
data
}
##' Utility function to read/process newDataFileName for addNewData
##'
##' @param newDataFileName name of lookup table
##' @param allowedVars vector of permissible variable names for newVariable
##' @return data.frame with columns c(lookupVariable,lookupValue,newVariable,newValue,source)
readNewData <- function(newDataFileName, allowedVars){
if( file.exists(newDataFileName)){
import <- read.csv(newDataFileName, header=TRUE, stringsAsFactors=FALSE,
strip.white=TRUE)
if( nrow(import)> 0 ){
#Check columns names for import are right
expectedColumns<- c("lookupVariable","lookupValue","newVariable","newValue")
nameIsOK <- expectedColumns %in% names(import)
if(any(!nameIsOK))
stop("Incorrect name in lookup table for ",
newDataFileName, "--> ", paste(expectedColumns[!nameIsOK],
collapse=", "))
#Check values of newVariable are in list of allowed variables
import$lookupVariable[import$lookupVariable == ""] <- NA
nameIsOK <- import$newVariable %in% allowedVars
if(any(!nameIsOK))
stop("Incorrect name(s) in newVariable column of ",
newDataFileName, "--> ", paste(import$newVariable[!nameIsOK],
collapse=", "))
} else {
import <- NULL
}
} else {
import <- NULL
}
import
}
The data mappings used to normalize the remaining events were:
event.map <- read.csv(file="analysis-data/eventmap.csv", header=TRUE, sep=",")
event.map[, c(2, 4)]
## lookupValue newValue
## 1 x Other
## 2 astronomical high tide Storm Surge/Tide
## 3 astronomical low tide Astronomical Low Tide
## 4 avalanche Avalanche
## 5 beach erosion Coastal Flood
## 6 black ice Frost/Freeze
## 7 blizzard Blizzard
## 8 blowing dust Dust Storm
## 9 blowing snow Ice Storm
## 10 brush fire Wildfire
## 11 coastal erosion Coastal Flood
## 12 coastal flood Coastal Flood
## 13 coastal flooding Coastal Flood
## 14 coastal flooding erosion Coastal Flood
## 15 coastal storm Coastal Flood
## 16 coastalstorm Coastal Flood
## 17 cold Cold/Wind Chill
## 18 cold and snow Cold/Wind Chill
## 19 cold temperature Cold/Wind Chill
## 20 cold weather Cold/Wind Chill
## 21 cold wind chill Cold/Wind Chill
## 22 dam break Flood
## 23 damaging freeze Frost/Freeze
## 24 dense fog Dense Fog
## 25 dense smoke Dense Smoke
## 26 downburst Heavy Rain
## 27 drought Drought
## 28 drowning Other
## 29 dry microburst Thunderstorm Wind
## 30 dust devil Dust Devil
## 31 dust storm Dust Storm
## 32 erosion cstl flood Coastal Flood
## 33 excessive heat Excessive Heat
## 34 excessive snow Heavy Snow
## 35 extended cold Extreme Cold/Wind Chill
## 36 extreme cold Extreme Cold/Wind Chill
## 37 extreme cold wind chill Extreme Cold/Wind Chill
## 38 extreme windchill Extreme Cold/Wind Chill
## 39 falling snow ice Ice Storm
## 40 flash flood Flash Flood
## 41 flash flood flood Flash Flood
## 42 flood Flood
## 43 flood flash flood Flash Flood
## 44 fog Dense Fog
## 45 freeze Frost/Freeze
## 46 freezing drizzle Frost/Freeze
## 47 freezing fog Freezing Fog
## 48 freezing rain Sleet
## 49 freezing spray Sleet
## 50 frost Frost/Freeze
## 51 frost freeze Frost/Freeze
## 52 funnel cloud Funnel Cloud
## 53 glaze Frost/Freeze
## 54 gradient wind High Wind
## 55 gusty wind High Wind
## 56 gusty wind hail High Wind
## 57 gusty wind hvy rain High Wind
## 58 gusty wind rain High Wind
## 59 gusty winds High Wind
## 60 hail Hail
## 61 hazardous surf High Surf
## 62 heat Heat
## 63 heat wave Heat
## 64 heavy rain Heavy Rain
## 65 heavy rain high surf Heavy Rain
## 66 heavy seas High Surf
## 67 heavy snow Heavy Snow
## 68 heavy snow shower Heavy Snow
## 69 heavy surf High Surf
## 70 heavy surf and wind High Surf
## 71 heavy surf high surf High Surf
## 72 high seas High Surf
## 73 high surf High Surf
## 74 high surf advisory High Surf
## 75 high swells High Surf
## 76 high water Storm Surge/Tide
## 77 high wind High Wind
## 78 high winds High Wind
## 79 hurricane Hurricane (Typhoon)
## 80 hurricane edouard Hurricane (Typhoon)
## 81 hurricane typhoon Hurricane (Typhoon)
## 82 hyperthermia exposure Extreme Cold/Wind Chill
## 83 hypothermia exposure Extreme Cold/Wind Chill
## 84 ice jam flood minor Flood
## 85 ice on road Frost/Freeze
## 86 ice roads Frost/Freeze
## 87 ice storm Frost/Freeze
## 88 icy roads Frost/Freeze
## 89 lake effect snow Lake-Effect Snow
## 90 lakeshore flood Lakeshore Flood
## 91 landslide Debris Flow
## 92 landslides Debris Flow
## 93 landslump Debris Flow
## 94 landspout Debris Flow
## 95 late season snow Heavy Snow
## 96 light freezing rain Sleet
## 97 light snow Winter Weather
## 98 light snowfall Winter Weather
## 99 lightning Lightning
## 100 marine accident Other
## 101 marine hail Marine Hail
## 102 marine high wind Marine High Wind
## 103 marine strong wind Marine Strong Wind
## 104 marine tstm wind Marine Thunderstorm Wind
## 105 marine thunderstorm wind Marine Thunderstorm Wind
## 106 microburst Thunderstorm Wind
## 107 mixed precip Heavy Rain
## 108 mixed precipitation Heavy Rain
## 109 mud slide Debris Flow
## 110 mudslide Debris Flow
## 111 mudslides Debris Flow
## 112 non severe wind damage High Wind
## 113 non tstm wind High Wind
## 114 non thunderstorm wind High Wind
## 115 other Other
## 116 rain Heavy Rain
## 117 rain snow Heavy Rain
## 118 record heat Excessive Heat
## 119 rip current Rip Current
## 120 rip currents Rip Current
## 121 river flood Flood
## 122 river flooding Flood
## 123 rock slide Debris Flow
## 124 rogue wave High Surf
## 125 rough seas High Surf
## 126 rough surf High Surf
## 127 seiche Seiche
## 128 small hail Hail
## 129 snow Heavy Snow
## 130 snow and ice Ice Storm
## 131 snow squall Ice Storm
## 132 snow squalls Ice Storm
## 133 storm surge Storm Surge/Tide
## 134 storm surge tide Storm Surge/Tide
## 135 strong wind Strong Wind
## 136 strong winds Strong Wind
## 137 thunderstorm Thunderstorm Wind
## 138 thunderstorm wind Thunderstorm Wind
## 139 tidal flooding Storm Surge/Tide
## 140 tornado Tornado
## 141 torrential rainfall Heavy Rain
## 142 tropical depression Tropical Depression
## 143 tropical storm Tropical Storm
## 144 tstm wind Thunderstorm Wind
## 145 tstm wind and lightning Thunderstorm Wind
## 146 tstm wind hail Thunderstorm Wind
## 147 tsunami Tsunami
## 148 typhoon Hurricane (Typhoon)
## 149 unseasonably warm Heat
## 150 urban sml stream fld Flood
## 151 volcanic ash Volcanic Ash
## 152 warm weather Heat
## 153 waterspout Waterspout
## 154 wet microburst Thunderstorm Wind
## 155 whirlwind Strong Wind
## 156 wild forest fire Wildfire
## 157 wildfire Wildfire
## 158 wind High Wind
## 159 wind and wave High Wind
## 160 wind damage High Wind
## 161 winds High Wind
## 162 winter storm Winter Storm
## 163 winter weather Winter Weather
## 164 winter weather mix Winter Weather
## 165 wintry mix Winter Weather
Applying these mappings to the filtered storm data
allowedVars<-c("EVTYPE")
storm.data <- addNewData("analysis-data/eventmap.csv", storm.data, allowedVars)
results in
length(unique(storm.data$EVTYPE))
## [1] 49
distinct event types which represent the 48 standard types and an additional classification of Other to account for event types that it was not possible to map.
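The idea behind the lookup-table mapping can be illustrated with a small, self-contained sketch. It uses a vectorized `match()` rather than the row-by-row replacement in addNewData.R, so it is a simplified stand-in and not the implementation used above. The first three mappings are taken from the table shown earlier; the label "marine mishap" and the fallback to "Other" for unmatched labels are assumptions for illustration.

```r
# Toy lookup table using the same column names as eventmap.csv.
event.map <- data.frame(
  lookupValue = c("tstm wind", "thunderstorm wind", "heat wave", "rip currents"),
  newValue    = c("Thunderstorm Wind", "Thunderstorm Wind", "Heat", "Rip Current"),
  stringsAsFactors = FALSE
)

# "marine mishap" is a made-up label with no entry in the lookup table.
evtype <- c("tstm wind", "heat wave", "rip currents", "marine mishap")

# match() gives the lookup row for each label, or NA when there is none.
idx <- match(evtype, event.map$lookupValue)
standardized <- ifelse(is.na(idx), "Other", event.map$newValue[idx])

# Labels still lacking a mapping, useful when extending the lookup table.
unmapped <- unique(evtype[is.na(idx)])
```

Listing the unmapped labels in this way is one approach to checking that a lookup table covers every remaining event type.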
Storm events impacting population health can be considered to be those events resulting in injuries or fatalities. During the sample period there were a total of
nrow(storm.data[FATALITIES > 0 | INJURIES > 0])
## [1] 12760
events which had an impact on population health. Isolating those events to a discrete data frame
health.impact <- storm.data[FATALITIES > 0 | INJURIES > 0]
and determining the total impact of each health event, given that some events produce both injuries and fatalities,
health.impact$INJURIES[is.na(health.impact$INJURIES)] <- 0
health.impact$FATALITIES[is.na(health.impact$FATALITIES)] <- 0
health.impact <- health.impact[, IMPACT := INJURIES + FATALITIES]
we are then able to plot the health impact by event type, restricting each plot to the events having the greatest impact.
injuries <- ggplot(health.impact[, head(.SD, 20), by=INJURIES],
aes(x = EVTYPE, y = INJURIES)) +
geom_bar(stat = "identity",
aes(fill = INJURIES),
position = "dodge") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
xlab("Event") +
ylab("Injuries")
fatalities <- ggplot(health.impact[, head(.SD, 20), by=FATALITIES],
aes(x = EVTYPE, y = FATALITIES)) +
geom_bar(stat = "identity",
aes(fill = FATALITIES),
position = "dodge") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
xlab("Event") +
ylab("Fatalities")
combined <- ggplot(health.impact[, head(.SD, 20), by=IMPACT],
aes(x = EVTYPE, y = IMPACT)) +
geom_bar(stat = "identity",
aes(fill = IMPACT),
position = "dodge") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
xlab("Event") +
ylab("Health Impact")
grid.arrange(injuries, fatalities, combined, ncol=1, top = "Storm Events Impacting Population Health in the United States (1996-2011)")
The bar charts indicate that Tornado is the event type with the greatest impact on population health, both in terms of injuries and fatalities.
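When ranking event types, one option is to total the casualties per event type before selecting the highest-ranked ones. The sketch below shows that approach on invented toy records using base R's `aggregate()`; it is an illustrative alternative under those assumptions, not the code that produced the plots in this report, and the numbers are not drawn from the NOAA data.

```r
# Toy records; the counts are invented for illustration only.
events <- data.frame(
  EVTYPE     = c("Tornado", "Flood", "Tornado", "Heat", "Flood", "Lightning"),
  INJURIES   = c(10, 2, 5, 4, 0, 1),
  FATALITIES = c(1, 0, 2, 3, 1, 0)
)

# Total injuries and fatalities per event type.
totals <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = events, FUN = sum)
totals$IMPACT <- totals$INJURIES + totals$FATALITIES

# Rank by combined impact and keep the top three event types.
top <- head(totals[order(-totals$IMPACT), ], 3)
top
```

A table such as `top` could then be passed to ggplot in place of the raw event rows, giving one bar per event type.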
Economic damage caused by storm events is classified into two distinct groups: property damage (PROPDMG) and crop damage (CROPDMG). The dollar value of the damage is determined by combining the recorded amount with an exponent (PROPDMGEXP or CROPDMGEXP). The exponent is expected to be either numeric or a letter code, e.g. m or M to represent 10^6. To facilitate the subsequent analysis, the exponents were normalized to a numeric representation.
map.exponent.as.numeric <- function(exponent, ...) {
exponent <- as.factor(exponent)
levels(exponent) <- list(...)
exponent
}
`PROPDMGEXP` has
unique(storm.data$PROPDMGEXP)
## [1] K M B
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
values to be mapped in the exponent.
storm.data$PROPDMGEXP <- map.exponent.as.numeric(storm.data$PROPDMGEXP,
"1"=c("1"), "2"=c("2"), "3"=c("3"), "4"=c("4"), "5"=c("5"), "6"=c("6"), "7"=c("7"), "8"=c("8"), "9"=c("9"),
"100"=c('h', 'H'), "1000"=c('k','K'), "1000000"=c('m','M'), "1000000000"=c('b','B'))
`CROPDMGEXP` has
unique(storm.data$CROPDMGEXP)
## [1] K M B
## Levels: ? 0 2 B k K m M
values to be mapped in the exponent.
storm.data$CROPDMGEXP <- map.exponent.as.numeric(storm.data$CROPDMGEXP,
"1"=c("1"), "2"=c("2"), "3"=c("3"), "4"=c("4"), "5"=c("5"), "6"=c("6"), "7"=c("7"), "8"=c("8"), "9"=c("9"),
"100"=c('h', 'H'), "1000"=c('k','K'), "1000000"=c('m','M'), "1000000000"=c('b','B'))
Having mapped the values to the damage exponents we are able to determine values for the economic damage associated with impacting events by multiplying the damage amounts (PROPDMG and CROPDMG) by the respective exponents (PROPDMGEXP and CROPDMGEXP).
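A small worked example may make the multiplication concrete. The sketch below reuses the helper defined earlier on four invented damage records; the amounts and the "?" code are assumptions for illustration. Codes without a mapping become NA and are then treated as zero, mirroring the handling used for the real data.

```r
# Same helper as defined in the report.
map.exponent.as.numeric <- function(exponent, ...) {
  exponent <- as.factor(exponent)
  levels(exponent) <- list(...)   # levels not listed become NA
  exponent
}

# Invented damage amounts and exponent codes.
damage <- c(25, 2.5, 1.2, 10)
codes  <- c("K", "m", "B", "?")

mapped <- map.exponent.as.numeric(codes,
  "1000" = c("k", "K"), "1000000" = c("m", "M"), "1000000000" = c("b", "B"))

# Convert the factor labels back to numbers, one multiplier per record.
multiplier <- as.numeric(levels(mapped))[mapped]
dollars <- damage * multiplier
dollars[is.na(dollars)] <- 0      # unmapped codes contribute nothing
dollars
```

So a record of 25 with exponent K is valued at 25,000 USD, and 1.2 with exponent B at 1.2 billion USD.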
Extracting that subset of the data which covers events with an economic impact
economic.impact <- storm.data[PROPDMG > 0 | CROPDMG > 0]
we have a total of
nrow(economic.impact)
## [1] 189232
events which have caused an economic impact. We determine the notional $ value for each of these economic events by multiplying the damage by the mapped exponents and, given some events cause both crop and property damage, sum these to arrive at the total economic impact.
economic.impact <- economic.impact[, PROPDMG1:=0]
economic.impact <- economic.impact[, CROPDMG1:=0]
economic.impact <- economic.impact[, TOTALDMG:=0]
economic.impact <- economic.impact[, PROPDMG1 := .SD[, PROPDMG * as.numeric(levels(PROPDMGEXP))[PROPDMGEXP]]]
economic.impact <- economic.impact[, CROPDMG1 := .SD[, CROPDMG * as.numeric(levels(CROPDMGEXP))[CROPDMGEXP]]]
economic.impact$PROPDMG1[is.na(economic.impact$PROPDMG1)] <- 0
economic.impact$CROPDMG1[is.na(economic.impact$CROPDMG1)] <- 0
economic.impact <- economic.impact[, TOTALDMG := .SD[, CROPDMG1 + PROPDMG1]]
we are then able to plot the notional impact (USD) of the events
cropdmg <- ggplot(economic.impact[, head(.SD, 10), by=CROPDMG1],
aes(x = EVTYPE, y = CROPDMG1)) +
geom_bar(stat = "identity",
aes(fill = CROPDMG1),
position = "dodge") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
xlab("Event") +
ylab("Crop Damage (USD)")
propdmg <- ggplot(economic.impact[, head(.SD, 10), by=PROPDMG1],
aes(x = EVTYPE, y = PROPDMG1)) +
geom_bar(stat = "identity",
aes(fill = PROPDMG1),
position = "dodge") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
xlab("Event") +
ylab("Property Damage (USD)")
combined <- ggplot(economic.impact[, head(.SD, 10), by=TOTALDMG],
aes(x = EVTYPE, y = TOTALDMG)) +
geom_bar(stat = "identity",
aes(fill = TOTALDMG),
position = "dodge") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
xlab("Event") +
ylab("Total Damage (USD)")
grid.arrange(cropdmg, propdmg, combined, ncol=1, top = "Economic Impact of Storm Events in the United States (1996-2011)")
The bar charts indicate that Hurricane (Typhoon) events cause the most crop damage and Debris Flow events cause the most property damage. However, in aggregate, it is Flood which causes the most damage overall.
Tornado is the weather event which had the most impact on human health, both in terms of injuries and fatalities, from 1996 through to 2011, although Flood is the weather event which caused the most economic damage over the same period.