In this report we describe the most damaging severe weather event types in the NOAA storm records for the United States from 1950 to 2011. We look at economic damage (property and crops) on the one hand and at harm to public health (fatalities and injuries) on the other, so the reader learns which events most need to be prepared for. The results are surprisingly clear: in each case a single event type is far ahead of all other ranked event types.
We obtained the storm data for the years 1950 to 2011 from the NOAA Storm Database.
We read the database from a compressed, comma-separated file. The file has a header row naming the fields, but the fields are not officially coded or documented and are therefore subject to interpretation.
options(scipen=999)
if(!exists('pm0')) {
pm0 <- read.table("repdata_data_StormData.csv.bz2", comment.char = "#",
header = TRUE, sep = ",", na.strings = "")
alarm()
}
Checking the cleanliness of the EVTYPE factor variable:
head(unique(pm0$EVTYPE), 50)
## [1] TORNADO TSTM WIND
## [3] HAIL FREEZING RAIN
## [5] SNOW ICE STORM/FLASH FLOOD
## [7] SNOW/ICE WINTER STORM
## [9] HURRICANE OPAL/HIGH WINDS THUNDERSTORM WINDS
## [11] RECORD COLD HURRICANE ERIN
## [13] HURRICANE OPAL HEAVY RAIN
## [15] LIGHTNING THUNDERSTORM WIND
## [17] DENSE FOG RIP CURRENT
## [19] THUNDERSTORM WINS FLASH FLOOD
## [21] FLASH FLOODING HIGH WINDS
## [23] FUNNEL CLOUD TORNADO F0
## [25] THUNDERSTORM WINDS LIGHTNING THUNDERSTORM WINDS/HAIL
## [27] HEAT WIND
## [29] LIGHTING HEAVY RAINS
## [31] LIGHTNING AND HEAVY RAIN FUNNEL
## [33] WALL CLOUD FLOODING
## [35] THUNDERSTORM WINDS HAIL FLOOD
## [37] COLD HEAVY RAIN/LIGHTNING
## [39] FLASH FLOODING/THUNDERSTORM WI WALL CLOUD/FUNNEL CLOUD
## [41] THUNDERSTORM WATERSPOUT
## [43] EXTREME COLD HAIL 1.75)
## [45] LIGHTNING/HEAVY RAIN HIGH WIND
## [47] BLIZZARD BLIZZARD WEATHER
## [49] WIND CHILL BREAKUP FLOODING
## 985 Levels: HIGH SURF ADVISORY COASTAL FLOOD ... WND
It is apparent that the event type strings are not assigned systematically. The same kind of event appears in many similar variants, so the raw values cannot be aggregated under one category as they are.
A good example is the event 'Thunderstorm', whose label often includes wind information, although wind is clearly a separate piece of information.
# cleaning and streamlining efforts
# goal: collect similar event types under a single label
# note: the rules run in order, so a label rewritten by an earlier rule
# is no longer matched by the later ones
pm0$EVTYPE <- toupper(pm0$EVTYPE)
pm0$EVTYPE[grep("THUNDERS", pm0$EVTYPE)] <- "THUNDERSTORM"
pm0$EVTYPE[grep("TSTM", pm0$EVTYPE)] <- "THUNDERSTORM"
pm0$EVTYPE[grep("LIGHTNING", pm0$EVTYPE)] <- "THUNDERSTORM"
pm0$EVTYPE[grep("TORNAD", pm0$EVTYPE)] <- "TORNADO"
pm0$EVTYPE[grep("HAIL", pm0$EVTYPE)] <- "HAIL"
pm0$EVTYPE[grep("FLOOD", pm0$EVTYPE)] <- "FLOOD"
pm0$EVTYPE[grep("HEAT", pm0$EVTYPE)] <- "HEAT"
pm0$EVTYPE[grep("WIND", pm0$EVTYPE)] <- "WIND"
pm0$EVTYPE[grep("SNOW", pm0$EVTYPE)] <- "SNOW"
pm0$EVTYPE[grep("HURRICANE", pm0$EVTYPE)] <- "HURRICANE"
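The sequential relabelling above can also be written as one ordered rule table. The following is a minimal sketch on a handful of sample labels, not the code used for this report; the helper name `normalize_evtype` and the `rules` vector are hypothetical and only restate the patterns already listed.

```r
# Hypothetical helper (not part of the original analysis): map raw event
# labels to coarse categories using an ordered set of substring rules.
normalize_evtype <- function(x, rules) {
  x <- toupper(trimws(as.character(x)))
  for (pattern in names(rules)) {
    # Later rules see labels already rewritten by earlier rules,
    # which mirrors the sequential assignments in the report.
    x[grepl(pattern, x, fixed = TRUE)] <- rules[[pattern]]
  }
  x
}

# Same patterns and same order as the assignments in the report.
rules <- c(THUNDERS = "THUNDERSTORM", TSTM = "THUNDERSTORM",
           LIGHTNING = "THUNDERSTORM", TORNAD = "TORNADO",
           HAIL = "HAIL", FLOOD = "FLOOD", HEAT = "HEAT",
           WIND = "WIND", SNOW = "SNOW", HURRICANE = "HURRICANE")

raw <- c("TSTM WIND", "Thunderstorm Winds/Hail", "TORNADO F0",
         "HURRICANE OPAL/HIGH WINDS", "FLASH FLOODING", "DENSE FOG")
normalized <- normalize_evtype(raw, rules)
print(normalized)
```

With this rule order, "HURRICANE OPAL/HIGH WINDS" ends up as WIND, because the WIND rule runs before the HURRICANE rule. Labels that match no rule, such as "DENSE FOG", are left unchanged.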
To determine which event types have the strongest impact on public health, we need to aggregate the large number of recorded events (more than 900,000).
As the measure of harm we use the sum of the FATALITIES and INJURIES fields.
# public health subset
pm1 <- subset(pm0, pm0$FATALITIES + pm0$INJURIES > 0)
pm1$HARMED <- pm1$FATALITIES + pm1$INJURIES
#pm1 <- pm1[order(pm1$HARMED, decreasing = T),]
library(plyr)
# aggregate by event type
pm1.sum <- ddply(pm1, c("EVTYPE"), summarize, HARMED = sum(HARMED))
# order by impact
pm1.sum <- pm1.sum[order(pm1.sum$HARMED, decreasing = T),]
# plotting Top 10 event types
library(ggplot2)
ggplot(pm1.sum[1:10,], aes(x = reorder(EVTYPE, HARMED), y = HARMED)) +
geom_bar(stat = "identity", colour = "Steelblue") +
ggtitle("Types of weather events on record most harmful to population health\n in the USA between 1950 and 2011") +
labs(x = "Event Type", y = "Affected People (Fatalities + Injuries)",
caption = "Weather events sorted descending by health damages | Data source: NOAA") +
coord_flip()
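To judge how far ahead the top event type is, it can help to look at each type's share of the total next to the raw sums. The following is a minimal sketch using base R's `aggregate` on a small made-up data frame; the numbers are invented for illustration and are not taken from the NOAA data.

```r
# Toy data standing in for the cleaned storm records (values are made up).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT", "FLOOD"),
  FATALITIES = c(2, 0, 1, 3, 0),
  INJURIES   = c(10, 5, 0, 1, 4)
)
toy$HARMED <- toy$FATALITIES + toy$INJURIES

# Sum per event type, sort descending, then compute each type's share.
harm <- aggregate(HARMED ~ EVTYPE, data = toy, FUN = sum)
harm <- harm[order(harm$HARMED, decreasing = TRUE), ]
harm$SHARE <- harm$HARMED / sum(harm$HARMED)
print(harm)
```

The same two lines for ordering and share could be applied to `pm1.sum`, assuming it has the `EVTYPE` and `HARMED` columns created above.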
To determine which event types have the greatest economic consequences, we take both crop and property damages into account. Earlier records are believed to be incomplete, and the individual damage figures are estimates in any case. Taken together, this means the totals carry an unknown margin of error.
# prepare and subset the damage data for summarizing:
# merge the number field and the unit field into a single numeric field
prop <- pm0[,c("EVTYPE","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
prop$PROPDMGEXP <- toupper(prop$PROPDMGEXP)
# operationalize power calc
prop$PROPDMGEXP[prop$PROPDMGEXP %in% c("", "+" ,"0" ,"?" ,"-")] <- 0
prop$PROPDMGEXP[prop$PROPDMGEXP == "K"] <- 3
prop$PROPDMGEXP[prop$PROPDMGEXP == "M"] <- 6
prop$PROPDMGEXP[prop$PROPDMGEXP == "H"] <- 2
prop$PROPDMGEXP[prop$PROPDMGEXP == "B"] <- 9
# prepare numeric power calc
prop$PROPDMGEXP <- as.numeric(prop$PROPDMGEXP)
# finally calculate the single damage number that should have been there in the first place
prop$PROPTOT <- prop$PROPDMG * 10^(prop$PROPDMGEXP)
#preparing damages data for summarizing by merging number and unit field into a single number field
prop$CROPDMGEXP <- toupper(prop$CROPDMGEXP)
# operationalize power calc
prop$CROPDMGEXP[prop$CROPDMGEXP == "?"] <-0
prop$CROPDMGEXP[prop$CROPDMGEXP == ""] <- 0
prop$CROPDMGEXP[prop$CROPDMGEXP == "B"] <- 9
prop$CROPDMGEXP[prop$CROPDMGEXP == "M"] <- 6
prop$CROPDMGEXP[prop$CROPDMGEXP == "K"] <- 3
# prepare numeric power calc
prop$CROPDMGEXP <- as.numeric(prop$CROPDMGEXP)
# finally calculate the single damage number that should have been there in the first place
prop$CROPTOT <- prop$CROPDMG * 10^(prop$CROPDMGEXP)
# impute na values with zero
prop$CROPTOT[is.na(prop$CROPTOT)] <- 0
prop$PROPTOT[is.na(prop$PROPTOT)] <- 0
# calc combined total damage
prop$TOTAL <- prop$PROPTOT + prop$CROPTOT
library(plyr)
# aggregate all events by event type and summarize TOTAL to GRAND TOTAL
prop.sum <- ddply(prop, c("EVTYPE"), summarize, TOTAL = sum(TOTAL))
prop.sum <- prop.sum[order(prop.sum$TOTAL, decreasing = T),]
# custom tick marks for billion USD formatting
# (named ticks_bn to avoid shadowing ggplot2's ylab() function)
ticks_bn <- c(25, 50, 100, 150)
library(ggplot2)
ggplot(prop.sum[1:10,], aes(x = reorder(EVTYPE, TOTAL), y = TOTAL)) +
geom_bar(stat = "identity", colour = "Steelblue") +
ggtitle("Types of weather events on record with the greatest economic damages\n in the USA between 1950 and 2011") +
labs(x = "Event Type", y = "Damages in USD (Property & Crops)",
caption = "Weather events sorted descending by total damages | Data source: NOAA") +
coord_flip() +
scale_y_continuous(labels = paste0(ticks_bn, "B"), breaks = 10^9 * ticks_bn)
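The step that merges a damage figure with its unit code can also be expressed as a lookup table. The following is a minimal sketch with a hypothetical helper `decode_damage`; it assumes H, K, M and B stand for 10^2, 10^3, 10^6 and 10^9, and treats every other code as 10^0. That differs from the code above, which uses digit codes as exponents and sets records with a missing code to zero damage, so the two would not give identical totals.

```r
# Hypothetical helper: turn a damage figure plus its unit code into dollars.
# Assumption: H/K/M/B mean 10^2, 10^3, 10^6, 10^9; any other code
# (blank, NA, "?", "+", "-", digits) is treated as 10^0.
decode_damage <- function(value, unit) {
  powers <- c(H = 2, K = 3, M = 6, B = 9)
  # Look up each code by name; unknown or missing codes yield NA.
  exponent <- unname(powers[toupper(as.character(unit))])
  exponent[is.na(exponent)] <- 0
  value * 10^exponent
}

result <- decode_damage(c(25, 2.5, 1, 10, 7), c("K", "m", "B", "?", NA))
print(result)
```

A lookup table keeps the assumed meaning of each code in one place, which makes it easier to revisit how ambiguous codes should be handled.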
By far the greatest danger to people's lives and health comes from the weather event type TORNADO, followed by THUNDERSTORM.
The number one type of weather event in terms of economic damage (property and crops combined) is, by far, FLOOD. Floods are responsible for about 180B USD in damages between 1950 and 2011.