This report explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database on severe weather events. An exploratory analysis identifies which types of events are most harmful with respect to public health and the economy, and explains their potential danger in terms of the damage caused by single events and the number of times the events occur.
The NOAA storm database tracks characteristics of major storms and weather events in the US from 1950 to 2011, including when and where they occur, as well as estimates of any fatalities, injuries, and economic damage.
The data come as a comma-separated-value file compressed with the bzip2 algorithm and can be downloaded from here: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2
Documentation and FAQ for the database:
Description of Raw Data
| Variable Name | Class | Range | Description |
|---|---|---|---|
| EVTYPE | Character | – | Weather event label, as described in the National Weather Service Storm Data Documentation |
| FATALITIES | Numeric | 0 - 583 | Number of direct and indirect deaths caused by a single weather event |
| INJURIES | Numeric | 0 - 1700 | Number of direct and indirect injuries caused by a single weather event |
| PROPDMG | Numeric | 0 - 5000 | Property damage estimate in dollars, to be scaled by the magnitude code in PROPDMGEXP |
| PROPDMGEXP | Character | – | Magnitude code (multiplier) for PROPDMG |
| CROPDMG | Numeric | 0 - 990 | Crop damage estimate in dollars, to be scaled by the magnitude code in CROPDMGEXP |
| CROPDMGEXP | Character | – | Magnitude code (multiplier) for CROPDMG |
The processed datasets, health and economy, which are fed into the exploratory analysis, contain the following variables.
Dataset health:
| Variable Name | Class | Range | Description |
|---|---|---|---|
| EVENT | Character | – | Event type; permitted values and their descriptions are listed in the National Weather Service Storm Data Documentation, section 2.1.1 |
| HEALTH_DMG | Factor | 2 Levels: INJURY, FATALITY | Classification of the observed case into fatalities and injuries |
| CASES | Numeric | 0 - 1700 | Number of injuries or fatalities caused by the EVENT |
Dataset economy:
| Variable Name | Class | Range | Description |
|---|---|---|---|
| EVENT | Character | – | Event type; permitted values and their descriptions are listed in the National Weather Service Storm Data Documentation, section 2.1.1 |
| ECON_DMG | Factor | 2 Levels: CROP, PROP | Classification of the observed case into crop and property damage |
| ECON_DMG_ESTIMATE | Numeric | 0 - 1.15e+11 | Property or crop damage estimate in dollars |
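The steps required to transform the raw data into the two tidy datasets, health and economy, are described below. Data processing comprises loading the required packages, downloading and reading the data, removing records without damage, correcting the event labels, converting the damage magnitude codes, and reshaping the result into tidy form.
The following packages are installed (if missing) and loaded into R: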
# Install each package only if it is missing, then attach it
for (pkg in c("R.utils", "data.table", "dplyr", "ggplot2")) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
}
Unless data.csv already exists, the compressed file is downloaded into the working directory and decompressed; then only the required columns are read into R.
path <- getwd()
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if(!file.exists("data.csv")){
download.file(url, file.path(path, "data.csv.bz2"), method = "curl")
bunzip2("data.csv.bz2", "data.csv", remove = FALSE, skip = TRUE)
}
var_names <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
data <- fread("data.csv"
, select = var_names
, na.strings = "")
As the project’s focus is on weather events that caused health or economic damage, all rows that record neither health nor economic damage are removed from the dataset.
data <- data %>% filter(FATALITIES != 0 | INJURIES != 0 | CROPDMG != 0 | PROPDMG != 0)
The event types defined in the National Weather Service Storm Data Documentation are stored in the character vector events.
events <- c("Astronomical Low Tide", "Avalanche", "Blizzard", "Coastal Flood", "Cold/Wind Chill", "Debris Flow",
"Dense Fog", "Drought", "Dust Devil", "Dust Storm", "Excessive Heat", "Extreme Cold/Wind Chill",
"Flash Flood", "Frost/Freeze", "Funnel Cloud", "Freezing Fog", "Hail", "Heat", "Heavy Rain",
"Heavy Snow","High Surf", "High Wind", "Hurricane (Typhoon)", "Ice Storm", "Lake-Effect Snow", "Lakeshore Flood", "Lightning", "Marine Hail", "Marine High Wind", "Marine Strong Wind",
"Marine Thunderstorm Wind","Rip Current", "Seiche", "Sleet", "Storn Surge/Tide", "Strong Wind", "Thunderstorm Wind", "Tornado", "Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout", "Wildfire", "Winter Storm", "Winter Weather")
Comparing events with the labels actually used in the EVTYPE column of the raw data shows that almost all labels deviate from the intended labelling: only 5 of the 488 distinct labels match exactly.
table(unique(data$EVTYPE) %in% events)
##
## FALSE TRUE
## 483 5
This is due to misspellings and imprecise descriptions of the event classifications. To solve the problem, a new column EVENT is initialised that will eventually hold the correct labels.
All strings in the EVTYPE column are converted to upper case, and leading and trailing white space is removed.
# Initiate categorizing column
data$EVENT <- "NO CATEGORY"
# Establish consistency by removing white spaces and making all EVTYPES uppercase
data$EVTYPE <- toupper(trimws(data$EVTYPE))
Most EVTYPE entries differ from the intended labels only by an -ING or -S suffix or by non-alphabetical characters, so they are matched after removing all non-letters from both labels:
regex <- "[^[:alpha:]]"
for (event in events) {
strip_event <- toupper(gsub(regex, "", event))
data$EVENT[gsub(regex, "", data$EVTYPE) == strip_event] <- event
data$EVENT[gsub(regex, "", data$EVTYPE) == paste(strip_event, "S", sep = "")] <- event
data$EVENT[gsub(regex, "", data$EVTYPE) == paste(strip_event, "ING", sep = "")] <- event
}
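The matching rule used in the loop above can also be expressed as a stand-alone function, which makes it easier to test on individual labels. The following is a minimal sketch; `match_event()`, `events_demo` and `raw_labels` are hypothetical names introduced for illustration, and `events_demo` is only a small excerpt of the events vector. Unlike the loop, which lets a later match overwrite an earlier one, this function returns the first match; the two only differ if a label matches more than one event type.

```r
# Hypothetical helper (not part of the original analysis): return the first
# canonical event name whose letters-only, upper-case form equals the
# letters-only form of `evtype`, optionally followed by "S" or "ING".
match_event <- function(evtype, events) {
  strip <- function(x) toupper(gsub("[^[:alpha:]]", "", x))
  key <- strip(evtype)
  for (event in events) {
    if (key %in% paste0(strip(event), c("", "S", "ING"))) {
      return(event)
    }
  }
  "NO CATEGORY"
}

# Small excerpt of the events vector, for illustration only
events_demo <- c("Flash Flood", "Hurricane (Typhoon)", "Thunderstorm Wind")
raw_labels  <- c("THUNDERSTORM WINDS", "FLASH FLOODING",
                 "HURRICANE/TYPHOON", "TSTM WIND")

# Apply the matcher to every raw label
matched <- vapply(raw_labels, match_event, character(1),
                  events = events_demo, USE.NAMES = FALSE)
print(matched)
```

For these example labels the result is "Thunderstorm Wind", "Flash Flood", "Hurricane (Typhoon)" and "NO CATEGORY": the abbreviation "TSTM WIND" stays unmatched, which is why a manual step is still needed.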
After this correction, about 31% of the rows still carry a label that does not match any event type:
sum(data$EVENT == "NO CATEGORY") / nrow(data) * 100
## [1] 30.81062
The remaining labels were assigned manually, using the event descriptions in section 7 of the National Weather Service Storm Data Documentation.
CoastalFlood <- c("ASTRONOMICAL HIGH TIDE", "TIDAL FLOODING", "COSTAL FLOODING/EROSION",
"COASTAL FLOODING/EROSION" , "EROSION/CSTL FLOOD", "HIGH WINDS/COASTAL FLOOD",
"COASTAL FLOODING/EROSION", "COASTAL EROSION", "COASTAL SURGE")
WinterWeather <- c("LIGHT FREEZING RAIN", "ICY ROADS", "GLAZE", "FREEZING RAIN",
"FREEZING DRIZZLE", "LIGHT SNOW", "LIGHT SNOWFALL", "WINTER WEATHER/MIX",
"MIXED PRECIPITATION", "MIXED PRECIP", "WINTRY MIX", "RAIN/SNOW",
"WINTER WEATHER MIX", "ICE", "EXTENDED COLD", "COLD WAVE" )
HeavySnow <- c("EXCESSIVE SNOW", "SNOW", "HEAVY SNOW SHOWER", "SNOW SQUALL", "SNOW SQUALLS", "HEAVY SNOW/ICE",
"ICE AND SNOW", "BLOWING SNOW", "HEAVY SNOWPACK", "SNOW AND HEAVY SNOW", "SNOW AND ICE",
"HEAVY LAKE SNOW", "HEAVY SNOW/FREEZING RAIN", "HEAVY SNOW/WINTER STORM", "HEAVY SNOW SQUALLS" ,
"SNOW/ICE STORM", "HEAVY SNOW/SQUALLS", "HEAVY SNOW-SQUALLS", "RECORD SNOW", "SNOW ACCUMULATION",
"SNOW/ ICE", "ICE ROADS", "FALLING SNOW/ICE", "SNOW/BLOWING SNOW", "SNOW/HEAVY SNOW" )
HighWind <- c("WIND", "WINDS", "GUSTY WINDS", "GUSTY WIND", "HIGH WIND (G40)",
"NON TSTM WIND", "NON-TSTM WIND", "WIND DAMAGE", "NON TSTM WIND",
"NON-SEVERE WIND DAMAGE", "GRADIENT WIND", "WIND STORM", "HEAVY SNOW/WIND", "HIGH WIND DAMAGE" )
Freeze <- c("FREEZE", "DAMAGING FREEZE", "EARLY FROST", "FROST", "AGRICULTURAL FREEZE",
"HARD FREEZE", "UNSEASONABLY COLD", "UNSEASONABLE COLD", "COLD", "RECORD COLD", "GLAZE/ICE STORM",
"COLD TEMPERATURE", "ICE ROADS", "ICE ON ROAD", "FREEZING RAIN/SLEET", "GLAZE ICE",
"LOW TEMPERATURE", "COLD WEATHER", "SNOW/ICE" )
ExtremeCold <- c("EXTREME WINDCHILL", "EXTREME COLD")
Flood <- c("RIVER FLOODING", "RIVER FLOOD", "URBAN/SML STREAM FLD", "URBAN FLOOD", "FLOODING",
"FLOOD", "URBAN/SMALL STREAM FLOOD", "URBAN FLOODING", "MINOR FLOODING", "FLOODS", "RURAL FLOOD",
"SMALL STREAM FLOOD", "LAKE FLOOD", "RIVER AND STREAM FLOOD", "FLOOD/RIVER FLOOD",
"MUD SLIDES URBAN FLOODING", "MAJOR FLOOD", "URBAN AND SMALL STREAM FLOODIN", "URBAN FLOODS" )
FlashFlood <- c("FLASH FLOOD/FLOOD", "FLOOD/FLASH FLOOD", "FLASH FLOOD FROM ICE JAMS", "FLASH FLOOD/ STREET",
"FLASH FLOODING/FLOOD", "FLOOD/FLASHFLOOD", "FLASH FLOOD LANDSLIDES", "FLOOD/FLASH",
"FLOOD/FLASH/FLOOD", "FLOOD FLASH", "FLASH FLOOD/LANDSLIDE" )
Thunderstorm <- c("TSTM WIND", "TSTM WINDS", "THUNDERSTORM", "THUNDERSTORMS",
"THUNDERSTORM WINDSS", "THUNDERSTORMS WINDS", "DRY MICROBURST",
"TSTM WIND (G40)", "THUNDERSTORM WIND/ TREES", "MICROBURST",
"WET MICROBURST", "THUNDERTORM WINDS", "THUNDERSTORMS WIND",
"SEVERE THUNDERSTORM WINDS", "TSTM WIND 55", "THUNDERSTORM WIND 60 MPH",
"TSTM WIND (G45)", "SEVERE THUNDERSTORM", "THUDERSTORM WINDS",
"THUNDEERSTORM WINDS", "THUNDERESTORM WINDS", "TSTM WIND 40",
"TSTM WIND G45", "TSTM WIND (G45)", "TSTM WIND (41)", "TSTM WIND 45",
"TSTM WIND (G35)", "TSTM WIND AND LIGHTNING", "TSTM WIND/HAIL",
"THUNDERSTORM WIND (G40)", "THUNDERSTORM WINS", "THUNDERSTORM WINDS LIGHTNING",
"THUNDERSTORM WINDS/HAIL", "THUNDERSTORM WINDS HAIL", "TSTM WIND G58" ,"THUNDERSTORM WIND/AWNING",
"THUNDERSTORM WIND/AWNING" , "THUNDERSTORM WIND TREES" ,"THUNDERSTORM WINDS 63 MPH" ,"THUNDERSTORM WIND/ TREE" ,
"THUNDERSTORM WIND 65MPH" , "THUNDERSTORM WIND 98 MPH", "THUNDERSTORM WINDSHAIL",
"TSTM WIND DAMAGE", "THUNDERSTORM WIND G52", "THUNERSTORM WINDS", "TUNDERSTORM WIND",
"THUNDERSTORM WIND G50", "THUNDERSTORM WIND G55", "THUNDERSTORM WIND G60",
"THUNDERSTORM WINDS G60", "THUNDERSTORM DAMAGE TO", "THUNDERSTORM WIND 65 MPH",
"THUNDERSTORM WINDS AND", "THUNDERSTORMW", "TSTMW", "TSTM WIND 65)", "THUNDERSTROM WIND",
"SEVERE THUNDERSTORMS", "DRY MIRCOBURST WINDS", "MICROBURST WINDS")
Hail <- c("HAIL DAMAGE", "SMALL HAIL", "HAILSTORM", "MARINE HAIL")
Hurricane <- c("HURRICANE", "TYPHOON", "HURRICANE OPAL", "HURRICANE ERIN",
"HURRICANE EDOUARD", "HURRICANE EMILY", "HURRICANE FELIX",
"HURRICANE GORDON", "HURRICANE OPAL/HIGH WINDS")
HeavySurf <- c("HEAVY SURF/HIGH SURF", "HEAVY SURF", "HIGH SURF ADVISORY", "HEAVY SURF AND WIND",
"HAZARDOUS SURF", "RIP CURRENTS/HEAVY SURF")
WildFire <- c("WILD/FOREST FIRE", "BRUSH FIRE")
Heat <- c("UNSEASONABLY WARM", "WARM WEATHER", "HEAT WAVE", "HEAT WAVE DROUGHT", "HEAT WAVES",
"UNSEASONABLY WARM AND DRY")
HeavyRain <- c("TORRENTIAL RAINFALL", "RAIN", "UNSEASONAL RAIN", "RECORD RAINFALL", "HEAVY RAIN/SEVERE WEATHER", "EXCESSIVE RAINFALL", "HEAVY SHOWER",
"HVY RAIN")
Tornado <- c("TORNADO F0", "TORNADO F1", "TORNADO F2", "TORNADOES", "COLD AIR TORNADO", "WATERSPOUT/ TORNADO",
"TORNADO F3", "TORNDAO", "WATERSPOUT/TORNADO", "LANDSPOUT")
TropicalStorm <- c("TROPICAL STORM ALBERTO", "TROPICAL STORM GORDON", "TROPICAL STORM JERRY", "TROPICAL STORM DEAN" )
Lightning <- c("LIGHTNING INJURY", "LIGNTNING", "LIGHTNING FIRE", "LIGHTING" )
ExcessiveHeat <- c("EXTREME HEAT", "DROUGHT/EXCESSIVE HEAT", "HYPOTHERMIA",
"HYPOTHERMIA/EXPOSURE", "HYPERTHERMIA/EXPOSURE", "RECORD HEAT", "RECORD/EXCESSIVE HEAT")
MarineStrongWind <- c("MARINE TSTM WIND", "MARINE TSTM WIND", "MARINE HIGH WIND" )
Avalanche <- c("AVALANCE")
Blizzard <- c("GROUND BLIZZARD")
data$EVENT[data$EVTYPE %in% CoastalFlood] <- "Coastal Flood"
data$EVENT[data$EVTYPE %in% WinterWeather] <- "Winter Weather"
data$EVENT[data$EVTYPE %in% HeavySnow] <- "Heavy Snow"
data$EVENT[data$EVTYPE %in% HighWind] <- "High Wind"
data$EVENT[data$EVTYPE %in% Freeze] <- "Freeze"
data$EVENT[data$EVTYPE %in% ExtremeCold] <- "Extreme Cold"
data$EVENT[data$EVTYPE %in% Flood] <- "Flood"
data$EVENT[data$EVTYPE %in% FlashFlood] <- "Flash Flood"
data$EVENT[data$EVTYPE %in% Thunderstorm] <- "Thunderstorm Wind"
data$EVENT[data$EVTYPE %in% Hail] <- "Hail"
data$EVENT[data$EVTYPE %in% Hurricane] <- "Hurricane (Typhoon)"
data$EVENT[data$EVTYPE %in% HeavySurf] <- "Heavy Surf"
data$EVENT[data$EVTYPE %in% WildFire] <- "Wild Fire"
data$EVENT[data$EVTYPE %in% Heat] <- "Heat"
data$EVENT[data$EVTYPE %in% HeavyRain] <- "Heavy Rain"
data$EVENT[data$EVTYPE %in% Tornado] <- "Tornado"
data$EVENT[data$EVTYPE %in% TropicalStorm] <- "Tropical Storm"
data$EVENT[data$EVTYPE %in% Lightning] <- "Lightning"
data$EVENT[data$EVTYPE %in% ExcessiveHeat] <- "Excessive Heat"
data$EVENT[data$EVTYPE %in% MarineStrongWind] <- "Marine Strong Wind"
data$EVENT[data$EVTYPE %in% Avalanche] <- "Avalanche"
data$EVENT[data$EVTYPE %in% Blizzard] <- "Blizzard"
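The 22 assignment lines above all follow one pattern, so the manual mapping could also be kept in a single named list and applied in a loop. The sketch below assumes a hypothetical `manual_map` containing only a few of the spellings listed above; `apply_manual_map()` is likewise an illustrative helper, not part of the original code. Because the list is applied in order, a spelling that appears in several vectors (such as "ICE ROADS" above) receives the label of the last vector that contains it, as with the original assignments.

```r
# Hypothetical, abbreviated lookup table: canonical label -> raw EVTYPE
# spellings. Only a few entries of the vectors above are reproduced here.
manual_map <- list(
  "Coastal Flood" = c("TIDAL FLOODING", "COASTAL EROSION"),
  "High Wind"     = c("GUSTY WINDS", "NON-TSTM WIND"),
  "Tornado"       = c("TORNADOES", "TORNDAO", "LANDSPOUT")
)

# Apply the map in list order; a later entry overwrites an earlier one if a
# spelling appears under more than one label.
apply_manual_map <- function(evtype, event, map) {
  for (label in names(map)) {
    event[evtype %in% map[[label]]] <- label
  }
  event
}

evtype <- c("TORNDAO", "GUSTY WINDS", "APACHE COUNTY")
event  <- rep("NO CATEGORY", length(evtype))
event  <- apply_manual_map(evtype, event, manual_map)
print(event)
```

Keeping the mapping as data rather than code also makes it easy to look for spellings listed under more than one label, e.g. with `anyDuplicated(unlist(manual_map))`.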
After this step, the share of unclassified rows has dropped to about 0.3%:
sum(data$EVENT == "NO CATEGORY") / nrow(data) * 100
## [1] 0.3055378
The National Weather Service Storm Data Documentation does not allow the remaining EVTYPE entries to be assigned unambiguously to a single event type, so they are excluded from the analysis. Appendix A lists the excluded entries.
excluded <- unique(data[data$EVENT == "NO CATEGORY", "EVTYPE"])
data <- data %>% filter(EVENT != "NO CATEGORY") %>%
mutate(EVTYPE = NULL)
The PROPDMGEXP and CROPDMGEXP columns in the raw data should contain only the following codes, which indicate the magnitude of the PROPDMG and CROPDMG values, respectively:
| Label | Description |
|---|---|
| K,k | Thousand |
| M,m | Million |
| B,b | Billion |
No other codes for PROPDMGEXP and CROPDMGEXP are described in the National Weather Service Storm Data Documentation. Nevertheless, the raw data contain several undefined codes:
unique(data$CROPDMGEXP)
## [1] NA "M" "K" "m" "B" "0" "k" "?"
unique(data$PROPDMGEXP)
## [1] "K" "M" NA "B" "m" "+" "0" "5" "6" "4" "h" "2" "7" "3" "H" "-"
Damage values whose PROPDMGEXP or CROPDMGEXP code is not one of K, k, M, m, B, b or NA are excluded from the analysis; in addition, only property damage values with PROPDMG below 1000 are kept. Valid codes are converted into the corresponding multipliers, NAs are converted to 1, and the multipliers are then applied to the respective PROPDMG and CROPDMG values.
### Add an Index Key to the data for later merging
data <- data %>% mutate(INDEX = 1:nrow(data))
### Subset and Transform CROP data
CROP <- data %>% select(c("INDEX","CROPDMG","CROPDMGEXP")) %>%
filter(is.na(CROPDMGEXP) | grepl("[KkMmBb]", CROPDMGEXP))
CROP$CROPDMGEXP[is.na(CROP$CROPDMGEXP)] <- 1
CROP$CROPDMGEXP[grepl("[Kk]", CROP$CROPDMGEXP)] <- 1000
CROP$CROPDMGEXP[grepl("[Mm]", CROP$CROPDMGEXP)] <- 1000000
CROP$CROPDMGEXP[grepl("[Bb]", CROP$CROPDMGEXP)] <- 1000000000
CROP <- CROP %>% mutate(CROPDMG = CROPDMG * as.numeric(CROPDMGEXP)) %>%
mutate(CROPDMGEXP = NULL)
### Subset and Transform PROP data
PROP <- data %>% select(c("INDEX","PROPDMG","PROPDMGEXP")) %>%
filter((is.na(PROPDMGEXP) | grepl("[KkMmBb]", PROPDMGEXP)) & PROPDMG < 1000)
PROP$PROPDMGEXP[is.na(PROP$PROPDMGEXP)] <- 1
PROP$PROPDMGEXP[grepl("[Kk]", PROP$PROPDMGEXP)] <- 1000
PROP$PROPDMGEXP[grepl("[Mm]", PROP$PROPDMGEXP)] <- 1000000
PROP$PROPDMGEXP[grepl("[Bb]", PROP$PROPDMGEXP)] <- 1000000000
PROP <- PROP %>% mutate(PROPDMG = PROPDMG * as.numeric(PROPDMGEXP)) %>%
mutate(PROPDMGEXP = NULL)
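The conversion from magnitude codes to multipliers can alternatively be written with a named lookup vector instead of a sequence of `grepl()` replacements. This is a minimal base-R sketch under the assumption that valid codes are single characters; `exp_to_multiplier()` and the toy `codes` and `values` vectors are hypothetical, and the additional `PROPDMG < 1000` filter applied to the property data above is not reproduced.

```r
# Hypothetical helper: convert magnitude codes to multipliers with a named
# lookup vector. NA (no code) maps to 1; any code other than K/M/B (either
# case) maps to NA so that those rows can be dropped afterwards.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(code)])
  mult[is.na(code)] <- 1
  mult
}

# Toy values, not taken from the NOAA data
codes  <- c("K", "m", NA, "B", "?", "5")
values <- c(25, 2.5, 100, 1.2, 10, 3)

mult   <- exp_to_multiplier(codes)
damage <- values * mult
keep   <- !is.na(mult)   # drop rows with undefined codes
print(damage[keep])
```

Codes outside K/M/B (here "?" and "5") yield NA multipliers, so the corresponding rows can be removed with a single `!is.na()` test.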
The corrected PROPDMG and CROPDMG values are merged with the corrected event labels to form a complete, corrected dataset.
data <- mutate(data, CROPDMG = NULL, PROPDMG = NULL, CROPDMGEXP = NULL, PROPDMGEXP = NULL)
temp <- merge(PROP, CROP, by = "INDEX", all = TRUE)
data_correct <- merge(data, temp, by = "INDEX")
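The two calls to `merge()` behave differently. The first is a full outer join (`all = TRUE`), so a record is kept if at least one of its two damage values passed the code check, with NA for the other. The second is an inner join, so records rejected for both damage types are dropped; since the health dataset is also built from data_correct, such records disappear from the health analysis as well. The toy tables below are hypothetical and only illustrate this behaviour.

```r
# Toy tables (hypothetical values) mirroring data, PROP and CROP above.
base <- data.frame(INDEX = 1:4, EVENT = c("Flood", "Hail", "Tornado", "Heat"))
prop <- data.frame(INDEX = 1:3, PROPDMG = c(5e3, 2e6, 1e5))
crop <- data.frame(INDEX = 2:3, CROPDMG = c(1e4, 0))
# Record 4 passed neither filter, record 1 only the PROP filter.

# Full outer join: keep INDEX values present in either table (NA elsewhere)
temp <- merge(prop, crop, by = "INDEX", all = TRUE)

# Inner join: records missing from both PROP and CROP are dropped here
joined <- merge(base, temp, by = "INDEX")
print(joined)
```

The NA values produced by the outer join are later ignored by the `na.rm = TRUE` arguments in the economic summaries.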
To tidy the corrected dataset for the exploratory analysis, the two health columns (FATALITIES, INJURIES) and the two economic columns (CROPDMG, PROPDMG) are each stacked into a single value column, with a factor column recording the type of damage.
This results in two tidy datasets, one for health and the other for economic observations.
fatal <- data_correct %>% select(c("EVENT", "FATALITIES")) %>%
mutate(HEALTH_DMG = factor("FATALITY")) %>%
rename("CASES" = "FATALITIES")
injur <- data_correct %>% select(c("EVENT","INJURIES")) %>%
mutate(HEALTH_DMG = factor("INJURY")) %>%
rename("CASES" = "INJURIES")
crop <- data_correct %>% select(c("EVENT","CROPDMG")) %>%
mutate(ECON_DMG = factor("CROP")) %>%
rename("ECON_DMG_ESTIMATE" = "CROPDMG")
prop <- data_correct %>% select(c("EVENT","PROPDMG")) %>%
mutate(ECON_DMG = factor("PROP")) %>%
rename("ECON_DMG_ESTIMATE" = "PROPDMG")
health <- rbind(fatal, injur)
economy <- rbind(crop,prop)
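The four `select()`/`rename()` blocks above stack two measurement columns into one value column plus a factor that records the column of origin. The same reshaping can be written as a small generic function. `to_long()` and the `wide` toy data are hypothetical, and the function is a base-R sketch rather than the approach used above; tidyr's `pivot_longer()` is a dedicated alternative.

```r
# Hypothetical helper: stack selected measurement columns of a wide data
# frame into the long (tidy) layout used for `health` and `economy`.
to_long <- function(df, id, cols, key, value) {
  pieces <- lapply(cols, function(col) {
    # one block per measurement column; `col` is recycled as the key value
    out <- data.frame(df[[id]], df[[col]], col, stringsAsFactors = FALSE)
    names(out) <- c(id, value, key)
    out
  })
  long <- do.call(rbind, pieces)
  long[[key]] <- factor(long[[key]], levels = cols)
  long
}

# Toy wide data (hypothetical numbers)
wide <- data.frame(EVENT = c("Tornado", "Heat"),
                   FATALITY = c(2, 0), INJURY = c(15, 3))
health_demo <- to_long(wide, id = "EVENT", cols = c("FATALITY", "INJURY"),
                       key = "HEALTH_DMG", value = "CASES")
print(health_demo)
```

The result has the same column layout (EVENT, CASES, HEALTH_DMG) as the health dataset.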
The two datasets health and economy have the form described in the Tidy Data section.
str(health); str(economy)
## 'data.frame': 507710 obs. of 3 variables:
## $ EVENT : chr "Tornado" "Tornado" "Tornado" "Tornado" ...
## $ CASES : num 0 0 0 0 0 0 0 0 1 0 ...
## $ HEALTH_DMG: Factor w/ 2 levels "FATALITY","INJURY": 1 1 1 1 1 1 1 1 1 1 ...
## 'data.frame': 507710 obs. of 3 variables:
## $ EVENT : chr "Tornado" "Tornado" "Tornado" "Tornado" ...
## $ ECON_DMG_ESTIMATE: num 0 0 0 0 0 0 0 0 0 0 ...
## $ ECON_DMG : Factor w/ 2 levels "CROP","PROP": 1 1 1 1 1 1 1 1 1 1 ...
This project addresses the following questions:
1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?
One indicator of the danger posed by a type of weather event is how much damage it caused in total between 1950 and 2011. This total depends on the damage caused by a single occurrence of the event as well as on the number of times the event occurs. To decide whether a weather event is among the most harmful with respect to health or economic damage, this analysis considers both aspects, since an event that causes little damage but occurs very frequently might be as harmful as an event that occurs only occasionally but causes severe damage.
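Thus, the results are divided into three parts: total damage per event type, mean damage per single event, and number of occurrences per event type.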
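The reasoning above rests on the identity total damage = mean damage per event × number of events. A toy example with two hypothetical event types ("Frequent" and "Rare", invented numbers) shows how a frequent, low-damage type and a rare, high-damage type can reach the same total. In this analysis the means are computed over records with non-zero damage only, so the identity holds when the count is taken over those records as well.

```r
# Toy damage records (hypothetical numbers) for two event types
dmg <- data.frame(
  EVENT  = c(rep("Frequent", 6), rep("Rare", 2)),
  DAMAGE = c(2, 1, 3, 2, 1, 3, 6, 6)
)

# Total, mean and count per event type (groups sorted alphabetically)
summary_tab <- data.frame(
  total = as.vector(tapply(dmg$DAMAGE, dmg$EVENT, sum)),
  mean  = as.vector(tapply(dmg$DAMAGE, dmg$EVENT, mean)),
  count = as.vector(table(dmg$EVENT)),
  row.names = sort(unique(dmg$EVENT))
)
print(summary_tab)
```

Here the frequent type averages 2 per event over 6 events and the rare type 6 per event over 2 events, so both reach a total of 12.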
The following code was used to prepare the data and create the plots shown in the Results section. Tables of the mean and total values can be found in the Appendix.
### HEALTH DATA ###############################################################################################
# Select measurements that caused over 1000 injuries and/or deaths
# (as we are only interested in the most harmful).
health_selection <- health %>%
group_by(EVENT, HEALTH_DMG) %>%
summarize(CASES_SUM = sum(CASES)) %>%
group_by(EVENT) %>%
summarize(TOTAL = sum(CASES_SUM, na.rm = TRUE)) %>%
arrange(desc(TOTAL)) %>%
filter(TOTAL > 1000)
# Sum up health damage for all events of one type
health_sums <- health %>%
filter(EVENT %in% health_selection$EVENT) %>%
group_by(EVENT, HEALTH_DMG) %>%
summarize(CASES_SUM = sum(CASES, na.rm = TRUE)) %>%
ungroup() %>%
mutate(EVENT = factor(EVENT, levels = tapply(CASES_SUM, EVENT, sum) %>%
sort %>% names))
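# Mean health damage per damaging record: zero counts are set to NA and
# dropped, and FATALITY and INJURY rows are pooled within each event type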
health_means <- health %>%
filter(EVENT %in% health_selection$EVENT) %>%
mutate(CASES = na_if(CASES, 0)) %>%
group_by( EVENT) %>%
summarize(CASES =mean(CASES, na.rm = TRUE)) %>%
arrange(desc(CASES))
# Plot total Health Damage
theme_set(theme_bw())
plot_health <- ggplot(health_sums, aes(fill = HEALTH_DMG, x = EVENT, y = CASES_SUM))
plot_health+ geom_bar(position = "stack", stat = "identity")+
coord_flip() +
labs(title = "Total Health Damage caused by Storm Weather",
y = "Number of Injuries/Fatalities",
x = " ") +
scale_fill_brewer(palette = "Set1", name = " ", labels = c("Fatality", "Injury"))
# Plot mean health damage
theme_set(theme_classic())
plot_health_means <- ggplot(health_means, aes(x = EVENT, y = CASES))
plot_health_means + geom_segment(aes(x = EVENT, xend = EVENT, y = 0, yend = CASES),
linetype = "dashed",
size = 0.1) +
geom_point(size = 3) +
coord_flip() +
labs(title = "Mean Health Damage per Event",
x = "",
y = "Mean # of cases") +
annotate(geom = "text", x = "Tornado", y = 1.6, label = "#1 Total", size = 5, col = "red" ) +
annotate(geom = "text", x = "Thunderstorm Wind", y = 0.4, label = "#2 Total", size = 5, col = "red") +
annotate(geom = "text", x = "Excessive Heat", y = 6.4, label = "#3 Total", size = 5, col = "red" )
### ECONOMY DATA ##############################################################################################
# Select event types that caused more than 10 billion dollars of damage in total
# (as we are only interested in the most harmful).
economy_selection <- economy %>%
group_by(EVENT, ECON_DMG) %>%
summarize(CASES_SUM = sum(ECON_DMG_ESTIMATE, na.rm = TRUE)) %>%
group_by(EVENT) %>%
summarize(TOTAL = sum(CASES_SUM, na.rm = TRUE)) %>%
arrange(desc(TOTAL)) %>%
filter(TOTAL > 10000000000)
# Sum up economic damage for all events of one type
economy_sums <- economy %>%
filter(EVENT %in% economy_selection$EVENT) %>%
group_by(EVENT, ECON_DMG) %>%
summarize(CASES_SUM = sum(ECON_DMG_ESTIMATE, na.rm = TRUE)) %>%
mutate(CASES_SUM = CASES_SUM / 1e6) %>% # dollars -> million dollars
ungroup() %>%
mutate(EVENT = factor(EVENT, levels = tapply(CASES_SUM, EVENT, sum) %>%
sort %>% names))
# Mean economic damage per damaging record (in million dollars): zero
# estimates are set to NA and dropped, and CROP and PROP rows are pooled
economic_means <- economy %>%
filter(EVENT %in% economy_selection$EVENT) %>%
mutate(ECON_DMG_ESTIMATE = ECON_DMG_ESTIMATE/1000000) %>%
mutate(ECON_DMG_ESTIMATE = na_if(ECON_DMG_ESTIMATE, 0)) %>%
group_by(EVENT) %>%
summarize(MEAN_DMG = mean(ECON_DMG_ESTIMATE, na.rm = TRUE)) %>%
arrange(desc(MEAN_DMG))
# Plot total economic damage
theme_set(theme_bw())
plot_economy <- ggplot(economy_sums, aes(fill = ECON_DMG, x = EVENT, y = CASES_SUM))
plot_economy + geom_bar(position = "stack", stat = "identity") +
coord_flip() +
labs(title = "Total Economic Damage Caused by Storm Weather",
y = "Economic Damage in Million Dollars",
x = "") +
scale_fill_brewer(palette = "Set1", name = " ", labels = c("Crop Damage", "Property Damage"))
# Plot Mean Economic Damage
theme_set(theme_classic())
plot_economic_means <- ggplot(economic_means, aes(x = EVENT, y = MEAN_DMG))
plot_economic_means + geom_point(size = 3) +
geom_segment(aes(x = EVENT, xend = EVENT, y = min(MEAN_DMG), yend = MEAN_DMG),
linetype = "dashed",
size = 0.1) +
coord_flip() +
labs(title = "Mean Economic Damage per Event",
x = "",
y = "Mean Damage per Event in Million Dollars") +
annotate(geom = "text", x = "Flood", y = 20, label = "#1 Total", size = 5, col = "blue") +
annotate(geom = "text", x = "Hurricane (Typhoon)", y = 185, label = "#2 Total", size = 5, col = "blue") +
annotate(geom = "text", x = "Tornado", y = 12, label = "#3 Total", size = 5, col = "blue")
### Number of occurrences ######################################################################################
events_count <- data_correct$EVENT
events_count[!(events_count %in% health_selection$EVENT) & !(events_count %in% economy_selection$EVENT )]<- "Other"
# Plot number of occurrences
plot_events <- ggplot(data = data.frame(events_count), aes(x = events_count))
plot_events + geom_bar(stat = "count", fill = "grey") +
labs(title = "Occurrences of Events Causing Economic and/or Health Damage",
x = "",
y = "Count") +
coord_flip() +
ylim(c(0, 150000)) +
annotate(geom = "text", x = "Tornado", y = 50000, label = "#1 Health", size = 3, col = "red" ) +
annotate(geom = "text", x = "Thunderstorm Wind", y = 130000, label = "#2 Health", size = 3, col = "red" ) +
annotate(geom = "text", x = "Excessive Heat", y = 10000, label = "#3 Health", size = 3 , col = "red") +
annotate(geom = "text", x = "Flood", y = 23000, label = "#1 Economy", size = 3, col = "blue" ) +
annotate(geom = "text", x = "Hurricane (Typhoon)", y = 10000, label = "#2 Economy", size = 3, col = "blue") +
annotate(geom = "text", x = "Tornado", y = 70000, label = "#3 Economy", size = 3, col = "blue" )
The results are divided into three parts described in the Aim and Concept section.
The following paragraph explores the total economic and health damage of all recorded events and determines the Top 3 event types for each type of damage. Only event types that caused more than 1000 health cases (fatalities + injuries) or more than 10 billion dollars of total economic damage (crop + property) are shown.
The Top 3 most harmful event types for public health appear to be Tornadoes, Thunderstorms and Excessive Heat. The damage caused by Tornadoes far exceeds that of all other event types.
The Top 3 most harmful event types for the economy appear to be Floods, Hurricanes and Tornadoes.
This section explores the mean damage per single event. To simplify the analysis, injuries and fatalities are pooled into a single measure of health damage, and crop and property damage are likewise pooled into a single measure of economic damage. Records without any damage are excluded before averaging.
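Because zero values are set to NA before averaging, and the FATALITY and INJURY rows of each event type are averaged together, the reported mean is neither the mean over all records nor the mean of summed cases per record. The hypothetical toy records below show that these three quantities can differ; `replace()` is used here as a base-R stand-in for `dplyr::na_if()`.

```r
# Hypothetical long-format records: four weather events of one type, each
# contributing one FATALITY row and one INJURY row (as in `health`).
records <- data.frame(
  RECORD     = rep(1:4, each = 2),
  HEALTH_DMG = rep(c("FATALITY", "INJURY"), 4),
  CASES      = c(1, 3, 0, 2, 0, 0, 0, 0)
)

# (a) Pooled mean as computed in the analysis: zeros set to NA, FATALITY
#     and INJURY rows averaged together (mean of 1, 3 and 2)
pooled <- mean(replace(records$CASES, records$CASES == 0, NA), na.rm = TRUE)

# (b) Mean of summed cases per damaging record (mean of 4 and 2)
per_record <- tapply(records$CASES, records$RECORD, sum)
summed <- mean(per_record[per_record > 0])

# (c) Mean of summed cases over all records, including undamaging ones
all_records <- mean(per_record)

print(c(pooled = pooled, summed = summed, all_records = all_records))
```

The analysis above corresponds to variant (a); the economic means behave the same way, with CROP and PROP rows in place of FATALITY and INJURY rows.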
The Top 3 event types with the highest public health damage per single event are Hurricanes, Ice Storms and Floods. The Top 3 event types with the highest economic damage per single event are Hurricanes, Droughts and Floods. Hurricanes in particular greatly exceed all other event types.
Note that the event types with the highest mean damage differ from those with the highest total damage. An event type that causes little damage per event but occurs frequently (e.g. Tornadoes with respect to economic damage) can accumulate the same total damage, and is thus as dangerous, as an event type that causes heavy damage per event but occurs only rarely.
To corroborate the findings of the Mean Damage section, the following plot shows the number of times these event types occurred between 1950 and 2011. The Top 3 event types by total health damage and by total economic damage are marked in red and blue, respectively.
The Top 3 of most frequent weather events in the US between 1950 and 2011 are Thunderstorm Winds, Tornadoes and Hail.
The ranking of weather events by public health and economic damage depends on the perspective, i.e. on whether the interest lies in the average damage per single event of a type or in the total damage caused by that type over time. Distinguishing between average and total damage, this analysis supports the following conclusions:
The Top 3 event types that caused the highest total damage to public health appear to be Tornadoes, Thunderstorms and Excessive Heat.
The Top 3 most harmful event types for the economy in total appear to be Floods, Hurricanes and Tornadoes. In terms of public health, Thunderstorms cause comparatively little damage on average but are a frequent phenomenon. This makes them comparably dangerous to Excessive Heat, which occurs very rarely but causes heavy damage on average.
In terms of the economy, Hurricanes behave similarly to Excessive Heat in terms of public health: they cause very high damage but are rare. Floods, on the other hand, behave unexpectedly: they show a comparatively low mean economic damage and are comparatively rare, yet they cause the highest total economic damage. This might indicate that a single flood (or only a few) had severe economic consequences, while most floods caused only mild damage. Such a skewed distribution can combine a low mean and a low number of occurrences with a very high total.
Tornadoes appear in both the Top 3 for public health and the Top 3 for economic damage. Their economic damage per single event is low on average, but their public health damage per single event is comparatively high. This damage potential in both categories, combined with the high frequency of Tornadoes, makes this event type the most dangerous overall.
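The hypothesis of a few extreme floods could be checked by measuring how much of each event type's total comes from its single largest record. `top_share()` is a hypothetical helper, and the damage values below are invented for illustration, not taken from the NOAA data.

```r
# Hypothetical helper: share of an event type's total damage that comes
# from its single largest record. Values near 1 mean that the total is
# dominated by one record; NAs are ignored.
top_share <- function(damage, event) {
  tapply(damage, event, function(d) max(d, na.rm = TRUE) / sum(d, na.rm = TRUE))
}

# Toy numbers, not taken from the NOAA data
toy_damage <- c(1e6, 2e6, 1.5e6, 1.2e11, 3e8, 2.5e8, 2e8)
toy_event  <- c("Flood", "Flood", "Flood", "Flood",
                "Hurricane", "Hurricane", "Hurricane")

shares <- top_share(toy_damage, toy_event)
print(round(shares, 3))
```

Applied to the economy dataset defined above, it could be called as `with(economy, top_share(ECON_DMG_ESTIMATE, EVENT))`. Each row there is a crop or a property estimate, so the largest row is not necessarily a whole event.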
Appendix A: EVTYPE Entries Excluded
## [1] "ICE STORM/FLASH FLOOD" "LIGHTNING AND HEAVY RAIN"
## [3] "HEAVY RAIN/LIGHTNING" "FLASH FLOODING/THUNDERSTORM WI"
## [5] "LIGHTNING/HEAVY RAIN" "BREAKUP FLOODING"
## [7] "HIGH WINDS HEAVY RAINS" "MARINE MISHAP"
## [9] "HIGH TIDES" "HIGH WIND/SEAS"
## [11] "HIGH WINDS/HEAVY RAIN" "HIGH SEAS"
## [13] "SEVERE TURBULENCE" "APACHE COUNTY"
## [15] "THUNDERSTORM WINDS/FUNNEL CLOU" "FLOODING/HEAVY RAIN"
## [17] "HEAVY SURF COASTAL FLOODING" "HIGH"
## [19] "WINTER STORM HIGH WINDS" "MUDSLIDES"
## [21] "RAINSTORM" "FLOOD/RAIN/WINDS"
## [23] "FLASH FLOOD WINDS" "WATERSPOUT TORNADO"
## [25] "STORM SURGE" "TORNADOES, TSTM WIND, HAIL"
## [27] "LIGHTNING THUNDERSTORM WINDS" "WATERSPOUT-TORNADO"
## [29] "LIGHTNING AND THUNDERSTORM WIN" "FREEZING RAIN/SNOW"
## [31] "THUNDERSNOW" "COOL AND WET"
## [33] "HEAVY RAIN/SNOW" "MUD SLIDE"
## [35] "MUD SLIDES" "COLD AND WET CONDITIONS"
## [37] "EXCESSIVE WETNESS" "SLEET/ICE STORM"
## [39] "GUSTNADO" "EXTREME WIND CHILL"
## [41] "ICE JAM FLOODING" "FOG"
## [43] "HAIL/WINDS" "GRASS FIRES"
## [45] "HAIL/WIND" "WIND/HAIL"
## [47] "SNOW AND ICE STORM" "THUNDERSTORM WIND/LIGHTNING"
## [49] "HURRICANE-GENERATED SWELLS" "ICE FLOES"
## [51] "DUST DEVIL WATERSPOUT" "BLIZZARD/WINTER STORM"
## [53] "DUST STORM/HIGH WINDS" "ICE JAM"
## [55] "FOREST FIRES" "HEAVY SNOW AND HIGH WINDS"
## [57] "HEAVY SNOW/HIGH WINDS & FLOOD" "HEAVY RAIN AND FLOOD"
## [59] "URBAN AND SMALL" "FOG AND COLD TEMPERATURES"
## [61] "SNOW/COLD" "MUDSLIDE"
## [63] "HEAVY MIX" "SNOW FREEZING RAIN"
## [65] "SNOW/SLEET" "SNOW/FREEZING RAIN"
## [67] "SNOW/SLEET/FREEZING RAIN" "FLASH FLOOD - HEAVY RAIN"
## [69] "HEAVY SNOW/BLIZZARD" "THUNDERSTORM HAIL"
## [71] "LIGHTNING WAUSEON" "STORM FORCE WINDS"
## [73] "HIGH WIND/HEAVY SNOW" "HEAVY PRECIPITATION"
## [75] "HIGH WIND/BLIZZARD" "RAIN/WIND"
## [77] "HEAVY SNOW/BLIZZARD/AVALANCHE" "HIGH WAVES"
## [79] "HEAVY RAINS/FLOODING" "THUNDERSTORM WINDS/FLOODING"
## [81] "HIGH WINDS/COLD" "COLD/WINDS"
## [83] "SNOW/ BITTER COLD" "RAPIDLY RISING WATER"
## [85] "ICE/STRONG WINDS" "SNOW/HIGH WINDS"
## [87] "HIGH WINDS/SNOW" "SNOWMELT FLOODING"
## [89] "HEAVY SNOW AND STRONG WINDS" "THUNDERSTORM WIND/HAIL"
## [91] "THUNDERSTORM WINDS/ FLOOD" "LANDSLIDE"
## [93] "HIGH WIND AND SEAS" "WILD/FOREST FIRES"
## [95] "HEAVY SEAS" "FLOOD & HEAVY RAIN"
## [97] "?" "HIGH WATER"
## [99] "LANDSLIDES" "URBAN/SMALL STREAM"
## [101] "HEAVY SWELLS" "URBAN SMALL"
## [103] "HEAVY RAIN/SMALL STREAM URBAN" "OTHER"
## [105] "ICE JAM FLOOD (MINOR" "ROUGH SURF"
## [107] "MARINE ACCIDENT" "COASTAL STORM"
## [109] "BEACH EROSION" "HEAVY RAIN/HIGH SURF"
## [111] "LANDSLUMP" "WHIRLWIND"
## [113] "FREEZING SPRAY" "DOWNBURST"
## [115] "GUSTY WIND/RAIN" "GUSTY WIND/HVY RAIN"
## [117] "COLD AND SNOW" "BLACK ICE"
## [119] "COASTALSTORM" "DAM BREAK"
## [121] "HIGH SWELLS" "ROCK SLIDE"
## [123] "GUSTY WIND/HAIL" "WIND AND WAVE"
## [125] "ROUGH SEAS" "LATE SEASON SNOW"
## [127] "ROGUE WAVE" "BLOWING DUST"
## [129] "DROWNING" "STORM SURGE/TIDE"
## [131] "DENSE SMOKE"