Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. It identifies which event types have the greatest economic consequences, measured by crop and property damage, and which are most harmful to population health, measured by fatalities and injuries.
The code below downloads and loads the data.
library(tidyverse)
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
setwd("C:\\Users\\tsuim\\Documents\\R\\JHU Data Course\\Module 5\\Project 2")
# Save into the working directory set above; mode = "wb" keeps the compressed file byte-exact on Windows
download.file(url, destfile = "dataset.csv.bz2", mode = "wb")
# Read data
storm_data <- read.csv(bzfile("dataset.csv.bz2"))
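Re-running the report downloads the archive again each time. A minimal sketch of one way to avoid that, assuming the file name used above; `fetch_once` is a hypothetical helper name, not part of the original analysis:

```r
# Hypothetical helper: download `url` to `destfile` only when the file
# is not already present, and return the path invisibly either way.
fetch_once <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # mode = "wb" writes the compressed archive as binary
    download.file(url, destfile = destfile, mode = "wb")
  }
  invisible(destfile)
}
```

With this in place, `storm_data <- read.csv(bzfile(fetch_once(url, "dataset.csv.bz2")))` would read the cached copy on later runs.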
The analysis below uses three variables: event type, fatalities, and injuries. Fatalities and injuries are summed by event type, and a plot of the five most harmful event types is generated for visualization.
# Find which event types are most harmful to population health
death_inj <- aggregate(INJURIES + FATALITIES ~ EVTYPE,
storm_data, sum)
popHealth <- death_inj[order(death_inj$`INJURIES + FATALITIES`,
                             decreasing = TRUE),] # Sort in descending order so the most harmful event comes first
popHealth[1,] # The first row shows that tornadoes cause the greatest harm to population health
## EVTYPE INJURIES + FATALITIES
## 834 TORNADO 96979
colnames(popHealth) <- c('Event_Type', 'Injuries_And_Deaths') # Rename columns
# Plot for population health
popPlot <- ggplot(popHealth[1:5,], aes(Event_Type, Injuries_And_Deaths)) +
geom_bar(stat="identity") +
theme(text = element_text(size=20),
axis.text.x = element_text(angle=90, hjust=1)) +
xlab("Event Type") +
ylab("Total Fatalities and Injuries") +
ggtitle("Events Most Harmful to Population Health") +
theme(plot.title = element_text(hjust = 0.5))
popPlot
Fig. 1: Top 5 Events with Greatest Impact on Population Health
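The aggregate-then-sort step behind Fig. 1 can be illustrated on a toy data frame. This is a sketch only: the rows below are invented for illustration, and `HARM` and `harm_by_type` are hypothetical names. Creating the combined column before aggregating gives the result a plain column name, so no backtick-quoted `INJURIES + FATALITIES` column needs renaming afterwards.

```r
# Toy data standing in for storm_data; the values are invented for illustration only.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 1, 0, 4, 0),
  INJURIES   = c(10, 3, 5, 1, 2)
)

# Name the combined measure first so the aggregated column has a plain name
toy$HARM <- toy$FATALITIES + toy$INJURIES

# Sum harm by event type, then sort so the most harmful type is in row 1
harm_by_type <- aggregate(HARM ~ EVTYPE, data = toy, FUN = sum)
harm_by_type <- harm_by_type[order(harm_by_type$HARM, decreasing = TRUE), ]
harm_by_type
```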
The analysis below uses five variables: event type, crop damage, crop damage exponent, property damage, and property damage exponent. The exponent columns hold codes such as K, M, and B rather than numbers, so I first examine the unique values in each exponent column. I then map each code to a numeric multiplier (for example, K to 1,000 and M to 1,000,000), multiply each damage figure by its multiplier to get a dollar amount, and add crop and property damage to get the total damage per record.
A plot of the five most costly event types is generated for visualization.
# Find which event types have the greatest economic consequences
# Check unique types of magnitude exponents
unique(storm_data$PROPDMGEXP)
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
unique(storm_data$CROPDMGEXP)
## [1] "" "M" "K" "m" "B" "?" "0" "k" "2"
length(unique(storm_data$PROPDMGEXP)) # There are 19 different values.
## [1] 19
length(unique(storm_data$CROPDMGEXP)) # There are 9 different values.
## [1] 9
# Create smaller data frame for economics
econ <- storm_data[,c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
# Upper case exponent abbreviations
econ$PROPDMGEXP <- toupper(econ$PROPDMGEXP)
econ$CROPDMGEXP <- toupper(econ$CROPDMGEXP)
# Assign a numeric multiplier to each exponent code (labels[i] is the multiplier for levels[i])
labels <- c(1E3, 1E6, 1E0, 1E9, 1E0, 1E0, 1E5, 1E6, 1E0, 1E4, 1E2, 1E3, 1E2, 1E7, 1E0, 1E1, 1E8)
levels <- c("K", "M", "", "B", "+", "0", "5", "6", "?", "4", "2", "3", "H", "7", "-", "1", "8")
econ$PROPDMGEXP <- factor(econ$PROPDMGEXP, levels = levels, labels = labels)
labels_2 <- c(1E0, 1E6, 1E3, 1E9, 1E0, 1E0, 1E2)
levels_2 <- c("", "M", "K", "B", "?", "0", "2")
econ$CROPDMGEXP <- factor(econ$CROPDMGEXP, levels = levels_2, labels = labels_2)
# Multiply columns to get total economic damage column
econ$PROPDMGTOTAL <- as.numeric(as.character(econ$PROPDMGEXP)) * econ$PROPDMG
econ$CROPDMGTOTAL <- as.numeric(as.character(econ$CROPDMGEXP)) * econ$CROPDMG
econ$DAMAGETOTAL <- econ$PROPDMGTOTAL + econ$CROPDMGTOTAL
# Find which event types have the greatest economic consequences
damage <- aggregate(DAMAGETOTAL ~ EVTYPE, econ, sum)
damage_ordered <- damage[order(damage$DAMAGETOTAL,
                               decreasing = TRUE),] # Sort in descending order so the most costly event comes first
damage_ordered[1,] # The first row shows that floods cause the greatest economic damage
## EVTYPE DAMAGETOTAL
## 170 FLOOD 150319678257
# Plot for economic consequences
econPlot <- ggplot(damage_ordered[1:5,], aes(EVTYPE, DAMAGETOTAL)) +
geom_bar(stat="identity") +
theme(text = element_text(size=20),
axis.text.x = element_text(angle=90, hjust=1)) +
xlab("Event Type") +
ylab("Total Damage (USD)") +
ggtitle("Events with Greatest Economic Impact") +
theme(plot.title = element_text(hjust = 0.5))
econPlot
Fig. 2: Top 5 Events with Greatest Economic Impact
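The exponent-to-multiplier step can also be written as a small lookup function instead of factor levels and labels. The sketch below is an alternative formulation, not the method used above. It assumes the same mapping as the analysis: H, K, M, and B map to 1e2, 1e3, 1e6, and 1e9; a single digit n maps to 10^n; anything else (blank, "+", "-", "?") maps to 1. `exp_to_multiplier` and the toy rows are hypothetical.

```r
# Hypothetical helper: convert exponent codes to numeric multipliers.
exp_to_multiplier <- function(code) {
  code <- toupper(code)                          # treat "k" and "K" alike
  letter_map <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

  out <- rep(1, length(code))                    # default multiplier for blank or unknown codes

  is_letter <- code %in% names(letter_map)
  out[is_letter] <- unname(letter_map[code[is_letter]])

  is_digit <- grepl("^[0-9]$", code)             # single digit n means 10^n
  out[is_digit] <- 10^as.numeric(code[is_digit])

  out
}

# Toy rows, invented for illustration only
toy_econ <- data.frame(PROPDMG = c(25, 2.5, 10), PROPDMGEXP = c("K", "m", ""))
toy_econ$PROPDMGTOTAL <- toy_econ$PROPDMG * exp_to_multiplier(toy_econ$PROPDMGEXP)
toy_econ
```

Because every code falls back to a multiplier of 1, no row becomes `NA` and drops out of a later `aggregate()` call, which would happen silently if a code were missing from a factor's levels.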
The results above indicate the following: