This project explores the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States, including when and where they occur and estimates of any fatalities, injuries, and property damage. The analysis examines the health (injuries, fatalities) and economic (property damage) consequences of severe weather events in the US. The data is downloaded from the Internet, lightly cleaned, and summarized by event type. Two figures are then produced that show which event types have the worst health and economic consequences.
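```r
# R.utils: bunzip2(); ggplot2: plots; grid/gridExtra: plot layout; Hmisc: %nin%
packages <- c("R.utils", "ggplot2", "grid", "gridExtra", "Hmisc")
invisible(lapply(packages, library, character.only = TRUE))
```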
```r
## Loading file
if (!dir.exists("data")) dir.create("data")
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# Download the compressed archive only if it is not already present
if (!file.exists("./data/StormData.csv.bz2")) {
  download.file(fileUrl, destfile = "./data/StormData.csv.bz2", method = "curl")
}
# Decompress once, keeping the original .bz2 archive
if (!file.exists("./data/StormData.csv")) {
  bunzip2(filename = "./data/StormData.csv.bz2", remove = FALSE)
}
stormData <- read.csv("./data/StormData.csv")
```
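Because `read.csv()` opens its input through `file()`, which detects bzip2 compression automatically when reading, the `bunzip2()` step could probably be skipped and the `.csv.bz2` archive read directly. A minimal sketch of that behavior, using a tiny temporary file with made-up rows (`path` and `small` are illustrative names) instead of the real StormData archive:

```r
# Write a tiny bzip2-compressed CSV to a temporary file
# (made-up sample rows, not real StormData records).
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
writeLines(c("EVTYPE,INJURIES,FATALITIES",
             "TORNADO,10,1",
             "FLOOD,3,0"), con)
close(con)

# read.csv() decompresses the file transparently; no bunzip2() call is needed.
small <- read.csv(path, stringsAsFactors = FALSE)
print(small)

# Clean up the temporary file.
unlink(path)
```

Applied to this project, this would presumably become `read.csv("./data/StormData.csv.bz2")`, trading the disk space of an uncompressed copy for decompression on every read.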
```r
## Cleaning of PROPDMG and PROPDMGEXP variables
stormData$PROPDMGEXP <- toupper(stormData$PROPDMGEXP)
## Allowed exponent values: 0, H, K, M, B, which stand for 10^0, 10^2, 10^3, 10^6, 10^9
## We'll just assume that "dirty" values are to be multiplied by 10^0.
allowed <- data.frame(char = c("0", "H", "K", "M", "B"),
                      val = c(10^0, 10^2, 10^3, 10^6, 10^9),
                      stringsAsFactors = FALSE)
stormData$PROPDMGEXP[stormData$PROPDMGEXP %nin% allowed$char] <- "0"
# Scale PROPDMG by the multiplier of each allowed code, visiting every row of `allowed`
for (i in seq_len(nrow(allowed))) {
  rows <- stormData$PROPDMGEXP == allowed$char[i]
  stormData$PROPDMG[rows] <- stormData$PROPDMG[rows] * allowed$val[i]
}
```
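The exponent scaling can also be written without a loop: `match()` looks up each record's code in the `allowed` table, and a single vectorized multiplication scales every row. A sketch on a few hypothetical records (`toy` and `exp_code` are illustrative names, not part of the original script), using base R's `%in%` in place of Hmisc's `%nin%`:

```r
# Multiplier for each allowed exponent code (same table as in the cleaning step).
allowed <- data.frame(char = c("0", "H", "K", "M", "B"),
                      val  = c(10^0, 10^2, 10^3, 10^6, 10^9),
                      stringsAsFactors = FALSE)

# Hypothetical records standing in for stormData.
toy <- data.frame(PROPDMG    = c(2.5, 4, 1.5, 7, 3),
                  PROPDMGEXP = c("k", "M", "B", "?", ""),
                  stringsAsFactors = FALSE)

exp_code <- toupper(toy$PROPDMGEXP)
# Unrecognized codes (including blanks) fall back to "0", i.e. a multiplier of 1.
exp_code[!(exp_code %in% allowed$char)] <- "0"
# match() returns each record's row in the lookup table; one multiplication covers all codes.
toy$PROPDMG <- toy$PROPDMG * allowed$val[match(exp_code, allowed$char)]
print(toy$PROPDMG)
```

The vectorized form avoids indexing the full data set once per code, and every code is handled by construction rather than depending on the loop range.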
```r
## Plotting
## Plot1 - Across the United States, which types of events (as indicated in the
## EVTYPE variable) are most harmful with respect to population health?
plot1data <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, stormData, sum)
# Top 6 event types by injuries and by fatalities
df1 <- droplevels(head(plot1data[order(plot1data$INJURIES, decreasing = TRUE), ], n = 6))
df2 <- droplevels(head(plot1data[order(plot1data$FATALITIES, decreasing = TRUE), ], n = 6))
plot1.1 <- ggplot(df1, aes(reorder(EVTYPE, -INJURIES), INJURIES)) +
  geom_bar(stat = "identity") + xlab("EVENT TYPE")
plot1.2 <- ggplot(df2, aes(reorder(EVTYPE, -FATALITIES), FATALITIES)) +
  geom_bar(stat = "identity") + xlab("EVENT TYPE")
## Plot2 - Across the United States, which types of events
## have the greatest economic consequences?
plot2data <- aggregate(PROPDMG ~ EVTYPE, stormData, sum)
df3 <- droplevels(head(plot2data[order(plot2data$PROPDMG, decreasing = TRUE), ], n = 6))
plot2 <- ggplot(df3, aes(reorder(EVTYPE, -PROPDMG), PROPDMG)) +
  geom_bar(aes(fill = EVTYPE), stat = "identity") + xlab("EVENT TYPE") +
  labs(title = "Most Harmful Events By Economic Consequences")
# In gridExtra >= 2.0 the overall title is passed as `top` (older releases used `main`)
grid.arrange(plot1.1, plot1.2, nrow = 2,
             top = "Most Harmful To Population Health Events In The US")
```
The largest injury total for a single event type is $9.1346 \times 10^{4}$ (91,346), while the mean across event types is 142.668. The largest fatality total for a single event type is 5,633.
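The ranking behind the figure and the summary statistics quoted above can be reproduced on a small example. The sketch uses hypothetical records (`toy`, `totals`, `n`, and the result names are illustrative) and assumes, since the original inline code is not shown, that the maximum and mean are taken over the per-event-type totals in `plot1data`.

```r
# Hypothetical per-record data standing in for stormData.
toy <- data.frame(EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
                  INJURIES   = c(50, 5, 30, 20, 10, 1),
                  FATALITIES = c(2, 1, 3, 9, 0, 0),
                  stringsAsFactors = FALSE)

# Sum both outcomes per event type in a single aggregate() call.
totals <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = toy, FUN = sum)

# Top n event types for each outcome, largest first; the two rankings can differ.
n <- 2
top_injuries   <- head(totals[order(totals$INJURIES, decreasing = TRUE), ], n = n)
top_fatalities <- head(totals[order(totals$FATALITIES, decreasing = TRUE), ], n = n)

# Summary statistics over the per-event-type totals (assumed definition).
max_injuries   <- max(totals$INJURIES)
mean_injuries  <- mean(totals$INJURIES)
max_fatalities <- max(totals$FATALITIES)

print(top_injuries)
print(top_fatalities)
cat(max_injuries, mean_injuries, max_fatalities, "\n")
```

Because the mean is taken over every event type, including the many rare ones with few or no injuries, it can sit far below the maximum. Ranking separately by injuries and by fatalities is also why the script builds two tables, `df1` and `df2`.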
```r
plot2
```
From an economic perspective, FLOOD causes the most property damage, $1.225 \times 10^{11}$ US dollars in total. The mean across event types is $2.8006 \times 10^{8}$ US dollars.
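`which.max()` returns the row with the largest total, which identifies both the event type and its damage in one step, and `formatC()` with `format = "e"` gives scientific notation comparable to the figures above. A sketch on hypothetical totals shaped like `plot2data` (`totals`, `worst`, `label`, and `mean_damage` are illustrative names, and the amounts are made up):

```r
# Hypothetical per-event-type property damage totals, shaped like plot2data.
totals <- data.frame(EVTYPE  = c("FLOOD", "HAIL", "TORNADO"),
                     PROPDMG = c(1.5e11, 2.0e9, 5.0e10),
                     stringsAsFactors = FALSE)

# which.max() gives the index of the largest total, so the event name comes along with it.
worst <- totals[which.max(totals$PROPDMG), ]

# Four significant digits in scientific notation (e.g. "1.500e+11").
label <- formatC(worst$PROPDMG, format = "e", digits = 3)

# Mean damage across event types, as quoted in the text.
mean_damage <- mean(totals$PROPDMG)

cat(worst$EVTYPE, label, "\n")
```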