Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The goal of the assignment is to explore the NOAA Storm Database and assess the effects of severe weather events on both population health and the economy. The analysis addresses two questions:
Across the United States, which types of events are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
library(dplyr)
setwd("/Users/junwen/Documents/Coursera/Data Sciences/Reproducible Research")
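# stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0),
# which the levels() calls further down rely on
df <- read.csv(bzfile("repdata-data-StormData.csv.bz2"), stringsAsFactors = TRUE)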
data <- df[c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP",
             "CROPDMG", "CROPDMGEXP")]
Seven variables are relevant to the two questions: EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP.
Property damage and crop damage each have a companion column, PROPDMGEXP and CROPDMGEXP respectively, that records a magnitude code for each observation (for example, K for thousands, M for millions, B for billions). We create a new variable, DamageCost, holding the combined dollar value of property and crop damage, with each amount scaled by its own magnitude code.
levels(data$CROPDMGEXP)
## [1] "" "?" "0" "2" "B" "k" "K" "m" "M"
levels(data$PROPDMGEXP)
## [1] "" "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"
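# Convert each magnitude code to a power of ten (unrecognized codes count as 10^0)
data <- mutate(data,
               PROPUNIT = ifelse(PROPDMGEXP %in% c('B', 'b'), 9,
                          ifelse(PROPDMGEXP %in% c('M', 'm'), 6,
                          ifelse(PROPDMGEXP %in% c('K', 'k'), 3,
                          ifelse(PROPDMGEXP %in% c('H', 'h'), 2, 0)))),
               CROPUNIT = ifelse(CROPDMGEXP %in% c('B', 'b'), 9,
                          ifelse(CROPDMGEXP %in% c('M', 'm'), 6,
                          ifelse(CROPDMGEXP %in% c('K', 'k'), 3,
                          ifelse(CROPDMGEXP %in% c('H', 'h'), 2, 0)))))
# Scale property and crop damage by their own exponents before summing
data <- mutate(data, DamageCost = PROPDMG * 10^PROPUNIT + CROPDMG * 10^CROPUNIT)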
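To make the multiplier logic concrete, the following is a minimal sketch on a made-up three-row data frame. The helper name `exp_to_power` and the `toy` data are hypothetical and not part of the original analysis. As in the code above, any code other than B, M, K, or H (including the digits and symbols that appear in the levels) is assumed to mean a multiplier of 10^0.

```r
# Hypothetical helper: map a magnitude code to a power of ten.
# Assumption: codes other than B/M/K/H (any case) are treated as 10^0.
exp_to_power <- function(code) {
  code <- toupper(as.character(code))
  power <- c(B = 9, M = 6, K = 3, H = 2)[code]  # unmatched codes give NA
  power[is.na(power)] <- 0
  unname(power)
}

# Made-up observations, for illustration only
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 10),
  PROPDMGEXP = c("K", "M", ""),
  CROPDMG    = c(0, 1, 5),
  CROPDMGEXP = c("", "k", "?")
)

# Each damage column is scaled by its own exponent column
toy$DamageCost <- toy$PROPDMG * 10^exp_to_power(toy$PROPDMGEXP) +
  toy$CROPDMG * 10^exp_to_power(toy$CROPDMGEXP)

print(toy$DamageCost)
```

The first row is 25 thousand in property damage with no crop damage, the second combines 2.5 million in property damage with 1 thousand in crop damage, and the third has no recognized code, so both amounts are taken at face value.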
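In this section, we explore the population health and economic consequences of the events by ranking event types on their average fatalities, injuries, and damage cost per recorded event.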
library(ggplot2)
# Plot average fatalities per recorded event for the 10 deadliest event types
subset1 <- aggregate(FATALITIES ~ EVTYPE, data = data, mean, na.rm = TRUE)
subset1 <- subset1[order(-subset1$FATALITIES), ][1:10, ]
# Fix the factor order so the bars are drawn in ranked order
subset1$EVTYPE <- factor(subset1$EVTYPE, levels = subset1$EVTYPE)
ggplot(subset1, aes(x = EVTYPE, y = FATALITIES)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("EVENT TYPE") + ylab("FATALITIES") +
  ggtitle("Average Fatalities by Top 10 Weather Events")
# Plot average injuries per recorded event for the 10 most injurious event types
subset2 <- aggregate(INJURIES ~ EVTYPE, data = data, mean, na.rm = TRUE)
subset2 <- subset2[order(-subset2$INJURIES), ][1:10, ]
subset2$EVTYPE <- factor(subset2$EVTYPE, levels = subset2$EVTYPE)
ggplot(subset2, aes(x = EVTYPE, y = INJURIES)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("EVENT TYPE") + ylab("INJURIES") +
  ggtitle("Average Injuries by Top 10 Weather Events")
# Plot average damage cost per recorded event for the 10 costliest event types
subset3 <- aggregate(DamageCost ~ EVTYPE, data = data, mean, na.rm = TRUE)
subset3 <- subset3[order(-subset3$DamageCost), ][1:10, ]
subset3$EVTYPE <- factor(subset3$EVTYPE, levels = subset3$EVTYPE)
ggplot(subset3, aes(x = EVTYPE, y = DamageCost)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("EVENT TYPE") + ylab("DAMAGE COST (USD)") +
  ggtitle("Average Damage Cost by Top 10 Weather Events")
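The three plots above share one pattern: average a measure per event type, sort in descending order, keep the top ten, and fix the factor order for plotting. As a sketch only, that pattern could be factored into a single function. The name `top_events` and the toy data below are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: mean of `measure` per event type, top `n` rows,
# with EVTYPE turned into a factor ordered by rank (for plotting).
top_events <- function(data, measure, n = 10) {
  agg <- aggregate(data[[measure]], by = list(EVTYPE = data$EVTYPE),
                   FUN = mean, na.rm = TRUE)
  names(agg)[2] <- measure
  agg <- agg[order(-agg[[measure]]), ]
  agg <- head(agg, n)
  agg$EVTYPE <- factor(agg$EVTYPE, levels = agg$EVTYPE)
  agg
}

# Made-up observations, for illustration only
toy_events <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(4, 2, 1, 0)
)

top2 <- top_events(toy_events, "FATALITIES", n = 2)
print(top2)
```

On the real data, a call such as `top_events(data, "DamageCost")` would be expected to yield the same kind of table as `subset3`, which can then be passed to the same `ggplot` call.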
As you can see from previous plots: