Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
Exploring the U.S. NOAA storm data shows that tornadoes are by far the weather events most harmful to population health. Excessive heat is the second deadliest event type, although it causes far fewer fatalities than tornadoes.
In terms of economic consequences, floods are by far the most destructive weather events in the U.S. Floods, hurricanes/typhoons, tornadoes, and river floods together account for more than a hundred billion dollars in economic damage.
First, we load the necessary libraries: ggplot2 for plotting and scales for formatting axis labels.
library(ggplot2)
library(scales)
Then, we will download the data.
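if (!dir.exists("./data")) {
  dir.create("./data")
}
fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destfile <- "./data/rrproject2.csv.bz2"
# Download only once; mode = "wb" keeps the compressed file intact on Windows
if (!file.exists(destfile)) {
  download.file(fileURL, destfile = destfile, mode = "wb")
}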
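After downloading the data, we read it into R; read.csv() decompresses the .bz2 file transparently. We then keep only the columns needed for this analysis (the event type, fatalities, injuries, and property and crop damage with their magnitude exponents) and confirm that the subset has no missing values.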
data <- read.csv("./data/rrproject2.csv.bz2")
colnames(data)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
subset_data <- data[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP",
"CROPDMG", "CROPDMGEXP")]
sum(is.na(subset_data))
## [1] 0
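Since the file has 37 columns but only seven are used, an alternative is to skip the unused columns at read time instead of subsetting afterwards. The sketch below assumes the header can be read first with nrows = 1 and marks every other column as "NULL" in colClasses; read_selected() is a hypothetical helper, and the small temporary CSV stands in for the real storm file.

```r
# Sketch: read only selected columns by marking the rest as "NULL" in colClasses.
# read_selected() is a hypothetical helper; the temp file stands in for the real data.
keep <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP",
          "CROPDMG", "CROPDMGEXP")

read_selected <- function(path, keep) {
  # Read just the header row to learn the column names
  header <- names(read.csv(path, nrows = 1))
  # NA = let read.csv choose the type; "NULL" = skip the column entirely
  classes <- ifelse(header %in% keep, NA, "NULL")
  read.csv(path, colClasses = classes)
}

# Tiny stand-in file with one extra column (STATE) that should be dropped
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(STATE = "AL", EVTYPE = "TORNADO", FATALITIES = 0,
                     INJURIES = 15, PROPDMG = 25, PROPDMGEXP = "K",
                     CROPDMG = 0, CROPDMGEXP = ""),
          tmp, row.names = FALSE)

small <- read_selected(tmp, keep)
print(names(small))
```

On the full dataset this avoids holding the 30 unused columns (including the long REMARKS text) in memory, at the cost of reading the header twice.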
# Total fatalities per event type, sorted in decreasing order; keep the top five
aggregate_fatalities <- aggregate(FATALITIES ~ EVTYPE, data = subset_data, sum)
aggregate_fatalities <- aggregate_fatalities[order(-aggregate_fatalities$FATALITIES), ]
aggregate_fatalities <- aggregate_fatalities[1:5, ]
# Total injuries per event type, sorted in decreasing order; keep the top five
aggregate_injuries <- aggregate(INJURIES ~ EVTYPE, data = subset_data, sum)
aggregate_injuries <- aggregate_injuries[order(-aggregate_injuries$INJURIES), ]
aggregate_injuries <- aggregate_injuries[1:5, ]
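The aggregate, order, and take-top-five pattern above is repeated for each outcome. A minimal sketch of doing both health outcomes in one aggregate() call uses cbind() on the left-hand side of the formula; the toy data frame and the top_n_by() helper are hypothetical and only illustrate the idiom.

```r
# Sketch: sum several outcome columns per event type in one aggregate() call,
# then keep the n largest rows by a chosen column. Toy data only.
storms <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(3, 2, 1, 0, 4),
  INJURIES   = c(20, 1, 5, 2, 0)
)

# cbind() on the left-hand side sums each column separately per EVTYPE
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = storms, FUN = sum)

# Hypothetical helper: sort by one column (descending) and keep the top n rows
top_n_by <- function(df, column, n = 5) {
  head(df[order(-df[[column]]), ], n)
}

top_f <- top_n_by(health, "FATALITIES", n = 2)
top_i <- top_n_by(health, "INJURIES", n = 2)
print(top_f)
print(top_i)
```

The formula method drops any row with an NA in either outcome column; that is harmless here because the subset has no missing values.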
# Keep rows whose property AND crop damage exponents are K (thousands),
# M (millions) or B (billions), in either upper or lower case
valid_exp <- c("K", "M", "B")
subset_economic <- subset(subset_data, toupper(PROPDMGEXP) %in% valid_exp &
                            toupper(CROPDMGEXP) %in% valid_exp)
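Because rows with other exponent codes are excluded, it can be worth checking which codes occur and what share of rows a K/M/B-only filter keeps. The sketch below runs that check on a hypothetical six-row data frame; on the real data the same table() and mean() calls would apply to subset_data.

```r
# Sketch: inspect exponent codes and measure how many rows a K/M/B filter keeps.
# The six rows below are made up for illustration.
dmg <- data.frame(
  PROPDMGEXP = c("K", "m", "", "B", "+", "k"),
  CROPDMGEXP = c("K", "",  "M", "k", "K", "M")
)

valid_exp <- c("K", "M", "B")

# Which codes occur at all (case-folded)?
print(table(toupper(dmg$PROPDMGEXP)))

# Both exponents must be valid for a row to be kept
keep_rows <- toupper(dmg$PROPDMGEXP) %in% valid_exp &
  toupper(dmg$CROPDMGEXP) %in% valid_exp
kept <- dmg[keep_rows, ]
prop_kept <- mean(keep_rows)  # share of rows that survive the filter

print(kept)
print(prop_kept)
```

A low share would signal that the filter discards a meaningful part of the recorded damage, which is worth keeping in mind when reading the economic totals.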
# Convert the exponent letters to numeric multipliers via a named lookup vector;
# toupper() makes the lookup case-insensitive, unname() drops the letter labels
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)
subset_economic$PROPDMGEXP <- unname(multiplier[toupper(subset_economic$PROPDMGEXP)])
subset_economic$CROPDMGEXP <- unname(multiplier[toupper(subset_economic$CROPDMGEXP)])
subset_economic$TOTALDMG <- (subset_economic$CROPDMG * subset_economic$CROPDMGEXP) +
(subset_economic$PROPDMG * subset_economic$PROPDMGEXP)
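Total damage for each row is PROPDMG times its multiplier plus CROPDMG times its multiplier. As a quick check of that arithmetic, the sketch below applies it to three made-up rows; for example, 2.5 "M" of property damage plus 500 "K" of crop damage should come to 3,000,000 dollars.

```r
# Sketch: worked example of the total-damage formula on toy rows.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

dmg <- data.frame(
  PROPDMG    = c(2.5, 10, 1.2),
  PROPDMGEXP = c("M", "k", "B"),
  CROPDMG    = c(500, 0, 30),
  CROPDMGEXP = c("K", "K", "m")
)

prop_mult <- unname(multiplier[toupper(dmg$PROPDMGEXP)])
crop_mult <- unname(multiplier[toupper(dmg$CROPDMGEXP)])

# total = property * multiplier + crop * multiplier, in dollars
dmg$TOTALDMG <- dmg$PROPDMG * prop_mult + dmg$CROPDMG * crop_mult
print(format(dmg$TOTALDMG, big.mark = ",", scientific = FALSE))
```

The three totals are 2.5e6 + 5e5, 1e4 + 0, and 1.2e9 + 3e7.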
aggregate_economic <- aggregate(TOTALDMG ~ EVTYPE, data = subset_economic, sum)
aggregate_economic <- aggregate_economic[order(-aggregate_economic$TOTALDMG), ]
aggregate_economic <- aggregate_economic[1:5, ]
The following bar plots show the five event types that cause the most fatalities, the most injuries, and the greatest economic damage.
# reorder() sorts the bars by value (largest first) instead of alphabetically
g <- ggplot(aggregate_fatalities, aes(x = reorder(EVTYPE, -FATALITIES), y = FATALITIES)) +
  geom_col(fill = "#FF6666") +
  ylim(0, 6000) +
  xlab("Event Type") +
  ylab("Number of Fatalities") +
  ggtitle("Major Causes of Fatalities") +
  theme(plot.title = element_text(hjust = 0.5))
print(g)
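ggplot2 places a character x variable in alphabetical order, so without reordering the tallest bar is not necessarily first. The base-R sketch below, on made-up counts, shows the level order that reorder(EVTYPE, -FATALITIES) produces; ggplot2 uses those factor levels for the x-axis.

```r
# Sketch: compare default (alphabetical) factor levels with value-based levels.
# The counts are toy values, not the real totals.
top <- data.frame(
  EVTYPE     = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD"),
  FATALITIES = c(50, 20, 10)
)

default_levels <- levels(factor(top$EVTYPE))      # alphabetical
by_value <- reorder(top$EVTYPE, -top$FATALITIES)  # descending by fatalities

print(default_levels)
print(levels(by_value))
```

Negating the value is the simplest way to get descending order that works on older R versions; reorder() sorts levels by increasing score.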
# Bars sorted by number of injuries, largest first
g <- ggplot(aggregate_injuries, aes(x = reorder(EVTYPE, -INJURIES), y = INJURIES)) +
  geom_col(fill = "#FF6666") +
  ylim(0, 100000) +
  xlab("Event Type") +
  ylab("Number of Injuries") +
  ggtitle("Major Causes of Injuries") +
  theme(plot.title = element_text(hjust = 0.5))
print(g)
# Bars sorted by total damage, largest first; comma() formats the dollar axis
g <- ggplot(aggregate_economic, aes(x = reorder(EVTYPE, -TOTALDMG), y = TOTALDMG)) +
  geom_col(fill = "#FF6666") +
  xlab("Event Type") +
  ylab("Total Damage (USD)") +
  ggtitle("Major Causes of Economic Damage") +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = comma)
print(g)
The bar plots above show that tornadoes are the weather events most hazardous to human health, causing the most fatalities and injuries, while floods have the greatest impact on the economy.