This study explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
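This analysis uses the “NOAA Storm Data”, which covers the period from 1950 to 2011 and contains 902,297 records. The analysis focuses on answering the following questions:
1. Across the United States, which types of events are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?
The analysis finds that, over the past 60 years, tornadoes have been the most harmful events with respect to population health (both fatalities and injuries), while floods have had the greatest economic consequences, causing over 138 billion dollars in economic losses.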
Throughout this report, all code chunks in the R Markdown document use echo = TRUE so that others can read the code. To do this, we first set echo = TRUE and results = "hold" as global options for this document. The document was prepared with R 3.1.2 (x64) and RStudio 0.98.1087 on Windows 8.1.
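The chunk that sets these global options is not shown in the report. A minimal sketch of what it could look like, using knitr's `opts_chunk$set()`; the exact chunk the author used is an assumption:

```r
# Document-wide chunk options (assumed setup chunk):
# echo = TRUE shows the code of every chunk, and results = "hold"
# collects each chunk's output and prints it after the code
knitr::opts_chunk$set(echo = TRUE, results = "hold")
```

Placing this in the first chunk means individual chunks no longer need to repeat `echo = TRUE`.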
library(knitr)
library(ggplot2)
library(plyr)
This assignment uses the NOAA Storm Database (“NOAA Storm Data”).
Working directory:
# Machine-specific path: adjust it to the folder that holds the data file
setwd("E:/1. Data/4. COURSES E-LEARNING/7_DATA SCIENCE ANALYSIS SPECIALIZATION/5_REPRODUCIBLE RESEARCH/Week 3/Peer assessment 2")
stormData <- read.csv("repdata_data_StormData.csv", header = TRUE, stringsAsFactors = FALSE)
FATALITIES
# Convert a damage-exponent column (e.g. PROPDMGEXP) into a power of ten and
# append a column holding the damage amount in US dollars
convert <- function(dataset = stormData, fieldName, newFieldName) {
  # The magnitude column is the exponent column without the "EXP" suffix
  # (PROPDMGEXP -> PROPDMG, CROPDMGEXP -> CROPDMG)
  valueField <- sub("EXP$", "", fieldName)
  code <- toupper(as.character(dataset[[fieldName]]))
  # Letter codes: B = billions, M = millions, K = thousands, H = hundreds;
  # an empty code means no multiplier (%in% skips NA values safely)
  code[code %in% "B"] <- "9"
  code[code %in% "M"] <- "6"
  code[code %in% "K"] <- "3"
  code[code %in% "H"] <- "2"
  code[code %in% ""] <- "0"
  # Digit codes are used as exponents as-is; symbols such as "+", "-", "?"
  # cannot be parsed and, like NA, are treated as exponent 0
  power <- suppressWarnings(as.numeric(code))
  power[is.na(power)] <- 0
  dataset[[newFieldName]] <- dataset[[valueField]] * 10^power
  dataset
}
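The exponent rules above can also be written as a lookup table, which makes the mapping easier to check on a few values. A minimal sketch; `damageMultiplier()` and the toy data frame are hypothetical illustrations, not part of the original analysis:

```r
# Hypothetical helper: translate PROPDMGEXP/CROPDMGEXP codes into dollar
# multipliers, following the same rules as convert()
damageMultiplier <- function(code) {
  code <- toupper(as.character(code))
  # Letter codes name a power of ten: hundreds, thousands, millions, billions
  letterCodes <- c(H = 2, K = 3, M = 6, B = 9)
  # Digit codes "0".."8" parse directly as exponents; everything else is NA
  power <- suppressWarnings(as.numeric(code))
  isLetter <- code %in% names(letterCodes)
  power[isLetter] <- letterCodes[code[isLetter]]
  # Empty strings, NA and symbols such as "+", "-", "?" get no multiplier
  power[is.na(power)] <- 0
  10^power
}

# Worked example on toy records (illustrative values, not real NOAA data)
toy <- data.frame(PROPDMG = c(25, 2.5, 10, 5),
                  PROPDMGEXP = c("K", "m", "", "+"),
                  stringsAsFactors = FALSE)
toy$propertyDamage <- toy$PROPDMG * damageMultiplier(toy$PROPDMGEXP)
toy$propertyDamage
```

Here 25 with code "K" becomes 25 × 10^3 = 25,000 dollars, 2.5 with "m" becomes 2,500,000, and the empty and "+" codes leave the values unchanged.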
stormData <- convert(stormData, "PROPDMGEXP", "propertyDamage")
stormData <- convert(stormData, "CROPDMGEXP", "cropDamage")
# Sum a numeric field by event type and keep the `top` largest totals,
# ordered from largest to smallest (named topEvents to avoid masking base::sort)
topEvents <- function(fieldName, top = 20, dataset = stormData) {
  totals <- aggregate(dataset[[fieldName]], by = list(dataset$EVTYPE), FUN = sum)
  names(totals) <- c("EVTYPE", fieldName)
  totals <- totals[order(totals[[fieldName]], decreasing = TRUE), ]
  totals <- head(totals, n = top)
  # Fix the factor level order so ggplot2 draws the bars in ranked order
  totals$EVTYPE <- factor(totals$EVTYPE, levels = totals$EVTYPE)
  totals
}
# Avoid scientific notation when printing large dollar amounts
options(scipen = 999)
fatalities <- topEvents("FATALITIES", dataset = stormData)
injuries <- topEvents("INJURIES", dataset = stormData)
property <- topEvents("propertyDamage", dataset = stormData)
crop <- topEvents("cropDamage", dataset = stormData)
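The aggregate-and-sort step above can be cross-checked with a compact base-R alternative: `split()` groups the values by event type, `vapply()` sums each group, and `sort()` ranks the totals. A minimal sketch on hypothetical toy records standing in for `stormData`:

```r
# Hypothetical toy records (not real NOAA values)
toyStorms <- data.frame(EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
                        FATALITIES = c(5, 1, 3, 0, 2),
                        stringsAsFactors = FALSE)

# Sum FATALITIES within each EVTYPE; the result is a named numeric vector
totals <- vapply(split(toyStorms$FATALITIES, toyStorms$EVTYPE), sum, numeric(1))

# Rank event types from most to least harmful and keep the top two
top2 <- head(sort(totals, decreasing = TRUE), n = 2)
top2
```

On the real data the same call would use `stormData$FATALITIES` and `stormData$EVTYPE`; the data-frame version above is more convenient for plotting with ggplot2.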
ggplot(data = fatalities, aes(x = EVTYPE, y = FATALITIES)) +
  geom_bar(colour = "white", fill = "blue", stat = "identity") +
  xlab("Event Type") + ylab("Number of Fatalities") +
  ggtitle("Total number of fatalities in the U.S., 1950 - 2011") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
The bar plot above shows that “TORNADO” causes the most fatalities.
INJURIES
ggplot(data = injuries, aes(x = EVTYPE, y = INJURIES)) +
  geom_bar(colour = "white", fill = "blue", stat = "identity") +
  xlab("Event Type") + ylab("Number of Injuries") +
  ggtitle("Total number of injuries in the U.S., 1950 - 2011") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
The bar plot above shows that “TORNADO” causes the most injuries.
ggplot(data = property, aes(x = EVTYPE, y = propertyDamage)) +
  geom_bar(colour = "white", fill = "blue", stat = "identity") +
  xlab("Severe Weather Type") + ylab("Property Damage in US dollars") +
  ggtitle("Total Property Damage by Severe Weather Events in\n the U.S. from 1950 - 2011") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
ggplot(data = crop, aes(x = EVTYPE, y = cropDamage)) +
  geom_bar(colour = "white", fill = "blue", stat = "identity") +
  xlab("Severe Weather Type") + ylab("Crop Damage in US dollars") +
  ggtitle("Total Crop Damage by Severe Weather Events in\n the U.S. from 1950 - 2011") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
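The economic question combines property and crop losses, while the two plots above rank them separately. A minimal sketch that sums the `propertyDamage` and `cropDamage` columns created by `convert()` into one total per event type; `economicDamage()` and the toy data are hypothetical:

```r
# Hypothetical helper: total economic damage (property + crop) per event type,
# ranked from largest to smallest
economicDamage <- function(dataset, top = 10) {
  # Sum both damage columns within each EVTYPE
  totals <- aggregate(cbind(propertyDamage, cropDamage) ~ EVTYPE,
                      data = dataset, FUN = sum)
  totals$totalDamage <- totals$propertyDamage + totals$cropDamage
  totals <- totals[order(totals$totalDamage, decreasing = TRUE), ]
  rownames(totals) <- NULL
  head(totals, n = top)
}

# Toy records (illustrative values only, not real NOAA data)
toyDamage <- data.frame(EVTYPE = c("FLOOD", "DROUGHT", "FLOOD", "HAIL"),
                        propertyDamage = c(2e9, 1e6, 5e8, 3e7),
                        cropDamage = c(1e7, 9e8, 0, 2e7),
                        stringsAsFactors = FALSE)
res <- economicDamage(toyDamage, top = 2)
res
# In the report itself this would be called as economicDamage(stormData)
```

In the toy data, DROUGHT ranks second only because of its crop losses, which is why the two damage columns are added before ranking.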
We found that excessive heat and tornadoes are the most harmful events with respect to population health, while floods, droughts, and hurricanes/typhoons have the greatest economic consequences.