Synopsis

This study explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The analysis uses the “NOAA Storm Data”, which covers the period from 1950 to 2011 and contains 902297 records. It aims to answer the following questions:

  1. Which events are most harmful with respect to population health?
  2. Which events have the greatest economic consequences?

The analysis finds that, over roughly the past 60 years, tornadoes have been the most harmful events with respect to population health (fatalities and injuries), while floods have had the greatest economic consequences, causing over 138 billion dollars in economic losses.


Prepare the R environment

Throughout this report, code chunks in the R Markdown document use echo = TRUE so that readers can see the code. To that end, we set echo = TRUE and results = ‘hold’ as global options for this document. The document was prepared with R version 3.1.2 (x64) and RStudio version 0.98.1087 on Windows 8.1.
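
The global options described above could be set with knitr's opts_chunk$set. The following is a minimal sketch, assuming it runs in the first chunk of the document; the final call is only there to confirm the settings.

```r
# Set global chunk options once, near the top of the R Markdown document.
library(knitr)

opts_chunk$set(
    echo = TRUE,      # show the source code of every chunk
    results = "hold"  # print a chunk's output after all of its code
)

# Confirm that the options are in effect.
opts_chunk$get(c("echo", "results"))
```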

Load required libraries
library(knitr)
library(ggplot2)
library(plyr)

Data Loading and preprocessing

This assignment uses the NOAA Storm Database (“NOAA Storm Data”).

Data Loading

Working directory:

setwd("E:/1. Data/4. COURSES E-LEARNING/7_DATA SCIENCE ANALYSIS SPECIALIZATION/5_REPRODUCIBLE RESEARCH/Week 3/Peer assessment 2")

stormData <- read.csv("repdata_data_StormData.csv", header = TRUE, stringsAsFactors = FALSE)
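
Before aggregating, it may be worth checking that the columns used later in the report were actually read in. The sketch below uses a hypothetical helper, checkColumns, and a two-row toy data frame with made-up values in place of the real file; the column names are the ones referenced by the code in the rest of this report.

```r
# Hypothetical helper: stop early if any expected column is absent.
checkColumns <- function(dataset, required) {
    absent <- setdiff(required, colnames(dataset))
    if (length(absent) > 0) {
        stop("Missing columns: ", paste(absent, collapse = ", "))
    }
    invisible(TRUE)
}

# Columns referenced by the analysis code in this report.
requiredColumns <- c("EVTYPE", "FATALITIES", "INJURIES",
                     "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Toy stand-in for stormData (values are made up for illustration).
toy <- data.frame(
    EVTYPE = c("TORNADO", "FLOOD"),
    FATALITIES = c(1, 0),
    INJURIES = c(3, 0),
    PROPDMG = c(25, 1.5),
    PROPDMGEXP = c("K", "M"),
    CROPDMG = c(0, 10),
    CROPDMGEXP = c("", "K"),
    stringsAsFactors = FALSE
)

checkResult <- checkColumns(toy, requiredColumns)
```

On the real data the same check would be checkColumns(stormData, requiredColumns).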

Results

1. Which types of events are most harmful with respect to population health?

FATALITIES

# Convert a damage exponent column (e.g. "PROPDMGEXP") to a power of ten and
# append the resulting dollar amount as a new column named newFieldName.
# Assumes the magnitude column (e.g. "PROPDMG") sits immediately to the left
# of the exponent column.
convert <- function(dataset = stormData, fieldName, newFieldName) {
    index <- which(colnames(dataset) == fieldName)
    exponent <- toupper(as.character(dataset[, index]))
    # Map letter codes to powers of ten; a blank code means no multiplier.
    exponent[exponent %in% "B"] <- "9"
    exponent[exponent %in% "M"] <- "6"
    exponent[exponent %in% "K"] <- "3"
    exponent[exponent %in% "H"] <- "2"
    exponent[exponent %in% ""] <- "0"
    # Digit codes are kept as powers of ten; any other symbol or missing
    # value becomes NA here and is then treated as exponent 0.
    exponent <- suppressWarnings(as.numeric(exponent))
    exponent[is.na(exponent)] <- 0
    dataset[, index] <- exponent
    dataset[[newFieldName]] <- dataset[, index - 1] * 10^exponent
    return(dataset)
}

stormData <- convert(stormData, "PROPDMGEXP", "propertyDamage")
stormData <- convert(stormData, "CROPDMGEXP", "cropDamage")
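
To make the arithmetic behind convert() concrete, here is a small worked sketch on made-up values. It uses a named lookup vector (exponentCodes, a name introduced here for illustration) rather than the function above, and unlike convert() it does not handle digit exponents.

```r
# Powers of ten for the letter codes handled by convert().
exponentCodes <- c(H = 2, K = 3, M = 6, B = 9)

# Toy values (made up): 25 "K", 1.5 "m", and 3 with a blank exponent.
toy <- data.frame(
    PROPDMG = c(25, 1.5, 3),
    PROPDMGEXP = c("K", "m", ""),
    stringsAsFactors = FALSE
)

# Look up each code; blanks and unknown codes fall back to exponent 0.
power <- unname(exponentCodes[toupper(toy$PROPDMGEXP)])
power[is.na(power)] <- 0

# Damage in dollars = magnitude * 10^exponent.
toy$propertyDamage <- toy$PROPDMG * 10^power
toy$propertyDamage
```

Here 25 with "K" becomes 25 * 10^3, 1.5 with "m" becomes 1.5 * 10^6, and the blank exponent leaves 3 unchanged.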

# Total one numeric field by event type and keep the `top` largest totals.
# Note: this definition masks base::sort() in the global environment.
sort <- function(fieldName, top = 20, dataset = stormData) {
    index <- which(colnames(dataset) == fieldName)
    field <- aggregate(dataset[, index], by = list(dataset$EVTYPE), FUN = sum)
    names(field) <- c("EVTYPE", fieldName)
    field <- arrange(field, field[, 2], decreasing = TRUE)
    field <- head(field, n = top)
    # Fix the factor level order so ggplot keeps the bars sorted by total.
    field <- within(field, EVTYPE <- factor(x = EVTYPE, levels = field$EVTYPE))
    return(field)
}

# Print large totals in fixed rather than scientific notation.
options(scipen = 999)

fatalities <- sort("FATALITIES", dataset = stormData)
injuries <- sort("INJURIES", dataset = stormData)
property <- sort("propertyDamage", dataset = stormData)
crop <- sort("cropDamage", dataset = stormData)
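
The aggregate-then-rank step can be illustrated on a toy data frame. This is a sketch with made-up numbers that uses base R's order() in place of plyr's arrange(); it is not the report's own function.

```r
# Toy records (made up): several events of three types.
toy <- data.frame(
    EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "TORNADO"),
    FATALITIES = c(5, 1, 3, 4, 2, 1),
    stringsAsFactors = FALSE
)

# Sum fatalities per event type.
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Rank from the largest to the smallest total and keep the top two.
totals <- totals[order(totals$FATALITIES, decreasing = TRUE), ]
topTwo <- head(totals, n = 2)

# Fix factor levels in ranked order so a bar plot keeps this ordering.
topTwo$EVTYPE <- factor(topTwo$EVTYPE, levels = topTwo$EVTYPE)
topTwo
```

Setting the factor levels explicitly matters because ggplot2 would otherwise order the bars alphabetically by event type.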

ggplot(data = fatalities, aes(x = EVTYPE, y = FATALITIES)) +
    geom_bar(colour = "white", fill = "blue", stat = "identity") +
    xlab("Event Type") +
    ylab("Number of Fatalities") +
    ggtitle("Total number of fatalities in U.S., 1950 - 2011") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1))

Figure: bar plot of total fatalities by event type (chunk fatalities)

The bar plot above shows that “TORNADO” events caused the most fatalities.

INJURIES

ggplot(data = injuries, aes(x = EVTYPE, y = INJURIES)) +
    geom_bar(colour = "white", fill = "blue", stat = "identity") +
    xlab("Event Type") +
    ylab("Number of Injuries") +
    ggtitle("Total number of Injuries in U.S., 1950 - 2011") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1))

Figure: bar plot of total injuries by event type (chunk injuries)

The bar plot above shows that “TORNADO” events caused the most injuries.


2. Which events have the greatest economic consequences?

ggplot(data = property, aes(x = EVTYPE, y = propertyDamage)) +
    geom_bar(colour = "white", fill = "blue", stat = "identity") +
    xlab("Severe Weather Type") +
    ylab("Property Damage in US dollars") +
    ggtitle("Total Property Damage by Severe Weather Events in\n the U.S. from 1950 - 2011") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1))

Figure: bar plot of total property damage by event type (chunk plotdamage)

ggplot(data = crop, aes(x = EVTYPE, y = cropDamage)) +
    geom_bar(colour = "white", fill = "blue", stat = "identity") +
    xlab("Severe Weather Type") +
    ylab("Crop Damage in US dollars") +
    ggtitle("Total Crop Damage by Severe Weather Events in\n the U.S. from 1950 - 2011") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1))

Figure: bar plot of total crop damage by event type (chunk plotdamage)
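
Property and crop damage are plotted separately above. One possible way to compare event types on overall economic loss is to add the two amounts before aggregating. The sketch below does this on made-up numbers; the totalDamage column is a name introduced here for illustration and is not part of the original analysis.

```r
# Toy damage records (made up), already expressed in dollars.
toy <- data.frame(
    EVTYPE = c("FLOOD", "DROUGHT", "FLOOD", "HAIL"),
    propertyDamage = c(1e6, 1e4, 2e6, 5e5),
    cropDamage = c(2e5, 3e6, 0, 1e5),
    stringsAsFactors = FALSE
)

# Combine both kinds of damage, then total by event type.
toy$totalDamage <- toy$propertyDamage + toy$cropDamage
combined <- aggregate(totalDamage ~ EVTYPE, data = toy, FUN = sum)

# Rank event types from the largest to the smallest combined loss.
combined <- combined[order(combined$totalDamage, decreasing = TRUE), ]
combined
```

In this toy example drought ranks second on the strength of its crop losses, even though its property damage is the smallest.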


Conclusions

We found that excessive heat and tornadoes are most harmful with respect to population health, while floods, droughts, and hurricanes/typhoons have the greatest economic consequences.