Synopsis

This report identifies the types of weather events that are most harmful to population health and that have the greatest economic consequences across the United States. It explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, whose records start in 1950 and end in November 2011. The analysis found that Flood, Hurricane/Typhoon, Tornado and Storm Surge caused severe property and crop damage, ranging from roughly 40 to 150 billion dollars per event type. Tornado causes the most fatalities and injuries.

Load Libraries

# load the packages used below: car (recode), tidyr (gather) and dplyr (data wrangling)
library(R.utils)
library(car)
library(tidyr)
## Warning: package 'tidyr' was built under R version 3.1.3
## 
## Attaching package: 'tidyr'
## 
## The following object is masked from 'package:R.utils':
## 
##     extract
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.1.2
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

The storm data set and related documentation are available at the following links:
Storm Data
National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ

Data Processing

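fileURL <- "https://d396qusza40orc.cloudfront.net/repdata/data/StormData.csv.bz2"
destFile <- "StormData.csv.bz2"
# download the compressed CSV only if it is not already present locally
if (!file.exists(destFile)) download.file(fileURL, destFile)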

myData <- read.csv(bzfile(destFile))
records <- nrow(myData)

In this study, the type of each storm event is recorded in column “EVTYPE”. Harm to population health is captured by columns “FATALITIES” and “INJURIES”. Economic consequences are captured by columns “PROPDMG” and “PROPDMGEXP” for property damage, and by columns “CROPDMG” and “CROPDMGEXP” for crop damage.

The analysis considered 902297 records. The fields “EVTYPE”, “FATALITIES”, “INJURIES”, “PROPDMG” and “CROPDMG”, together with the exponent fields “PROPDMGEXP” and “CROPDMGEXP”, contain the data needed for the analysis.

eco_data <- select(myData, EVTYPE,PROPDMG:CROPDMGEXP)

# convert exponent codes to lower case so that, e.g., 'K' and 'k' are treated alike
eco_data$PROPDMGEXP <- factor(tolower(eco_data$PROPDMGEXP))
eco_data$CROPDMGEXP <- factor(tolower(eco_data$CROPDMGEXP))

# recode specification: maps each code in PROPDMGEXP, CROPDMGEXP to a numeric multiplier
exp_recodes <- "''=10^0;'-'=10^1;'?'=10^0;'+'=1;'0'=10^0;'1'=10^1;'2'=10^2;'3'=10^3;'4'=10^4;'5'=10^5;'6'=10^6;'7'=10^7;'8'=10^8;'b'=10^9;'h'=10^2;'k'=10^3;'m'=10^6"

# convert the factor codes into numeric multipliers;
# car::recode is called explicitly because newer dplyr versions also export a recode()
eco_data$PROPDMMULT <- as.numeric(car::recode(eco_data$PROPDMGEXP, exp_recodes, as.factor = FALSE, as.numeric = TRUE))
eco_data$CROPDMMULT <- as.numeric(car::recode(eco_data$CROPDMGEXP, exp_recodes, as.factor = FALSE, as.numeric = TRUE))

# compute the actual damage in dollars: value * multiplier
eco_data <- mutate(eco_data, PROPDMGVAL = PROPDMG * PROPDMMULT, CROPDMGVAL = CROPDMG * CROPDMMULT)

# summarize damage by event type, converting dollars to billions of dollars
eco_DMG <- eco_data %>%
  group_by(EVTYPE) %>%
  summarise(Property = sum(PROPDMGVAL) / 10^9,
            Crops = sum(CROPDMGVAL) / 10^9,
            Total = Property + Crops) %>%
  arrange(desc(Total))

# select the top 20 events by total damage
eco_DMG_20 <- head(eco_DMG,20)

# reorder EVTYPE levels to match the descending order of damage
eco_DMG_20 <- within(eco_DMG_20, EVTYPE <- factor(EVTYPE, levels = as.character(EVTYPE)))

# reshape eco_DMG_20 into long format for plotting
eco_DMG_plot <- gather(eco_DMG_20,"DMGType" , "Value", 2:4)
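The `factor(EVTYPE, levels = as.character(EVTYPE))` step matters because ggplot2 draws a discrete axis in the order of the factor levels, and `factor()` sorts levels alphabetically by default. A small base-R illustration, using placeholder event names (not real results) that are assumed to be already sorted by descending damage:

```r
# Event types assumed to be already sorted by descending damage (placeholder names)
ranked <- c("EVENT_C", "EVENT_A", "EVENT_B")

# Default: factor() sorts the levels alphabetically, losing the ranking
alpha_levels <- levels(factor(ranked))

# Supplying the levels explicitly keeps the ranked order,
# which is the order ggplot2 uses for a discrete x axis
ranked_levels <- levels(factor(ranked, levels = ranked))

print(alpha_levels)   # [1] "EVENT_A" "EVENT_B" "EVENT_C"
print(ranked_levels)  # [1] "EVENT_C" "EVENT_A" "EVENT_B"
```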

Since the Storm Data documentation gives no clear instructions for these codes, we estimate property damage from columns PROPDMG and PROPDMGEXP. PROPDMG carries numerical values and PROPDMGEXP carries multipliers as factor variables. PROPDMGEXP contains the following codes, which need to be applied to the values in PROPDMG:

unique(eco_data$PROPDMGEXP)

## [1] k m b + 0 5 6 ? 4 2 3 h 7 - 1 8
## Levels: - ? + 0 1 2 3 4 5 6 7 8 b h k m

We estimate crop damage from columns CROPDMG and CROPDMGEXP. CROPDMG carries numerical values and CROPDMGEXP carries multipliers as factor variables. CROPDMGEXP contains the following codes, which need to be applied to the values in CROPDMG:

unique(eco_data$CROPDMGEXP)

## [1] m k b ? 0 2
## Levels: ? 0 2 b k m

The table below shows how the codes in PROPDMGEXP and CROPDMGEXP are interpreted when computing the dollar value of damaged property and crops.

Code in PROPDMGEXP, CROPDMGEXP    Multiply PROPDMG, CROPDMG by
(blank)                           10^0
?                                 10^0
+                                 10^0
-                                 10^1
0                                 10^0
1                                 10^1
2                                 10^2
3                                 10^3
4                                 10^4
5                                 10^5
6                                 10^6
7                                 10^7
8                                 10^8
b                                 10^9
h                                 10^2
k                                 10^3
m                                 10^6
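The same mapping can also be sketched in base R without `car::recode()`, as a lookup table indexed by the exponent code. This is a minimal sketch, not the code used in this analysis: the names `exp_codes`, `exp_mult` and `to_dollars` are hypothetical, and the multipliers are copied from the recode string in the Data Processing code (where `'+'` maps to 1). For example, a record with PROPDMG = 25 and PROPDMGEXP = "K" corresponds to 25 × 10^3 = 25,000 dollars.

```r
# Exponent codes and their multipliers, copied from the recode string
# used in the analysis (hypothetical variable names)
exp_codes <- c("", "-", "?", "+", "0", "1", "2", "3", "4",
               "5", "6", "7", "8", "h", "k", "m", "b")
exp_mult  <- c(1, 10, 1, 1, 1, 1e1, 1e2, 1e3, 1e4,
               1e5, 1e6, 1e7, 1e8, 1e2, 1e3, 1e6, 1e9)

# Hypothetical helper: dollar value = value * multiplier for its code.
# match() is used instead of indexing by name because R never matches
# the empty name "" when subscripting by name, and blank codes must map to 1.
to_dollars <- function(value, code) {
  value * exp_mult[match(tolower(code), exp_codes)]
}

# Made-up records: 25 "K", 10 with a blank code, 10 "5", 10 "b"
example <- to_dollars(c(25, 10, 10, 10), c("K", "", "5", "b"))
print(example)
```

Codes outside the table return `NA`, so unexpected codes can be spotted with `is.na()` before summing.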

Harm to population health is measured by summing FATALITIES and INJURIES for each event type; the 30 event types with the most injuries are kept for plotting.

pop_data <- select(myData, EVTYPE,FATALITIES:INJURIES)

# summarize FATALITIES and INJURIES by event type, sorted by injuries
pop_DMG <- pop_data %>% group_by(EVTYPE) %>% summarise(Fatalities = sum(FATALITIES) , Injuries = sum(INJURIES))  %>% arrange(desc(Injuries))

# select the top 30 events by number of injuries
pop_DMG_30 <- head(pop_DMG,30)

# reorder EVTYPE levels to match the descending order of injuries
pop_DMG_30 <- within(pop_DMG_30, EVTYPE <- factor(EVTYPE, levels = as.character(EVTYPE)))

# reshape pop_DMG_30 into long format for plotting
pop_DMG_plot <- gather(pop_DMG_30,"HarmType" , "Value", 2:3)
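The `gather()` call turns the Fatalities and Injuries columns into key/value pairs (HarmType, Value), one row per event type and harm type. This long layout is what `facet_grid(HarmType ~ .)` needs to draw one panel per harm type. Below is a base-R sketch of the same reshape on a made-up two-row table. The numbers are illustrative, not taken from the storm data, and the row order shown (each gathered column becomes one block of rows) is the layout `gather()` is expected to produce.

```r
# Made-up wide table shaped like pop_DMG_30 (numbers are illustrative only)
wide <- data.frame(
  EVTYPE     = c("EVENT_A", "EVENT_B"),
  Fatalities = c(2, 1),
  Injuries   = c(5, 3),
  stringsAsFactors = FALSE
)

# Base-R equivalent of gather(wide, "HarmType", "Value", 2:3):
# each gathered column is stacked as a block of rows, in column order
cols <- c("Fatalities", "Injuries")
long <- data.frame(
  EVTYPE   = rep(wide$EVTYPE, times = length(cols)),
  HarmType = rep(cols, each = nrow(wide)),
  Value    = unlist(wide[cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
print(long)
```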

The weather events Flood, Hurricane/Typhoon, Tornado and Storm Surge caused severe property and crop damage, ranging from roughly 40 to 150 billion dollars. However, the events that damage property differ distinctly from those that damage crops. Crops are affected most severely by Drought, followed by Flood, River Flood and Ice Storm, whereas property damage comes mainly from Flood, Hurricane/Typhoon, Tornado and Storm Surge (Fig. 1).

Results

library(ggplot2)
p <- ggplot(eco_DMG_plot)
# geom_col() draws bars whose heights are the precomputed values (no binning)
p <- p + geom_col(aes(x = EVTYPE, y = Value))
p <- p + labs(x = "Event type", y = "Economic impact (billions of $)")
p <- p + labs(title = "Figure (1) - Value of Property and Crop Damaged Due to Weather Events")
p <- p + facet_grid(DMGType ~ ., scales = "free_y")
p <- p + theme_bw()
p <- p + theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

print(p)

[Figure (1) - Value of Property and Crop Damaged Due to Weather Events]

Tornado causes the most fatalities. Events such as Excessive Heat, Lightning, Heat and Ice Storm also cause substantial loss of life. Tornado also causes the largest number of injuries, followed, to a lesser extent, by TSTM Wind (thunderstorm wind), Flood, Excessive Heat and Lightning (Fig. 2).

p <- ggplot(pop_DMG_plot)
# geom_col() draws bars whose heights are the precomputed counts (no binning)
p <- p + geom_col(aes(x = EVTYPE, y = Value))
p <- p + labs(x = "Event type", y = "Number of Fatalities / Injuries")
p <- p + labs(title = "Figure (2) Weather Events Brought Most Harm To Health of US Population")
p <- p + facet_grid(HarmType ~ ., scales = "free_y")
p <- p + theme_bw()
p <- p + theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

print(p)

[Figure (2) Weather Events Brought Most Harm To Health of US Population]