Economic and Health Impacts of Storms across the United States

Herminio Vazquez / September 2016

Coursera Peer Assessment 2 / Reproducible Research

Introduction

This analysis uses the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which contains data from 1950 to November 2011. The database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The purpose of this analysis is to identify and visualise which types of events have the greatest impact on population health and on the economy.

Data Processing

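# Packages: data.table (fread), dplyr (pipes), ggplot2 (plots), knitr (kable)
library(data.table)
library(dplyr)
library(ggplot2)
library(knitr)

if (!file.exists("repdata-data-StormData.csv")) {
  temp <- tempfile(fileext = ".csv.bz2")
  # Remote location of the compressed file to be downloaded
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                temp, mode = "wb")
  # Decompress into the working directory (needs the R.utils package);
  # remove = TRUE deletes the temporary .bz2 afterwards
  R.utils::bunzip2(temp, "repdata-data-StormData.csv", remove = TRUE)
}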

# Read the CSV into `data` with data.table::fread (much faster than read.csv)
data <- fread("repdata-data-StormData.csv", header = TRUE, sep = ",")

## Read 902297 rows and 37 (of 37) columns from 0.523 GB file in 00:00:08
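You do not have to decompress the download first. Base R's read.csv() can usually read a .bz2 archive directly, because file() recognises bzip2 compression when it opens a path for reading. On a file of this size, though, it is much slower than fread. The sketch below shows the idea on a tiny toy file written to a temporary path. The three rows are made up for illustration and are not NOAA data.

```r
# Write a tiny bzip2-compressed CSV (toy data, not the NOAA file)
toy_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(toy_path, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,5", "FLOOD,2"), con)
close(con)

# read.csv() opens the path with file(), which detects the bzip2 compression
toy <- read.csv(toy_path, stringsAsFactors = FALSE)
str(toy)
```

For the real data, the same call would point at the downloaded .bz2 file instead of the toy path.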

The data set has the following dimensions (rows, columns):

dim(data)

## [1] 902297     37

This analysis examines how the recorded storm event types relate to health consequences (fatalities and injuries) and economic consequences (property and crop damage).

To simplify the data set, we keep only the following columns:

dataset <- data %>% select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

Results

This section presents the results by event type, covering both health and economic impact. Bar charts are used to make the main contributors to health and economic harm easy to identify.

Fatalities Analysis

The following table shows the 10 event types that produced the most fatalities:

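top_10_fatalities_by_event <- dataset %>%
  group_by(EVTYPE) %>%
  summarise(total = sum(FATALITIES)) %>%       # total fatalities per event type
  arrange(desc(total)) %>%
  top_n(10) %>%                                 # keep the 10 largest totals
  transform(EVTYPE = reorder(EVTYPE, total))    # order factor levels by total for plotting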

## Selecting by total

kable(top_10_fatalities_by_event)

EVTYPE	total
TORNADO	5633
EXCESSIVE HEAT	1903
FLASH FLOOD	978
HEAT	937
LIGHTNING	816
TSTM WIND	504
FLOOD	470
RIP CURRENT	368
HIGH WIND	248
AVALANCHE	224
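The ranking relies on dplyr's top_n(10). Because no weight column is given, top_n() uses the last column (total), which is what the "Selecting by total" message reports. top_n() also keeps every row that ties at the cutoff, so it can return more than n rows. The fatalities table above happens to contain exactly 10. The sketch below uses made-up totals and assumes dplyr 1.0.0 or later for slice_max().

```r
library(dplyr)

# Toy totals (not NOAA data) with a tie for second place
toy_totals <- data.frame(EVTYPE = c("A", "B", "C", "D"),
                         total  = c(10, 7, 7, 3))

# top_n() keeps every row tied at the cutoff, so asking for 2 returns 3 rows
toy_totals %>% top_n(2, total)

# slice_max() can drop ties to return exactly n rows
toy_totals %>% slice_max(total, n = 2, with_ties = FALSE)
```

Passing the column explicitly, as in top_n(10, total), also avoids the "Selecting by total" message.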

Total fatalities for the 10 deadliest event types

g <- ggplot(top_10_fatalities_by_event, aes(x = factor(EVTYPE), y = total))
g + geom_bar(stat = "identity") +
  labs(x = "Event type", y = "Total fatalities") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, size = rel(0.7)))
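The bar order in the chart comes from the transform(EVTYPE = reorder(EVTYPE, total)) step. reorder() sorts the factor levels by total in ascending order, so the bars run from smallest to largest. The toy sketch below uses invented totals and shows that negating the score reverses the order.

```r
# Toy totals (not NOAA data)
toy_fatal <- data.frame(EVTYPE = c("HEAT", "TORNADO", "FLOOD"),
                        total  = c(900, 5600, 470))

# Ascending: levels ordered from smallest to largest total
levels(reorder(toy_fatal$EVTYPE, toy_fatal$total))

# Negating the score puts the largest total first
levels(reorder(toy_fatal$EVTYPE, -toy_fatal$total))
```

Using reorder(EVTYPE, -total) as the x aesthetic would therefore draw the deadliest event type as the left-most bar.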

Injuries Analysis

The following table shows the 10 event types that produced the most injuries:

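top_10_injuries_by_event <- dataset %>%
  group_by(EVTYPE) %>%
  summarise(total = sum(INJURIES)) %>%         # total injuries per event type
  arrange(desc(total)) %>%
  top_n(10) %>%                                 # keep the 10 largest totals
  transform(EVTYPE = reorder(EVTYPE, total))    # order factor levels by total for plotting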

## Selecting by total

kable(top_10_injuries_by_event)

EVTYPE	total
TORNADO	91346
TSTM WIND	6957
FLOOD	6789
EXCESSIVE HEAT	6525
LIGHTNING	5230
HEAT	2100
ICE STORM	1975
FLASH FLOOD	1777
THUNDERSTORM WIND	1488
HAIL	1361
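The injuries ranking repeats the fatalities pipeline and changes only the summed column. A small helper would remove that duplication. In the sketch below, top_events() is a hypothetical name and the toy records are invented. It assumes dplyr 1.0.0 or later for .groups, slice_max() and the .data pronoun. Unlike the original, which uses transform(), it returns a tibble.

```r
library(dplyr)

# Hypothetical helper: total any numeric column by event type and keep the
# n_top largest totals (ties at the cutoff are kept, as with top_n()).
top_events <- function(df, column, n_top = 10) {
  df %>%
    group_by(EVTYPE) %>%
    summarise(total = sum(.data[[column]]), .groups = "drop") %>%
    slice_max(total, n = n_top, with_ties = TRUE) %>%
    mutate(EVTYPE = reorder(EVTYPE, total))   # order levels for plotting
}

# Toy records (not NOAA data)
toy <- data.frame(EVTYPE   = c("TORNADO", "TORNADO", "HAIL", "FLOOD"),
                  INJURIES = c(30, 12, 4, 9))
top_events(toy, "INJURIES", n_top = 2)
```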

Total injuries for the 10 event types causing the most injuries

g <- ggplot(top_10_injuries_by_event, aes(x = factor(EVTYPE), y = total))
g + geom_bar(stat = "identity") +
  labs(x = "Event type", y = "Total injuries") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, size = rel(0.7)))

Economic Impact Analysis

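The economic impact is measured by property and crop damage. The amounts in PROPDMG and CROPDMG are scaled by the magnitude codes in PROPDMGEXP and CROPDMGEXP, which fread reads as character columns.

Many of these codes are letters or symbols rather than numbers, so they are converted to base-10 exponents before the amounts are scaled to dollars. H (hundreds) becomes 2, K (thousands) 3, M (millions) 6 and B (billions) 9, while -, + and ? are treated as 0. Digit codes are kept as they are, and blank codes also end up as 0.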
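The formula below gives the dollar amount for each record: the reported figure scaled by its decoded exponent. It is written for property damage; crop damage uses CROPDMG and CROPDMGEXP in the same way. The numbers in the second line are an illustrative record, not one taken from the data set.

```latex
\[
\text{ActPropDam} = \text{PROPDMG} \times 10^{\text{PROPDMGEXP}}
\]
\[
\text{PROPDMG} = 25,\quad \text{PROPDMGEXP} = \text{K} \mapsto 3:\qquad 25 \times 10^{3} = 25\,000 \ \text{dollars}
\]
```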

# Convert the property-damage exponent codes to base-10 exponents
dataset$PROPDMGEXP <- as.character(dataset$PROPDMGEXP)
dataset$PROPDMGEXP <- gsub("\\-|\\+|\\?", "0", dataset$PROPDMGEXP)  # symbols -> 10^0
dataset$PROPDMGEXP <- gsub("B|b", "9", dataset$PROPDMGEXP)          # billions
dataset$PROPDMGEXP <- gsub("M|m", "6", dataset$PROPDMGEXP)          # millions
dataset$PROPDMGEXP <- gsub("K|k", "3", dataset$PROPDMGEXP)          # thousands
dataset$PROPDMGEXP <- gsub("H|h", "2", dataset$PROPDMGEXP)          # hundreds
dataset$PROPDMGEXP <- as.numeric(dataset$PROPDMGEXP)
dataset$PROPDMGEXP[is.na(dataset$PROPDMGEXP)] <- 0                  # blanks -> 10^0
# Property damage in dollars, totalled per event type; keep the 10 largest
dataset$ActPropDam <- dataset$PROPDMG * 10^dataset$PROPDMGEXP
propDam <- aggregate(ActPropDam ~ EVTYPE, data = dataset, sum)
propDam_reorder <- propDam[order(-propDam$ActPropDam), ]
PropDam10 <- propDam_reorder[1:10, ]

# Convert the crop-damage exponent codes in the same way
dataset$CROPDMGEXP <- as.character(dataset$CROPDMGEXP)
dataset$CROPDMGEXP <- gsub("\\-|\\+|\\?", "0", dataset$CROPDMGEXP)  # symbols -> 10^0
dataset$CROPDMGEXP <- gsub("B|b", "9", dataset$CROPDMGEXP)          # billions
dataset$CROPDMGEXP <- gsub("M|m", "6", dataset$CROPDMGEXP)          # millions
dataset$CROPDMGEXP <- gsub("K|k", "3", dataset$CROPDMGEXP)          # thousands
dataset$CROPDMGEXP <- gsub("H|h", "2", dataset$CROPDMGEXP)          # hundreds
dataset$CROPDMGEXP <- as.numeric(dataset$CROPDMGEXP)
dataset$CROPDMGEXP[is.na(dataset$CROPDMGEXP)] <- 0                  # blanks -> 10^0
# Crop damage in dollars, totalled per event type; keep the 10 largest
dataset$ActCropDam <- dataset$CROPDMG * 10^dataset$CROPDMGEXP
cropDam <- aggregate(ActCropDam ~ EVTYPE, data = dataset, sum)
cropDam_reorder <- cropDam[order(-cropDam$ActCropDam), ]
CropDam10 <- cropDam_reorder[1:10, ]

# Combined property and crop damage per event type; keep the 10 largest
TotalDam <- aggregate(ActPropDam + ActCropDam ~ EVTYPE, data = dataset, sum)
names(TotalDam)[2] <- "total"
TotalDam10 <- arrange(TotalDam, desc(total)) %>% top_n(10)

## Selecting by total
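The gsub() chain above is written out twice, once per column. A lookup-based function could express the same mapping once. In the sketch below, exp_code_to_power() is a hypothetical helper, not part of any package. The example codes are a hand-picked sample, not a tally of the data set, and the damage figures are illustrative. It follows the same rules as the code above: H, K, M and B in either case become 2, 3, 6 and 9, digits are kept, and blanks or symbols become 0.

```r
# Hypothetical helper: decode NOAA damage exponent codes into powers of ten
exp_code_to_power <- function(code) {
  code <- toupper(as.character(code))
  letter_power <- c(H = 2, K = 3, M = 6, B = 9)
  # Digit codes such as "5" parse directly; letters, symbols and blanks give NA
  power <- suppressWarnings(as.numeric(code))
  # Replace letter codes with their power of ten
  is_letter <- code %in% names(letter_power)
  power[is_letter] <- letter_power[code[is_letter]]
  # Blanks, "-", "+", "?" and missing values count as 10^0
  power[is.na(power)] <- 0
  unname(power)
}

codes <- c("K", "m", "B", "h", "5", "", "+", "?", "-")
exp_code_to_power(codes)

# Dollar amounts for two illustrative damage figures
c(25, 1.5) * 10^exp_code_to_power(c("K", "M"))
```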

Property, crop and total economic damage for the 10 most costly event types

par(mfrow=c(1,3))
barplot(PropDam10$ActPropDam, 
        names = PropDam10$EVTYPE,
        cex.names = 0.7,
        cex.axis = 0.7,
        xlab = "Event Type",
        ylab = "Total Property Damage ($)",
        main = "Top 10 Events Causing \n Most Property Damage")
barplot(CropDam10$ActCropDam, 
        names = CropDam10$EVTYPE,
        cex.names = 0.7,
        cex.axis = 0.7,
        xlab = "Event Type",
        ylab = "Total Crop Damage ($)",
        main = "Top 10 Events Causing \n Most Crop Damage")
barplot(TotalDam10$total, 
        names = TotalDam10$EVTYPE,
        cex.names = 0.7,
        cex.axis = 0.7,
        xlab = "Event Type",
        ylab = "Total Damage ($)",
        main = "Top 10 Events Causing \n Most Total Damage")

Conclusions

Across the period covered by the data, TORNADO caused the most fatalities and the most injuries. FLOOD caused the most property damage and DROUGHT the most crop damage, while FLOOD also caused the greatest overall economic damage.