Economic and Health Impact correlated to Storms across the United States

Herminio Vazquez / September 2016

Coursera Peer Assessment 2 / Reproducible Research


Introduction

This analysis uses the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which covers events from 1950 until November 2011. The database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The purpose of this analysis is to visualise how event types relate to health and economic impact.

Data Processing

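# Packages used throughout the analysis
library(data.table)  # fread
library(dplyr)       # select, group_by, summarise, arrange, top_n
library(ggplot2)     # bar charts
library(knitr)       # kable

if (!file.exists("repdata-data-StormData.csv")) {
  temp <- tempfile(fileext = ".bz2")
  # Remote location of file to be downloaded
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", temp, mode = "wb")
  # Decompress the archive into the CSV file that fread() reads below
  writeLines(readLines(bzfile(temp)), "repdata-data-StormData.csv")
  unlink(temp)
}

# Read the CSV into `data`; fread() is faster than read.csv()
data <- fread('repdata-data-StormData.csv', header = TRUE, sep = ',')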
## Read 902297 rows and 37 (of 37) columns from 0.523 GB file in 00:00:08

The data set has the following dimensions:

dim(data)
## [1] 902297     37

This analysis looks at how the recorded storm-related event types relate to health and economic consequences.

To simplify the data set, we keep only the following columns:

dataset <- data %>% select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

Results

This section presents the analysis by event type and its relation to health and economic impact. Bar graphs make it easier to identify the event types that contribute most to health and economic harm.

Fatalities Analysis

The following table shows the 10 event types that produced the most fatalities:

top_10_fatalities_by_event <- dataset %>%
  group_by(EVTYPE) %>%
  summarise(total = sum(FATALITIES)) %>%
  arrange(desc(total)) %>%
  top_n(10) %>%
  transform(EVTYPE = reorder(EVTYPE, total))
## Selecting by total
kable(top_10_fatalities_by_event)
EVTYPE            total
TORNADO            5633
EXCESSIVE HEAT     1903
FLASH FLOOD         978
HEAT                937
LIGHTNING           816
TSTM WIND           504
FLOOD               470
RIP CURRENT         368
HIGH WIND           248
AVALANCHE           224

Distribution of the ten deadliest event types by total fatalities:

g <- ggplot(top_10_fatalities_by_event, aes(x=factor(EVTYPE), y=total))
g + geom_bar(stat = "identity") + theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5, size=rel(0.7)))
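
The "## Selecting by total" message above appears because top_n() was not told which column to rank by and fell back to the last one. A minimal sketch of a more explicit variant follows, assuming dplyr 1.0.0 or later, where slice_max() is available. The toy data frame and its values are made up for illustration and are not taken from the NOAA data.

```r
library(dplyr)

# Toy data standing in for the storm data set (values are invented)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD", "HEAT"),
  FATALITIES = c(5, 2, 7, 0, 1, 4)
)

# Sum fatalities per event type, then keep the two largest totals.
# Naming the ranking column avoids the "Selecting by ..." message.
top_2 <- toy %>%
  group_by(EVTYPE) %>%
  summarise(total = sum(FATALITIES)) %>%
  slice_max(total, n = 2, with_ties = FALSE)

print(top_2)
```

slice_max() returns the rows already sorted from largest to smallest, so a separate arrange(desc(total)) step is not needed in this variant.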

Injuries Analysis

The following table shows the 10 event types that produced the most injuries:

top_10_injuries_by_event <- dataset %>%
  group_by(EVTYPE) %>%
  summarise(total = sum(INJURIES)) %>%
  arrange(desc(total)) %>%
  top_n(10) %>%
  transform(EVTYPE = reorder(EVTYPE, total))
## Selecting by total
kable(top_10_injuries_by_event)
EVTYPE               total
TORNADO              91346
TSTM WIND             6957
FLOOD                 6789
EXCESSIVE HEAT        6525
LIGHTNING             5230
HEAT                  2100
ICE STORM             1975
FLASH FLOOD           1777
THUNDERSTORM WIND     1488
HAIL                  1361

Distribution of the ten most injurious event types by total injuries:

g <- ggplot(top_10_injuries_by_event, aes(x=factor(EVTYPE), y=total))
g + geom_bar(stat = "identity") + theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5, size=rel(0.7)))
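
Note that the injuries table lists TSTM WIND and THUNDERSTORM WIND as separate rows, because EVTYPE is grouped exactly as recorded. The analysis in this report does not merge such labels. The sketch below shows one way they could be collapsed before aggregating; normalise_evtype() is a hypothetical helper, and its single substitution rule is an illustrative assumption rather than an official NOAA mapping. The toy values are made up.

```r
# Hypothetical helper: collapse a spelling variant of EVTYPE.
# The rule below is an illustrative assumption, not an official mapping.
normalise_evtype <- function(x) {
  x <- toupper(trimws(x))                # unify case and strip stray spaces
  gsub("^TSTM", "THUNDERSTORM", x)       # expand the TSTM abbreviation
}

# Toy data (values are invented)
toy <- data.frame(
  EVTYPE   = c("TSTM WIND", "THUNDERSTORM WIND", " tstm wind", "HAIL"),
  INJURIES = c(10, 5, 1, 2)
)

toy$EVTYPE <- normalise_evtype(toy$EVTYPE)

# Aggregate after normalising, so the variants share one row
merged <- aggregate(INJURIES ~ EVTYPE, data = toy, sum)
print(merged)
```

Whether merging labels is appropriate depends on the question being asked, so any such mapping should be reviewed against the event type definitions before it is applied to the full data set.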

Economic Impact Analysis

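The economic impact is measured by property and crop damage. PROPDMGEXP and CROPDMGEXP hold the magnitude codes for the PROPDMG and CROPDMG amounts (for example H, K, M, and B, which the code below maps to the powers of ten 2, 3, 6, and 9).

Because these magnitudes are stored as letters or symbols rather than numbers, they have to be converted to numeric exponents before the actual dollar amounts can be computed.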

dataset$PROPDMGEXP <- as.character(dataset$PROPDMGEXP)
dataset$PROPDMGEXP = gsub("\\-|\\+|\\?","0",dataset$PROPDMGEXP)
dataset$PROPDMGEXP = gsub("B|b", "9", dataset$PROPDMGEXP)
dataset$PROPDMGEXP = gsub("M|m", "6", dataset$PROPDMGEXP)
dataset$PROPDMGEXP = gsub("K|k", "3", dataset$PROPDMGEXP)
dataset$PROPDMGEXP = gsub("H|h", "2", dataset$PROPDMGEXP)
dataset$PROPDMGEXP <- as.numeric(dataset$PROPDMGEXP)
dataset$PROPDMGEXP[is.na(dataset$PROPDMGEXP)] = 0
dataset$ActPropDam<- dataset$PROPDMG * 10^dataset$PROPDMGEXP
propDam <- aggregate(ActPropDam~EVTYPE, data=dataset, sum)
propDam_reorder<- propDam[order(-propDam$ActPropDam),]
PropDam10<-propDam_reorder[1:10,]

dataset$CROPDMGEXP <- as.character(dataset$CROPDMGEXP)
dataset$CROPDMGEXP = gsub("\\-|\\+|\\?","0",dataset$CROPDMGEXP)
dataset$CROPDMGEXP = gsub("B|b", "9", dataset$CROPDMGEXP)
dataset$CROPDMGEXP = gsub("M|m", "6", dataset$CROPDMGEXP)
dataset$CROPDMGEXP = gsub("K|k", "3", dataset$CROPDMGEXP)
dataset$CROPDMGEXP = gsub("H|h", "2", dataset$CROPDMGEXP)
dataset$CROPDMGEXP <- as.numeric(dataset$CROPDMGEXP)
dataset$CROPDMGEXP[is.na(dataset$CROPDMGEXP)] = 0
dataset$ActCropDam<- dataset$CROPDMG * 10^dataset$CROPDMGEXP
cropDam <- aggregate(ActCropDam~EVTYPE, data=dataset, sum)
cropDam_reorder<- cropDam[order(-cropDam$ActCropDam),]
CropDam10<-cropDam_reorder[1:10,]
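
The property and crop blocks above apply the same substitutions twice. As a sketch only, the conversion could be factored into one function. exp_to_power() is a hypothetical helper name, and it is meant to mirror the gsub() steps above for single-character codes: B, M, K, and H (in either case) become 9, 6, 3, and 2; digits are kept; everything else, including "-", "+", "?" and blanks, becomes 0. The toy values are made up.

```r
# Hypothetical helper mirroring the substitutions used in this report
exp_to_power <- function(code) {
  code <- toupper(as.character(code))
  lookup <- c(B = 9, M = 6, K = 3, H = 2)

  # Digit codes convert directly; everything else becomes NA for now
  power <- suppressWarnings(as.numeric(code))

  # Letter codes are looked up in the table
  is_letter <- code %in% names(lookup)
  power[is_letter] <- lookup[code[is_letter]]

  # "-", "+", "?", blanks, and unknown codes count as exponent 0
  power[is.na(power)] <- 0
  unname(power)
}

# Toy data (values are invented)
toy <- data.frame(
  PROPDMG    = c(2.5, 10, 3, 7),
  PROPDMGEXP = c("M", "k", "", "5")
)

# Actual damage = amount * 10^exponent
toy$ActPropDam <- toy$PROPDMG * 10^exp_to_power(toy$PROPDMGEXP)
print(toy)
```

With such a helper, the same call could be reused for CROPDMGEXP, which keeps the two conversions from drifting apart if a rule changes.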

TotalDam <- aggregate(ActPropDam + ActCropDam~EVTYPE, data=dataset, sum)
names(TotalDam)[2] <- "total"
TotalDam10 <- arrange(TotalDam, desc(total)) %>% top_n(10)
## Selecting by total

Plots of the overall economic impact of storms:

par(mfrow=c(1,3))
barplot(PropDam10$ActPropDam, 
        names = PropDam10$EVTYPE,
        cex.names = 0.7,
        cex.axis = 0.7,
        xlab = "Event Type",
        ylab = "Total Property Damage ($)",
        main = "Top 10 Events Causing \n Most Property Damage")
barplot(CropDam10$ActCropDam, 
        names = CropDam10$EVTYPE,
        cex.names = 0.7,
        cex.axis = 0.7,
        xlab = "Event Type",
        ylab = "Total Crop Damage ($)",
        main = "Top 10 Events Causing \n Most Crop Damage")
barplot(TotalDam10$total, 
        names = TotalDam10$EVTYPE,
        cex.names = 0.7,
        cex.axis = 0.7,
        xlab = "Event Type",
        ylab = "Total Damage ($)",
        main = "Top 10 Events Causing \n Most Total Damage")

Conclusions

As the analysis shows, TORNADO caused the most fatalities and the most injuries. FLOOD caused the most property damage, DROUGHT caused the most crop damage, and FLOOD caused the most overall economic damage.