Statistical Analysis of Natural Weather Events

with respect to Population Health and Economic Impact.

Synopsis

This thesis quantifies the effect of specific natural weather events on human populations and the economic impact of those events. The results therefore fall into two main categories, Human Toll and Economic Toll. The events in the database span 1950 to November 2011.

Data Analysis

Our first step is to download the storm data, which is a subset of the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database.

The R libraries we will utilize are shown below:

library(Hmisc)
## Loading required package: grid
## Loading required package: lattice
## Loading required package: survival
## Loading required package: splines
## Loading required package: Formula
## Loading required package: ggplot2
## 
## Attaching package: 'Hmisc'
## 
## The following objects are masked from 'package:base':
## 
##     format.pval, round.POSIXt, trunc.POSIXt, units
library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:Hmisc':
## 
##     combine, src, summarize
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(reshape2)
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists("stormdat.csv.bz2"))
   download.file(url, destfile = "stormdat.csv.bz2", method = "curl")
stormdata <- read.csv("./stormdat.csv.bz2")
stormdata$EVTYPE <- capitalize(tolower(stormdata$EVTYPE))
attach(stormdata) 
## The following object is masked from package:base:
## 
##     F

Initial data assessment.

## Initial number of columns in  dataset ncol(stormdata) : 37
## Initial number of rows in  dataset nrow(stormdata) : 902297

The variables most relevant to the human and economic toll of weather events are listed below:

“FATALITIES” “INJURIES” “CROPDMG” “PROPDMG” “PROPDMGEXP” “CROPDMGEXP”

The EVTYPE variable also gives an immediate count of unique event types.

NROW(unique(stormdata$EVTYPE)) 
## [1] 898
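
The count above depends on how consistently the event labels are written. As a minimal sketch on made-up labels (the `labels` vector and the `normalize_evtype` helper are hypothetical and not part of the storm data or of Hmisc; trimming whitespace is an extra assumption beyond the lower-casing used above), normalizing before counting collapses variants that differ only in case or padding:

```r
# Hypothetical labels, invented for illustration only
labels <- c("TORNADO", "Tornado", " tornado", "HEAT", "Heat ", "Flash Flood")

# Assumed helper: trim whitespace, lower-case, then capitalize the first letter
normalize_evtype <- function(x) {
  x <- tolower(trimws(x))
  paste0(toupper(substring(x, 1, 1)), substring(x, 2))
}

# Compare the number of distinct labels before and after normalizing
raw_count   <- length(unique(labels))
clean_count <- length(unique(normalize_evtype(labels)))
print(c(raw = raw_count, clean = clean_count))
```

Labels that differ in spelling or wording rather than case would still count as separate event types under this approach.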

Our first analysis splits the data into subsets by event type, computes sum() for each subset, and returns the result as a data frame.

We then sort that data frame with the base R function order().

f <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, 
           data = stormdata, 
           FUN = "sum")
fatalinjuries <- melt(head(f[order(-f$FATALITIES, -f$INJURIES), ], 10)) # top 10
## Using EVTYPE as id variables
## Decreasing order of top EVTYPES, FATALITIES and INJURIES
##             EVTYPE FATALITIES INJURIES
## 754        Tornado       5633    91346
## 109 Excessive heat       1903     6525
## 132    Flash flood        978     1777
## 237           Heat        937     2100
## 409      Lightning        816     5230
## 777      Tstm wind        504     6957
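
The split, sum, and sort steps above can be sketched on a small made-up table. The `toy` data frame and its values are invented for illustration and are not taken from the NOAA file:

```r
# Toy data, invented for illustration; not taken from the NOAA file
toy <- data.frame(
  EVTYPE     = c("Tornado", "Heat", "Tornado", "Flood", "Heat"),
  FATALITIES = c(3, 2, 1, 0, 4),
  INJURIES   = c(10, 0, 5, 2, 1)
)

# Split by EVTYPE, sum each group, and combine into one data frame
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Negating the keys sorts in decreasing order; INJURIES breaks ties
ranked <- totals[order(-totals$FATALITIES, -totals$INJURIES), ]
print(ranked)
```

Because the first sort key is FATALITIES, an event type with many injuries but few fatalities can rank below one with the opposite profile.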

A simple bar plot displays the human toll of the top weather-related events.

library(ggplot2)
ggplot(fatalinjuries, aes(x = EVTYPE, y = value, fill = variable)) + geom_bar(stat = "identity") +
    coord_flip() + ggtitle("Human Population Effects") + labs(x = "", y = "Human Impact") +
    scale_fill_manual(values = c("black", "red"), labels = c("FATALITIES", "INJURIES"))

We summarize, melt, and plot the crop and property damage data in the same way, using the PROPDMG and CROPDMG variables. Only one additional step is required for this set of variables.

With no small thanks to my fellow students in the Coursera Reproducible Research class (March 2015), I scale PROPDMG and CROPDMG by the multipliers encoded in PROPDMGEXP and CROPDMGEXP before aggregating, melting, and plotting.

PROPDMGEXP and CROPDMGEXP contain a variety of codes, so the choice was either to coerce the values in place or to create a new column.

I chose the latter.

Distinct codes found in each column:

## ---- Property Damage Expense Codes ----
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
## 
## ---- Crop Damage Expense Codes ----
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M

The R code below shows the numeric equivalent of each code and adds two columns, Aprop and Bcrop, holding the total cost of property damage and crop damage.

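# Convert exponent codes to numeric multipliers with a named lookup vector.
# Codes not listed in the lookup (including blanks) are treated as 0;
# the blank-as-0 rule follows the crop mapping.
code_multiplier <- function(codes, lookup) {
  mult <- unname(lookup[as.character(codes)])
  mult[is.na(mult)] <- 0
  mult
}

prop_codes <- c("0" = 1, "1" = 10, "2" = 100,
                "3" = 1000, "4" = 10000, "5" = 100000,
                "6" = 1000000, "7" = 10000000, "8" = 100000000,
                "B" = 1000000000,
                "h" = 100, "H" = 100,
                "K" = 1000,
                "m" = 1000000, "M" = 1000000,
                "-" = 0, "?" = 0, "+" = 0)
stormdata$Aprop <- stormdata$PROPDMG *
                   code_multiplier(stormdata$PROPDMGEXP, prop_codes)


crop_codes <- c("0" = 1,                      # multiple of 1
                "2" = 100,                    # 100
                "k" = 1000, "K" = 1000,       # etc
                "m" = 1000000, "M" = 1000000,
                "B" = 1000000000,
                "?" = 0)
stormdata$Bcrop <- stormdata$CROPDMG *
                   code_multiplier(stormdata$CROPDMGEXP, crop_codes)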
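
The code-to-multiplier mapping described above can be checked on a few made-up records. This is a minimal sketch: the `toy` data frame and the shortened `mult_table` are hypothetical, and treating blank or unlisted codes as 0 is an assumption rather than a rule stated in the data documentation.

```r
# Toy records, invented for illustration
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10, 7),
  PROPDMGEXP = c("K", "M", "B", "", "?"),
  stringsAsFactors = FALSE
)

# Assumed subset of the code-to-multiplier table
mult_table <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up each code by name; codes missing from the table (including blanks)
# come back as NA and are set to 0 here, which is an assumption
mult <- unname(mult_table[toy$PROPDMGEXP])
mult[is.na(mult)] <- 0

# Dollar value = reported amount times the decoded multiplier
toy$dollars <- toy$PROPDMG * mult
print(toy)
```

A lookup by name matters here because calling as.numeric() directly on a factor returns its internal level codes, not the multipliers the letters stand for.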

Then the summary analysis and barplot.

g <- aggregate(cbind(Aprop, Bcrop) ~ EVTYPE,
     data = stormdata,
     FUN = "sum" )
dmg <- melt(head(g[order(-g$Aprop, -g$Bcrop), ], 10))
## Using EVTYPE as id variables
## Decreasing order of top EVTYPES, Aprop and Bcrop
library(ggplot2)
ggplot(dmg, aes(x = EVTYPE, 
                y = value, 
                fill = variable)) + 
                geom_bar(stat = "identity") + 
                coord_flip() + 
                ggtitle("Economic consequences") + 
                labs(x = "", y = "Damage in Dollars") + 
                scale_fill_manual(values = c("black", "red"), 
                                  labels = c("Property Damage", "Crop Damage"))

Conclusion

Of all event types, tornadoes caused the most fatalities and injuries.

Detrimental economic effects were also highest with tornadoes.