This thesis is a quantitative study of specific natural weather events, their effect on human populations, and their economic impact. The results therefore fall into two main categories: Human Toll and Economic Toll. The events in the database start in 1950 and end in November 2011.
Our first step is to download the storm data, a subset of the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database.
The R libraries we will utilize are shown below:
library(Hmisc)
library(dplyr)
library(reshape2)
# Download the compressed storm data once; later runs reuse the local copy
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists("stormdat.csv.bz2")) {
  download.file(url, destfile = "stormdat.csv.bz2", method = "curl")
}
# read.csv() reads the .bz2 file directly, without manual decompression
stormdata <- read.csv("./stormdat.csv.bz2")
# Normalize event labels so that spellings differing only in case are merged
stormdata$EVTYPE <- capitalize(tolower(stormdata$EVTYPE))
## Initial number of columns in dataset ncol(stormdata) : 37
## Initial number of rows in dataset nrow(stormdata) : 902297
The EVTYPE variable also gives an immediate count of the unique event types.
NROW(unique(stormdata$EVTYPE))
## [1] 898
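The labels are lower-cased and re-capitalized before counting, presumably so that spellings differing only in case are counted once. A minimal sketch of that effect, using made-up labels: the vector `labels` and the helper `cap_first` are hypothetical, with `cap_first` serving as a base R stand-in for what `capitalize()` does to the first letter.

```r
# Hypothetical event labels with inconsistent capitalization
labels <- c("TORNADO", "Tornado", "tornado", "HEAT", "Heat", "Flash Flood")

# Base R stand-in for Hmisc::capitalize(): upper-case the first character only
cap_first <- function(x) paste0(toupper(substring(x, 1, 1)), substring(x, 2))

normalized <- cap_first(tolower(labels))

raw_count   <- length(unique(labels))      # distinct labels before normalizing
clean_count <- length(unique(normalized))  # distinct labels after normalizing
print(c(raw = raw_count, clean = clean_count))
```

Normalizing case only merges labels that differ in capitalization; misspellings and abbreviations would still be counted as separate event types.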
We use the base R function order() to sort the aggregated data frame.
Our first analysis splits the data into subsets by event type, computes sum() for each subset, and returns the result in a usable form.
f <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE,
               data = stormdata,
               FUN = "sum")
fatalinjuries <- melt(head(f[order(-f$FATALITIES, -f$INJURIES), ], 10)) # top 10
## Using EVTYPE as id variables
## Decreasing order of top EVTYPES, FATALITIES and INJURIES
##             EVTYPE FATALITIES INJURIES
## 754        Tornado       5633    91346
## 109 Excessive heat       1903     6525
## 132    Flash flood        978     1777
## 237           Heat        937     2100
## 409      Lightning        816     5230
## 777      Tstm wind        504     6957
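The split, sum, and sort steps above can be sketched on a toy data frame. The values below are invented for illustration and `toy` is a hypothetical name; only base R is used.

```r
# Toy data: invented counts, not taken from the storm database
toy <- data.frame(
  EVTYPE     = c("Tornado", "Heat", "Tornado", "Flood", "Heat"),
  FATALITIES = c(3, 2, 1, 0, 4),
  INJURIES   = c(10, 0, 5, 2, 1)
)

# Split by EVTYPE and sum both columns within each group
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Sort by fatalities, then injuries, both in decreasing order
ranked <- totals[order(-totals$FATALITIES, -totals$INJURIES), ]
print(ranked)
```

Because fatalities are the first sort key, an event type with many injuries but few fatalities can rank below one with the opposite profile.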
A simple bar plot displays the human toll of the top weather-related events.
library(ggplot2)
ggplot(fatalinjuries, aes(x = EVTYPE, y = value, fill = variable)) +
  geom_bar(stat = "identity") +
  coord_flip() + ggtitle("Human Population Effects") +
  labs(x = "", y = "Human Impact") +
  scale_fill_manual(values = c("black", "red"), labels = c("FATALITIES", "INJURIES"))
We summarize, melt, and plot the data for property and crop damage, recorded in the PROPDMG and CROPDMG variables, in the same way. Only one major extra step is required for this set of variables.
Thanks in no small part to my fellow students in the Coursera Reproducible Research R class (March 2015), I scale PROPDMG and CROPDMG by the multipliers encoded in PROPDMGEXP and CROPDMGEXP before aggregating, melting, and plotting.
PROPDMGEXP and CROPDMGEXP contain a variety of codes, so the choice was either to coerce the existing columns or to create new ones.
I chose the latter.
Distinct codes found in each column:
## ---- Property Damage Expense Codes ----
## [1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
##
## ---- Crop Damage Expense Codes ----
## [1] M K m B ? 0 k 2
## Levels: ? 0 2 B k K m M
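The code that produced the listing above is not shown. One way such a listing could be obtained is with `unique()`; `table()` additionally reports how often each code occurs. A sketch on an invented vector (`codes` is a hypothetical name, stored as a factor because the listing above reports levels):

```r
# Invented exponent codes, including a blank entry
codes <- factor(c("K", "M", "K", "", "B", "m", "K", "?"))

distinct <- unique(codes)   # each code once, in order of first appearance
counts   <- table(codes)    # frequency of every code, including the blank one

print(distinct)
print(counts)
```

Seeing the frequencies can help decide whether rare codes such as `?` are worth special handling.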
The R code below shows the numeric equivalent of each code and the two added columns, Aprop and Bcrop, which hold the total cost of property damage and crop damage.
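# Dollar multiplier for each property damage exponent code.
# A lookup is used because as.numeric() on a factor returns level positions.
propmult <- c("0" = 1, "1" = 10, "2" = 100,
              "3" = 1000, "4" = 10000, "5" = 100000,
              "6" = 1000000, "7" = 10000000, "8" = 100000000,
              "B" = 1000000000,
              "h" = 100, "H" = 100,
              "K" = 1000,
              "m" = 1000000, "M" = 1000000,
              "-" = 0, "?" = 0, "+" = 0)
propexp <- unname(propmult[as.character(stormdata$PROPDMGEXP)])
propexp[is.na(propexp)] <- 0  # assumption: blank or unlisted codes count as 0
stormdata$Aprop <- stormdata$PROPDMG * propexp
# Dollar multiplier for each crop damage exponent code
cropmult <- c("0" = 1,      # multiple of 1
              "2" = 100,    # 100
              "k" = 1000,   # etc.
              "K" = 1000,
              "m" = 1000000, "M" = 1000000,
              "B" = 1000000000,
              "?" = 0)
cropexp <- unname(cropmult[as.character(stormdata$CROPDMGEXP)])
cropexp[is.na(cropexp)] <- 0  # blank codes ('') count as 0
stormdata$Bcrop <- stormdata$CROPDMG * cropexp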
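One point worth checking when turning exponent codes into multipliers is that `as.numeric()` applied to a factor returns the integer position of each level, not a value derived from its label. The toy sketch below contrasts that with a named-vector lookup. All values are invented, the names `amount`, `code`, and `multiplier` are hypothetical, and treating unlisted codes as 0 is an assumption made here for illustration.

```r
# Invented damage amounts and exponent codes (not from the storm database)
amount <- c(25, 2.5, 1, 7)
code   <- factor(c("K", "M", "B", ""))

# Pitfall: as.numeric() on a factor returns level positions, not multipliers
level_index <- as.numeric(code)

# Lookup table from code to multiplier; codes absent from it count as 0 here
multiplier <- c("K" = 1e3, "M" = 1e6, "B" = 1e9)
mult <- unname(multiplier[as.character(code)])
mult[is.na(mult)] <- 0

dollars <- amount * mult
print(data.frame(amount, code, level_index, mult, dollars))
```

In this sketch the entry `25` with code `K` becomes 25,000 dollars through the lookup, whereas multiplying by the level position would give 75.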
The summary analysis and bar plot follow.
g <- aggregate(cbind(Aprop, Bcrop) ~ EVTYPE,
               data = stormdata,
               FUN = "sum")
dmg <- melt(head(g[order(-g$Aprop, -g$Bcrop), ], 10))
## Using EVTYPE as id variables
## Decreasing order of top EVTYPES, Aprop and Bcrop
library(ggplot2)
ggplot(dmg, aes(x = EVTYPE,
                y = value,
                fill = variable)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  ggtitle("Economic consequences") +
  labs(x = "", y = "Damage in Dollars") +
  scale_fill_manual(values = c("black", "red"),
                    labels = c("Property Damage", "Crop Damage"))
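In this analysis, tornadoes caused the most fatalities and injuries of any event type.
Tornadoes were also the event type with the largest combined property and crop damage as computed above, a ranking that depends on how the PROPDMGEXP and CROPDMGEXP codes are converted to dollar multipliers.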