This the assignment #2 of Reproducible Research Assignment for Storm Data Analysis.
To understand the damages (public health and economic problems) for communities and municipalities in different types of natural disasters and events, we explored and ran simple analysis on storm database of the U.S. National Oceanic and Atmospheric Administration (NOAA). The database of NOAA contains records of damages in different types of events from 1950 to 2011.
In our first data analysis, we filtered and then processed the valid dataset to accumulate the number of fatalities and injuries by different types of events. So, we can understand how these events impact health of people. The second analysis about the economic lost. We accumulate the damage cost (product s and crops) by different types of events.
The results show hail, thunderstorm and flood are ranked the top natural disasters of health of the citizens. The hurricane, storm and flood are the top threats of economic lost.
CAVEAT: Many records in the database can be either lost or incorrect (human typing error), the valid data are less than 10% of the original data. The analyzed results might not reflect the reality.
## load library and declare trim function
library(xtable)
library(ggplot2)
library(R.utils)
## Loading required package: R.oo
## Loading required package: R.methodsS3
## R.methodsS3 v1.6.1 (2014-01-04) successfully loaded. See ?R.methodsS3 for help.
## R.oo v1.18.0 (2014-02-22) successfully loaded. See ?R.oo for help.
##
## Attaching package: 'R.oo'
##
## The following objects are masked from 'package:methods':
##
## getClasses, getMethods
##
## The following objects are masked from 'package:base':
##
## attach, detach, gc, load, save
##
## R.utils v1.33.0 (2014-08-24) successfully loaded. See ?R.utils for help.
##
## Attaching package: 'R.utils'
##
## The following object is masked from 'package:utils':
##
## timestamp
##
## The following objects are masked from 'package:base':
##
## cat, commandArgs, getOption, inherits, isOpen, parse, warnings
trim <- function (x) gsub("^\\s+|\\s+$", "", x)
Download storm data (repdata_data_StormData.csv.bz2 ), extract repdata_data_StormData.csv from repdata_data_StormData.csv.bz2 and read dataset.
library(knitr)
opts_chunk$set(cache=TRUE)
source_url="https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
source_file="repdata_data_StormData.csv.bz2"
data_file="repdata_data_StormData.csv"
## download file and extract file
if (!file.exists(source_file)) {
## download repdata_data_StormData.csv.bz2
download.file(source_url, source_file, method="curl", quiet=T)
}
if (!file.exists(data_file)){
## extract bz2 file
bunzip2(source_file,data_file)
}
## read csv data
data<-read.csv(data_file, sep=",", header=T, quote ="", nrows=902297)Remove unused rows, empty events, keep (EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP) and trim character rows
## remove unused rows
wdata <- data[,c(8,23:28)]
## trim and set to upper case
wdata$EVTYPE=toupper(trim(wdata$EVTYPE))
wdata$PROPDMGEXP=toupper(trim(wdata$PROPDMGEXP))
wdata$CROPDMGEXP=toupper(trim(wdata$CROPDMGEXP))
## remove NA in EVTYPE
wdata <- subset(wdata,!(wdata$EVTYPE=="" | wdata$EVTYPE=="?"))
## transform factor columns to numeric columns
wdata$FATALITIES <- as.numeric(wdata$FATALITIES)
wdata$INJURIES <- as.numeric(wdata$INJURIES)
wdata$PROPDMG <- as.numeric(wdata$PROPDMG)
wdata$CROPDMG <- as.numeric(wdata$CROPDMG)
## remove rows of no damages
wdata <- subset(wdata,!(wdata$FATALITIES==0
& wdata$INJURIES==0
& wdata$PROPDMG==0
& wdata$CROPDMG==0))Fix values in PROPDMGEXP and CROPDMGEXP
## values of PROPDMGEXP & CROPDMGEXP
## 0 = 1
## H = 100
## K = 1000
## M = 1000000
## B = 1000000000
## Others = (unknown or typo, ignore these rows)
wdata <- subset(wdata, wdata$PROPDMGEXP %in% c("0","H","K","M","B"))
wdata <- subset(wdata, wdata$CROPDMGEXP %in% c("0","H","K","M","B"))Aggregate impacted persons by EVTYPE
## add new column TOTAL_PEOPLE to calculate sum op FATALITIES INJURIES
wdata$TOTAL_PEOPLE = wdata$FATALITIES+wdata$INJURIES
## aggregate sum of TOTAL_PEOPLE by different EVTYPE
people_data <- aggregate(TOTAL_PEOPLE~EVTYPE, data=wdata, FUN=sum)
people_data <- people_data[order(people_data$TOTAL_PEOPLE,decreasing=T),] Aggregate total damage cost by EVTYPE
## add new columns PROD_LOST, CROP_LOST, TOTAL_LOST
## calculate total damange of product and crop
wdata$PROD_LOST=wdata$PROPDMG*sapply(wdata$PROPDMGEXP, function(x) {
if (x=="0") {1}
else if (x=="H") {100}
else if (x=="K") {1000}
else if (x=="M") {1000000}
else if (x=="B") {1000000000}
})
wdata$CROP_LOST=wdata$CROPDMG*sapply(wdata$CROPDMGEXP, function(x) {
if (x=="0") {1}
else if (x=="H") {100}
else if (x=="K") {1000}
else if (x=="M") {1000000}
else if (x=="B") {1000000000}
})
wdata$TOTAL_LOST=wdata$PROD_LOST+wdata$CROP_LOST
## aggregate sum of TOTAL_LOST by different EVTYPE
cost_data <- aggregate(TOTAL_LOST~EVTYPE, data=wdata, FUN=sum)
cost_data <- cost_data[order(cost_data$TOTAL_LOST,decreasing=T),]names(people_data) <- c("Event", "Population")
xt <- xtable(head(people_data[,1:2],10))
| Event | Population |
|---|---|
| HAIL | 3292226.00 |
| TSTM WIND | 2572767.00 |
| FLASH FLOOD | 998069.00 |
| TORNADO | 544724.00 |
| THUNDERSTORM WINDS | 470629.00 |
| FLOOD | 313161.00 |
| LIGHTNING | 149042.00 |
| EXCESSIVE HEAT | 105759.00 |
| HIGH WIND | 104744.00 |
| URBAN/SML STREAM FLD | 93152.00 |
hplot <- ggplot(data=people_data[1:10,],
aes(x=reorder(Event,-Population), y=Population)) +
geom_bar(stat="identity", colour="grey", fill="grey",
position=position_dodge()) +
theme(axis.text.x=element_text(angle=90,hjust=1)) +
xlab("") + ylab("Fatalities & Injuries")
print(hplot)
names(cost_data) <- c("Event", "Lost")
xt <- xtable(head(cost_data[,1:2],10))
| Event | Lost |
|---|---|
| HURRICANE | 1006281994000.00 |
| HURRICANE OPAL | 520677000000.00 |
| ICE STORM | 499939764981.00 |
| FLASH FLOOD | 385171102648.00 |
| WILDFIRE | 291079702000.00 |
| HURRICANE OPAL/HIGH WINDS | 244233000000.00 |
| “TORNADOES | 223000000607.00 |
| FLOOD | 158942458000.00 |
| HAIL | 127807687180.00 |
| TSTM WIND | 81978601000.00 |
hplot <- ggplot(data=cost_data[1:10,],
aes(x=reorder(Event,-Lost), y=Lost)) +
geom_bar(stat="identity", colour="grey", fill="grey",
position=position_dodge()) +
theme(axis.text.x=element_text(angle=90,hjust=1)) +
xlab("") + ylab("Economic Lost (USD)")
print(hplot)