This R Markdown document explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which records events from 1950 to 2011. It contains information on major storms and weather events, including estimates of their impact on population health and of the economic damage they caused. We try to answer the following questions:
1. Which types of events are most harmful with respect to population health?
2. Which types of events have the greatest economic consequences?
This document has the following dependencies:
library(dplyr)
This section describes how the data were loaded into R and processed for the analysis.
First, set the working directory and create a data directory if it does not exist.
setwd("./")
if(!file.exists("./data")){dir.create("./data")}
setwd("./data")
Download the bz2-compressed data file.
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileUrl,destfile="./storm_data.bz2", method="curl")
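The download step can be made repeatable so that re-knitting the document does not fetch the file again. The following is a minimal sketch, not part of the original analysis: `fetch_once` is a hypothetical helper name, and it builds the destination path explicitly rather than relying on `setwd()`.

```r
# Hypothetical helper (an assumption, not from the original report):
# download `url` to `dest` only when `dest` is not already on disk.
fetch_once <- function(url, dest) {
  dest_dir <- dirname(dest)
  # Create the target directory (and any parents) if it is missing.
  if (!dir.exists(dest_dir)) dir.create(dest_dir, recursive = TRUE)
  # Skip the download when a local copy already exists.
  if (!file.exists(dest)) {
    download.file(url, destfile = dest, mode = "wb")
  }
  invisible(dest)
}

# Possible usage from the project root (left commented out to avoid a large download):
# fetch_once(fileUrl, file.path("data", "storm_data.bz2"))
```

Writing in binary mode (`mode = "wb"`) is a cautious choice for compressed files, since text mode can corrupt them on some platforms.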
Load the data. read.csv() can read the bz2-compressed file directly, so no separate decompression step is needed.
dat <- tbl_df(read.csv("storm_data.bz2", header = TRUE))
Tidy Data. Processed data necessary for the analysis.
FATALITIES
INJURIES
Health Damage
dat.health <- select(dat, EVTYPE, FATALITIES, INJURIES) %>%
filter(FATALITIES>0 | INJURIES>0) %>%
group_by(EVTYPE) %>%
summarise(total.FAT=sum(FATALITIES), total.INJ=sum(INJURIES))
PROPERTY DAMAGE
CROP DAMAGE
To estimate the cost of the economic damage, we follow the National Weather Service Storm Data Documentation. Damage estimates are entered as actual dollar amounts. The columns PROPDMGEXP and CROPDMGEXP contain alphabetical characters signifying the magnitude of the number. We keep only the characters “K” for thousands, “M” for millions, and “B” for billions; records with any other code are counted as zero damage in the steps below.
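Before discarding the other codes, it can help to see how many records they affect. The sketch below uses an invented toy vector in place of `dat$PROPDMGEXP`; the values and the names `exp_codes`, `code_counts`, and `share_kept` are assumptions for illustration only.

```r
# Toy stand-in for dat$PROPDMGEXP; these values are invented for illustration.
exp_codes <- c("K", "K", "M", "", "B", "k", "5", "K")

# Count how often each code occurs.
code_counts <- table(exp_codes, useNA = "ifany")
print(code_counts)

# Share of records whose code is one of the three kept by the analysis.
kept <- exp_codes %in% c("K", "M", "B")
share_kept <- mean(kept)
print(share_kept)
```

On the real data, the same two calls applied to `dat$PROPDMGEXP` and `dat$CROPDMGEXP` would show how much is lost by keeping only the uppercase “K”, “M”, and “B” codes.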
2A. Property Damage
Create an index for each monetary unit:
ind.K <- which(dat$PROPDMGEXP=="K")
ind.M <- which(dat$PROPDMGEXP=="M")
ind.B <- which(dat$PROPDMGEXP=="B")
Create a new column with the cost expressed in a common unit, billions of dollars: “B” values are kept as they are, “M” values are divided by 10^3, and “K” values are divided by 10^6.
new.column <- rep(0,dim(dat)[1])
new.column[ind.B] = dat$PROPDMG[ind.B]
new.column[ind.M] = dat$PROPDMG[ind.M]/10^3
new.column[ind.K] = dat$PROPDMG[ind.K]/10^6
Create a new variable in the data frame. Note that, despite its name, MillionsPROP holds values in billions of dollars.
dat$MillionsPROP = new.column
2B. Crop Damage
Crop damage is processed in the same way.
Create an index for each monetary unit:
ind.cropK <- which(dat$CROPDMGEXP=="K")
ind.cropM <- which(dat$CROPDMGEXP=="M")
ind.cropB <- which(dat$CROPDMGEXP=="B")
Create a new column with the cost expressed in a common unit, billions of dollars: “B” values are kept as they are, “M” values are divided by 10^3, and “K” values are divided by 10^6.
new.column <- rep(0,dim(dat)[1])
new.column[ind.cropB] = dat$CROPDMG[ind.cropB]
new.column[ind.cropM] = dat$CROPDMG[ind.cropM]/10^3
new.column[ind.cropK] = dat$CROPDMG[ind.cropK]/10^6
Create a new variable in the data frame. Note that, despite its name, MillionsCROP holds values in billions of dollars.
dat$MillionsCROP = new.column
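Steps 2A and 2B repeat the same conversion for two pairs of columns. One possible way to avoid the duplication is a small vectorised helper; `to_billions` is a hypothetical name, and the sketch assumes the same rule as above (unrecognised codes count as zero).

```r
# Hypothetical helper (an assumption, not from the original report):
# convert a damage amount and its magnitude code to billions of dollars.
to_billions <- function(amount, exp_code) {
  # Divisors that bring each unit to billions.
  divisor <- c(K = 1e6, M = 1e3, B = 1)
  # Look up the divisor by code; unrecognised codes yield NA.
  d <- divisor[as.character(exp_code)]
  out <- amount / d
  # Match the analysis above: anything other than K, M, B counts as zero.
  out[is.na(out)] <- 0
  unname(out)
}

# Toy check with invented amounts: 25 K, 2.5 M, 1.2 B, and a missing code.
print(to_billions(c(25, 2.5, 1.2, 10), c("K", "M", "B", "")))

# Possible usage on the real data (left commented out):
# dat$MillionsPROP <- to_billions(dat$PROPDMG, dat$PROPDMGEXP)
# dat$MillionsCROP <- to_billions(dat$CROPDMG, dat$CROPDMGEXP)
```

The `as.character()` call matters because `read.csv()` may load the code columns as factors, and indexing by a factor would use its integer levels rather than its labels.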
Economic Damage
dat.dmg <- select(dat, EVTYPE, PROPDMG, MillionsPROP,CROPDMG, MillionsCROP) %>%
filter(PROPDMG>0 | CROPDMG>0) %>%
group_by(EVTYPE) %>%
summarise(Properties=sum(MillionsPROP), Crops=sum(MillionsCROP))
Rank the event types by total fatalities and by total injuries:
n.fatality <- arrange(dat.health, desc(total.FAT))
n.injury <- arrange(dat.health, desc(total.INJ))
Distribution of the top 10 events most harmful to population health (note the logarithmic y-axis):
par(mfrow=c(1,2), oma=c(1,0,3,0))
with(n.fatality, barplot(names.arg = EVTYPE[1:10], height = total.FAT[1:10],
cex.axis = 0.8, cex.names = 0.70, cex.lab=0.9,log="y", las=2, xpd=FALSE,
ylab= "Number of Fatalities", col="lightsteelblue"))
with(n.injury, barplot(names.arg = EVTYPE[1:10], height = total.INJ[1:10],
cex.axis = 0.8, cex.names = 0.70, cex.lab=0.9, log="y", las=2,
xpd=FALSE, ylab= "Number of Injuries", col="mediumaquamarine"))
title("Top10 most harmful events to population health in U.S.\n (1950-2011)", outer=TRUE)
Identify the smallest set of top-ranked events that together account for at least 50% of fatalities:
ind <- min(which(cumsum(n.fatality[,2])/sum(n.fatality[,2])>=0.5))
n.fatality[1:ind,1]
## Source: local data frame [3 x 1]
##
## EVTYPE
## (fctr)
## 1 TORNADO
## 2 EXCESSIVE HEAT
## 3 FLASH FLOOD
Identify the smallest set of top-ranked events that together account for at least 50% of injuries:
ind2 <- min(which(cumsum(n.injury[,2])/sum(n.injury[,2])>=0.5))
n.injury[1:ind2,1]
## Source: local data frame [4 x 1]
##
## EVTYPE
## (fctr)
## 1 TORNADO
## 2 TSTM WIND
## 3 FLOOD
## 4 EXCESSIVE HEAT
Rank the event types by total property damage and by total crop damage:
prop.damage <- arrange(dat.dmg, desc(Properties))
crop.damage <- arrange(dat.dmg, desc(Crops))
Distribution of the top 10 events causing the most economic damage:
par(mfrow=c(1,2),oma=c(1,1,3,0))
with(prop.damage, barplot(names.arg = EVTYPE[1:10], height = Properties[1:10],
cex.axis = 0.8, cex.names = 0.60, cex.lab=0.75, las=2, xpd=FALSE,
ylab= "$ Billions", main="Damage on Properties", col="lightsteelblue"))
with(crop.damage, barplot(names.arg = EVTYPE[1:10], height = Crops[1:10],
cex.axis = 0.8, cex.names = 0.60, cex.lab=0.75, las=2,
xpd=FALSE, ylab= "$ Billions", main="Damage on Crops",col="mediumaquamarine"))
title("Top10 most harmful events with economic consequences.\n (1950-2011)", outer=TRUE)
Identify the smallest set of top-ranked events that together account for at least 50% of property damage:
ind3 <- min(which(cumsum(prop.damage[,2])/sum(prop.damage[,2])>=0.5))
prop.damage[1:ind3,1]
## Source: local data frame [2 x 1]
##
## EVTYPE
## (fctr)
## 1 FLOOD
## 2 HURRICANE/TYPHOON
Identify the smallest set of top-ranked events that together account for at least 50% of crop damage:
ind4 <- min(which(cumsum(crop.damage[,3])/sum(crop.damage[,3])>=0.5))
crop.damage[1:ind4,1]
## Source: local data frame [3 x 1]
##
## EVTYPE
## (fctr)
## 1 DROUGHT
## 2 FLOOD
## 3 RIVER FLOOD
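The four cumulative-share lookups above follow the same pattern and could be wrapped in one function. The sketch below is one possible version; `top_share` and the toy vectors are hypothetical names and invented values, not part of the original analysis.

```r
# Hypothetical helper (an assumption, not from the original report):
# return the top-ranked events whose totals together reach `share` of the overall sum.
top_share <- function(events, totals, share = 0.5) {
  # Rank events from largest to smallest total.
  ord <- order(totals, decreasing = TRUE)
  # Cumulative fraction of the overall total covered by the ranked events.
  cum_share <- cumsum(totals[ord]) / sum(totals)
  # First rank at which the cumulative fraction reaches the threshold.
  n <- which(cum_share >= share)[1]
  as.character(events[ord][seq_len(n)])
}

# Toy data, invented for illustration.
toy_events <- c("A", "B", "C", "D")
toy_totals <- c(10, 50, 30, 10)
print(top_share(toy_events, toy_totals))
print(top_share(toy_events, toy_totals, share = 0.8))

# Possible usage on the real summaries (left commented out):
# top_share(dat.health$EVTYPE, dat.health$total.FAT)
# top_share(dat.dmg$EVTYPE, dat.dmg$Crops)
```

Because the function sorts internally and takes plain vectors, it does not depend on column positions such as `[,2]` or `[,3]`, which differ between the health and damage tables.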
Actions to be taken in order to protect population health in the U.S.:
Actions to be taken in order to protect properties and crops from damage in the U.S.: