Synopsis

This R Markdown document explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The database records events from 1950 to 2011. It contains information on major storms and weather events, as well as estimates of the resulting harm to population health and of economic damage. We try to answer the following questions:

        1. Which types of events are most harmful with respect to population health?  
        
        2. Which types of events have the greatest economic consequences?

Dependencies and Global Options

This document has the following dependencies:

library(dplyr)

Data Processing

This section describes how the data were loaded into R and processed for the analysis.

First, set the working directory and create a data directory if it does not exist.

setwd("./")
if(!file.exists("./data")){dir.create("./data")}
setwd("./data")

Download the compressed (bzip2) data file.

fileUrl  <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileUrl,destfile="./storm_data.bz2", method="curl")
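
Re-knitting the document repeats the download every time. As a minimal sketch, and not part of the original analysis, the download could be guarded so that it runs only when the file is missing. The helper name `fetch_once` and the placeholder URL below are hypothetical.

```r
# Hypothetical helper: download `url` to `dest` unless `dest` already exists.
# Returns TRUE if a download was attempted, FALSE if it was skipped.
fetch_once <- function(url, dest) {
  if (file.exists(dest)) {
    return(FALSE)
  }
  # mode = "wb" writes the compressed file in binary mode,
  # which matters on Windows.
  download.file(url, destfile = dest, mode = "wb")
  TRUE
}

# Demonstrate the skip path with a file that already exists,
# so no network access is needed here. The URL is a placeholder.
existing <- tempfile(fileext = ".bz2")
file.create(existing)
skipped <- !fetch_once("https://example.invalid/not-fetched", existing)
print(skipped)
```

In this document the call would take the form `fetch_once(fileUrl, "./storm_data.bz2")`.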

Load the data. read.csv() can read the bzip2-compressed file directly, without decompressing it first.

dat  <- tbl_df(read.csv("storm_data.bz2", header = TRUE)) 

Tidy data. Process the data needed for the analysis.

  1. Harm to population health. We keep the variables EVTYPE, FATALITIES, and INJURIES, and sum fatalities and injuries by event type:

Health Damage

dat.health <- select(dat, EVTYPE, FATALITIES, INJURIES) %>%
          filter(FATALITIES>0 | INJURIES>0) %>%
          group_by(EVTYPE) %>%
          summarise(total.FAT=sum(FATALITIES), total.INJ=sum(INJURIES))

  2. Economic damage. We use the variables PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP:

To estimate the cost of the economic damage, we follow the National Weather Service Storm Data Documentation. Damage estimates are entered as dollar amounts, and the columns PROPDMGEXP and CROPDMGEXP contain a letter giving the magnitude of the number. We keep only the codes “K” for thousands, “M” for millions, and “B” for billions; rows with any other code are counted as zero damage.
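
The magnitude codes can also be applied through a lookup vector. The following is a minimal sketch on a made-up toy data frame, not the approach used in the rest of this document; the names `toy`, `multiplier`, `damage_usd`, and `damage_billions` are illustrative assumptions.

```r
# Toy data standing in for the PROPDMG / PROPDMGEXP columns
# (values are made up for illustration).
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "M", "B", "?"),
  stringsAsFactors = FALSE
)

# Dollar multiplier for each magnitude code that is kept.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Codes without an entry in the lookup give NA; count them as zero
# damage, in line with keeping only K, M and B.
mult <- unname(multiplier[as.character(toy$PROPDMGEXP)])
mult[is.na(mult)] <- 0

toy$damage_usd      <- toy$PROPDMG * mult
toy$damage_billions <- toy$damage_usd / 1e9
print(toy)
```

The same lookup would apply to CROPDMG and CROPDMGEXP.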

2A. Property Damage

Create an index for each monetary unit:

ind.K <- which(dat$PROPDMGEXP=="K")
ind.M <- which(dat$PROPDMGEXP=="M")
ind.B <- which(dat$PROPDMGEXP=="B")

Build a vector with every cost expressed in the same unit, billions of dollars:

new.column  <- rep(0,dim(dat)[1])
new.column[ind.B] = dat$PROPDMG[ind.B]
new.column[ind.M] = dat$PROPDMG[ind.M]/10^3
new.column[ind.K] = dat$PROPDMG[ind.K]/10^6

Add the vector to the data frame as a new variable. Despite its name, MillionsPROP holds values in billions of dollars:

dat$MillionsPROP = new.column

2B. Crop Damage

Crop damage is handled in the same way. Create an index for each monetary unit:

ind.cropK <- which(dat$CROPDMGEXP=="K")
ind.cropM <- which(dat$CROPDMGEXP=="M")
ind.cropB <- which(dat$CROPDMGEXP=="B")

Build a vector with every cost expressed in the same unit, billions of dollars:

new.column  <- rep(0,dim(dat)[1])
new.column[ind.cropB] = dat$CROPDMG[ind.cropB]
new.column[ind.cropM] = dat$CROPDMG[ind.cropM]/10^3
new.column[ind.cropK] = dat$CROPDMG[ind.cropK]/10^6

Add the vector to the data frame as a new variable. Despite its name, MillionsCROP holds values in billions of dollars:

dat$MillionsCROP = new.column

Economic Damage

dat.dmg <- select(dat, EVTYPE, PROPDMG, MillionsPROP,CROPDMG, MillionsCROP) %>%
          filter(PROPDMG>0 | CROPDMG>0) %>%
           group_by(EVTYPE) %>%
           summarise(Properties=sum(MillionsPROP), Crops=sum(MillionsCROP)) 

Results

Question 1. Which types of events are most harmful with respect to population health?

Extract the data:

n.fatality <- arrange(dat.health, desc(total.FAT))
n.injury <- arrange(dat.health, desc(total.INJ))

Distribution of the top 10 events most harmful to population health:

par(mfrow=c(1,2), oma=c(1,0,3,0)) 
with(n.fatality, barplot(names.arg = EVTYPE[1:10], height = total.FAT[1:10],
         cex.axis = 0.8, cex.names = 0.70, cex.lab=0.9,log="y", las=2, xpd=FALSE,
         ylab= "Number of Fatalities", col="lightsteelblue"))
with(n.injury, barplot(names.arg = EVTYPE[1:10], height = total.INJ[1:10],
          cex.axis = 0.8, cex.names = 0.70, cex.lab=0.9, log="y", las=2, 
          xpd=FALSE, ylab= "Number of Injuries", col="mediumaquamarine"))
title("Top10 most harmful events to population health in U.S.\n (1950-2011)", outer=TRUE) 

We can identify the smallest set of top-ranked events that together account for at least 50% of fatalities:

ind <- min(which(cumsum(n.fatality[,2])/sum(n.fatality[,2])>=0.5))
n.fatality[1:ind,1]
## Source: local data frame [3 x 1]
## 
##           EVTYPE
##           (fctr)
## 1        TORNADO
## 2 EXCESSIVE HEAT
## 3    FLASH FLOOD
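
The cumulative-share lookup above can be wrapped in a small function so it can be reused for each measure. This is a minimal sketch in base R, assuming non-negative values with a positive total; the name `top_share` and its arguments are hypothetical, and the toy data are made up.

```r
# Hypothetical helper: return the fewest rows of `df`, largest first,
# whose `value_col` totals reach at least `share` of the overall sum.
# Assumes non-negative values with a positive total.
top_share <- function(df, value_col, share = 0.5) {
  ordered  <- df[order(df[[value_col]], decreasing = TRUE), , drop = FALSE]
  cum_frac <- cumsum(ordered[[value_col]]) / sum(ordered[[value_col]])
  n <- which(cum_frac >= share)[1]
  ordered[seq_len(n), , drop = FALSE]
}

# Toy data (made up) with the same column layout as dat.health.
toy <- data.frame(
  EVTYPE    = c("A", "B", "C", "D"),
  total.FAT = c(10, 50, 25, 15),
  stringsAsFactors = FALSE
)

res <- top_share(toy, "total.FAT")
print(res$EVTYPE)
```

With the data in this document, the calls would take the form `top_share(dat.health, "total.FAT")` or `top_share(dat.dmg, "Crops")`.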

In the same way, we can identify the events that together account for at least 50% of injuries:

ind2 <- min(which(cumsum(n.injury[,2])/sum(n.injury[,2])>=0.5))
n.injury[1:ind2,1]
## Source: local data frame [4 x 1]
## 
##           EVTYPE
##           (fctr)
## 1        TORNADO
## 2      TSTM WIND
## 3          FLOOD
## 4 EXCESSIVE HEAT

Question 2. Across the United States, which types of events have the greatest economic consequences?

Extract the data:

prop.damage <- arrange(dat.dmg, desc(Properties)) 
crop.damage <- arrange(dat.dmg, desc(Crops))

Distribution of the top 10 events causing the most economic damage:

par(mfrow=c(1,2),oma=c(1,1,3,0))
with(prop.damage, barplot(names.arg = EVTYPE[1:10], height = Properties[1:10],
         cex.axis = 0.8, cex.names = 0.60, cex.lab=0.75, las=2, xpd=FALSE,
         ylab= "$ Billions", main="Damage on Properties", col="lightsteelblue"))
with(crop.damage, barplot(names.arg = EVTYPE[1:10], height = Crops[1:10],
          cex.axis = 0.8, cex.names = 0.60, cex.lab=0.75, las=2, 
          xpd=FALSE, ylab= "$ Billions", main="Damage on Crops",col="mediumaquamarine"))
title("Top10 most harmful events with economic consequences.\n (1950-2011)", outer=TRUE) 

We can identify the events that together account for at least 50% of property damage:

ind3 <- min(which(cumsum(prop.damage[,2])/sum(prop.damage[,2])>=0.5))
prop.damage[1:ind3,1]
## Source: local data frame [2 x 1]
## 
##              EVTYPE
##              (fctr)
## 1             FLOOD
## 2 HURRICANE/TYPHOON

We can identify the events that together account for at least 50% of crop damage:

ind4 <- min(which(cumsum(crop.damage[,3])/sum(crop.damage[,3])>=0.5))
crop.damage[1:ind4,1]
## Source: local data frame [3 x 1]
## 
##        EVTYPE
##        (fctr)
## 1     DROUGHT
## 2       FLOOD
## 3 RIVER FLOOD

Concluding Remarks

. Actions to be taken in order to protect population health in U.S.:

. Actions to be taken in order to protect properties and crop damage in U.S.: