Reproducible Research - Project 2

Title: Impact of Severe Weather Events

Assignment Synopsis:

The goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. The project explores the data, tracks the characteristics of storms in the US, and analyzes:

  1. What types of events are most harmful with respect to population health?
  2. What types of events have the greatest economic impact?

Data Processing:

Data for the analysis can be downloaded from the course website: Storm Data. Documentation of the database is also available; it describes how some of the variables are constructed and defined.

The first step is to load the necessary packages and read the storm data.

library(plyr)
library(ggplot2)
library(gridExtra)
library(grid)
data <- read.csv(bzfile("repdata-data-StormData.csv.bz2"))

Since the analysis intends to focus on health and economic consequences of the severe weather events, we will limit the dataset to the needed columns for faster processing.

stormData <- data[,c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]

Population Health

We will first summarize the fatalities and injuries by event type, then rank the event types by each total.

harm2health <- ddply(stormData, .(EVTYPE), summarize,fatalities = sum(FATALITIES),injuries = sum(INJURIES))
fatal <- harm2health[order(harm2health$fatalities, decreasing = TRUE), ]
injury <- harm2health[order(harm2health$injuries, decreasing = TRUE), ]
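
As a quick check of the grouping logic, the same per-event totals can be computed with base R's aggregate on a small made-up data frame. This is only a sketch: the rows in toy are invented for illustration and are not taken from the NOAA data, and the names toy and totals are hypothetical.

```r
# Toy data: invented values, not from the NOAA Storm Database.
toy <- data.frame(
    EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
    FATALITIES = c(2, 1, 3, 4),
    INJURIES   = c(10, 0, 5, 1)
)

# Sum fatalities and injuries within each event type.
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank event types by total fatalities, highest first.
totals <- totals[order(totals$FATALITIES, decreasing = TRUE), ]
print(totals)
```

In this toy example TORNADO ranks first with 5 fatalities and 15 injuries, which mirrors what the ddply and order calls do on the full dataset.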

Economic Impacts

Since the exponents of the damage amounts are stored in separate columns, we will use a function to convert each exponent code to a number.

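getExp <- function(e) {
    # Work on the text of the code; as.numeric() on a factor would return the level index.
    e <- as.character(e)
    if (e %in% c("h", "H"))
        return(2)
    else if (e %in% c("k", "K"))
        return(3)
    else if (e %in% c("m", "M"))
        return(6)
    else if (e %in% c("b", "B"))
        return(9)
    else if (e %in% c("", "-", "?", "+"))
        return(0)
    else if (grepl("^[0-9]+$", e))
        # Digit codes are used directly as the exponent.
        return(as.numeric(e))
    else {
        stop("Invalid value.")
    }
}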

We will now calculate the dollar values of property and crop damage as the reported amount multiplied by 10 raised to the exponent.

propExp <- sapply(stormData$PROPDMGEXP, FUN=getExp)
stormData$propDamage <- stormData$PROPDMG * (10 ^ propExp)
cropExp <- sapply(stormData$CROPDMGEXP, FUN=getExp)
stormData$cropDamage <- stormData$CROPDMG * (10 ^ cropExp)
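
The sapply calls above invoke the conversion function once per row. As an alternative sketch, the same mapping can be written as a named-vector lookup that converts a whole column at once. The names expLookup and toExponent are hypothetical, and the damage amounts below are invented to show the arithmetic; they are not values from the dataset.

```r
# Hypothetical vectorized alternative to calling a function once per row.
# Same mapping as getExp: letters to powers of ten, digits to themselves,
# blank and symbol codes to 0.
expLookup <- c(h = 2, H = 2, k = 3, K = 3, m = 6, M = 6, b = 9, B = 9,
               "-" = 0, "?" = 0, "+" = 0)
expLookup[as.character(0:9)] <- 0:9

toExponent <- function(codes) {
    codes <- as.character(codes)        # also handles factor input
    out <- unname(expLookup[codes])     # NA for codes not in the table
    out[!nzchar(codes)] <- 0            # "" cannot be matched as a lookup name
    if (anyNA(out)) stop("Invalid value.")
    out
}

# Invented example values: 25 with "K" is 25,000; 1.5 with "M" is 1,500,000.
dmg  <- c(25, 1.5, 10, 3)
code <- c("K", "M", "", "2")
dmg * 10^toExponent(code)
```

The lookup keeps the whole mapping in one place and avoids a function call per row, at the cost of handling the empty string separately.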

We will now summarize the property and crop damage by event type, excluding events that did not cause any financial impact.

economicDamage <- ddply(stormData, .(EVTYPE), summarize,propDamage = sum(propDamage), cropDamage = sum(cropDamage))

economicDamage <- economicDamage[(economicDamage$propDamage > 0 | economicDamage$cropDamage > 0), ]

We will now sort the event types by property damage and by crop damage, in decreasing order.

propDmgSorted <- economicDamage[order(economicDamage$propDamage, decreasing = TRUE), ]
cropDmgSorted <- economicDamage[order(economicDamage$cropDamage, decreasing = TRUE), ]

The data are now processed and ready for presenting the results.

Results

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

We will plot, in a grid, the five event types that cause the most injuries and the five that cause the most fatalities.

plot1 <- ggplot(data=head(injury,5), aes(x=reorder(EVTYPE, injuries), y=injuries)) +
    geom_bar(fill="deepskyblue1",stat="identity")  + coord_flip() + 
    ylab("Number of Injuries") + xlab("Event Type") +
    ggtitle("Health Impacts of weather events") +
    theme(legend.position="none")

plot2 <- ggplot(data=head(fatal,5), aes(x=reorder(EVTYPE, fatalities), y=fatalities)) +
    geom_bar(fill="gold1",stat="identity") + coord_flip() +
    ylab("Number of Fatalities") + xlab("Event Type") +
    theme(legend.position="none")

grid.arrange(plot1, plot2, nrow =2)

  2. Across the United States, which types of events have the greatest economic consequences?

We will plot, in a grid, the five event types that cause the most property damage and the five that cause the most crop damage. The damage values are plotted on a log 10 scale for better readability.

plot3 <- ggplot(data=head(propDmgSorted,5), aes(x=reorder(EVTYPE, propDamage), y=log10(propDamage))) +
    geom_bar(fill="deepskyblue1", stat="identity") + coord_flip() +
    xlab("Event Type") + ylab("Property Damages (in $, log 10)") +
    ggtitle("Economic impact of weather events") +
    theme(plot.title = element_text(hjust = 0))

plot4 <- ggplot(data=head(cropDmgSorted,5), aes(x=reorder(EVTYPE, cropDamage), y=log10(cropDamage))) +
    geom_bar(fill="gold1", stat="identity") + coord_flip() + 
    xlab("Event Type") + ylab("Crop Damages (in $, log 10)") + 
    theme(legend.position="none")

grid.arrange(plot3, plot4, ncol=1, nrow =2)
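
To read the log 10 axis used in these plots, raise 10 to the plotted value to get back to dollars. The short sketch below uses illustrative axis values, not results from the analysis; axisValue and dollars are hypothetical names.

```r
# Values as they might appear on a log10 axis (illustrative, not results).
axisValue <- c(9, 10, 11)

# Back-transform to dollars: 10^9 is one billion, 10^11 is one hundred billion.
dollars <- 10^axisValue
format(dollars, big.mark = ",", scientific = FALSE)

# A one-unit difference on the axis is a tenfold difference in dollars.
dollars[2] / dollars[1]
```

Bars that look similar in length on this scale can therefore differ by a large factor in actual dollar damage.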

Findings:

  • Tornadoes cause the most injuries
  • Tornadoes cause the most fatalities
  • Flash floods cause the most property damage
  • Droughts cause the most crop damage