Synopsis

The goal of this assignment is to analyze the effects of harmful weather on the U.S. population and economy. Storm data from 1950 to 2011 were obtained from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. Our findings were:

  1. Tornadoes caused the highest combined number of fatalities and injuries, more than 10 times that of the second-highest contributor, excessive heat.
  2. Floods had the greatest economic consequences, more than twice those of the second-highest contributor, hurricanes and typhoons.

Method

The analysis addresses the following two questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

Data Processing

The following are the steps used to process the data from start to end.

Step 1: Obtain data file and read into R

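# Set the working directory (machine-specific path; adjust as needed)
setwd("C:\\Users\\Home\\ReproducibleFinalProject")
# Download of the raw data, left commented out
#URL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
#download.file(URL, destfile = "./Data/StormData.bz2")
# read.csv() reads the bzip2-compressed CSV directly
rawData <- read.csv("./Data/StormData.bz2")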
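
As a hedged alternative to the commented-out download, the sketch below fetches the file only when it is not already on disk. The helper name `fetch_if_missing` is hypothetical and not part of the original analysis; the URL and destination path are the ones shown above.

```r
# Hypothetical helper (not in the original analysis): download a file only
# if it is not already on disk, creating the target directory if needed.
fetch_if_missing <- function(url, destfile) {
  if (!file.exists(destfile)) {
    dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
    # mode = "wb" keeps the compressed file intact on Windows
    download.file(url, destfile = destfile, mode = "wb")
  }
  invisible(destfile)
}

# Usage with the URL and path from Step 1 (left commented out so that the
# sketch does not trigger a large download when pasted):
# URL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# rawData <- read.csv(fetch_if_missing(URL, "./Data/StormData.bz2"))
```

With this pattern the download line no longer needs to be commented out by hand after the first run.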

Step 2: Prepare the dataset to analyze the effects on population health

Two variables were identified to quantify the effects on population health: FATALITIES and INJURIES.

# Build a data frame for types of events and their corresponding effects on population health
popHealth <- data.frame(EventType = rawData$EVTYPE, Fatalities = rawData$FATALITIES, 
                        Injuries = rawData$INJURIES)
# Inspect if there are any missing data
(sum(is.na(popHealth)))
## [1] 0
# Find the total fatalities and injuries based on event types
library(plyr)
popHealth.summary <- ddply(popHealth, c("EventType"), summarize, 
                           TotalFatalitiesAndInjuries = sum(Fatalities)+sum(Injuries))
# Sort from highest to lowest value and retain the top 5 event types
top5.popHealth <- popHealth.summary[order(-popHealth.summary$TotalFatalitiesAndInjuries),][1:5,]
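
To illustrate what the summarize-then-sort step computes, here is a minimal base-R sketch on a small made-up data frame. The values are invented for illustration and are not taken from the NOAA data; `aggregate()` is used here as a stand-in for `ddply()`.

```r
# Toy data (invented values, not from the storm database)
toy <- data.frame(
  EventType  = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  Fatalities = c(2, 1, 3, 4),
  Injuries   = c(10, 0, 5, 1)
)

# Per-row harm, then total per event type (base-R stand-in for ddply);
# summing per-row totals equals sum(Fatalities) + sum(Injuries) per group
toy$Harm <- toy$Fatalities + toy$Injuries
toy.summary <- aggregate(Harm ~ EventType, data = toy, FUN = sum)

# Sort from highest to lowest and keep the top 2 event types
toy.top <- head(toy.summary[order(-toy.summary$Harm), ], 2)
print(toy.top)
```

In this toy case "TORNADO" ranks first with a total of 20, followed by "HEAT" with 5.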

Step 3: Prepare the dataset to analyze the effects on the economy

Four variables were identified to quantify the effects on the economy: PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP. The variables ending with “EXP” hold the multiplier for their respective cost: “H” is a multiplier of 100, “K” of 1,000, “M” of 1,000,000, and “B” of 1,000,000,000. In the code below, a record with any other value in an “EXP” column keeps a multiplier of 0 and therefore does not contribute to the totals.
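
As a worked example of the multiplier rule, a record with PROPDMG = 25 and PROPDMGEXP = "K" represents 25 × 1,000 = 25,000 dollars. The same mapping can be sketched as a named lookup vector. The names `expMult` and `toMultiplier` are hypothetical and the sample values are invented; mapping unrecognized codes to 0 is an assumption made to mirror the processing code in this step.

```r
# Hypothetical lookup table: exponent code -> numeric multiplier
expMult <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: translate codes to multipliers; unknown codes get 0
toMultiplier <- function(codes) {
  mult <- unname(expMult[as.character(codes)])
  mult[is.na(mult)] <- 0
  mult
}

# Invented sample values for illustration
dmg   <- c(25, 1.5, 3, 10)
codes <- c("K", "M", "", "B")
cost  <- dmg * toMultiplier(codes)
print(cost)
```

A lookup vector keeps the code-to-multiplier mapping in one place, which makes it easier to extend if other codes should be recognized.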

# Build a data frame for types of events and their corresponding effects on the economy
damageCost <- data.frame(EventType = rawData$EVTYPE, PropDmg = rawData$PROPDMG, 
                         PropDmgExp = rawData$PROPDMGEXP, CropDmg = rawData$CROPDMG, 
                         CropDmgExp = rawData$CROPDMGEXP)
# Inspect if there are any missing data
(sum(is.na(damageCost)))
## [1] 0
# Generate two new columns holding the numeric cost multiplier for each type of damage;
# rows whose code is not H, K, M, or B keep a multiplier of 0
damageCost$PropDmgMult <- 0
damageCost$CropDmgMult <- 0
damageCost$PropDmgMult[damageCost$PropDmgExp == "H"] <- 100
damageCost$PropDmgMult[damageCost$PropDmgExp == "K"] <- 1000
damageCost$PropDmgMult[damageCost$PropDmgExp == "M"] <- 1000000
damageCost$PropDmgMult[damageCost$PropDmgExp == "B"] <- 1000000000
damageCost$CropDmgMult[damageCost$CropDmgExp == "H"] <- 100
damageCost$CropDmgMult[damageCost$CropDmgExp == "K"] <- 1000
damageCost$CropDmgMult[damageCost$CropDmgExp == "M"] <- 1000000
damageCost$CropDmgMult[damageCost$CropDmgExp == "B"] <- 1000000000
# Combine the damage costs for properties and crops for each event type using the multipliers
damageCost.summary <- ddply(damageCost, c("EventType"), summarize,
                            TotalDamageCosts = sum(PropDmg*PropDmgMult)+sum(CropDmg*CropDmgMult))
# Sort from highest to lowest value and retain the top 5 event types
top5.damageCost <- damageCost.summary[order(-damageCost.summary$TotalDamageCosts),][1:5,]
# Convert damage costs to billions of dollars and round to 2 decimal places
top5.damageCost[,2]=round(top5.damageCost[,2]/1000000000,2)
names(top5.damageCost)[names(top5.damageCost)=="TotalDamageCosts"] <- "TotalDamageCostInBillions"

Results

The following plot, based on the dataset prepared in Step 2 of Data Processing, shows the 5 weather event types that caused the most harm to population health, in terms of combined injuries and fatalities, between 1950 and 2011:

library(ggplot2)
# An explicit "\n" breaks the title without adding stray leading spaces to its second line
ggplot(top5.popHealth, aes(x=EventType, y=TotalFatalitiesAndInjuries))+geom_bar(stat="identity")+
  xlab("Event Type")+ ylab("Total Fatalities and Injuries")+
  ggtitle("Total Fatalities and Injuries for the top 5 weather events in the US\nbetween 1950-2011")
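
ggplot2 orders a discrete x axis by factor level (alphabetically for character data), so the bars are not necessarily sorted by height. A minimal sketch of one way to sort them uses `reorder()` from the stats package. The data frame below uses invented values, and the plotting call is guarded so that the sketch also runs where ggplot2 is not installed.

```r
# Toy summary (invented values, not from the storm database)
toy.top <- data.frame(
  EventType = c("FLOOD", "HEAT", "TORNADO"),
  Total     = c(5, 20, 90)
)

# Re-level the factor so that levels follow decreasing Total
toy.top$EventType <- reorder(factor(toy.top$EventType), -toy.top$Total)
print(levels(toy.top$EventType))

# Plot only if ggplot2 is available; bars then appear from tallest to shortest
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(toy.top, ggplot2::aes(x = EventType, y = Total)) +
    ggplot2::geom_bar(stat = "identity")
  print(p)
}
```

The same re-leveling could be applied to the EventType column of either top-5 data frame before plotting.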

The following plot, based on the dataset prepared in Step 3 of Data Processing, shows the 5 weather event types that caused the highest damage costs to the economy between 1950 and 2011:

# An explicit "\n" breaks the title without adding stray leading spaces to its second line
ggplot(top5.damageCost, aes(x=EventType, y=TotalDamageCostInBillions))+geom_bar(stat="identity")+
  xlab("Event Type")+ ylab("Total Damage Cost in $ Billion")+
  ggtitle("Total damage costs in $ Billion to the economy due to the top 5 weather events\nin the US between 1950-2011")

Conclusion

  1. Across the United States, the 3 weather event types most harmful to population health between 1950 and 2011 were “Tornado”, “Excessive Heat” and “TSTM Wind”. “Tornado” was by far the largest contributor, with a combined total of approximately 100,000 fatalities and injuries.
  2. Across the United States, the 3 weather event types with the greatest economic consequences between 1950 and 2011 were “Flood”, “Hurricane/Typhoon” and “Tornado”. Measured by combined damage costs to property and crops, “Flood” was the costliest at approximately $150 billion, more than twice the cost of the second-highest event type.