The goal of this assignment is to analyze the effects of harmful weather on the U.S. population and economy. Storm data from 1950 to 2011 was obtained from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. Our findings were:
The approach of this assignment is centered around addressing the following questions:
The following are the steps used to process the data from start to end.
setwd("C:\\Users\\Home\\ReproducibleFinalProject")
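URL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists("./Data/StormData.bz2")) {  # download once, then reuse the local copy
  dir.create("./Data", showWarnings = FALSE)
  download.file(URL, destfile = "./Data/StormData.bz2", mode = "wb")
}
rawData <- read.csv("./Data/StormData.bz2")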
Two variables were identified to quantify the effects on population health: $FATALITIES and $INJURIES.
# Build a data frame for types of events and their corresponding effects on population health
popHealth <- data.frame(EventType = rawData$EVTYPE, Fatalities = rawData$FATALITIES,
Injuries = rawData$INJURIES)
# Inspect if there are any missing data
(sum(is.na(popHealth)))
## [1] 0
# Find the total fatalities and injuries based on event types
library(plyr)
popHealth.summary <- ddply(popHealth, c("EventType"), summarize,
TotalFatalitiesAndInjuries = sum(Fatalities)+sum(Injuries))
# Sort from highest to lowest value and retain the top 5 event types
top5.popHealth <- popHealth.summary[order(-popHealth.summary$TotalFatalitiesAndInjuries),][1:5,]
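As a quick sanity check of the aggregation logic above, the following sketch applies the same sum-then-rank idea to a small made-up data frame, using base R's `aggregate` rather than `plyr`. The `toy` data frame and its values are hypothetical and are not taken from the NOAA data.

```r
# Hypothetical toy data; the values are illustrative only, not NOAA records
toy <- data.frame(
  EventType  = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  Fatalities = c(2, 1, 3, 4, 0),
  Injuries   = c(10, 0, 5, 1, 2),
  stringsAsFactors = FALSE
)

# Combine fatalities and injuries per row, then sum per event type
toy$Total <- toy$Fatalities + toy$Injuries
toy.summary <- aggregate(Total ~ EventType, data = toy, FUN = sum)

# Sort from highest to lowest value and keep the top 2 event types
top2 <- toy.summary[order(-toy.summary$Total), ][1:2, ]
print(top2)
```

Summing the per-row totals gives the same result as `sum(Fatalities)+sum(Injuries)` within each group, so this should agree with the `ddply` call when run on the same input.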
Four variables were identified to quantify the effects on the economy: $PROPDMG, $PROPDMGEXP, $CROPDMG, and $CROPDMGEXP. The variables ending in “EXP” hold the multiplier for their respective cost: “H” represents a multiplier of 100, “K” a multiplier of 1,000, “M” a multiplier of 1,000,000, and “B” a multiplier of 1,000,000,000.
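To make the multiplier arithmetic concrete, here is a minimal sketch of the conversion as a lookup. The names `multipliers` and `damage_in_dollars` are hypothetical helpers introduced only for illustration. Codes other than the four listed are assumed to get a multiplier of 0, which matches the default used in the processing code in this step.

```r
# Hypothetical lookup of the four exponent codes described in the text
multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert a damage figure and its exponent code to dollars
damage_in_dollars <- function(amount, exp_code) {
  # as.character() guards against factor input, which would index by integer code
  mult <- unname(multipliers[as.character(exp_code)])
  # Assumption: unrecognized codes (blank, lowercase, digits) count as 0
  mult[is.na(mult)] <- 0
  amount * mult
}

# 25 with "K" is 25,000 dollars; a blank code contributes nothing
print(damage_in_dollars(c(25, 2.5, 3), c("K", "M", "")))
```

The lookup is case-sensitive, so a lowercase code such as "k" would also be treated as unrecognized unless the codes are passed through `toupper` first.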
# Build a data frame for types of events and their corresponding effects on the economy
damageCost <- data.frame(EventType = rawData$EVTYPE, PropDmg = rawData$PROPDMG,
PropDmgExp = rawData$PROPDMGEXP, CropDmg = rawData$CROPDMG,
CropDmgExp = rawData$CROPDMGEXP)
# Inspect if there are any missing data
(sum(is.na(damageCost)))
## [1] 0
# Generate two new columns for the numerical value of the corresponding cost multiplier for both types of damage
damageCost$PropDmgMult <- 0
damageCost$CropDmgMult <- 0
damageCost$PropDmgMult[damageCost$PropDmgExp=="H"] <- 100
damageCost$PropDmgMult[damageCost$PropDmgExp=="K"] <- 1000
damageCost$PropDmgMult[damageCost$PropDmgExp=="M"] <- 1000000
damageCost$PropDmgMult[damageCost$PropDmgExp=="B"] <- 1000000000
damageCost$CropDmgMult[damageCost$CropDmgExp=="H"] <- 100
damageCost$CropDmgMult[damageCost$CropDmgExp=="K"] <- 1000
damageCost$CropDmgMult[damageCost$CropDmgExp=="M"] <- 1000000
damageCost$CropDmgMult[damageCost$CropDmgExp=="B"] <- 1000000000
# Combine the damage costs for properties and crops for each event type using the multipliers
damageCost.summary <- ddply(damageCost, c("EventType"), summarize,
TotalDamageCosts = sum(PropDmg*PropDmgMult)+sum(CropDmg*CropDmgMult))
# Sort from highest to lowest value and retain the top 5 event types
top5.damageCost <- damageCost.summary[order(-damageCost.summary$TotalDamageCosts),][1:5,]
# Round damage cost values to 2 decimal places, in units of $ billion
top5.damageCost[,2]=round(top5.damageCost[,2]/1000000000,2)
names(top5.damageCost)[names(top5.damageCost)=="TotalDamageCosts"] <- "TotalDamageCostInBillions"
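One caveat about the missing-data check in this step: `is.na` counts only `NA` values, so blank or unrecognized exponent codes pass it and keep the default multiplier of 0. A minimal sketch of how such codes could be surfaced, using a hypothetical vector of codes rather than the real column:

```r
# Hypothetical exponent codes; the real columns may contain other values
codes <- c("K", "M", "", "k", "B", "K", "?")

# No NA values, so an is.na() check reports zero missing entries
print(sum(is.na(codes)))

# Tabulate every distinct code, including blanks
print(table(codes, useNA = "ifany"))

# Codes that the H/K/M/B mapping would leave at a multiplier of 0
unrecognized <- codes[!(codes %in% c("H", "K", "M", "B"))]
print(unrecognized)
```

On the real data, the equivalent inspection would be `table(damageCost$PropDmgExp)` and `table(damageCost$CropDmgExp)`.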
The following plot, based on the dataset prepared in Step 2 of Data Processing, shows the 5 weather event types that caused the most harm to population health between 1950 and 2011, measured as combined injuries and fatalities:
library(ggplot2)
ggplot(top5.popHealth, aes(x=EventType, y=TotalFatalitiesAndInjuries))+geom_bar(stat="identity")+
xlab("Event Type")+ ylab("Total Fatalities and Injuries")+
ggtitle("Total Fatalities and Injuries for the top 5 weather events in the US\nbetween 1950-2011")
The following plot, based on the dataset prepared in Step 3 of Data Processing, shows the 5 weather event types that caused the highest damage costs to the economy between 1950 and 2011:
ggplot(top5.damageCost, aes(x=EventType, y=TotalDamageCostInBillions))+geom_bar(stat="identity")+
xlab("Event Type")+ ylab("Total Damage Cost in $ Billion")+
ggtitle("Total damage costs in $ Billion to the economy due to the top 5 weather events\nin the US between 1950-2011")
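By default `ggplot2` orders a discrete x axis by factor level (alphabetically for character data), not by bar height. If bars sorted from highest to lowest are preferred, one option is `reorder` from base R's `stats` package. The sketch below uses a hypothetical `toy` data frame with made-up values and only draws the plot when `ggplot2` is installed.

```r
# Hypothetical values for illustration only
toy <- data.frame(
  EventType = c("FLOOD", "HEAT", "TORNADO"),
  Total     = c(3, 5, 20),
  stringsAsFactors = FALSE
)

# Reorder factor levels by descending Total so the tallest bar comes first
toy$EventType <- reorder(toy$EventType, -toy$Total)
print(levels(toy$EventType))

# Draw the bar chart only if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = EventType, y = Total)) +
    geom_bar(stat = "identity") +
    xlab("Event Type") + ylab("Total")
  print(p)
}
```

The same call could be applied to `top5.popHealth$EventType` or `top5.damageCost$EventType` before plotting.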