Marco Antonio Gonzalez Junior - July 25, 2015
The goal of this research is to analyze the impact of different weather events on public health and the economy, based on the storm database collected by the U.S. National Oceanic and Atmospheric Administration (NOAA) from 1950 to 2011. The estimates of fatalities, injuries, property damage, and crop damage are used to decide which types of events are most harmful to population health and the economy. The conclusion is that excessive heat and tornadoes are most harmful to population health, while floods, droughts, and hurricanes/typhoons have the greatest economic consequences.
Library dependencies:
library(R.utils)
## Loading required package: R.oo
## Loading required package: R.methodsS3
## R.methodsS3 v1.7.0 (2015-02-19) successfully loaded. See ?R.methodsS3 for help.
## R.oo v1.19.0 (2015-02-27) successfully loaded. See ?R.oo for help.
##
## Attaching package: 'R.oo'
##
## The following objects are masked from 'package:methods':
##
## getClasses, getMethods
##
## The following objects are masked from 'package:base':
##
## attach, detach, gc, load, save
##
## R.utils v2.1.0 (2015-05-27) successfully loaded. See ?R.utils for help.
##
## Attaching package: 'R.utils'
##
## The following object is masked from 'package:utils':
##
## timestamp
##
## The following objects are masked from 'package:base':
##
## cat, commandArgs, getOption, inherits, isOpen, parse, warnings
library(plyr)
library(ggplot2)
require(gridExtra)
## Loading required package: gridExtra
Download and unzip the data.
setwd("~/projects/DSS_ReproducibleResearch_PA2/")
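if (!"stormData.csv.bz2" %in% dir("./data/")) {
  # Create the target directory first; download.file() fails if it is missing.
  dir.create("data", showWarnings = FALSE)
  # mode = "wb" keeps the compressed file intact on Windows.
  download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "data/stormData.csv.bz2", mode = "wb")
  bunzip2("data/stormData.csv.bz2", overwrite = TRUE, remove = FALSE)
}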
Read the generated CSV file, derive the year of each event from BGN_DATE, and plot the number of events per year.
if (!"weatherData" %in% ls()) {
weatherData <- read.csv("data/stormData.csv", sep = ",")
}
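As a side note, base R can usually read a bzip2-compressed CSV directly, because read.csv() opens the path through file(), which detects the compression. The sketch below illustrates this on a small made-up data frame written to a temporary file; the names toy, path, and toyRead are hypothetical and not part of the original analysis.

```r
# Toy data standing in for the storm file (values are made up for illustration).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))

# Write the toy data as a bzip2-compressed CSV in a temporary directory.
path <- file.path(tempdir(), "toy.csv.bz2")
con <- bzfile(path, open = "wt")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path with file(), which detects the compression
# and decompresses while reading, so no separate unzip step is needed here.
toyRead <- read.csv(path, stringsAsFactors = FALSE)
print(toyRead)
```

If this holds for the storm file as well, the bunzip2() step becomes optional, at the cost of decompressing on every read.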
if (dim(weatherData)[2] == 37) {
weatherData$year <- as.numeric(format(as.Date(weatherData$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y"))
}
hist(weatherData$year, breaks = 30, main = "Bad weather events registered per year", xlab = "Year", ylab = "Events")
The histogram above shows that the number of recorded events starts to increase significantly around 1995. We therefore use the subset of the data from 1995 to 2011, where the records are most complete.
storms <- weatherData[weatherData$year >= 1995, ]
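The year derivation and the filter above can be checked on a few rows. This is a minimal sketch on made-up dates in the same month/day/year layout; the data frame toy and the object recent are hypothetical names used only for illustration.

```r
# Made-up dates in the "month/day/year hour:minute:second" layout.
toy <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "6/1/1994 0:00:00", "7/4/2003 0:00:00"),
  stringsAsFactors = FALSE
)

# Parse the text into a Date, then format the Date as a four-digit year.
toy$year <- as.numeric(format(as.Date(toy$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y"))

# Keep only events from 1995 onwards, mirroring the subset used in the report.
recent <- toy[toy$year >= 1995, ]
print(recent)
```

Only the 2003 row survives the filter, since 1950 and 1994 both fall before the cutoff.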
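This section calculates the number of fatalities and injuries caused by severe weather events. The helper below sums a given field per event type and returns the 15 event types with the largest totals.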
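sortHelper <- function(fieldName, top = 15, dataset = weatherData) {
  # Locate the column to summarise by name.
  index <- which(colnames(dataset) == fieldName)
  # Sum that column for each event type.
  field <- aggregate(dataset[, index], by = list(dataset$EVTYPE), FUN = "sum")
  names(field) <- c("EVTYPE", fieldName)
  # Sort by the total, largest first, and keep the top rows.
  field <- arrange(field, field[, 2], decreasing = TRUE)
  field <- head(field, n = top)
  # Fix the factor levels in sorted order so plots keep this ordering.
  field <- within(field, EVTYPE <- factor(x = EVTYPE, levels = field$EVTYPE))
  return(field)
}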
fatalities <- sortHelper("FATALITIES", dataset = storms)
injuries <- sortHelper("INJURIES", dataset = storms)
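The aggregate, sort, and truncate steps in sortHelper can be traced on a small example. The sketch below uses base R only and made-up numbers; toy, totals, and top2 are hypothetical names, and the formula interface of aggregate() is swapped in here for readability.

```r
# Made-up event records: several rows per event type.
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "TORNADO"),
  FATALITIES = c(2, 1, 5, 4, 0, 1),
  stringsAsFactors = FALSE
)

# Step 1: sum fatalities per event type.
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Step 2: sort by the total, largest first.
totals <- totals[order(totals$FATALITIES, decreasing = TRUE), ]

# Step 3: keep the top rows and fix the factor levels in that order.
top2 <- head(totals, n = 2)
top2$EVTYPE <- factor(top2$EVTYPE, levels = top2$EVTYPE)
print(top2)
```

Fixing the factor levels after sorting matters for the plots: ggplot2 orders a discrete axis by factor level, so the bars appear from largest to smallest.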
To make property damage and crop damage comparable, we convert them into plain dollar amounts using the unit codes described in the code book (Storm Events). The PROPDMGEXP and CROPDMGEXP columns record a multiplier for each observation: hundred (H), thousand (K), million (M), or billion (B). Each code is translated into a power of ten and multiplied by the corresponding damage amount.
convertHelper <- function(dataset = storms, fieldName, newFieldName) {
  totalLen <- dim(dataset)[2]
  index <- which(colnames(dataset) == fieldName)
  # Work on character values so the unit letters can be replaced by exponents.
  dataset[, index] <- as.character(dataset[, index])
  logic <- !is.na(toupper(dataset[, index]))
  # Map each unit letter (case-insensitive) to its power of ten.
  dataset[logic & toupper(dataset[, index]) == "B", index] <- "9"
  dataset[logic & toupper(dataset[, index]) == "M", index] <- "6"
  dataset[logic & toupper(dataset[, index]) == "K", index] <- "3"
  dataset[logic & toupper(dataset[, index]) == "H", index] <- "2"
  dataset[logic & toupper(dataset[, index]) == "", index] <- "0"
  # Any other symbol becomes NA on coercion (hence the warning) and is treated as 0.
  dataset[, index] <- as.numeric(dataset[, index])
  dataset[is.na(dataset[, index]), index] <- 0
  # Relies on the damage amount sitting in the column just before its
  # exponent column (PROPDMG before PROPDMGEXP, CROPDMG before CROPDMGEXP).
  dataset <- cbind(dataset, dataset[, index - 1] * 10^dataset[, index])
  names(dataset)[totalLen + 1] <- newFieldName
  return(dataset)
}
storms <- convertHelper(storms, "PROPDMGEXP", "propertyDamage")
## Warning in convertHelper(storms, "PROPDMGEXP", "propertyDamage"): NAs
## introduced by coercion
storms <- convertHelper(storms, "CROPDMGEXP", "cropDamage")
## Warning in convertHelper(storms, "CROPDMGEXP", "cropDamage"): NAs
## introduced by coercion
property <- sortHelper("propertyDamage", dataset = storms)
crop <- sortHelper("cropDamage", dataset = storms)
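The multiplier arithmetic can be illustrated with a lookup vector. This is a sketch of an alternative formulation, not the code used above: it assumes only the four letter codes carry meaning, so digit codes, which convertHelper keeps as exponents, are treated as 0 here. The names multiplierExponent, toy, and exponent are hypothetical, and the amounts are made up.

```r
# Assumed mapping from unit letter to power of ten (H, K, M, B only).
multiplierExponent <- c(H = 2, K = 3, M = 6, B = 9)

# Made-up damage amounts with a mix of valid, empty, and unknown codes.
toy <- data.frame(
  PROPDMG = c(25, 2.5, 1.2, 10, 7),
  PROPDMGEXP = c("K", "M", "b", "", "?"),
  stringsAsFactors = FALSE
)

# Look up each code case-insensitively; codes not in the table yield NA.
exponent <- unname(multiplierExponent[toupper(toy$PROPDMGEXP)])

# Treat empty or unrecognised codes as a multiplier of 10^0 = 1.
exponent[is.na(exponent)] <- 0

# Damage in dollars = amount * 10^exponent, e.g. 25 with "K" is 25 * 10^3.
toy$propertyDamage <- toy$PROPDMG * 10^exponent
print(toy)
```

A lookup vector avoids the coercion warning, but it also changes how digit codes are handled, so totals could differ from the ones reported below.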
For the impact on public health, the two lists below rank severe weather events by the number of fatalities and by the number of injuries.
fatalities
## EVTYPE FATALITIES
## 1 EXCESSIVE HEAT 1903
## 2 TORNADO 1545
## 3 FLASH FLOOD 934
## 4 HEAT 924
## 5 LIGHTNING 729
## 6 FLOOD 423
## 7 RIP CURRENT 360
## 8 HIGH WIND 241
## 9 TSTM WIND 241
## 10 AVALANCHE 223
## 11 RIP CURRENTS 204
## 12 WINTER STORM 195
## 13 HEAT WAVE 161
## 14 THUNDERSTORM WIND 131
## 15 EXTREME COLD 126
injuries
## EVTYPE INJURIES
## 1 TORNADO 21765
## 2 FLOOD 6769
## 3 EXCESSIVE HEAT 6525
## 4 LIGHTNING 4631
## 5 TSTM WIND 3630
## 6 HEAT 2030
## 7 FLASH FLOOD 1734
## 8 THUNDERSTORM WIND 1426
## 9 WINTER STORM 1298
## 10 HURRICANE/TYPHOON 1275
## 11 HIGH WIND 1093
## 12 HAIL 916
## 13 WILDFIRE 911
## 14 HEAVY SNOW 751
## 15 FOG 718
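The fatalities list contains labels that look like variants of one event, such as RIP CURRENT and RIP CURRENTS, or TSTM WIND and THUNDERSTORM WIND. The analysis above keeps them separate. The sketch below shows one way such labels could be merged before aggregating, using the four totals from the list; the synonyms mapping is an assumption about which labels denote the same event, and the names toy, synonyms, hit, and merged are hypothetical.

```r
# Four rows taken from the fatalities list above.
toy <- data.frame(
  EVTYPE = c("RIP CURRENT", "RIP CURRENTS", "TSTM WIND", "THUNDERSTORM WIND"),
  FATALITIES = c(360, 204, 241, 131),
  stringsAsFactors = FALSE
)

# Assumed mapping from variant label to the label it should be merged into.
synonyms <- c("RIP CURRENTS" = "RIP CURRENT", "TSTM WIND" = "THUNDERSTORM WIND")

# Replace only the labels that appear in the mapping.
hit <- toy$EVTYPE %in% names(synonyms)
toy$EVTYPE[hit] <- unname(synonyms[toy$EVTYPE[hit]])

# Re-aggregate so each merged label has a single total.
merged <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)
print(merged)
```

Under this mapping the rip current total would be 360 + 204 = 564 and the thunderstorm wind total 241 + 131 = 372, which would move both labels up the list without displacing excessive heat or tornado.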
The following pair of graphs shows the total fatalities and total injuries caused by these severe weather events.
fatalitiesPlot <- qplot(EVTYPE, data = fatalities, weight = FATALITIES, geom = "bar", binwidth = 1) +
scale_y_continuous("Number of fatalities") +
theme(axis.text.x = element_text(angle = 45,
hjust = 1)) + xlab("Weather condition") +
ggtitle("Total fatalities by bad weather\n events in the U.S.A.\n from 1995 to 2011")
injuriesPlot <- qplot(EVTYPE, data = injuries, weight = INJURIES, geom = "bar", binwidth = 1) +
scale_y_continuous("Number of injuries") +
theme(axis.text.x = element_text(angle = 45,
hjust = 1)) + xlab("Weather condition") +
ggtitle("Total injuries by bad weather\n events in the U.S.A.\n from 1995 to 2011")
grid.arrange(fatalitiesPlot, injuriesPlot, ncol = 2)
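Recent ggplot2 releases deprecate qplot(), so the same kind of chart may need to be written with ggplot() instead. The sketch below is one possible equivalent, using the top three rows of the fatalities list; toy and fatalitiesSketch are hypothetical names, and geom_col() is swapped in for the weighted bar geometry used above.

```r
library(ggplot2)

# Top three rows of the fatalities list above.
toy <- data.frame(
  EVTYPE = c("EXCESSIVE HEAT", "TORNADO", "FLASH FLOOD"),
  FATALITIES = c(1903, 1545, 934),
  stringsAsFactors = FALSE
)

# Keep the bars in ranked order rather than alphabetical order.
toy$EVTYPE <- factor(toy$EVTYPE, levels = toy$EVTYPE)

# geom_col() draws one bar per row with its height taken from y.
fatalitiesSketch <- ggplot(toy, aes(x = EVTYPE, y = FATALITIES)) +
  geom_col() +
  labs(x = "Weather condition", y = "Number of fatalities",
       title = "Total fatalities by bad weather\n events in the U.S.A.\n from 1995 to 2011") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

print(fatalitiesSketch)
```

Because the data is already aggregated to one row per event type, geom_col() needs no weight argument and no binwidth.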
Based on the bar charts above, excessive heat and tornadoes caused the most fatalities, and tornadoes caused the most injuries, in the United States of America from 1995 to 2011.
For the impact on the economy, the two lists below rank severe weather events by the cost of property damage and of crop damage, in USD.
property
## EVTYPE propertyDamage
## 1 FLOOD 144022037057
## 2 HURRICANE/TYPHOON 69305840000
## 3 STORM SURGE 43193536000
## 4 TORNADO 24935939545
## 5 FLASH FLOOD 16047794571
## 6 HAIL 15048722103
## 7 HURRICANE 11812819010
## 8 TROPICAL STORM 7653335550
## 9 HIGH WIND 5259785375
## 10 WILDFIRE 4759064000
## 11 STORM SURGE/TIDE 4641188000
## 12 TSTM WIND 4482361440
## 13 ICE STORM 3643555810
## 14 THUNDERSTORM WIND 3399282992
## 15 HURRICANE OPAL 3172846000
crop
## EVTYPE cropDamage
## 1 DROUGHT 13922066000
## 2 FLOOD 5422810400
## 3 HURRICANE 2741410000
## 4 HAIL 2614127070
## 5 HURRICANE/TYPHOON 2607872800
## 6 FLASH FLOOD 1343915000
## 7 EXTREME COLD 1292473000
## 8 FROST/FREEZE 1094086000
## 9 HEAVY RAIN 728399800
## 10 TROPICAL STORM 677836000
## 11 HIGH WIND 633561300
## 12 TSTM WIND 553947350
## 13 EXCESSIVE HEAT 492402000
## 14 THUNDERSTORM WIND 414354000
## 15 HEAT 401411500
The following graphs plot the total property damage and total crop damage caused by these severe weather events.
propertyPlot <- qplot(EVTYPE, data = property, weight = propertyDamage, geom = "bar", binwidth = 1) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + scale_y_continuous("Property damage in USD")+
xlab("Weather condition") + ggtitle("Total property damage by\n bad weather events in\n the U.S.A. from 1995 to 2011")
cropPlot<- qplot(EVTYPE, data = crop, weight = cropDamage, geom = "bar", binwidth = 1) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + scale_y_continuous("Crop damage in USD") +
xlab("Weather condition") + ggtitle("Total crop damage by\n bad weather events in\n the U.S.A. from 1995 to 2011")
grid.arrange(propertyPlot, cropPlot, ncol = 2)
Based on the bar charts above, floods and hurricanes/typhoons caused the most property damage, while droughts and floods caused the most crop damage, in the United States of America from 1995 to 2011.
The research leads to the conclusion that excessive heat and tornadoes are most harmful to population health, while flood, drought and hurricane/typhoon have the greatest economic consequences.