In this report, we aim to analyze the impact of different weather events on public health and economy based on the storm database collected from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) from 1950 - 2011. We will use the estimates of fatalities, injuries, property and crop damage to decide which types of event are most harmful to the population health and economy. From these data, we found that excessive heat and tornado are most harmful with respect to population health, while flood, drought, and hurricane/typhoon have the greatest economic consequences.
echo = TRUE # Always make code visible
library(R.utils)
## Loading required package: R.oo
## Loading required package: R.methodsS3
## R.methodsS3 v1.7.0 (2015-02-19) successfully loaded. See ?R.methodsS3 for help.
## R.oo v1.19.0 (2015-02-27) successfully loaded. See ?R.oo for help.
##
## Attaching package: 'R.oo'
##
## The following objects are masked from 'package:methods':
##
## getClasses, getMethods
##
## The following objects are masked from 'package:base':
##
## attach, detach, gc, load, save
##
## R.utils v2.0.2 (2015-04-27) successfully loaded. See ?R.utils for help.
##
## Attaching package: 'R.utils'
##
## The following object is masked from 'package:utils':
##
## timestamp
##
## The following objects are masked from 'package:base':
##
## cat, commandArgs, getOption, inherits, isOpen, parse, warnings
library(ggplot2)
library(plyr)
require(gridExtra)
## Loading required package: gridExtra
## Loading required package: grid
Download, unzip, and read the dataset.
if (!"stormData.csv.bz2" %in% dir("./data/")) {
download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "stormData.csv.bz2")
bunzip2("stormData.csv.bz2", overwrite=T, remove=F)
}
if (!"stormData" %in% ls()) {
stormData <- read.csv("stormData.csv", sep = ",")
}
head(stormData, n = 2)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14 100 3 0 0
## 2 NA 0 2 150 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
Plot the histogram
if (dim(stormData)[2] == 37) {
stormData$year <- as.numeric(format(as.Date(stormData$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y"))
}
hist(stormData$year, breaks = 30)
We see that the number of events tracked starts to significantly increase around 1995. So, we subset the data from 1990 to 2011 to get good records.
storm <- stormData[stormData$year >= 1995, ]
We check the number of fatalities and injuries that are caused by the severe weather events. We would like to get the first 15 most severe types of weather events.
sortHelper <- function(fieldName, top = 15, dataset = stormData) {
index <- which(colnames(dataset) == fieldName)
field <- aggregate(dataset[, index], by = list(dataset$EVTYPE), FUN = "sum")
names(field) <- c("EVTYPE", fieldName)
field <- arrange(field, field[, 2], decreasing = T)
field <- head(field, n = top)
field <- within(field, EVTYPE <- factor(x = EVTYPE, levels = field$EVTYPE))
return(field)
}
fatalities <- sortHelper("FATALITIES", dataset = storm)
injuries <- sortHelper("INJURIES", dataset = storm)
We will convert the property damage and crop damage data into comparable numerical forms according to the meaning of units described in the code book (Storm Events). Both PROPDMGEXP and CROPDMGEXP columns record a multiplier for each observation where we have Hundred (H), Thousand (K), Million (M) and Billion (B).
convertHelper <- function(dataset = storm, fieldName, newFieldName) {
totalLen <- dim(dataset)[2]
index <- which(colnames(dataset) == fieldName)
dataset[, index] <- as.character(dataset[, index])
logic <- !is.na(toupper(dataset[, index]))
dataset[logic & toupper(dataset[, index]) == "B", index] <- "9"
dataset[logic & toupper(dataset[, index]) == "M", index] <- "6"
dataset[logic & toupper(dataset[, index]) == "K", index] <- "3"
dataset[logic & toupper(dataset[, index]) == "H", index] <- "2"
dataset[logic & toupper(dataset[, index]) == "", index] <- "0"
dataset[, index] <- as.numeric(dataset[, index])
dataset[is.na(dataset[, index]), index] <- 0
dataset <- cbind(dataset, dataset[, index - 1] * 10^dataset[, index])
names(dataset)[totalLen + 1] <- newFieldName
return(dataset)
}
storm <- suppressWarnings(convertHelper(storm, "PROPDMGEXP", "propertyDamage"))
storm <- suppressWarnings(convertHelper(storm, "CROPDMGEXP", "cropDamage"))
options(scipen=10)
property <- sortHelper("propertyDamage", dataset = storm)
crop <- sortHelper("cropDamage", dataset = storm)
fatalitiesPlot <- qplot(EVTYPE, data = fatalities, weight = FATALITIES, geom = "bar", binwidth = 1) +
scale_y_continuous("Number of Fatalities") +
theme(axis.text.x = element_text(angle = 45,
hjust = 1)) + xlab("Severe Weather Type") +
ggtitle("Total Fatalities by Severe Weather\n Events in the U.S.\n from 1995 - 2011")
injuriesPlot <- qplot(EVTYPE, data = injuries, weight = INJURIES, geom = "bar", binwidth = 1) +
scale_y_continuous("Number of Injuries") +
theme(axis.text.x = element_text(angle = 45,
hjust = 1)) + xlab("Severe Weather Type") +
ggtitle("Total Injuries by Severe Weather\n Events in the U.S.\n from 1995 - 2011")
grid.arrange(fatalitiesPlot, injuriesPlot, ncol = 2)
Based on the above histograms, we find that excessive heat and tornado cause most fatalities; tornato causes most injuries in the United States from 1995 to 2011.
propertyPlot <- qplot(EVTYPE, data = property, weight = propertyDamage, geom = "bar", binwidth = 1) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + scale_y_continuous("Property Damage in US dollars")+
xlab("Severe Weather Type") + ggtitle("Total Property Damage by\n Severe Weather Events in\n the U.S. from 1995 - 2011")
cropPlot<- qplot(EVTYPE, data = crop, weight = cropDamage, geom = "bar", binwidth = 1) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + scale_y_continuous("Crop Damage in US dollars") +
xlab("Severe Weather Type") + ggtitle("Total Crop Damage by \nSevere Weather Events in\n the U.S. from 1995 - 2011")
grid.arrange(propertyPlot, cropPlot, ncol = 2)
Based on the above histograms, we find that flood and hurricane/typhoon cause most property damage; drought and flood causes most crop damage in the United States from 1995 to 2011.
From these data, we found that excessive heat and tornado are most harmful with respect to population health, while flood, drought, and hurricane/typhoon have the greatest economic consequences.