Storms, floods, wildfires and other severe weather events pose a serious public health hazard, causing injuries and fatalities as well as huge economic losses. Recording and analyzing such events over a period of time is key to disaster preparedness.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. This database tracks major storms and weather events in the United States, including their location and time, as well as estimates of any fatalities, injuries, and property damage.
The current study seeks to answer two questions: (1) Which types of weather events cause the largest number of injuries, including fatal ones, to citizens? (2) Which types of weather events cause the largest loss to property, including crop loss?
The following analysis using R throws some light on these questions.
require(R.utils, warn.conflicts = FALSE, quietly = TRUE)
## Warning: package 'R.utils' was built under R version 3.0.3
## Loading required package: R.methodsS3
##
## Attaching package: 'R.oo'
##
## The following objects are masked from 'package:methods':
##
## getClasses, getMethods
##
## The following objects are masked from 'package:base':
##
## attach, detach, gc, load, save
require(plyr, warn.conflicts = FALSE, quietly = TRUE)
## Warning: package 'plyr' was built under R version 3.0.3
require(ggplot2, warn.conflicts = FALSE, quietly = TRUE)
We first unzip the data file and read it into R.
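# unzip the data file (skipped if the CSV is already present);
# the same data/ path is used for the check, the unzip and the read
if (!file.exists("data/repdata_data_StormData.csv")) {
    bunzip2("data/repdata_data_StormData.csv.bz2", "data/repdata_data_StormData.csv",
        remove = FALSE)
}
# load data
da = read.csv("data/repdata_data_StormData.csv", header = TRUE)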
# str(da)
dim(da)
## [1] 902297 37
names(da)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
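As a side note on the loading step, base R's `read.csv()` can read a bzip2-compressed file directly, which would make the explicit unzip optional. The sketch below illustrates this on a tiny made-up file; the file contents and the name `toy` are hypothetical and are not part of the NOAA data.

```r
# Build a tiny CSV and compress it with bzip2, purely for illustration
# (hypothetical contents, not the NOAA storm data).
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
writeLines(c("EVTYPE,FATALITIES,INJURIES",
             "TORNADO,1,10",
             "FLOOD,0,2"), con)
close(con)

# read.csv() detects the compression and reads the file transparently
toy <- read.csv(tmp, header = TRUE, stringsAsFactors = FALSE)
dim(toy)
```

For a file as large as the storm data, reading the compressed file this way trades some read time for not keeping an uncompressed copy on disk.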
The full data set contains 902,297 rows and 37 variables, but this study needs only two groups of measurements: human fatalities and injuries, and economic loss. We therefore retain only the columns giving the event type, fatalities, injuries, property damage, crop damage, and the two exponent codes that give the magnitude of the property and crop damage figures. We also convert all column names to lower case for uniformity.
# we need only the event type, human harm and property damage columns
da = da[, c(8, 23:28)]
names(da) = tolower(names(da))
# quickly explore the data
# table(da$fatalities)
# table(da$injuries)
# table(da$evtype)
# names(table(da$evtype))
len = length(table(da$evtype))
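Selecting columns by position, as above, works for this file but would silently pick the wrong columns if the column order ever changed. A minimal sketch of the same selection by name follows, using a hypothetical one-row data frame `toy` in place of the full data set; the names in `keep` are the ones found at positions 8 and 23 to 28 in the `names(da)` output.

```r
# Columns at positions 8 and 23:28 of the raw data, listed by name
keep <- c("EVTYPE", "FATALITIES", "INJURIES",
          "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Hypothetical stand-in for the raw data, with two extra columns
toy <- data.frame(STATE = "AL", EVTYPE = "TORNADO", FATALITIES = 0,
                  INJURIES = 15, PROPDMG = 25, PROPDMGEXP = "K",
                  CROPDMG = 0, CROPDMGEXP = "", REFNUM = 1,
                  stringsAsFactors = FALSE)

# Fail early if an expected column is missing, then subset and rename
stopifnot(all(keep %in% names(toy)))
toy <- toy[, keep]
names(toy) <- tolower(names(toy))
names(toy)
```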
There are 985 distinct event types. The list shows some overlap between types, and not all of them lead to significant damage. To judge which event types are most dangerous to human health, we create a new variable harm that is the sum of fatalities and injuries. Then we aggregate the total harm by event type.
# create a new variable for human fatality + injury counts
da$harm = da$fatalities + da$injuries
totalharm = sum(da$harm)
# add up human harm by disaster event type
agg = aggregate(da$harm, by = list(da$evtype), sum)
names(agg) = c("evtype", "harm")
# sort them in descending order of damage done
agg = arrange(agg, desc(harm))
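To make the aggregation step concrete, here is a small worked example in base R on made-up numbers. The data frame `toy` and its values are hypothetical; the sketch uses the formula interface of `aggregate()` and `order()` instead of `plyr::arrange()`, which should give the same ordering.

```r
# Hypothetical records: two tornado rows, one flood row, one heat row
toy <- data.frame(
  evtype     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  fatalities = c(2, 1, 0, 3),
  injuries   = c(10, 0, 5, 1),
  stringsAsFactors = FALSE
)

# Harm per record, then total harm per event type
toy$harm <- toy$fatalities + toy$injuries
agg_toy <- aggregate(harm ~ evtype, data = toy, FUN = sum)

# Sort in descending order of total harm
agg_toy <- agg_toy[order(-agg_toy$harm), ]
agg_toy
```

The two tornado rows (12 and 5) collapse into a single row with a total of 17, which then sorts ahead of heat (4) and flood (1).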
Now it is time to prioritize the most harmful events:
# find the percentage of harm from the top 20 events
harmpc = round(sum(agg$harm[1:20])/sum(agg$harm) * 100, 1)
We see that 94.6 percent of the harm is caused by just 20 types of events, so we keep only these top 20 events and discard the rest.
# Retain only the top 20 events
agg = agg[1:20, ]
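The share calculation above can be wrapped in a small helper so that the same logic serves both the harm and the loss totals. This is only a sketch; `top_share` and the example vector `totals` are hypothetical names introduced here, not part of the original analysis.

```r
# Percentage of the grand total contributed by the n largest values
top_share <- function(x, n) {
  x <- sort(x, decreasing = TRUE)   # do not rely on the input order
  n <- min(n, length(x))            # guard against n > length(x)
  round(sum(x[seq_len(n)]) / sum(x) * 100, 1)
}

# Made-up totals that sum to 100
totals <- c(50, 30, 10, 5, 3, 2)
top_share(totals, 2)   # 80
```

Sorting inside the helper means it does not depend on the aggregate having been arranged beforehand.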
The figure below summarizes the top 20 event types causing maximum harm to human life and physical well being.
# Plot a bar chart of the top 20 disasters
ggplot(agg, aes(x = evtype, y = harm)) + geom_bar(stat = "identity", fill = "steelblue") +
scale_x_discrete(limits = agg$evtype[order(desc(agg$harm))]) + xlab("Event Type") +
ylab("Fatalities/ Injuries") + ggtitle("Harm to humans from disasters") +
theme(axis.text.y = element_text(face = "bold", colour = "black", size = rel(1.2))) +
theme(axis.text.x = element_text(face = "bold", colour = "black", size = rel(1.2))) +
coord_flip()
Some of these classes overlap, such as the wind-related and heat-related events. If necessary, they could be consolidated into about 16 classes, but this does not alter the main conclusion of this study.
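One way such a consolidation could be done is with a few pattern-based rules applied to the event labels. The sketch below is illustrative only: the function `consolidate` and its two grouping rules are assumptions made for this example, not the classes used in this study.

```r
# Map overlapping event labels onto a common class (illustrative rules only)
consolidate <- function(evtype) {
  ev <- toupper(trimws(as.character(evtype)))
  # "TSTM WIND" and "THUNDERSTORM WIND" describe the same kind of event
  ev[grepl("^(TSTM|THUNDERSTORM) WIND", ev)] <- "THUNDERSTORM WIND"
  # group every heat-related label under one class
  ev[grepl("HEAT", ev)] <- "HEAT"
  ev
}

labels <- c("TSTM WIND", "Thunderstorm Wind", "EXCESSIVE HEAT", "HEAT", "FLOOD")
consolidate(labels)
```

Applied before the aggregation, rules of this kind would merge the split totals of overlapping classes; labels that match no rule pass through unchanged apart from case and whitespace.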
To assess economic loss, we begin by examining the exponent codes attached to the property and crop damage columns:
table(da$propdmgexp)
##
##             -      ?      +      0      1      2      3      4      5
## 465934      1      8      5    216     25     13      4      4     28
##      6      7      8      B      h      H      K      m      M
##      4      5      1     40      1      6 424665      7  11330
table(da$cropdmgexp)
##
##             ?      0      2      B      k      K      m      M
## 618413      7     19      1      9     21 281832      1   1994
From the tables, we find that the majority of the records carry no exponent code. We therefore apply only the codes that scale the values by billions (B), millions (M) and thousands (K); values with any other code are left unscaled.
# helper function to multiply the figures by the correct exponents;
# it reads the global data frame da and returns the scaled column.
# Codes other than B, M and K (in either case) are left unscaled.
scaleKMB = function(var, expvar) {
    # billions
    w = which(da[[expvar]] == "B" | da[[expvar]] == "b")
    da[[var]][w] = da[[var]][w] * 1e+09
    # millions
    w = which(da[[expvar]] == "M" | da[[expvar]] == "m")
    da[[var]][w] = da[[var]][w] * 1e+06
    # thousands
    w = which(da[[expvar]] == "K" | da[[expvar]] == "k")
    da[[var]][w] = da[[var]][w] * 1000
    return(da[[var]])
}
# apply the scale factors to the variables
da$propdmg = scaleKMB("propdmg", "propdmgexp")
da$cropdmg = scaleKMB("cropdmg", "cropdmgexp")
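The same scaling can also be expressed as a lookup in a named vector, which avoids depending on the global data frame. The following is a minimal sketch under that assumption; `mult` and `scale_damage` are hypothetical names, and as in the helper above, any code other than K, M or B leaves the value unchanged.

```r
# Multiplier for each recognised exponent code
mult <- c(K = 1e3, M = 1e6, B = 1e9)

# Scale damage values by their exponent codes (case-insensitive);
# unrecognised or empty codes get a multiplier of 1
scale_damage <- function(value, exp_code) {
  m <- mult[toupper(as.character(exp_code))]
  m[is.na(m)] <- 1
  unname(value * m)
}

scale_damage(c(25, 2.5, 1, 10, 3), c("K", "m", "B", "", "?"))
```

In this toy call, 25 K becomes 25,000, 2.5 m becomes 2,500,000, 1 B becomes 1,000,000,000, and the last two values stay at 10 and 3.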
After the loss figures are adjusted for the right exponents, we sum up the loss of property and crop into a new variable named loss. This will be the indicator by which we will assess the economic damages of every event type.
# create a new variable for total property loss
da$loss = da$propdmg + da$cropdmg
totalloss = sum(da$loss)
Aggregate economic loss by type of weather event:
agg = aggregate(da$loss, by = list(da$evtype), sum)
names(agg) = c("evtype", "loss")
agg = arrange(agg, desc(loss))
head(agg, 20)
## evtype loss
## 1 FLOOD 1.503e+11
## 2 HURRICANE/TYPHOON 7.191e+10
## 3 TORNADO 5.735e+10
## 4 STORM SURGE 4.332e+10
## 5 HAIL 1.876e+10
## 6 FLASH FLOOD 1.756e+10
## 7 DROUGHT 1.502e+10
## 8 HURRICANE 1.461e+10
## 9 RIVER FLOOD 1.015e+10
## 10 ICE STORM 8.967e+09
## 11 TROPICAL STORM 8.382e+09
## 12 WINTER STORM 6.715e+09
## 13 HIGH WIND 5.909e+09
## 14 WILDFIRE 5.061e+09
## 15 TSTM WIND 5.039e+09
## 16 STORM SURGE/TIDE 4.642e+09
## 17 THUNDERSTORM WIND 3.898e+09
## 18 HURRICANE OPAL 3.192e+09
## 19 WILD/FOREST FIRE 3.109e+09
## 20 HEAVY RAIN/SEVERE WEATHER 2.500e+09
Find the top 20 events in terms of economic loss:
# find the percentage of loss from the top 20 events
losspc = round(sum(agg$loss[1:20])/sum(agg$loss) * 100, 1)
We see that 95.8 percent of the loss is caused by just 20 types of events, so we keep only these top 20 events and discard the rest.
# retain only the top 20 destroyers
agg = agg[1:20, ]
The figure below summarizes the top 20 event types causing maximum loss to property, including crop damage, from weather disasters:
# Plot a bar graph of top 20 loss makers
# y tick positions
ypos = c(0, 5e+10, 1e+11, 1.5e+11)
ggplot(agg, aes(x = evtype, y = loss)) + geom_bar(stat = "identity", fill = "tomato") +
scale_x_discrete(limits = agg$evtype[order(desc(agg$loss))]) + xlab("Event Type") +
ylab("Economic Loss") + ggtitle("Property & crop loss from disasters") +
scale_y_continuous(breaks = ypos, labels = paste("$", ypos/1e+09, "B")) +
theme(axis.text.y = element_text(face = "bold", colour = "black", size = rel(1.2))) +
theme(axis.text.x = element_text(face = "bold", colour = "black", size = rel(1.2))) +
coord_flip()
There were a total of 155,673 cases of death or injury in the United States due to adverse weather events during the reporting period. The top 3 causes of human injury or death are tornadoes, excessive heat and thunderstorm winds.
As for economic loss, property and crops worth a total of 476.4 billion dollars were damaged in these events. The top 3 most damaging event types were floods, hurricanes/typhoons and tornadoes.
A preliminary study of loss to life and property arising out of adverse weather events throws light on the nature and severity of these events. This points to the importance of a robust disaster warning and management system that can minimize the damage to life and property.