Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration's storm database to answer the following questions: 1. Across the United State, which types of events (as indicated by EVTYPE variable) are most harmful with respect to population health ? 2. Across the United States, which types of events have the greatest economic consequences ?
The storm data is available at :
There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.
National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
We first read in the data from the raw csv file included in the zip archive. The data is a comma separated file.
suppressPackageStartupMessages(library("R.utils"))
suppressPackageStartupMessages(library("ggplot2"))
suppressPackageStartupMessages(library("data.table"))
setwd("/Users/murtuza/coursera/reproducibleresearch/Peer_Assessment_2")
if (!file.exists("StormData.csv")) {
bunzip2("repdata-data-StormData.csv.bz2", "StormData.csv", remove = FALSE)
}
stormData <- read.csv("StormData.csv")
After reading in the data we check the columns and the first few rows in this dataset
dim(stormData)
## [1] 902297 37
names(stormData)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
head(stormData[, c("STATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP",
"CROPDMG", "CROPDMGEXP")])
## STATE EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 AL TORNADO 0 15 25.0 K 0
## 2 AL TORNADO 0 0 2.5 K 0
## 3 AL TORNADO 0 2 25.0 K 0
## 4 AL TORNADO 0 2 2.5 K 0
## 5 AL TORNADO 0 2 2.5 K 0
## 6 AL TORNADO 0 6 2.5 K 0
We can see that it has 902297 rows. The columns of interest for us are PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP, FATALITIES and INJURIES.
To answer the first question:
We look at the Fatalities and Injuries columns. Since both Fatalities and Injuries are harmful, we get a summation of both the columns per event type.
harm <- apply(stormData[, c("FATALITIES", "INJURIES")], 1, function(x) {
x[1] + x[2]
})
stormData <- cbind(stormData, POPULATIONHARM = harm)
aggHarmData <- aggregate(stormData$POPULATIONHARM, by = list(stormData$EVTYPE),
FUN = sum)
setnames(aggHarmData, c("Group.1", "x"), c("evtype", "harm"))
To answer the second question:
We look at the PROPDMG and CROPDMG columns. These two columns also have corresponding EXP columns, which specify the multiplier of the dollar amount. Since the multiplier is specified as
levels(stormData$PROPDMGEXP)
## [1] "" "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"
We will convert them to correct numeric values so that we can get the total damage value in dollars.
pmultiplier <- c(1, 1, 1, 1, 1, 10, 10^2, 10^3, 10^4, 10^5, 10^6, 10^7, 10^8,
10^9, 10^2, 10^2, 10^3, 10^6, 10^6)
cmultiplier <- c(1, 1, 1, 10^2, 10^9, 10^3, 10^3, 10^6, 10^6)
stormData <- cbind(stormData, PMULTIPLIER = rep(1, nrow(stormData)), CMULTIPLIER = rep(1,
nrow(stormData)))
l <- levels(stormData$PROPDMGEXP)
stormData$PMULTIPLIER <- apply(stormData[, c("PROPDMGEXP", "PMULTIPLIER")],
1, function(x) {
pmultiplier[which(l == x[1])]
})
l <- levels(stormData$CROPDMGEXP)
stormData$CMULTIPLIER <- apply(stormData[, c("CROPDMGEXP", "CMULTIPLIER")],
1, function(x) {
cmultiplier[which(l == x[1])]
})
Now the total damage is property damage + crop damage. We will compute that value and use that to determine which events cause the most damage in dollar amount.
cost <- (stormData$PROPDMG * stormData$PMULTIPLIER) + (stormData$CROPDMG * stormData$CMULTIPLIER)
stormData <- cbind(stormData, TOTALDMG = cost)
aggDmgData <- aggregate(stormData$TOTALDMG, by = list(stormData$EVTYPE), FUN = sum)
setnames(aggDmgData, c("Group.1", "x"), c("evtype", "dmg"))
Now let's look at the top 10 events that cause the most harm in terms of fatalities + injuries
head(aggHarmData[base::order(-aggHarmData$harm), ], 10)
## evtype harm
## 834 TORNADO 96979
## 130 EXCESSIVE HEAT 8428
## 856 TSTM WIND 7461
## 170 FLOOD 7259
## 464 LIGHTNING 6046
## 275 HEAT 3037
## 153 FLASH FLOOD 2755
## 427 ICE STORM 2064
## 760 THUNDERSTORM WIND 1621
## 972 WINTER STORM 1527
Let's plot the most harmful events
q <- qplot(evtype, harm, data = head(aggHarmData[base::order(-aggHarmData$harm),
], 10), xlab = "Event Type", ylab = "Fatalities + Injuries", main = "Population Damage by Natural Calamities in U.S. between 1950 and 2011")
q + theme(axis.text.x = element_text(angle = 90, hjust = 1))
Let's look at the top 10 events that cause most property damage
head(aggDmgData[base::order(-aggDmgData$dmg), ], 10)
## evtype dmg
## 170 FLOOD 1.503e+11
## 411 HURRICANE/TYPHOON 7.191e+10
## 834 TORNADO 5.736e+10
## 670 STORM SURGE 4.332e+10
## 244 HAIL 1.876e+10
## 153 FLASH FLOOD 1.824e+10
## 95 DROUGHT 1.502e+10
## 402 HURRICANE 1.461e+10
## 590 RIVER FLOOD 1.015e+10
## 427 ICE STORM 8.967e+09
Let's plot the property damage against the natural calamities
q <- qplot(evtype, dmg, data = head(aggDmgData[base::order(-aggDmgData$dmg),
], 10), xlab = "Event Type", ylab = "Property + Crop Damage in Dollars",
main = "Property Damage by Natural Calamities in U.S. between 1950 and 2011")
q + theme(axis.text.x = element_text(angle = 90, hjust = 1))