Analysis of human life and property damage by natural calamities between 1950 and 2011

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration's storm database to answer the following questions: 1. Across the United State, which types of events (as indicated by EVTYPE variable) are most harmful with respect to population health ? 2. Across the United States, which types of events have the greatest economic consequences ?

Data Processing

The storm data is available at :

Reading in the data

We first read in the data from the raw csv file included in the zip archive. The data is a comma separated file.

suppressPackageStartupMessages(library("R.utils"))
suppressPackageStartupMessages(library("ggplot2"))
suppressPackageStartupMessages(library("data.table"))
setwd("/Users/murtuza/coursera/reproducibleresearch/Peer_Assessment_2")
if (!file.exists("StormData.csv")) {
    bunzip2("repdata-data-StormData.csv.bz2", "StormData.csv", remove = FALSE)
}
stormData <- read.csv("StormData.csv")

After reading in the data we check the columns and the first few rows in this dataset

dim(stormData)
## [1] 902297     37
names(stormData)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
head(stormData[, c("STATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", 
    "CROPDMG", "CROPDMGEXP")])
##   STATE  EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1    AL TORNADO          0       15    25.0          K       0           
## 2    AL TORNADO          0        0     2.5          K       0           
## 3    AL TORNADO          0        2    25.0          K       0           
## 4    AL TORNADO          0        2     2.5          K       0           
## 5    AL TORNADO          0        2     2.5          K       0           
## 6    AL TORNADO          0        6     2.5          K       0

We can see that it has 902297 rows. The columns of interest for us are PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP, FATALITIES and INJURIES.

To answer the first question:

We look at the Fatalities and Injuries columns. Since both Fatalities and Injuries are harmful, we get a summation of both the columns per event type.

harm <- apply(stormData[, c("FATALITIES", "INJURIES")], 1, function(x) {
    x[1] + x[2]
})
stormData <- cbind(stormData, POPULATIONHARM = harm)
aggHarmData <- aggregate(stormData$POPULATIONHARM, by = list(stormData$EVTYPE), 
    FUN = sum)
setnames(aggHarmData, c("Group.1", "x"), c("evtype", "harm"))

To answer the second question:

We look at the PROPDMG and CROPDMG columns. These two columns also have corresponding EXP columns, which specify the multiplier of the dollar amount. Since the multiplier is specified as

levels(stormData$PROPDMGEXP)
##  [1] ""  "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"

We will convert them to correct numeric values so that we can get the total damage value in dollars.

pmultiplier <- c(1, 1, 1, 1, 1, 10, 10^2, 10^3, 10^4, 10^5, 10^6, 10^7, 10^8, 
    10^9, 10^2, 10^2, 10^3, 10^6, 10^6)
cmultiplier <- c(1, 1, 1, 10^2, 10^9, 10^3, 10^3, 10^6, 10^6)
stormData <- cbind(stormData, PMULTIPLIER = rep(1, nrow(stormData)), CMULTIPLIER = rep(1, 
    nrow(stormData)))
l <- levels(stormData$PROPDMGEXP)
stormData$PMULTIPLIER <- apply(stormData[, c("PROPDMGEXP", "PMULTIPLIER")], 
    1, function(x) {
        pmultiplier[which(l == x[1])]
    })
l <- levels(stormData$CROPDMGEXP)
stormData$CMULTIPLIER <- apply(stormData[, c("CROPDMGEXP", "CMULTIPLIER")], 
    1, function(x) {
        cmultiplier[which(l == x[1])]
    })

Now the total damage is property damage + crop damage. We will compute that value and use that to determine which events cause the most damage in dollar amount.

cost <- (stormData$PROPDMG * stormData$PMULTIPLIER) + (stormData$CROPDMG * stormData$CMULTIPLIER)
stormData <- cbind(stormData, TOTALDMG = cost)
aggDmgData <- aggregate(stormData$TOTALDMG, by = list(stormData$EVTYPE), FUN = sum)
setnames(aggDmgData, c("Group.1", "x"), c("evtype", "dmg"))

Results

Analysis of most harmful events w.r.t. human health

Now let's look at the top 10 events that cause the most harm in terms of fatalities + injuries

head(aggHarmData[base::order(-aggHarmData$harm), ], 10)
##                evtype  harm
## 834           TORNADO 96979
## 130    EXCESSIVE HEAT  8428
## 856         TSTM WIND  7461
## 170             FLOOD  7259
## 464         LIGHTNING  6046
## 275              HEAT  3037
## 153       FLASH FLOOD  2755
## 427         ICE STORM  2064
## 760 THUNDERSTORM WIND  1621
## 972      WINTER STORM  1527

Let's plot the most harmful events

q <- qplot(evtype, harm, data = head(aggHarmData[base::order(-aggHarmData$harm), 
    ], 10), xlab = "Event Type", ylab = "Fatalities + Injuries", main = "Population Damage by Natural Calamities in U.S. between 1950 and 2011")
q + theme(axis.text.x = element_text(angle = 90, hjust = 1))

plot of chunk unnamed-chunk-8

Analysis of most harmful events w.r.t. property damage

Let's look at the top 10 events that cause most property damage

head(aggDmgData[base::order(-aggDmgData$dmg), ], 10)
##                evtype       dmg
## 170             FLOOD 1.503e+11
## 411 HURRICANE/TYPHOON 7.191e+10
## 834           TORNADO 5.736e+10
## 670       STORM SURGE 4.332e+10
## 244              HAIL 1.876e+10
## 153       FLASH FLOOD 1.824e+10
## 95            DROUGHT 1.502e+10
## 402         HURRICANE 1.461e+10
## 590       RIVER FLOOD 1.015e+10
## 427         ICE STORM 8.967e+09

Let's plot the property damage against the natural calamities

q <- qplot(evtype, dmg, data = head(aggDmgData[base::order(-aggDmgData$dmg), 
    ], 10), xlab = "Event Type", ylab = "Property + Crop Damage in Dollars", 
    main = "Property Damage by Natural Calamities in U.S. between 1950 and 2011")
q + theme(axis.text.x = element_text(angle = 90, hjust = 1))

plot of chunk unnamed-chunk-10