Synopsis

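The goal of this analysis is to address two questions: which event types are most harmful with respect to population health, and which event types have the greatest economic consequences?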

First, the data are processed and cleaned. Because of RAM constraints, the original dataset is reduced to the relevant columns and then aggregated by event type. A weighted scoring system is then used to rank event types, once by harm to population health and once by economic consequences. The result is a top-ten ranking of event types for each measure.

Data Processing

The following code reads in the original csv.bz2 file and keeps only the columns necessary for the analysis.

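if (!dir.exists("data")) {
        dir.create("data")
}

# download the compressed file once and keep it in ./data
path <- file.path("data", "StormData.csv.bz2")
if (!file.exists(path)) {
        download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                      destfile = path, mode = "wb")
}
# read.csv() decompresses bzip2 files transparently
storm <- read.csv(path)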
dim(storm)
## [1] 902297     37
columns <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")
storm <- storm[, columns]
dim(storm)
## [1] 902297      5
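
Since memory is the constraint, one option is to skip the unneeded columns while reading rather than after. The following is a minimal sketch of that idea, not the code used in this report. It relies on the `colClasses` argument of `read.csv()`, where `"NULL"` drops a column and `NA` lets R infer its type. The toy file and its values are made up for illustration.

```r
# Toy stand-in for the storm file (hypothetical values), written as a bzip2 CSV
toy <- data.frame(STATE = c("AL", "TX"),
                  EVTYPE = c("TORNADO", "HAIL"),
                  FATALITIES = c(1, 0),
                  INJURIES = c(3, 0),
                  PROPDMG = c(25, 2.5),
                  CROPDMG = c(0, 1),
                  REMARKS = c("a", "b"))
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, "w")
write.csv(toy, con, row.names = FALSE)
close(con)

keep <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")

# Read only the header row to learn the column names
header <- names(read.csv(path, nrows = 1))

# "NULL" skips a column; NA lets read.csv() guess the type
classes <- rep("NULL", length(header))
classes[header %in% keep] <- NA

small <- read.csv(path, colClasses = classes, stringsAsFactors = FALSE)
dim(small)
```

With this approach the dropped columns are never stored in the resulting data frame, which may lower peak memory use on the full file.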

Then, as shown below, each variable of interest is summed by event type, and the sums are combined into a single data frame.

# sum each of the four variables of interest by event type
fatalities <- aggregate(storm$FATALITIES, by = list(storm$EVTYPE), FUN = sum)
injuries <- aggregate(storm$INJURIES, by = list(storm$EVTYPE), FUN = sum)
property <- aggregate(storm$PROPDMG, by = list(storm$EVTYPE), FUN = sum)
crop <- aggregate(storm$CROPDMG, by = list(storm$EVTYPE), FUN = sum)
# combine the per-event sums into one data frame; building it column by
# column keeps the sums numeric, so no character round trip is needed
storm <- data.frame(event = as.character(fatalities$Group.1),
                    fatalities = fatalities$x,
                    injuries = injuries$x,
                    property.dmg = property$x,
                    crop.dmg = crop$x,
                    stringsAsFactors = FALSE)
head(storm)
##                   event fatalities injuries property.dmg crop.dmg
## 1    HIGH SURF ADVISORY          0        0          200        0
## 2         COASTAL FLOOD          0        0            0        0
## 3           FLASH FLOOD          0        0           50        0
## 4             LIGHTNING          0        0            0        0
## 5             TSTM WIND          0        0          108        0
## 6       TSTM WIND (G45)          0        0            8        0
dim(storm)
## [1] 985   5
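
The four sums can also be computed in a single call with the formula interface of `aggregate()`. The sketch below runs on toy data with made-up values. It also trims and upper-cases the event labels first, which is an added assumption and not a step in this analysis: the first rows of the aggregated table are not in alphabetical order, which suggests that some labels carry leading whitespace. Note that the formula interface drops rows containing `NA` by default.

```r
# Toy data (hypothetical values); two spellings of the same event type
toy <- data.frame(EVTYPE = c("TORNADO", "  tornado", "HAIL"),
                  FATALITIES = c(1, 2, 0),
                  INJURIES = c(3, 4, 0),
                  PROPDMG = c(25, 10, 2.5),
                  CROPDMG = c(0, 0, 1),
                  stringsAsFactors = FALSE)

# Assumption: labels that differ only in case or surrounding whitespace
# refer to the same event type
toy$EVTYPE <- toupper(trimws(toy$EVTYPE))

# Sum all four variables by event type in one call
agg <- aggregate(cbind(FATALITIES, INJURIES, PROPDMG, CROPDMG) ~ EVTYPE,
                 data = toy, FUN = sum)
names(agg) <- c("event", "fatalities", "injuries", "property.dmg", "crop.dmg")
agg
```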

Methodology

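To find the most harmful event types, I used a simple weighted ranking system. The health score is fatalities * 2 + injuries, so each fatality counts twice as much as an injury. The economic score is the unweighted sum of property damage and crop damage, taken from the PROPDMG and CROPDMG columns as recorded. The magnitude codes in PROPDMGEXP and CROPDMGEXP were dropped during processing, so the economic score is a relative measure and not a dollar amount.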
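
To make the weighting concrete, here is a small worked example on toy counts. The event names and numbers are hypothetical and are not taken from the storm data.

```r
# Toy counts (hypothetical values)
toy <- data.frame(event = c("A", "B", "C"),
                  fatalities = c(10, 0, 4),
                  injuries = c(5, 30, 20),
                  stringsAsFactors = FALSE)

# Health score: each fatality counts twice as much as an injury
toy$health <- toy$fatalities * 2 + toy$injuries

# Rank in descending order of the score
ranked <- toy[order(-toy$health), ]
ranked
```

Event B has no fatalities but still ranks above event A, because its 30 injuries outweigh A's 10 fatalities at a weight of 2. A larger fatality weight would change that ordering, so the ranking depends on the chosen weight.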

Results

A column is added to the dataset representing the health score, and the most harmful events with respect to population health are shown below in descending order.

storm$health <- storm$fatalities * 2 + storm$injuries
healthranked <- storm[order(-storm$health),]
healthranked[1:10,]
##                 event fatalities injuries property.dmg  crop.dmg health
## 834           TORNADO       5633    91346   3212258.16 100018.52 102612
## 130    EXCESSIVE HEAT       1903     6525      1460.00    494.40  10331
## 856         TSTM WIND        504     6957   1335965.61 109202.60   7965
## 170             FLOOD        470     6789    899938.48 168037.88   7729
## 464         LIGHTNING        816     5230    603351.78   3580.61   6862
## 275              HEAT        937     2100       298.50    662.70   3974
## 153       FLASH FLOOD        978     1777   1420124.59 179200.46   3733
## 427         ICE STORM         89     1975     66000.67   1688.95   2153
## 760 THUNDERSTORM WIND        133     1488    876844.17  66791.45   1754
## 972      WINTER STORM        206     1321    132720.59   1978.99   1733

The same is done with respect to economic consequences and the most economically harmful events are shown below.

storm$econ <- storm$property.dmg + storm$crop.dmg
econranked <- storm[order(-storm$econ),]
econranked[1:10,]
##                  event fatalities injuries property.dmg  crop.dmg health
## 834            TORNADO       5633    91346    3212258.2 100018.52 102612
## 153        FLASH FLOOD        978     1777    1420124.6 179200.46   3733
## 856          TSTM WIND        504     6957    1335965.6 109202.60   7965
## 244               HAIL         15     1361     688693.4 579596.28   1391
## 170              FLOOD        470     6789     899938.5 168037.88   7729
## 760  THUNDERSTORM WIND        133     1488     876844.2  66791.45   1754
## 464          LIGHTNING        816     5230     603351.8   3580.61   6862
## 786 THUNDERSTORM WINDS         64      908     446293.2  18684.93   1036
## 359          HIGH WIND        248     1137     324731.6  17283.21   1633
## 972       WINTER STORM        206     1321     132720.6   1978.99   1733
##          econ
## 834 3312276.7
## 153 1599325.1
## 856 1445168.2
## 244 1268289.7
## 170 1067976.4
## 760  943635.6
## 464  606932.4
## 786  464978.1
## 359  342014.8
## 972  134699.6
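
The economic score above sums PROPDMG and CROPDMG as recorded. In the original file, these values are paired with magnitude codes in the PROPDMGEXP and CROPDMGEXP columns, which this analysis does not keep. The sketch below shows one way such codes could be applied. The helper `exp_multiplier` is hypothetical, and the mapping is an assumption: it handles only H, K, M, and B, and treats anything else as a multiplier of 1.

```r
# Hypothetical helper: map a magnitude code to a numeric multiplier
exp_multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  # Assumption: blank or unrecognised codes leave the value unchanged
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Toy rows (hypothetical values)
toy <- data.frame(PROPDMG = c(25, 2.5, 5),
                  PROPDMGEXP = c("K", "m", ""),
                  stringsAsFactors = FALSE)
toy$prop.usd <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
toy
```

Applying the multipliers before aggregating could change the economic ranking, since one record coded in millions or billions would outweigh many records coded in thousands.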

Below are bar plots to aid in visualization when comparing the event types.

library(ggplot2)

g <- ggplot(data = healthranked[1:10,], aes(x = reorder(event, -health), y = health))
g <- g + geom_bar(stat = "identity") + xlab("event type") + ylab("health score") +
        ggtitle("10 Most Harmful Health Events")
g <- g + theme(axis.text.x = element_text(angle = 90, hjust = 1))
g

g <- ggplot(data = econranked[1:10,], aes(x = reorder(event, -econ), y = econ))
# PROPDMG and CROPDMG are summed as recorded, so the axis is a score, not dollars
g <- g + geom_bar(stat = "identity") + xlab("event type") +
        ylab("economic score (PROPDMG + CROPDMG, as recorded)") +
        ggtitle("10 Most Harmful Economic Events")
g <- g + theme(axis.text.x = element_text(angle = 90, hjust = 1))
g

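In conclusion, tornadoes rank first by a wide margin on both measures. Their health score (102,612) is roughly ten times that of the next event type, excessive heat (10,331), and their economic score (about 3.31 million) is about twice that of flash floods (about 1.60 million). Tornadoes remain first on both measures even if the three thunderstorm wind labels shown in the tables (TSTM WIND, THUNDERSTORM WIND, and THUNDERSTORM WINDS) are combined.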