Synopsis

Here we present a quantitative analysis of the damage produced by adverse weather conditions in the US. The data cover the period from January 1950 to November 2014. We focus on two measures of harm: mortality (i.e., the number of fatalities and injuries) and destructivity (i.e., the monetary value of the property and crops destroyed).

Data processing

Load the required libraries

During this analysis we’ll make use of the dplyr package.

library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Unzip and load the raw data

The original dataset was downloaded from the Storm Events Database.

filename <- 'repdata-data-StormData.csv.bz2'
raw <- read.csv(bzfile(filename))

It’s always good to take a look at the data and its dimensions:

dim(raw)
## [1] 902297     37
head(raw, 1)
##   STATE__          BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1 4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                        14   100 3   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15      25          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
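Besides the dimensions, it can help to count missing values in the columns that will be summed later. The following is only a sketch on a made-up data frame: `toyRaw` and its values are assumptions standing in for `raw`, which is too large to reproduce here.

```r
# Toy stand-in for `raw`; the values are invented for illustration.
toyRaw <- data.frame(
  FATALITIES = c(0, 1, NA),
  INJURIES   = c(15, 0, 2),
  PROPDMG    = c(25, NA, NA)
)

# is.na() gives a logical matrix; colSums() counts the TRUEs per column.
naCounts <- colSums(is.na(toyRaw))
naCounts
```

On the real data the same call would be `colSums(is.na(raw[, c("FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")]))`.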

Subsetting the raw data

To answer the questions addressed here, we only need a subset of the columns. Let’s select them:

cleanData <- select(raw, EVTYPE, FATALITIES, INJURIES, PROPDMG, CROPDMG)
head(cleanData, 1)
##    EVTYPE FATALITIES INJURIES PROPDMG CROPDMG
## 1 TORNADO          0       15      25       0

Processing the clean data

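Here we combine fatalities and injuries into a single column, and do the same for property and crop damage. In both cases, the two values are simply added. Note that PROPDMG and CROPDMG are added as recorded: the PROPDMGEXP and CROPDMGEXP columns were not kept in the previous step.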

cleanData <- mutate(cleanData, PERSONALDMG = FATALITIES + INJURIES, MATERIALDMG = PROPDMG + CROPDMG)
head(cleanData, 1)
##    EVTYPE FATALITIES INJURIES PROPDMG CROPDMG PERSONALDMG MATERIALDMG
## 1 TORNADO          0       15      25       0          15          25
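The first row of the raw data carries a PROPDMGEXP value of K next to its PROPDMG of 25, which suggests that the damage figures come with a magnitude code. A minimal sketch of how such a code could be turned into a multiplier is shown below. The mapping K/M/B to thousands/millions/billions, the helper name `expToMultiplier`, and the toy rows are all assumptions, not part of the analysis above.

```r
# Hypothetical helper: map a magnitude code to a numeric multiplier.
# Assumption: K, M and B mean 1e3, 1e6 and 1e9; any other code counts as 1.
expToMultiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1   # unknown or empty codes leave the value unchanged
  unname(mult)
}

# Toy rows, invented for illustration.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 10),
  PROPDMGEXP = c("K", "M", ""),
  stringsAsFactors = FALSE
)
toy$PROPDMGSCALED <- toy$PROPDMG * expToMultiplier(toy$PROPDMGEXP)
toy
```

Applying a step like this would require keeping PROPDMGEXP and CROPDMGEXP in the subset, and it would change the totals reported in the Results.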

From now on, we’ll use only this summarized data.

damages <- select(cleanData, EVTYPE, PERSONALDMG, MATERIALDMG)
head(damages, 1)
##    EVTYPE PERSONALDMG MATERIALDMG
## 1 TORNADO          15          25

To answer both questions, we collapse the rows by event type, summing the damage values within each type.

# Sum each damage measure per event type; na.rm = TRUE is passed on to sum()
sumPersonalDamagesPerEvent <- aggregate(damages$PERSONALDMG, list(damages$EVTYPE), FUN = sum, na.rm = TRUE)
names(sumPersonalDamagesPerEvent) <- c("EVTYPE", "PERSONALDMG")

sumMaterialDamagesPerEvent <- aggregate(damages$MATERIALDMG, list(damages$EVTYPE), FUN = sum, na.rm = TRUE)
names(sumMaterialDamagesPerEvent) <- c("EVTYPE", "MATERIALDMG")

# Join the two summaries on their shared EVTYPE column
sumDamagesPerEvent <- merge(sumPersonalDamagesPerEvent, sumMaterialDamagesPerEvent)
head(sumDamagesPerEvent)
##                  EVTYPE PERSONALDMG MATERIALDMG
## 1    HIGH SURF ADVISORY           0         200
## 2         COASTAL FLOOD           0           0
## 3           FLASH FLOOD           0          50
## 4             LIGHTNING           0           0
## 5             TSTM WIND           0         108
## 6       TSTM WIND (G45)           0           8
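The same per-event totals can also be computed in a single step with dplyr, which is already loaded. The following is a minimal sketch on made-up rows: `toyDamages` and its values are illustrative only and stand in for the `damages` data frame.

```r
library(dplyr)

# Toy stand-in for `damages`; the values are invented for illustration.
toyDamages <- data.frame(
  EVTYPE      = c("TORNADO", "FLOOD", "TORNADO", "FLOOD", "HAIL"),
  PERSONALDMG = c(15, 2, 3, NA, 0),
  MATERIALDMG = c(25, 10, 5, 1, 7)
)

# Group by event type and sum both columns, skipping missing values.
toySums <- toyDamages %>%
  group_by(EVTYPE) %>%
  summarise(PERSONALDMG = sum(PERSONALDMG, na.rm = TRUE),
            MATERIALDMG = sum(MATERIALDMG, na.rm = TRUE)) %>%
  as.data.frame()
toySums
```

This avoids the intermediate data frames and the `merge` call, at the cost of departing from the base R `aggregate` approach used in the analysis.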

Results

Question 1

To determine the most harmful events with respect to population health, we first sort the data in descending order of PERSONALDMG.

arrangedByMortality <- arrange(sumDamagesPerEvent, desc(PERSONALDMG))
head(arrangedByMortality)
##           EVTYPE PERSONALDMG MATERIALDMG
## 1        TORNADO       96979   3312276.7
## 2 EXCESSIVE HEAT        8428      1954.4
## 3      TSTM WIND        7461   1445168.2
## 4          FLOOD        7259   1067976.4
## 5      LIGHTNING        6046    606932.4
## 6           HEAT        3037       961.2

It’s useful to plot the most dangerous types of event, so we can gauge their relative importance:

barplot(arrangedByMortality$PERSONALDMG[1:6], main = "Mortality of adverse weather events",
        names.arg = arrangedByMortality$EVTYPE[1:6],
        xlab = "Event type", ylab = "Fatalities + injuries", las = 2)
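With `las = 2` the event names are drawn vertically, and long names can run off the default bottom margin. A hedged sketch of a small wrapper that widens the margin before plotting is given below. The function name `plotTopEvents`, the margin sizes, and the toy data are assumptions chosen for illustration.

```r
# Hypothetical helper: plot the first `n` rows of an already sorted data frame.
plotTopEvents <- function(sorted, column, n = 6, ...) {
  top <- head(sorted, n)
  # Enlarge the bottom margin so rotated labels fit; restore it on exit.
  oldPar <- par(mar = c(10, 4, 4, 2) + 0.1)
  on.exit(par(oldPar))
  barplot(top[[column]], names.arg = as.character(top$EVTYPE), las = 2, ...)
  invisible(top)
}

# Toy data, invented for illustration.
toySorted <- data.frame(
  EVTYPE      = c("TORNADO", "EXCESSIVE HEAT", "FLOOD"),
  PERSONALDMG = c(30, 20, 10)
)
shown <- plotTopEvents(toySorted, "PERSONALDMG", n = 2, main = "Toy example")
```

The same wrapper could be called with `arrangedByMortality` and `"PERSONALDMG"` to redraw the plot above.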

Question 2

To determine the most harmful events with respect to economic consequences, we first sort the data in descending order of MATERIALDMG.

arrangedByDestructivity <- arrange(sumDamagesPerEvent, desc(MATERIALDMG))
head(arrangedByDestructivity)
##              EVTYPE PERSONALDMG MATERIALDMG
## 1           TORNADO       96979   3312276.7
## 2       FLASH FLOOD        2755   1599325.1
## 3         TSTM WIND        7461   1445168.2
## 4              HAIL        1376   1268289.7
## 5             FLOOD        7259   1067976.4
## 6 THUNDERSTORM WIND        1621    943635.6

It’s useful to plot the most destructive types of event, so we can gauge their relative importance:

barplot(arrangedByDestructivity$MATERIALDMG[1:6], main = "Destructivity of adverse weather events",
        names.arg = arrangedByDestructivity$EVTYPE[1:6],
        xlab = "Event type", ylab = "Monetary loss", las = 2)
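The ranking for Question 2 lists TSTM WIND and THUNDERSTORM WIND as separate rows. If these labels denote the same kind of event, their totals are split across two entries. The sketch below shows one way the labels could be normalised before aggregating. The helper name `normaliseEventType` and the single abbreviation rule are assumptions, and labels may also differ in other ways not handled here.

```r
# Hypothetical helper: bring event labels to a common spelling.
normaliseEventType <- function(evtype) {
  x <- toupper(trimws(as.character(evtype)))    # same case, no outer spaces
  x <- gsub("\\s+", " ", x, perl = TRUE)        # collapse repeated whitespace
  # Assumption: a leading TSTM abbreviates THUNDERSTORM.
  sub("^TSTM\\b", "THUNDERSTORM", x, perl = TRUE)
}

labels <- normaliseEventType(c(" TSTM WIND", "Thunderstorm  Wind", "HAIL"))
labels
```

Such a step would go before the aggregation by EVTYPE and would change the rankings shown above.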