Here we present a quantitative analysis of the damage caused by adverse weather events in the US between January 1950 and November 2014. We focus on two measures of harm: mortality (i.e., the number of fatalities and injuries) and destructivity (i.e., the monetary value of the property and crops destroyed).
Throughout this analysis we use the dplyr package.
library(dplyr)
The original dataset was downloaded from the NOAA Storm Events Database.
filename <- 'repdata-data-StormData.csv.bz2'
raw <- read.csv(bzfile(filename))
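`bzfile()` opens a bzip2-compressed file as a connection, so `read.csv()` reads the decompressed text without the archive having to be unpacked on disk first. A minimal round-trip sketch, using a small hypothetical table (values invented for illustration) written to a temporary file:

```r
# Hypothetical toy table; the values are made up for illustration only
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(0, 1))

# Write it through a bzip2-compressing connection to a temporary file
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back the same way the report reads the real file
back <- read.csv(bzfile(path))
print(back)
```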
It’s always worth taking a look at the data and its dimensions:
dim(raw)
## [1] 902297 37
head(raw, 1)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14 100 3 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
To measure mortality and destructivity we only need a subset of the columns, so let’s select them:
cleanData <- select(raw, EVTYPE, FATALITIES, INJURIES, PROPDMG, CROPDMG)
head(cleanData, 1)
## EVTYPE FATALITIES INJURIES PROPDMG CROPDMG
## 1 TORNADO 0 15 25 0
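`PROPDMG` and `CROPDMG` are not dollar amounts on their own: the raw data pair them with the magnitude codes `PROPDMGEXP` and `CROPDMGEXP` (the first row printed earlier shows `K`), which the selection above drops. A minimal sketch of how the codes could be applied, assuming that `K`, `M` and `B` (in either case) stand for thousands, millions and billions and treating every other code as a multiplier of 1; the helper `expToMultiplier` and the toy values are hypothetical:

```r
# Hypothetical helper: map a magnitude code to a numeric multiplier.
# Assumption: K/M/B (any case) mean 1e3/1e6/1e9; any other code counts as 1.
expToMultiplier <- function(code) {
  code <- toupper(as.character(code))
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  out <- unname(multipliers[code])  # unknown or empty codes give NA here
  out[is.na(out)] <- 1
  out
}

# Toy rows with made-up values
toy <- data.frame(PROPDMG = c(25, 2.5, 10, 4),
                  PROPDMGEXP = c("K", "m", "", "B"))

# Scale each amount by the multiplier of its own code
toy$PROPDMGUSD <- toy$PROPDMG * expToMultiplier(toy$PROPDMGEXP)
print(toy)
```

The same helper would apply to `CROPDMG` and `CROPDMGEXP`.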
Next, we combine fatalities and injuries into a single column, PERSONALDMG, and property and crop damage into another, MATERIALDMG. In both cases the new column is simply the sum of the two original ones.
cleanData <- mutate(cleanData, PERSONALDMG = FATALITIES + INJURIES, MATERIALDMG = PROPDMG + CROPDMG)
head(cleanData, 1)
## EVTYPE FATALITIES INJURIES PROPDMG CROPDMG PERSONALDMG MATERIALDMG
## 1 TORNADO 0 15 25 0 15 25
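Because `+` propagates missing values, a row with an `NA` in either column would get an `NA` total. The sketch below uses made-up rows (it does not claim the storm data contain missing counts) to contrast `+` with base R’s `rowSums(..., na.rm = TRUE)`, which skips missing entries instead:

```r
# Made-up rows; the second and third each have one missing count
toy <- data.frame(FATALITIES = c(0, 2, NA), INJURIES = c(15, NA, 3))

# Element-wise addition: any NA makes the total NA (values 15, NA, NA)
plusTotal <- toy$FATALITIES + toy$INJURIES

# rowSums with na.rm = TRUE ignores missing entries (values 15, 2, 3)
rowSumsTotal <- unname(rowSums(toy[, c("FATALITIES", "INJURIES")], na.rm = TRUE))

print(plusTotal)
print(rowSumsTotal)
```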
From now on, we’ll use only this summarized data.
damages <- select(cleanData, EVTYPE, PERSONALDMG, MATERIALDMG)
head(damages, 1)
## EVTYPE PERSONALDMG MATERIALDMG
## 1 TORNADO 15 25
To compare event types, we collapse all rows by event type, summing the personal and material damage recorded for each type.
# Total personal damage (fatalities + injuries) per event type
sumPersonalDamagesPerEvent <- aggregate(damages$PERSONALDMG, list(damages$EVTYPE), FUN = sum, na.rm = TRUE)
names(sumPersonalDamagesPerEvent) <- c("EVTYPE", "PERSONALDMG")
# Total material damage (property + crops) per event type
sumMaterialDamagesPerEvent <- aggregate(damages$MATERIALDMG, list(damages$EVTYPE), FUN = sum, na.rm = TRUE)
names(sumMaterialDamagesPerEvent) <- c("EVTYPE", "MATERIALDMG")
sumDamagesPerEvent <- merge(sumPersonalDamagesPerEvent, sumMaterialDamagesPerEvent)
head(sumDamagesPerEvent)
## EVTYPE PERSONALDMG MATERIALDMG
## 1 HIGH SURF ADVISORY 1 201
## 2 COASTAL FLOOD 1 1
## 3 FLASH FLOOD 1 51
## 4 LIGHTNING 1 1
## 5 TSTM WIND 1 109
## 6 TSTM WIND (G45) 1 9
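The spelling of `na.rm` matters here: `sum()` only recognizes an argument named exactly `na.rm`, so a misspelt name such as `rm.na = TRUE` raises no error but is swallowed by `...` and added to the total as `TRUE`, i.e. 1. The sketch below, on made-up rows standing in for `damages`, shows that effect and also collapses both damage columns with a single `aggregate()` call through the formula interface, which avoids the second aggregation and the `merge()`:

```r
# Made-up rows standing in for the report's `damages` data frame
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL", "TORNADO", "HAIL", "FLOOD"),
                  PERSONALDMG = c(15, 0, 3, 1, 2),
                  MATERIALDMG = c(25, 10, 2.5, 0, 40))

# A misspelt argument is passed through `...` and summed as TRUE (= 1)
correct <- sum(toy$PERSONALDMG, na.rm = TRUE)   # 21
misspelt <- sum(toy$PERSONALDMG, rm.na = TRUE)  # 22

# Both columns collapsed by event type in one call; no merge() needed
totals <- aggregate(cbind(PERSONALDMG, MATERIALDMG) ~ EVTYPE,
                    data = toy, FUN = sum)
print(totals)
```

One difference worth knowing: the formula method drops any row with a missing value in the listed columns by default (`na.action = na.omit`), whereas `na.rm = TRUE` inside `sum()` only skips the missing entry itself.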
To determine which event types are most harmful to population health, we first sort the data by PERSONALDMG in descending order.
arrangedByMortality <- arrange(sumDamagesPerEvent, desc(PERSONALDMG))
head(arrangedByMortality)
## EVTYPE PERSONALDMG MATERIALDMG
## 1 TORNADO 96980 3312277.7
## 2 EXCESSIVE HEAT 8429 1955.4
## 3 TSTM WIND 7462 1445169.2
## 4 FLOOD 7260 1067977.4
## 5 LIGHTNING 6047 606933.4
## 6 HEAT 3038 962.2
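For readers more familiar with base R, `arrange(df, desc(col))` followed by `head()` corresponds to ordering the rows with `order(..., decreasing = TRUE)`. A minimal sketch with a hypothetical helper `topN` and made-up totals:

```r
# Hypothetical helper: the n rows with the largest values in column `col`,
# mirroring arrange(df, desc(col)) followed by head(..., n)
topN <- function(df, col, n = 6) {
  sorted <- df[order(df[[col]], decreasing = TRUE), , drop = FALSE]
  rownames(sorted) <- NULL  # renumber rows 1..n after sorting
  head(sorted, n)
}

# Made-up totals per event type
totals <- data.frame(EVTYPE = c("HAIL", "TORNADO", "FLOOD", "HEAT"),
                     PERSONALDMG = c(5, 120, 40, 30))

top <- topN(totals, "PERSONALDMG", n = 3)
print(top)
```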
It’s useful to plot the most dangerous event types, so we can gauge their relative importance:
barplot(arrangedByMortality$PERSONALDMG[1:6], main = "Mortality of adverse weather events",
names.arg = arrangedByMortality$EVTYPE[1:6],
xlab = "Event type", ylab = "Fatalities + Injuries", las = 2)
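The bar chart conveys relative importance visually; the same idea can be expressed numerically as each event type’s percentage of the combined total. A sketch on made-up totals (the report’s actual figures are not used here):

```r
# Made-up totals per event type, already sorted in decreasing order
totals <- c(TORNADO = 120, FLOOD = 40, HEAT = 30, HAIL = 10)

# Percentage of the overall total contributed by each event type
share <- round(100 * totals / sum(totals), 1)
print(share)
```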
To determine which event types have the greatest economic consequences, we sort the data by MATERIALDMG in descending order.
arrangedByDestructivity <- arrange(sumDamagesPerEvent, desc(MATERIALDMG))
head(arrangedByDestructivity)
## EVTYPE PERSONALDMG MATERIALDMG
## 1 TORNADO 96980 3312277.7
## 2 FLASH FLOOD 2756 1599326.1
## 3 TSTM WIND 7462 1445169.2
## 4 HAIL 1377 1268290.7
## 5 FLOOD 7260 1067977.4
## 6 THUNDERSTORM WIND 1622 943636.6
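Comparing this table with the mortality ranking shows which event types rank among the six worst on both measures. The sketch below copies the event names from the two printed tables and intersects them:

```r
# Top-six event types copied from the two sorted tables printed above
topMortality <- c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND",
                  "FLOOD", "LIGHTNING", "HEAT")
topDestructivity <- c("TORNADO", "FLASH FLOOD", "TSTM WIND",
                      "HAIL", "FLOOD", "THUNDERSTORM WIND")

# Event types present in both lists, in mortality order
common <- intersect(topMortality, topDestructivity)
print(common)  # TORNADO, TSTM WIND, FLOOD
```

This is an exact-string comparison, so `TSTM WIND` and `THUNDERSTORM WIND` count as different types even though `TSTM` abbreviates "thunderstorm".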
It’s useful to plot the most destructive event types as well, so we can gauge their relative importance:
barplot(arrangedByDestructivity$MATERIALDMG[1:6], main = "Destructivity of adverse weather events",
names.arg = arrangedByDestructivity$EVTYPE[1:6],
xlab = "Event type", ylab = "Monetary loss", las = 2)