In this report, we analyze the economic and health consequences of extreme weather events, such as tornadoes and rainstorms. We use the NOAA Storm Database, which contains records from 1950 to November 2011, to address two basic questions: 1) which events have the greatest negative impact on people’s health, and 2) which events have the greatest impact on economic activity. The results could help governmental bodies plan for and respond to such events in the future.
Since the data file is relatively large (46.9 MB), it is useful to download it directly from the internet and then read it into R. This can be done with the commands below:
temp <- tempfile()
download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",temp)
Stormdata <- read.csv(bzfile(temp))
unlink(temp)
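Because the file is large, re-running the report downloads it again each time. A minimal sketch of one way to avoid that is shown here, assuming it is acceptable to keep a local copy; the helper name `fetch_once` and the local file name are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: download `url` to `dest` only if `dest` does not exist yet.
fetch_once <- function(url, dest) {
  if (!file.exists(dest)) {
    # Binary mode keeps the compressed file intact on every platform.
    download.file(url, dest, mode = "wb")
  }
  dest
}

# Usage (not run here; "StormData.csv.bz2" is an arbitrary local file name):
# path <- fetch_once(
#   "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
#   "StormData.csv.bz2"
# )
# Stormdata <- read.csv(bzfile(path))
```

With this approach the first run pays the download cost and later runs read the cached copy.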
Let’s take a quick look at the data:
head(Stormdata)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
dim(Stormdata)
## [1] 902297 37
We can see that it’s a large dataset, with 902297 rows and 37 columns. We’ll need to process it before conducting the analyses.
To answer the first question, we need only the fatalities and injuries recorded for each event type. We can total them with the aggregate function:
pop.health <- aggregate(c(Stormdata$FATALITIES), by = list(Stormdata$EVTYPE), "sum")
colnames(pop.health) <- c("event", "fatalaties")
pop.health <- cbind(pop.health, injuries = aggregate(c(Stormdata$INJURIES), by = list(Stormdata$EVTYPE),
"sum")$x)
Now, we have just three columns in our dataset:
head(pop.health)
## event fatalaties injuries
## 1 HIGH SURF ADVISORY 0 0
## 2 COASTAL FLOOD 0 0
## 3 FLASH FLOOD 0 0
## 4 LIGHTNING 0 0
## 5 TSTM WIND 0 0
## 6 TSTM WIND (G45) 0 0
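The two aggregate calls above could also be combined into a single call through the formula interface. The sketch below uses a small invented data frame (`toy`) in place of Stormdata, so the numbers are illustrative only. Note that the formula interface drops rows with missing values by default.

```r
# Toy stand-in for Stormdata; the rows are invented for illustration.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "HAIL"),
  FATALITIES = c(2, 1, 4, 0),
  INJURIES   = c(10, 5, 0, 0)
)

# One call sums both columns within each event type.
toy.health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
print(toy.health)
```

The result has one row per event type and one column per summed variable, so no cbind step is needed afterwards.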
Since many event types have no recorded fatalities or injuries, we can just remove the rows with 0’s in both columns:
pop.health <- pop.health[pop.health$fatalaties > 0 | pop.health$injuries > 0, ]
rownames(pop.health) <- NULL
To answer this question, we don’t need any further analysis; we just need to sort the events by fatalities (with injuries breaking ties) and keep the top six:
pop.health <- head(pop.health[order(pop.health$fatalaties, pop.health$injuries, decreasing = T), ])
pop.health
## event fatalaties injuries
## 184 TORNADO 5633 91346
## 32 EXCESSIVE HEAT 1903 6525
## 42 FLASH FLOOD 978 1777
## 69 HEAT 937 2100
## 123 LIGHTNING 816 5230
## 191 TSTM WIND 504 6957
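The ordering above treats fatalities as the primary key, so injuries matter only when two events have the same number of fatalities. The sketch below illustrates this on invented data and contrasts it with ranking by the combined count, which is an alternative and not what the report does. The column names in `toy` are chosen for the example.

```r
# Invented data: A and B tie on fatalities, D has few fatalities but many injuries.
toy <- data.frame(
  event      = c("A", "B", "C", "D"),
  fatalities = c(5, 5, 9, 1),
  injuries   = c(2, 7, 0, 100)
)

# Fatalities are the primary key; injuries only break the A/B tie.
by.fatalities <- toy[order(toy$fatalities, toy$injuries, decreasing = TRUE), ]

# Alternative ranking by the combined count, which moves D to the top.
by.total <- toy[order(toy$fatalities + toy$injuries, decreasing = TRUE), ]

print(by.fatalities$event)
print(by.total$event)
```

Which ranking is more appropriate depends on how fatalities and injuries should be weighed against each other.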
Now, let’s plot the data as a stacked bar plot that contains both the injuries and the fatalities:
plotHealth <- as.matrix(pop.health[, c("fatalaties", "injuries")])
rownames(plotHealth) <- pop.health$event
t(plotHealth)
## TORNADO EXCESSIVE HEAT FLASH FLOOD HEAT LIGHTNING TSTM WIND
## fatalaties 5633 1903 978 937 816 504
## injuries 91346 6525 1777 2100 5230 6957
par(oma = c(4, 1, 0, 1))
barplot(height = t(plotHealth), width = 1, col = 12:13, legend.text = c("Fatalities",
"Injuries"), main = "Events Most Harmful to Population Health",
ylab = "Number of People Affected", las = 3)
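For the second question, we count, for each event type, the events whose property damage is recorded in billions of US dollars (PROPDMGEXP equal to "B"):
econ.impact <- table(Stormdata$EVTYPE[Stormdata$PROPDMGEXP == "B"])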
sort(econ.impact[econ.impact > 1], decreasing = T)
##
## HURRICANE/TYPHOON FLOOD HURRICANE TORNADO
## 12 5 3 3
## HURRICANE OPAL STORM SURGE
## 2 2
plotting <- econ.impact[econ.impact > 1]
Restricting the data to events whose property damage is recorded in billions of US dollars, and keeping only the event types with more than one such event, the six climatic events associated with the greatest economic losses are those in the figure below:
barplot(rev(sort(plotting)), legend.text = rownames(rev(sort(plotting))), col = 51:56,
axisnames = F, ylab = "Number of Billion USD Events", main = "Events with the Greatest Economic Losses")
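The figure above counts billion-dollar events rather than summing dollar amounts. A minimal sketch of how the damage could instead be converted to US dollars and totaled per event type is given below, on invented data. It assumes that the exponent codes K, M and B stand for thousands, millions and billions, and it treats any other code (including blanks) as a multiplier of 1, which is a simplifying assumption and not something the report establishes.

```r
# Toy stand-in for Stormdata; the rows are invented for illustration.
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "TORNADO", "HAIL"),
  PROPDMG    = c(2.5, 1, 25, 3),
  PROPDMGEXP = c("B", "M", "K", ""),
  stringsAsFactors = FALSE
)

# Assumed multipliers: K = thousand, M = million, B = billion.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)
mult <- unname(multiplier[toupper(toy$PROPDMGEXP)])
mult[is.na(mult)] <- 1  # assumption: blank or unknown codes mean plain dollars

toy$damage.usd <- toy$PROPDMG * mult

# Total damage per event type, largest first.
toy.econ <- aggregate(damage.usd ~ EVTYPE, data = toy, FUN = sum)
toy.econ <- toy.econ[order(toy.econ$damage.usd, decreasing = TRUE), ]
print(toy.econ)
```

Summing dollar amounts weights each event by its size, whereas counting billion-dollar events treats every such event equally, so the two rankings can differ.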
As we can see from the data, tornadoes are by far the most harmful weather event for people, with 5,633 recorded fatalities and 91,346 recorded injuries.
The climatic events that most affect economic activity in the US are broadly the same as those that cause the greatest harm to human health, i.e., windy events such as hurricanes and tornadoes.