Synopsis

In this report, we analyze the economic and health consequences of extreme weather events, such as tornadoes and rainstorms. We use the NOAA Storm Database, which contains records from 1950 to November 2011, to address two basic questions: 1) which types of events are most harmful to people’s health, and 2) which types of events have the greatest economic impact. The results could help government bodies plan for and respond to such events in the future.

Data Processing

Since the data file is relatively large (46.9 MB), it is useful to download it directly from the internet and then read it into R. This can be done with the commands below:

temp <- tempfile()
download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",temp)
Stormdata <- read.csv(bzfile(temp))
unlink(temp)
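Only a few of the columns are used in this analysis, so reading all of them is not strictly necessary. The following is a minimal sketch, not the method used in this report, of skipping unneeded columns while reading a bzip2-compressed CSV. The toy data frame, the `path`, `keep`, `header`, `classes` and `small` names, and the choice of columns to keep are assumptions made for illustration; the sketch writes its own small file so that it runs without a network connection.

```r
# Hypothetical miniature of the storm file, written as a bzip2-compressed CSV.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD"),
  REMARKS    = c("long free text", "more free text"),
  FATALITIES = c(1, 0),
  INJURIES   = c(15, 2)
)
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read the header once to learn the column order, then mark every column
# that is not needed with the class "NULL" so that read.csv() skips it.
keep <- c("EVTYPE", "FATALITIES", "INJURIES")
header <- names(read.csv(bzfile(path), nrows = 1))
classes <- ifelse(header %in% keep, NA, "NULL")
small <- read.csv(bzfile(path), colClasses = classes)
unlink(path)

names(small)
```

On the real file, the same `colClasses` idea would be applied to the downloaded `temp` file, with `keep` listing whichever columns the analysis needs.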

Let’s take a quick look at the data:

head(Stormdata)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
dim(Stormdata)
## [1] 902297     37

We can see that it’s a large dataset, with 902297 rows and 37 columns. We’ll need to process it before conducting the analyses.

To answer the first question, we only need the variables related to population health, FATALITIES and INJURIES. We can total them per event type with the aggregate function:

pop.health <- aggregate(c(Stormdata$FATALITIES), by = list(Stormdata$EVTYPE), "sum")
colnames(pop.health) <- c("event", "fatalaties")
pop.health <- cbind(pop.health, injuries = aggregate(c(Stormdata$INJURIES), by = list(Stormdata$EVTYPE),
    "sum")$x)

Now, we have just three columns in our dataset:

head(pop.health)
##                   event fatalaties injuries
## 1    HIGH SURF ADVISORY          0        0
## 2         COASTAL FLOOD          0        0
## 3           FLASH FLOOD          0        0
## 4             LIGHTNING          0        0
## 5             TSTM WIND          0        0
## 6       TSTM WIND (G45)          0        0
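The first rows above are not in alphabetical order, which suggests that some EVTYPE labels carry leading whitespace and are therefore treated as separate groups. The following is a minimal sketch, on hypothetical toy values, of normalizing labels before aggregating. The `evtype`, `fatalities`, `clean` and `totals` names are made up for illustration, and whether such cleaning changes the ranking on the full dataset has not been checked here.

```r
# Hypothetical labels imitating whitespace and case inconsistencies.
evtype <- c("   HIGH SURF ADVISORY", " FLASH FLOOD", "Flash Flood", "FLASH FLOOD")
fatalities <- c(0, 1, 2, 3)

# Strip surrounding whitespace and unify case so that variants of the same
# label fall into one group.
clean <- toupper(trimws(evtype))

# Sum fatalities per cleaned label.
totals <- aggregate(fatalities, by = list(event = clean), FUN = sum)
totals
```

This only merges whitespace and case variants. Labels that are spelled differently, such as "TSTM WIND" and "TSTM WIND (G45)", would still be counted separately.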

Since many event types have no recorded fatalities or injuries, we can just remove the rows with 0’s in both columns:

pop.health <- pop.health[pop.health$fatalaties > 0 | pop.health$injuries > 0, ]
rownames(pop.health) <- NULL
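The aggregation and filtering steps above can also be written more compactly with the formula interface of aggregate. The following is a sketch on a hypothetical toy data frame, not a replacement for the code used in this report; `toy` and `pop.health.toy` are names invented for the example.

```r
# Hypothetical toy stand-in for Stormdata with the three columns used here.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "FUNNEL CLOUD"),
  FATALITIES = c(2, 3, 1, 0),
  INJURIES   = c(10, 0, 4, 0)
)

# Sum both outcome columns per event type in a single call.
pop.health.toy <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Keep only event types with at least one fatality or injury.
pop.health.toy <- subset(pop.health.toy, FATALITIES > 0 | INJURIES > 0)
rownames(pop.health.toy) <- NULL
pop.health.toy
```

One design difference to keep in mind: the formula interface drops rows with missing values by default, whereas the `by =` form used above does not.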
  1. Which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

To answer this question, no further analysis is needed; we just sort the events by fatalities, with injuries breaking ties, and keep the top six:

pop.health <- head(pop.health[order(pop.health$fatalaties, pop.health$injuries, decreasing = T), ])
pop.health
##              event fatalaties injuries
## 184        TORNADO       5633    91346
## 32  EXCESSIVE HEAT       1903     6525
## 42     FLASH FLOOD        978     1777
## 69            HEAT        937     2100
## 123      LIGHTNING        816     5230
## 191      TSTM WIND        504     6957
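Because the ordering above sorts by fatalities first, injuries only matter when two event types have the same number of fatalities. As a sketch of an alternative, the six rows printed above can be re-ranked by combined casualties. The `top6`, `total` and `by.total` names are assumptions for this example, the column name uses the corrected spelling "fatalities", and the result covers only these six rows; on the full dataset other event types could enter the ranking.

```r
# The six rows printed above, re-entered by hand.
top6 <- data.frame(
  event      = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING", "TSTM WIND"),
  fatalities = c(5633, 1903, 978, 937, 816, 504),
  injuries   = c(91346, 6525, 1777, 2100, 5230, 6957)
)

# Rank by combined casualties instead of by fatalities first.
top6$total <- top6$fatalities + top6$injuries
by.total <- top6[order(top6$total, decreasing = TRUE), c("event", "total")]
by.total
```

Tornadoes remain first under either ordering, while the positions of the remaining events depend on how fatalities and injuries are weighted.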

Now, let’s plot the data as a stacked bar plot that shows both the injuries and the fatalities:

plotHealth <- as.matrix(pop.health[, c("fatalaties", "injuries")])
rownames(plotHealth) <- pop.health$event
t(plotHealth)
##            TORNADO EXCESSIVE HEAT FLASH FLOOD HEAT LIGHTNING TSTM WIND
## fatalaties    5633           1903         978  937       816       504
## injuries     91346           6525        1777 2100      5230      6957
par(oma = c(4, 1, 0, 1))
barplot(height = t(plotHealth), width = 1, col = 12:13, legend.text = c("Fatalities", 
    "Injuries"), main = "Events Most Harmful to Population Health", 
    ylab = "Number of People Affected", las = 3)

  2. Which types of events have the greatest economic consequences?

Most events in the database caused comparatively small economic losses, and including them could bias the comparison. We therefore keep only the events whose property damage is recorded in billions of US dollars (PROPDMGEXP equal to "B") and count them per event type, retaining the types with more than one such event:

econ.impact <- table(Stormdata$EVTYPE[Stormdata$PROPDMGEXP == "B"])
sort(econ.impact[econ.impact > 1], decreasing = T)
## 
## HURRICANE/TYPHOON             FLOOD         HURRICANE           TORNADO 
##                12                 5                 3                 3 
##    HURRICANE OPAL       STORM SURGE 
##                 2                 2
plotting <- econ.impact[econ.impact > 1]

After restricting the data in this way, the six types of weather events with the most billion-dollar property losses are those in the figure below:

barplot(rev(sort(plotting)), legend.text = rownames(rev(sort(plotting))), col = 51:56, 
    axisnames = F, ylab = "Number of Billion USD Events", main = "Events with the Greatest Economic Losses")
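Counting billion-dollar events, as done above, ignores the size of each loss. A complementary sketch is to convert each record to a dollar amount and sum per event type. The toy rows, the `multiplier`, `damage.usd` and `damage` names, and the mapping of K and M to thousands and millions are assumptions made for illustration (only "B" for billions is used in this report); any other exponent code is left as NA rather than guessed.

```r
# Hypothetical toy rows using the PROPDMG / PROPDMGEXP columns.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 2.5, 1.2, 500, 10),
  PROPDMGEXP = c("K", "M", "B", "K", ""),
  stringsAsFactors = FALSE
)

# Assumed multipliers: K = thousands, M = millions, B = billions.
# Codes that are not listed (including blank) yield NA.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)
toy$damage.usd <- toy$PROPDMG * unname(multiplier[toupper(toy$PROPDMGEXP)])

# Total property damage per event type, largest first.
# The formula interface drops rows whose amount is NA.
damage <- aggregate(damage.usd ~ EVTYPE, data = toy, FUN = sum)
damage <- damage[order(damage$damage.usd, decreasing = TRUE), ]
damage
```

Summed amounts and event counts can rank event types differently, so the two views are best read side by side.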

Results

As the data show, tornadoes are by far the most harmful weather event for population health, with 5633 recorded fatalities and 91346 recorded injuries.

The weather events that most affect economic activity in the US are broadly the same as those that cause the greatest harm to human health, i.e., wind-related events such as hurricanes.