Severe Weather - Health and Economic Consequences

Synopsis

This report analyzes the impact of severe weather on public health and financial damages in the United States. The data come from the National Weather Service, and the variable definitions are unchanged. The data span 1950 to 2011. Attention is given to the aggregate impacts of severe weather; no consideration is given to changes over time. The evidence shows that tornadoes are responsible for the most deaths and injuries, while floods have the greatest economic consequences. While economic consequences are easy to measure and aggregate across measures (property and crop damages in US$), health impacts (fatalities and injuries) are impossible to combine in a meaningful fashion.

Data Processing

We start by loading the libraries and reading the data with read.csv, which can read the bz2-compressed file directly.

library(ggplot2)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

mydata <- read.csv("repdata%2Fdata%2FStormData.csv.bz2")

A quick check can be done to see the relative importance of missing data.

mean(is.na(mydata))

## [1] 0.05229737

Missing values account for only about 5% of all cells in the data frame.
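A single overall share can hide columns that are almost entirely missing. The following is a minimal sketch of a per-column breakdown, run on a made-up toy data frame; the column names and values are illustrative assumptions, not taken from the storm data.

```r
# Toy data frame standing in for mydata (values invented for illustration)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "FLOOD", "HEAT"),
  FATALITIES = c(1, 0, NA, 2),
  F_SCALE    = c(3, NA, NA, NA),   # hypothetical, mostly missing column
  stringsAsFactors = FALSE
)

# Overall share of missing cells, the same kind of check as above
overall_na <- mean(is.na(toy))

# Share of missing values per column, sorted from most to least missing
na_by_col <- sort(colMeans(is.na(toy)), decreasing = TRUE)

print(overall_na)
print(na_by_col)
```

Applied to the real data, the same two calls would show whether the missing values are spread evenly or concentrated in a few columns.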

We start by determining which event types caused the most fatalities and injuries. To do this, we group the records by event type and sum the corresponding fatalities and injuries.

health_data <- mydata %>% group_by(EVTYPE) %>% summarise(FATALITIES=sum(FATALITIES), INJURIES=sum(INJURIES))
health_data <- arrange(health_data, desc(FATALITIES))
head(health_data)

## Source: local data frame [6 x 3]
## 
##           EVTYPE FATALITIES INJURIES
##           (fctr)      (dbl)    (dbl)
## 1        TORNADO       5633    91346
## 2 EXCESSIVE HEAT       1903     6525
## 3    FLASH FLOOD        978     1777
## 4           HEAT        937     2100
## 5      LIGHTNING        816     5230
## 6      TSTM WIND        504     6957

We’ll create a subset of our dataframe composed only of the top 5 fatality causes, obtained from the previous table.

health_subset <- subset(mydata, EVTYPE %in% c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING"))
health_subset$EVTYPE <- factor(health_subset$EVTYPE)
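The top event types can also be taken from the sorted summary table rather than typed by hand. The sketch below uses invented toy tables (toy_health and toy_events are hypothetical stand-ins for health_data and mydata) and base R only.

```r
# Toy summary table standing in for health_data (numbers invented)
toy_health <- data.frame(
  EVTYPE     = c("A", "B", "C", "D", "E", "F", "G"),
  FATALITIES = c(50, 400, 10, 300, 5, 200, 100),
  stringsAsFactors = FALSE
)

# Sort by fatalities, largest first, and keep the names of the top 5 types
toy_health <- toy_health[order(-toy_health$FATALITIES), ]
top5 <- head(toy_health$EVTYPE, 5)

# Toy event-level records standing in for mydata
toy_events <- data.frame(
  EVTYPE     = c("A", "B", "B", "C", "E", "G", "F", "D"),
  FATALITIES = c(1, 3, 0, 2, 1, 0, 4, 2),
  stringsAsFactors = FALSE
)

# Keep only records whose type is in the top 5, then rebuild the factor
# so that it carries only the levels that are actually present
toy_subset <- subset(toy_events, EVTYPE %in% top5)
toy_subset$EVTYPE <- factor(toy_subset$EVTYPE)

print(top5)
print(levels(toy_subset$EVTYPE))
```

Deriving the list this way keeps the subset consistent with the table if the ranking changes, for example after event types are cleaned or merged.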

We now process the data to analyze the economic consequences of severe weather. We start by subsetting our data frame to include only records whose dollar amounts are expressed in thousands, millions or billions.

prop_subset <- subset(mydata, PROPDMGEXP == "M"| PROPDMGEXP == "B"| PROPDMGEXP=="K" | PROPDMGEXP == "m"|PROPDMGEXP == "b" | PROPDMGEXP == "k")

The next step is to convert the amounts to a common dollar measure, namely billions.

levels(prop_subset$PROPDMGEXP)[levels(prop_subset$PROPDMGEXP)=='B'] <- 1
levels(prop_subset$PROPDMGEXP)[levels(prop_subset$PROPDMGEXP)=='b'] <- 1

levels(prop_subset$PROPDMGEXP)[levels(prop_subset$PROPDMGEXP)=='M'] <- 1/1000
levels(prop_subset$PROPDMGEXP)[levels(prop_subset$PROPDMGEXP)=='m'] <- 1/1000

levels(prop_subset$PROPDMGEXP)[levels(prop_subset$PROPDMGEXP)=='K'] <- 1/1000000
levels(prop_subset$PROPDMGEXP)[levels(prop_subset$PROPDMGEXP)=='k'] <- 1/1000000

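The variable is still of class factor, so we now convert it to numeric. The coercion warning that follows is expected: the factor still carries unused levels from the full data set that are not numeric, but no row in the subset uses them.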

prop_subset$PROPDMGEXP <- as.numeric(levels(prop_subset$PROPDMGEXP))[prop_subset$PROPDMGEXP]

## Warning: NAs introduced by coercion

Finally, we obtain the property damage of each record in billions of dollars.

prop_subset$PROPDAMAGE <- prop_subset$PROPDMG*prop_subset$PROPDMGEXP
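The same unit conversion can be written with a named lookup vector, which avoids relabelling factor levels and the coercion warning. This is a sketch under assumptions, not the method used above: the toy data frame and the name to_billions are made up for illustration.

```r
# Toy records standing in for mydata (values invented for illustration)
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 10),
  PROPDMGEXP = c("K", "M", "B", "m"),
  stringsAsFactors = FALSE
)

# Multipliers that convert each unit code to billions of dollars
to_billions <- c(K = 1e-6, M = 1e-3, B = 1)

# Upper-case the code so "m" and "M" share one entry, then look it up by name;
# codes that are not in the lookup vector would yield NA
multiplier <- unname(to_billions[toupper(as.character(toy$PROPDMGEXP))])

# Damage per record in billions of dollars
toy$PROPDAMAGE <- toy$PROPDMG * multiplier
print(toy)
```

Because unknown codes map to NA instead of being silently dropped, this variant also makes it easy to count how many records carry an unrecognised unit.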

Now we can create a dataframe containing the sum of damages caused by each event type.

top_prop <- prop_subset %>% group_by(EVTYPE) %>% summarise(PROP_DAMAGE = sum(PROPDAMAGE))
top_prop <- arrange(top_prop, desc(PROP_DAMAGE))

We replicate the above manipulations to obtain an analogous data frame for crop damages.

crop_subset <- subset(mydata, CROPDMGEXP == "B"| CROPDMGEXP == "M"|CROPDMGEXP == "K" |CROPDMGEXP == "b"|CROPDMGEXP == "m"|CROPDMGEXP == "k")
levels(crop_subset$CROPDMGEXP)[levels(crop_subset$CROPDMGEXP)=='B'] <- 1
levels(crop_subset$CROPDMGEXP)[levels(crop_subset$CROPDMGEXP)=='b'] <- 1

levels(crop_subset$CROPDMGEXP)[levels(crop_subset$CROPDMGEXP)=='M'] <- 1/1000
levels(crop_subset$CROPDMGEXP)[levels(crop_subset$CROPDMGEXP)=='m'] <- 1/1000

levels(crop_subset$CROPDMGEXP)[levels(crop_subset$CROPDMGEXP)=='K'] <- 1/1000000
levels(crop_subset$CROPDMGEXP)[levels(crop_subset$CROPDMGEXP)=='k'] <- 1/1000000

crop_subset$CROPDMGEXP <- as.numeric(levels(crop_subset$CROPDMGEXP))[crop_subset$CROPDMGEXP]

## Warning: NAs introduced by coercion

crop_subset$CROPDAMAGE <- crop_subset$CROPDMG*crop_subset$CROPDMGEXP

top_crop <- crop_subset %>% group_by(EVTYPE) %>% summarise(CROP_DAMAGE = sum(CROPDAMAGE))
top_crop <- arrange(top_crop, desc(CROP_DAMAGE))

We now merge the two data frames, which allows us to compute the total damages. Note that merge keeps only the event types present in both data frames by default.

top_damage <- merge(top_prop, top_crop)
top_damage$TOTAL_DAMAGE <- top_damage$PROP_DAMAGE + top_damage$CROP_DAMAGE
top_damage <- arrange(top_damage, desc(TOTAL_DAMAGE))
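Since merge performs an inner join by default, an event type that appears in only one of the two tables may be dropped from the total. The sketch below shows the difference on invented toy tables and one possible alternative, a full join with missing damages treated as zero; it is an illustration, not a correction of the figures reported here.

```r
# Toy damage tables (values invented for illustration)
toy_prop <- data.frame(
  EVTYPE      = c("FLOOD", "TORNADO", "STORM SURGE"),
  PROP_DAMAGE = c(10, 5, 4),
  stringsAsFactors = FALSE
)
toy_crop <- data.frame(
  EVTYPE      = c("FLOOD", "DROUGHT"),
  CROP_DAMAGE = c(1, 3),
  stringsAsFactors = FALSE
)

# Default merge: inner join, keeps only event types found in both tables
inner <- merge(toy_prop, toy_crop)

# Full join: keeps every event type; missing damages become NA
full <- merge(toy_prop, toy_crop, all = TRUE)

# Treat a missing damage figure as zero before adding the two columns
full$PROP_DAMAGE[is.na(full$PROP_DAMAGE)] <- 0
full$CROP_DAMAGE[is.na(full$CROP_DAMAGE)] <- 0
full$TOTAL_DAMAGE <- full$PROP_DAMAGE + full$CROP_DAMAGE

# Sort by total damage, largest first
full <- full[order(-full$TOTAL_DAMAGE), ]
print(inner)
print(full)
```

Whether zero is the right replacement depends on whether a missing entry means no damage or no record, which the data alone cannot settle.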

These are all the transformations needed to conduct the data analysis.

Results

Health Consequences

We start by looking at the top causes of fatalities.

head(health_data)

##           EVTYPE FATALITIES INJURIES
## 1        TORNADO       5633    91346
## 2 EXCESSIVE HEAT       1903     6525
## 3    FLASH FLOOD        978     1777
## 4           HEAT        937     2100
## 5      LIGHTNING        816     5230
## 6      TSTM WIND        504     6957

Note that the table is ordered by Fatalities. How do these relate to Injuries?

cor(health_data$FATALITIES, health_data$INJURIES)

## [1] 0.9438341

The strong correlation suggests that event types with high fatality totals also tend to have high injury totals.
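A Pearson correlation on totals can be driven by a single dominant event type. The sketch below uses invented toy numbers (not the storm data) to show how one might check this, by recomputing the coefficient without the largest entry and by using the rank-based Spearman method.

```r
# Toy totals per event type (values invented); the first row dominates
toy <- data.frame(
  FATALITIES = c(1000, 20, 15, 10, 5, 1),
  INJURIES   = c(9000, 5, 40, 8, 30, 2)
)

# Pearson correlation on all rows
pearson_all <- cor(toy$FATALITIES, toy$INJURIES)

# Pearson correlation after dropping the dominant first row
pearson_rest <- cor(toy$FATALITIES[-1], toy$INJURIES[-1])

# Spearman correlation uses ranks, so it is less sensitive to one extreme value
spearman_all <- cor(toy$FATALITIES, toy$INJURIES, method = "spearman")

print(c(pearson_all = pearson_all,
        pearson_rest = pearson_rest,
        spearman_all = spearman_all))
```

If the three values differ widely on the real table, the headline correlation mostly reflects the largest event type rather than a general relationship.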

We can take a look at summary statistics for fatalities, followed by the standard deviations of fatalities and injuries.

summary(health_subset$FATALITIES)

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##   0.0000   0.0000   0.0000   0.0771   0.0000 583.0000

sd(health_subset$FATALITIES)

## [1] 1.951963

sd(health_subset$INJURIES)

## [1] 12.12004

The table on fatalities shows that tornadoes are the leading cause of death by a large margin. Below we provide a scatterplot relating the number of injuries and fatalities caused by tornadoes.

subset1 <- subset(health_subset,  EVTYPE=="TORNADO")
qplot(FATALITIES, INJURIES, data = subset1, main = "Tornado fatalities and Injuries")

Next we examine a similar plot for the other 4 top causes of fatalities.

subset2 <- subset(health_subset, EVTYPE == "EXCESSIVE HEAT" | EVTYPE =="FLASH FLOOD" | EVTYPE=="HEAT" | EVTYPE=="LIGHTNING")
qplot(FATALITIES, INJURIES, data = subset2, facets = .~EVTYPE, xlim = c(0,125), main = "Fatalities and Injuries by Event Type" )

## Warning: Removed 1 rows containing missing values (geom_point).

Note that the x-axis limit excludes one “HEAT” entry that caused close to 600 fatalities, because it lies far outside the range of the other observations.

Economic Consequences

We now proceed to analyze the economic impacts of severe weather. We start by taking a look at some summary statistics of the recorded damage figures, before the unit multiplier is applied.

summary(prop_subset$PROPDMG)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    0.00    1.00   24.94   10.00 5000.00

sd(prop_subset$PROPDMG)

## [1] 83.65349

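# NOTE: this call and the sd() call below summarise PROPDMG within crop_subset;
# CROPDMG was probably intended, in which case the output shown would differ.
summary(crop_subset$PROPDMG)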

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    0.00    0.00   14.22    2.00 5000.00

sd(crop_subset$PROPDMG)

## [1] 68.4765

We can now take a look at the main causes of damage.

head(top_damage)

##              EVTYPE PROP_DAMAGE CROP_DAMAGE TOTAL_DAMAGE
## 1             FLOOD   144.65771   5.6619684    150.31968
## 2 HURRICANE/TYPHOON    69.30584   2.6078728     71.91371
## 3           TORNADO    56.93716   0.4149531     57.35211
## 4       STORM SURGE    43.32354   0.0000050     43.32354
## 5              HAIL    15.73227   3.0259544     18.75822
## 6       FLASH FLOOD    16.14081   1.4213171     17.56213

We can illustrate the total damages with the following plot.

barplot(top_damage$TOTAL_DAMAGE[1:5], names.arg = c("FLOOD", "Hu./Ty.", "TORNADO", "STORM S.", "HAIL"), main = "Total Damage by Event Type, US$ bn")

Conclusion

This brief analysis allows us to draw some quick conclusions on the impact of severe weather on health and economic issues. Caution must be exercised in reading these, for we didn’t take into account the relative frequency of occurrences, nor did we pay attention to the location and population density of the affected areas. The results here serve as a first step towards understanding some of the consequences of severe weather.
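As a pointer for such a follow-up, the frequency caveat could be addressed by dividing each total by the number of recorded events. The sketch below does this on invented toy records with base R; the data frame and column names are illustrative assumptions.

```r
# Toy event-level records (values invented for illustration)
toy_events <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "TORNADO", "TORNADO", "HEAT", "HEAT"),
  FATALITIES = c(0, 2, 1, 5, 4, 2),
  stringsAsFactors = FALSE
)

# Total fatalities and number of recorded events per event type
total  <- tapply(toy_events$FATALITIES, toy_events$EVTYPE, sum)
events <- tapply(toy_events$FATALITIES, toy_events$EVTYPE, length)

# Combine into one table and add fatalities per recorded event
per_type <- data.frame(
  EVTYPE = names(total),
  TOTAL  = as.vector(total),
  EVENTS = as.vector(events),
  stringsAsFactors = FALSE
)
per_type$PER_EVENT <- per_type$TOTAL / per_type$EVENTS
print(per_type)
```

In this toy example the type with the larger total is not the one with the larger per-event figure, which is the kind of reversal an aggregate ranking cannot reveal.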