Economic and health impact of severe storm & weather events recorded in the NOAA Storm Data set

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

In this document we analyse which types of storm and weather events recorded in the NOAA storm data are most harmful with respect to population health, and which have the greatest economic consequences. We extract the relevant data from the data set, rank the event types, and determine the top 5 events ordered by severity. We find that tornadoes have the most severe impact on population health and that floods have the highest economic impact.

In addition, we analyse a possible correlation between health impact and economic impact. We find that the data support a positive linear relationship between these two categories.

Data Processing

We define:

  • The ‘Total health damage’ as ‘Total number of fatalities’ + ‘Total number of injuries’.
  • The ‘Total economic damage’ as ‘Total amount of property damage’ + ‘Total amount of crop damage’.
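The two definitions can be illustrated on a small, made-up data frame. The column names FATALITIES, INJURIES and EVTYPE mirror the NOAA file, but the `toy` data frame, its values and the `PropAmt`/`CropAmt` columns are hypothetical, and the amounts are assumed to be already expressed in dollars (in the real data they still have to be decoded, as described below).

```r
# Hypothetical toy records; the amounts are assumed to be already in dollars
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(5, 2, 0),
  INJURIES   = c(40, 10, 1),
  PropAmt    = c(2e6, 5e7, 1e5),
  CropAmt    = c(0, 1e7, 3e5)
)

# Total health damage = fatalities + injuries
toy$total.harm <- toy$FATALITIES + toy$INJURIES

# Total economic damage = property damage + crop damage
toy$total.AmtDmg <- toy$PropAmt + toy$CropAmt

print(toy[, c("EVTYPE", "total.harm", "total.AmtDmg")])
```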

We build the data set describing total health damage from the fields EVTYPE, FATALITIES and INJURIES, and the data set describing total economic damage from the fields EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP, together with the derived fields PROPExponent and CROPExponent, which hold the decoded exponents. In both cases the field EVTYPE gives the type of storm or weather event.

Load libraries
library(plyr)    # load plyr before dplyr so that dplyr's functions are not masked
library(dplyr)
library(ggplot2)
library(knitr)
library(xtable)

opts_chunk$set(echo=TRUE)
Load dataset
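# stringsAsFactors = TRUE keeps the factor columns used below (the default became FALSE in R 4.0.0)
stormdata <- read.csv("repdata-data-StormData.csv.bz2", stringsAsFactors = TRUE)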

Note: dataset downloaded on March 18, 2015.

Prepare dataset for analysis: set the scale of the damage amounts

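The data field PROPDMGEXP contains a code for the scale of the amount in field PROPDMG, expressed as a power of ten. Similarly, the data field CROPDMGEXP contains the scale of the amount in field CROPDMG. The letters H/h, K/k, M/m and B stand for hundreds (10^2), thousands (10^3), millions (10^6) and billions (10^9), a digit is taken as the exponent itself, and a blank or a symbol (-, ?, +) is mapped to an exponent of 1. The coding is as follows: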

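# Map each exponent code (factor level) to a power of ten. The powers are positional:
# they assume the levels sort as "", "-", "?", "+", "0".."8", "B", "h", "H", "K", "m", "M"
# (PROPDMGEXP) and "", "?", "0", "2", "B", "k", "K", "m", "M" (CROPDMGEXP); check levels() first.
stormdata$PROPExponent <- mapvalues(stormdata$PROPDMGEXP, levels(stormdata$PROPDMGEXP),
                                    c(1, 1, 1, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 2, 2, 3, 6, 6))
stormdata$CROPExponent <- mapvalues(stormdata$CROPDMGEXP, levels(stormdata$CROPDMGEXP),
                                    c(1, 1, 0, 2, 9, 3, 3, 6, 6))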
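The mapvalues() call above pairs codes with powers by position, so it silently depends on the order of levels(), and that order follows the collation rules of the current locale. A minimal sketch of a lookup by value instead, in base R: the code/exponent pairs are copied from the PROPDMGEXP mapping above, while `prop_codes`, `prop_exps`, the helper `decode_exp()` and the example codes are hypothetical names chosen for illustration.

```r
# Exponent codes found in PROPDMGEXP and the power of ten assigned to each
# (pairs copied from the mapvalues() call above)
prop_codes <- c("", "-", "?", "+", as.character(0:8), "B", "h", "H", "K", "m", "M")
prop_exps  <- c(1, 1, 1, 1, 0:8, 9, 2, 2, 3, 6, 6)
stopifnot(length(prop_codes) == length(prop_exps))

# Hypothetical helper: look each code up by value, so the result does not
# depend on the (locale-dependent) order of the factor levels.
# match() is used instead of name indexing because x[""] never matches a name.
decode_exp <- function(code) {
  prop_exps[match(as.character(code), prop_codes)]
}

example_codes <- factor(c("K", "M", "", "B", "h", "5"))
decode_exp(example_codes)   # unknown codes would come back as NA
```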
Transform property damage and crop damage to a common scale

Now we rescale the amount fields to one common scale.

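stormdata_sub_cash <- stormdata[, c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP", "PROPExponent", "CROPExponent")]
# as.character() first: as.numeric() on a factor returns its level codes, not the exponent values
stormdata_sub_cash <- transform(stormdata_sub_cash, PropAmtDmg = PROPDMG * 10^as.numeric(as.character(PROPExponent)))
stormdata_sub_cash <- transform(stormdata_sub_cash, CropAmtDmg = CROPDMG * 10^as.numeric(as.character(CROPExponent)))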
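Why the conversion before `10^as.numeric(PROPExponent)` matters: when the exponent column is a factor (which read.csv() produced by default before R 4.0.0, and which mapvalues() preserves), as.numeric() returns the internal level codes rather than the digits printed. The hypothetical vectors below only demonstrate that behaviour and the scaling formula amount × 10^exponent.

```r
# A factor whose labels are exponents; its levels are stored sorted as "3", "6", "9"
exp_factor <- factor(c("6", "3", "9"))
amount     <- c(2.5, 40, 1.1)   # hypothetical PROPDMG values

as.numeric(exp_factor)                # level codes 2 1 3, not the exponents
as.numeric(as.character(exp_factor))  # actual exponents 6 3 9

# Correct scaling: amount * 10^exponent
amount * 10^as.numeric(as.character(exp_factor))   # 2.5e6, 4.0e4, 1.1e9
```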
Aggregate data set by event type
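# Subset of the fields needed for the health analysis
stormdata_sub_harm <- stormdata[, c("EVTYPE", "FATALITIES", "INJURIES")]
stormdata_sub_aggr_harm<-aggregate(x = stormdata_sub_harm[c("FATALITIES","INJURIES")],
                                     FUN = sum, na.rm=TRUE,
                                     by = list(group.EVTYPE = stormdata_sub_harm$EVTYPE))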

stormdata_sub_aggr_cash<-aggregate(x = stormdata_sub_cash[c("PropAmtDmg","CropAmtDmg")],
                              FUN = sum, na.rm=TRUE,
                              by = list(group.EVTYPE = stormdata_sub_cash$EVTYPE))
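The aggregation step can be checked on a handful of hypothetical records (the `events` data frame and its values are made up). The extra argument na.rm = TRUE is forwarded by aggregate() to sum(), so a missing count does not turn the whole group total into NA, and the result has one row per event type, sorted by event type.

```r
# Hypothetical per-record data, including one missing fatality count
events <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(3, 1, 2, NA, 0),
  INJURIES   = c(20, 4, 5, 2, 1)
)

# Same call pattern as above: sum the selected columns within each event type
by_type <- aggregate(x = events[c("FATALITIES", "INJURIES")],
                     FUN = sum, na.rm = TRUE,
                     by = list(group.EVTYPE = events$EVTYPE))
print(by_type)
```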
Calculate totals

Rescale the totals to thousands for the health data and to billions for the economic data.

stormdata_sub_aggr_harm<-mutate(stormdata_sub_aggr_harm, total.harm = (FATALITIES+INJURIES)*10^(-3))
stormdata_sub_aggr_cash<-mutate(stormdata_sub_aggr_cash, total.AmtDmg = (PropAmtDmg+CropAmtDmg)*10^(-9))
Filter out rows with 0 totals
stormdata_sub_aggr_filter_harm<-filter(stormdata_sub_aggr_harm, total.harm!=0)
stormdata_sub_aggr_filter_cash<-filter(stormdata_sub_aggr_cash, total.AmtDmg!=0)
Add ranking
# Ascending ranks: the largest total gets the highest rank; ties share the average rank
stormdata_sub_aggr_filter_harm$Ranking <- rank(stormdata_sub_aggr_filter_harm$total.harm)
stormdata_sub_aggr_filter_cash$Ranking <- rank(stormdata_sub_aggr_filter_cash$total.AmtDmg)
Normalize ranking
stormdata_sub_aggr_filter_harm$Ranking <- stormdata_sub_aggr_filter_harm$Ranking/max(stormdata_sub_aggr_filter_harm$Ranking)
stormdata_sub_aggr_filter_cash$Ranking <- stormdata_sub_aggr_filter_cash$Ranking/max(stormdata_sub_aggr_filter_cash$Ranking)
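A small worked example of the ranking and normalization steps, on hypothetical totals (the event names and values are made up). It also checks that ave(x, FUN = rank) without a grouping argument is the same as rank(x), and shows how ties are handled.

```r
# Hypothetical totals per event type (two of them tied)
totals <- c(EVENT_A = 50, EVENT_B = 12, EVENT_C = 3, EVENT_D = 3)

# rank(): the smallest total gets rank 1, tied values share their average rank
r <- rank(totals)            # EVENT_A 4, EVENT_B 3, EVENT_C 1.5, EVENT_D 1.5

# Without a grouping argument, ave(x, FUN = rank) gives the same result as rank(x)
stopifnot(isTRUE(all.equal(ave(totals, FUN = rank), r)))

# Normalize so that the largest total maps to 1
norm_rank <- r / max(r)      # 1, 0.75, 0.375, 0.375
norm_rank
```

Because the normalized rank only encodes order, very different totals (for example the first two rows of the health table below) can round to the same normalized value.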
List top 5 storm types

After ranking and normalizing the data we can extract the top 5 most damaging storm events:

print(xtable(arrange(stormdata_sub_aggr_filter_harm, desc(total.harm))[1:5,c(1,4,5)], caption = "Top 5 most health-hazardous storm events"), type="html", comment=F)
Top 5 most health-hazardous storm events
   group.EVTYPE      total.harm (thousands)   Ranking
1  TORNADO                            96.98      1.00
2  EXCESSIVE HEAT                      8.43      1.00
3  TSTM WIND                           7.46      0.99
4  FLOOD                               7.26      0.99
5  LIGHTNING                           6.05      0.98
print(xtable(arrange(stormdata_sub_aggr_filter_cash, desc(total.AmtDmg))[1:5,c(1,4,5)], caption = "Top 5 most economically hazardous storm events"), type="html", comment=F)
Top 5 most economically hazardous storm events
   group.EVTYPE         total.AmtDmg (billions)   Ranking
1  FLOOD                               1468.33       1.00
2  HURRICANE/TYPHOON                    694.52       1.00
3  TORNADO                              579.74       1.00
4  STORM SURGE                          433.24       0.99
5  HAIL                                 217.51       0.99
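The same top-5 selection can also be sketched in base R with order() and head(), without plyr or dplyr. The `agg` data frame and its totals below are hypothetical. One design difference: head() simply returns fewer rows when fewer than five event types are left, whereas indexing with [1:5, ] would pad the result with NA rows.

```r
# Hypothetical aggregated totals per event type
agg <- data.frame(group.EVTYPE = c("A", "B", "C", "D", "E", "F", "G"),
                  total        = c(5, 80, 12, 0.5, 33, 7, 21))

# Sort by decreasing total and keep the first five rows
top5 <- head(agg[order(agg$total, decreasing = TRUE), ], 5)
top5
```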

Correlation analysis of health vs. economic damage

Having identified the top 5 most hazardous storm events, we now look at a possible correlation between the health damage and the economic damage caused by all event types that caused both. We merge the two data sets by event type (group.EVTYPE) and compare their normalized rankings.

Merge health and economic data frames
stormdata_sub_aggr_filter_merge<-merge(stormdata_sub_aggr_filter_harm,stormdata_sub_aggr_filter_cash,by="group.EVTYPE")
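A sketch of what merge() does here, on hypothetical health and economic tables (names and values made up). By default it is an inner join, so only event types present in both filtered tables survive, and the clashing Ranking columns receive the suffixes .x (first argument, health) and .y (second argument, economic), which is why the plot and the fit below use Ranking.x and Ranking.y. Passing suffixes = c(".health", ".econ") to merge() would give more descriptive names.

```r
# Hypothetical normalized rankings from the two analyses
health  <- data.frame(group.EVTYPE = c("FLOOD", "HAIL", "TORNADO"),
                      Ranking      = c(0.67, 0.33, 1.00))
economy <- data.frame(group.EVTYPE = c("FLOOD", "TORNADO", "DROUGHT"),
                      Ranking      = c(1.00, 0.67, 0.33))

# Inner join on the event type: HAIL and DROUGHT are dropped
both <- merge(health, economy, by = "group.EVTYPE")
both
```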
Plot health damage vs. economic damage

We now draw a scatter plot of health damage against economic damage (both as normalized rankings) and add a linear fit line.

ggplot(data=stormdata_sub_aggr_filter_merge, aes(x=Ranking.x, y=Ranking.y)) +
  geom_point(shape=1) +          # one open circle per event type
  geom_smooth(method="lm") +     # least-squares line with 95% confidence band
  xlab("Health damage (normalized rank)") +
  ylab("Economic damage (normalized rank)")

Perform linear fit analysis on data

In order to determine the goodness of the fit, we perform a linear regression analysis. The result is as follows:

# Regress the normalized health ranking on the normalized economic ranking
fit <- lm(Ranking.x  ~  Ranking.y, data=stormdata_sub_aggr_filter_merge)
print(xtable(summary(fit), caption = "Fit summary"), type="html", comment=F)
Fit summary
              Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)     0.1756       0.0529      3.32     0.0011
Ranking.y       0.5710       0.0740      7.71     0.0000
Goodness of fit analysis

According to the fit summary and the diagnostic plots, the slope of 0.571 (standard error 0.074, p < 0.0001) is clearly positive, so we conclude that there is indeed a positive linear correlation between health damage and economic damage.

print(xtable(anova(fit), caption = "ANOVA of the linear fit"), type="html", comment=F)
ANOVA of the linear fit
              Df   Sum Sq   Mean Sq   F value   Pr(>F)
Ranking.y      1     3.70      3.70     59.48   0.0000
Residuals    161    10.03      0.06
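par(mfrow=c(2,2))   # arrange the four diagnostic plots in a 2 x 2 grid
plot(fit)           # residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage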
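The ANOVA table also tells how much of the variation the fit explains. With 161 residual degrees of freedom and two estimated coefficients, the merged data covers 163 event types. A minimal sketch of the arithmetic, using the rounded sums of squares from the table (so the result is approximate), followed by a check on simulated data (the seed, sample size and coefficients are arbitrary assumptions) that for a simple linear regression R² equals the squared Pearson correlation, whichever variable is taken as the response.

```r
# R^2 = model sum of squares / total sum of squares (values from the ANOVA table)
ss_model  <- 3.70
ss_resid  <- 10.03
r_squared <- ss_model / (ss_model + ss_resid)
round(r_squared, 2)   # about 0.27

# On simulated data: R^2 of a simple regression equals cor(x, y)^2
set.seed(1)
x <- runif(50)
y <- 0.2 + 0.5 * x + rnorm(50, sd = 0.2)
fit_toy <- lm(x ~ y)
all.equal(summary(fit_toy)$r.squared, cor(x, y)^2)
```

So the positive relationship is statistically clear, but the economic ranking accounts for only roughly a quarter of the variance of the health ranking.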

Results

We found that the top 5 storm & weather events most harmful with respect to population health are: 1. Tornado, 2. Excessive Heat, 3. TSTM Wind (thunderstorm wind), 4. Flood, and 5. Lightning (see table above). We also found that the top 5 storm & weather events with the greatest economic consequences are: 1. Flood, 2. Hurricane/Typhoon, 3. Tornado, 4. Storm Surge, and 5. Hail.

In addition, we investigated whether the health and economic impacts of storm & weather events are correlated. A linear fit analysis (see section above) shows a positive linear correlation between the two categories (see scatter plot above). This suggests that weather events which are hazardous to the health of the population also tend to have a considerable effect on the economy.