Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
In this document we analyse which types of storm and weather events recorded in the NOAA storm data are most harmful to population health and which have the greatest economic consequences. We extract the relevant data from the data set, rank the events, and determine the top 5 events ordered by severity. We find that tornadoes have the most severe impact on population health and that floods have the highest economic impact.
In addition, we analyse a possible correlation between health impact and economic impact. We find that there is a good case for a positive linear dependency between these two categories.
We define the data set describing total health damage by selecting the data fields EVTYPE, FATALITIES, and INJURIES, and the data set describing total economic damage by selecting the data fields EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP, PROPExponent, and CROPExponent. The last two fields are not part of the raw data; they are derived from PROPDMGEXP and CROPDMGEXP as described further down. In both cases the field EVTYPE describes the type of storm or weather event.
# Load plyr before dplyr so that dplyr's verbs are not masked by plyr's
library(plyr)
library(dplyr)
library(ggplot2)
library(knitr)
library(xtable)
opts_chunk$set(echo=TRUE)
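# stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0),
# which the levels() calls further down rely on
stormdata <- read.csv("repdata-data-StormData.csv.bz2", stringsAsFactors = TRUE)
Note: the data set was downloaded on March 18, 2015.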
The data field PROPDMGEXP contains the scale of the amount in field PROPDMG, given as a base-10 exponent. Similarly, the data field CROPDMGEXP contains the scale of the amount in field CROPDMG, given as a base-10 exponent. The coding is as follows:
# Map each level of the exponent codes (in levels() order) to its base-10 exponent
stormdata$PROPExponent <- mapvalues(stormdata$PROPDMGEXP, levels(stormdata$PROPDMGEXP),
                                    c(1, 1, 1, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 2, 2, 3, 6, 6))
stormdata$CROPExponent <- mapvalues(stormdata$CROPDMGEXP, levels(stormdata$CROPDMGEXP),
                                    c(1, 1, 0, 2, 9, 3, 3, 6, 6))
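The text describes selecting two sets of fields, and the later code works on `stormdata_sub_harm` and `stormdata_sub_cash`, but the selection step itself is not shown. A minimal sketch of what it might look like follows. It uses a tiny stand-in data frame (`stormdata_toy`, with made-up values and a hypothetical extra `STATE` column) so that it runs on its own; with the real data, `stormdata` would take its place.

```r
# Toy stand-in for the storm data; all values are illustrative only.
stormdata_toy <- data.frame(
  EVTYPE       = c("TORNADO", "FLOOD"),
  FATALITIES   = c(2, 0),
  INJURIES     = c(10, 1),
  PROPDMG      = c(25, 1.5),
  PROPDMGEXP   = c("K", "M"),
  CROPDMG      = c(0, 3),
  CROPDMGEXP   = c("", "K"),
  PROPExponent = c(3, 6),
  CROPExponent = c(1, 3),
  STATE        = c("AL", "TX")   # a column that neither subset needs
)

# Field lists as named in the text
harm_fields <- c("EVTYPE", "FATALITIES", "INJURIES")
cash_fields <- c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP",
                 "PROPExponent", "CROPExponent")

# Keep only the columns each analysis needs
stormdata_sub_harm <- stormdata_toy[, harm_fields]
stormdata_sub_cash <- stormdata_toy[, cash_fields]
```

Selecting columns by name rather than by position keeps the subsets stable if the column order of the source file changes.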
Now rescale the amount fields to a common scale.
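# Go through as.character(): as.numeric() on a factor returns level codes, not the exponent values
stormdata_sub_cash <- transform(stormdata_sub_cash, PropAmtDmg = PROPDMG * 10^as.numeric(as.character(PROPExponent)))
stormdata_sub_cash <- transform(stormdata_sub_cash, CropAmtDmg = CROPDMG * 10^as.numeric(as.character(CROPExponent)))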
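To illustrate the rescaling, here is a small worked example on made-up records (the vectors `amount` and `exponent` are hypothetical and not taken from the storm data). It also shows a point worth checking whenever the exponent is stored as a factor: `as.numeric()` applied directly to a factor returns the internal level codes, whereas converting through `as.character()` recovers the values the labels stand for.

```r
# Hypothetical damage records: an amount and its base-10 exponent stored as a factor
amount   <- c(25, 1.5, 2)
exponent <- factor(c("3", "6", "9"))

# as.numeric() on a factor yields the internal level codes 1, 2, 3
codes <- as.numeric(exponent)

# Converting via as.character() recovers the exponent values 3, 6, 9
values <- as.numeric(as.character(exponent))

# Amounts on a common scale (plain US dollars in this toy example)
damage_usd <- amount * 10^values
```

Here `damage_usd` works out to 25,000, 1,500,000, and 2,000,000,000, while using `codes` in place of `values` would give 250, 150, and 2,000.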
# Sum fatalities and injuries per event type
stormdata_sub_aggr_harm <- aggregate(x = stormdata_sub_harm[c("FATALITIES", "INJURIES")],
                                     FUN = sum, na.rm = TRUE,
                                     by = list(group.EVTYPE = stormdata_sub_harm$EVTYPE))
# Sum property and crop damage per event type
stormdata_sub_aggr_cash <- aggregate(x = stormdata_sub_cash[c("PropAmtDmg", "CropAmtDmg")],
                                     FUN = sum, na.rm = TRUE,
                                     by = list(group.EVTYPE = stormdata_sub_cash$EVTYPE))
Rescale the totals to thousands of people for the health data and billions of US dollars for the economic data.
stormdata_sub_aggr_harm<-mutate(stormdata_sub_aggr_harm, total.harm = (FATALITIES+INJURIES)*10^(-3))
stormdata_sub_aggr_cash<-mutate(stormdata_sub_aggr_cash, total.AmtDmg = (PropAmtDmg+CropAmtDmg)*10^(-9))
stormdata_sub_aggr_filter_harm<-filter(stormdata_sub_aggr_harm, total.harm!=0)
stormdata_sub_aggr_filter_cash<-filter(stormdata_sub_aggr_cash, total.AmtDmg!=0)
stormdata_sub_aggr_filter_harm$Ranking <- ave( stormdata_sub_aggr_filter_harm$total.harm, FUN=rank )
stormdata_sub_aggr_filter_cash$Ranking <- ave( stormdata_sub_aggr_filter_cash$total.AmtDmg, FUN=rank )
stormdata_sub_aggr_filter_harm$Ranking <- stormdata_sub_aggr_filter_harm$Ranking/max(stormdata_sub_aggr_filter_harm$Ranking)
stormdata_sub_aggr_filter_cash$Ranking <- stormdata_sub_aggr_filter_cash$Ranking/max(stormdata_sub_aggr_filter_cash$Ranking)
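The ranking and normalization steps above can be illustrated on a toy vector (the `total` values are hypothetical). Called without grouping variables, `ave(x, FUN = rank)` applies `rank()` to the whole vector, so it gives the same result as `rank(x)`; dividing by the maximum rank then maps the ranks into the interval (0, 1].

```r
# Hypothetical totals for four event types
total <- c(0.5, 12, 3, 97)

# rank() assigns 1 to the smallest value and 4 to the largest
ranking <- rank(total)

# Without grouping variables, ave() applies FUN to the whole vector
ranking_ave <- ave(total, FUN = rank)

# Normalize so that the most damaging event gets the value 1
ranking_norm <- ranking / max(ranking)
```

In this example `ranking_norm` is 0.25, 0.75, 0.50, 1.00. When there are many event types, the highest few normalized ranks lie very close to 1, so several of them can print as 1.00 when rounded to two decimals.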
After ranking and normalizing the data we can extract the top 5 most damaging storm events:
print(xtable(arrange(stormdata_sub_aggr_filter_harm, desc(total.harm))[1:5, c(1, 4, 5)], caption = "Top 5 storm events most hazardous to health"), type = "html", comment = FALSE)
| | group.EVTYPE | total.harm | Ranking |
|---|---|---|---|
| 1 | TORNADO | 96.98 | 1.00 |
| 2 | EXCESSIVE HEAT | 8.43 | 1.00 |
| 3 | TSTM WIND | 7.46 | 0.99 |
| 4 | FLOOD | 7.26 | 0.99 |
| 5 | LIGHTNING | 6.05 | 0.98 |
print(xtable(arrange(stormdata_sub_aggr_filter_cash, desc(total.AmtDmg))[1:5, c(1, 4, 5)], caption = "Top 5 storm events with the highest economic damage"), type = "html", comment = FALSE)
| | group.EVTYPE | total.AmtDmg | Ranking |
|---|---|---|---|
| 1 | FLOOD | 1468.33 | 1.00 |
| 2 | HURRICANE/TYPHOON | 694.52 | 1.00 |
| 3 | TORNADO | 579.74 | 1.00 |
| 4 | STORM SURGE | 433.24 | 0.99 |
| 5 | HAIL | 217.51 | 0.99 |
Having isolated the top 5 most hazardous storm events, we can look at a possible correlation between storm events which cause health damage and storm events which cause economic damage. We merge the two data sets by event type, so that each event that appears in both carries its normalized health ranking and its normalized economic ranking.
stormdata_sub_aggr_filter_merge<-merge(stormdata_sub_aggr_filter_harm,stormdata_sub_aggr_filter_cash,by="group.EVTYPE")
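The column names `Ranking.x` and `Ranking.y` used from here on come from `merge()` itself: when both inputs contain a non-key column with the same name, it appends the suffixes `.x` (first data frame) and `.y` (second data frame). A small sketch with made-up rankings (the data frames `harm` and `cash` are hypothetical) shows this, and also that the default merge keeps only event types present in both inputs.

```r
# Hypothetical normalized rankings for a few event types
harm <- data.frame(group.EVTYPE = c("FLOOD", "TORNADO", "HEAT"),
                   Ranking      = c(0.50, 1.00, 0.75))
cash <- data.frame(group.EVTYPE = c("TORNADO", "FLOOD", "HAIL"),
                   Ranking      = c(0.75, 1.00, 0.50))

# Inner join on the event type; the shared column name "Ranking" gets suffixes
merged <- merge(harm, cash, by = "group.EVTYPE")
```

Here `merged` has the columns `group.EVTYPE`, `Ranking.x` (health) and `Ranking.y` (economic) and contains only FLOOD and TORNADO, because HEAT and HAIL each appear in just one of the inputs.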
Now we draw a scatter plot of health damage versus economic damage and add a linear fit line.
# Scatter plot of the normalized rankings with a linear fit line
ggplot(data = stormdata_sub_aggr_filter_merge, aes(x = Ranking.x, y = Ranking.y)) +
  geom_point(shape = 1) +
  geom_smooth(method = "lm") +
  xlab("Health damage") +
  ylab("Economic damage")
In order to determine the goodness of the fit, we perform a linear regression analysis. The result is as follows:
# Regress the health ranking (Ranking.x) on the economic ranking (Ranking.y)
fit <- lm(Ranking.x ~ Ranking.y, data = stormdata_sub_aggr_filter_merge)
print(xtable(summary(fit), caption = "Fit summary"), type = "html", comment = FALSE)
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 0.1756 | 0.0529 | 3.32 | 0.0011 |
| Ranking.y | 0.5710 | 0.0740 | 7.71 | 0.0000 |
According to the summary and the plots of the linear fit, we can conclude that there is indeed a positive linear correlation between health damage and economic damage: the estimated slope is positive (0.57) and significantly different from zero.
print(xtable(anova(fit), caption = "Analysis of variance for the linear fit"), type = "html", comment = FALSE)
| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| Ranking.y | 1 | 3.70 | 3.70 | 59.48 | 0.0000 |
| Residuals | 161 | 10.03 | 0.06 | | |
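As a rough check on how much of the variation the fit explains, the coefficient of determination can be computed from the sums of squares in the table above. This is only a sketch: it uses the rounded values as printed, so the result is approximate, and the variable names are chosen here for illustration.

```r
# Sums of squares as printed in the ANOVA table (rounded values)
ss_model <- 3.70    # Ranking.y
ss_resid <- 10.03   # Residuals

# R^2 = explained variation / total variation
r_squared <- ss_model / (ss_model + ss_resid)
round(r_squared, 2)
```

This gives an R² of about 0.27, meaning that the economic ranking accounts for roughly a quarter of the variation in the health ranking. The relationship is statistically significant, but a large share of the variation remains unexplained.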
par(mfrow=c(2,2))
plot(fit)
We found that the top 5 storm and weather events which are most harmful with respect to population health are: 1. Tornado, 2. Excessive Heat, 3. TSTM Wind, 4. Flood, and 5. Lightning (see table above). We also found that the top 5 storm and weather events which have the greatest economic consequences are: 1. Flood, 2. Hurricane/Typhoon, 3. Tornado, 4. Storm Surge, and 5. Hail.
In addition, we investigated whether the health and economic impacts of the storm and weather events are correlated. We conducted a linear fit analysis (see section above) and conclude that there is a positive linear correlation between the two categories (see scatter plot above). This suggests that weather events which are hazardous to the health of the population also tend to have a strong effect on the economy.