Summary

We report the impact of weather events on the economy and on population health in the United States. Data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database were analysed for this report. According to this analysis, the weather event with the greatest effect on population health is excessive heat, with mortality and morbidity rates (incidence per 100000 people) of 1045 and 2450, respectively. Hurricane/Typhoon events caused the greatest economic impact, with accumulated damage of over 90 billion US$ (all values were corrected to the latest year in the dataset, 2011).

Intro

There is great interest in research on weather events. Since the resources available to avert critical events are limited, the events with the greatest impact on the population must be identified in order to direct those resources to the most critical scenarios first, and to invest in the forecasting of and warning about those events. This report analyses data collected by the National Weather Service (NWS). The columns of the dataset are specified at this link. Due to time constraints, many of the sources used to produce these data were unverified, and we quote: “Accordingly, the NWS does not guarantee the accuracy or validity of the information”. Read this report with that in mind.

Data Processing

The code used to produce the results in this report is shown inline with the text. The raw data file is downloaded first:

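# Download the raw NOAA storm data once and cache it locally
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
filename <- "resources/repdata_data_StormData.csv.bz2"
if (!file.exists(filename)) {
        # Create the target folder if it does not exist yet
        dir.create(dirname(filename), showWarnings = FALSE, recursive = TRUE)
        # mode = "wb" keeps the compressed file intact on Windows
        download.file(url, filename, mode = "wb")
}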

Gathering and aggregating Data

In order to investigate the economic and health impact of the weather events the following columns were selected:

  1. PROPDMG - Property Damage
  2. PROPDMGEXP - Scale (Exponent)
  3. CROPDMG - Crop Damage
  4. CROPDMGEXP - Scale (Exponent)
  5. FATALITIES - Number of deaths
  6. INJURIES - Number of injuries

The weather events recorded in the EVTYPE column are redundant. For example, “HEAT”, “EXCESSIVE HEAT” and “HIGH TEMPERATURE” all represent the same event. It is necessary to aggregate these events into the same category before beginning the analysis. The criteria used are summarized in the following table:

source("scale.R")
library(data.table)
library(ggplot2)
oldw <- getOption("warn")
options(warn=0)
setAs("character","myDate", function(from){ 
        as.Date(gsub(" .*","",as.character(from)),format="%m/%d/%Y")}
      )
## in method for 'coerce' with signature '"character","myDate"': no definition for class "myDate"
options(warn=oldw)
cols=rep("character",37)
cols[2]="myDate"
cols[c(3:8,26,28)]="factor"
cols[c(23:25,27)]="numeric"
#cols="character"

storm.complete=data.table(read.csv("resources/repdata_data_StormData.csv.bz2",colClasses = cols))
mapset=getEvents(storm.complete)
storm=applyMapset(storm.complete,mapset)
reduce.criteria=data.table(NewCategory=mapset[["keys"]], Criteria=mapset[["labels"]])
library(xtable)
print(xtable(reduce.criteria),type='html')
    NewCategory        Criteria
 1  SEA                CURRENT|SEA|MARINE
 2  LANDSLIDE          SLIDE|SLUMP
 3  LIGHTNING          LIGN|LIGHTNING|LIGHTING
 4  WIND               WIND|WND
 5  FUNNEL CLOUD       CLOUD|FUNNEL
 6  WATERSPOUT         SPOUT
 7  DRY                DRY|DROUGHT|DRIEST
 8  AVALANCHE          AVALAN
 9  BLIZZARD           BLIZZARD
10  FOG|SMOKE          FOG|SMOKE
11  DROWNING           DROWNING
12  SURF               SURF
13  SWELL              SWELL
14  GUST               GUST
15  DAM                DAM
16  DUST               DUST
17  TORNADO            TORNADO|TORNDAO
18  HURRICANE|TYPHOON  HURRICANE|TYPHOON
19  HEAT               HEAT|HOT|WARM|HIGH TEMP|RECORD HIGH|RECORD TEMP|TEMPERATURE RECORD
20  COLD               WINTER|COLD|FREEZ|COOL|LOW TEMP|FROST|RECORD LOW|HYP|ICE|ICY
21  STORM              STORM|TSTM
22  FLOOD              URBAN|FLOO|FLDG|STREAM|RISING WAT|HIGH WATER
23  TSUNAMI            TSUNAMI|WAVE|TIDE
24  FIRE               FIRE
25  HAIL               HAIL
26  SLEET              SLEET
27  SNOW               SNOW
28  VOLCANIC           VOLCANIC
29  RAIN               RAIN|PRECIP|WET
30  OTHER              OTHER

The column “NewCategory” is the new event name under which other events are aggregated. The column “Criteria” is the keyword pattern matched against the old event names; the category “OTHER” collects the events not matched by the preceding criteria. We apply each criterion in order and replace the old name with the new one. Different criteria may match the same event, in which case the last matching criterion is the effective one.
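
The matching rule described above can be illustrated with a minimal sketch. The function `map_events` is a hypothetical stand-in written for this illustration; the report's own helpers (`getEvents`, `applyMapset`) live in scale.R, which is not shown, and may work differently. Only three of the criteria from the table are used here.

```r
# Hypothetical stand-in for the mapping helpers in scale.R (not shown in the report)
map_events <- function(evtype, keys, labels) {
  # Events matched by no criterion stay in the catch-all category
  out <- rep("OTHER", length(evtype))
  up <- toupper(evtype)
  # Apply the criteria in order; a later match overwrites an earlier one
  for (i in seq_along(keys)) {
    hit <- grepl(labels[i], up)
    out[hit] <- keys[i]
  }
  out
}

# Three criteria taken from the table, in the same relative order
keys   <- c("WIND", "HEAT", "STORM")
labels <- c("WIND|WND", "HEAT|HOT|WARM|HIGH TEMP", "STORM|TSTM")

mapped <- map_events(c("EXCESSIVE HEAT", "TSTM WIND", "HIGH TEMPERATURE", "FOO"),
                     keys, labels)
print(mapped)
```

In this sketch “TSTM WIND” matches both the WIND and the STORM criteria and ends up as STORM, because STORM is applied last.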

Scaling

It can be seen that some exponents were not accurately specified (“?”, “+”, “-”). These are considered to be 0:

levels(storm$CROPDMGEXP)
## [1] ""  "?" "0" "2" "B" "k" "K" "m" "M"
levels(storm$PROPDMGEXP)
##  [1] ""  "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"
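
One way the exponent codes listed above might be turned into multipliers is sketched here. The helper `apply_scale` is hypothetical: the report's own `ApplyScale` is defined in scale.R, which is not shown. The mapping of H, K, M and B to 10^2, 10^3, 10^6 and 10^9, the reading of digit codes as powers of ten, and the reading of "considered to be 0" as an exponent of 0 are all assumptions made for this illustration.

```r
# Hypothetical re-implementation; the real ApplyScale lives in scale.R (not shown)
apply_scale <- function(exp_code, value) {
  code <- toupper(as.character(exp_code))
  # Assumed exponents for the letter codes: H = 10^2, K = 10^3, M = 10^6, B = 10^9
  power <- c(H = 2, K = 3, M = 6, B = 9)[code]
  # Assumption: single-digit codes are read directly as powers of ten
  digit <- grepl("^[0-9]$", code)
  power[digit] <- as.numeric(code[digit])
  # "", "?", "+", "-" and any other unrecognised code fall back to exponent 0
  power[is.na(power)] <- 0
  unname(value * 10^power)
}

scaled <- apply_scale(c("B", "m", "?", "", "2"), c(115, 32.5, 5, 7, 3))
print(scaled)
```

Under these assumptions a record with value 115 and code “B” is read as 115 billion, while a record with an unspecified code keeps its raw value.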

These scales are applied to PROPDMG and CROPDMG columns in order to obtain the TOTAL PROPERTY DAMAGE defined as the sum of CROP DAMAGE and PROPERTY DAMAGE. The number of AFFECTED PEOPLE is defined as the sum of the number of FATALITIES and the number of people with INJURIES. Each quantity is grouped by each event:

library(data.table)
storm$CROPDMG=ApplyScale(storm$CROPDMGEXP,storm$CROPDMG)
storm$PROPDMG=ApplyScale(storm$PROPDMGEXP,storm$PROPDMG)
TotalStorm=storm[,.(FATALITIES=sum(FATALITIES),INJURIES=sum(INJURIES),CROP_DAMAGE=sum(CROPDMG),PROPERTY_DAMAGE=sum(PROPDMG)),by=EVTYPE]
total=getTop("TOTAL",TotalStorm,T,5)
totalv=getTopValues("TOTAL",TotalStorm,5)
print(xtable(total),type='html')
   SOURCE  TOTAL_PROPERTY_DAMAGE  AFFECTED_PEOPLE  FATALITIES
1  TOTAL   FLOOD                  TORNADO          TORNADO
2  TOTAL   HURRICANE|TYPHOON      WIND             HEAT
3  TOTAL   STORM                  HEAT             FLOOD
4  TOTAL   TORNADO                FLOOD            WIND
5  TOTAL   WIND                   LIGHTNING        LIGHTNING
print(xtable(totalv),type='html')
   SOURCE  TOTAL_PROPERTY_DAMAGE  AFFECTED_PEOPLE  FATALITIES
1  TOTAL   180528894734.50        96997.00         5633.00
2  TOTAL   90762452810.00         12831.00         3132.00
3  TOTAL   57843187100.00         12343.00         1557.00
4  TOTAL   57367113946.50         10238.00         1410.00
5  TOTAL   20102055571.51         6049.00          817.00


At first glance, FLOOD appears to be the main cause of property damage and TORNADO to have the greatest effect on people's health. However, flood damage estimates are not as accurate as those for other events, so this conclusion may not be reliable. We quote the NWS directives: “The Storm Data preparer must enter monetary damage amounts for flood events, even if it is a ‘guesstimate.’ The U.S. Army Corps of Engineers requires the NWS to provide monetary damage amounts (property and/or crop) resulting from any flood event.” Therefore, if the flood damages are overestimated, HURRICANE/TYPHOON may well be the main source of weather-related damage.

To make sure that this analysis is reliable, we visually explore the distribution of those events over the years:

library(cowplot)
source("scale.R")
#storm$Year=year(as.Date(gsub(" .*","",as.character(storm$BGN_DATE)),format="%m/%d/%Y"))
storm$Year=year(storm$BGN_DATE)
TotalStormYear=storm[,.(FATALITIES=sum(FATALITIES),INJURIES=sum(INJURIES),CROP_DAMAGE=sum(CROPDMG),PROPERTY_DAMAGE=sum(PROPDMG)),by=.(EVTYPE,Year)]
TotalStormYear[,TOTAL_DAMAGE:=CROP_DAMAGE+PROPERTY_DAMAGE]
TotalStormYear[,TOTAL_HEALTH_DAMAGE:= FATALITIES+INJURIES]

events=unique(as.array(as.matrix(total[,-1,with=FALSE]))[1:15])
datas=c("1950","1972","1982", "1992","2012")
ndatas=as.numeric(datas)

plot_dmg=ggplot(data=TotalStormYear[EVTYPE %in% events],aes(x=Year,y=log(1+TOTAL_DAMAGE),colour=EVTYPE))+geom_point()+ylab("LOG(1+DAMAGES (US$))")+scale_x_continuous(breaks=ndatas,labels=datas)+
        geom_vline(xintercept=1992)+theme(axis.text=element_text(size=10))
plot_health=ggplot(data=TotalStormYear[EVTYPE %in% events],aes(x=Year,y=log(1+TOTAL_HEALTH_DAMAGE),colour=EVTYPE))+geom_point()+scale_x_continuous(breaks=ndatas,labels=datas)+geom_vline(xintercept=1982)+geom_vline(xintercept=1992)+ylab("Log(1+# People )")+theme(axis.text=element_text(size=10))
multiplot(plot_dmg,plot_health)

Figure 1. a) Logarithm of the accumulated damages caused by weather events, for each event, as a function of the year of occurrence. b) Logarithm of the number of people injured or killed due to weather events as a function of the year of occurrence.

It is clear from the data that property damage was recorded only for TORNADO events until 1992. Similarly, FLOOD fatalities were included only after 1982, and other weather-related fatalities and injuries only after 1992. We therefore discard the data before 1993 and compare the damage caused by FLOOD and HURRICANE events:

flood=ggplot(data=TotalStormYear[EVTYPE %in% c("FLOOD","HURRICANE|TYPHOON","TORNADO") & Year>1992],aes(x=Year,y=TOTAL_DAMAGE,colour=EVTYPE))+geom_line()+ylab("TOTAL DAMAGES (US$)")+geom_point()
most.damaging.event=storm.complete[which.max(storm$PROPDMG)]
flood

Figure 2. Total damages of the most relevant weather events as a function of time. The 2006 FLOOD event stands out as a remarkably high point.

In the figure above the 2006 FLOOD event is a clear outlier. Investigating it, we find that an error was probably made, since the estimated value is about 3 orders of magnitude above other values. According to the 2006 USGS open-file report entitled “Storms and Flooding in California in December 2005 and January 2006—A Preliminary Assessment”: “An estimated $300 million in damages were attributed to the storms (California Office of Emergency Services, 2006)”.

That is very different from the record of the NWS:

most.damaging.event[,c(2,23:28,36),with=FALSE]
##      BGN_DATE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1: 2006-01-01          0        0     115          B    32.5          M
##                                                                                                                                                                                                                                                                                                                                                                                           REMARKS
## 1: Major flooding continued into the early hours of January 1st, before the Napa River finally fell below flood stage and the water receeded. Flooding was severe in Downtown Napa from the Napa Creek and the City and Parks Department was hit with $6 million in damage alone. The City of Napa had 600 homes with moderate damage, 150 damaged businesses with costs of at least $70 million.

The remarks of the record itself give estimates far below 115 billion US$. We believe this to be a simple recording error, with “B” (billion) entered instead of “M” (million), and we assume so in the subsequent analysis.

# Rescale the mis-recorded entry from billions to millions (updates storm by reference)
storm[which.max(PROPDMG), PROPDMG := PROPDMG/1000]
TotalStorm93=storm[Year >1992,.(FATALITIES=sum(FATALITIES),INJURIES=sum(INJURIES),CROP_DAMAGE=sum(CROPDMG),PROPERTY_DAMAGE=sum(PROPDMG)),by=EVTYPE]
TotalStorm93= TotalStorm93[,TOTAL_PROPERTY_DAMAGE:=PROPERTY_DAMAGE+CROP_DAMAGE]
TotalStorm93=TotalStorm93[,TOTAL_HEALTH_DAMAGE:=FATALITIES+INJURIES]
post92=getTop("POST 92 TOTAL",TotalStorm93,T,5)
post92v=getTopValues("POST 92 TOTAL",TotalStorm93,5)

print(xtable(post92),type='html')
   SOURCE         TOTAL_PROPERTY_DAMAGE  AFFECTED_PEOPLE  FATALITIES
1  POST 92 TOTAL  HURRICANE|TYPHOON      TORNADO          HEAT
2  POST 92 TOTAL  FLOOD                  HEAT             TORNADO
3  POST 92 TOTAL  STORM                  FLOOD            FLOOD
4  POST 92 TOTAL  TORNADO                WIND             WIND
5  POST 92 TOTAL  WIND                   LIGHTNING        LIGHTNING
print(xtable(post92v),type='html')
   SOURCE         TOTAL_PROPERTY_DAMAGE  AFFECTED_PEOPLE  FATALITIES
1  POST 92 TOTAL  90762452810.00         24949.00         3132.00
2  POST 92 TOTAL  65643894734.50         12343.00         1621.00
3  POST 92 TOTAL  57843187100.00         10238.00         1557.00
4  POST 92 TOTAL  26768915376.50         9242.00          1147.00
5  POST 92 TOTAL  20102055571.51         6049.00          817.00

Now it appears that HEAT is the main cause of fatalities. So far, however, the analysis has not accounted for population growth or inflation. To take these into account, a consumer price index table and a population table were obtained.
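
The inflation correction can be illustrated with a small sketch. The yearly rates below are made-up values for illustration only, not the contents of inflation.rates.csv. The idea is that damage recorded in a given year is multiplied by the compounded inflation of all later years, so that every amount is expressed in the value of the last year.

```r
# Hypothetical yearly inflation rates in percent (illustrative values only)
rates <- data.frame(Year = 2008:2011, Ave = c(4, 0, 2, 3))

# Yearly growth factors, e.g. 3 percent becomes 1.03
growth <- 1 + rates$Ave / 100

# The factor for a year is the product of the growth factors of all later years;
# the last year is the reference and keeps a factor of 1
rates$Acum <- c(rev(cumprod(rev(growth[-1]))), 1)

# Damage of 100 recorded in each year, expressed in last-year values
adjusted <- 100 * rates$Acum
print(rates)
```

The vectorised `cumprod` call is a design choice of this sketch; a backward loop that multiplies an accumulator year by year produces the same factors.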

Results

source("scale.R")
#Account for inflation
rates=data.table(read.csv("inflation.rates.csv",header = T,sep=";"))
aux=1
rate=rates[which(with(rates,Year>1992 & Year < 2012))]
rate$Acum=0
for(i in (length(rate$Year)-1):1){
        aux=aux*(1+as.numeric(as.character(rate[i+1,]$Ave))/100)
        rate$Acum[i]=aux
}
rate$Acum[length(rate$Year)]=1
rate=rate[,c(1,15),with=F]
economic=storm[Year >1992,.(TOTAL_PROPERTY_DAMAGE=sum(CROPDMG+PROPDMG)),by=.(EVTYPE,Year)]
economic.impact=merge(economic,rate,by="Year",all=T)
economic.impact=economic.impact[,.(TOTAL_PROPERTY_DAMAGE=sum(TOTAL_PROPERTY_DAMAGE*Acum)),by=EVTYPE]
setorder(economic.impact,-TOTAL_PROPERTY_DAMAGE)
ei=melt(head(economic.impact,8),id.vars = "EVTYPE")
cacPalette=c("black","red","blue", "#CCCCFF","dark blue", "dark grey", "cyan", "yellow")
economic.plot=ggplot(data=ei,aes(x=variable ,y=value,fill=EVTYPE,order=value)) + geom_bar(stat="identity")+scale_fill_manual(values=cacPalette)+ylab("Damages (US$)")+xlab("")+ggtitle("Total Acumulated Damage ")+theme(axis.text=element_text(size=10))

#Account for populational growth
growth=data.table(read.csv("population.csv",header =T,sep=";"))
growth=growth[which(with(growth,Year>1992 & Year < 2012))]
growth=growth[,c(1,2),with=F]
health=storm[Year >1992,.(MORTALITIES=sum(FATALITIES),MORBIDITIES=sum(FATALITIES+INJURIES)),by=.(EVTYPE,Year)]
health.impact=merge(health,growth,by="Year",all=T)
health.impact$MORTALITIES=health.impact$MORTALITIES*100000/health.impact$Population
health.impact$MORBIDITIES=health.impact$MORBIDITIES*100000/health.impact$Population
health.impact=health.impact[,.(MORTALITIES=mean(MORTALITIES,na.rm=T),
                               MORBIDITIES=mean(MORBIDITIES,na.rm=T)),by=EVTYPE]
setorder(health.impact,-MORTALITIES,-MORBIDITIES)
hi=melt(head(health.impact,8),id.vars = "EVTYPE")

cbcPalette=c("light blue","#CCCCFF","blue","red", "dark grey","dark blue",  "cyan", "yellow")
health.plot=ggplot(data=hi,aes(x="",y=value,fill=EVTYPE,order=value)) + geom_bar(stat="identity")+scale_fill_manual(values=cbcPalette)+
ylab("# of People per 100000 ")+xlab("")+ggtitle("Time Average of Mortality and Morbidity")+theme(axis.text=element_text(size=10))+facet_wrap( ~variable,nrow=1,scales="free")
multiplot(economic.plot,health.plot)

Figure 3. a) Economic impact measured by the accumulated total damages for each event, corrected to 2011 values. b) Health impact measured by the number of people killed (mortality) and the number of people affected (killed + injured: morbidity) due to weather events, per 100000 people.
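
The population normalisation behind the health panel can be sketched as follows. The counts and population figures are made-up values for illustration only, not the contents of population.csv, and base R data frames are used here instead of data.table to keep the example self-contained.

```r
# Hypothetical yearly fatality counts and population (illustrative values only)
health <- data.frame(Year = c(2010, 2011), FATALITIES = c(50, 150))
growth <- data.frame(Year = c(2010, 2011), Population = c(2.5e6, 5e6))

# Attach each year's population to that year's counts
merged <- merge(health, growth, by = "Year")

# Yearly rate per 100000 people
merged$rate <- merged$FATALITIES * 100000 / merged$Population

# Time average of the yearly rates
mean_rate <- mean(merged$rate)
print(mean_rate)
```

Averaging the yearly rates, rather than dividing total deaths by total population, gives every year the same weight regardless of its population.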

Conclusion

After reviewing the data and removing several biases, it can be concluded that HURRICANE|TYPHOON is the event with the greatest economic impact and that excessive heat is the event that physically affects the population the most.