We report the impact of weather events on the economy and on population health in the United States. Data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database were analysed for this report. According to this analysis, the weather event with the greatest effect on population health is excessive heat, with mortality and morbidity rates (measured as incidence per 100000 people) of 1045 and 2450, respectively. Hurricane/Typhoon events had the greatest economic impact, with accumulated damage of over **90 billion US$** (all values were corrected to the latest year in the dataset, 2011).
There is great interest in research on weather events. Since the resources available to avert critical events are limited, we need to know which events have the greatest impact on the population, both to direct resources to the most critical scenarios first and to invest in forecasting and warning for those events. This report uses data collected by the National Weather Service (NWS). The columns of the dataset are specified at this link. Due to time constraints, many of the sources used to produce these data were not verified; we quote: “Accordingly, the NWS does not guarantee the accuracy or validity of the information”. Read this report with that caveat in mind.
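# Download the raw storm data once; skip the download if a local copy exists
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
filename <- "resources/repdata_data_StormData.csv.bz2"
if (!file.exists(filename)) {
  # Create the target directory if needed and download in binary mode
  dir.create(dirname(filename), showWarnings = FALSE, recursive = TRUE)
  download.file(url, filename, mode = "wb")
}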
In order to investigate the economic and health impact of the weather events the following columns were selected:
The weather events recorded in the EVTYPE column are redundant. For example, “HEAT”, “EXCESSIVE HEAT”, and “HIGH TEMPERATURE” all represent the same event. These events must be aggregated into a single category before beginning the analysis. The criteria used are summarized in the following table:
source("scale.R")
library(data.table)
library(ggplot2)
oldw <- getOption("warn")
options(warn=0)
setAs("character","myDate", function(from){
as.Date(gsub(" .*","",as.character(from)),format="%m/%d/%Y")}
)
## in method for 'coerce' with signature '"character","myDate"': no definition for class "myDate"
options(warn=oldw)
cols=rep("character",37)
cols[2]="myDate"
cols[c(3:8,26,28)]="factor"
cols[c(23:25,27)]="numeric"
#cols="character"
storm.complete=data.table(read.csv("resources/repdata_data_StormData.csv.bz2",colClasses = cols))
mapset=getEvents(storm.complete)
storm=applyMapset(storm.complete,mapset)
reduce.criteria=data.table(NewCategory=mapset[["keys"]], Criteria=mapset[["labels"]])
library(xtable)
print(xtable(reduce.criteria),type='html')
| | NewCategory | Criteria |
|---|---|---|
| 1 | SEA | CURRENT\|SEA\|MARINE |
| 2 | LANDSLIDE | SLIDE\|SLUMP |
| 3 | LIGHTNING | LIGN\|LIGHTNING\|LIGHTING |
| 4 | WIND | WIND\|WND |
| 5 | FUNNEL CLOUD | CLOUD\|FUNNEL |
| 6 | WATERSPOUT | SPOUT |
| 7 | DRY | DRY\|DROUGHT\|DRIEST |
| 8 | AVALANCHE | AVALAN |
| 9 | BLIZZARD | BLIZZARD |
| 10 | FOG\|SMOKE | FOG\|SMOKE |
| 11 | DROWNING | DROWNING |
| 12 | SURF | SURF |
| 13 | SWELL | SWELL |
| 14 | GUST | GUST |
| 15 | DAM | DAM |
| 16 | DUST | DUST |
| 17 | TORNADO | TORNADO\|TORNDAO |
| 18 | HURRICANE\|TYPHOON | HURRICANE\|TYPHOON |
| 19 | HEAT | HEAT\|HOT\|WARM\|HIGH TEMP\|RECORD HIGH\|RECORD TEMP\|TEMPERATURE RECORD |
| 20 | COLD | WINTER\|COLD\|FREEZ\|COOL\|LOW TEMP\|FROST\|RECORD LOW\|HYP\|ICE\|ICY |
| 21 | STORM | STORM\|TSTM |
| 22 | FLOOD | URBAN\|FLOO\|FLDG\|STREAM\|RISING WAT\|HIGH WATER |
| 23 | TSUNAMI | TSUNAMI\|WAVE\|TIDE |
| 24 | FIRE | FIRE |
| 25 | HAIL | HAIL |
| 26 | SLEET | SLEET |
| 27 | SNOW | SNOW |
| 28 | VOLCANIC | VOLCANIC |
| 29 | RAIN | RAIN\|PRECIP\|WET |
| 30 | OTHER | OTHER |
The column “NewCategory” is the new event name under which the old events are aggregated. The column “Criteria” is the keyword pattern matched against the old event names; the category “OTHER” collects the events not matched by the preceding criteria. We apply the criteria in order and replace each matched old name with the new one. Different criteria may match the same event; in that case the last matching criterion takes effect.
Some exponents were not accurately specified (“?”, “+”, “-”). These are considered to be 0:
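The matching rule described above can be sketched in a few lines of base R. This is only an illustration of the "apply in order, last match wins" idea: `map_events()` is a hypothetical helper, not the `applyMapset()` function from `scale.R` (which is not shown in this report), and only three of the thirty criteria are used. Defaulting unmatched events to "OTHER" is also an assumption.

```r
# Hypothetical sketch of ordered keyword matching (not the author's applyMapset()).
# A small subset of the criteria table, in the same order as in the report.
keys   <- c("WIND", "STORM", "FLOOD")
labels <- c("WIND|WND", "STORM|TSTM", "URBAN|FLOO|FLDG")

map_events <- function(evtype, keys, labels) {
  up  <- toupper(evtype)
  out <- rep("OTHER", length(evtype))   # assumption: unmatched events become OTHER
  for (i in seq_along(keys)) {
    # Later criteria overwrite earlier ones, so the last match is the effective one
    out[grepl(labels[i], up)] <- keys[i]
  }
  out
}

map_events(c("TSTM WIND", "FLASH FLOOD", "High Wind", "FOG"), keys, labels)
```

Under this rule "TSTM WIND" matches both WIND and STORM and ends up in STORM, because STORM comes later in the table.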
levels(storm$CROPDMGEXP)
## [1] "" "?" "0" "2" "B" "k" "K" "m" "M"
levels(storm$PROPDMGEXP)
## [1] "" "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"
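As a rough illustration of how such exponent codes might be turned into multipliers, the sketch below maps the letters to powers of ten. It is not the `ApplyScale()` function from `scale.R`, which is not shown here. Two assumptions are made: the unspecified codes (“”, “?”, “+”, “-”) are read as an exponent of 0 (a multiplier of 1), and single digits are read as powers of ten.

```r
# Hypothetical stand-in for ApplyScale(); the names below are illustrative only.
exp_multiplier <- function(exp_code) {
  code <- toupper(as.character(exp_code))
  mult <- rep(1, length(code))                 # "", "?", "+", "-" -> exponent 0
  letter_map <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  is_letter <- code %in% names(letter_map)
  mult[is_letter] <- unname(letter_map[code[is_letter]])
  is_digit <- grepl("^[0-9]$", code)           # assumption: digit n means 10^n
  mult[is_digit] <- 10^as.numeric(code[is_digit])
  mult
}

apply_scale_sketch <- function(exp_code, value) value * exp_multiplier(exp_code)

apply_scale_sketch(c("K", "m", "B", "?", "2"), c(2.5, 1, 115, 10, 3))
```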
These scales are applied to the PROPDMG and CROPDMG columns in order to obtain the TOTAL PROPERTY DAMAGE, defined as the sum of CROP DAMAGE and PROPERTY DAMAGE. The number of AFFECTED PEOPLE is defined as the sum of the number of FATALITIES and the number of people with INJURIES. Each quantity is summed by event type:
library(data.table)
storm$CROPDMG=ApplyScale(storm$CROPDMGEXP,storm$CROPDMG)
storm$PROPDMG=ApplyScale(storm$PROPDMGEXP,storm$PROPDMG)
TotalStorm=storm[,.(FATALITIES=sum(FATALITIES),INJURIES=sum(INJURIES),CROP_DAMAGE=sum(CROPDMG),PROPERTY_DAMAGE=sum(PROPDMG)),by=EVTYPE]
total=getTop("TOTAL",TotalStorm,T,5)
totalv=getTopValues("TOTAL",TotalStorm,5)
print(xtable(total),type='html')
| | SOURCE | TOTAL_PROPERTY_DAMAGE | AFFECTED_PEOPLE | FATALITIES |
|---|---|---|---|---|
| 1 | TOTAL | FLOOD | TORNADO | TORNADO |
| 2 | TOTAL | HURRICANE\|TYPHOON | WIND | HEAT |
| 3 | TOTAL | STORM | HEAT | FLOOD |
| 4 | TOTAL | TORNADO | FLOOD | WIND |
| 5 | TOTAL | WIND | LIGHTNING | LIGHTNING |
print(xtable(totalv),type='html')
| | SOURCE | TOTAL_PROPERTY_DAMAGE | AFFECTED_PEOPLE | FATALITIES |
|---|---|---|---|---|
| 1 | TOTAL | 180528894734.50 | 96997.00 | 5633.00 |
| 2 | TOTAL | 90762452810.00 | 12831.00 | 3132.00 |
| 3 | TOTAL | 57843187100.00 | 12343.00 | 1557.00 |
| 4 | TOTAL | 57367113946.50 | 10238.00 | 1410.00 |
| 5 | TOTAL | 20102055571.51 | 6049.00 | 817.00 |
At first glance it seems that FLOOD is the main cause of property damage and TORNADO has the greatest effect on people's health. However, flood damage estimates are not as accurate as those for other events, so this conclusion may not be reliable. We quote the NWS directives: “The Storm Data preparer must enter monetary damage amounts for flood events, even if it is a ‘guesstimate.’ The U.S. Army Corps of Engineers requires the NWS to provide monetary damage amounts (property and/or crop) resulting from any flood event.” Therefore HURRICANE/TYPHOON may very well be the main source of weather-related damage if the flood damages are overestimated.
To check that this analysis is reliable, we visually explore the distribution of those events by year:
library(cowplot)
source("scale.R")
#storm$Year=year(as.Date(gsub(" .*","",as.character(storm$BGN_DATE)),format="%m/%d/%Y"))
storm$Year=year(storm$BGN_DATE)
TotalStormYear=storm[,.(FATALITIES=sum(FATALITIES),INJURIES=sum(INJURIES),CROP_DAMAGE=sum(CROPDMG),PROPERTY_DAMAGE=sum(PROPDMG)),by=.(EVTYPE,Year)]
TotalStormYear[,TOTAL_DAMAGE:=CROP_DAMAGE+PROPERTY_DAMAGE]
TotalStormYear[,TOTAL_HEALTH_DAMAGE:= FATALITIES+INJURIES]
events=unique(as.array(as.matrix(total[,-1,with=FALSE]))[1:15])
datas=c("1950","1972","1982", "1992","2012")
ndatas=as.numeric(datas)
plot_dmg=ggplot(data=TotalStormYear[EVTYPE %in% events],aes(x=Year,y=log(1+TOTAL_DAMAGE),colour=EVTYPE))+geom_point()+ylab("LOG(1+DAMAGES (US$))")+scale_x_continuous(breaks=ndatas,labels=datas)+
geom_vline(xintercept=1992)+theme(axis.text=element_text(size=10))
plot_health=ggplot(data=TotalStormYear[EVTYPE %in% events],aes(x=Year,y=log(1+TOTAL_HEALTH_DAMAGE),colour=EVTYPE))+geom_point()+scale_x_continuous(breaks=ndatas,labels=datas)+geom_vline(xintercept=1982)+geom_vline(xintercept=1992)+ylab("Log(1+# People )")+theme(axis.text=element_text(size=10))
multiplot(plot_dmg,plot_health)
It is clear from the data that property damage was recorded only for TORNADO events until 1992. Similarly, FLOOD fatalities were included only after 1982, and other weather-related fatalities and injuries were included after 1992. Let us drop the data before 1993 and compare the damage caused by FLOOD and HURRICANE events:
flood=ggplot(data=TotalStormYear[EVTYPE %in% c("FLOOD","HURRICANE|TYPHOON","TORNADO") & Year>1992],aes(x=Year,y=TOTAL_DAMAGE,colour=EVTYPE))+geom_line()+ylab("TOTAL DAMAGES (US$)")+geom_point()
most.damaging.event=storm.complete[which.max(storm$PROPDMG)]
flood
In the picture above the 2006 FLOOD event is out of character. Investigating this outlier we find that an error probably was made, since the estimated value is about 3 orders of magnitude above other values. According to the USGS open file report 2006 entitled “Storms and Flooding in California in December 2005 and January 2006—A Preliminary Assessment”: “An estimated $300 million in damages were attributed to the storms (California Office of Emergency Services, 2006)”.
That is very different from the record of the NWS:
most.damaging.event[,c(2,23:28,36),with=FALSE]
## BGN_DATE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1: 2006-01-01 0 0 115 B 32.5 M
## REMARKS
## 1: Major flooding continued into the early hours of January 1st, before the Napa River finally fell below flood stage and the water receeded. Flooding was severe in Downtown Napa from the Napa Creek and the City and Parks Department was hit with $6 million in damage alone. The City of Napa had 600 homes with moderate damage, 150 damaged businesses with costs of at least $70 million.
The remarks of the record itself provide estimates far below 115 billion US$. We believe this to be a simple recording error: a “B” (billion) was entered instead of an “M” (million). We assume this to be true in the subsequent analysis.
# Rescale the outlier from billions to millions (update by reference)
storm[which.max(PROPDMG), PROPDMG := PROPDMG/1000]
TotalStorm93=storm[Year >1992,.(FATALITIES=sum(FATALITIES),INJURIES=sum(INJURIES),CROP_DAMAGE=sum(CROPDMG),PROPERTY_DAMAGE=sum(PROPDMG)),by=EVTYPE]
TotalStorm93= TotalStorm93[,TOTAL_PROPERTY_DAMAGE:=PROPERTY_DAMAGE+CROP_DAMAGE]
TotalStorm93=TotalStorm93[,TOTAL_HEALTH_DAMAGE:=FATALITIES+INJURIES]
post92=getTop("POST 92 TOTAL",TotalStorm93,T,5)
post92v=getTopValues("POST 92 TOTAL",TotalStorm93,5)
print(xtable(post92),type='html')
| | SOURCE | TOTAL_PROPERTY_DAMAGE | AFFECTED_PEOPLE | FATALITIES |
|---|---|---|---|---|
| 1 | POST 92 TOTAL | HURRICANE\|TYPHOON | TORNADO | HEAT |
| 2 | POST 92 TOTAL | FLOOD | HEAT | TORNADO |
| 3 | POST 92 TOTAL | STORM | FLOOD | FLOOD |
| 4 | POST 92 TOTAL | TORNADO | WIND | WIND |
| 5 | POST 92 TOTAL | WIND | LIGHTNING | LIGHTNING |
print(xtable(post92v),type='html')
| | SOURCE | TOTAL_PROPERTY_DAMAGE | AFFECTED_PEOPLE | FATALITIES |
|---|---|---|---|---|
| 1 | POST 92 TOTAL | 90762452810.00 | 24949.00 | 3132.00 |
| 2 | POST 92 TOTAL | 65643894734.50 | 12343.00 | 1621.00 |
| 3 | POST 92 TOTAL | 57843187100.00 | 10238.00 | 1557.00 |
| 4 | POST 92 TOTAL | 26768915376.50 | 9242.00 | 1147.00 |
| 5 | POST 92 TOTAL | 20102055571.51 | 6049.00 | 817.00 |
Now it appears that HEAT is the main cause of fatalities. However, this analysis does not yet account for population growth and inflation. To account for them, a consumer price index table and a population table were obtained.
source("scale.R")
#Account for inflation
rates=data.table(read.csv("inflation.rates.csv",header = T,sep=";"))
aux=1
rate=rates[which(with(rates,Year>1992 & Year < 2012))]
rate$Acum=0
for(i in (length(rate$Year)-1):1){
aux=aux*(1+as.numeric(as.character(rate[i+1,]$Ave))/100)
rate$Acum[i]=aux
}
rate$Acum[length(rate$Year)]=1
rate=rate[,c(1,15),with=F]
economic=storm[Year >1992,.(TOTAL_PROPERTY_DAMAGE=sum(CROPDMG+PROPDMG)),by=.(EVTYPE,Year)]
economic.impact=merge(economic,rate,by="Year",all=T)
economic.impact=economic.impact[,.(TOTAL_PROPERTY_DAMAGE=sum(TOTAL_PROPERTY_DAMAGE*Acum)),by=EVTYPE]
setorder(economic.impact,-TOTAL_PROPERTY_DAMAGE)
ei=melt(head(economic.impact,8),id.vars = "EVTYPE")
cacPalette=c("black","red","blue", "#CCCCFF","dark blue", "dark grey", "cyan", "yellow")
economic.plot=ggplot(data=ei,aes(x=variable ,y=value,fill=EVTYPE,order=value)) + geom_bar(stat="identity")+scale_fill_manual(values=cacPalette)+ylab("Damages (US$)")+xlab("")+ggtitle("Total Accumulated Damage")+theme(axis.text=element_text(size=10))
#Account for populational growth
growth=data.table(read.csv("population.csv",header =T,sep=";"))
growth=growth[which(with(growth,Year>1992 & Year < 2012))]
growth=growth[,c(1,2),with=F]
health=storm[Year >1992,.(MORTALITIES=sum(FATALITIES),MORBIDITIES=sum(FATALITIES+INJURIES)),by=.(EVTYPE,Year)]
health.impact=merge(health,growth,by="Year",all=T)
health.impact$MORTALITIES=health.impact$MORTALITIES*100000/health.impact$Population
health.impact$MORBIDITIES=health.impact$MORBIDITIES*100000/health.impact$Population
health.impact=health.impact[,.(MORTALITIES=mean(MORTALITIES,na.rm=T),
MORBIDITIES=mean(MORBIDITIES,na.rm=T)),by=EVTYPE]
setorder(health.impact,-MORTALITIES,-MORBIDITIES)
hi=melt(head(health.impact,8),id.vars = "EVTYPE")
cbcPalette=c("light blue","#CCCCFF","blue","red", "dark grey","dark blue", "cyan", "yellow")
health.plot=ggplot(data=hi,aes(x="",y=value,fill=EVTYPE,order=value)) + geom_bar(stat="identity")+scale_fill_manual(values=cbcPalette)+
ylab("# of People per 100000 ")+xlab("")+ggtitle("Time Average of Mortality and Morbidity")+theme(axis.text=element_text(size=10))+facet_wrap( ~variable,nrow=1,scales="free")
multiplot(economic.plot,health.plot)
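The cumulative inflation factor built by the loop above (each year's factor is the product of the rates of all later years, and the final year has a factor of 1) can also be written without a loop. The sketch below is an illustration only: `cumulative_factor()` is a hypothetical helper, and the rates are made-up values, not those in `inflation.rates.csv`.

```r
# Hypothetical yearly average inflation rates in percent (illustrative values,
# not taken from inflation.rates.csv).
rates_sketch <- data.frame(Year = 2009:2011, Ave = c(2.0, 1.5, 3.0))

# Factor that converts a value from each year into money of the last year:
# the product of (1 + rate/100) over all later years, and 1 for the last year.
cumulative_factor <- function(ave_percent) {
  growth <- 1 + ave_percent / 100
  rev(cumprod(rev(c(growth[-1], 1))))
}

rates_sketch$Acum <- cumulative_factor(rates_sketch$Ave)
rates_sketch
```

With these made-up rates, a 2009 damage figure would be multiplied by 1.015 * 1.03, a 2010 figure by 1.03, and a 2011 figure would be left unchanged.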
After reviewing the data and removing several sources of bias, it can be concluded that HURRICANE|TYPHOON is the event type with the greatest economic impact and that excessive heat is the event that physically affects the population the most.