A brief study of consequences of Storm Events in the USA

Abstract

In this work we analize the NOAA Storm Database to answer which meteorological events are the more dangerous, both in human and economical losses. We conclude that by far, tornados are the most harmful of all events measured in human lives (fatalites and injuries) followed by heat and flood events. Also, the latter events causes the greatest accumulated monetary loss when the property and crop damages are combined. These findings must be corroborated by doing a deep cleaning of the data.

Introduction

Severe weather events, such as tornados or extreme rainfall can cause losses in lifes and livelihoods when they occur. There is a growing body of evidence that this events will be even more extreme in the near future, due in part to the global climate change. This has prompted new research on how to deal with the risks [1],on the base that: To know is to be prepared. In this work it is examined the NOAA Storm Database to answer which types of events were the most harmful in the United States, during the time period from 1950 to Nov. 2011. The analysis will focus in four specific measurements of loss:

Human Health
```
+ Fatalities

+ Injuries
```
Economic Loss
```
+ Property Damage

+ Crop Damage
```

Data Preparation

The data file was downloaded on Feb 21, 2015 from the URL shown below. Beaware, it is a 46.9 MB file and it takes time to download and to read in R. Code bellow check if these steps have already been taken. Here is the code to read the data and save a working environment for easy access.

# Prepare R, set libraries to be used
library(xtable, quietly = TRUE, warn.conflicts = FALSE)
library(lattice, quietly = TRUE, warn.conflicts = FALSE)
library(pander, quietly = TRUE, warn.conflicts = FALSE)
library(scales, quietly = TRUE, warn.conflicts = FALSE)

# Check for data files and read/load in R

if (!file.exists("storm.csv.bz2")) {
    download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", 
        "storm.csv.bz2", mode = "wb")
    storm <- read.csv("storm.csv.bz2")
    save(storm, file = "storm.RData")
} else if (file.exists("storm.RData")) {
    load("storm.RData")
} else {
    stop("No data file in this folder, please check!")
}

The storm data frame has 37 variables on 902297 events. The columns relevant for the analysis are named: STATE,EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP.

After looking at the data, using the str function, one can see that the classification variable EVTYPE has 985 levels and the identification variable STATE has 72 abbreviation names. To address this, one must make some pre-processing of the data. One of the problems is the use of lowercase and upper case letters, that can be easily solved in R, but there are some compound event names that correspond to different meteorological that happened at the same time; for example there are 73 events that start with the word “[Ss]ummary”.To change these names to match the **NOAA* directives for event reporting [2], requires a lot of data cleansing. Since this is a homework for reproducible research, the data will be used as is. This would mean that the results are underestimating the reported total for a given event, and it should be investigated if necessary.

Another issue happens with the economic damage data. One could assume is in nominal US dollars, i.e they are not corrected for inflation, so it is not wise to simply sum the amounts to obtain a number for total loss. However, it escapes the scope of this work to change the monetary damage into **real* US dollars, so it won’t be done. But one thing that is needed is to combine the columns DMG and DMGEXP to get the proper order of magnitude for the monetary loss.

In the docummentation for this problem it is stated that DMGEXP should be read as:

[] Multiply by 1
[hH] Multiply by 100
[kK] Multiply by 1,000
[mM] Multiply by 1,000,000
[bB] Multiply by 1,000,000,000

It is worth noticing that the DMGEXP has characters that are not in the list shown above, and those records will be treated as if the multiplier for the losses were equal to one.

# Data in dataframe <storm>

use.col <- c("STATE","EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")
storm.red<- storm[,use.col]

storm.red$EVTYPE<-toupper(storm.red$EVTYPE)  #Use Uppercase characters
storm.red$PROPDMGEXP<-toupper(storm.red$PROPDMGEXP)
storm.red$CROPDMGEXP<-toupper(storm.red$CROPDMGEXP)

# Make numeric columms of dmg fix for magnitude
attach(storm.red)
propdmg<-PROPDMG
propdmg[PROPDMGEXP=="H"]<-propdmg[PROPDMGEXP=="H"]*100
propdmg[PROPDMGEXP=="K"]<-propdmg[PROPDMGEXP=="K"]*1000
propdmg[PROPDMGEXP=="M"]<-propdmg[PROPDMGEXP=="M"]*1000000
propdmg[PROPDMGEXP=="B"]<-propdmg[PROPDMGEXP=="B"]*1000000000

cropdmg<-CROPDMG
cropdmg[CROPDMGEXP=="H"]<-cropdmg[CROPDMGEXP=="H"]*100
cropdmg[CROPDMGEXP=="K"]<-cropdmg[CROPDMGEXP=="K"]*1000
cropdmg[CROPDMGEXP=="M"]<-cropdmg[CROPDMGEXP=="M"]*1000000
cropdmg[CROPDMGEXP=="B"]<-cropdmg[CROPDMGEXP=="B"]*1000000000

storm.red$propdmg <-propdmg
storm.red$cropdmg <-cropdmg
detach(2)

Results

Note that all of the results shown along with the text are the result of inline code, (surrounding code with justs one back ticks and the letter r), but the code is not shown in the knitted output.

Fatalities and Injuries

In total there are 6,974 events (0.77 %) that report one or more fatalities, for a total of 15145 deaths and 17604 events (1.95 %) that report one or more injuries, for a total of 140528. To find out which events are the most dangereous for humans one can look at how many fatalities or injuries are for each type of events. In the table below it is shown the top 15 events.

#fatalities
x1<-with(storm.red,aggregate(FATALITIES, list(EVTYPE), sum))
y1<-x1[x1[,2]>0,]  # USe only those with fatalities
y1<-y1[order(y1[,2],decreasing=TRUE),]  
#injuries
x2<-with(storm.red,aggregate(INJURIES, list(EVTYPE), sum))
y2<-x2[x2[,2]>0,]
y2<-y2[order(y2[,2],decreasing=TRUE),]

top.events<-cbind(y1[1:15,],y2[1:15,])
colnames(top.events)<-c("Hazard; Fatalities","Victims", "Hazard; Injuries","Victims")
rownames(top.events)<-NULL
pander(top.events, caption="Top events affecting humans in the USA, period 1950-2011.  Left correspond to fatality victims, right to injuries")

Top events affecting humans in the USA, period 1950-2011. Left correspond to fatality victims, right to injuries
Hazard; Fatalities	Victims	Hazard; Injuries	Victims
TORNADO	5633	TORNADO	91346
EXCESSIVE HEAT	1903	TSTM WIND	6957
FLASH FLOOD	978	FLOOD	6789
HEAT	937	EXCESSIVE HEAT	6525
LIGHTNING	816	LIGHTNING	5230
TSTM WIND	504	HEAT	2100
FLOOD	470	ICE STORM	1975
RIP CURRENT	368	FLASH FLOOD	1777
HIGH WIND	248	THUNDERSTORM WIND	1488
AVALANCHE	224	HAIL	1361
WINTER STORM	206	WINTER STORM	1321
RIP CURRENTS	204	HURRICANE/TYPHOON	1275
HEAT WAVE	172	HIGH WIND	1137
EXTREME COLD	162	HEAVY SNOW	1021
THUNDERSTORM WIND	133	WILDFIRE	911

Notice that some of the rows of the table correspond to the same type of weather condition, for example “EXCESSIVE HEAT” and “HEAT”. This must be changed for a better understanding of the hazards.

As can be seen in the table above, tornados are the main cause of injuries and fatalities in USA. Notice that the second cause of mortality is excesive heat and it can be seen that more than half of deaths occured in one event, located in Chicago IL.

To see which states are most affected by the top conditions we looked at the following plot, where the number of fatalities caused by the five top events are plotted by states.

Notice that for the plot, only the records with a propper abbreviate state name are used.

top.events.death<-top.events[1:5,1]
bystate<-with(storm.red,aggregate(FATALITIES,list(STATE,EVTYPE),sum))
colnames(bystate)<-c("STATE","EVTYPE","FATALITIES")

bas<-bystate[(bystate$STATE %in% state.abb)&(bystate$EVTYPE %in% top.events.death),]
barchart(FATALITIES~EVTYPE|STATE,data=bas,layout=c(5,4),scales=list(x=list(rot=45)))

Notice that some states are “safer” than others.

Economic losses

We proceed with the economic losses by looking at the top events.

# property
x1 <- with(storm.red, aggregate(propdmg, list(EVTYPE), sum))
y1 <- x1[x1[, 2] > 0, ]  # USe only those with damage
y1 <- y1[order(y1[, 2], decreasing = TRUE), ]
# crops
x2 <- with(storm.red, aggregate(cropdmg, list(EVTYPE), sum))
y2 <- x2[x2[, 2] > 0, ]
y2 <- y2[order(y2[, 2], decreasing = TRUE), ]

top.events <- cbind(y1[1:15, 1], comma(round(y1[1:15, 2]/1000, 0)), y2[1:15, 
    1], comma(round(y2[1:15, 2]/1000, 0)))

colnames(top.events) <- c("Hazard", "Property", "Hazard", "Crops")
rownames(top.events) <- NULL

pander(top.events, caption = "Top events affecting Properties and Crops in the USA, period 1950-2011.  Left correspond to Property losses, right to Crops. Nominal Currency in Thousands of US Dollars")

Top events affecting Properties and Crops in the USA, period 1950-2011. Left correspond to Property losses, right to Crops. Nominal Currency in Thousands of US Dollars
Hazard	Property	Hazard	Crops
FLOOD	144,657,710	DROUGHT	13,972,566
HURRICANE/TYPHOON	69,305,840	FLOOD	5,661,968
TORNADO	56,937,161	RIVER FLOOD	5,029,459
STORM SURGE	43,323,536	ICE STORM	5,022,114
FLASH FLOOD	16,140,812	HAIL	3,025,954
HAIL	15,732,268	HURRICANE	2,741,910
HURRICANE	11,868,319	HURRICANE/TYPHOON	2,607,873
TROPICAL STORM	7,703,891	FLASH FLOOD	1,421,317
WINTER STORM	6,688,497	EXTREME COLD	1,312,973
HIGH WIND	5,270,046	FROST/FREEZE	1,094,186
RIVER FLOOD	5,118,946	HEAVY RAIN	733,400
WILDFIRE	4,765,114	TROPICAL STORM	678,346
STORM SURGE/TIDE	4,641,188	HIGH WIND	638,571
TSTM WIND	4,484,958	TSTM WIND	554,007
ICE STORM	3,944,928	EXCESSIVE HEAT	492,402

Important Note: As can be seen in the table above, there is needed for data cleansing, to be able to rank the events properly. For example, for crop losses, the combined loss for “FLOOD”, “RIVER FLOOD” and “FLASH FLOOD” is almost as big as the “DROUGHT” loss. Neverless, the table gives information about the harmful events.

Finally, one can add up the property and economic loss to get an overall hazard rating. The following plot shows the top 15 causes.

totalEL<-with(storm.red,propdmg+cropdmg) #Total economic loss
x <-aggregate(totalEL,list(storm.red$EVTYPE),sum)
y<-x[x[,2]>0,]  # USe only those with damage
y<-y[order(y[,2],decreasing=TRUE),]  
colnames(y)<-c("EVTYPE","LOSS")

bas1<-factor(y$EVTYPE,levels=as.character(y$EVTYPE),ordered=TRUE)
y$EVTYPE1<-bas1
barchart(EVTYPE1~LOSS/10^9,data=y[1:15,],main="Monetary Losses (in Billions of US Dollars)",xlab="LOSS")

Clearly the type “FLOOD/FLASH FLOOD/RIVER FLOOD” is a major contributor for the monetary losses over the years.

Findings and some words of caution

There are some interesting facts that can be drawn from the analysis. One of them is that there is difference between the type of events that causes the most damage either if the focus is in the human loss or in the economical loss.

As it has been written in the preceding paragraphs, the results shown must be revised, given the fact that factor variables such as EVTYPE and STATE have many levels that are not in the usual “dictionary”. It has to be notice that we have not use spatial or temporal variables, that could give more information.

Final words for Coursera Evaluators

Dear evaluator: I apologize for the spelling errors you surely spotted while reading this work. English is not my first language and I didn’t have the time to run the text in a grammar/spelling corrector. I hope those have not make you wonder if this is written in english or not.

Reproducibility information

sessionInfo()

## R version 3.2.3 (2015-12-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 10586)
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] scales_0.3.0    pander_0.6.0    lattice_0.20-33 xtable_1.8-2   
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.2      digest_0.6.9     plyr_1.8.3       grid_3.2.3      
##  [5] formatR_1.2.1    magrittr_1.5     evaluate_0.8     stringi_1.0-1   
##  [9] rmarkdown_0.9.2  tools_3.2.3      stringr_1.0.0    munsell_0.4.2   
## [13] colorspace_1.2-6 htmltools_0.3    knitr_1.12.3

Readings

[1] IPCC, 2012: Summary for Policymakers. In: Managing the Risks of Extreme Events and Disasters to Advance Climate Change Adaptation [Field, C.B., V. Barros, T.F. Stocker, D. Qin, D.J. Dokken, K.L. Ebi, M.D. Mastrandrea, K.J. Mach, G.-K. Plattner, S.K. Allen, M. Tignor, and P.M. Midgley (eds.)]. A Special Report of Working Groups I and II of the Intergovernmental Panel on Climate Change. Cambridge University Press, Cambridge, UK, and New York, NY, USA, pp. 1-19. Downloaded from http://www.ipcc-wg2.gov/SREX/images/uploads/SREX-SPMbrochure_FINAL.pdf, Sep 21, 2016.

[2] NOAA, 2007, STORM DATA PREPARATION Downloaded from https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf, Sep 25, 2016