title: “Storm data analysis”
author: “I. Llatas”
date: “February 28, 2016”
output: html_document

A brief study of consequences of Storm Events in the USA

Abstract

In this work we analize the NOAA Storm Database to answer which meteorological events are the more dangerous, both in human and economical losses. We conclude that by far, tornados are the most harmful of all events measured in human lives (fatalites and injuries) followed by heat and flood events. Also, the latter events causes the greatest accumulated monetary loss when the property and crop damages are combined. These findings must be corroborated by doing a deep cleaning of the data.

Introduction

Severe weather events, such as tornados or extreme rainfall can cause losses in lifes and livelihoods when they occur. There is a growing body of evidence that this events will be even more extreme in the near future, due in part to the global climate change. This has prompted new research on how to deal with the risks [1],on the base that: To know is to be prepared. In this work it is examined the NOAA Storm Database to answer which types of events were the most harmful in the United States, during the time period from 1950 to Nov. 2011. The analysis will focus in four specific measurements of loss:

  • Human Health

    + Fatalities
    
    + Injuries
  • Economic Loss

    + Property Damage
    
    + Crop Damage

Data Preparation

The data file was downloaded on Feb 21, 2015 from the URL shown below. Beaware, it is a 46.9 MB file and it takes time to download and to read in R. Code bellow check if these steps have already been taken. Here is the code to read the data and save a working environment for easy access.

# Prepare R, set libraries to be used
library(xtable, quietly = TRUE, warn.conflicts = FALSE)
library(lattice, quietly = TRUE, warn.conflicts = FALSE)
library(pander, quietly = TRUE, warn.conflicts = FALSE)
library(scales, quietly = TRUE, warn.conflicts = FALSE)
# Check for data files and read/load in R

if (!file.exists("storm.csv.bz2")) {
    download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", 
        "storm.csv.bz2", mode = "wb")
    storm <- read.csv("storm.csv.bz2")
    save(storm, file = "storm.RData")
} else if (file.exists("storm.RData")) {
    load("storm.RData")
} else {
    stop("No data file in this folder, please check!")
}

The storm data frame has 37 variables on 902297 events. The columns relevant for the analysis are named: STATE,EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP.

After looking at the data, using the str function, one can see that the classification variable EVTYPE has 985 levels and the identification variable STATE has 72 abbreviation names. To address this, one must make some pre-processing of the data. One of the problems is the use of lowercase and upper case letters, that can be easily solved in R, but there are some compound event names that correspond to different meteorological that happened at the same time; for example there are 73 events that start with the word “[Ss]ummary”.To change these names to match the **NOAA* directives for event reporting [2], requires a lot of data cleansing. Since this is a homework for reproducible research, the data will be used as is. This would mean that the results are underestimating the reported total for a given event, and it should be investigated if necessary.

Another issue happens with the economic damage data. One could assume is in nominal US dollars, i.e they are not corrected for inflation, so it is not wise to simply sum the amounts to obtain a number for total loss. However, it escapes the scope of this work to change the monetary damage into **real* US dollars, so it won’t be done. But one thing that is needed is to combine the columns DMG and DMGEXP to get the proper order of magnitude for the monetary loss.

In the docummentation for this problem it is stated that DMGEXP should be read as:

  • [] Multiply by 1

  • [hH] Multiply by 100

  • [kK] Multiply by 1,000

  • [mM] Multiply by 1,000,000

  • [bB] Multiply by 1,000,000,000

It is worth noticing that the DMGEXP has characters that are not in the list shown above, and those records will be treated as if the multiplier for the losses were equal to one.

# Data in dataframe <storm>

use.col <- c("STATE","EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")
storm.red<- storm[,use.col]

storm.red$EVTYPE<-toupper(storm.red$EVTYPE)  #Use Uppercase characters
storm.red$PROPDMGEXP<-toupper(storm.red$PROPDMGEXP)
storm.red$CROPDMGEXP<-toupper(storm.red$CROPDMGEXP)

# Make numeric columms of dmg fix for magnitude
attach(storm.red)
propdmg<-PROPDMG
propdmg[PROPDMGEXP=="H"]<-propdmg[PROPDMGEXP=="H"]*100
propdmg[PROPDMGEXP=="K"]<-propdmg[PROPDMGEXP=="K"]*1000
propdmg[PROPDMGEXP=="M"]<-propdmg[PROPDMGEXP=="M"]*1000000
propdmg[PROPDMGEXP=="B"]<-propdmg[PROPDMGEXP=="B"]*1000000000

cropdmg<-CROPDMG
cropdmg[CROPDMGEXP=="H"]<-cropdmg[CROPDMGEXP=="H"]*100
cropdmg[CROPDMGEXP=="K"]<-cropdmg[CROPDMGEXP=="K"]*1000
cropdmg[CROPDMGEXP=="M"]<-cropdmg[CROPDMGEXP=="M"]*1000000
cropdmg[CROPDMGEXP=="B"]<-cropdmg[CROPDMGEXP=="B"]*1000000000

storm.red$propdmg <-propdmg
storm.red$cropdmg <-cropdmg
detach(2)

Results

Note that all of the results shown along with the text are the result of inline code, (surrounding code with justs one back ticks and the letter r), but the code is not shown in the knitted output.

Fatalities and Injuries

In total there are 6,974 events (0.77 %) that report one or more fatalities, for a total of 15145 deaths and 17604 events (1.95 %) that report one or more injuries, for a total of 140528. To find out which events are the most dangereous for humans one can look at how many fatalities or injuries are for each type of events. In the table below it is shown the top 15 events.

#fatalities
x1<-with(storm.red,aggregate(FATALITIES, list(EVTYPE), sum))
y1<-x1[x1[,2]>0,]  # USe only those with fatalities
y1<-y1[order(y1[,2],decreasing=TRUE),]  
#injuries
x2<-with(storm.red,aggregate(INJURIES, list(EVTYPE), sum))
y2<-x2[x2[,2]>0,]
y2<-y2[order(y2[,2],decreasing=TRUE),]

top.events<-cbind(y1[1:15,],y2[1:15,])
colnames(top.events)<-c("Hazard; Fatalities","Victims", "Hazard; Injuries","Victims")
rownames(top.events)<-NULL
pander(top.events, caption="Top events affecting humans in the USA, period 1950-2011.  Left correspond to fatality victims, right to injuries")
Top events affecting humans in the USA, period 1950-2011. Left correspond to fatality victims, right to injuries
Hazard; Fatalities Victims Hazard; Injuries Victims
TORNADO 5633 TORNADO 91346
EXCESSIVE HEAT 1903 TSTM WIND 6957
FLASH FLOOD 978 FLOOD 6789
HEAT 937 EXCESSIVE HEAT 6525
LIGHTNING 816 LIGHTNING 5230
TSTM WIND 504 HEAT 2100
FLOOD 470 ICE STORM 1975
RIP CURRENT 368 FLASH FLOOD 1777
HIGH WIND 248 THUNDERSTORM WIND 1488
AVALANCHE 224 HAIL 1361
WINTER STORM 206 WINTER STORM 1321
RIP CURRENTS 204 HURRICANE/TYPHOON 1275
HEAT WAVE 172 HIGH WIND 1137
EXTREME COLD 162 HEAVY SNOW 1021
THUNDERSTORM WIND 133 WILDFIRE 911

Notice that some of the rows of the table correspond to the same type of weather condition, for example “EXCESSIVE HEAT” and “HEAT”. This must be changed for a better understanding of the hazards.

As can be seen in the table above, tornados are the main cause of injuries and fatalities in USA. Notice that the second cause of mortality is excesive heat and it can be seen that more than half of deaths occured in one event, located in Chicago IL.

To see which states are most affected by the top conditions we looked at the following plot, where the number of fatalities caused by the five top events are plotted by states.

Notice that for the plot, only the records with a propper abbreviate state name are used.

top.events.death<-top.events[1:5,1]
bystate<-with(storm.red,aggregate(FATALITIES,list(STATE,EVTYPE),sum))
colnames(bystate)<-c("STATE","EVTYPE","FATALITIES")

bas<-bystate[(bystate$STATE %in% state.abb)&(bystate$EVTYPE %in% top.events.death),]
barchart(FATALITIES~EVTYPE|STATE,data=bas,layout=c(5,4),scales=list(x=list(rot=45)))

Notice that some states are “safer” than others.

Economic losses

We proceed with the economic losses by looking at the top events.

# property
x1 <- with(storm.red, aggregate(propdmg, list(EVTYPE), sum))
y1 <- x1[x1[, 2] > 0, ]  # USe only those with damage
y1 <- y1[order(y1[, 2], decreasing = TRUE), ]
# crops
x2 <- with(storm.red, aggregate(cropdmg, list(EVTYPE), sum))
y2 <- x2[x2[, 2] > 0, ]
y2 <- y2[order(y2[, 2], decreasing = TRUE), ]

top.events <- cbind(y1[1:15, 1], comma(round(y1[1:15, 2]/1000, 0)), y2[1:15, 
    1], comma(round(y2[1:15, 2]/1000, 0)))

colnames(top.events) <- c("Hazard", "Property", "Hazard", "Crops")
rownames(top.events) <- NULL

pander(top.events, caption = "Top events affecting Properties and Crops in the USA, period 1950-2011.  Left correspond to Property losses, right to Crops. Nominal Currency in Thousands of US Dollars")
Top events affecting Properties and Crops in the USA, period 1950-2011. Left correspond to Property losses, right to Crops. Nominal Currency in Thousands of US Dollars
Hazard Property Hazard Crops
FLOOD 144,657,710 DROUGHT 13,972,566
HURRICANE/TYPHOON 69,305,840 FLOOD 5,661,968
TORNADO 56,937,161 RIVER FLOOD 5,029,459
STORM SURGE 43,323,536 ICE STORM 5,022,114
FLASH FLOOD 16,140,812 HAIL 3,025,954
HAIL 15,732,268 HURRICANE 2,741,910
HURRICANE 11,868,319 HURRICANE/TYPHOON 2,607,873
TROPICAL STORM 7,703,891 FLASH FLOOD 1,421,317
WINTER STORM 6,688,497 EXTREME COLD 1,312,973
HIGH WIND 5,270,046 FROST/FREEZE 1,094,186
RIVER FLOOD 5,118,946 HEAVY RAIN 733,400
WILDFIRE 4,765,114 TROPICAL STORM 678,346
STORM SURGE/TIDE 4,641,188 HIGH WIND 638,571
TSTM WIND 4,484,958 TSTM WIND 554,007
ICE STORM 3,944,928 EXCESSIVE HEAT 492,402

Important Note: As can be seen in the table above, there is needed for data cleansing, to be able to rank the events properly. For example, for crop losses, the combined loss for “FLOOD”, “RIVER FLOOD” and “FLASH FLOOD” is almost as big as the “DROUGHT” loss. Neverless, the table gives information about the harmful events.

Finally, one can add up the property and economic loss to get an overall hazard rating. The following plot shows the top 15 causes.

totalEL<-with(storm.red,propdmg+cropdmg) #Total economic loss
x <-aggregate(totalEL,list(storm.red$EVTYPE),sum)
y<-x[x[,2]>0,]  # USe only those with damage
y<-y[order(y[,2],decreasing=TRUE),]  
colnames(y)<-c("EVTYPE","LOSS")

bas1<-factor(y$EVTYPE,levels=as.character(y$EVTYPE),ordered=TRUE)
y$EVTYPE1<-bas1
barchart(EVTYPE1~LOSS/10^9,data=y[1:15,],main="Monetary Losses (in Billions of US Dollars)",xlab="LOSS")

Clearly the type “FLOOD/FLASH FLOOD/RIVER FLOOD” is a major contributor for the monetary losses over the years.

Findings and some words of caution

There are some interesting facts that can be drawn from the analysis. One of them is that there is difference between the type of events that causes the most damage either if the focus is in the human loss or in the economical loss.

As it has been written in the preceding paragraphs, the results shown must be revised, given the fact that factor variables such as EVTYPE and STATE have many levels that are not in the usual “dictionary”. It has to be notice that we have not use spatial or temporal variables, that could give more information.

Final words for Coursera Evaluators

Dear evaluator: I apologize for the spelling errors you surely spotted while reading this work. English is not my first language and I didn’t have the time to run the text in a grammar/spelling corrector. I hope those have not make you wonder if this is written in english or not.

Reproducibility information

sessionInfo()
## R version 3.2.3 (2015-12-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 10586)
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] scales_0.3.0    pander_0.6.0    lattice_0.20-33 xtable_1.8-2   
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.2      digest_0.6.9     plyr_1.8.3       grid_3.2.3      
##  [5] formatR_1.2.1    magrittr_1.5     evaluate_0.8     stringi_1.0-1   
##  [9] rmarkdown_0.9.2  tools_3.2.3      stringr_1.0.0    munsell_0.4.2   
## [13] colorspace_1.2-6 htmltools_0.3    knitr_1.12.3

Readings

[1] IPCC, 2012: Summary for Policymakers. In: Managing the Risks of Extreme Events and Disasters to Advance Climate Change Adaptation [Field, C.B., V. Barros, T.F. Stocker, D. Qin, D.J. Dokken, K.L. Ebi, M.D. Mastrandrea, K.J. Mach, G.-K. Plattner, S.K. Allen, M. Tignor, and P.M. Midgley (eds.)]. A Special Report of Working Groups I and II of the Intergovernmental Panel on Climate Change. Cambridge University Press, Cambridge, UK, and New York, NY, USA, pp. 1-19. Downloaded from http://www.ipcc-wg2.gov/SREX/images/uploads/SREX-SPMbrochure_FINAL.pdf, Sep 21, 2016.

[2] NOAA, 2007, STORM DATA PREPARATION Downloaded from https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf, Sep 25, 2016