An Analysis of the Consequences of Adverse Weather Events on Population Health and Economy Across the United States


Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and crop and property damage. The purpose of this analysis is to identify the adverse weather events that have (1) caused the greatest number of fatalities and injuries among the US population (population health) and (2) inflicted the greatest damage on the US economy (damage to property and crops). We use the U.S. National Oceanic and Atmospheric Administration’s (NOAA) Storm Database as the input for this analysis. Our analysis shows that between 1950 and 2011 in the US:

  1. Tornadoes are the most harmful events with respect to population health: they are associated with both the highest number of lives lost (5,636) and the highest number of injuries (91,407).
  2. Floods have the greatest overall economic consequences (180.1 billion USD), followed by hurricanes (90.76 billion USD) and tornadoes (57.42 billion USD).
  3. Floods have the greatest economic consequences with respect to property damage (167.7 billion USD), while droughts have the greatest economic consequences with respect to crop damage (13.97 billion USD).

1. Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

System Settings

To ensure that all strings returned by R are in English, we set the locale to US English.

## [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
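The call that produced the locale string above is not shown in this report. A minimal sketch of how such a setting might be made is given below. The helper name `set_english_locale` and the list of candidate locale names are assumptions for illustration; locale names differ between platforms, so the sketch tries several and falls back to the portable "C" locale.

```r
# Hypothetical helper (not part of the original analysis): try several names
# for a US English locale and return the first one the system accepts.
set_english_locale <- function(candidates = c("English", "en_US.UTF-8", "C")) {
    for (loc in candidates) {
        # Sys.setlocale() returns "" (with a warning) if the request fails
        res <- suppressWarnings(Sys.setlocale("LC_ALL", loc))
        if (nzchar(res)) {
            return(res)
        }
    }
    ""
}

set_english_locale()
```

On Windows the first candidate is typically accepted; on Linux or macOS one of the later candidates is more likely to succeed.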

To ensure that all of our code and results are shown in the analysis document, we set the appropriate global options for knitr.

### Set Global Options for knitr
library(knitr)  # provides opts_chunk
opts_chunk$set(echo = TRUE, results = 'markup')

Load Required Packages

We load all necessary packages for our analysis.

require(utils)
require(R.utils)
## Loading required package: R.utils
## Loading required package: R.oo
## Loading required package: R.methodsS3
## R.methodsS3 v1.6.1 (2014-01-04) successfully loaded. See ?R.methodsS3 for help.
## R.oo v1.18.0 (2014-02-22) successfully loaded. See ?R.oo for help.
## 
## Attaching package: 'R.oo'
## 
## The following objects are masked from 'package:methods':
## 
##     getClasses, getMethods
## 
## The following objects are masked from 'package:base':
## 
##     attach, detach, gc, load, save
## 
## R.utils v1.32.4 (2014-05-14) successfully loaded. See ?R.utils for help.
## 
## Attaching package: 'R.utils'
## 
## The following object is masked from 'package:utils':
## 
##     timestamp
## 
## The following objects are masked from 'package:base':
## 
##     cat, commandArgs, getOption, inherits, isOpen, parse, warnings
require(data.table)
## Loading required package: data.table
require(ggplot2)
## Loading required package: ggplot2
require(grid)
## Loading required package: grid
require(xtable)
## Loading required package: xtable

2. Data Processing

2.1 The Raw Data

We download and unzip the U.S. National Oceanic and Atmospheric Administration’s (NOAA) Storm Database.

# Download and unzip the input file if it does not exist in the current directory

filename <- "repdata_data_StormData.csv"
filename.zip <- paste0(filename, ".bz2")
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

if (!file.exists(filename)) {
    if (!file.exists(filename.zip)) {
        download.file(url, destfile = filename.zip, method = "curl",
                      quiet = TRUE)
    }

    # bunzip2() comes from R.utils and writes the decompressed `filename`
    bunzip2(filename.zip)
}

First, we define the appropriate column classes and then load the input file into a data frame.

cclasses <- c("numeric", "character", "character", "character", "numeric", "character", 
              "character", "character", "numeric", "character",  "character",
              "character",  "character", "numeric", "character", "numeric", "character", 
              "character", "numeric", "numeric", "character", "numeric", "numeric",
              "numeric", "numeric", "character", "numeric", "character", "character", 
              "character", "character","numeric", "numeric","numeric", "numeric",
            "character", "numeric")

stormData <- read.table( filename, header=TRUE, sep=",", colClasses=cclasses,
               stringsAsFactors=FALSE, comment.char="")

The events in the database start in 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

The following output lists the attributes of the data set.

names(stormData)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
dim(stormData)
## [1] 902297     37

As we can see, the data set consists of 902,297 observations with 37 attributes.

2.2 Subset the Data

For our analysis two questions are of interest:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

Since the ultimate goal of this analysis is to assess the impact of general types of events on population health and the economy, we subset the raw data, keeping only the variables needed for the computation.

To answer the first question we will use two attributes: FATALITIES and INJURIES.

To answer the second question we will use four attributes: PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP.

data <- stormData[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]                 

To make the data tidy, we convert all variable names to lower case.

colnames(data) <- tolower(names(data)) 
sapply(data, function(missing) any(is.na(missing))) 
##     evtype fatalities   injuries    propdmg propdmgexp    cropdmg 
##      FALSE      FALSE      FALSE      FALSE      FALSE      FALSE 
## cropdmgexp 
##      FALSE

As we can see, there are no missing values in our data. However, as discussed below, we still have to clean and transform the event type attribute and the health and economic attributes.

2.3 Processing the Data for Event Type

data$evtype <- as.factor(data$evtype)
eventTypes <- levels(data$evtype)
length(eventTypes)
## [1] 985

There are 985 levels in the evtype attribute. We do not show them here, because the listing would be 985 lines long and unreadable. However, inspecting the event types makes it clear that they can be reduced to fewer categories: many of them appear under multiple names (e.g. “Urban flood”, “Urban Flood”, “Urban Flooding”, “URBAN FLOODING”, “URBAN FLOOD”, …), some levels are summaries rather than events (e.g., “Summary August 11”, “Summary of April 27”, …), and some are irrelevant characters (e.g., “?”). Therefore some cleaning is necessary.

## Remove observations with the `evtype` values "Summary" and "?".
data <- data[grep("SUMMARY", data$evtype, invert=TRUE),]
data <- data[grep("Summary", data$evtype, invert=TRUE),]
data <- data[grep("\\?", data$evtype, invert=TRUE),]

We then use grep() to collapse the hundreds of different values in the evtype column into a smaller set of categories.

data$evtype <- tolower(data$evtype)

data$evtype[grep("fog|vog", data$evtype)] <- "Dense Fog"
data$evtype[grep("dense smoke|smoke", data$evtype)] <- "Dense Smoke"
data$evtype[grep("heavy snow|snow", data$evtype)] <- "Heavy Snow"
data$evtype[grep("high surf|blow-out tide|surf|swells|high tides|high seas|high waves|heavy seas|rough seas|rogue wave", data$evtype)] <- "High Surf"
data$evtype[grep("drought|dryness|driest month|dry conditions|record dryness|excessively dry|dry weather|dry spell|mild and dry pattern|mild/dry pattern|dry pattern|record dry month|hot pattern|dry hot weather|below normal precipitation|dry|unseasonably dry", data$evtype)] <- "Drought"
data$evtype[grep("astronomical", data$evtype)] <- "Astronomical Low/High Tide"
data$evtype[grep("avalanche|avalance", data$evtype)] <- "Avalanche"
data$evtype[grep("blizzard", data$evtype)] <- "Blizzard"
data$evtype[grep("urban|fld|small stream and|small stream", data$evtype)] <- "Urban Flooding"
data$evtype[grep("coastal.|cstl", data$evtype)] <- "Coastal Flood"
data$evtype[grep("debris flow|lands[ .]?lides|land[ .]?slide|mud[ .]?slides|mud[ .]?slide|landslump|rock slide", data$evtype)] <- "Debris Flow"
data$evtype[grep("dust devil|dust storm|dust", data$evtype)] <- "Dust Storm"
data$evtype[grep("excessive heat|unusually warm|excessive precipitation|very warm|prolong warmth|hot weather|record warm|unusual/record warmth|unusual warmth|abnormal warmth|unseasonably hot|hot spell|record warm temps\\.|hot and dry|hot pattern|hot/dry pattern|warm dry conditions|record high temperature[s]?|record warmth|extreme heat|heat|unseasonably warm|warm weather", data$evtype)] <- "Excessive Heat"
data$evtype[grep("[ .]?flash flood[ing]?|flash floooding|unseasonably wet|flood/flash|flood|high water|rapidly rising water", data$evtype)] <- "Flash Flood"
data$evtype[grep("frost/freeze|freeze|frost", data$evtype)] <- "Frost/Freeze"
data$evtype[grep("funnel cloud|funnel", data$evtype)] <- "Funnel Cloud"
data$evtype[grep("hail", data$evtype)] <- "Hail"
data$evtype[grep("heavy rain|excessive wetness|abnormally wet|monthly precipitation|record precipitation|extremely wet|remnants of floyd|early rain|rain \\(heavy\\)|prolonged rain|metro storm|wet month|rain damage|wet year|torrential rain|wet weather|excessive rain|rain$|normal precipitation|wall cloud|hvy rain|heavy precip[ai]tation|record rainfall|rainstorm|unseasonal rain|heavy shower|torrential rainfall|excessive rainfall", data$evtype)] <- "Heavy Rain"
data$evtype[grep("high wind|high$", data$evtype)] <- "High Wind"
data$evtype[grep("hurricane|typhoon", data$evtype)] <- "Hurricane"
data$evtype[grep("ice storm", data$evtype)] <- "Ice Storm"
data$evtype[grep("lake-effect snow", data$evtype)] <- "Lake-Effect Snow"
data$evtype[grep("lakeshore flood", data$evtype)] <- "Lakeshore Flood"
data$evtype[grep("lightning|lighting|ligntning", data$evtype)] <- "Lightning"
data$evtype[grep("rip current[s]?", data$evtype)] <- "Rip Current"
data$evtype[grep("seiche", data$evtype)] <- "Seiche"
data$evtype[grep("sleet|mix[ed]?$|freezing drizzle|freezing rain|mixed percipitation|mixed precip|freezing spray", data$evtype)] <- "Sleet"
data$evtype[grep("storm surge", data$evtype)] <- "Storm Surge/Tide"
data$evtype[grep("strong wind|wnd$|gusty wind|wind|winds", data$evtype)] <- "Strong Wind"
data$evtype[grep("thunderstorm wind[s]?|thundeer.|^([ .])?tstm|thunderstorm[s]?|thunderstrom|thundertorm|thuderstorm|thunderestorm", data$evtype)] <- "Thunderstorm Wind"
data$evtype[grep("tornado|gustnado|torndao", data$evtype)] <- "Tornado"
data$evtype[grep("microburst|micoburst", data$evtype)] <- "Microburst"
data$evtype[grep("ice|glaze|icy", data$evtype)] <- "Ice Storm"
data$evtype[grep("erosion|erosin", data$evtype)] <- "Beach/Coastal Erosion"
data$evtype[grep("extreme cold|wind[ .]?chill|record low|cold|cool|hyperthermia/exposure|unseasonal low temp|low temperature", data$evtype)] <- "Cold/Wind Chill"
data$evtype[grep("downburst", data$evtype)] <- "Downburst"
data$evtype[grep("dam break|dam failure", data$evtype)] <- "Dam Break"
data$evtype[grep("landspout", data$evtype)] <- "Landspout"
data$evtype[grep("marine accident|marine mishap", data$evtype)] <- "Marine Accident"
data$evtype[grep("other|apache|\\?|other/unknown|summary|southeast|monthly temperature|no severe weather|red flag criteria|northern lights|severe turbulence|record temperatures|excessive$|mild pattern|temperature record|record temperature", data$evtype)] <- "Other/Unknown"
data$evtype[grep("drowning", data$evtype)] <- "Drowning"
data$evtype[grep("none", data$evtype)] <- "Other/Unknown"
data$evtype[grep("tropical depression", data$evtype)] <- "Tropical Depression"
data$evtype[grep("tropical storm", data$evtype)] <- "Tropical Storm"
data$evtype[grep("tsunami", data$evtype)] <- "Tsunami"
data$evtype[grep("tstm", data$evtype)] <- "Thunderstorm Wind"
data$evtype[grep("volcanic ash|volcanic eruption", data$evtype)] <- "Volcanic Ash"
data$evtype[grep("waterspout|water spout|wayterspout", data$evtype)] <- "Waterspout"
data$evtype[grep("wild[ .]?fire[s]|forest fire[s]|fire[s]?$|red flag fire wx", data$evtype)] <- "Wildfire"
data$evtype[grep("winter storm", data$evtype)] <- "Winter Storm"
data$evtype[grep("winter weather", data$evtype)] <- "Winter Weather"
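The sequential recoding above can be sketched on a toy vector. Because the patterns are lower case and the replacement labels are title case, a value that has already been recoded is not matched again by a later pattern, so the order of the rules decides the outcome. The labels in `events` and the reduced rule set in `rules` are invented for illustration and are only a small subset of the patterns used above.

```r
# Toy labels invented for illustration; they are not taken from the database.
events <- tolower(c("TSTM WIND", "Urban Flood", "URBAN FLOODING",
                    "Heavy Snow", "hail"))

# Ordered rules: lower-case pattern -> title-case category (illustrative subset).
rules <- c("urban|fld" = "Urban Flooding",
           "flood"     = "Flash Flood",
           "wind"      = "Strong Wind",
           "tstm"      = "Thunderstorm Wind",
           "snow"      = "Heavy Snow",
           "hail"      = "Hail")

# Apply the rules in order; earlier rules win because recoded labels are
# title case and no longer match the lower-case patterns.
for (pattern in names(rules)) {
    events[grep(pattern, events)] <- rules[[pattern]]
}

events
## [1] "Strong Wind"    "Urban Flooding" "Urban Flooding" "Heavy Snow"
## [5] "Hail"
```

In this sketch "tstm wind" ends up as "Strong Wind" because the "wind" rule runs before the "tstm" rule, which mirrors the ordering of the corresponding patterns in the code above.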

In this way the hundreds of different values of the evtype attribute (985 levels) have been reduced to 47 values.

unique(data$evtype)
##  [1] "Tornado"                    "Strong Wind"               
##  [3] "Hail"                       "Heavy Rain"                
##  [5] "Heavy Snow"                 "Flash Flood"               
##  [7] "Winter Storm"               "High Wind"                 
##  [9] "Cold/Wind Chill"            "Hurricane"                 
## [11] "Lightning"                  "Dense Fog"                 
## [13] "Rip Current"                "Thunderstorm Wind"         
## [15] "Funnel Cloud"               "Excessive Heat"            
## [17] "Waterspout"                 "Blizzard"                  
## [19] "Frost/Freeze"               "Coastal Flood"             
## [21] "High Surf"                  "Ice Storm"                 
## [23] "Avalanche"                  "Marine Accident"           
## [25] "Other/Unknown"              "Dust Storm"                
## [27] "Sleet"                      "Urban Flooding"            
## [29] "Wildfire"                   "Debris Flow"               
## [31] "Drought"                    "Microburst"                
## [33] "Downburst"                  "Winter Weather"            
## [35] "Storm Surge/Tide"           "Tropical Storm"            
## [37] "Dam Break"                  "Beach/Coastal Erosion"     
## [39] "monthly rainfall"           "Volcanic Ash"              
## [41] "Seiche"                     "Tropical Depression"       
## [43] "Landspout"                  "Dense Smoke"               
## [45] "Astronomical Low/High Tide" "Drowning"                  
## [47] "Tsunami"

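So, after data cleaning and evtype attribute filtering, our data set consists of 902,220 observations with 7 attributes and 47 different values for the evtype attribute. One of these values (“monthly rainfall”) was not matched by any pattern and keeps its original lower-case label.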

names(data)
## [1] "evtype"     "fatalities" "injuries"   "propdmg"    "propdmgexp"
## [6] "cropdmg"    "cropdmgexp"
dim(data)
## [1] 902220      7
length(unique(data$evtype))
## [1] 47

2.4 Processing Data for Economic Impact Analysis

First we inspect the economy-related columns for missing data.

sum(is.na(data$propdmg))
## [1] 0
sum(is.na(data$cropdmg))
## [1] 0

As we can see we do not have any missing values within propdmg and cropdmg columns.

The attributes propdmg and cropdmg hold damage amounts in USD. These numbers have to be scaled according to the units given in propdmgexp and cropdmgexp. Therefore, to calculate the cost we multiply propdmg by 10^propdmgexp and cropdmg by 10^cropdmgexp.

unique(data$propdmgexp)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-"
## [18] "1" "8"
unique(data$cropdmgexp)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"

As we can see, propdmgexp and cropdmgexp contain both digits and letters (e.g., “K” or “k” = 1000, “5” = 10^5, etc.). There are also other values (i.e., “”, “?”, “+” and “-”). To convert these characters to numbers, we write a function that takes a character from propdmgexp or cropdmgexp and returns the corresponding power of 10.

# Function `convert.exp` takes a character c (exponent code) and returns the
# corresponding power of 10. Valid values of c are h or H (hundred),
# k or K (thousand), m or M (million), b or B (billion), and digits 0-9.
# For any other value of c the exponent defaults to 0, so the function
# returns 1 and the damage amount is left unscaled.

convert.exp <- function(c) { 
    exp <- switch( EXPR = tolower(c),
        "0" = 0, "1" = 1, "2" = 2, "3" = 3,
        "4" = 4, "5" = 5, "6" = 6, "7" = 7, "8" = 8, "9" = 9,
        "h" = 2, "k" = 3, "m" = 6, "b" = 9, 
        0 )

    return(10 ^ exp)
}
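As a cross-check of the conversion logic, the same mapping can be sketched as a vectorised lookup table instead of a per-element `switch()`. The names `exp_lookup` and `convert_exp_vec` are hypothetical and not part of the original analysis; the mapping itself copies the one in `convert.exp`, including the fallback to exponent 0 for unrecognised codes.

```r
# Exponent lookup: letters h/k/m/b plus the digits "0".."9".
exp_lookup <- c(h = 2, k = 3, m = 6, b = 9,
                setNames(0:9, as.character(0:9)))

# Hypothetical vectorised counterpart of convert.exp().
convert_exp_vec <- function(x) {
    e <- exp_lookup[tolower(x)]   # unknown codes (e.g. "", "?", "+") give NA
    e[is.na(e)] <- 0              # fall back to exponent 0, i.e. a factor of 1
    unname(10 ^ e)
}

convert_exp_vec(c("K", "m", "B", "5", "", "?"))
## [1] 1e+03 1e+06 1e+09 1e+05 1e+00 1e+00
```

For example, a record with propdmg = 25 and propdmgexp = "K" corresponds to 25 * convert_exp_vec("K") = 25,000 USD.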

Now we can calculate the cost for property damage and crop damage.

propdmgCost = (data$propdmg * vapply(data$propdmgexp, convert.exp, 1.0))
cropdmgCost = (data$cropdmg * vapply(data$cropdmgexp, convert.exp, 1.0))

summary(propdmgCost)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.00e+00 0.00e+00 0.00e+00 4.75e+05 5.00e+02 1.15e+11
summary(cropdmgCost)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.00e+00 0.00e+00 0.00e+00 5.44e+04 0.00e+00 5.00e+09

We create a new Econ.costData data frame with columns eventType, propdmgCost and cropdmgCost, which we will use to analyse the impact of adverse events on the economy.

Econ.costData <- data.frame("eventType"=data$evtype, "propdmgCost" =propdmgCost, "cropdmgCost" = cropdmgCost)
summary(Econ.costData)
##        eventType       propdmgCost        cropdmgCost      
##  Strong Wind:341768   Min.   :0.00e+00   Min.   :0.00e+00  
##  Hail       :290399   1st Qu.:0.00e+00   1st Qu.:0.00e+00  
##  Flash Flood: 81463   Median :0.00e+00   Median :0.00e+00  
##  Tornado    : 60707   Mean   :4.75e+05   Mean   :5.44e+04  
##  High Wind  : 21927   3rd Qu.:5.00e+02   3rd Qu.:0.00e+00  
##  Heavy Snow : 17705   Max.   :1.15e+11   Max.   :5.00e+09  
##  (Other)    : 88251

Since events with no recorded damage are not relevant, we keep only the observations in which property damage or crop damage is greater than zero.

Econ.costData <- subset(Econ.costData, (propdmgCost > 0 | cropdmgCost > 0))
summary(Econ.costData)
##        eventType       propdmgCost        cropdmgCost   
##  Strong Wind:120750   Min.   :0.00e+00   Min.   :0e+00  
##  Tornado    : 39392   1st Qu.:2.50e+03   1st Qu.:0e+00  
##  Flash Flood: 31568   Median :1.00e+04   Median :0e+00  
##  Hail       : 26498   Mean   :1.75e+06   Mean   :2e+05  
##  Lightning  : 10373   3rd Qu.:4.00e+04   3rd Qu.:0e+00  
##  High Wind  :  5996   Max.   :1.15e+11   Max.   :5e+09  
##  (Other)    : 10453
dim(Econ.costData)
## [1] 245030      3

So our final Econ.costData data frame consists of 245,030 observations with 3 attributes.

2.5 Processing Data for Health Impact Analysis

First we inspect the health-related columns for missing data.

sum(is.na(data$fatalities))
## [1] 0
sum(is.na(data$injuries))
## [1] 0

As we can see, there are no missing values in the fatalities and injuries columns. We create a new Health.Data data frame with columns eventType, fatalities and injuries, which we will use to analyse the impact of adverse events on health. Since events with no recorded casualties are not relevant, we keep only the observations in which the number of fatalities or injuries is greater than zero.

Health.Data <- data.frame("eventType"=data$evtype, "fatalities" = data$fatalities, "injuries" = data$injuries)
Health.Data <- subset(Health.Data, fatalities > 0 | injuries >0)

summary(Health.Data)
##           eventType      fatalities       injuries     
##  Tornado       :7934   Min.   :  0.0   Min.   :   0.0  
##  Strong Wind   :4487   1st Qu.:  0.0   1st Qu.:   1.0  
##  Lightning     :3308   Median :  0.0   Median :   1.0  
##  Flash Flood   :1406   Mean   :  0.7   Mean   :   6.4  
##  Excessive Heat: 943   3rd Qu.:  1.0   3rd Qu.:   3.0  
##  Rip Current   : 637   Max.   :583.0   Max.   :1700.0  
##  (Other)       :3214
dim(Health.Data)
## [1] 21929     3

So our final Health.Data data frame consists of 21,929 observations with 3 attributes.

Now we are prepared for analysis.

3. Results

3.1 Public Health Harm Analysis

In this section the question of interest is:

1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

We first create a summary data table with columns deaths and injuries.

healthSummary <- as.data.table(Health.Data)
setkey(healthSummary, eventType)

healthSummary <- healthSummary[ , list(deaths = sum(fatalities),
                                       injuries= sum(injuries),
                                       number = .N),
                               keyby = eventType ]

The table below shows the 5 most common causes of death.

# Get top 5 causes of death.

deaths <- healthSummary[head(order(-deaths), nrow(healthSummary)), 
            list(Event = eventType, Deaths = deaths)]

deaths[6,] <- list(Event = "Other", Deaths = sum(deaths[6:nrow(deaths),Deaths]))

deaths <- head(deaths,6)



# Print as a HTML table.

print(xtable(deaths, caption = "Table 1: Top 5 causes of death"), type="html", caption.placement="top")
Table 1: Top 5 causes of death
     Event             Deaths
  1  Tornado          5636.00
  2  Excessive Heat   3143.00
  3  Flash Flood      1522.00
  4  Strong Wind      1119.00
  5  Lightning         817.00
  6  Other            2908.00
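The "top 5 plus Other" construction used for this table recurs for every summary table in this report. A minimal base-R sketch of a reusable helper is shown below; the function name `top_with_other`, its arguments and the toy data are assumptions for illustration and are not part of the original analysis.

```r
# Hypothetical helper: keep the n largest rows by `value` and collapse the
# remaining rows into a single "Other" row.
top_with_other <- function(df, label, value, n = 5) {
    df <- df[order(df[[value]], decreasing = TRUE), ]
    out <- data.frame(Event = as.character(df[[label]]),
                      Total = df[[value]],
                      stringsAsFactors = FALSE)
    if (nrow(out) > n) {
        other <- data.frame(Event = "Other",
                            Total = sum(out$Total[(n + 1):nrow(out)]),
                            stringsAsFactors = FALSE)
        out <- rbind(out[seq_len(n), ], other)
    }
    rownames(out) <- NULL
    out
}

# Toy data invented for illustration.
toy <- data.frame(eventType = c("A", "B", "C", "D", "E", "F", "G"),
                  deaths    = c(10, 50, 5, 40, 30, 20, 1))

top_with_other(toy, "eventType", "deaths")
```

Converting the label column to character before adding the "Other" row avoids having to extend the levels of a factor column.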

The table below shows the 5 most common causes of injuries.

# Get top 5 causes of injuries.

injuries <- healthSummary[head(order(-injuries), nrow(healthSummary)), 
            list(Event = eventType, Injuries = injuries)]

injuries[6,] <- list(Event = "Other", Injuries = sum(injuries[6:nrow(injuries),Injuries]))

injuries <- head(injuries,6)

# Print as a HTML table.

print(xtable(injuries, caption = "Table 2: Top 5 causes of injuries"), type="html", caption.placement="top")
Table 2: Top 5 causes of injuries
     Event            Injuries
  1  Tornado          91407.00
  2  Strong Wind       9878.00
  3  Excessive Heat    9228.00
  4  Flash Flood       8597.00
  5  Lightning         5232.00
  6  Other            16186.00

The public health harm by weather event types is shown below in Figure 1.

# Deaths subplot
pDeaths <- qplot(data=deaths, x= reorder(Event, -Deaths), y= Deaths,
                 xlab = "",
                 ylab="Number of fatalities",
                 main="Five most common causes of deaths")
pDeaths <- pDeaths + geom_histogram(stat="identity") + aes(fill=Event)
pDeaths <- pDeaths + theme(axis.text.x = element_text(angle = 20, hjust = 1))

#Injuries subplot
pInjuries <- qplot(data=injuries,x= reorder(Event, -Injuries), y= Injuries,
                 xlab = "",
                 ylab="Number of injuries",
                 main="Five most common causes of injuries")
pInjuries <- pInjuries + geom_histogram(stat="identity") + aes(fill=Event )
pInjuries <- pInjuries + theme(axis.text.x = element_text(angle = 20, hjust = 1))

#Print the two subplots

grid.newpage()
pushViewport(viewport(layout = grid.layout(2, 1)))
print(pDeaths, vp = viewport(layout.pos.row= 1, layout.pos.col = 1))
print(pInjuries, vp = viewport(layout.pos.row= 2, layout.pos.col = 1))


Figure 1. The five weather event types most harmful to population health in the US during the documented period. Most fatalities (top panel) were caused by tornadoes, excessive heat and, to a lesser degree, floods. Injuries (bottom panel) were caused overwhelmingly by tornadoes. To a lesser degree, strong wind and lightning also account for many deaths and injuries between 1950 and 2011. Note that the Other bar represents the sum of all remaining events and is added as the sixth bar. Also note that all numbers are cumulative over the observed period.

The number of deaths caused by tornadoes is 5,636 (37.2 % of all deaths), and the number of injuries caused by tornadoes is 91,407 (65 % of all injuries).

# The number of deaths caused by tornado
deaths[1,Deaths]
## [1] 5636
# ... % of total number of deaths
round(deaths[1,Deaths]/sum(deaths[,Deaths])*100,1)
## [1] 37.2
# The number of injuries caused by tornado
injuries[1,Injuries]
## [1] 91407
# ... % of total number of injuries
round(injuries[1,Injuries]/sum(injuries[,Injuries])*100,1)
## [1] 65
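The percentages above can be reproduced directly from the values printed in Table 1 and Table 2. The sketch below re-enters those table values by hand; the vector names `deaths_tbl` and `injuries_tbl` and the helper `share_of_total` are hypothetical and used only for this check.

```r
# Values copied from Table 1 and Table 2 of this report.
deaths_tbl <- c("Tornado" = 5636, "Excessive Heat" = 3143, "Flash Flood" = 1522,
                "Strong Wind" = 1119, "Lightning" = 817, "Other" = 2908)
injuries_tbl <- c("Tornado" = 91407, "Strong Wind" = 9878,
                  "Excessive Heat" = 9228, "Flash Flood" = 8597,
                  "Lightning" = 5232, "Other" = 16186)

# Share of one event in the column total, in percent with one decimal.
share_of_total <- function(x, event) {
    round(x[[event]] / sum(x) * 100, 1)
}

sum(deaths_tbl)                           # 15145 deaths in total
share_of_total(deaths_tbl, "Tornado")     # 5636 / 15145, about 37.2 %
sum(injuries_tbl)                         # 140528 injuries in total
share_of_total(injuries_tbl, "Tornado")   # 91407 / 140528, about 65 %
```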

As can be seen in Table 1, Table 2 and Figure 1, tornadoes are the most harmful weather events in terms of both injuries and fatalities, and thus in total harm as well, between 1950 and 2011 in the US.

3.2 Economic Consequences Analysis

In this section the question of interest is:

2. Across the United States, which types of events have the greatest economic consequences?

We first create a summary data table with columns propdmgCost and cropdmgCost.

economicsSummary <- as.data.table(Econ.costData)
setkey(economicsSummary, eventType)

economicsSummary <- economicsSummary[ , list(propdmgCost = sum(propdmgCost),
                                       cropdmgCost= sum(cropdmgCost),
                                       number = .N),
                               keyby = eventType ]

The table below shows the 5 financially most severe weather events in relation to property damage.

# Get the 5 financially most severe weather events in relation to  property damage.

propDamage <- economicsSummary[head(order(-propdmgCost), nrow(economicsSummary)), 
            list(Event = eventType, Damage = round(propdmgCost/(10^9),2))]



propDamage[6,] <- list(Event = "Other", Damage = sum(propDamage[6:nrow(propDamage),Damage]))

propDamage <- head(propDamage,6)



# Print as a HTML table.

print(xtable(propDamage, caption = "Table 3: The 5 financially most severe weather events in relation to property damage (cost in Billions USD)"), type="html", caption.placement="top")
Table 3: The 5 financially most severe weather events in relation to property damage (cost in Billions USD)
     Event               Damage
  1  Flash Flood         167.73
  2  Hurricane            85.26
  3  Tornado              57.00
  4  Storm Surge/Tide     47.96
  5  Hail                 17.62
  6  Other                52.64

The table below shows the 5 financially most severe weather events in relation to crop damage.

# Get the 5 financially most severe weather events in relation to crop damage.

cropDamage <- economicsSummary[head(order(-cropdmgCost), nrow(economicsSummary)), 
            list(Event = eventType, Damage = round(cropdmgCost/(10^9),2))]

cropDamage[6,] <- list(Event = "Other", Damage = round((sum(cropDamage[6:nrow(cropDamage),Damage])),2))

cropDamage <- head(cropDamage,6)
# Print as a HTML table.

print(xtable(cropDamage, caption = "Table 4: The 5 financially most severe weather events in relation to crop damage (cost in Billions USD)"), type="html", caption.placement="top")
Table 4: The 5 financially most severe weather events in relation to crop damage (cost in Billions USD)
     Event          Damage
  1  Drought         13.97
  2  Flash Flood     12.38
  3  Hurricane        5.51
  4  Ice Storm        5.02
  5  Hail             3.11
  6  Other            9.09

The property and crops damage by weather event types is shown below in Figure 2.

# Property damage subplot
pProperty <- qplot(data=propDamage, x= reorder(Event, -Damage), y= Damage,
                 xlab = "",
                 ylab="Cost [in Billion USD]",
                 main="The 5 financially most severe weather events in relation to property damage")
pProperty <- pProperty + geom_histogram(stat="identity") + aes(fill=Event)
pProperty <- pProperty + theme(axis.text.x = element_text(angle = 20, hjust = 1))

#Crops damage subplot
pCrop <- qplot(data=cropDamage,x= reorder(Event, -Damage), y= Damage,
                 xlab = "",
                 ylab="Cost [in Billion USD]",
                 main="The 5 financially most severe weather events in relation to crop damage")
pCrop <- pCrop + geom_histogram(stat="identity") + aes(fill=Event)
pCrop <- pCrop + theme(axis.text.x = element_text(angle = 20, hjust = 1))

#Print the two subplots

grid.newpage()
pushViewport(viewport(layout = grid.layout(2, 1)))
print(pProperty, vp = viewport(layout.pos.row= 1, layout.pos.col = 1))
print(pCrop, vp = viewport(layout.pos.row= 2, layout.pos.col = 1))


Figure 2. Total property damage (top panel) and total crop damage (bottom panel) of the five financially most severe weather event types in the US. Between 1950 and 2011, floods have the greatest economic consequences for property (top panel), while droughts have the greatest economic consequences for crops (bottom panel). Note that the Other bar represents the sum of all remaining events and is added as the sixth bar. Also note that all numbers are cumulative over the observed period.

The cost of property damage caused by floods is 167.73 billion USD (39.2 % of total property damage), and the cost of crop damage caused by droughts is 13.97 billion USD (28.5 % of total crop damage).

# The cost of property damage caused by floods (in Billions USD)
propDamage[1,Damage]
## [1] 167.7
# ... % of total property damage
round(propDamage[1,Damage]/sum(propDamage[,Damage])*100,1)
## [1] 39.2
# The cost of crops damage caused by droughts (in Billions USD)
cropDamage[1,Damage]
## [1] 13.97
# ... % of total crops damage
round(cropDamage[1,Damage]/sum(cropDamage[,Damage])*100,1)
## [1] 28.5

As can be seen in Table 3, Table 4 and Figure 2, floods have the greatest economic consequences for property, while droughts have the greatest economic consequences for crops, between 1950 and 2011 in the US.

Overall economic consequences

The table below shows the 5 events that have the greatest overall economic consequences.

# Get the 5 financially most severe weather events in relation to overall economic consequences.

econDamage <- economicsSummary[head(order(-cropdmgCost), nrow(economicsSummary)), 
            list(Event = eventType, Damage = round((propdmgCost+cropdmgCost)/(10^9),2))]
econDamage <- econDamage[order(-Damage),]
econDamage[6,] <- list(Event = "Other", Damage = round((sum(econDamage[6:nrow(econDamage),Damage])),2))
econDamage <- head(econDamage,6)

# Check if total economic damage is sum of property damage and crops damage
round(sum(econDamage[,Damage]),1) == round(sum(propDamage[,Damage]) + sum(cropDamage[,Damage]), 1)

## [1] TRUE
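The check above compares sums after rounding them to one decimal place, which could fail if two nearly equal sums happened to fall on either side of a rounding boundary. A sketch of a tolerance-based alternative follows, using the values printed in Tables 3 to 5 of this report. The variable names and the tolerance of 0.1 billion USD are assumptions chosen for illustration.

```r
# Column values copied from Tables 3, 4 and 5 of this report (billion USD).
prop_total <- sum(c(167.73, 85.26, 57.00, 47.96, 17.62, 52.64))
crop_total <- sum(c(13.97, 12.38, 5.51, 5.02, 3.11, 9.09))
econ_total <- sum(c(180.11, 90.76, 57.42, 47.97, 20.74, 80.34))

# The totals differ slightly because each table row was rounded separately,
# so compare within an assumed tolerance instead of testing for equality.
tolerance <- 0.1
abs(econ_total - (prop_total + crop_total)) < tolerance
```

With the table values, the overall total is about 477.34 and the sum of the property and crop totals is about 477.29, so the difference of roughly 0.05 is within the assumed tolerance.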

# Print as a HTML table.

print(xtable(econDamage, caption = "Table 5: The 5 most severe weather events that have the greatest overall economic consequences (cost in Billions USD)"), type="html", caption.placement="top")
Table 5: The 5 most severe weather events that have the greatest overall economic consequences (cost in Billions USD)
     Event               Damage
  1  Flash Flood         180.11
  2  Hurricane            90.76
  3  Tornado              57.42
  4  Storm Surge/Tide     47.97
  5  Hail                 20.74
  6  Other                80.34

The overall economic damage by weather event types is shown below in Figure 3.

# Overall economic damage plot
pEconomic <- qplot(data=econDamage, x= reorder(Event, -Damage), y= Damage,
                 xlab = "",
                 ylab="Cost [in Billion USD]",
                 main="The 5 financially most severe weather events 
                 that have the greatest overall economic consequences")
pEconomic <- pEconomic + geom_histogram(stat="identity") + aes(fill=Event)
pEconomic <- pEconomic + theme(axis.text.x = element_text(angle = 20, hjust = 1))
print(pEconomic)


Figure 3. Total overall economic damage of the five financially most severe weather event types in the US. Floods have the greatest economic consequences, followed by hurricanes and tornadoes. All other events together have smaller economic consequences than floods, and also smaller economic consequences than hurricanes. Note that the Other bar represents the sum of all remaining events and is added as the sixth bar. Also note that all numbers are cumulative over the observed period.

The overall economic damage caused by floods is 180.11 billion USD (37.7 % of total overall economic damage). The overall economic damage caused by hurricanes is 90.76 billion USD (19 % of the total), and the overall economic damage caused by tornadoes is 57.42 billion USD (12 % of the total).

# The cost of overall economic damage caused by floods (in Billions USD)
econDamage[1,Damage]
## [1] 180.1
# ... % of total overall economic damage
round(econDamage[1,Damage]/sum(econDamage[,Damage])*100,1)
## [1] 37.7
# The cost of overall economic damage caused by hurricane (in Billions USD)
econDamage[2,Damage]
## [1] 90.76
# ... % of total overall economic damage
round(econDamage[2,Damage]/sum(econDamage[,Damage])*100,1)
## [1] 19
# The cost of overall economic damage caused by tornado (in Billions USD)
econDamage[3,Damage]
## [1] 57.42
# ... % of total overall economic damage
round(econDamage[3,Damage]/sum(econDamage[,Damage])*100,1)
## [1] 12

As can be seen in Table 5 and Figure 3, floods have the greatest overall economic consequences between 1950 and 2011 in the US, followed by hurricanes and tornadoes.

4. Conclusion

Our analysis of the data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) Storm Database shows that:

  1. Tornadoes are the most harmful weather events in terms of both injuries and fatalities, and thus in total harm as well, between 1950 and 2011 in the US.

  2. Floods have the greatest overall economic consequences between 1950 and 2011 in the US, followed by hurricanes and tornadoes.

  3. Floods have the greatest economic consequences for property, while droughts have the greatest economic consequences for crops, between 1950 and 2011 in the US.