The Impact of Severe Weather Events: tornadoes worst for public health, floods worst for the economy

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. This analysis identifies which types of weather events are most harmful to public health and to the economy. Using the last 30 years of the dataset (1981-2011), we found that tornadoes are the most harmful events for public health: they rank first in fatalities and, by a wide margin, first in injuries. From the economic perspective, floods were the most damaging, mostly because of property losses.

Data Processing

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

First, it is necessary to load some packages.

library(ggplot2)
library(plyr)
library(reshape2)

To begin with our analysis, we download and unzip the data.

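## setInternet2() only exists on Windows builds of R; call it only where available
if (exists("setInternet2")) try(setInternet2(TRUE), silent = TRUE)
## Create the data directory if it does not exist
if (!file.exists("./data")) {
  dir.create("./data")
}
## Destination path for the compressed file
f <- "./data/StormData.csv.bz2"
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
## Download the data (binary mode, so the archive is not corrupted on Windows)
download.file(url, f, mode = "wb")
## Import the data into R (bzfile decompresses while reading)
data <- read.csv(bzfile(f), stringsAsFactors = FALSE)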

The events in the database start in 1950 and end in November 2011. The earlier years generally contain fewer recorded events, most likely due to a lack of good records. Because the older measurements may also be prone to larger errors, we include only the last thirty years of data (from 1981 onward) in our analysis.

# Set format
data$BGN_DATE <- as.Date(data$BGN_DATE, format = "%m/%d/%Y %H:%M:%S")

# Restrict to the last 30 years
data30 <- data[data$BGN_DATE >= as.Date("1981-01-01"), ]
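
As a small illustration of the date handling above, the following sketch applies the same parsing and cutoff to a toy data frame. The four dates are made up for the example and only mimic the "month/day/year time" layout of BGN_DATE; the extra `!is.na()` guard is an addition that keeps unparseable dates from turning into empty rows.

```r
# Toy data in the same layout as BGN_DATE (values invented for illustration)
toy <- data.frame(BGN_DATE = c("4/18/1950 0:00:00", "12/31/1980 0:00:00",
                               "1/1/1981 0:00:00", "11/28/2011 0:00:00"),
                  stringsAsFactors = FALSE)

# Parse the text into Date objects; the time part is read and then dropped
toy$BGN_DATE <- as.Date(toy$BGN_DATE, format = "%m/%d/%Y %H:%M:%S")

# Keep rows on or after the cutoff, skipping any date that failed to parse
recent <- toy[!is.na(toy$BGN_DATE) & toy$BGN_DATE >= as.Date("1981-01-01"), ,
              drop = FALSE]
nrow(recent)
```

Only the two rows dated 1981 or later remain in `recent`.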

Processing the categories data

Note that the event type categories are not clean: there are misspellings and duplicated categories.
As a first step, we convert all event names to upper case.
As a second step, we tried to simplify the data by mapping event names onto 9 standard categories, using a 14x2 matrix that pairs the most significant keywords found in the event names with their standard category.
In particular:
- SNOW category gathers events containing SNOW and BLIZZARD
- WIND category gathers events containing WIND, TSTM WIND, STORM
- WARM category gathers events containing HEAT and WARM
- COLD category gathers events containing COLD and WINTER
- OTHER category gathers events that match none of the keywords

# Clean Event Type...
data30$EVTYPE <- toupper(data30$EVTYPE)

Set the standard category names and the loop bounds

# Keyword (column 1) and the standard category it maps to (column 2)
stdET <- matrix(c(c("TORNADO", "HAIL", "TSTM WIND", "RAIN", "FIRE", "FLOOD", "SNOW",
                    "COLD", "WARM", "WIND", "BLIZZARD", "STORM", "WINTER", "HEAT"),
                  c("TORNADO", "HAIL", "WIND", "RAIN", "FIRE", "FLOOD", "SNOW",
                    "COLD", "WARM", "WIND", "SNOW", "WIND", "COLD", "WARM")),
                nrow = 14, ncol = 2)
# Loop bounds and fallback category
n30 <- nrow(data30) # number of rows in data30
t <- dim(stdET)[1]  # number of keywords in stdET
NoT <- "OTHER"      # category used when no keyword matches EVTYPE

The following routine replaces each raw event name with its standard category

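## Replace each raw event name with its standard category (first matching keyword wins)
for (i in 1:n30) {
  flag <- FALSE
  for (j in 1:t) {
    if (grepl(stdET[j, 1], data30$EVTYPE[i])) {
      data30$EVTYPE[i] <- stdET[j, 2]
      flag <- TRUE
      break  # stop at the first match; later keywords cannot change the result
    }
  }
  if (!flag) {
    data30$EVTYPE[i] <- NoT
  }
}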

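This method reduces the number of weather event types, but looping over every row can take an excessive amount of time, so it has not been used in this presentation. The results below are therefore based on the original event names.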
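
One possible way to cut the running time, offered only as a sketch and not as part of the original analysis, is to loop over the 14 keywords instead of over every row and let `grepl` test the whole column at once. The function name `standardize_evtype` and the vectors `keywords` and `categories` are hypothetical; they hold the same pairs as the two columns of stdET, and the sketch assumes that the first matching keyword should win.

```r
# Same keyword/category pairs as the two columns of stdET
keywords   <- c("TORNADO", "HAIL", "TSTM WIND", "RAIN", "FIRE", "FLOOD", "SNOW",
                "COLD", "WARM", "WIND", "BLIZZARD", "STORM", "WINTER", "HEAT")
categories <- c("TORNADO", "HAIL", "WIND", "RAIN", "FIRE", "FLOOD", "SNOW",
                "COLD", "WARM", "WIND", "SNOW", "WIND", "COLD", "WARM")

# Hypothetical helper: vectorised mapping, first matching keyword wins
standardize_evtype <- function(evtype, keywords, categories, other = "OTHER") {
  evtype <- toupper(evtype)
  result <- rep(other, length(evtype))        # default when nothing matches
  unassigned <- rep(TRUE, length(evtype))     # rows still waiting for a category
  for (j in seq_along(keywords)) {
    hit <- unassigned & grepl(keywords[j], evtype, fixed = TRUE)
    result[hit] <- categories[j]
    unassigned[hit] <- FALSE
  }
  result
}

example <- standardize_evtype(
  c("Tstm Wind/Hail", "BLIZZARD", "Excessive Heat", "RIP CURRENT", "WINTER STORM"),
  keywords, categories)
example
```

In this example "WINTER STORM" ends up in WIND rather than COLD, because STORM comes before WINTER in the keyword list; reordering the keywords would change that outcome.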

Processing the public health damage data

The public health damage data need to be summarized so as to show the number of injuries and fatalities by event name. Then the top 10 events resulting in injuries and the top 10 events resulting in fatalities are selected. Finally, the data are melted with the reshape2 package so that we can use them later in ggplot charts.

# Make sums of injuries and fatalities
sumhealth <- ddply(data30, .(EVTYPE), summarise, fatalities = sum(FATALITIES), 
                   injuries = sum(INJURIES))

## Select ten most harmful events
topfatalities <- head(sumhealth[order(sumhealth$fatalities, decreasing = TRUE), 
                                ], n = 10)[, c(1, 2)]
topinjuries <- head(sumhealth[order(sumhealth$injuries, decreasing = TRUE), ], 
                    n = 10)[, c(1, 3)]

## Prepare data for the barchart (long format, EVTYPE as the id variable)
forchart1 <- melt(topfatalities, id.vars = "EVTYPE")
forchart2 <- melt(topinjuries, id.vars = "EVTYPE")
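
To make the "sum per event, then take the top entries" pattern concrete, here is a minimal sketch on a toy data frame. It uses base R `aggregate` instead of ddply purely to keep the example free of package dependencies; the numbers are invented and the names `toy_health`, `toy_sums`, `toy_top_fatal` and `toy_top_inj` are hypothetical.

```r
# Toy records (values invented for illustration)
toy_health <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(3, 5, 1, 2, 0),
  INJURIES   = c(40, 2, 10, 0, 1),
  stringsAsFactors = FALSE)

# Sum fatalities and injuries within each event type
toy_sums <- aggregate(toy_health[, c("FATALITIES", "INJURIES")],
                      by = list(EVTYPE = toy_health$EVTYPE), FUN = sum)

# Rank by each measure separately and keep the top two event types
toy_top_fatal <- head(toy_sums[order(toy_sums$FATALITIES, decreasing = TRUE),
                               c("EVTYPE", "FATALITIES")], n = 2)
toy_top_inj   <- head(toy_sums[order(toy_sums$INJURIES, decreasing = TRUE),
                               c("EVTYPE", "INJURIES")], n = 2)
toy_top_fatal
toy_top_inj
```

The two rankings can differ, which is why fatalities and injuries are ranked separately in the analysis.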

Processing the economic damage data

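The economic damage data are stored as a base number (PROPDMG, CROPDMG) and a multiplier given as an abbreviation (PROPDMGEXP, CROPDMGEXP), for example K for thousands, M for millions and B for billions. Hence, we convert the abbreviations to numbers and multiply the base numbers by them.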

# Property damage multiplier: prepare and use to multiply the damage
data30$PROPDMGEXP[is.na(data30$PROPDMGEXP)] <- 0
data30$PROPDMGEXP[data30$PROPDMGEXP == ""] <- 1
data30$PROPDMGEXP[grep("[-+?]", data30$PROPDMGEXP)] <- 1
data30$PROPDMGEXP[grep("[Hh]", data30$PROPDMGEXP)] <- 100
data30$PROPDMGEXP[grep("[Kk]", data30$PROPDMGEXP)] <- 1000
data30$PROPDMGEXP[grep("[Mm]", data30$PROPDMGEXP)] <- 1e+06
data30$PROPDMGEXP[grep("[Bb]", data30$PROPDMGEXP)] <- 1e+09
data30$PROPDMGEXP <- as.numeric(data30$PROPDMGEXP)
data30$PROPDMG <- data30$PROPDMGEXP * data30$PROPDMG

# Crop damage multiplier: prepare and use to multiply the damage
data30$CROPDMGEXP[is.na(data30$CROPDMGEXP)] <- 0
data30$CROPDMGEXP[data30$CROPDMGEXP == ""] <- 1
data30$CROPDMGEXP[grep("[-+?]", data30$CROPDMGEXP)] <- 1
data30$CROPDMGEXP[grep("[Hh]", data30$CROPDMGEXP)] <- 100
data30$CROPDMGEXP[grep("[Kk]", data30$CROPDMGEXP)] <- 1000
data30$CROPDMGEXP[grep("[Mm]", data30$CROPDMGEXP)] <- 1e+06
data30$CROPDMGEXP[grep("[Bb]", data30$CROPDMGEXP)] <- 1e+09
data30$CROPDMGEXP <- as.numeric(data30$CROPDMGEXP)
data30$CROPDMG <- data30$CROPDMGEXP * data30$CROPDMG
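
The two blocks above repeat the same steps for property and crop damage. A possible way to express the conversion once, shown here only as a sketch, is a small lookup helper. The name `exp_to_multiplier` is hypothetical. It follows the same rules as the code above (missing codes give 0; blank codes and the symbols -, +, ? give 1; digits are kept as literal multipliers), but assumes that every code is a single character or a plain number.

```r
# Hypothetical helper: turn exponent codes into numeric multipliers
exp_to_multiplier <- function(code) {
  code <- as.character(code)
  missing <- is.na(code)
  code[missing] <- ""                      # handled separately below
  code <- toupper(code)

  lookup <- c(H = 100, K = 1000, M = 1e6, B = 1e9)
  mult <- unname(lookup[code])             # NA for codes not in the lookup

  mult[code %in% c("", "-", "+", "?")] <- 1   # blank or symbol: leave the base as is
  digit <- grepl("^[0-9]+$", code)
  mult[digit] <- as.numeric(code[digit])      # digits used as literal multipliers
  mult[missing] <- 0                          # missing code: damage counted as 0
  mult
}

codes <- c("K", "m", "", "+", "B", "h", NA, "5")
multipliers <- exp_to_multiplier(codes)
multipliers
```

With such a helper, each damage column could be converted in one line, for example `data30$PROPDMG * exp_to_multiplier(data30$PROPDMGEXP)`, provided it is applied to the original, unconverted exponent codes.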

As with the health data, the economic damage figures are first summarized by type of event. Then the top 10 events with the highest economic impact (defined as damage to crops plus damage to property) are selected.
Finally, the data are prepared for ggplot with the melt function.

# Make sums of crop and property damage
sumecon <- ddply(data30, .(EVTYPE), summarise, cropdmg = sum(CROPDMG), propdmg = sum(PROPDMG))
sumecon$totaldamage <- sumecon$cropdmg + sumecon$propdmg

## Select top 10
topecon <- head(sumecon[order(sumecon$totaldamage, decreasing = TRUE), ], n = 10)

## Prepare data for the barchart (long format, EVTYPE as the id variable)
forchart3 <- melt(topecon, id.vars = "EVTYPE")
## Drop the total so that crop and property damage can be stacked in the chart
forchart3 <- forchart3[forchart3$variable != "totaldamage", ]

Results

Question 1: Public health

The following table and chart present the 10 most damaging events from the perspective of fatalities.

##             EVTYPE fatalities
## 758        TORNADO       2246
## 116 EXCESSIVE HEAT       1903
## 138    FLASH FLOOD        978
## 243           HEAT        937
## 418      LIGHTNING        816
## 779      TSTM WIND        504
## 154          FLOOD        470
## 524    RIP CURRENT        368
## 320      HIGH WIND        248
## 19       AVALANCHE        224

## Make the barchart
ggplot(forchart1, aes(x = factor(EVTYPE), y = value, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(face = "bold"),
        legend.position = "none") +
  labs(x = "Weather event", y = "Number of fatalities") +
  ggtitle("Fatalities")

The event with the highest number of fatalities during the last 30 years of the dataset was TORNADO (2,246), followed by EXCESSIVE HEAT (1,903).

The following table and chart present the 10 most damaging events from the perspective of injuries.

##                EVTYPE injuries
## 758           TORNADO    36814
## 779         TSTM WIND     6957
## 154             FLOOD     6789
## 116    EXCESSIVE HEAT     6525
## 418         LIGHTNING     5230
## 243              HEAT     2100
## 387         ICE STORM     1975
## 138       FLASH FLOOD     1777
## 685 THUNDERSTORM WIND     1488
## 212              HAIL     1361

## Make the barchart
ggplot(forchart2, aes(x = factor(EVTYPE), y = value, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(face = "bold"),
        legend.position = "none") +
  labs(x = "Weather event", y = "Number of injuries") +
  ggtitle("Injuries")

The event with the highest number of injuries during the last 30 years of the dataset was TORNADO, with 36,814 injuries, more than five times the figure for the next event, TSTM WIND (6,957).

To sum up, TORNADO seems to be the most damaging event from the perspective of public health.

Question 2: Economic damage

The following table and chart present the 10 most damaging events from the perspective of economic damage.

##                EVTYPE     cropdmg      propdmg  totaldamage
## 154             FLOOD  5661968450 144657709807 150319678257
## 372 HURRICANE/TYPHOON  2607872800  69305840000  71913712800
## 599       STORM SURGE        5000  43323536000  43323541000
## 758           TORNADO   414953110  41485762374  41900715484
## 212              HAIL  3025954453  15732267427  18758221880
## 138       FLASH FLOOD  1421317100  16140812294  17562129394
## 84            DROUGHT 13972566000   1046106000  15018672000
## 363         HURRICANE  2741910000  11868319010  14610229010
## 529       RIVER FLOOD  5029459000   5118945500  10148404500
## 387         ICE STORM  5022113500   3944927810   8967041310

## Make the barchart
ggplot(forchart3, aes(x = factor(EVTYPE), y = value, fill = variable)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(face = "bold"),
        legend.position = "top") +
  labs(x = "Weather event", y = "Economic damage (USD)") +
  scale_fill_discrete(name = "Type of\ndamage", labels = c("Crop", "Property")) +
  ggtitle("Economic impact")

The event with the highest economic damage during the last 30 years of the dataset was FLOOD, with about 150 billion USD in total damage, most of it damage to property.