Author: Pradeep K. Pant, ppant@cpan.org

Synopsis

This paper is an attempt to give an overview of the NOAA Storm Database and answer some basic questions about severe weather events. It uses data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. It analyses which types of events are most harmful with respect to population health and which cause the greatest economic loss. We find that tornadoes are the most harmful for population health, considering both injuries and fatalities. For economic losses in general, flood stands first, followed by hurricane/typhoon. For crop loss, it is drought, and for property damage, it is flood again. Tornadoes are by far the leading cause of both injuries and fatalities, with excessive heat second in fatalities.

Data Processing

Download and load data

Loading the data takes a considerable amount of time, so we’ll cache it.

# Loading and preprocessing the data
# Load CRAN packages
library(downloader)
library(knitr)
library(datasets)
library(ggplot2)
library(rmarkdown)

# Step 1: Download the storm data set if it is not available in the working directory

Url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

# Download the bz2 archive only if it is not already in the working directory
if (!file.exists("stormData.csv.bz2")) {
  download.file(Url, destfile = "stormData.csv.bz2", mode = "wb")
}

# Read the compressed CSV file into a data frame (read.csv decompresses bz2 transparently)
stormData <- read.csv("stormData.csv.bz2")
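
The text mentions caching, but the code above re-reads the CSV on every run. One possible way to cache the parsed data between runs is sketched below. This is an assumption, not the method used for this report: the helper name read_cached and the cache file name are hypothetical, and the demonstration uses a tiny temporary CSV rather than the real storm data.

```r
# Hypothetical helper: parse the CSV once, then reuse a cached .rds copy.
read_cached <- function(csv_path, cache_path) {
  if (file.exists(cache_path)) {
    return(readRDS(cache_path))   # fast path: reuse the cached data frame
  }
  data <- read.csv(csv_path)      # slow path: parse the CSV
  saveRDS(data, cache_path)       # store the result for the next run
  data
}

# Demonstration on a small made-up CSV file
csv_path <- tempfile(fileext = ".csv")
cache_path <- tempfile(fileext = ".rds")
write.csv(data.frame(EVTYPE = c("TORNADO", "FLOOD"), INJURIES = c(3, 1)),
          csv_path, row.names = FALSE)

first <- read_cached(csv_path, cache_path)   # parses the CSV and writes the cache
second <- read_cached(csv_path, cache_path)  # served from the cache
```

For the real data, the call would look like read_cached("stormData.csv.bz2", "stormData.rds"), where stormData.rds is an assumed cache file name.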

To address the first question, which weather events are most harmful to population health, we first look at the total number of injuries and fatalities by event type.

# load package
library(plyr)
# Calculate injuries
injuries <- ddply(stormData, .(EVTYPE), summarize, sum.injuries = sum(INJURIES,na.rm=TRUE))
injuries <- injuries[order(injuries$sum.injuries, decreasing = TRUE), ]
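
As a cross-check that does not depend on plyr, the same kind of per-event total can be sketched with base R's split() and sapply(). The toy data frame below is invented for illustration and only mimics the EVTYPE and INJURIES columns.

```r
# Made-up records standing in for the storm data
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  INJURIES = c(10, 2, 5, NA, 1)
)

# Sum injuries per event type, ignoring missing values
totals <- sapply(split(toy$INJURIES, toy$EVTYPE), sum, na.rm = TRUE)

# Largest totals first
totals <- sort(totals, decreasing = TRUE)
print(totals)
```

With na.rm = TRUE, an event type whose only record is missing (HEAT here) gets a total of 0 rather than NA.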

List the top six events causing the most injuries.

# print the 6 top injury-causing events
head(injuries)
##             EVTYPE sum.injuries
## 834        TORNADO        91346
## 856      TSTM WIND         6957
## 170          FLOOD         6789
## 130 EXCESSIVE HEAT         6525
## 464      LIGHTNING         5230
## 275           HEAT         2100

We see that tornado is the most harmful event, with more than 91 thousand injuries. This is shown in the figure below:

library(ggplot2)
# plot Event type vs Number of injuries
# alpha is a fixed setting, so it belongs outside aes() (inside aes() it adds a spurious legend)
ggplot(injuries[1:6, ], aes(EVTYPE, sum.injuries, fill = EVTYPE)) + geom_bar(stat = "identity", alpha = 0.5) + 
  xlab("Event Type") + ylab("Number of Injuries") + ggtitle("Injuries by Event type") + coord_flip()
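
By default, the bars in such a chart follow the alphabetical order of EVTYPE rather than the size of the totals. A minimal sketch of one way to order them by value, using the six injury totals listed above and base R's reorder(). The data frame name top_injuries is made up for illustration, and the plot object is only built if ggplot2 is installed.

```r
# The six injury totals reported above
top_injuries <- data.frame(
  EVTYPE = c("TORNADO", "TSTM WIND", "FLOOD", "EXCESSIVE HEAT", "LIGHTNING", "HEAT"),
  sum.injuries = c(91346, 6957, 6789, 6525, 5230, 2100)
)

# reorder() sorts the factor levels by the injury totals (ascending)
top_injuries$EVTYPE <- reorder(factor(top_injuries$EVTYPE), top_injuries$sum.injuries)
ordered_levels <- levels(top_injuries$EVTYPE)
print(ordered_levels)

# Build the plot object; call print(p) to draw it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(top_injuries, aes(EVTYPE, sum.injuries)) +
    geom_bar(stat = "identity") + coord_flip()
}
```

With coord_flip(), the last factor level is drawn at the top, so the event with the most injuries ends up as the top bar.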

Next, we check which events cause the most fatalities.

# total fatalities per event type
fatalities <- ddply(stormData, .(EVTYPE), summarize, sum = sum(FATALITIES, na.rm = TRUE))
fatalities <- fatalities[order(fatalities$sum, decreasing = TRUE), ]
head(fatalities, 5)
##             EVTYPE  sum
## 834        TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153    FLASH FLOOD  978
## 275           HEAT  937
## 464      LIGHTNING  816

It is clear that tornado again leads, with more than 5 thousand fatalities, followed by excessive heat with close to 2 thousand fatalities.

We again provide a figure below to give a clearer picture.

library(ggplot2)
# alpha is a fixed setting, so it belongs outside aes()
ggplot(fatalities[1:6, ], aes(EVTYPE, sum, fill = EVTYPE)) + geom_bar(stat = "identity", alpha = 0.3) + 
  xlab("Event Type") + ylab("Number of Fatalities") + ggtitle("Fatalities by Event type") + coord_flip()

For economic consequences, we will analyse property damage followed by crop damage and then total damage. Let’s focus on property damage first. We start by looking at various exponents for PROPDMGEXP.

unique(stormData$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M

Convert lower case characters to upper case, and replace symbols that are neither letters nor digits with 0.

stormData$PROPDMGEXP <- toupper(stormData$PROPDMGEXP)
stormData$PROPDMGEXP[stormData$PROPDMGEXP %in% c("", "+", "-", "?")] = "0"

As PROPDMGEXP stands for the power of 10, we convert ‘B’ standing for billions to 9, ‘M’ standing for millions to 6, ‘K’ standing for thousands to 3 and ‘H’ for hundreds to 2.

stormData$PROPDMGEXP[stormData$PROPDMGEXP %in% c("B")] = "9"
stormData$PROPDMGEXP[stormData$PROPDMGEXP %in% c("M")] = "6"
stormData$PROPDMGEXP[stormData$PROPDMGEXP %in% c("K")] = "3"
stormData$PROPDMGEXP[stormData$PROPDMGEXP %in% c("H")] = "2"

We obtain the full property damage by converting PROPDMGEXP to numeric values and multiplying each damage figure by the corresponding power of 10.

stormData$PROPDMGEXP <- 10^(as.numeric(stormData$PROPDMGEXP))
damage.property <- stormData$PROPDMG * stormData$PROPDMGEXP
stormData=as.data.frame(cbind(stormData,damage.property))
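
The conversion steps above can also be sketched as a single lookup function. This is an illustrative alternative under the same assumptions as the text (letters map to powers of 10, digits are used as powers directly, anything else counts as 0). The function name exp_to_multiplier is hypothetical.

```r
# Hypothetical helper: turn exponent codes into multipliers (powers of 10).
exp_to_multiplier <- function(code) {
  code <- toupper(as.character(code))
  letters_map <- c(H = 2, K = 3, M = 6, B = 9)
  power <- ifelse(code %in% names(letters_map),
                  unname(letters_map[code]),            # letter codes
                  suppressWarnings(as.numeric(code)))   # digit codes
  power[is.na(power)] <- 0                              # "", "+", "-", "?" and the like
  10^power
}

codes <- c("K", "m", "B", "", "?", "5")
multipliers <- exp_to_multiplier(codes)
print(multipliers)

# Worked example: a recorded damage of 25 with exponent "K" means 25 thousand
damage <- 25 * exp_to_multiplier("K")
print(damage)
```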

Let’s make a new dataset of property damage aggregated by event type and look at the six most damaging events in terms of economic loss.

Damage.property <- ddply(stormData, .(EVTYPE), summarize, damage.property = sum(damage.property, na.rm = TRUE))
# Sort the Damage dataset
Damage.property <- Damage.property[order(Damage.property$damage.property, decreasing = T), ]
# Show the first 6 most damaging types
head(Damage.property)
##                EVTYPE damage.property
## 170             FLOOD    144657709807
## 411 HURRICANE/TYPHOON     69305840000
## 834           TORNADO     56947380677
## 670       STORM SURGE     43323536000
## 153       FLASH FLOOD     16822673979
## 244              HAIL     15735267513

It is clear that flood is the most damaging event for property in terms of economic loss, with a total of more than 144 billion dollars. It is followed by hurricane/typhoon and tornado.

Now we will look at which event is most devastating economically for crops. We take the same steps as in the property damage computation and look at the most damaging event for crops in terms of economic loss.

Let’s have a look at various exponents for CROPDMGEXP.

unique(stormData$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M

As two levels are in lower case, we convert them to upper case. We also replace symbols that are neither letters nor digits with 0.

stormData$CROPDMGEXP <- toupper(stormData$CROPDMGEXP)
stormData$CROPDMGEXP[stormData$CROPDMGEXP %in% c("", "?")] = "0"

As CROPDMGEXP stands for the power of 10, we convert ‘B’ standing for billions to 9, ‘M’ standing for millions to 6, ‘K’ standing for thousands to 3 and ‘H’ for hundreds to 2.

stormData$CROPDMGEXP[stormData$CROPDMGEXP %in% c("B")] = "9"
stormData$CROPDMGEXP[stormData$CROPDMGEXP %in% c("M")] = "6"
stormData$CROPDMGEXP[stormData$CROPDMGEXP %in% c("K")] = "3"
stormData$CROPDMGEXP[stormData$CROPDMGEXP %in% c("H")] = "2"

Now we get the full crop damage by converting CROPDMGEXP to numeric values and multiplying each damage figure by the corresponding power of 10.

stormData$CROPDMGEXP <- 10^(as.numeric(stormData$CROPDMGEXP))
damage.crop = stormData$CROPDMG * stormData$CROPDMGEXP
stormData=as.data.frame(cbind(stormData,damage.crop))

Now we make a new dataset of crop damage aggregated by event type and look at the six most damaging events in terms of economic loss.

Damage.crop = ddply(stormData, .(EVTYPE), summarize, damage.crop = sum(damage.crop, na.rm = TRUE))
# Sort the Damage.crop dataset
Damage.crop = Damage.crop[order(Damage.crop$damage.crop, decreasing = T), ]
# Show the first 6 most damaging types
head(Damage.crop)
##          EVTYPE damage.crop
## 95      DROUGHT 13972566000
## 170       FLOOD  5661968450
## 590 RIVER FLOOD  5029459000
## 427   ICE STORM  5022113500
## 244        HAIL  3025954473
## 402   HURRICANE  2741910000

Let’s also look at a chart to have a quick look at these figures.

ggplot(Damage.crop[1:6, ], aes(EVTYPE, damage.crop, fill = EVTYPE)) + geom_bar(stat = "identity", alpha = 0.5) + 
  xlab("Event Type") + ylab("Crop damage") + ggtitle("Crop damage by Event type") + coord_flip()

We see that drought is the worst factor for agriculture, causing more than 13 billion dollars in damage. It is followed by flood, causing more than 5 billion dollars.

Now, to look at the economic losses in aggregate, we need to add the losses from property and crops and then see which type is most devastating, flood or drought.

Let’s compute total damage first and add it to the data. Then we just need to aggregate losses by event type.

total.damage = damage.property + damage.crop
stormData=as.data.frame(cbind(stormData,total.damage))
Damage.total = ddply(stormData, .(EVTYPE), summarize, damage.total = sum(total.damage, na.rm = TRUE))
# Sort the Damage.total dataset
Damage.total = Damage.total[order(Damage.total$damage.total, decreasing = T), ]

Let’s have a look at the six most damaging event types.

head(Damage.total)
##                EVTYPE damage.total
## 170             FLOOD 150319678257
## 411 HURRICANE/TYPHOON  71913712800
## 834           TORNADO  57362333947
## 670       STORM SURGE  43323541000
## 244              HAIL  18761221986
## 153       FLASH FLOOD  18243991079
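
Because total damage is simply property damage plus crop damage, the totals can be checked against the two earlier tables. A small sketch using the FLOOD and HAIL figures reported above (the only two event types that appear in all three tables):

```r
# Figures copied from the property and crop damage tables above
property <- c(FLOOD = 144657709807, HAIL = 15735267513)
crop <- c(FLOOD = 5661968450, HAIL = 3025954473)

# Element-wise sum, matched by position (both vectors use the same order)
total <- property + crop
print(format(total, big.mark = ","))
```

Both sums agree with the FLOOD and HAIL rows of the total damage table.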

We see that flood leads with a whopping loss of more than 150 billion dollars, followed by hurricane/typhoon with an estimated loss of more than 71 billion dollars. In terms of total losses, drought, the leading cause of crop loss, is not even among the six most damaging event types.

Result

According to our analysis, flood is the leading cause of economic loss, while tornadoes are the leading threat to population health. If agriculture is the main concern, then drought is the most damaging event type. For the economy in general, however, flood is the main loss factor. For human health, priority should naturally go to addressing tornadoes, as they cause the most deaths and injuries.