This paper is an attempt to give an overview of the NOAA Storm Database and answer some basic questions about severe weather events. It uses data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database and analyses which types of events are most harmful with respect to population health and which cause the greatest economic losses. We find that tornadoes are the most harmful event for population health, considering both injuries and fatalities: they are by far the leading cause of injuries and also cause the most fatalities, followed by excessive heat. For economic losses in general, flood comes first, followed by hurricane/typhoon. For crop losses the leading cause is drought, and for property losses it is again flood, followed by hurricane/typhoon and tornado.
Loading the data takes a considerable amount of time, so we cache it.
# Loading and preprocessing the data
# Load CRAN packages
library(downloader)
library(knitr)
library(datasets)
library(ggplot2)
library(rmarkdown)
# Step 1: Download the storm data set if it is not available in the working directory
Url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# Download the .bz2 archive only if it is not already in the working directory
if (!file.exists("stormData.csv.bz2")) {
  download.file(Url, destfile = "stormData.csv.bz2", mode = "wb")
}
# read.csv() decompresses the .bz2 archive transparently and returns a data frame
stormData <- read.csv("stormData.csv.bz2")
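Reading the compressed CSV is the slow step that motivates caching. If the knitr chunk cache is not available, one alternative (an assumption, not the author's setup) is to store the parsed data frame as an .rds file and reuse it on later runs. The helper `load_cached()` and the file name `stormData.rds` are hypothetical.

```r
# Hypothetical helper: return the data frame stored in `cache_file` if it
# exists; otherwise call `loader()`, save its result with saveRDS() and return it.
load_cached <- function(cache_file, loader) {
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))
  }
  data <- loader()
  saveRDS(data, cache_file)
  data
}

# Possible usage with the report's files (the .rds name is an assumption):
# stormData <- load_cached("stormData.rds",
#                          function() read.csv("stormData.csv.bz2"))
```

On the first run the CSV is parsed and saved; later runs only read the binary .rds file, which is usually much faster than re-parsing the compressed CSV.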
To address the first question, which types of weather events are most harmful to population health, we look at the total number of injuries and fatalities by event type.
# Load plyr, which provides ddply()
library(plyr)
# Calculate injuries
injuries <- ddply(stormData, .(EVTYPE), summarize, sum.injuries = sum(INJURIES,na.rm=TRUE))
injuries <- injuries[order(injuries$sum.injuries, decreasing = TRUE), ]
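For readers not using plyr, the same per-event totals can be computed with base R's `aggregate()`. A minimal sketch on a small made-up data frame that only has the two columns used here:

```r
# Toy stand-in for stormData with made-up values.
toy <- data.frame(
  EVTYPE   = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  INJURIES = c(10, 2, 5, 1, NA),
  stringsAsFactors = FALSE
)

# Sum INJURIES within each EVTYPE group; na.rm = TRUE is passed on to sum(),
# so NA values are ignored, as in the ddply() call above.
injuries <- aggregate(toy["INJURIES"], by = toy["EVTYPE"], FUN = sum, na.rm = TRUE)
names(injuries)[names(injuries) == "INJURIES"] <- "sum.injuries"

# Sort in decreasing order of total injuries.
injuries <- injuries[order(injuries$sum.injuries, decreasing = TRUE), ]
print(injuries)
```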
List the six events causing the most injuries (the same approach is used below for fatalities and economic damage).
# Print the 6 events with the most injuries (head() returns 6 rows by default)
head(injuries)
## EVTYPE sum.injuries
## 834 TORNADO 91346
## 856 TSTM WIND 6957
## 170 FLOOD 6789
## 130 EXCESSIVE HEAT 6525
## 464 LIGHTNING 5230
## 275 HEAT 2100
Tornadoes are by far the most harmful event type, with more than 91,000 injuries. The figure below shows the six event types with the most injuries.
library(ggplot2)
# Plot event type vs. number of injuries, bars ordered by value
ggplot(injuries[1:6, ], aes(reorder(EVTYPE, sum.injuries), sum.injuries, fill = EVTYPE)) +
  geom_bar(stat = "identity", alpha = 0.5) +
  xlab("Event Type") + ylab("Number of Injuries") + ggtitle("Injuries by Event Type") + coord_flip()
Next, we check which events cause the most fatalities.
# Total fatalities per event type (ddply() comes from plyr, loaded above)
fatalities <- ddply(stormData, .(EVTYPE), summarize, sum = sum(FATALITIES, na.rm = TRUE))
fatalities <- fatalities[order(fatalities$sum, decreasing = TRUE), ]
head(fatalities, 5)
## EVTYPE sum
## 834 TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 275 HEAT 937
## 464 LIGHTNING 816
Tornadoes again cause the most fatalities (more than 5,000), followed by excessive heat with close to 2,000.
The figure below gives a clearer picture.
library(ggplot2)
# Plot event type vs. number of fatalities, bars ordered by value
ggplot(fatalities[1:6, ], aes(reorder(EVTYPE, sum), sum, fill = EVTYPE)) +
  geom_bar(stat = "identity", alpha = 0.3) +
  xlab("Event Type") + ylab("Number of Fatalities") + ggtitle("Fatalities by Event Type") + coord_flip()
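The summary considers injuries and fatalities together, while the code above ranks them separately. As one possible composite view (an assumption: fatalities and injuries are simply added, which weights a death the same as an injury), both columns can be aggregated in one pass. The toy data is made up and the column name `casualties` is hypothetical.

```r
# Toy stand-in for stormData with made-up values.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "EXCESSIVE HEAT", "TORNADO", "LIGHTNING"),
  FATALITIES = c(3, 4, 2, 1),
  INJURIES   = c(20, 5, 10, 3),
  stringsAsFactors = FALSE
)

# Sum both health columns per event type in a single aggregate() call.
health <- aggregate(toy[c("FATALITIES", "INJURIES")], by = toy["EVTYPE"],
                    FUN = sum, na.rm = TRUE)

# Hypothetical composite measure: unweighted sum of fatalities and injuries.
health$casualties <- health$FATALITIES + health$INJURIES
health <- health[order(health$casualties, decreasing = TRUE), ]
print(health)
```

A weighted sum (for example counting each fatality as several injuries) would be an equally valid choice; the unweighted version is only the simplest.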
For economic consequences, we analyse property damage first, then crop damage and finally total damage. We start by looking at the various exponent codes in PROPDMGEXP.
unique(stormData$PROPDMGEXP)
## [1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
We convert lowercase codes to uppercase and replace blank values and the symbols +, - and ? with 0.
stormData$PROPDMGEXP <- toupper(stormData$PROPDMGEXP)
stormData$PROPDMGEXP[stormData$PROPDMGEXP %in% c("", "+", "-", "?")] = "0"
As PROPDMGEXP encodes a power of 10, we convert ‘B’ (billions) to 9, ‘M’ (millions) to 6, ‘K’ (thousands) to 3 and ‘H’ (hundreds) to 2.
stormData$PROPDMGEXP[stormData$PROPDMGEXP %in% c("B")] = "9"
stormData$PROPDMGEXP[stormData$PROPDMGEXP %in% c("M")] = "6"
stormData$PROPDMGEXP[stormData$PROPDMGEXP %in% c("K")] = "3"
stormData$PROPDMGEXP[stormData$PROPDMGEXP %in% c("H")] = "2"
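The cleanup and the four replacements above can also be expressed as a single vectorised lookup. The sketch below assumes the same rules as the report (blank, +, - and ? mean 0; H, K, M and B mean 2, 3, 6 and 9; digits are used as-is); the helper name `exp_to_multiplier()` is hypothetical. Because it uppercases its input first, the same function would also cover CROPDMGEXP.

```r
# Hypothetical helper: map damage-exponent codes to multipliers 10^e.
exp_to_multiplier <- function(code) {
  code <- toupper(as.character(code))
  # Blank, "+", "-" and "?" are treated as exponent 0.
  code[code %in% c("", "+", "-", "?")] <- "0"
  # Letter codes become the corresponding power of ten.
  letters_to_power <- c(H = "2", K = "3", M = "6", B = "9")
  is_letter <- code %in% names(letters_to_power)
  code[is_letter] <- unname(letters_to_power[code[is_letter]])
  # Any code not covered above becomes NA here (with a warning).
  10^as.numeric(code)
}
```

It would have to be applied to the codes as read from the file (or after the letter replacements), not after the `10^` conversion, e.g. `stormData$PROPDMG * exp_to_multiplier(rawCodes)` where `rawCodes` is a hypothetical copy of the original column.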
We obtain the full property damage by converting each PROPDMGEXP code to a numeric exponent e, turning it into the multiplier 10^e, and multiplying PROPDMG by that multiplier.
stormData$PROPDMGEXP <- 10^(as.numeric(stormData$PROPDMGEXP))
damage.property <- stormData$PROPDMG * stormData$PROPDMGEXP
stormData$damage.property <- damage.property
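As a worked example with illustrative values (not taken from the data), a record with PROPDMG = 25 and PROPDMGEXP = ‘K’, so that e = 3, contributes:

$$
\text{damage.property} = \text{PROPDMG} \times 10^{e} = 25 \times 10^{3} = 25{,}000 \text{ dollars}
$$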
Next, we aggregate property damage by event type and look at the six event types with the largest economic losses.
Damage.property <- ddply(stormData, .(EVTYPE), summarize, damage.property = sum(damage.property, na.rm = TRUE))
# Sort the Damage dataset
Damage.property <- Damage.property[order(Damage.property$damage.property, decreasing = TRUE), ]
# Show the first 6 most damaging types
head(Damage.property)
## EVTYPE damage.property
## 170 FLOOD 144657709807
## 411 HURRICANE/TYPHOON 69305840000
## 834 TORNADO 56947380677
## 670 STORM SURGE 43323536000
## 153 FLASH FLOOD 16822673979
## 244 HAIL 15735267513
Flood is clearly the most damaging event for property, with total losses of more than 144 billion dollars, followed by hurricane/typhoon and tornado.
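The same aggregate-and-sort pattern is repeated for injuries, fatalities, property, crop and total damage. One way to avoid the repetition is a small helper; the sketch below uses base R's `aggregate()` instead of plyr's `ddply()`, and the name `top_events()` is hypothetical.

```r
# Hypothetical helper: total `value_col` per EVTYPE and return the n largest.
top_events <- function(df, value_col, n = 6) {
  totals <- aggregate(df[value_col], by = df["EVTYPE"], FUN = sum, na.rm = TRUE)
  totals <- totals[order(totals[[value_col]], decreasing = TRUE), ]
  head(totals, n)
}

# Possible usage on the report's data:
# top_events(stormData, "damage.property")
# top_events(stormData, "damage.crop")
```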
Next, we look at which event is most economically devastating for crops, following the same steps as for property damage.
Let’s look at the various exponent codes in CROPDMGEXP.
unique(stormData$CROPDMGEXP)
## [1] M K m B ? 0 k 2
## Levels: ? 0 2 B k K m M
Two codes (k and m) are lowercase, so we convert them to uppercase. We also replace blank values and ? with 0.
stormData$CROPDMGEXP <- toupper(stormData$CROPDMGEXP)
stormData$CROPDMGEXP[stormData$CROPDMGEXP %in% c("", "?")] = "0"
As CROPDMGEXP also encodes a power of 10, we convert ‘B’ (billions) to 9, ‘M’ (millions) to 6, ‘K’ (thousands) to 3 and ‘H’ (hundreds) to 2 (no ‘H’ appears in CROPDMGEXP, so that last step has no effect here).
stormData$CROPDMGEXP[stormData$CROPDMGEXP %in% c("B")] = "9"
stormData$CROPDMGEXP[stormData$CROPDMGEXP %in% c("M")] = "6"
stormData$CROPDMGEXP[stormData$CROPDMGEXP %in% c("K")] = "3"
stormData$CROPDMGEXP[stormData$CROPDMGEXP %in% c("H")] = "2"
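Before converting the codes with `as.numeric()`, it may be worth checking that every remaining code is a single digit, since any other value would become NA (with a warning) and then silently drop out of the `na.rm = TRUE` sums. A minimal sketch with a hypothetical helper `unmapped_codes()`:

```r
# Hypothetical check: return the distinct codes that are not a single digit.
unmapped_codes <- function(codes) {
  codes <- as.character(codes)
  unique(codes[!grepl("^[0-9]$", codes)])
}

# Possible usage: an empty result means every code is a single digit.
# unmapped_codes(stormData$CROPDMGEXP)
```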
We then get the full crop damage by converting CROPDMGEXP to the multiplier 10^e and multiplying CROPDMG by it.
stormData$CROPDMGEXP <- 10^(as.numeric(stormData$CROPDMGEXP))
damage.crop = stormData$CROPDMG * stormData$CROPDMGEXP
stormData$damage.crop <- damage.crop
Next, we aggregate crop damage by event type and look at the six event types with the largest economic losses.
Damage.crop = ddply(stormData, .(EVTYPE), summarize, damage.crop = sum(damage.crop, na.rm = TRUE))
# Sort the Damage.crop dataset
Damage.crop = Damage.crop[order(Damage.crop$damage.crop, decreasing = TRUE), ]
# Show the first 6 most damaging types
head(Damage.crop)
## EVTYPE damage.crop
## 95 DROUGHT 13972566000
## 170 FLOOD 5661968450
## 590 RIVER FLOOD 5029459000
## 427 ICE STORM 5022113500
## 244 HAIL 3025954473
## 402 HURRICANE 2741910000
A chart gives a quick overview of these figures.
ggplot(Damage.crop[1:6, ], aes(reorder(EVTYPE, damage.crop), damage.crop, fill = EVTYPE)) +
  geom_bar(stat = "identity", alpha = 0.5) +
  xlab("Event Type") + ylab("Crop damage (dollars)") + ggtitle("Crop Damage by Event Type") + coord_flip()
Drought is the worst event for agriculture, causing more than 13 billion dollars in crop damage, followed by flood with more than 5 billion dollars.
To look at economic losses in aggregate, we add the property and crop losses and then check which event type is most devastating overall: flood or drought.
We first compute the total damage and add it to the data, then aggregate the losses by event type.
total.damage <- damage.property + damage.crop
stormData$total.damage <- total.damage
Damage.total <- ddply(stormData, .(EVTYPE), summarize, damage.total = sum(total.damage, na.rm = TRUE))
# Sort the Damage.total dataset
Damage.total <- Damage.total[order(Damage.total$damage.total, decreasing = TRUE), ]
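Here the total is computed with `+`, so a record whose property or crop damage is NA gets an NA total and is then dropped entirely by `na.rm = TRUE`. A minimal sketch of an alternative that treats a missing component as zero instead (an assumption about how missing values should be handled, not the report's method), using a hypothetical helper `total_by_event()`:

```r
# Hypothetical helper: per-event total of property + crop damage.
total_by_event <- function(df) {
  # rowSums(na.rm = TRUE) counts a missing component as zero, whereas `+`
  # would make the whole row NA.
  df$total.damage <- rowSums(df[c("damage.property", "damage.crop")], na.rm = TRUE)
  totals <- aggregate(df["total.damage"], by = df["EVTYPE"], FUN = sum)
  totals[order(totals$total.damage, decreasing = TRUE), ]
}
```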
Here are the six most damaging event types overall:
head(Damage.total)
## EVTYPE damage.total
## 170 FLOOD 150319678257
## 411 HURRICANE/TYPHOON 71913712800
## 834 TORNADO 57362333947
## 670 STORM SURGE 43323541000
## 244 HAIL 18761221986
## 153 FLASH FLOOD 18243991079
Flood causes by far the largest losses, more than 150 billion dollars, followed by hurricane/typhoon with more than 71 billion dollars. In terms of total losses, drought, the main cause of crop losses, is not even among the six most damaging event types.
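The claim about drought's position can be checked directly by looking up its rank in the sorted table. A minimal sketch with a hypothetical helper `event_rank()`, tested on made-up data:

```r
# Hypothetical helper: position of `event` in a table already sorted in
# decreasing order of damage; NA if the event type does not appear.
event_rank <- function(sorted_totals, event) {
  match(event, as.character(sorted_totals$EVTYPE))
}

# Possible usage on the report's data:
# event_rank(Damage.total, "DROUGHT")
```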
According to this analysis, flood is the largest cause of economic loss, while tornadoes pose the greatest threat to population health. If agriculture is the main concern, drought may be the most important factor. For the economy in general, however, flood is the main source of losses. For human health, priority should naturally go to tornadoes, as they cause the most deaths and injuries.