In this analysis we try to identify which types of events are most harmful with respect to the health of the population and which types of events have the greatest economic consequences?
We obtained data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
Using data from 1950 to 2011, we found tornadoes are the most harmful to the population, causing death or injury to and average of 1,800 people each year. Floods have the greatest economic consequences, causing on average $12.2 billion in damages to property and crops.
We first read in the NOAA storm database. The data is in a csv file.
## Check for the csv file and download it if not present
csv <- 'storm.csv'
if(!file.exists(csv)){
## Check for the bzip archive and download it if not present
temp <- 'storm.data.bz2'
if(!file.exists(temp)){
download.file('http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2',temp)
}
library(R.utils)
bunzip2(temp, destname='storm.csv')
}
storm.data <- read.csv('storm.csv', nrows=902297, stringsAsFactors = FALSE)
dim(storm.data)
## [1] 902297 37
There are 902,297 observations and 37 variables, however we do not need all of them. So we will select a subset of the variables for this analysis.
library(dplyr)
storm.data <- select(storm.data, BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
The data set has 985 different event types. This is due to inconsistent spellings. We set out to standardize the event types by taking a searching for the messy name and replacing it with a standardized name.
## This data frame has the standard name and the messy search term
event.categories <- data.frame(standard = c('Avalanche','Blizzard','Coastal Flood', 'Cold/Wind Chill', 'Cold/Wind Chill', 'Debris Flow', 'Dense Fog', 'Dense Smoke', 'Drought', 'Dust Devil', 'Dust Storm', 'Heat', 'Excessive Heat', 'Extreme Cold/Wind Chill', 'Extreme Cold/Wind Chill', 'Flood','Flash Flood', 'Freezing Fog', 'Frost/Freeze', 'Frost/Freeze', 'Funnel Cloud', 'Hail', 'Heavy Rain', 'Heavy Snow', 'High Surf', 'High Wind', 'Hurricane/Typhoon', 'Hurricane/Typhoon', 'Ice Storm', 'Lakeshore Flood', 'Lake-Effect Snow', 'Lightning', 'Marine Hail', 'Marine High Wind', 'Marine Strong Wind', 'Marine Thunderstorm Wind', 'Rip Current', 'Seiche', 'Sleet', 'Storm Tide', 'Strong Wind', 'Thunderstorm Wind', 'Tornado', 'Tropical Depression', 'Tropical Storm', 'Tsunami', 'Volcanic Ash', 'Waterspout', 'Wildfire', 'Winter Storm', 'Winter Weather'), messy = c('avala', 'blizzard', 'coastal flood','cold', 'wind chill', 'debris', 'dense fog', 'dense smoke', 'drought', 'dust devil', 'dust storm', 'heat', 'excessive heat', 'extreme cold', 'extreme wind chill', 'flood', 'flash flood', 'freezing fog' ,'frost', 'freeze', 'funnel cloud', 'hail', 'heavy rain', 'heavy snow', 'high surf', 'high wind', 'hurricane', 'typhoon', 'ice storm', 'lakeshore flood', 'effect snow', 'lightning', 'marine hail', 'marine high wind', 'marine strong wind', 'marine thunderstorm wind', 'rip current', 'seiche', 'sleet', 'storm tide', 'strong wind', 'thunderstorm wind', 'tornado', 'tropical depression', 'tropical storm', 'tsunami', 'volcanic ash', 'waterspout', 'wildfire', 'winter storm', 'winter weather'))
## Make the search term lower case letters
storm.data$EVTYPE <- tolower(storm.data$EVTYPE)
storm.data$event.type <- 'NA' # default value
## Step through each of the categories and search for the messy term
for(i in 1:nrow(event.categories)){
standard <- as.character(event.categories[i,1])
messy <- as.character(event.categories[i,2])
## Update the event type with the standardized name if we find the messy name
if(nrow(storm.data[grep(messy, storm.data$EVTYPE),]) > 0){
storm.data[grep(messy, storm.data$EVTYPE),]$event.type <- standard
}
}
There are 237003 observations that we were not able to categorize using the process above. We removed these observations from the data set.
storm.data <- storm.data[!storm.data$event.type=='NA',]
After removing the observations that were not categorized we are left with 665294 observations in our data set.
The events in the data set occurred in different years. We want to take this into account therefore we need to extract the year the event occurred in.
## Clean up the begining date
storm.data$begin.date <- as.Date(substr(storm.data$BGN_DATE, 1, nchar(storm.data$BGN_DATE)-8), '%m/%d/%Y')
## Extract the year
storm.data$year <- as.numeric(format(storm.data$begin.date, '%Y'))
Severe weather can be responsible for both crop and property damage. In order to measure which type of events have the greatest economic consequences we need the total of the property crop damage.
The first step is to scale the data. We begin by standardizing the scaling variable:
## Standardize the property damage units
storm.data$PROPDMGEXP <- tolower(storm.data$PROPDMGEXP)
## Standardize the crop damage units
storm.data$CROPDMGEXP <- tolower(storm.data$CROPDMGEXP)
Next we create a function that will help scale the data. If the scaling variable is equal to ‘h’ then it is in hundreds; if it is a ‘k’ it is in thousands; etc.
## This function helps with scaling the data
scaleData <- function(number,scale){
if(scale == 'h'){
return(number * 100)
} else if (scale == 'k'){
return(number * 1000)
} else if (scale == 'm'){
return(number * 1000000)
} else if (scale == 'b'){
return(number * 1000000000)
} else{
return(number)
}
}
We use this function to scale the data.
## Scale the property damage
storm.data$property <- 0 # Default value
storm.data$property <- apply(storm.data, 1, function(x) {scaleData(as.numeric(x[5]), x[6])})
## Scale the crop damage
storm.data$crops <- 0 # Default value
storm.data$crops <- apply(storm.data, 1, function(x) {scaleData(as.numeric(x[7]), x[8])})
We then sum the property damage and crop damage amounts to get the amount of damage related to the event.
storm.data$damage = storm.data$property + storm.data$crops
There is one more adjustment that must be made. Since $1 million in 1950 is not the same as $1 million in 2011 we need to adjust for inflation.
We use the Consumer Price Index - All Urban Consumers (BLS series id CUUR0000SA0) to adjust the damage amount into inflation adjusted amounts. We downloaded the information from the BLS Website by using the series id and selecting data from 1950 to 1913 (the year with the latest annual estimate available).
cpi <- read.csv('cpi.csv', skip=11)
First we created the number we need to use to adjust for inflation.
## Get the latest annual estimate
inflation.base <- cpi[cpi$Year == max(cpi$Year),c('Annual')]
## Divide the inflation base by the annual estimate to create the inflation adjustment factor
cpi$inflation.factor <- inflation.base / cpi$Annual
We will merge the inflation factor with the storm data and adjust the economic damage measure by inflation.
cpi <- select(cpi, Year, inflation.factor)
## Variable need to be in lower case
names(cpi) <- c('year','inflation.factor')
## Load the data into a data table to combine them
library(data.table)
storm.data <- as.data.table(storm.data)
cpi <- as.data.table(cpi)
storm.data <- merge(storm.data, cpi, by='year')
## Create the inflation adjusted damage measure
storm.data <- mutate(storm.data, inf.adj.damage=damage * inflation.factor)
Severe weather can cause both injury and death. The amount of harm to the health of the population is measured in terms of both the death and injury caused by the events.
storm.data$harm <- storm.data$INJURIES + storm.data$FATALITIES
Severe weather can cause both damage to crops and property. It can also cause injury and death to the population. In this analysis we want to calculate what type of event is the most damaging in economic and population terms. Some events are catastrophic in nature causing death and injury to 1742 people or more than $132 billion in damage. Others have no damage to both people and property.
To accurately assess what type is the most damaging we take the average amount of the damage and weight it by the average number of event in a year. This approach mitigates the impact of high cost by low probability events.
We begin by calculating the average number of events by type on an annual basis.
events <- storm.data %.%
group_by(event.type, year) %.%
summarise(num.events = n()) %.%
group_by(event.type) %.%
summarise(avg.num.events = mean(num.events))
We then compute the average amount of damage to the population and the economy on an annual basis, merge it with our average number of events, and weight the damage by the number of events.
damage <- storm.data %.%
group_by(event.type, year) %.%
summarise(harm = mean(harm), inf.adj.damage = mean(inf.adj.damage)) %.%
group_by(event.type) %.%
summarise(harm = mean(harm), inf.adj.damage = mean(inf.adj.damage))
## Merge it with the average number of events
damage <- merge(damage, events)
## Weight the damage by the number of events
damage <- damage %.%
mutate(weighted.harm=round(harm * avg.num.events,0)) %.%
mutate(weighted.inf.adj.damage = round(inf.adj.damage * avg.num.events,0))
We find that tornadoes are the type of events that is the most harmful, causing on average death or injury to about 1,800 people each year.
harm.analysis <- damage %.%
arrange(desc(weighted.harm)) %.%
select(event.type, weighted.harm)
library(ggplot2)
ggplot(harm.analysis[harm.analysis$weighted.harm > 0 ], aes(x=reorder(event.type, weighted.harm), y=weighted.harm)) +
geom_bar(stat='identity') +
theme(axis.title.y = element_blank()) +
ylab('People Hurt or Killed Each Year') +
ggtitle('Figure 1: Number of People Harmed by Weather Events') +
coord_flip()
setnames(harm.analysis, 'event.type', 'Event Type')
setnames(harm.analysis, 'weighted.harm','Average Number of People Hurt or Killed Each Year')
head(harm.analysis)
## Event Type Average Number of People Hurt or Killed Each Year
## 1: Tornado 1806
## 2: Thunderstorm Wind 808
## 3: Excessive Heat 475
## 4: Flood 353
## 5: Winter Weather 349
## 6: Lightning 312
We find that floods are the type of events that has the greatest economic consequence, causing on average $12.2 billion in damages to property and crops each year.
econ.analysis <- damage %.%
arrange(desc(weighted.inf.adj.damage)) %.%
select(event.type, weighted.inf.adj.damage) %.%
mutate(weighted.inf.adj.damage=round(weighted.inf.adj.damage / 1000000, 1))
ggplot(econ.analysis[econ.analysis$weighted.inf.adj.damage >= 1 ], aes(x=reorder(event.type, weighted.inf.adj.damage), y=weighted.inf.adj.damage)) +
geom_bar(stat='identity') +
theme(axis.title.y = element_blank()) +
ylab(paste0('Millions of ',max(cpi$year),' Dollars')) +
ggtitle('Figure 2: Damage to Property and Crops') +
coord_flip()
setnames(econ.analysis, 'event.type', 'Event Type')
var.name <- paste0('Average Damage to Property and Crops (millions ',max(cpi$year),' $)')
setnames(econ.analysis, 'weighted.inf.adj.damage', var.name)
head(econ.analysis)
## Event Type
## 1: Flood
## 2: Winter Storm
## 3: Hurricane/Typhoon
## 4: Tornado
## 5: Drought
## 6: Flash Flood
## Average Damage to Property and Crops (millions 2013 $)
## 1: 12260
## 2: 7905
## 3: 3065
## 4: 2538
## 5: 1511
## 6: 1366