Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe weather events result in fatalities, injuries, property damage and crop damage, and preventing such outcomes to the extent possible is a key concern.
This paper explores the National Oceanic and Atmospheric Administration (NOAA) storm dataset. The dataset tracks the type and characteristics of major storms and weather events in the USA, including but not limited to where they occur, when they occur, and estimates of any fatalities, injuries and property damage. The dataset runs from 1950 to November 2011.
The aim of this paper is to attempt to answer the following two key questions:
Across the United States, which types of events are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
Based on the answers to these questions, decisions can be made on the most effective way to spend available funds to minimise the impact of storms and other severe weather events.
In response to the question “Across the United States, which types of events are most harmful with respect to population health?”, the analysis suggests that tornadoes have the greatest impact on public health, followed by excessive heat.
In response to the question “Across the United States, which types of events have the greatest economic consequences?”, flooding appears to have the biggest effect on the economy when impact is measured using the estimated cost of the damage caused.
As advised in the methodology section, further analysis is required to ensure the findings are robust and accurate.
A potential method of calculating the effect on public health would be to use the years of potential life lost (YPLL) metric (4). Using this metric, an estimate of the years of life lost due to fatalities can be calculated for a given storm event. This can be added to the average number of months lost due to a specific type of injury during a storm event. Combined, these two metrics give an estimate of the overall impact of a storm event on the health of a population. The dataset provided for analysis and used in this paper doesn’t appear to have the required granularity to complete the analysis in this manner.
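As a rough illustration of the YPLL idea, the sketch below computes years of potential life lost from a handful of made-up ages at death. The reference age of 75, the `ypll` function name and the sample ages are assumptions for illustration only; none of them come from the NOAA dataset, which does not appear to carry this level of detail.

```r
# Hypothetical helper: years of potential life lost relative to a reference age.
# The reference age of 75 is an assumption; other reference ages are also used.
ypll <- function(age_at_death, reference_age = 75) {
  # Deaths at or above the reference age contribute zero years lost
  sum(pmax(reference_age - age_at_death, 0))
}

# Made-up ages at death for a single storm event (not real data)
ages <- c(30, 62, 80, 45)
ypll(ages)
```

Summing such values per event type would give a fatality component that accounts for age, which a simple count of deaths cannot do.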
Based on the data available and due to time constraints for this paper, a simple health impact metric (SHIM) will be constructed which weights injuries and fatalities by their perceived impact on the overall health of the population. An injury will be counted as one unit within the metric, and a fatality will be counted as twenty units. The fatality weighting is an arbitrary multiplier, applied because fatalities have a greater impact on the health of a population than injuries do. This metric will then be summed across all storms reported, grouped by the type of weather event. The data will then be sorted, and the event type with the largest SHIM will be considered the most impactful with regard to health.
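As a small worked illustration of the SHIM described above, the following sketch applies the one-unit and twenty-unit weights to a toy data frame. The `storms` data and the `fatality_weight` and `shim_by_type` names are invented for the example and are not part of the analysis itself.

```r
# Weight applied to each fatality, as described in the text (an arbitrary choice)
fatality_weight <- 20

# Toy data, invented for illustration only
storms <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(1, 0, 2, 0),
  INJURIES   = c(15, 2, 0, 6),
  stringsAsFactors = FALSE
)

# SHIM per storm: one unit per injury, twenty units per fatality
storms$SHIM <- storms$INJURIES + fatality_weight * storms$FATALITIES

# Sum by event type and sort so the most harmful type comes first
shim_by_type <- aggregate(SHIM ~ EVTYPE, data = storms, FUN = sum)
shim_by_type <- shim_by_type[order(-shim_by_type$SHIM), ]
shim_by_type
```

In this toy case HEAT ranks above TORNADO despite having no injuries, which shows how strongly the choice of fatality weight drives the ranking.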
A potential method of calculating the cost to the economy is to take the cost of damage due to a particular event and combine it with location. The cost can then be weighted by location, with damage to major metropolitan areas, infrastructure and industrial facilities weighted as having a larger impact on the overall economy than damage to rural areas. The justification is that damage to a major metropolitan area like New York City will have a greater impact on the economy than damage to a small town with limited output.
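A location-weighted cost of this kind might be sketched as follows. The location categories, the weights in `location_weight` and the `damage` records are all hypothetical; the dataset used in this paper has no such location classification, and real weights would need to be estimated from economic data.

```r
# Hypothetical weights by location type; these values are invented, not estimated
location_weight <- c(metropolitan = 3, industrial = 2, rural = 1)

# Toy damage records in USD (not real data)
damage <- data.frame(
  EVTYPE   = c("FLOOD", "FLOOD", "TORNADO"),
  LOCATION = c("metropolitan", "rural", "industrial"),
  COST     = c(1e6, 5e5, 2e6),
  stringsAsFactors = FALSE
)

# Scale each record's cost by the weight for its location type
damage$WEIGHTEDCOST <- damage$COST * unname(location_weight[damage$LOCATION])

# Total weighted cost per event type
weighted_by_type <- aggregate(WEIGHTEDCOST ~ EVTYPE, data = damage, FUN = sum)
weighted_by_type
```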
Based on the data available and due to time constraints, the economic impact of a storm or weather event will be determined by summing the cost of damage to property and crops. The total damage per storm in USD will be calculated and then summed across all storms reported, grouped by the type of weather event. The data will then be sorted, and the event type with the largest total will be considered the most impactful.
The methods selected have a number of limitations: location weighting is not applied, an estimate of years of potential life lost (YPLL) caused by injury and fatality is not used, the effect of health on economic output is not taken into account, and the damage cost is not weighted.
A follow-on analysis is recommended, obtaining data to calculate an estimate of years of potential life lost (YPLL) caused by injury and fatality and using actual data to inform the weightings. The data guide contains useful information on the typical types of injuries that occur due to a storm or weather event. It should also be considered that injury and fatality will have an economic impact due to the loss of output from an injured or deceased member of a population.
The following pre-processing is not specific to either question; it is required for both.
#Install packages
#install.packages("ggplot2")
#install.packages("plyr")
#install.packages("R.utils")
#load libraries we will need later
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.2
library(plyr)
## Warning: package 'plyr' was built under R version 3.2.2
library(R.utils)
## Warning: package 'R.utils' was built under R version 3.2.2
## Loading required package: R.oo
## Warning: package 'R.oo' was built under R version 3.2.2
## Loading required package: R.methodsS3
## Warning: package 'R.methodsS3' was built under R version 3.2.2
## R.methodsS3 v1.7.0 (2015-02-19) successfully loaded. See ?R.methodsS3 for help.
## R.oo v1.19.0 (2015-02-27) successfully loaded. See ?R.oo for help.
##
## Attaching package: 'R.oo'
##
## The following objects are masked from 'package:methods':
##
## getClasses, getMethods
##
## The following objects are masked from 'package:base':
##
## attach, detach, gc, load, save
##
## R.utils v2.1.0 (2015-05-27) successfully loaded. See ?R.utils for help.
##
## Attaching package: 'R.utils'
##
## The following object is masked from 'package:utils':
##
## timestamp
##
## The following objects are masked from 'package:base':
##
## cat, commandArgs, getOption, inherits, isOpen, parse, warnings
#Download the file
file.url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
file.dest <- "./dataStorm/repdata_data_StormData.csv.bz2"
file.bzfile <- "./dataStorm/repdata_data_StormData.csv.bz2"
file.dataset <- "./dataStorm/repdata_data_StormData.csv"
#Check if data directory exists
if (!dir.exists("./dataStorm")) {
    dir.create("./dataStorm", showWarnings = TRUE, recursive = FALSE, mode = "0777")
}
# download from the URL
# Enable in RStudio - doesn't work in markdown
#download.file(file.url, file.dest, quiet=TRUE)
#Decompress the archive into the "dataStorm" folder in the working directory
bunzip2(file.bzfile, destname=file.dataset, overwrite=TRUE, remove=FALSE)
#read the csv file
repdata <- read.csv(file=file.dataset, header=TRUE, sep=",",na.strings = "NA" )
#Subset what we need for the analysis based on the codebook
harm <- repdata[,c('STATE', 'EVTYPE', 'FATALITIES', 'INJURIES', 'PROPDMG', 'PROPDMGEXP','CROPDMG', 'CROPDMGEXP')]
#Let's look at the data first to understand what's there
head(harm)
## STATE EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 AL TORNADO 0 15 25.0 K 0
## 2 AL TORNADO 0 0 2.5 K 0
## 3 AL TORNADO 0 2 25.0 K 0
## 4 AL TORNADO 0 2 2.5 K 0
## 5 AL TORNADO 0 2 2.5 K 0
## 6 AL TORNADO 0 6 2.5 K 0
#Clean the Event Type up
harm$EVTYPE <- toupper(harm$EVTYPE)
#Additional cleansing and standardization is recommended but not applied due to time constraints
#Check for missing data
sum (is.na (harm))
## [1] 0
#Produce initial set of summary statistics
summary(harm)
## STATE EVTYPE FATALITIES
## TX : 83728 Length:902297 Min. : 0.0000
## KS : 53440 Class :character 1st Qu.: 0.0000
## OK : 46802 Mode :character Median : 0.0000
## MO : 35648 Mean : 0.0168
## IA : 31069 3rd Qu.: 0.0000
## NE : 30271 Max. :583.0000
## (Other):621339
## INJURIES PROPDMG PROPDMGEXP CROPDMG
## Min. : 0.0000 Min. : 0.00 :465934 Min. : 0.000
## 1st Qu.: 0.0000 1st Qu.: 0.00 K :424665 1st Qu.: 0.000
## Median : 0.0000 Median : 0.00 M : 11330 Median : 0.000
## Mean : 0.1557 Mean : 12.06 0 : 216 Mean : 1.527
## 3rd Qu.: 0.0000 3rd Qu.: 0.50 B : 40 3rd Qu.: 0.000
## Max. :1700.0000 Max. :5000.00 5 : 28 Max. :990.000
## (Other): 84
## CROPDMGEXP
## :618413
## K :281832
## M : 1994
## k : 21
## 0 : 19
## B : 9
## (Other): 9
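The clean-up above only converts EVTYPE to upper case, so near-duplicate labels such as RIP CURRENT and RIP CURRENTS, or TSTM WIND and THUNDERSTORM WIND, remain separate categories in the results. The following is a minimal sketch of the further standardisation recommended in the code comment, assuming a small hand-written mapping. The `evtype_map` vector and the `standardise_evtype` function are hypothetical, and the mapping shown is far from complete.

```r
# Hypothetical mapping from variant labels to a single canonical label
evtype_map <- c(
  "RIP CURRENTS" = "RIP CURRENT",
  "TSTM WIND"    = "THUNDERSTORM WIND"
)

# Hypothetical helper: trim whitespace, upper-case, then apply the mapping
standardise_evtype <- function(x) {
  x <- toupper(trimws(x))
  mapped <- evtype_map[x]
  # Keep the original label wherever no mapping is defined
  unname(ifelse(is.na(mapped), x, mapped))
}

standardise_evtype(c("Tstm Wind", " rip currents", "TORNADO"))
```

Applying such a function before aggregation would merge the split categories, which could change the rankings reported later in this paper.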
The following data processing prepares and aggregates the data used to answer the question “Across the United States, which types of events are most harmful with respect to population health?”
#Pre Processing for the health impact
#Create the aggregates and the weighting
harmtotal <- aggregate(harm[, c('FATALITIES','INJURIES')], by=list(harm$EVTYPE), FUN=sum, na.rm=TRUE)
harmtotal$FATALITIESWEIGHTED <- harmtotal$FATALITIES * 20
harmtotal$Total <- harmtotal$FATALITIESWEIGHTED + harmtotal$INJURIES
harmtotal <- rename(harmtotal, c("Group.1"="Event"))
#Get the top 15 fatalities, injury and weighted health metric
fatal15 <- harmtotal[order(-harmtotal$FATALITIES),][1:15, ]
injury15 <- harmtotal[order(-harmtotal$INJURIES),][1:15, ]
weighted15 <- harmtotal[order(-harmtotal$Total),][1:15, ]
The following data processing prepares and aggregates the data used to answer the question “Across the United States, which types of events have the greatest economic consequences?”
#Get unique values for property multiplier
unique(harm$PROPDMGEXP)
## [1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
#Set the multiplier values
harm$PROPMULT[harm$PROPDMGEXP == "K"] <- 1000
harm$PROPMULT[harm$PROPDMGEXP == "M"] <- 10 ^ 6
harm$PROPMULT[harm$PROPDMGEXP == ""] <- 1
harm$PROPMULT[harm$PROPDMGEXP == "B"] <- 10 ^ 9
harm$PROPMULT[harm$PROPDMGEXP == "m"] <- 10 ^ 6
harm$PROPMULT[harm$PROPDMGEXP == "+"] <- 0
harm$PROPMULT[harm$PROPDMGEXP == "0"] <- 1
harm$PROPMULT[harm$PROPDMGEXP == "5"] <- 10 ^ 5
harm$PROPMULT[harm$PROPDMGEXP == "6"] <- 10 ^ 6
harm$PROPMULT[harm$PROPDMGEXP == "?"] <- 0
harm$PROPMULT[harm$PROPDMGEXP == "4"] <- 10000
harm$PROPMULT[harm$PROPDMGEXP == "2"] <- 100
harm$PROPMULT[harm$PROPDMGEXP == "3"] <- 1000
harm$PROPMULT[harm$PROPDMGEXP == "h"] <- 100
harm$PROPMULT[harm$PROPDMGEXP == "7"] <- 10 ^ 7
harm$PROPMULT[harm$PROPDMGEXP == "H"] <- 100
harm$PROPMULT[harm$PROPDMGEXP == "-"] <- 0
harm$PROPMULT[harm$PROPDMGEXP == "1"] <- 10
harm$PROPMULT[harm$PROPDMGEXP == "8"] <- 10 ^ 8
#Check for NA's to ensure that all multipliers have been mapped
sum(is.na(harm$PROPMULT))
## [1] 0
#Get unique values for crop multiplier
unique(harm$CROPDMGEXP)
## [1] M K m B ? 0 k 2
## Levels: ? 0 2 B k K m M
#Apply crop multipliers
harm$CROPMULT[harm$CROPDMGEXP == ""] <- 1
harm$CROPMULT[harm$CROPDMGEXP == "M"] <- 10 ^ 6
harm$CROPMULT[harm$CROPDMGEXP == "K"] <- 1000
harm$CROPMULT[harm$CROPDMGEXP == "m"] <- 10 ^ 6
harm$CROPMULT[harm$CROPDMGEXP == "B"] <- 10 ^ 9
harm$CROPMULT[harm$CROPDMGEXP == "?"] <- 0
#Note: "0" is mapped to 0 for crops, whereas it is mapped to 1 for property
harm$CROPMULT[harm$CROPDMGEXP == "0"] <- 0
harm$CROPMULT[harm$CROPDMGEXP == "k"] <- 1000
harm$CROPMULT[harm$CROPDMGEXP == "2"] <- 100
#Check for NA's to ensure that all multipliers have been mapped
sum(is.na(harm$CROPMULT))
## [1] 0
#Now we go and multiply the values out and create a total
harm$CROPValue <- harm$CROPMULT * harm$CROPDMG
harm$PROPValue <- harm$PROPMULT * harm$PROPDMG
harm$TotalCost <- harm$CROPValue + harm$PROPValue
#Create a sum table
costtotal <- aggregate(harm[, c('CROPValue','PROPValue', 'TotalCost')], by=list(harm$EVTYPE), FUN=sum, na.rm=TRUE)
costtotal <- rename(costtotal, c("Group.1"="Event"))
#Get the top 15 values for crop destruction, property destruction and the total
crop15 <- costtotal[order(-costtotal$CROPValue),][1:15, ]
Property15 <- costtotal[order(-costtotal$PROPValue),][1:15, ]
TotalDamage15 <- costtotal[order(-costtotal$TotalCost),][1:15, ]
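The line-by-line exponent mapping above could also be written as a single lookup function, which keeps the property and crop mappings consistent with each other. The sketch below follows the property mapping (letters as hundreds, thousands, millions and billions, digits as powers of ten, symbols as zero, blank as one). The `exp_to_multiplier` name is hypothetical, and treating upper and lower case letters identically is an assumption.

```r
# Hypothetical helper: convert damage exponent codes into numeric multipliers
exp_to_multiplier <- function(code) {
  code <- as.character(code)
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9,
              "+" = 0, "-" = 0, "?" = 0)
  # Letter and symbol codes; anything not in the lookup becomes NA
  mult <- unname(lookup[toupper(code)])
  # Digit codes are treated as powers of ten, as in the property mapping
  is_digit <- grepl("^[0-9]$", code)
  mult[is_digit] <- 10 ^ as.numeric(code[is_digit])
  # A blank code means the damage figure is used as reported
  mult[code == ""] <- 1
  mult
}

exp_to_multiplier(c("K", "m", "", "5", "?", "B"))
```

Codes that are not recognised stay as NA, so the same `sum(is.na(...))` check used above would still reveal any unmapped values.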
The question set out was “Across the United States, which types of events are most harmful with respect to population health?”
The following table lists the 15 storm event types with the most fatalities, in descending order of fatalities. The columns displayed are the injury count and the weighted fatalities (fatalities multiplied by 20) for each event type. The event types at the top of the table have the most fatalities and therefore cause the most harm in terms of fatalities.
fatal15[,c(1,3,4)]
## Event INJURIES FATALITIESWEIGHTED
## 758 TORNADO 91346 112660
## 116 EXCESSIVE HEAT 6525 38060
## 138 FLASH FLOOD 1777 19560
## 243 HEAT 2100 18740
## 418 LIGHTNING 5230 16320
## 779 TSTM WIND 6957 10080
## 154 FLOOD 6789 9400
## 524 RIP CURRENT 232 7360
## 320 HIGH WIND 1137 4960
## 19 AVALANCHE 170 4480
## 888 WINTER STORM 1321 4120
## 525 RIP CURRENTS 297 4080
## 245 HEAT WAVE 379 3440
## 125 EXTREME COLD 231 3240
## 685 THUNDERSTORM WIND 1488 2660
The following table lists the 15 storm event types with the most injuries, in descending order of injuries. The column displayed is the fatality count for each of those event types. The event types at the top of the table have the most injuries and therefore cause the most impact due to recovery time lost by the population.
injury15[,c(1,2)]
## Event FATALITIES
## 758 TORNADO 5633
## 779 TSTM WIND 504
## 154 FLOOD 470
## 116 EXCESSIVE HEAT 1903
## 418 LIGHTNING 816
## 243 HEAT 937
## 387 ICE STORM 89
## 138 FLASH FLOOD 978
## 685 THUNDERSTORM WIND 133
## 212 HAIL 15
## 888 WINTER STORM 206
## 372 HURRICANE/TYPHOON 64
## 320 HIGH WIND 248
## 274 HEAVY SNOW 127
## 875 WILDFIRE 75
The following bar charts show the top 15 event types by injuries and by actual fatalities. Please note that the weighted value is not shown here.
#Set parameters to have a panel plot, with a bottom margin large enough for the rotated labels
#(mar is a par() setting, not a barplot() argument)
par(mfrow = c(1, 2), mar = c(10, 5, 4, 2))
#Injuries
barplot(injury15$INJURIES,
        names.arg = injury15$Event,
        ylim = c(0, max(injury15$INJURIES) + 20000),
        col = heat.colors(15),
        cex.names = 0.8,
        las = 2,
        ylab = "Injuries Per Event Type",
        main = "Health Impact Due to Injuries")
#Fatalities (the y-axis limit is taken from the injury data, so both panels share a scale)
barplot(fatal15$FATALITIES,
        names.arg = fatal15$Event,
        ylim = c(0, max(injury15$INJURIES) + 20000),
        col = heat.colors(15),
        cex.names = 0.8,
        las = 2,
        ylab = "Fatalities Per Event Type",
        main = "Health Impact due to Fatalities")
The following data table details the top 15 event types by SHIM, i.e. the sum of weighted fatalities and injuries, grouped by storm event type. The event types at the top of the table have the largest SHIM and therefore the greatest health impact.
weighted15[,c(1,5)]
## Event Total
## 758 TORNADO 204006
## 116 EXCESSIVE HEAT 44585
## 418 LIGHTNING 21550
## 138 FLASH FLOOD 21337
## 243 HEAT 20840
## 779 TSTM WIND 17037
## 154 FLOOD 16189
## 524 RIP CURRENT 7592
## 320 HIGH WIND 6097
## 888 WINTER STORM 5441
## 19 AVALANCHE 4650
## 525 RIP CURRENTS 4377
## 685 THUNDERSTORM WIND 4148
## 245 HEAT WAVE 3819
## 387 ICE STORM 3755
The following bar chart shows the top 15 summed weighted fatalities and injuries (i.e. the SHIM) grouped by storm event type. This gives the overall impact on health within the population per storm event type.
#Weighted
#Enlarge the bottom margin for the rotated labels (mar is a par() setting, not a barplot() argument)
par(mar = c(10, 5, 4, 2))
barplot(weighted15$Total,
        names.arg = weighted15$Event,
        ylim = c(0, max(weighted15$Total) + 5000),
        col = heat.colors(15),
        cex.names = 0.8,
        las = 2,
        ylab = "Simple Health Impact Metric (SHIM)",
        main = "Aggregated Health Impact due to Injuries and Weighted Fatalities")
Impact on the Economy
The question set out was “Across the United States, which types of events have the greatest economic consequences?”
The following data table shows the estimated value of damage to crops per storm event type. The event type with the largest value has caused the most damage.
crop15[,c(1,2)]
## Event CROPValue
## 84 DROUGHT 13972566000
## 154 FLOOD 5661968450
## 529 RIVER FLOOD 5029459000
## 387 ICE STORM 5022113500
## 212 HAIL 3025954453
## 363 HURRICANE 2741910000
## 372 HURRICANE/TYPHOON 2607872800
## 138 FLASH FLOOD 1421317100
## 125 EXTREME COLD 1312973000
## 187 FROST/FREEZE 1094186000
## 254 HEAVY RAIN 733399800
## 772 TROPICAL STORM 678346000
## 320 HIGH WIND 638571300
## 779 TSTM WIND 554007350
## 116 EXCESSIVE HEAT 492402000
The following data table shows the estimated value of damage to property per storm event type. The event type with the largest value has caused the most damage.
Property15[,c(1,3)]
## Event PROPValue
## 154 FLOOD 144657709807
## 372 HURRICANE/TYPHOON 69305840000
## 758 TORNADO 56947380617
## 599 STORM SURGE 43323536000
## 138 FLASH FLOOD 16822673979
## 212 HAIL 15735267513
## 363 HURRICANE 11868319010
## 772 TROPICAL STORM 7703890550
## 888 WINTER STORM 6688497251
## 320 HIGH WIND 5270046260
## 529 RIVER FLOOD 5118945500
## 875 WILDFIRE 4765114000
## 600 STORM SURGE/TIDE 4641188000
## 779 TSTM WIND 4484958495
## 387 ICE STORM 3944927860
The following data table shows the sum of the estimated value of damage to property and crops per storm event type. The event type with the largest value has caused the most damage.
TotalDamage15[,c(1,4)]
## Event TotalCost
## 154 FLOOD 150319678257
## 372 HURRICANE/TYPHOON 71913712800
## 758 TORNADO 57362333727
## 599 STORM SURGE 43323541000
## 212 HAIL 18761221966
## 138 FLASH FLOOD 18243991079
## 84 DROUGHT 15018672000
## 363 HURRICANE 14610229010
## 529 RIVER FLOOD 10148404500
## 387 ICE STORM 8967041360
## 772 TROPICAL STORM 8382236550
## 888 WINTER STORM 6715441251
## 320 HIGH WIND 5908617560
## 875 WILDFIRE 5060586800
## 779 TSTM WIND 5038965845
The following bar chart shows the sum of the estimated value of damage to property and crops per storm event type. This provides an overall measure of the impact on the economy for each event type.
#Enlarge the bottom margin for the rotated labels (mar is a par() setting, not a barplot() argument)
par(mar = c(10, 5, 4, 2))
barplot(TotalDamage15$TotalCost,
        names.arg = TotalDamage15$Event,
        ylim = c(0, max(TotalDamage15$TotalCost) + 5000),
        col = heat.colors(15),
        cex.names = 0.8,
        las = 2,
        ylab = "Property and Crop Damage USD",
        main = "Top 15 Storm Events for Property and Crop Damage")