Synopsis

This work aims to identify which types of weather events are most costly to population health, as measured by injuries and deaths, and to the economy, as measured by damage in USD. Data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database is utilised in the analysis. This database contains, amongst other measurements, details of the fatalities, injuries, and property damage caused by different storm “Event Types”. Since it is difficult to weigh the human cost of an injury against that of a fatality, these outcomes are initially treated separately.

The specific questions this work aims to answer are as follows:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

Data Processing

The data was loaded into R using ‘read.csv’. According to the supplied documents, the numbers of deaths and injuries are given by “FATALITIES” and “INJURIES”. The monetary estimate of the storm damage is given by “PROPDMG”, “PROPDMGEXP”, “CROPDMG” and “CROPDMGEXP”, where the columns ending in EXP hold exponent values. The values held in the exponent columns were fairly irregular; for example, thousands could be represented by a “k”, a “K” or a “3”. These exponents were converted to numeric multipliers using rules derived from (“RPubs - How to Handle Exponent Value of PROPDMGEXP and CROPDMGEXP” 2017), and used to calculate absolute costs for property and crop damage.
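The conversion can be pictured as a lookup from exponent code to multiplier. The snippet below is a minimal sketch on made-up values, not the code used for this report. The helper name `exp_to_multiplier` is hypothetical, and it assumes that a digit n means 10^n, that a blank or “+” means the figure is already in dollars, and that other symbols (“-”, “?”) contribute nothing.

```r
# Hypothetical helper: map an exponent code to a numeric multiplier.
# Assumptions (not taken from the NOAA documentation): digit n -> 10^n,
# "" and "+" -> 1, anything unrecognised -> 0.
exp_to_multiplier <- function(code) {
  code <- toupper(trimws(as.character(code)))
  letters_map <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

  mult <- rep(0, length(code))                 # default for unknown symbols
  is_letter <- code %in% names(letters_map)
  mult[is_letter] <- letters_map[code[is_letter]]

  is_digit <- grepl("^[0-9]$", code)
  mult[is_digit] <- 10^as.numeric(code[is_digit])

  mult[code %in% c("", "+")] <- 1
  mult
}

# Toy rows standing in for the storm data (values are made up).
toy <- data.frame(PROPDMG    = c(2.5, 2.5, 2.5, 10, 7),
                  PROPDMGEXP = c("k", "K", "3", "", "?"))
toy$Property_Damage <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
print(toy)
```

Under these assumptions the first three toy rows all resolve to the same dollar amount, which is the behaviour the text describes for “k”, “K” and “3”.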

Monetary and human costs were then summed across all years to find the event types with the greatest negative influence on the economy and on population health. This was necessary because, for plotting and tabulation purposes, it was not possible to consider each of the 985 event types used in the “EVTYPE” column. Subsets of the original data were then “gathered” into a single column, so that trends in the relative importance of storm events over time could be evaluated. In the gathered data frames, only the 10 most important event types for each category were considered.
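The selection step described above can be sketched in base R as follows. This is an illustrative example on made-up numbers rather than the report's own pipeline; the toy data frame and the variable names `n_top`, `top_types` and `by_year` are assumptions introduced for the example.

```r
# Toy records standing in for the storm data (values are made up).
toy <- data.frame(
  Year       = c(1995, 1995, 1996, 1996, 1996, 1997),
  Event_Type = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD", "HAIL"),
  Fatalities = c(10, 3, 7, 0, 5, 1)
)

n_top <- 2  # the report keeps the 10 most important event types

# Total per event type across all years, largest first.
totals <- tapply(toy$Fatalities, toy$Event_Type, sum)
totals <- totals[order(totals, decreasing = TRUE)]
top_types <- names(totals)[seq_len(min(n_top, length(totals)))]

# Keep only the top event types, then sum per event type and year.
# The result is already in "long" form: one row per (event type, year).
top_rows <- toy[toy$Event_Type %in% top_types, ]
by_year  <- aggregate(Fatalities ~ Event_Type + Year, data = top_rows, FUN = sum)

print(top_types)
print(by_year)
```

Ranking on the all-years total before filtering keeps the set of event types fixed across years, so the yearly series remain comparable.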

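# Packages used below: dplyr (mutate, transmute, %>%), lubridate (year), tidyr (gather), ggplot2 and grid (plots)
library(dplyr)
library(lubridate)
library(tidyr)
library(ggplot2)
library(grid)

setwd("C:/Users/Mike/Dropbox/MOOC/R Programming/RR/CP2")
StormData <- read.csv('StormData.csv')
# Parse the event start date and extract the calendar year
StormData$BGN_DATE <- as.POSIXct(StormData$BGN_DATE, format="%m/%d/%Y %H:%M:%S")
StormData <- mutate(StormData, Year = year(BGN_DATE))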


# Fix irregular exponents: convert each exponent code to a numeric multiplier.
# A digit n is treated as 10^n, so "3" means thousands, the same as "K".
StormData$PropDamageMult[StormData$PROPDMGEXP == "B"] <- 1e9
StormData$PropDamageMult[StormData$PROPDMGEXP == "m"] <- 1e6
StormData$PropDamageMult[StormData$PROPDMGEXP == "M"] <- 1e6
StormData$PropDamageMult[StormData$PROPDMGEXP == "K"] <- 1e3
StormData$PropDamageMult[StormData$PROPDMGEXP == "h"] <- 100
StormData$PropDamageMult[StormData$PROPDMGEXP == "H"] <- 100
StormData$PropDamageMult[StormData$PROPDMGEXP == "8"] <- 1e8
StormData$PropDamageMult[StormData$PROPDMGEXP == "7"] <- 1e7
StormData$PropDamageMult[StormData$PROPDMGEXP == "6"] <- 1e6
StormData$PropDamageMult[StormData$PROPDMGEXP == "5"] <- 1e5
StormData$PropDamageMult[StormData$PROPDMGEXP == "4"] <- 1e4
StormData$PropDamageMult[StormData$PROPDMGEXP == "3"] <- 1e3
StormData$PropDamageMult[StormData$PROPDMGEXP == "2"] <- 1e2
StormData$PropDamageMult[StormData$PROPDMGEXP == "1"] <- 10
StormData$PropDamageMult[StormData$PROPDMGEXP == "0"] <- 1
StormData$PropDamageMult[StormData$PROPDMGEXP == "-"] <- 0
StormData$PropDamageMult[StormData$PROPDMGEXP == "?"] <- 0
StormData$PropDamageMult[StormData$PROPDMGEXP == "+"] <- 1
StormData$PropDamageMult[StormData$PROPDMGEXP == ""] <- 1

StormData$CropDamageMult[StormData$CROPDMGEXP == "B"] <- 1e9
StormData$CropDamageMult[StormData$CROPDMGEXP == "m"] <- 1e6
StormData$CropDamageMult[StormData$CROPDMGEXP == "M"] <- 1e6
StormData$CropDamageMult[StormData$CROPDMGEXP == "K"] <- 1e3
StormData$CropDamageMult[StormData$CROPDMGEXP == "k"] <- 1e3
StormData$CropDamageMult[StormData$CROPDMGEXP == "2"] <- 100
StormData$CropDamageMult[StormData$CROPDMGEXP == "0"] <- 1
StormData$CropDamageMult[StormData$CROPDMGEXP == "?"] <- 0
StormData$CropDamageMult[StormData$CROPDMGEXP == ""] <- 1

MyStormData <- StormData %>% transmute(Year = Year, Event_Type = EVTYPE, Fatalities = FATALITIES, Injuries = INJURIES, Property_Damage = PropDamageMult * PROPDMG, Crop_Damage = CropDamageMult*CROPDMG)

CropDamageByYear_Total <- with(MyStormData, tapply(Crop_Damage, list(Event_Type, Year), sum, na.rm=TRUE))
CropDamageByYear_Total <- replace(CropDamageByYear_Total, is.na(CropDamageByYear_Total), 0)
PropDamageByYear_Total <- with(MyStormData, tapply(Property_Damage, list(Event_Type, Year), sum, na.rm=TRUE))
PropDamageByYear_Total <- replace(PropDamageByYear_Total, is.na(PropDamageByYear_Total), 0)
InjuriesByYear_Total <- with(MyStormData, tapply(Injuries, list(Event_Type, Year), sum, na.rm=TRUE))
InjuriesByYear_Total <- replace(InjuriesByYear_Total, is.na(InjuriesByYear_Total), 0)
FatalitiesByYear_Total <- with(MyStormData, tapply(Fatalities, list(Event_Type, Year), sum, na.rm=TRUE))
FatalitiesByYear_Total <- replace(FatalitiesByYear_Total, is.na(FatalitiesByYear_Total), 0)

ImportantForFatalities <- sort(rowSums(FatalitiesByYear_Total), decreasing = TRUE)
ImportantForInjuries <- sort(rowSums(InjuriesByYear_Total), decreasing = TRUE)
ImportantForProperty <- sort(rowSums(PropDamageByYear_Total), decreasing = TRUE)
ImportantForCrops <- sort(rowSums(CropDamageByYear_Total), decreasing = TRUE)

MyFatalityStormData <- MyStormData[MyStormData$Event_Type %in% as.factor(names(sort(ImportantForFatalities, decreasing = TRUE)[1:10])),]
MyFatalityStormData$Event_Type <- factor(MyFatalityStormData$Event_Type)
MyFatalitiesByYear_Total <- with(MyFatalityStormData, tapply(Fatalities, list(Event_Type, Year), sum, na.rm=TRUE))
MyFatalitiesByYear_Total <- replace(MyFatalitiesByYear_Total, is.na(MyFatalitiesByYear_Total), 0)
FatalYears <- colnames(MyFatalitiesByYear_Total)
Fatalfactor_names <- rownames(MyFatalitiesByYear_Total)
FatalFrame <- data.frame(t(MyFatalitiesByYear_Total))
FatalFrame$Year <-FatalYears
FatalFrame_toplot <- gather(FatalFrame, key = Event_Type, value=Number, 1:10)

MyInjuryStormData <- MyStormData[MyStormData$Event_Type %in% as.factor(names(sort(ImportantForInjuries, decreasing = TRUE)[1:10])),]
MyInjuryStormData$Event_Type <- factor(MyInjuryStormData$Event_Type)
MyInjuriesByYear_Total <- with(MyInjuryStormData, tapply(Injuries, list(Event_Type, Year), sum, na.rm=TRUE))
MyInjuriesByYear_Total <- replace(MyInjuriesByYear_Total, is.na(MyInjuriesByYear_Total), 0)
InjuryYears <- colnames(MyInjuriesByYear_Total)
Injuryfactor_names <- rownames(MyInjuriesByYear_Total)
InjuryFrame <- data.frame(t(MyInjuriesByYear_Total))
InjuryFrame$Year <-InjuryYears
InjuryFrame_toplot <- gather(InjuryFrame, key = Event_Type, value=Number, 1:10)

MyPropStormData <- MyStormData[MyStormData$Event_Type %in% as.factor(names(sort(ImportantForProperty, decreasing = TRUE)[1:10])),]
MyPropStormData$Event_Type <- factor(MyPropStormData$Event_Type)
MyPropByYear_Total <- with(MyPropStormData, tapply(Property_Damage, list(Event_Type, Year), sum, na.rm=TRUE))
MyPropByYear_Total <- replace(MyPropByYear_Total, is.na(MyPropByYear_Total), 0)
PropYears <- colnames(MyPropByYear_Total)
Propfactor_names <- rownames(MyPropByYear_Total)
PropFrame <- data.frame(t(MyPropByYear_Total))
PropFrame$Year <-PropYears
PropFrame_toplot <- gather(PropFrame, key = Event_Type, value=Number, 1:10)

MyCropStormData <- MyStormData[MyStormData$Event_Type %in% as.factor(names(sort(ImportantForCrops, decreasing = TRUE)[1:10])),]
MyCropStormData$Event_Type <- factor(MyCropStormData$Event_Type)
MyCropByYear_Total <- with(MyCropStormData, tapply(Crop_Damage, list(Event_Type, Year), sum, na.rm=TRUE))
MyCropByYear_Total <- replace(MyCropByYear_Total, is.na(MyCropByYear_Total), 0)
CropYears <- colnames(MyCropByYear_Total)
Cropfactor_names <- rownames(MyCropByYear_Total)
CropFrame <- data.frame(t(MyCropByYear_Total))
CropFrame$Year <-CropYears
CropFrame_toplot <- gather(CropFrame, key = Event_Type, value=Number, 1:10)

FatalFrame_toplot$Type <- as.factor('Fatalities')
FatalFrame_toplot$Event_Type <- as.factor(FatalFrame_toplot$Event_Type)
levels(FatalFrame_toplot$Event_Type) <- sort(Fatalfactor_names)


InjuryFrame_toplot$Type <- as.factor('Injuries')
InjuryFrame_toplot$Event_Type <- as.factor(InjuryFrame_toplot$Event_Type)
levels(InjuryFrame_toplot$Event_Type) <- sort(Injuryfactor_names)


PropFrame_toplot$Type <- as.factor('Property Damage')
PropFrame_toplot$Event_Type <- as.factor(PropFrame_toplot$Event_Type)
levels(PropFrame_toplot$Event_Type) <- sort(Propfactor_names)

CropFrame_toplot$Type <- as.factor('Crop Damage')
CropFrame_toplot$Event_Type <- as.factor(CropFrame_toplot$Event_Type)
levels(CropFrame_toplot$Event_Type) <- sort(Cropfactor_names)


Bad_For_Living_Thing <- rbind(FatalFrame_toplot, InjuryFrame_toplot)
Bad_For_Living_Thing$Event_Type <- factor(Bad_For_Living_Thing$Event_Type)
Bad_For_NonLiving_Thing <- rbind(PropFrame_toplot, CropFrame_toplot)
Bad_For_NonLiving_Thing$Event_Type <- factor(Bad_For_NonLiving_Thing$Event_Type)

Bad_For_Living_Thing_t <- with(Bad_For_Living_Thing, tapply(Number, list(Event_Type, Year), sum, na.rm=TRUE))

LivingYears <- colnames(Bad_For_Living_Thing_t)
Livingfactor_names <- rownames(Bad_For_Living_Thing_t)
LivingFrame <- data.frame(t(Bad_For_Living_Thing_t))
LivingFrame$Year <-LivingYears
LivingFrame_toplot <- gather(LivingFrame, key = Event_Type, value=Number, 1:10)
LivingFrame_toplot$Event_Type <- as.factor(LivingFrame_toplot$Event_Type)
levels(LivingFrame_toplot$Event_Type) <- as.factor(Livingfactor_names)

Bad_For_NonLiving_Thing_t <- with(Bad_For_NonLiving_Thing, tapply(Number, list(Event_Type, Year), sum, na.rm=TRUE))
NonLivingYears <- colnames(Bad_For_NonLiving_Thing_t)
NonLivingfactor_names <- rownames(Bad_For_NonLiving_Thing_t)
NonLivingFrame <- data.frame(t(Bad_For_NonLiving_Thing_t))
NonLivingFrame$Year <-NonLivingYears
NonLivingFrame_toplot <- gather(NonLivingFrame, key = Event_Type, value=Number, 1:10)
NonLivingFrame_toplot$Event_Type <- as.factor(NonLivingFrame_toplot$Event_Type)
levels(NonLivingFrame_toplot$Event_Type) <- as.factor(NonLivingfactor_names)

Results

It appears that, prior to 1993, the monetary cost of crop damage was not measured. Similarly, almost all injuries and deaths up to 1993 are attributed to tornadoes and ‘THUNDERSTORM WIND’. Since the climate and atmosphere have not fundamentally changed over this period, it is likely that injuries and deaths from other sources were simply not recorded until 1993. That tornadoes and thunderstorms were the first events to be recorded also suggests that they are among the most significant storm events affecting human health.
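One simple way to check for a change in recording practice is to count how many distinct event types appear in each year. The snippet below is a sketch on made-up rows, assuming columns named `Year` and `Event_Type` as in the processed data; it is not part of the original analysis.

```r
# Toy rows standing in for the storm data (values are made up).
toy <- data.frame(
  Year       = c(1990, 1990, 1991, 1993, 1993, 1993, 1994, 1994),
  Event_Type = c("TORNADO", "TORNADO", "TSTM WIND", "TORNADO",
                 "FLOOD", "HAIL", "LIGHTNING", "FLOOD")
)

# Number of distinct event types recorded in each year.
types_per_year <- tapply(toy$Event_Type, toy$Year,
                         function(x) length(unique(x)))
print(types_per_year)
```

A sharp jump in this count from one year to the next points to a change in what was recorded rather than a change in the weather.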

If we consider only the years 1995 to 2011, then the relative importance of the most important weather events can be summarised as follows:

# Year is stored as character (taken from column names), so the x axis is discrete
YearBreaks <- as.character(seq(as.numeric(min(Bad_For_Living_Thing$Year)),
                               as.numeric(max(Bad_For_Living_Thing$Year)), by = 10))

# Injuries and fatalities per year, one panel per outcome
HumanDamagePlot <- ggplot(Bad_For_Living_Thing) + geom_point(aes(x=Year, y=Number, colour=Event_Type)) +
  theme_bw() + facet_wrap(~Type, scales = "free") +
  scale_x_discrete('', breaks = YearBreaks) +
  scale_y_continuous('Number of Occasions') +
  theme(panel.spacing.x=unit(2, "lines"), panel.spacing.y=unit(1, "lines")) + labs(colour = 'Event Type') +
  ggtitle('Accumulated Injuries, Deaths, Property & Crop Damage for Significant Storm Events Per Year in the United States from 1951 to 2011')

# Property and crop damage per year, in billions of USD
MonetaryDamagePlot <- ggplot(Bad_For_NonLiving_Thing) + geom_point(aes(x=Year, y=Number/1e9, colour=Event_Type)) +
  theme_bw() + facet_wrap(~Type, scales = "free") +
  scale_x_discrete('Year', breaks = YearBreaks) +
  scale_y_continuous('Monetary Cost (Billion USD)') +
  theme(panel.spacing.x=unit(2, "lines"), panel.spacing.y=unit(1, "lines")) + labs(colour = 'Event Type')

# Stack the two plots in one figure
grid.newpage()
grid.draw(rbind(ggplotGrob(HumanDamagePlot), ggplotGrob(MonetaryDamagePlot), size = "last"))

Figure 1: Each point on the scatter plots represents the yearly total, for one event type, of the quantity named in the grey panel header. There is a clear divide before and after 1993, presumably due to changes in the diligence with which the database was updated.

There are no obvious trends in the frequency of injuries and deaths or in monetary damage from 1993 to 2011. This indicates that the data is relatively complete over this period. For the remainder of the analysis, to provide a balanced picture of the relative importance of the different storm events, only data from 1995 to 2011 will be considered.

ggplot(subset(LivingFrame_toplot, Year >= 1995), 
       aes(x = reorder(Event_Type,-Number), y = Number, fill=Event_Type)) +
  geom_bar(stat ="identity") +ylab("Number") + theme_bw(base_size = 16) + theme(axis.text.x=element_text(angle=20,hjust=1)) + xlab('Event Type') + labs(fill = 'Event Type') + ggtitle('Total Injuries and Deaths by Storm type in the United States from 1995 to 2011.')

Figure 2: Each bar in the plot above represents the sum of injuries and deaths in the United States over the period 1995 to 2011

It seems that tornadoes are the most significant weather event causing deaths and injuries (~27000 from 1995 to 2011) although, curiously, hurricanes are much less significant. Compared to property and crops, it would appear that humans are far more vulnerable to lightning and excessive heat.

ggplot(subset(NonLivingFrame_toplot, Year >= 1995), 
       aes(x = reorder(Event_Type,-Number), y = Number/1e9, fill=Event_Type)) +
  geom_bar(stat ="identity") +ylab("USD (Billion)") + theme_bw(base_size = 16) + theme(axis.text.x=element_text(angle=20,hjust=1)) + xlab('Event Type') + labs(fill = 'Event Type') + ggtitle('Total Monetary Crop and Property Damage Cost by Storm type in the United States from 1995 to 2011')

Figure 3: Each bar in the plot above represents the sum of crop and property damage cost in the United States over the period 1995 to 2011.

It would seem that weather events involving an excess of water, i.e. flooding, storm surges, flash floods, etc., are disproportionately significant for property damage compared to human health. Notably, storm surges, which don’t crack the top 10 for injuries and deaths, are the 3rd most significant weather event for property damage (costing ~45 billion USD from 1995 to 2011). Tornadoes, however, remain bad news for property, and are the most significant cause of damage.

References

“RPubs - How to Handle Exponent Value of PROPDMGEXP and CROPDMGEXP.” 2017. Accessed November 24. https://rpubs.com/flyingdisc/PROPDMGEXP.