Storm and other weather events: public health and economic impact

Synopsis

In this report we examine data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks the characteristics of major storms and weather events in the United States. We use these data to determine which types of events, across the United States, are most harmful to population health and which have the greatest economic consequences. Identifying these event types allows resources for weather-event preparedness to be allocated more efficiently, minimizing their public health and economic impact. We find that the events with the greatest public health impact are tornadoes and excessive heat on an aggregate basis (tsunamis, hurricanes/typhoons and excessive heat on a per-incident basis), while the events with the largest economic impact are flooding and hurricanes/typhoons, which dominate property damage, and droughts, which dominate crop damage.

Loading and processing the data

Reading the data

URL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destinationfile <- "repdata%2Fdata%2FStormData.csv.bz2"

# Download the compressed CSV only if it is not already present
if (!file.exists(destinationfile)) {
  download.file(URL, destfile = destinationfile, mode = "wb")
}

# read.csv() reads the bzip2-compressed file directly
stormdata <- read.csv(destinationfile)

Data processing

To prepare the data for analysis, we convert the event start date (BGN_DATE) to a date-time object and express property and crop damage as explicit dollar amounts (e.g. a damage estimate recorded as 10K becomes 10,000). We also note that some event types are duplicated because of small differences in their character strings (e.g. FLOODING and FLOOD); we merge several of these variants into a single label.
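To illustrate the label clean-up, the toy example below applies the same upper-casing to a few made-up labels and then maps variant spellings onto one label with a named lookup vector. The labels are hypothetical, and the `trimws()` call (which strips stray leading or trailing spaces) is an assumption added here, not a step from the original loop.

```r
# Hypothetical labels standing in for stormdata$EVTYPE
toy_labels <- c(" flood", "Flash Flood", "TORNADO", "high wind", "FLOODING")

# Upper-case as in the report; trimws() is an added assumption for stray spaces
toy_labels <- toupper(trimws(toy_labels))

# Named lookup vector: names are variant spellings, values are the label kept
evtype_map <- c("FLOOD"       = "FLOODING",
                "FLASH FLOOD" = "FLASH FLOODING",
                "HIGH WIND"   = "HIGH WINDS")

# Replace only the labels that have an entry in the lookup vector
to_fix <- toy_labels %in% names(evtype_map)
toy_labels[to_fix] <- unname(evtype_map[toy_labels[to_fix]])
print(toy_labels)
```

Labels without an entry in the lookup vector (here TORNADO and FLOODING) pass through unchanged, which is the behaviour of the empty `else` branch in a row-by-row loop.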

library(lubridate)

# Parse the event start date into a date-time object
stormdata$BGN_DATE <- mdy_hms(stormdata$BGN_DATE)

# Normalise event-type labels: character, upper case, no surrounding spaces
stormdata$EVTYPE <- toupper(trimws(as.character(stormdata$EVTYPE)))

# Map spelling variants onto one label (names = variant, values = label kept);
# this vectorised lookup replaces a loop over a hard-coded row count
evtype_map <- c(
  "GUSTY WIND"        = "GUSTY WINDS",
  "FLASH FLOOD"       = "FLASH FLOODING",
  "FLOOD"             = "FLOODING",
  "HIGH WIND"         = "HIGH WINDS",
  "LANDSLIDE"         = "LANDSLIDES",
  "RIP CURRENTS"      = "RIP CURRENT",
  "RIVER FLOOD"       = "RIVER FLOODING",
  "STRONG WIND"       = "STRONG WINDS",
  "THUNDERSTORM WIND" = "THUNDERSTORM WINDS",
  "WILD FIRES"        = "WILDFIRE",
  "WIND"              = "WINDS"
)
to_fix <- stormdata$EVTYPE %in% names(evtype_map)
stormdata$EVTYPE[to_fix] <- unname(evtype_map[stormdata$EVTYPE[to_fix]])
# Convert damage exponent codes into multipliers: K = thousands, M = millions,
# B = billions (matched case-insensitively); any other code counts as 0
exp_multiplier <- function(exp_code) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  m <- unname(multipliers[toupper(as.character(exp_code))])
  m[is.na(m)] <- 0
  m
}

stormdata$PropertyDamage <- stormdata$PROPDMG * exp_multiplier(stormdata$PROPDMGEXP)
stormdata$CropDamage <- stormdata$CROPDMG * exp_multiplier(stormdata$CROPDMGEXP)
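The conversion from exponent codes to dollar amounts can be checked on a few hypothetical records. The sketch below mirrors the rule used in this report (K, M and B are multipliers; any other code yields zero damage); the values and the unrecognised code "+" are made up for illustration.

```r
# Hypothetical damage records: 10K, 2.5M, 1.2B, and an unrecognised code
toy <- data.frame(PROPDMG    = c(10, 2.5, 1.2, 5),
                  PROPDMGEXP = c("K", "M", "B", "+"),
                  stringsAsFactors = FALSE)

# Look up the multiplier for each code; codes without an entry give NA
multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
mult <- unname(multipliers[toy$PROPDMGEXP])
mult[is.na(mult)] <- 0  # unrecognised code -> zero damage, as in the report

toy$PropertyDamage <- toy$PROPDMG * mult
print(toy$PropertyDamage)
```

The first record shows the example from the text: 10 with code K becomes 10,000.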

Results

1. Impact: Population health

To determine which type of event has the largest impact on population health, we look at the FATALITIES and INJURIES variables for each event type.

We use two measures of public health impact: (i) aggregate injuries and fatalities per event type and (ii) average injuries and fatalities per incident of each event type.

We start by aggregating the number of fatalities and injuries per event type. To keep the analysis manageable, we retain only event types with more than 10 fatalities and more than 10 injuries. We restrict the analysis to events from 1996 onward, since that is the year from which all event types have been recorded in the database.

# Keep only events from 1996 onward
subsetstormdata <- subset(stormdata, year(stormdata$BGN_DATE) > 1995)

# Sum injuries and fatalities per event type
injuries <- with(subsetstormdata, tapply(INJURIES, EVTYPE, sum))
fatalities <- with(subsetstormdata, tapply(FATALITIES, EVTYPE, sum))

# Convert to data frames and store the event type (the row names) as a column
fatalities <- as.data.frame(fatalities)
injuries <- as.data.frame(injuries)
fatalities$EventType <- rownames(fatalities)
injuries$EventType <- rownames(injuries)

# Combine both totals; keep types with more than 10 fatalities AND more than 10 injuries
merged <- merge(fatalities, injuries, by = "EventType")
merged <- subset(merged, fatalities > 10 & injuries > 10)
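The aggregation and filter can be seen on hypothetical records. This sketch uses base R's `aggregate()` with a `cbind()` formula, an alternative to the `tapply()` and `merge()` steps above that yields both totals in one data frame. It also shows that the `&` filter drops an event type that fails either threshold.

```r
# Hypothetical event records: one row per incident
events <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HAIL", "HAIL", "HEAT"),
  FATALITIES = c(8, 5, 0, 1, 20),
  INJURIES   = c(40, 12, 15, 3, 4),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries per event type in a single call
totals <- aggregate(cbind(fatalities = FATALITIES, injuries = INJURIES) ~ EVTYPE,
                    data = events, FUN = sum)

# Keep types with more than 10 of BOTH, as in the report's subset() call
kept <- subset(totals, fatalities > 10 & injuries > 10)
print(kept)
```

HEAT has 20 fatalities but only 4 injuries, so it is dropped. A criterion of "more than 10 fatalities or more than 10 injuries" would need `|` instead of `&`.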

We now plot aggregate fatalities against aggregate injuries for each event type. The graph shows that the events with the greatest public health impact are tornadoes, by number of injuries, and excessive heat, by number of fatalities. Flooding, lightning and flash flooding also have a substantial impact.

library(ggplot2)

# Aggregate fatalities vs. injuries, labelling the most harmful event types
g <- ggplot(merged, aes(fatalities, injuries)) +
  geom_point(aes(color = EventType)) +
  labs(x = "Aggregate number of fatalities", y = "Aggregate number of injuries") +
  ggtitle("Aggregate fatalities and injuries per event type since 1996") +
  theme(legend.position = "right", legend.text = element_text(size = 8)) +
  annotate("text", x = 1511, y = 20067, label = "Tornado", size = 3) +
  annotate("text", x = 1697, y = 5891, label = "Excessive Heat", size = 3) +
  annotate("text", x = 414, y = 6058, label = "Flooding", size = 3) +
  annotate("text", x = 651, y = 3641, label = "Lightning", size = 3) +
  annotate("text", x = 887, y = 1074, label = "Flash Flooding", size = 3)
print(g)
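The leading event types can also be read off the aggregated table rather than the plot. The sketch below uses a small hypothetical table with the same column names as `merged` (the event names and values are made up). `which.max()` picks the top row for each measure and `order()` gives a full ranking; coordinates for `annotate()` could then be taken from these rows instead of being typed in by hand.

```r
# Hypothetical table with the same columns as `merged`
merged_toy <- data.frame(
  EventType  = c("EVENT_A", "EVENT_B", "EVENT_C", "EVENT_D"),
  fatalities = c(150, 90, 300, 40),
  injuries   = c(2000, 500, 800, 60),
  stringsAsFactors = FALSE
)

# Event type with the most fatalities, and the one with the most injuries
top_fatal  <- merged_toy$EventType[which.max(merged_toy$fatalities)]
top_injury <- merged_toy$EventType[which.max(merged_toy$injuries)]

# Full ranking by fatalities, largest first
ranked <- merged_toy[order(merged_toy$fatalities, decreasing = TRUE), ]
print(ranked)
```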

Next, we calculate the average number of injuries and fatalities per incident, continuing to use the data from 1996 onward. The graph below shows that the events with the largest impact per incident, on average, are tsunamis, hurricanes/typhoons and excessive heat.

# Count incidents per event type and attach the counts (column Freq) to the totals
A <- as.data.frame(table(subsetstormdata$EVTYPE))
merged <- merge(merged, A, by.x = "EventType", by.y = "Var1")

# Average fatalities and injuries per incident
average <- merged
average$AverageFatalities <- average$fatalities / average$Freq
average$AverageInjuries <- average$injuries / average$Freq

g <- ggplot(average, aes(AverageFatalities, AverageInjuries)) +
  geom_point(aes(color = EventType)) +
  labs(x = "Average fatalities per incident", y = "Average injuries per incident") +
  ggtitle("Average fatalities and injuries per incident since 1996") +
  annotate("text", x = 1.65, y = 6.05, label = "Tsunami", size = 3) +
  annotate("text", x = 1.08, y = 3.45, label = "Excessive Heat", size = 3) +
  annotate("text", x = 0.72, y = 14.08, label = "Hurricane/Typhoon", size = 3)
print(g)
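The per-incident average is the total for an event type divided by its number of incidents. The hypothetical records below show this calculation and confirm that it matches taking the mean per type directly with `tapply(..., mean)`, which would avoid the extra `table()` and `merge()` steps.

```r
# Hypothetical records: five incidents across two event types
events <- data.frame(
  EVTYPE     = c("TSUNAMI", "TSUNAMI", "HAIL", "HAIL", "HAIL"),
  FATALITIES = c(3, 1, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Total fatalities and number of incidents per event type
totals <- tapply(events$FATALITIES, events$EVTYPE, sum)
counts <- table(events$EVTYPE)

# Average per incident = total / count (counts aligned to the same type order)
avg_per_incident <- totals / as.vector(counts[names(totals)])

# The same result computed directly as a mean per type
direct <- tapply(events$FATALITIES, events$EVTYPE, mean)
print(avg_per_incident)
```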

2. Impact: Economic

To measure economic impact, we again use the data from 1996 onward and look at the magnitude of property and crop damage.

We begin by aggregating property and crop damage per event type. For ease of analysis, we keep only event types with more than US$44 million in property damage or more than US$30 million in crop damage.

# Total property and crop damage (US$) per event type
property <- with(subsetstormdata, tapply(PropertyDamage, EVTYPE, sum))
crop <- with(subsetstormdata, tapply(CropDamage, EVTYPE, sum))

# Convert to data frames and store the event type (the row names) as a column
property <- as.data.frame(property)
crop <- as.data.frame(crop)
property$EventType <- rownames(property)
crop$EventType <- rownames(crop)

# Combine and keep event types above EITHER damage threshold
mergeddamage <- merge(property, crop, by = "EventType")
mergeddamage <- subset(mergeddamage, property > 44000000 | crop > 30000000)
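Unlike the health filter, the damage filter uses `|`, so an event type is kept if it exceeds either threshold. The hypothetical totals below (made-up event names and values) show one type kept for property damage, one for crop damage, and one dropped.

```r
# Hypothetical damage totals per event type, in US$
damage_toy <- data.frame(
  EventType = c("EVENT_A", "EVENT_B", "EVENT_C"),
  property  = c(5e7, 1e6, 2e6),
  crop      = c(0, 4e7, 1e6),
  stringsAsFactors = FALSE
)

# Keep a type if property > US$44 million OR crop > US$30 million
kept <- subset(damage_toy, property > 44e6 | crop > 30e6)
print(kept)
```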

The following graph shows that, since 1996, the events with the largest economic impact have been flooding and hurricanes/typhoons, which account for the most property damage, and droughts, which account for the most crop damage.

g <- ggplot(mergeddamage, aes(property, crop)) +
  geom_point(aes(color = EventType)) +
  labs(x = "Property damage (US$)", y = "Crop damage (US$)") +
  ggtitle("Economic impact since 1996") +
  theme(legend.position = "right", legend.text = element_text(size = 8)) +
  annotate("text", x = 1046101000, y = 12567566000, label = "Drought", size = 3) +
  annotate("text", x = 143944833550, y = 4274778400, label = "Flooding", size = 3) +
  annotate("text", x = 69305840000, y = 2007872800, label = "Hurricane/Typhoon", size = 3)
print(g)
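The plot shows property and crop damage on separate axes. Summing them gives a single combined figure for the three labelled events. The sketch below assumes that the `annotate()` coordinates in the plotting code sit exactly on the plotted data points and uses those coordinates as the damage values.

```r
# Values copied from the annotate() coordinates above (US$), assuming each
# label is placed exactly on its data point
combined <- data.frame(
  EventType = c("Drought", "Flooding", "Hurricane/Typhoon"),
  property  = c(1046101000, 143944833550, 69305840000),
  crop      = c(12567566000, 4274778400, 2007872800),
  stringsAsFactors = FALSE
)

# Combined economic damage, ranked largest first
combined$total <- combined$property + combined$crop
combined <- combined[order(combined$total, decreasing = TRUE), ]
print(combined[, c("EventType", "total")])
```

Under that assumption, flooding leads on combined damage, followed by hurricanes/typhoons and then droughts, whose impact is concentrated in crops.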