NOAA Storm Events Effects on Population and Economic Health

Synopsis

For this analysis we looked at NOAA data on storm events with the aim of answering two main questions:

Across the United States, which types of events are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?

After analyzing the data, we found that tornadoes and floods were particularly harmful to both population health and property. Crops were most affected by weather extremes such as drought, extreme heat, and extreme cold, along with floods and hurricanes.

Data Processing

The NOAA Storm Data was provided as a bzip2-compressed CSV file for download:

https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2

This data also came with the following documentation:

National Weather Service Storm Data Documentation: https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf
National Climatic Data Center Storm Events FAQ: https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2FNCDC%20Storm%20Events-FAQ%20Page.pdf

After downloading the data and unzipping it, I loaded it into R (using RStudio):

stormdf <- read.table("repdata-data-StormData.csv", sep=",", header=TRUE)
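The manual unzip step is optional, since R's file readers can decompress bzip2 input while reading. The following is a minimal sketch rather than the code used for this report. It writes a tiny stand-in file (the two data rows are made up) so that it runs without the full download.

```r
# Write a tiny, made-up CSV through a bzip2 connection to stand in for
# the real StormData.csv.bz2 download.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,1", "FLOOD,0"), con)
close(con)

# read.csv() detects the compression and decompresses while reading,
# so no separate unzip step is needed.
demo <- read.csv(tmp, stringsAsFactors = FALSE)
print(demo)
```

With the real data, the same call would presumably take the path of the downloaded .bz2 file in place of `tmp`.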

After loading the data, I decided that the fields of interest were the date, event type, fatalities, injuries, and the property and crop damage columns. After some investigation, and not wanting to rely too heavily on older, incomplete data, I chose to examine only storm data from 1995 onward.

sdf <- data.frame(stormdf$BGN_DATE, stormdf$EVTYPE, stormdf$FATALITIES, stormdf$INJURIES, stormdf$PROPDMG,stormdf$PROPDMGEXP, stormdf$CROPDMG,stormdf$CROPDMGEXP)
colnames(sdf) <- c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
sdf$BGN_DATE <- as.Date(sdf$BGN_DATE, format = "%m/%d/%Y")
sdf2 <- sdf[sdf$BGN_DATE >= as.Date("1/1/1995", format = "%m/%d/%Y"), ]
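As a quick check of the date handling, the sketch below runs the same parse-and-filter steps on three made-up rows. It assumes that raw BGN_DATE values look like "4/18/1950 0:00:00" (month/day/year followed by a time); as.Date() ignores whatever follows the part matched by the format string.

```r
# Three made-up rows in the assumed raw format.
toy <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "1/1/1995 0:00:00", "8/29/2005 0:00:00"),
  EVTYPE   = c("TORNADO", "HAIL", "HURRICANE/TYPHOON"),
  stringsAsFactors = FALSE
)

# Parse month/day/year; the trailing time is ignored by as.Date().
toy$BGN_DATE <- as.Date(toy$BGN_DATE, format = "%m/%d/%Y")

# Keep events that began on or after 1 January 1995.
recent <- toy[toy$BGN_DATE >= as.Date("1995-01-01"), ]
print(recent)
```

Writing the cutoff in ISO form ("1995-01-01") avoids any day/month ambiguity in the format string.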

Next I had to convert the exponent fields into numeric multipliers. For both property damage and crop damage, I mapped H (or h) to 100, K to 1,000, M to 1,000,000, and B to 1,000,000,000. I treated blank entries and the special characters (+, -, ?) as a multiplier of 1 after looking into the data and finding that there were few instances of the special characters.

sdf2$PROPDMGEXP <- as.character(sdf2$PROPDMGEXP)
sdf2$CROPDMGEXP <- as.character(sdf2$CROPDMGEXP)

sdf2$PROPDMGEXP[sdf2$PROPDMGEXP == "" | sdf2$PROPDMGEXP == "+" | sdf2$PROPDMGEXP == "?" | sdf2$PROPDMGEXP == "-"] <- "1"
sdf2$PROPDMGEXP[sdf2$PROPDMGEXP == "H" | sdf2$PROPDMGEXP == "h"] <- "100"
sdf2$PROPDMGEXP[sdf2$PROPDMGEXP == "K" | sdf2$PROPDMGEXP == "k"] <- "1000"
sdf2$PROPDMGEXP[sdf2$PROPDMGEXP == "M" | sdf2$PROPDMGEXP == "m"] <- "1000000"
sdf2$PROPDMGEXP[sdf2$PROPDMGEXP == "B" | sdf2$PROPDMGEXP == "b"] <- "1000000000"

sdf2$CROPDMGEXP[sdf2$CROPDMGEXP == "" | sdf2$CROPDMGEXP == "?"] <- "1"
sdf2$CROPDMGEXP[sdf2$CROPDMGEXP == "H" | sdf2$CROPDMGEXP == "h"] <- "100"
sdf2$CROPDMGEXP[sdf2$CROPDMGEXP == "K" | sdf2$CROPDMGEXP == "k"] <- "1000"
sdf2$CROPDMGEXP[sdf2$CROPDMGEXP == "M" | sdf2$CROPDMGEXP == "m"] <- "1000000"
sdf2$CROPDMGEXP[sdf2$CROPDMGEXP == "B" | sdf2$CROPDMGEXP == "b"] <- "1000000000"

sdf2$PROPDMGEXP <- as.integer(sdf2$PROPDMGEXP)
sdf2$CROPDMGEXP <- as.integer(sdf2$CROPDMGEXP)
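The same mapping can also be written as a lookup table. This is a minimal sketch, not the code used for the results in this report: `exp_to_multiplier()` is a hypothetical helper, and it treats every code outside the table as 1. That differs slightly from the statements above, where a digit code such as "5" is passed through as.integer() and so ends up as a multiplier of 5.

```r
# Multipliers for the letter codes described above.
multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# exp_to_multiplier() is a hypothetical helper, not part of the analysis.
exp_to_multiplier <- function(code) {
  m <- multipliers[toupper(as.character(code))]
  m[is.na(m)] <- 1   # blanks, symbols, and digits all fall back to 1
  unname(m)
}

# Made-up codes and damage values to exercise each case.
codes  <- c("K", "m", "", "?", "B", "h", "5")
damage <- c(2.5, 1, 10, 3, 0.1, 4, 7)
print(damage * exp_to_multiplier(codes))
```

A lookup keeps the rules for both columns in one place, so PROPDMGEXP and CROPDMGEXP cannot drift apart.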

With these numeric multipliers in place, I could multiply each damage value by its multiplier to get a dollar amount that is comparable across records.

sdf2$propdmgtotal <- sdf2$PROPDMG * sdf2$PROPDMGEXP
sdf2$cropdmgtotal <- sdf2$CROPDMG * sdf2$CROPDMGEXP

The data is now ready for analysis.

Results

Effect on Population Health

To examine the effect on population health, I combined fatalities and injuries to get a high-level idea of how many people were affected. I aggregated this total by event type and sorted the result from most to least affected.

sdf2$pophealtheffect <- sdf2$FATALITIES + sdf2$INJURIES
pophealth <- aggregate(sdf2$pophealtheffect, by = list(sdf2$EVTYPE), FUN = sum)
colnames(pophealth) <- c("evtype", "pop_effect")
pophealth <- pophealth[with(pophealth, order(-pop_effect)),]
pophealth[1:15,]

##                evtype pop_effect
## 666           TORNADO      23310
## 112    EXCESSIVE HEAT       8428
## 144             FLOOD       7192
## 358         LIGHTNING       5360
## 683         TSTM WIND       3871
## 231              HEAT       2954
## 134       FLASH FLOOD       2668
## 607 THUNDERSTORM WIND       1557
## 787      WINTER STORM       1493
## 313 HURRICANE/TYPHOON       1339
## 288         HIGH WIND       1334
## 773          WILDFIRE        986
## 206              HAIL        926
## 254        HEAVY SNOW        866
## 157               FOG        779

I also wanted to see whether certain events were more likely to kill than to injure, as I thought the combined total may have been obscuring the full picture. Here are the most impactful event types for injuries and for fatalities:

injuryfx <- aggregate(sdf2$INJURIES, by = list(sdf2$EVTYPE), FUN = sum)
colnames(injuryfx) <- c("evtype", "injuries")
injuryfx <- injuryfx[with(injuryfx, order(-injuries)), ]
injuryfx[1:15,]

##                evtype injuries
## 666           TORNADO    21765
## 144             FLOOD     6769
## 112    EXCESSIVE HEAT     6525
## 358         LIGHTNING     4631
## 683         TSTM WIND     3630
## 231              HEAT     2030
## 134       FLASH FLOOD     1734
## 607 THUNDERSTORM WIND     1426
## 787      WINTER STORM     1298
## 313 HURRICANE/TYPHOON     1275
## 288         HIGH WIND     1093
## 206              HAIL      916
## 773          WILDFIRE      911
## 254        HEAVY SNOW      751
## 157               FOG      718

fatalityfx <- aggregate(sdf2$FATALITIES, by = list(sdf2$EVTYPE), FUN = sum)
colnames(fatalityfx) <- c("evtype", "fatalities")
fatalityfx <- fatalityfx[with(fatalityfx, order(-fatalities)),]
fatalityfx[1:15,]

##                evtype fatalities
## 112    EXCESSIVE HEAT       1903
## 666           TORNADO       1545
## 134       FLASH FLOOD        934
## 231              HEAT        924
## 358         LIGHTNING        729
## 144             FLOOD        423
## 461       RIP CURRENT        360
## 288         HIGH WIND        241
## 683         TSTM WIND        241
## 16          AVALANCHE        223
## 462      RIP CURRENTS        204
## 787      WINTER STORM        195
## 233         HEAT WAVE        161
## 607 THUNDERSTORM WIND        131
## 121      EXTREME COLD        126

Many event types appear on both lists, but events like avalanches and rip currents show up only on the fatality list, because they kill far more often than they injure. Overall I thought it was fine to look at injuries and fatalities together when determining the most impactful events, but I also wanted to look into the split between fatalities and injuries for each of these.
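The fatality table lists RIP CURRENT and RIP CURRENTS separately, and TSTM WIND alongside THUNDERSTORM WIND. One possible clean-up, which was not applied in this analysis, is to normalize the labels before aggregating. The sketch below uses a hypothetical helper, `normalize_evtype()`, with rules for just those two cases, and the fatality counts are copied from the table above.

```r
# normalize_evtype() is a hypothetical helper; its two rules cover only
# the duplicate labels visible in the tables above.
normalize_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  x <- sub("^TSTM", "THUNDERSTORM", x)
  x <- sub("^RIP CURRENTS$", "RIP CURRENT", x)
  x
}

# Fatality counts copied from the table above.
fat <- data.frame(
  evtype     = c("RIP CURRENT", "RIP CURRENTS", "TSTM WIND", "THUNDERSTORM WIND"),
  fatalities = c(360, 204, 241, 131)
)

# Relabel, then re-aggregate so duplicates collapse into one row each.
fat$evtype <- normalize_evtype(fat$evtype)
merged <- aggregate(fatalities ~ evtype, data = fat, FUN = sum)
print(merged)
```

Applying such rules to the full data would change the rankings shown in this report, so the tables here should be read as based on the raw EVTYPE labels.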

Therefore, I took the 15 event types with the highest combined totals and broke them down by injuries vs. fatalities:

pophealth2 <- pophealth[1:15,]
uniqueNames <- pophealth2$evtype
popdf1 <- sdf2[sdf2$EVTYPE %in% uniqueNames, ]

f_df <- aggregate(popdf1$FATALITIES, by = list(popdf1$EVTYPE), FUN = sum)
i_df <- aggregate(popdf1$INJURIES, by = list(popdf1$EVTYPE), FUN = sum)

colnames(f_df) <- c("evtype", "fatalities")
colnames(i_df) <- c("evtype", "injuries")

f_df <- f_df[with(f_df, order(evtype)),]
i_df <- i_df[with(i_df, order(evtype)),]

pop_df <- data.frame(f_df, i_df$injuries)
colnames(pop_df)[3] <- "injuries"
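Combining the two frames column-wise works here because both are sorted by the same event types. A sketch of an alternative that does not depend on row order is merge(), followed by the fatality share for each event type. The counts are copied from the injury and fatality tables above; `f_demo`, `i_demo`, and `fatality_share` are illustrative names only.

```r
# Counts copied from the injury and fatality tables above.
f_demo <- data.frame(evtype = c("TORNADO", "EXCESSIVE HEAT", "FLOOD"),
                     fatalities = c(1545, 1903, 423))
i_demo <- data.frame(evtype = c("FLOOD", "TORNADO", "EXCESSIVE HEAT"),
                     injuries = c(6769, 21765, 6525))

# merge() matches rows by evtype, so row order does not matter.
both <- merge(f_demo, i_demo, by = "evtype")

# Share of affected people who died, per event type.
both$fatality_share <- both$fatalities / (both$fatalities + both$injuries)
print(both[order(-both$fatality_share), ])
```

The share makes the injury vs. fatality split comparable across event types whose totals differ widely.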

Now I have a data frame that shows 15 event types and the number of injuries and fatalities they've caused since 1995. Plotting this resulted in the following chart.

library(ggplot2)
library(reshape2)
df2 <- melt(pop_df, id.var = "evtype")
ggplot(df2, aes(x = evtype, y = value, fill = variable)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("Event Type") +
  ylab("# of Fatalities & Injuries") +
  ggtitle("NOAA Top 15: Most Harmful Events for Population Health, 1995-2011")

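[Figure: stacked bar chart of fatalities and injuries by event type, "NOAA Top 15: Most Harmful Events for Population Health, 1995-2011"]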

Clearly, tornadoes, excessive heat, and flooding were the primary events that caused harm to the population, whether by death or injury.

Effect on Economic Factors

I decided to measure economic consequences in terms of property damage and crop damage. I did not want to combine the two, because I thought there was more insight to be gained from looking at them separately.

As such, I looked at the top 15 most harmful events for property damage first.

propfx <- aggregate(sdf2$propdmgtotal, by = list(sdf2$EVTYPE), FUN = sum)
colnames(propfx) <- c("evtype", "propdmg")
propfx <- propfx[with(propfx, order(-propdmg)),]

propfx2 <- propfx[1:15,]
ggplot(propfx2, aes(x = evtype, y = propdmg)) +
  geom_bar(stat = "identity", color = "black", fill = "blue") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("Event Type") +
  ylab("Property Damage ($)") +
  ggtitle("NOAA Top 15 Most Harmful Events for Property Damage, 1995-2011")

[Figure: bar chart of property damage ($) by event type, "NOAA Top 15 Most Harmful Events for Property Damage, 1995-2011"]

Most of the events that cause heavy property damage involve water, such as floods, hurricanes, and storms. Interestingly, tornadoes, though very harmful to population health, are less consequential for property damage.

Now I'll look at the effects on crop damage.

cropfx <- aggregate(sdf2$cropdmgtotal, by = list(sdf2$EVTYPE), FUN = sum)
colnames(cropfx) <- c("evtype", "cropdmg")
cropfx <- cropfx[with(cropfx, order(-cropdmg)),]

cropfx2 <- cropfx[1:15,]
ggplot(cropfx2, aes(x = evtype, y = cropdmg)) +
  geom_bar(stat = "identity", color = "black", fill = "green") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("Event Type") +
  ylab("Crop Damage ($)") +
  ggtitle("NOAA Top 15 Most Harmful Events for Crop Damage, 1995-2011")

[Figure: bar chart of crop damage ($) by event type, "NOAA Top 15 Most Harmful Events for Crop Damage, 1995-2011"]

Crops are most affected by extreme temperatures, much more so than property is. It is somewhat intuitive that drought, excessive heat, and extreme cold rank fairly highly. Flood also appears here, which further cements it as a very impactful event type. Tornadoes, however, are not among the top 15 most harmful events with regard to crop damage.

Summary

As seen above, we can conclude that tornadoes and floods are particularly impactful on population and economic factors. Given more time, I'd like to investigate how often each event type occurs, to get a better idea of whether tornadoes and floods rank so highly simply because their recorded instances dwarf those of events like avalanches or mudslides.
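One way such a follow-up could start is by counting recorded events per type and dividing total harm by that count. The sketch below is not part of the analysis: the rows are made up, and `harmed` stands in for a per-record fatalities-plus-injuries column like pophealtheffect.

```r
# Made-up records; in the real analysis these would be rows of sdf2.
toy <- data.frame(
  EVTYPE = c("TORNADO", "TORNADO", "TORNADO", "TORNADO", "AVALANCHE"),
  harmed = c(3, 0, 1, 0, 2)
)

# Number of recorded events and total people harmed per event type.
events <- table(toy$EVTYPE)
totals <- tapply(toy$harmed, toy$EVTYPE, sum)

per_type <- data.frame(
  evtype = names(totals),
  events = as.vector(events[names(totals)]),
  harmed = as.vector(totals)
)

# Average number of people harmed per recorded event.
per_type$harm_per_event <- per_type$harmed / per_type$events
print(per_type)
```

In this toy data the frequent event type has the larger total but the smaller per-event average, which is the kind of effect the follow-up would look for in the real records.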