Health and Economic Damage from NOAA Storms Database

Synopsis

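This report determines which types of events in the NOAA storm database are the most destructive, both to population health and in economic terms. It covers the data processing steps, the results for each kind of damage, and the conclusions.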

Data Processing

Read the compressed CSV file

dataDir <- "data"
fileName <- "repdata_data_StormData.csv.bz2"
fileDir <- paste(dataDir, fileName, sep='/')
df <- read.csv(bzfile(fileDir))
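
A minimal sketch of the same read step with a guard for a missing file. The helper name readStorms and the variable filePath are illustrative assumptions, not part of the original analysis; file.path, file.exists, and bzfile are base R.

```r
# dataDir and fileName mirror the values used in the report.
dataDir  <- "data"
fileName <- "repdata_data_StormData.csv.bz2"
filePath <- file.path(dataDir, fileName)  # joins path components with "/"

# readStorms is a hypothetical helper: fail early with a clear message
# if the compressed file is not where we expect it.
readStorms <- function(path) {
    if (!file.exists(path)) {
        stop("Storm data file not found: ", path)
    }
    read.csv(bzfile(path))  # bzfile() opens the bzip2-compressed CSV for reading
}

# Usage (assumes the file has been placed in the data directory):
# df <- readStorms(filePath)
```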

Some Data Cleaning

sum(is.na(df)) # total number of missing (NA) entries across all columns
## [1] 1745947
df <- subset(df, FATALITIES+INJURIES+PROPDMG+CROPDMG>0) # drop records with no human or material damage
df$EVTYPE <- factor(df$EVTYPE) # after subsetting: 985 -> 488 event types
df$STATE <- factor(df$STATE) # after subsetting: 72 -> 67 codes (the list includes non-state codes)
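
Because sum(is.na(df)) only gives a grand total, a per-column count can show where the missing entries sit. The sketch below uses a made-up toy data frame (the values are invented for illustration and the names toy and naPerColumn are assumptions); the same two lines could be applied to df.

```r
# Toy stand-in for df; values are invented for illustration only.
toy <- data.frame(
    EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
    FATALITIES = c(1, 0, 0),
    LATITUDE   = c(3040, NA, NA)
)

# Count NA entries per column, then show only the columns that have any.
naPerColumn <- colSums(is.na(toy))
naPerColumn[naPerColumn > 0]
```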

The damage amounts (PROPDMG and CROPDMG) each have a companion column (PROPDMGEXP and CROPDMGEXP) that gives their magnitude (e.g. hundreds, thousands, millions, billions). The amounts must be scaled by that magnitude so that the dollar figures are correct.

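fixDollars <- function(dollarColumn, magColumn) {
    # multiplier for each magnitude code: hundreds, thousands, millions, billions
    multi <- c(h=100, k=1000, m=1e6, b=1e9)
    multiColumn <- as.numeric(multi[tolower(magColumn)])
    multiColumn[is.na(multiColumn)] <- 1 # blank or unrecognized codes leave the amount unchanged
    multiColumn*dollarColumn # return the scaled dollar amounts
}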
df$PROPDMG <-fixDollars(df$PROPDMG, df$PROPDMGEXP)
df$CROPDMG <-fixDollars(df$CROPDMG, df$CROPDMGEXP)
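
As a worked example of the scaling, the lookup used inside fixDollars can be traced by hand on a few made-up values (these are not rows from the database, and the names amount, exponent, multiplier, and scaled are illustrative). Note that with this approach any code outside h, k, m, b, such as a digit or a blank, falls back to a multiplier of 1.

```r
multi <- c(h = 100, k = 1000, m = 1e6, b = 1e9)

amount   <- c(25, 2.5, 10, 7)     # made-up damage amounts
exponent <- c("K", "M", "", "5")  # made-up magnitude codes

# Look up each code case-insensitively; unmatched codes come back as NA.
multiplier <- as.numeric(multi[tolower(exponent)])
multiplier[is.na(multiplier)] <- 1

scaled <- amount * multiplier
scaled  # 25000, 2500000, 10, 7
```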

Keep only the columns needed for the analysis:

keepCols <- c(7,8,23,24,25,27)
df <- df[,keepCols]

Results

For health “harm”, the strategy is to compute a weighted score (Weighted Harm) that gives the most weight to deaths but also accounts for injuries. For economic damage, the measure is the simple sum of the monetary damage to property and crops.

Weighted Harm (or, Harm Score)

Although this is a sensitive subject, and one I don't know much about, I chose a weighted sum of fatalities and injuries to “quantify” the health hazard of events. If both weights were one, the weighted health hazard would simply count “casualties”. The reasoning is that some events cause many non-life-threatening injuries but very few fatalities. These events cause considerable harm, but their health impact is not as severe as that of events that cause fatalities.

wFatl <- 1
wInjr <- 0.10 # 10 injuries count as much as 1 fatality
df$wHarm <- with(df, wFatl*FATALITIES + wInjr*INJURIES) # create a new column with these
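
To make the effect of the weights concrete, here is a small sketch on two made-up events (the data frame toy and its values are invented for illustration). With equal weights, event B ranks higher because it has more casualties; with the weights chosen above, event A ranks higher because it has fatalities.

```r
wFatl <- 1
wInjr <- 0.10

# Made-up records: A has fatalities, B has only injuries.
toy <- data.frame(
    EVTYPE     = c("A", "B"),
    FATALITIES = c(4, 0),
    INJURIES   = c(10, 30)
)

toy$casualties <- with(toy, FATALITIES + INJURIES)                  # both weights equal to 1
toy$wHarm      <- with(toy, wFatl * FATALITIES + wInjr * INJURIES)  # weights used in this report

toy$casualties  # 14 for A, 30 for B
toy$wHarm       # 5 for A, 3 for B
```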

The most harmful events are those with the highest total Weighted Harm. Summing the score by event type and then sorting in decreasing order brings the events with the highest totals to the top.
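
The aggregate-then-sort pattern can be seen on a tiny made-up data set before it is applied to the full data (the names toy and totals and all values here are illustrative assumptions).

```r
# Made-up per-record scores; TORNADO appears twice so its rows get summed.
toy <- data.frame(
    EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
    wHarm  = c(3, 1, 2, 4)
)

# One row per event type, holding the sum of wHarm for that type.
totals <- aggregate(wHarm ~ EVTYPE, data = toy, FUN = sum)

# Sort so that the largest total comes first, then reset the row names.
totals <- totals[order(totals$wHarm, decreasing = TRUE), ]
rownames(totals) <- NULL
totals  # TORNADO 5, HEAT 4, FLOOD 1
```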

healthHarm <- aggregate(wHarm~EVTYPE, sum, data=df)
healthHarm <- healthHarm[order(healthHarm$wHarm, decreasing = TRUE),]
rownames(healthHarm) <- NULL
healthHarm[1:10,]
##            EVTYPE   wHarm
## 1         TORNADO 14767.6
## 2  EXCESSIVE HEAT  2555.5
## 3       LIGHTNING  1339.0
## 4       TSTM WIND  1199.7
## 5     FLASH FLOOD  1155.7
## 6           FLOOD  1148.9
## 7            HEAT  1147.0
## 8     RIP CURRENT   391.2
## 9       HIGH WIND   361.7
## 10   WINTER STORM   338.1
# keep the top 5 event types and turn row 6 into the total of everything from rank 6 down
sumHealthHarm <- healthHarm[1:6,]
sumHealthHarm$EVTYPE <- as.character(sumHealthHarm$EVTYPE)
sumHealthHarm$EVTYPE[6] <- 'EVERYTHING\nELSE'
sumHealthHarm$wHarm[6] <- sum(healthHarm$wHarm[6:nrow(healthHarm)])
with(sumHealthHarm, 
     pie(wHarm, labels=paste0(EVTYPE,"\nw. harm = ",round(wHarm,0)),
         init.angle=90, radius=0.9, col=heat.colors(6),
         main="Weighted Harm Total by Event Type"))

[Figure: pie chart, “Weighted Harm Total by Event Type”]

The figure above summarizes the total harm score per event type in a pie chart. Tornadoes are by far the most damaging to population health. The pie chart includes an “everything else” slice, which puts the relative totals in perspective.
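
The “everything else” slice is built by keeping the top rows of a sorted table and summing the rest. A hedged sketch of that step as a reusable function follows; collapseTail and its arguments are hypothetical names introduced here, not part of the original analysis, and the function assumes its input is already sorted in decreasing order.

```r
# collapseTail is a hypothetical helper. `totals` must already be sorted in
# decreasing order of the column named by valueCol.
collapseTail <- function(totals, labelCol, valueCol, nTop = 5,
                         otherLabel = "EVERYTHING\nELSE") {
    if (nrow(totals) <= nTop) return(totals)  # nothing to collapse
    top <- totals[seq_len(nTop), c(labelCol, valueCol)]
    top[[labelCol]] <- as.character(top[[labelCol]])
    # Sum every row ranked below the top nTop into a single row.
    other <- data.frame(otherLabel,
                        sum(totals[[valueCol]][(nTop + 1):nrow(totals)]),
                        stringsAsFactors = FALSE)
    names(other) <- c(labelCol, valueCol)
    result <- rbind(top, other)
    rownames(result) <- NULL
    result
}

# Made-up, already sorted totals for illustration.
totals <- data.frame(EVTYPE = c("A", "B", "C", "D"), wHarm = c(10, 5, 3, 2))
collapsed <- collapseTail(totals, "EVTYPE", "wHarm", nTop = 2)
collapsed$wHarm  # 10, 5, 5
```

Collapsing this way preserves the grand total, so the slices of the pie still add up to all recorded harm.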

Total Economic Damage

Economic damage is easier to combine because both components are in dollars: the total is the sum of property and crop damage.

df$dollarDamage <- with(df, PROPDMG+CROPDMG)

As with the health hazard, we aggregate economic damage by summing per event type and then sort in decreasing order, which brings the event types with the highest total economic damage to the top of the list.

economicDamage <- aggregate(dollarDamage~EVTYPE, sum, data=df)
economicDamage <- economicDamage[order(economicDamage$dollarDamage, decreasing = TRUE),]
rownames(economicDamage) <- NULL
economicDamage[1:10,]
##               EVTYPE dollarDamage
## 1              FLOOD    1.503e+11
## 2  HURRICANE/TYPHOON    7.191e+10
## 3            TORNADO    5.735e+10
## 4        STORM SURGE    4.332e+10
## 5               HAIL    1.876e+10
## 6        FLASH FLOOD    1.756e+10
## 7            DROUGHT    1.502e+10
## 8          HURRICANE    1.461e+10
## 9        RIVER FLOOD    1.015e+10
## 10         ICE STORM    8.967e+09
# keep the top 5 event types and turn row 6 into the total of everything from rank 6 down
sumEconomicDamage <- economicDamage[1:6,]
sumEconomicDamage$EVTYPE <- as.character(sumEconomicDamage$EVTYPE)
sumEconomicDamage$EVTYPE[6] <- 'EVERYTHING\nELSE'
sumEconomicDamage$dollarDamage[6] <- sum(economicDamage$dollarDamage[6:nrow(economicDamage)])
with(sumEconomicDamage,
     pie(dollarDamage, labels=paste0(EVTYPE,"\n$",round(dollarDamage/1e9,0),"B"),
         init.angle=90, radius=0.9, col=heat.colors(6),
         main="Economic Damage by Event Type (Billions of $)"))

[Figure: pie chart, “Economic Damage by Event Type (Billions of $)”]

The figure above summarizes the total economic damage per event type, in billions of dollars, in a pie chart. Floods are the most damaging in monetary terms. The pie chart includes an “everything else” slice, which puts the relative totals in perspective.

Conclusions

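The analysis of the NOAA storm database answers the following two questions:

1- Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Tornadoes, with a total weighted harm score of 14767.6, followed by excessive heat (2555.5) and lightning (1339.0).

2- Across the United States, which types of events have the greatest economic consequences?

Floods, with about $150 billion in total property and crop damage, followed by hurricanes/typhoons (about $72 billion) and tornadoes (about $57 billion).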