The purpose of this report is to determine the most destructive types of events from the NOAA storms database. It includes the following:
Read the compressed CSV file
dataDir <- "data"
fileName <- "repdata_data_StormData.csv.bz2"
fileDir <- paste(dataDir, fileName, sep='/')
df <- read.csv(bzfile(fileDir))
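As a side note on the reading step, here is a minimal sketch showing that a bzip2-compressed CSV can be read either through an explicit `bzfile()` connection, as done above, or by passing the path straight to `read.csv()`. The temporary file and its two rows are made up for illustration and are not part of the storm data.

```r
# Write a tiny CSV through a bzip2 connection (hypothetical temporary file).
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(data.frame(EVTYPE = c("FLOOD", "HAIL"), FATALITIES = c(2, 0)),
          con, row.names = FALSE)
close(con)

# Explicit connection, as in the report.
viaConnection <- read.csv(bzfile(tmp))
# Plain path: read.csv() opens the file in read mode, which detects the compression.
direct <- read.csv(tmp)

sameResult <- identical(viaConnection, direct)
unlink(tmp)  # clean up the temporary file
```

Both calls should give the same data frame, so the choice between them is mostly a matter of how explicit one wants to be about the compression.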
Some Data Cleaning
sum(is.na(df)) # count missing values across all columns; the result below is non-zero, so the data does contain NAs
## [1] 1745947
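The total above does not say where the missing values are. A quick way to find out is to count them per column. The sketch below uses a made-up data frame (the column `F_SCALE` is hypothetical), so it only illustrates the technique and says nothing about which columns of the storm data are affected.

```r
# Toy data: only the hypothetical F_SCALE column has missing values.
toy <- data.frame(
  EVTYPE     = c("FLOOD", "HAIL", "TORNADO"),
  FATALITIES = c(1, 0, 3),
  F_SCALE    = c(NA, NA, 2)
)

# is.na() gives a logical matrix; colSums() counts the TRUEs per column.
naPerColumn <- colSums(is.na(toy))

# Keep only the columns that actually contain NAs.
naPerColumn[naPerColumn > 0]
```

If the missing values turn out to sit in columns that the analysis does not use, they can be ignored safely.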
df <- subset(df, FATALITIES + INJURIES + PROPDMG + CROPDMG > 0) # keep only events with human or material damage
df$EVTYPE <- factor(df$EVTYPE) # re-level to drop unused levels: from 985 to 488 event types
df$STATE <- factor(df$STATE) # from 72 to 67 state codes (the list still includes non-state codes)
The damage amounts (PROPDMG and CROPDMG) each have a companion column (PROPDMGEXP and CROPDMGEXP) that gives their magnitude (e.g. hundreds, thousands, millions). Each amount must be multiplied by its magnitude so that the dollar figures are correct.
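fixDollars <- function(dollarColumn, magColumn) {
  # multipliers for the magnitude codes: hundreds, thousands, millions, billions
  multi <- c(h=100, k=1000, m=1e6, b=1e9)
  multiColumn <- as.numeric(multi[tolower(magColumn)])
  # any other code (blank, digits, symbols) leaves the amount unscaled
  multiColumn[is.na(multiColumn)] <- 1
  # return the scaled amounts explicitly
  multiColumn*dollarColumn
}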
df$PROPDMG <-fixDollars(df$PROPDMG, df$PROPDMGEXP)
df$CROPDMG <-fixDollars(df$CROPDMG, df$CROPDMGEXP)
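To make the scaling concrete, here is a small worked example with invented amounts and magnitude codes. The helper `scaleDamage` is a hypothetical stand-in that follows the same lookup idea as the function above; it is a sketch, not a replacement for it.

```r
# Hypothetical helper: look up a multiplier by (lower-cased) magnitude code.
scaleDamage <- function(amount, exponent) {
  multi <- c(h = 100, k = 1000, m = 1e6, b = 1e9)
  factorBy <- unname(multi[tolower(as.character(exponent))])
  factorBy[is.na(factorBy)] <- 1  # unknown or blank codes: leave the amount as is
  amount * factorBy
}

# Invented amounts with codes "K", "M", blank, and an unrecognised "?".
scaled <- scaleDamage(c(25, 2.5, 10, 7), c("K", "M", "", "?"))
scaled
```

Here 25 with code "K" becomes 25,000 and 2.5 with code "M" becomes 2,500,000, while the blank and "?" entries stay at 10 and 7. Treating unrecognised codes as a multiplier of 1 is a simplifying assumption.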
Reduce the data to the columns essential for the analysis:
keepCols <- c(7,8,23,24,25,27)
df <- df[,keepCols]
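Numeric indices are compact but hard to read back later. A sketch of the same selection by name follows, assuming the indices above correspond to STATE, EVTYPE, FATALITIES, INJURIES, PROPDMG and CROPDMG (the columns that the rest of this report uses). The toy data frame and its `REMARKS` column are made up.

```r
# Toy data with one extra (hypothetical) column that the analysis does not need.
toy <- data.frame(
  STATE = "TX", EVTYPE = "HAIL", REMARKS = "not needed",
  FATALITIES = 0, INJURIES = 1, PROPDMG = 5000, CROPDMG = 0
)

# Selecting by name documents intent and survives changes in column order.
keepNames <- c("STATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")
toy <- toy[, keepNames]
names(toy)
```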
For health “harm”, the strategy is to generate a weighted score (Weighted Harm) that emphasizes the severity of deaths but also accounts for injuries. For economic damage, the score is a simple sum of the monetary damage to buildings and crops.
Although this is a sensitive subject that I do not know much about, I chose a weighted sum to “quantify” the health hazard of events. If both weights were one, the weighted health hazard would simply count “casualties”. The reasoning is that some events cause many non-life-threatening injuries but very few fatalities. While these events cause considerable harm, their health impact is not as severe as that of events with many fatalities.
wFatl <- 1
wInjr <- 0.10 # every 10 injuries account for a fatality
df$wHarm <- with(df, wFatl*FATALITIES + wInjr*INJURIES) # create a new column with these
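A small worked example may help show what the weights do. The two events below are invented: one causes 5 fatalities and no injuries, the other causes no fatalities and 40 injuries.

```r
wFatl <- 1
wInjr <- 0.10

# Two made-up events for illustration only.
events <- data.frame(
  EVTYPE     = c("EVENT A", "EVENT B"),
  FATALITIES = c(5, 0),
  INJURIES   = c(0, 40)
)

# Weighted harm versus a plain casualty count (both weights equal to one).
events$wHarm      <- with(events, wFatl * FATALITIES + wInjr * INJURIES)
events$casualties <- with(events, FATALITIES + INJURIES)
events
```

By plain casualty count, EVENT B (40) looks far worse than EVENT A (5). With the chosen weights, EVENT A scores 5 and EVENT B scores 4, so the event with fatalities ranks higher. The ranking therefore depends on the weights, which are a judgment call.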
The most harmful events are those with the highest Weighted Harm. Summing the score by event type and then sorting in decreasing order brings the event types with the highest total Weighted Harm to the top.
healthHarm <- aggregate(wHarm~EVTYPE, sum, data=df)
healthHarm <- healthHarm[order(healthHarm$wHarm, decreasing = TRUE),]
rownames(healthHarm) <- NULL
healthHarm[1:10,]
## EVTYPE wHarm
## 1 TORNADO 14767.6
## 2 EXCESSIVE HEAT 2555.5
## 3 LIGHTNING 1339.0
## 4 TSTM WIND 1199.7
## 5 FLASH FLOOD 1155.7
## 6 FLOOD 1148.9
## 7 HEAT 1147.0
## 8 RIP CURRENT 391.2
## 9 HIGH WIND 361.7
## 10 WINTER STORM 338.1
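The aggregate-then-sort pattern used for the table above can be seen more easily on a handful of rows. The scores below are made up.

```r
# Five invented records across three event types.
toy <- data.frame(
  EVTYPE = c("FLOOD", "HAIL", "FLOOD", "TORNADO", "HAIL"),
  wHarm  = c(1.5, 0.2, 2.0, 6.0, 0.1)
)

# One row per event type, holding the sum of its scores.
totals <- aggregate(wHarm ~ EVTYPE, data = toy, FUN = sum)

# Sort so the largest total comes first, then reset the row names.
totals <- totals[order(totals$wHarm, decreasing = TRUE), ]
rownames(totals) <- NULL
totals
```

The two FLOOD rows collapse into a single total of 3.5 and the two HAIL rows into 0.3, so the order becomes TORNADO, FLOOD, HAIL.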
sumHealthHarm <- healthHarm[1:6,]
sumHealthHarm$EVTYPE <- as.character(sumHealthHarm$EVTYPE)
sumHealthHarm$EVTYPE[6] <- 'EVERYTHING\nELSE'
sumHealthHarm$wHarm[6] <- sum(healthHarm$wHarm[6:nrow(healthHarm)])
with(sumHealthHarm,
pie(wHarm, labels=paste0(EVTYPE,"\nw. harm = ",round(wHarm,0)),
init.angle=90, radius=0.9, col=heat.colors(6),
main="Weighted Harm Total by Event Type"))
The pie chart above summarizes the total harm score per event type. As can be seen, tornadoes are the most damaging to population health. The chart includes an “everything else” slice, which helps put the relative totals in perspective.
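The "top few plus everything else" summary behind the pie chart can be wrapped in a small helper so that it can be reused for other scores. The function name `topWithRest` and the sample values are hypothetical; this is a sketch of the idea rather than the code used for the figure.

```r
# Hypothetical helper: keep the n largest values and lump the rest together.
topWithRest <- function(labels, values, n = 5) {
  ord <- order(values, decreasing = TRUE)
  labels <- as.character(labels)[ord]
  values <- values[ord]
  keep <- seq_len(min(n, length(values)))
  data.frame(
    label = c(labels[keep], "EVERYTHING ELSE"),
    value = c(values[keep], sum(values[-keep])),  # remainder goes in the last row
    stringsAsFactors = FALSE
  )
}

# Invented totals for five event types, keeping the top two.
pieData <- topWithRest(c("A", "B", "C", "D", "E"), c(10, 50, 5, 20, 15), n = 2)
pieData
```

The slices still add up to the overall total (100 in this example), which is what makes the "everything else" slice useful for judging relative size.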
Economic damage is easier to quantify because dollar amounts can simply be added. Here the total is the sum of the damage to buildings and crops.
df$dollarDamage <- with(df, PROPDMG+CROPDMG)
As with the health hazard, we can aggregate economic damage by summing and then sort in decreasing order. This brings the event types with the highest total economic damage to the top of the list.
economicDamage <- aggregate(dollarDamage~EVTYPE, sum, data=df)
economicDamage <- economicDamage[order(economicDamage$dollarDamage, decreasing = TRUE),]
rownames(economicDamage) <- NULL
economicDamage[1:10,]
## EVTYPE dollarDamage
## 1 FLOOD 1.503e+11
## 2 HURRICANE/TYPHOON 7.191e+10
## 3 TORNADO 5.735e+10
## 4 STORM SURGE 4.332e+10
## 5 HAIL 1.876e+10
## 6 FLASH FLOOD 1.756e+10
## 7 DROUGHT 1.502e+10
## 8 HURRICANE 1.461e+10
## 9 RIVER FLOOD 1.015e+10
## 10 ICE STORM 8.967e+09
sumEconomicDamage<- economicDamage[1:6,]
sumEconomicDamage$EVTYPE <- as.character(sumEconomicDamage$EVTYPE)
sumEconomicDamage$EVTYPE[6] <- 'EVERYTHING\nELSE'
sumEconomicDamage$dollarDamage[6] <- sum(economicDamage$dollarDamage[6:nrow(economicDamage)])
with(sumEconomicDamage,
pie(dollarDamage, labels=paste0(EVTYPE,"\n$",round(dollarDamage/1e9,0),"B"),
init.angle=90, radius=0.9, col=heat.colors(6),
main="Economic Damage by Event Type (Billions of $)"))
The pie chart above summarizes the total economic damage per event type, in billions of dollars. As can be seen, floods are the most damaging in monetary terms. The chart includes an “everything else” slice, which helps put the relative totals in perspective.
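The analysis of the NOAA storm database answers the two questions posed as follows:
1- Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health? Tornadoes, with the highest total Weighted Harm (14767.6), followed by excessive heat (2555.5).
2- Across the United States, which types of events have the greatest economic consequences? Floods, with about $150 billion in combined damage to buildings and crops, followed by hurricanes/typhoons (about $72 billion).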