Using the NOAA Storm Database, answer the questions:
1. What type of weather event is most harmful to human populations in the US?
2. What type of weather event is most harmful to the economy of the US?
The analysis will use the R data.table library.
# Load Dependencies
library(data.table)
The NOAA Storm Database is provided in a compressed format. Reading and decompressing the file is time-consuming, so the result is cached for efficiency.
# Read data from NOAA Storm Database .csv.bz2 file
NOAA <- read.csv("repdata-data-StormData.csv.bz2")
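One way to avoid repeating the slow read is to save the parsed data frame to disk and reload it on later runs. The sketch below only illustrates that idea: `read_cached` and the temporary file paths are hypothetical names, a small stand-in CSV replaces the real database, and the report itself may rely on a different caching mechanism.

```r
# Hypothetical helper: read a CSV once, then reuse a cached .rds copy
read_cached <- function(csv_path, cache_path) {
  if (file.exists(cache_path)) {
    return(readRDS(cache_path))  # fast path: load the cached data frame
  }
  df <- read.csv(csv_path)       # slow path: parse the CSV
  saveRDS(df, cache_path)        # store the parsed result for next time
  df
}

# Tiny stand-in file so the sketch runs without the real database
csv_path   <- tempfile(fileext = ".csv")
cache_path <- tempfile(fileext = ".rds")
write.csv(data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1)),
          csv_path, row.names = FALSE)

first  <- read_cached(csv_path, cache_path)  # parses the CSV and writes the cache
second <- read_cached(csv_path, cache_path)  # served from the cache
```

Deleting the cache file forces a fresh read, which matters if the source file changes.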
The data frame obtained from the .csv.bz2 is converted to a data.table object.
# Transform data frame into data table
NOAA <- data.table(NOAA)
The data relevant to each analysis are split into separate data.table objects for further processing.
# Split data table into economic and population tables
# Process population table into final form for analysis
pop <- cbind(NOAA$EVTYPE, NOAA$FATALITIES, NOAA$INJURIES)
pop <- data.table(pop)
setnames(pop, c("Event Type", "Fatalities", "Injuries"))
econ <- cbind(NOAA$EVTYPE, NOAA$PROPDMG, as.factor(NOAA$PROPDMGEXP), NOAA$CROPDMG, as.factor(NOAA$CROPDMGEXP))
econ <- data.table(econ)
The economic data are provided as four columns: one pair for property damage and one pair for crop damage. In each pair, the first column is a number and the second is its magnitude, encoded as K for thousands, M for millions, and B for billions of dollars. Because the data are time-consuming to process, the total dollar amount in each “weight class” (thousands, millions, or billions of dollars of damage) is computed first to determine which classes are worth processing.
# Sum damage estimates for cases where PROPDMGEXP is K, M, or B
# In the econ object, these appear as the integer factor codes 17, 19, and 14, respectively.
kTotal <- sum(econ$V2[econ$V3==17])*10^3
mTotal <- sum(econ$V2[econ$V3==19])*10^6
bTotal <- sum(econ$V2[econ$V3==14])*10^9
The total for events with property damage recorded in thousands of dollars is 1.0735292 × 10^{10} USD.
The total for events with property damage recorded in millions of dollars is 1.4069445 × 10^{11} USD.
The total for events with property damage recorded in billions of dollars is 2.7585 × 10^{11} USD.
Because the millions and billions categories each sum to more than 10 times the thousands total, only these two will be used for the analysis.
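The integer codes used above (17, 19, and 14) arise because `cbind()` builds a numeric matrix and stores a factor as its internal level codes. The sketch below, on made-up values, shows the effect and how a code could be looked up with `match()` instead of being hard-coded. The variable names are illustrative, and the codes it produces differ from those in the full data set.

```r
# Made-up exponent column; the real PROPDMGEXP has many more levels
exp_codes <- factor(c("K", "M", "", "B", "K"))
amount    <- c(25, 2.5, 0, 1.2, 10)

# cbind() drops the factor labels and keeps only the integer level codes
m <- cbind(amount, exp_codes)

# Look up the code for a given letter from the factor levels
k_code <- match("K", levels(exp_codes))
m_code <- match("M", levels(exp_codes))

# Sum the amounts recorded in thousands, then scale to dollars
k_total <- sum(m[m[, "exp_codes"] == k_code, "amount"]) * 10^3
```

Looking codes up this way keeps the analysis valid if the set of factor levels changes, for example after filtering rows.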
This process is repeated for the crop damage category.
# Sum damage estimates for cases where CROPDMGEXP is K, M, or B
# In the econ object, these appear as the integer factor codes 7, 9, and 5, respectively.
kTotal <- sum(econ$V4[econ$V5==7])*10^3
mTotal <- sum(econ$V4[econ$V5==9])*10^6
bTotal <- sum(econ$V4[econ$V5==5])*10^9
The total for events with crop damage recorded in thousands of dollars is 1.3429559 × 10^{9} USD.
The total for events with crop damage recorded in millions of dollars is 3.41408 × 10^{10} USD.
The total for events with crop damage recorded in billions of dollars is 1.361 × 10^{10} USD.
As with property damage, the millions and billions categories each sum to more than 10 times the thousands total, so only these two will be used for the analysis.
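An alternative to matching integer codes is to work with the letter codes directly. The following sketch assumes the exponent column is available as character data (for example, by reading with `stringsAsFactors = FALSE`) and uses a made-up data frame. `multiplier` and `class_totals` are illustrative names, not part of the original analysis.

```r
# Made-up crop damage records; "" stands for a missing exponent
storms <- data.frame(
  CROPDMG    = c(25, 2.5, 1.2, 10, 5),
  CROPDMGEXP = c("K", "M", "B", "K", ""),
  stringsAsFactors = FALSE
)

# Named vector mapping each letter code to its dollar multiplier
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Codes missing from the lookup (here "") yield NA
storms$mult    <- unname(multiplier[storms$CROPDMGEXP])
storms$dollars <- storms$CROPDMG * storms$mult

# Total dollars per weight class, ignoring unrecognized codes
known <- !is.na(storms$mult)
class_totals <- sapply(split(storms$dollars[known], storms$CROPDMGEXP[known]), sum)
```

Comparing the entries of `class_totals` then shows directly which weight classes dominate the overall total.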
The economic data are processed into their final form for analysis. This step is cached for efficiency.
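# Extract rows to be used for analysis: property or crop damage in millions or billions.
# A row that satisfies more than one condition is included once per match.
econ <- rbind(
  subset(econ, econ$V3 == 19),
  subset(econ, econ$V3 == 14),
  subset(econ, econ$V5 == 9),
  subset(econ, econ$V5 == 5)
)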
# Convert encoded property damage amounts into simple dollar amounts.
# Code 19 (M) is scaled to millions; any other code is scaled to billions.
processProp <- function(x, y) {
  if (y == 19) {
    x * 10^6
  } else {
    x * 10^9
  }
}
# Convert encoded crop damage amounts into simple dollar amounts.
# Code 9 (M) is scaled to millions; any other code is scaled to billions.
processCrop <- function(x, y) {
  if (y == 9) {
    x * 10^6
  } else {
    x * 10^9
  }
}
# Apply both conversions row by row
for (i in seq_len(nrow(econ))) {
  econ$V2[i] <- processProp(econ$V2[i], econ$V3[i])
  econ$V4[i] <- processCrop(econ$V4[i], econ$V5[i])
}
econ <- cbind(econ$V1, econ$V2, econ$V4)
econ <- data.table(econ)
setnames(econ, c("Event Type", "Property Damage", "Crop Damage"))
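The row-by-row conversion above could also be written with vectorized arithmetic, which in R is usually far faster on large tables. The sketch below is an illustration under stated assumptions, not the original method: `to_dollars` is a hypothetical helper, the inputs are made up, and it returns `NA` for any code that is neither the millions nor the billions code, whereas the helpers above scale every non-millions code to billions.

```r
# Hypothetical vectorized replacement for a per-row if/else helper
to_dollars <- function(amount, code, m_code, b_code) {
  multiplier <- ifelse(code == m_code, 1e6,
                ifelse(code == b_code, 1e9, NA_real_))
  amount * multiplier
}

# Made-up amounts with property-style codes (19 = M, 14 = B, 17 = K)
amount <- c(2.5, 1.2, 40, 7)
code   <- c(19, 14, 19, 17)

# One call converts the whole column; the K row becomes NA
vectorized <- to_dollars(amount, code, m_code = 19, b_code = 14)
```

Applied to the property columns, this would correspond to something like `to_dollars(econ$V2, econ$V3, m_code = 19, b_code = 14)`.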
Aggregate sums are taken for crop/property damage and fatalities by event code, and the most significant contributors in each category are identified.
# Property damage
setkey(econ, `Event Type`)
propAgg <- as.data.frame(econ[,sum(`Property Damage`),by=`Event Type`])
setnames(propAgg, c("Event Type", "Damage (USD)"))
mostPDmg <- head(propAgg[order(propAgg$`Damage (USD)`, decreasing = TRUE), 1], 4)
# Crop damage
setkey(econ, `Event Type`)
cropAgg <- as.data.frame(econ[,sum(`Crop Damage`),by=`Event Type`])
setnames(cropAgg, c("Event Type", "Damage (USD)"))
mostCDmg <- head(cropAgg[order(cropAgg$`Damage (USD)`, decreasing = TRUE), 1], 3)
# Fatalities
setkey(pop, `Event Type`)
fatalAgg <- as.data.frame(pop[,sum(Fatalities),by=`Event Type`])
setnames(fatalAgg, c("Event Type", "Fatalities"))
mostFatal <- head(fatalAgg[order(fatalAgg$Fatalities, decreasing = TRUE), 1], 2)
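The aggregate-then-rank pattern used above can be illustrated in base R on a small made-up table. The column names and values below are hypothetical, and `aggregate()` stands in for the data.table grouping syntax.

```r
# Made-up records: event code and fatalities per event
events <- data.frame(
  EventType  = c(834, 130, 834, 170, 130, 153),
  Fatalities = c(10, 4, 7, 1, 6, 2)
)

# Sum fatalities within each event type
fatal_agg <- aggregate(Fatalities ~ EventType, data = events, FUN = sum)

# Rank event types by total, largest first, and keep the top two
ranked  <- fatal_agg[order(fatal_agg$Fatalities, decreasing = TRUE), ]
top_two <- head(ranked$EventType, 2)
```

Keeping the full `ranked` table around, rather than only the top codes, makes it easy to check how far the leaders are ahead of the rest.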
plot(propAgg$`Event Type`, propAgg$`Damage (USD)`, xlab="Event Type",
     ylab="Damage (USD)", main="Total Property Damage by Event Type Since 1950")
abline(h=1e+13, col="red")
The above figure shows the total property damage caused by each type of event in the NOAA database since 1950. The points above the red line indicate event types that have caused damage in excess of $10 trillion. The four event types, identified by their integer event type codes, are 244, 170, 153, and 856. Notably, the event type that causes the most harm to human populations is below this line (see below).
plot(cropAgg$`Event Type`, cropAgg$`Damage (USD)`, xlab="Event Type",
     ylab="Damage (USD)", main="Total Crop Damage by Event Type Since 1950")
abline(h=2e+13, col="red")
The above figure shows the total crop damage caused by each type of event in the NOAA database since 1950. The points above the red line indicate event types that have caused damage in excess of $20 trillion. The three event types are 170, 153, and 834. Two of these (170 and 153) are also among the greatest sources of property damage above, and the third (834) is the one that causes the most harm to human populations (see below).
plot(fatalAgg$`Event Type`, fatalAgg$Fatalities, xlab="Event Type",
     ylab="Fatalities", main="Total Fatalities by Event Type Since 1950")
abline(h=1500, col="red")
The above figure shows the total fatalities caused by each type of event in the NOAA database since 1950. The points above the red line indicate event types that have caused more than 1,500 fatalities. The two event types are 834 and 130. The deadliest event type also causes the most injuries, by a wide margin (analysis not shown).
The most harmful event types for property damage overlap with those for crop damage (event types 170 and 153). However, the event type that is most harmful to crops, which makes it the single largest contributor to total economic harm, is also the one that causes the greatest harm to human populations (event type 834).
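Because the event types above are reported as integer codes, it can help to translate them back to labels. Assuming `EVTYPE` was read as a factor (the default for `read.csv` before R 4.0), the labels are the factor's levels indexed by code. The sketch uses a made-up factor; its labels and codes are placeholders, not the actual NOAA values.

```r
# Made-up event-type factor standing in for NOAA$EVTYPE
evtype <- factor(c("FLOOD", "HAIL", "TORNADO", "FLOOD", "HEAT"))

# Integer codes as they would appear after cbind() or as.integer()
codes <- as.integer(evtype)

# Translate selected codes back to their labels
top_codes  <- c(4, 1)
top_labels <- levels(evtype)[top_codes]
```

With the real data, the analogous call would be along the lines of `levels(NOAA$EVTYPE)[c(834, 130)]`.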