In this project, we analyze the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. We investigate which severe weather events are most harmful to population health and which lead to the greatest economic loss. We find that Tornado, Excessive Heat, TSTM Wind, Flood and Lightning cause the largest combined number of fatalities and injuries. In terms of economic consequences, Flood, Hurricane/Typhoon, Tornado, Storm Surge and Hail cause the greatest combined property and crop damage.
First we load the data:
storm.dat <- read.csv("repdata-data-StormData.csv.bz2")
The dimensions of the data and the names of its features are:
dim(storm.dat)
## [1] 902297 37
names(storm.dat)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
We choose the features FATALITIES and INJURIES to investigate effects on population health, and CROPDMG and PROPDMG to investigate economic effects. The features CROPDMGEXP and PROPDMGEXP encode the powers of 10 by which CROPDMG and PROPDMG must be multiplied to obtain the total economic loss.
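As a small worked illustration of this encoding, consider three made-up records (the values are hypothetical, not taken from the database). A damage figure of 25 with exponent code "K" stands for 25 * 10^3 dollars, 1.5 with "M" for 1.5 * 10^6 dollars, and 2 with "B" for 2 * 10^9 dollars:

```r
# Hypothetical damage figures and exponent codes (not real records)
dmg  <- c(25, 1.5, 2)
expo <- c("K", "M", "B")

# Power of 10 that each code stands for: K = thousand, M = million, B = billion
power <- c(K = 3, M = 6, B = 9)[expo]

# Damage in dollars
dollars <- dmg * 10^power
print(unname(dollars))
```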
In order to study the events that are most harmful to human population, we consider the sum of FATALITIES and INJURIES as follows:
suppressMessages(suppressWarnings(library(dplyr)))
df <- group_by(storm.dat, EVTYPE)
df <- summarise(df, harm.pop = sum(FATALITIES + INJURIES))
df <- arrange(df, desc(harm.pop))
The five events most harmful to population health are:
print.data.frame(df[1:5,])
## EVTYPE harm.pop
## 1 TORNADO 96979
## 2 EXCESSIVE HEAT 8428
## 3 TSTM WIND 7461
## 4 FLOOD 7259
## 5 LIGHTNING 6046
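The combined total hides how the harm splits between fatalities and injuries. The sketch below shows one way to keep both components alongside the sum. It uses base R's `aggregate` rather than dplyr so that it runs on its own, and the `toy` data frame holds hypothetical values with the same column names as `storm.dat`:

```r
# Toy data with hypothetical values; column names match storm.dat
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
  FATALITIES = c(2, 1, 0, 0),
  INJURIES   = c(10, 3, 5, 1)
)

# Sum fatalities and injuries separately within each event type
agg <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Combined harm, then sort from most to least harmful
agg$harm.pop <- agg$FATALITIES + agg$INJURIES
agg <- agg[order(-agg$harm.pop), ]
print(agg)
```

Applied to the full data, the same pattern would show whether an event ranks highly because of deaths, injuries, or both.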
Below is a plot of these five most harmful events in terms of the fatalities and injuries they result in:
ev.type <- seq(1,5,1)
plot(ev.type, df[1:5, ]$harm.pop/1000, xlab = "", type = "b",
ylab = "Number of fatalities+injuries / 1000",
axes = FALSE, cex.lab = 0.8)
labs <- c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING")
axis(side=1, at = seq(1,5,1), labels = labs, cex.axis = 0.5)
axis(side=2, at = seq(20, 100, 20))
title("Harmful Events for Population Health")
box()
Now we consider the events with the most significant economic consequences. We first write a function that converts the codes in PROPDMGEXP and CROPDMGEXP into the corresponding power of 10:
fn.exp <- function(x){
  # Recognised exponent codes and the power of 10 each one stands for
  labs <- c(0:9, "k", "K", "m", "M", "b", "B")
  pows <- c(0:9, rep(3,2), rep(6,2), rep(9,2))
  temp <- as.character(x); n <- length(x)
  # If PROPDMGEXP or CROPDMGEXP contains any other code (like "", "-", "?" etc.)
  # then we assume that the power of 10 is simply 0, the initial value here
  return.vec <- rep(0, n)
  # Loop over labels and assign return.vec the corresponding power of 10
  for (i.lab in seq_along(labs)){
    lmask <- temp == labs[i.lab]
    return.vec[lmask] <- pows[i.lab]
  }
  return.vec
}
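The same mapping can also be written without an explicit loop, as a lookup in a named vector. The following is only a sketch of that alternative, not the code used for the results in this report; `fn.exp.lookup` is a hypothetical name, and unrecognised codes are again assumed to mean a power of 0:

```r
# Sketch of a loop-free alternative to fn.exp (hypothetical helper name)
fn.exp.lookup <- function(x) {
  # Named vector: exponent code -> power of 10
  pows <- c(setNames(0:9, 0:9), k = 3, K = 3, m = 6, M = 6, b = 9, B = 9)
  # Indexing by name returns NA for codes that are not in the table
  out <- unname(pows[as.character(x)])
  # Assumption carried over from fn.exp: unknown codes get power 0
  out[is.na(out)] <- 0
  out
}

# A few recognised and unrecognised codes
fn.exp.lookup(c("K", "m", "B", "5", "", "?", "-"))
```

On a column with roughly 900,000 rows either version is fast enough; the lookup form mainly keeps the code-to-power table in one place.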
Then we add new columns to storm.dat holding the actual dollar values of property and crop damage:
storm.dat <- mutate(storm.dat, prop.dam = PROPDMG * 10^(fn.exp(PROPDMGEXP)))
storm.dat <- mutate(storm.dat, crop.dam = CROPDMG * 10^(fn.exp(CROPDMGEXP)))
Finally, we group by event type, sum the property and crop damage within each group, and sort in descending order of total damage:
df.2 <- group_by(storm.dat, EVTYPE)
df.2 <- summarise(df.2, harm.ec = sum(crop.dam + prop.dam))
df.2 <- arrange(df.2, desc(harm.ec))
The five events with the greatest economic consequences are:
print.data.frame(df.2[1:5,])
## EVTYPE harm.ec
## 1 FLOOD 150319678257
## 2 HURRICANE/TYPHOON 71913712800
## 3 TORNADO 57362333946
## 4 STORM SURGE 43323541000
## 5 HAIL 18761221491
Below is a plot of these five most harmful events in terms of the crop and property damages they result in:
ev.type <- seq(1,5,1)
plot(ev.type, df.2[1:5, ]$harm.ec/10^9, xlab = "", type = "b",
ylab = "Total Property + Crop damage (Billion Dollars)",
axes = FALSE, cex.lab = 0.8)
labs <- c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE", "HAIL")
axis(side=1, at = seq(1,5,1), labels = labs, cex.axis = 0.5)
axis(side=2, at = seq(20, 140, 20))
title("Events with Economic Consequences")
box()
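Because event types are categories with no natural ordering, a bar chart may be an easier display to read than connected points. The sketch below is one possible alternative; it hard-codes the five totals from the table printed above, so it runs without the full data set, and the plotting parameters are illustrative choices:

```r
# Totals in dollars, copied from the printed top-five table
harm.ec <- c(FLOOD               = 150319678257,
             `HURRICANE/TYPHOON` = 71913712800,
             TORNADO             = 57362333946,
             `STORM SURGE`       = 43323541000,
             HAIL                = 18761221491)

# Bar heights in billions of dollars; the vector names become the bar labels
mids <- barplot(harm.ec / 10^9, las = 2, cex.names = 0.6,
                ylab = "Total property + crop damage (billion dollars)",
                main = "Events with Economic Consequences")
```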