Synopsis

In this project, we analyze the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. We investigate which severe weather events are most harmful to population health and which lead to the greatest economic loss. We find that tornadoes, excessive heat, thunderstorm (TSTM) wind, floods and lightning cause the largest combined number of fatalities and injuries. In terms of economic consequences, floods, hurricanes/typhoons, tornadoes, storm surges and hail cause the most combined property and crop damage.

Data Processing

First we load the data:

storm.dat <- read.csv("repdata-data-StormData.csv.bz2")

The dimensions of the data and the names of the features are as follows:

dim(storm.dat)
## [1] 902297     37
names(storm.dat)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

We choose the features FATALITIES and INJURIES to investigate the effects on the human population, and CROPDMG and PROPDMG to investigate the economic effects. The features CROPDMGEXP and PROPDMGEXP contain the powers of 10 by which CROPDMG and PROPDMG must be multiplied to obtain the total economic loss.
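As a small worked example of how a damage value and its exponent combine, consider a made-up record (the numbers below are hypothetical and not taken from the database) with PROPDMG = 25 and PROPDMGEXP = "K". Assuming "K" stands for thousands, i.e. a power of 3, the damage in dollars would be computed roughly as:

```r
# Hypothetical record, for illustration only (not from the storm database)
prop.dmg <- 25   # value of PROPDMG
prop.pow <- 3    # assumed power of 10 for PROPDMGEXP == "K" (thousands)

# Damage in dollars = value * 10^power
prop.total <- prop.dmg * 10^prop.pow
print(prop.total)
```

The same rule is applied to CROPDMG and CROPDMGEXP.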

Results

In order to study the events that are most harmful to human population, we consider the sum of FATALITIES and INJURIES as follows:

suppressMessages(suppressWarnings(library(dplyr)))
df <- group_by(storm.dat, EVTYPE) 
df <- summarise(df, harm.pop = sum(FATALITIES + INJURIES))
df <- arrange(df, desc(harm.pop))

The five events most harmful to population health are:

print.data.frame(df[1:5,])
##           EVTYPE harm.pop
## 1        TORNADO    96979
## 2 EXCESSIVE HEAT     8428
## 3      TSTM WIND     7461
## 4          FLOOD     7259
## 5      LIGHTNING     6046
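To make the group, sum and sort steps concrete, here is a minimal sketch on a made-up toy data frame (the rows are invented for illustration). It uses base R functions instead of dplyr so that it runs without extra packages; it is meant as an equivalent illustration, not as the code used for the results above.

```r
# Toy data, invented for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(2, 0, 1, 0, 1),
  INJURIES   = c(10, 3, 5, 0, 0)
)

# Combined harm per record
toy$harm.pop <- toy$FATALITIES + toy$INJURIES

# Sum the combined harm within each event type
toy.sum <- aggregate(harm.pop ~ EVTYPE, data = toy, FUN = sum)

# Sort in decreasing order of harm
toy.sum <- toy.sum[order(-toy.sum$harm.pop), ]
print(toy.sum)
```

In this toy example TORNADO ends up first because its two records add up to the largest total.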

Below is a plot of these five most harmful events in terms of the fatalities and injuries they result in:

ev.type <- seq(1,5,1)
plot(ev.type, df[1:5, ]$harm.pop/1000, xlab = "", type = "b", 
     ylab = "Number of fatalities+injuries / 1000",
     axes = FALSE, cex.lab = 0.8)
labs = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING")
axis(side=1, at = seq(1,5,1), labels = labs, cex.axis = 0.5)
axis(side=2, at = seq(20, 100, 20))
title("Harmful Events for Population Health")
box()

Now we consider the events with the most significant economic consequences. We first write a function for converting PROPDMGEXP and CROPDMGEXP into the proper power of 10:

fn.exp <- function(x){
  labs <- c(0:9, "k", "K", "m", "M", "b", "B")
  pows <- c(0:9, rep(3,2), rep(6,2), rep(9,2))
  temp <- as.character(x)
  n <- length(x)
  return.vec <- rep(0, n) # initialize the return vector with power 0
  
  # Loop over the labels and assign the corresponding power of 10 to return.vec
  for (i.lab in seq_along(labs)) {
    lmask <- temp == labs[i.lab]
    return.vec[lmask] <- pows[i.lab]
  }
  # If PROPDMGEXP or CROPDMGEXP contains any other factor (like "", "-", "?" etc.)
  # then, we assume that the power of 10 is simply 0
  return.vec
}
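The same mapping can also be written as a lookup in a named vector, which avoids the explicit loop. The sketch below is an alternative illustration under the same assumptions as fn.exp (digits map to themselves, "k"/"K" to 3, "m"/"M" to 6, "b"/"B" to 9, anything else to 0); the names exp.lookup and fn.exp.lookup are hypothetical and are not used elsewhere in this report.

```r
# Named vector: label -> power of 10 (same mapping as assumed in fn.exp)
exp.lookup <- c(setNames(0:9, 0:9),
                k = 3, K = 3, m = 6, M = 6, b = 9, B = 9)

fn.exp.lookup <- function(x) {
  # Look up each label by name; unknown labels (and "") give NA
  pow <- exp.lookup[as.character(x)]
  # Treat any unrecognized label as power 0
  pow[is.na(pow)] <- 0
  unname(pow)
}

# Example on a few labels, including unrecognized ones
res <- fn.exp.lookup(c("K", "m", "B", "5", "", "?"))
print(res)
```

Unrecognized labels such as "" and "?" fall back to a power of 0, matching the convention stated in the comments of fn.exp.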

Then, we add new columns to storm.dat holding the actual dollar values of the property and crop damage:

storm.dat <- mutate(storm.dat, prop.dam = PROPDMG * 10^(fn.exp(PROPDMGEXP)))
storm.dat <- mutate(storm.dat, crop.dam = CROPDMG * 10^(fn.exp(CROPDMGEXP)))

Finally, we group by event type, sum the property and crop damage within each group, and sort in decreasing order:

df.2 <- group_by(storm.dat, EVTYPE) 
df.2 <- summarise(df.2, harm.ec = sum(crop.dam + prop.dam))
df.2 <- arrange(df.2, desc(harm.ec))

The five events with the largest combined property and crop damage (in dollars) are:

print.data.frame(df.2[1:5,])
##              EVTYPE      harm.ec
## 1             FLOOD 150319678257
## 2 HURRICANE/TYPHOON  71913712800
## 3           TORNADO  57362333946
## 4       STORM SURGE  43323541000
## 5              HAIL  18761221491

Below is a plot of these five events in terms of the combined crop and property damage they cause:

ev.type <- seq(1,5,1)
plot(ev.type, df.2[1:5, ]$harm.ec/10^9, xlab = "", type = "b", 
     ylab = "Total Property + Crop damage (Billion Dollars)",
     axes = FALSE, cex.lab = 0.8)
labs = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE", "HAIL")
axis(side=1, at = seq(1,5,1), labels = labs, cex.axis = 0.5)
axis(side=2, at = seq(20, 140, 20))
title("Events with Economic Consequences")
box()