Title: NOAA Storm Data Analysis - Health Effects and Economic Consequences of Severe Weather Events

Synopsis

The basic goal of this assignment is to explore the NOAA Storm Database and address the following questions:

Q1) Across the United States, which types of events are most harmful with respect to population health?

Q2) Across the United States, which types of events have the greatest economic consequences?

After careful analysis, tornadoes were found to be the most hazardous event with respect to public health, and the greatest economic impact was caused by flash floods and droughts.

Data Processing

Setting up Global Options and Loading the Required Libraries:

knitr::opts_chunk$set(echo = TRUE)
library(plyr)
library(dplyr)
library(ggplot2)
library(grid)
library(gridExtra)

Recording the environment in which the analysis was run:

sessionInfo()
## R version 3.3.1 (2016-06-21)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 14393)
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] gridExtra_2.2.1 ggplot2_2.2.1   dplyr_0.5.0     plyr_1.8.4     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.7      digest_0.6.9     rprojroot_1.2    assertthat_0.1  
##  [5] R6_2.1.2         gtable_0.2.0     DBI_0.5-1        backports_1.0.5 
##  [9] magrittr_1.5     scales_0.4.1     evaluate_0.10    stringi_1.1.5   
## [13] lazyeval_0.2.0   rmarkdown_1.4    tools_3.3.1      stringr_1.2.0   
## [17] munsell_0.4.3    yaml_2.1.14      colorspace_1.3-2 htmltools_0.3.5 
## [21] knitr_1.15.1     tibble_1.2

Loading and Preprocessing the Data:

stormdata <- read.csv("NOAAStormData.csv")

head(stormdata)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6

Before analysis, the event types need some cleaning because EVTYPE has no consistent format. For instance, TORNAO, TORNADO and tornado all refer to the same type of event. The steps below normalise letter case and punctuation; misspellings such as TORNAO are left as they are.

# number of unique items in the column EVTYPE
length(unique(stormdata$EVTYPE))
## [1] 985
# Convert all letters to lowercase so that case variants are treated alike
event_types <- tolower(stormdata$EVTYPE)
# Replace each blank or punctuation character with a single space
# ([:punct:] already includes "+", so it need not be listed separately)
event_types <- gsub("[[:blank:][:punct:]]", " ", event_types)
length(unique(event_types))
## [1] 874
# write the cleaned labels back to the data frame
stormdata$EVTYPE <- event_types
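
To make the effect of this cleaning concrete, here is a minimal sketch on a handful of made-up labels (the vector `raw` is illustrative and not taken from the data set). It applies the same two steps, lowercasing and replacing blanks and punctuation with a space, and shows that case and punctuation variants collapse to one label while a misspelling does not.

```r
# Illustrative labels only; these are not rows from the NOAA data
raw <- c("Frost/Freeze", "FROST FREEZE", "frost.freeze",
         "TORNADO", "tornado", "TORNAO")

# Same two steps as in the cleaning code: lowercase, then
# replace each blank or punctuation character with a space
clean <- gsub("[[:blank:][:punct:]]", " ", tolower(raw))

# Six raw labels reduce to three distinct ones
unique(clean)
```

The misspelled label survives as its own category, so a further mapping step would be needed if such variants are to be merged.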

Data Analysis:

To find the event types that are most harmful to population health, the numbers of fatalities and injuries are summed by event type.

casualties <- ddply(stormdata, .(EVTYPE), summarize,
                    fatalities = sum(FATALITIES),
                    injuries = sum(INJURIES))

# Find the 10 event types that caused the most deaths and the most injuries
fatal_events <- head(casualties[order(casualties$fatalities, decreasing = TRUE), ], 10)
injury_events <- head(casualties[order(casualties$injuries, decreasing = TRUE), ], 10)
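
The same sum-then-rank pattern can be sketched with base R alone. The example below is only an illustration on a toy data frame (the name `toy` and its values are made up), using `aggregate()` in place of `ddply()`.

```r
# Toy data; the values are illustrative, not taken from the NOAA data
toy <- data.frame(
    EVTYPE     = c("tornado", "heat", "tornado", "flood", "heat"),
    FATALITIES = c(3, 10, 2, 1, 4),
    INJURIES   = c(20, 0, 5, 2, 1),
    stringsAsFactors = FALSE
)

# Sum both casualty columns within each event type
toy_casualties <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE,
                            data = toy, FUN = sum)

# Rank by fatalities, largest first, and keep the top 2
top_fatal <- head(
    toy_casualties[order(toy_casualties$FATALITIES, decreasing = TRUE), ], 2)
top_fatal
```

Here heat (14 deaths) ranks ahead of tornado (5 deaths), even though tornado has more injuries, which is why fatalities and injuries are ranked separately.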

Property damage is stored in two fields: PROPDMG, a dollar amount, and PROPDMGEXP, a code for the power of ten by which that amount is multiplied (for example, K for thousands). Crop damage is stored the same way in CROPDMG and CROPDMGEXP. To find the event types that are most harmful to the economy, the property and crop damage for each event are computed as amount * 10^exponent.

exp_transform <- function(e) {
    # Convert to character first: if the column was read as a factor,
    # as.numeric() would return the level code rather than the digit.
    e <- as.character(e)
    # h -> hundred, k -> thousand, m -> million, b -> billion
    if (e %in% c('h', 'H'))
        return(2)
    else if (e %in% c('k', 'K'))
        return(3)
    else if (e %in% c('m', 'M'))
        return(6)
    else if (e %in% c('b', 'B'))
        return(9)
    else if (e %in% c('', '-', '?', '+')) # blank or symbol: no scaling
        return(0)
    else if (grepl("^[0-9]+$", e)) # a digit is used directly as the exponent
        return(as.numeric(e))
    else {
        stop("Invalid exponent value.")
    }
}
# Decode the exponent codes, then scale the damage amounts to dollars
prop_dmg_exp <- sapply(stormdata$PROPDMGEXP, FUN=exp_transform)
stormdata$prop_dmg <- stormdata$PROPDMG * (10 ^ prop_dmg_exp)
crop_dmg_exp <- sapply(stormdata$CROPDMGEXP, FUN=exp_transform)
stormdata$crop_dmg <- stormdata$CROPDMG * (10 ^ crop_dmg_exp)
# Compute the economic loss by event type
econ_loss <- ddply(stormdata, .(EVTYPE), summarize,
                   prop_dmg = sum(prop_dmg),
                   crop_dmg = sum(crop_dmg))

# filter out events that caused no economic loss
econ_loss <- econ_loss[(econ_loss$prop_dmg > 0 | econ_loss$crop_dmg > 0), ]
prop_dmg_events <- head(econ_loss[order(econ_loss$prop_dmg, decreasing = TRUE), ], 10)
crop_dmg_events <- head(econ_loss[order(econ_loss$crop_dmg, decreasing = TRUE), ], 10)
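
As a small worked example of the exponent decoding, the first row printed by head(stormdata) has PROPDMG = 25.0 and PROPDMGEXP = K, which corresponds to 25,000 dollars. The sketch below reproduces that arithmetic and also illustrates why it matters whether the exponent column is a character vector or a factor. The vector `exp_codes` is made up for illustration.

```r
# First row shown by head(stormdata): PROPDMG = 25.0, PROPDMGEXP = "K"
damage <- 25.0 * 10^3   # "K" maps to exponent 3
damage

# Illustrative exponent codes stored as a factor, as read.csv() produced
# by default before R 4.0; levels sort as "", "5", "K"
exp_codes <- factor(c("", "5", "K"))

# On a factor, as.numeric() returns the position of the level,
# not the digit written in the data
code_as_factor <- as.numeric(exp_codes[2])

# Converting to character first recovers the intended exponent
code_as_text <- as.numeric(as.character(exp_codes[2]))

c(code_as_factor, code_as_text)
```

If the exponent columns are factors, digit and symbol codes can therefore be decoded as level positions, which inflates or deflates the resulting dollar amounts by powers of ten.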

Results

Question 1: Across the United States, which types of events are most harmful with respect to population health?

The 10 event types that caused the largest number of deaths are:

fatal_events[, c("EVTYPE", "fatalities")]
##             EVTYPE fatalities
## 741        tornado       5633
## 116 excessive heat       1903
## 138    flash flood        978
## 240           heat        937
## 410      lightning        816
## 762      tstm wind        504
## 154          flood        470
## 515    rip current        368
## 314      high wind        248
## 19       avalanche        224

# Order the bars by casualty count
p1 <- ggplot(data=fatal_events,
             aes(x=reorder(EVTYPE, fatalities), y=fatalities, fill=fatalities)) +
    geom_bar(stat="identity") +
    coord_flip() +
    ylab("Total number of fatalities") +
    xlab("Event type") +
    theme(legend.position="none")

p2 <- ggplot(data=injury_events,
             aes(x=reorder(EVTYPE, injuries), y=injuries, fill=injuries)) +
    geom_bar(stat="identity") +
    coord_flip() + 
    ylab("Total number of injuries") +
    xlab("Event type") +
    theme(legend.position="none")
grid.arrange(p1, p2, ncol = 2, top = "Most Dangerous Weather Events for Public Health")

Question 2: Across the United States, which types of events have the greatest economic consequences?

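The 10 event types that caused the most property damage (in dollars) are listed below. The totals for flash flood and thunderstorm winds, in the tens of trillions of dollars, are implausibly large for storm damage and are sensitive to how the PROPDMGEXP codes are decoded, so they should be read with caution: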

prop_dmg_events[, c("EVTYPE", "prop_dmg")]
##                 EVTYPE     prop_dmg
## 138        flash flood 6.820237e+13
## 697 thunderstorm winds 2.086532e+13
## 741            tornado 1.078951e+12
## 209               hail 3.157558e+11
## 410          lightning 1.729433e+11
## 154              flood 1.446577e+11
## 366  hurricane typhoon 6.930584e+10
## 166           flooding 5.920826e+10
## 585        storm surge 4.332354e+10
## 270         heavy snow 1.793259e+10

Similarly, the 10 event types that caused the most crop damage (in dollars) are:

crop_dmg_events[, c("EVTYPE", "crop_dmg")]
##                EVTYPE    crop_dmg
## 84            drought 13972566000
## 154             flood  5661968450
## 519       river flood  5029459000
## 382         ice storm  5022113500
## 209              hail  3025974480
## 357         hurricane  2741910000
## 366 hurricane typhoon  2607872800
## 138       flash flood  1421317100
## 125      extreme cold  1312973000
## 185      frost freeze  1094186000

# Order the bars by damage amount
p1 <- ggplot(data=prop_dmg_events,
             aes(x=reorder(EVTYPE, prop_dmg), y=log10(prop_dmg), fill=prop_dmg )) +
    geom_bar(stat="identity") +
    coord_flip() +
    xlab("Event type") +
    ylab("Property damage in dollars (log-scale)") +
    theme(legend.position="none")

p2 <- ggplot(data=crop_dmg_events,
             aes(x=reorder(EVTYPE, crop_dmg), y=crop_dmg, fill=crop_dmg)) +
    geom_bar(stat="identity") +
    coord_flip() + 
    xlab("Event type") +
    ylab("Crop damage in dollars") + 
    theme(legend.position="none")
grid.arrange(p1, p2, ncol = 2, top = "Most Hazardous Weather Events for US Economy")