Storm Events Consequences on Population Health and Economic Damage

Synopsis

In this report, we aim to analyze the consequences of storm events in the United States between the years 1996 and 2011. Our goal is to find out the most impacting events in this period regarding the total number of fatalities and injuries, as well as the economic damage to properties and crops.

To achieve this objective, we obtained storm events data from the National Weather Service, which is collected from reports across the U.S.. These reports date back to 1950, however, up until 1995, only tornado, thunderstorm wind and hail events were registered. This way, our analyses consider only 1996 and beyond.

Results indicate that excessive heat and tornadoes are the most fatal events, while tornadoes are the ones that cause most injuries, followed by floods and excessive heat. In particular, it seems that major tornadoes hit the U.S. in 2011. Finally, floods have caused most property damages, while droughts have caused most crop damages, and, considering both property and crop damages together, floods are the most critical events.

Data Processing

The data file has been downloaded from the following link: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2

For the analysis, this file is expected to be downloaded and placed in the working directory with its original name (i.e., repdata%2Fdata%2FStormData.csv.bz2).

Required libraries

Load the necessary libraries in order to perform the analysis.

suppressPackageStartupMessages(library("dplyr"))
suppressPackageStartupMessages(library("data.table"))
suppressPackageStartupMessages(library("lubridate"))
suppressPackageStartupMessages(library("lattice"))

Loading and preprocessing the data

Let's take a look at the first 5 rows of our dataset and have a general idea of what it looks like.

filebz2 <- "repdata%2Fdata%2FStormData.csv.bz2"
filecsv <- bzfile(filebz2)
read.csv(filecsv, nrows = 5)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00      130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00      145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00      900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0      NA         NA       NA       NA          0
## 2 TORNADO         0      NA         NA       NA       NA          0
## 3 TORNADO         0      NA         NA       NA       NA          0
## 4 TORNADO         0      NA         NA       NA       NA          0
## 5 TORNADO         0      NA         NA       NA       NA          0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0      NA         NA   14.0   100 3   0          0
## 2         NA         0      NA         NA    2.0   150 2   0          0
## 3         NA         0      NA         NA    0.1   123 2   0          0
## 4         NA         0      NA         NA    0.0   100 2   0          0
## 5         NA         0      NA         NA    0.0   150 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0         NA  NA         NA        NA
## 2        0     2.5          K       0         NA  NA         NA        NA
## 3        2    25.0          K       0         NA  NA         NA        NA
## 4        2     2.5          K       0         NA  NA         NA        NA
## 5        2     2.5          K       0         NA  NA         NA        NA
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806      NA      1
## 2     3042      8755          0          0      NA      2
## 3     3340      8742          0          0      NA      3
## 4     3458      8626          0          0      NA      4
## 5     3412      8642          0          0      NA      5

Now we know we are only interested in the following variables: BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP. So we can save some time by reading just them.

# Load and process the data
filecsv <- bzfile(filebz2)
dat <- read.csv(filecsv,
                as.is = TRUE,
                colClasses = c("NULL",
                               "character",     # BGN_DATE
                               rep("NULL", 5),
                               "factor",        # EVTYPE
                               rep("NULL", 14),
                               "numeric",       # FATALITIES
                               "numeric",       # INJURIES
                               "numeric",       # PROPDMG
                               "factor",        # PROPDMGEXP
                               "numeric",       # CROPDMG
                               "factor",        # CROPDMGEXP
                               rep("NULL", 9)),
                check.names = FALSE)

Extract the year from BGN_DATE

dat$BGN_DATE <- year(as.Date(dat$BGN_DATE, format = "%m/%d/%Y %T", tz = "GMT"))

Let's filter some data.

First, from the Storm Events Database website, we know that up until 1995 (included), there were only data for tornado, thunderstorm wind and hail. In order to avoid such bias in our analysis, we can take only data from 1996 (included) onwards.

Furthermore, from the preparation document, we know that magnitude characters from damage estimates include none, “K” for thousands, “M” for millions, and “B” for billions. This way, we can ignore additional characters, which are potentially inaccurate.

dat <- filter(dat, BGN_DATE >= 1996,
                   PROPDMGEXP %in% c("", "B", "M", "K"),                   
                   CROPDMGEXP %in% c("", "B", "M", "K"))

Let's check some summaries from our current data

# Convert to data.table
dat <- dat %>% data.table %>% setkey(EVTYPE, BGN_DATE)

# Print data summaries
str(dat)
## Classes 'data.table' and 'data.frame':   653529 obs. of  8 variables:
##  $ BGN_DATE  : num  2001 2003 2002 1998 1998 ...
##  $ EVTYPE    : Factor w/ 985 levels "?","ABNORMALLY DRY",..: 2 2 3 4 4 4 4 5 5 5 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ INJURIES  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ PROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  - attr(*, ".internal.selfref")=<externalptr> 
##  - attr(*, "sorted")= chr  "EVTYPE" "BGN_DATE"
summary(dat)
##     BGN_DATE                  EVTYPE         FATALITIES       
##  Min.   :1996   HAIL             :207715   Min.   :  0.00000  
##  1st Qu.:2000   TSTM WIND        :128662   1st Qu.:  0.00000  
##  Median :2005   THUNDERSTORM WIND: 81402   Median :  0.00000  
##  Mean   :2004   FLASH FLOOD      : 50998   Mean   :  0.01336  
##  3rd Qu.:2008   FLOOD            : 24247   3rd Qu.:  0.00000  
##  Max.   :2011   TORNADO          : 23154   Max.   :158.00000  
##                 (Other)          :137351                      
##     INJURIES           PROPDMG          PROPDMGEXP        CROPDMG       
##  Min.   :0.00e+00   Min.   :   0.00   K      :369938   Min.   :  0.000  
##  1st Qu.:0.00e+00   1st Qu.:   0.00          :276185   1st Qu.:  0.000  
##  Median :0.00e+00   Median :   0.00   M      :  7374   Median :  0.000  
##  Mean   :8.87e-02   Mean   :  11.69   B      :    32   Mean   :  1.839  
##  3rd Qu.:0.00e+00   3rd Qu.:   1.00   -      :     0   3rd Qu.:  0.000  
##  Max.   :1.15e+03   Max.   :5000.00   ?      :     0   Max.   :990.000  
##                                       (Other):     0                    
##    CROPDMGEXP    
##         :373069  
##  K      :278685  
##  M      :  1771  
##  B      :     4  
##  ?      :     0  
##  0      :     0  
##  (Other):     0

Results

In this section, we show our results according to each question.

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

There is a number of ways to answer this question. We will analyze fatalities and injuries separately.

Let's check the 5 most harmful events regarding fatalities.

tot.fat <- dat[, .(TOTAL_FATALITIES = sum(FATALITIES)), by = EVTYPE] %>%
           setorder(-TOTAL_FATALITIES) %>%
           head(5)
tot.fat
##            EVTYPE TOTAL_FATALITIES
## 1: EXCESSIVE HEAT             1797
## 2:        TORNADO             1511
## 3:    FLASH FLOOD              887
## 4:      LIGHTNING              651
## 5:          FLOOD              414

Now we plot their effects in each year.

# Get data to plot
plt <- dat[EVTYPE %in% tot.fat$EVTYPE,
           .(TOTAL_FATALITIES = sum(FATALITIES)),
           by = .(BGN_DATE, EVTYPE)]

plt$EVTYPE <- droplevels(plt$EVTYPE)

# Plot
xyplot(TOTAL_FATALITIES ~ BGN_DATE,
       plt,
       groups = EVTYPE,
       type = "b",
       auto.key = list(corner = c(0.6, 0.9)),
       lwd = 2,
       main = "Total fatalities by year",
       xlab = "Year",
       ylab = "Number of fatalities")

plot of chunk harmful2

Let's check the 5 most harmful events regarding injuries.

tot.inj <- dat[, .(TOTAL_INJURIES = sum(INJURIES)), by = EVTYPE] %>%
           setorder(-TOTAL_INJURIES) %>%
           head(5)
tot.inj
##            EVTYPE TOTAL_INJURIES
## 1:        TORNADO          20667
## 2:          FLOOD           6758
## 3: EXCESSIVE HEAT           6391
## 4:      LIGHTNING           4141
## 5:      TSTM WIND           3629

Now we plot their effects in each year.

# Get data to plot
plt <- dat[EVTYPE %in% tot.inj$EVTYPE,
           .(TOTAL_INJURIES = sum(INJURIES)),
           by = .(BGN_DATE, EVTYPE)]

plt$EVTYPE <- droplevels(plt$EVTYPE)

# Plot
xyplot(TOTAL_INJURIES ~ BGN_DATE,
       plt,
       groups = EVTYPE,
       type = "b",
       auto.key = list(corner = c(0.6, 0.9)),
       lwd = 2,
       main = "Total injuries by year",
       xlab = "Year",
       ylab = "Number of injuries")

plot of chunk harmful4

Across the United States, which types of events have the greatest economic consequences?

There is a number of ways to answer this question. We will analyze property and crop damages separately.

First, we need to convert magnitudes to values.

convertMagToVal <- function(mag) {
    if (mag == "") {
        val <- 1
    } else if (mag == "K") {
        val <- 1000
    } else if (mag == "M") {
        val <- 1000000
    } else if (mag == "B") {
        val <- 1000000000
    }

    return (val)
}

dat$PROPDMG <- dat$PROPDMG * sapply(dat$PROPDMGEXP, convertMagToVal)
dat$CROPDMG <- dat$CROPDMG * sapply(dat$CROPDMGEXP, convertMagToVal)

Let's check the 5 most damaging events regarding property.

tot.prop <- dat[, .(TOTAL_PROPDMG = sum(PROPDMG)), by = EVTYPE] %>%
            setorder(-TOTAL_PROPDMG) %>%
            head(5)
tot.prop
##               EVTYPE TOTAL_PROPDMG
## 1:             FLOOD  143944833550
## 2: HURRICANE/TYPHOON   69305840000
## 3:       STORM SURGE   43193536000
## 4:           TORNADO   24616945710
## 5:       FLASH FLOOD   15222203910

Let's check the 5 most damaging events regarding crop.

tot.crop <- dat[, .(TOTAL_CROPDMG = sum(CROPDMG)), by = EVTYPE] %>%
            setorder(-TOTAL_CROPDMG) %>%
            head(5)
tot.crop
##               EVTYPE TOTAL_CROPDMG
## 1:           DROUGHT   13367566000
## 2:             FLOOD    4974778400
## 3:         HURRICANE    2741410000
## 4: HURRICANE/TYPHOON    2607872800
## 5:              HAIL    2476029450

Now we plot the 5 most damaging events regarding both property and crop together.

tot.dmg <- dat[, .(TOTAL_DMG = sum(PROPDMG, CROPDMG)), by = EVTYPE] %>%
           setorder(-TOTAL_DMG) %>%
           head(5)
tot.dmg
##               EVTYPE    TOTAL_DMG
## 1:             FLOOD 148919611950
## 2: HURRICANE/TYPHOON  71913712800
## 3:       STORM SURGE  43193541000
## 4:           TORNADO  24900370720
## 5:              HAIL  17071172870
barchart(EVTYPE ~ TOTAL_DMG/1000000000,
         tot.dmg,
         main = "Most damaging events",
         xlab = "Damage (in billions of dollars)",
         ylab = "Event",
         col = "darkred")

plot of chunk damage4