1. INTRODUCTION

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

2. DOWNLOAD

Default knitr chunk settings (all code is echoed in the report)

knitr::opts_chunk$set(echo = TRUE)

The following code is used to download the data:

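# setwd() with an absolute, user-specific path is not reproducible on other machines,
# so it is left commented out; run the report from the project directory instead.
# setwd("C:/Users/junio/Desktop/COURSERA/DATA SCIENCE/COURSE 5 - Reproducible Research/WEEK 4/Project")

url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

# Download and decompress only if the CSV is not already present
if (!file.exists("StormData.csv")) {
        download.file(url, "StormData.csv.bz2", mode = "wb")  # binary mode for the .bz2 archive
        library(R.utils)
        bunzip2("StormData.csv.bz2", "StormData.csv")
}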
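Decompressing is optional: base R's `read.csv()` opens a file name through `file()`, which recognises bzip2-compressed files when reading, so the `.bz2` archive could also be read directly (more slowly than `fread` on the full file). The self-contained sketch below checks this on a tiny compressed CSV written to a temporary file; the toy rows are made up and are not NOAA data.

```r
# Toy rows standing in for the NOAA file (made-up values, not real data)
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(2, 0))

# Write the toy table to a temporary bzip2-compressed CSV
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "wt")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path with file(), which detects bzip2 compression
storm_toy <- read.csv(path)
print(storm_toy)
```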

3. DATA PROCESSING

Health Impact

To evaluate the health impact, the total fatalities and the total injuries are calculated for each event type (EVTYPE), as shown in the code below. The data.table package is used because it reads and aggregates the roughly 900,000 records quickly and efficiently.

library(data.table)
# Read the csv file with "fread" to convert it to "data.table"
dt <- fread("StormData.csv", header = T)
## Warning in fread("StormData.csv", header = T): Read less rows (902297) than
## were allocated (967216). Run again with verbose=TRUE and please report.
# Create "TOT.FATALITIES" and "TOT.INJURIES": fatalities and injuries summed within each event type
dt1 <- dt[, .(TOT.FATALITIES = sum(FATALITIES), TOT.INJURIES = sum(INJURIES)), by = EVTYPE]

# Sort each table in decreasing order and turn EVTYPE into a factor whose levels follow
# that order, so the bar charts show the event types from largest to smallest.
dt11 <- dt1[order(dt1$TOT.FATALITIES, decreasing = TRUE),]
dt12 <- dt1[order(dt1$TOT.INJURIES, decreasing = TRUE),]
dt11$EVTYPE <- factor(dt11$EVTYPE, levels = dt11$EVTYPE)
dt12$EVTYPE <- factor(dt12$EVTYPE, levels = dt12$EVTYPE)
dt11
##                     EVTYPE TOT.FATALITIES TOT.INJURIES
##   1:               TORNADO           5633        91346
##   2:        EXCESSIVE HEAT           1903         6525
##   3:           FLASH FLOOD            978         1777
##   4:                  HEAT            937         2100
##   5:             LIGHTNING            816         5230
##  ---                                                  
## 981:           SLEET STORM              0            0
## 982:           DENSE SMOKE              0            0
## 983:       LAKESHORE FLOOD              0            0
## 984: ASTRONOMICAL LOW TIDE              0            0
## 985:      VOLCANIC ASHFALL              0            0
dt12
##                     EVTYPE TOT.FATALITIES TOT.INJURIES
##   1:               TORNADO           5633        91346
##   2:             TSTM WIND            504         6957
##   3:                 FLOOD            470         6789
##   4:        EXCESSIVE HEAT           1903         6525
##   5:             LIGHTNING            816         5230
##  ---                                                  
## 981:           SLEET STORM              0            0
## 982:           DENSE SMOKE              0            0
## 983:       LAKESHORE FLOOD              0            0
## 984: ASTRONOMICAL LOW TIDE              0            0
## 985:      VOLCANIC ASHFALL              0            0
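The grouping and ordering steps above can be checked on a few made-up rows. This sketch uses base R's `aggregate()` instead of data.table, so it is an equivalent illustration rather than the code that produced the results.

```r
# Made-up event records (not NOAA data)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(3, 0, 2, 1, 0),
  INJURIES   = c(10, 4, 5, 0, 1)
)

# Sum fatalities and injuries within each event type
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
names(totals)[2:3] <- c("TOT.FATALITIES", "TOT.INJURIES")

# Sort by fatalities (descending) and freeze that order in the factor levels,
# so a bar chart draws the bars from largest to smallest
by_fatal <- totals[order(totals$TOT.FATALITIES, decreasing = TRUE), ]
by_fatal$EVTYPE <- factor(by_fatal$EVTYPE, levels = by_fatal$EVTYPE)
print(by_fatal)
```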

Economic Impact

The data record two types of economic impact: property damage (PROPDMG) and crop damage (CROPDMG). These columns hold only a base amount; the PROPDMGEXP and CROPDMGEXP columns hold a code for its magnitude, so the damage in USD is the base amount multiplied by the factor that the code represents. The codes can be interpreted as follows:

  • H,h = hundreds = 100
  • K,k = kilos = thousands = 1,000
  • M,m = millions = 1,000,000
  • B,b = billions = 1,000,000,000
  • (+) = 1
  • (-) = 0
  • (?) = 0
  • blank/empty character = 0
  • numeric 0..8 = 10
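The table above can also be expressed as a lookup vector, which converts a whole column in one vectorized step instead of calling a function once per row with `sapply()`. A minimal sketch: the name `multiplier_for` is hypothetical, and any code not listed in the table (including `NA`) is assumed to map to 0, like a blank.

```r
# Hypothetical helper: map PROPDMGEXP/CROPDMGEXP codes to multipliers
multiplier_for <- function(code) {
  lookup <- c(H = 100, K = 1e3, M = 1e6, B = 1e9, "+" = 1, "-" = 0, "?" = 0)
  lookup[as.character(0:8)] <- 10          # numeric codes 0..8 -> 10
  code <- toupper(as.character(code))      # "k" and "K" share one entry
  mult <- unname(lookup[code])             # unmatched codes give NA here
  mult[is.na(mult)] <- 0                   # blank, NA or unknown -> 0 (assumption)
  mult
}

multiplier_for(c("K", "m", "", "5", "+", "?"))
```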

The total damage caused by each event type is calculated with the following code.

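# Scale function: returns the multiplier implied by one PROPDMGEXP/CROPDMGEXP code
# (see the table above). The result is stored in "mult" rather than "exp" so the
# function never falls through to base::exp(); unlisted codes and NA map to 0.
Escala <- function(x){
        
        if(is.na(x)){
                mult <- 0
        }else if(x %in% 0:8){
                mult <- 10
        }else if(x %in% c("H","h")){
                mult <- 100
        }else if(x %in% c("K","k")){
                mult <- 1000
        }else if(x %in% c("M","m")){
                mult <- 1000000
        }else if(x %in% c("B","b")){
                mult <- 1000000000
        }else if(x %in% c("-","?","")){
                mult <- 0
        }else if(x == "+"){
                mult <- 1
        }else{
                mult <- 0       # any other code is treated like a blank
        }
        mult
        
}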

# Create "EXP.PROPDMGEXP" and "EXP.CROPDMGEXP": the multipliers implied by PROPDMGEXP and CROPDMGEXP
dt2 <- dt[, .(EXP.PROPDMGEXP = sapply(PROPDMGEXP, Escala), EXP.CROPDMGEXP = sapply(CROPDMGEXP, Escala), PROPDMG, CROPDMG, EVTYPE)]

# Create "TOT.PROPDMG" and "TOT.CROPDMG": property and crop damage in USD for each record
dt2 <- dt2[, .(TOT.PROPDMG = EXP.PROPDMGEXP*PROPDMG, TOT.CROPDMG = EXP.CROPDMGEXP*CROPDMG, EXP.PROPDMGEXP, EXP.CROPDMGEXP, PROPDMG, CROPDMG, EVTYPE)]

# Create "TOT.PROPDMGxEVTYPE" and "TOT.CROPDMGxEVTYPE": property and crop damage summed by event type
dt2 <- dt2[, .(TOT.PROPDMGxEVTYPE = sum(TOT.PROPDMG), TOT.CROPDMGxEVTYPE = sum(TOT.CROPDMG)), by = EVTYPE]

# Create "TOT.DMG": total economic damage (property + crop) by event type
dt2 <- dt2[, .(TOT.DMG = TOT.PROPDMGxEVTYPE + TOT.CROPDMGxEVTYPE, EVTYPE)]

# Sort by total damage in decreasing order and make EVTYPE a factor in that order.
dt21 <- dt2[order(dt2$TOT.DMG, decreasing = TRUE),]
dt21$EVTYPE <- factor(dt21$EVTYPE, levels = dt21$EVTYPE)
dt21
##           TOT.DMG                  EVTYPE
##   1: 150319678250                   FLOOD
##   2:  71913712800       HURRICANE/TYPHOON
##   3:  57352117607                 TORNADO
##   4:  43323541000             STORM SURGE
##   5:  18758224527                    HAIL
##  ---                                     
## 981:            0                DROWNING
## 982:            0 GUSTY THUNDERSTORM WIND
## 983:            0    HIGH SURF ADVISORIES
## 984:            0             SLEET STORM
## 985:            0        VOLCANIC ASHFALL
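As a worked check of the arithmetic: a property damage of 25 with code "K" is 25 × 1,000 = 25,000 USD, and a crop damage of 1.5 with code "M" is 1.5 × 1,000,000 = 1,500,000 USD. The sketch below repeats the multiply-then-sum steps on made-up rows with base R's `rowsum()`; the helper names `mult` and `to_mult` are hypothetical and only cover the codes used in the toy data.

```r
# Made-up damage records (not NOAA data)
toy <- data.frame(
  EVTYPE     = c("FLOOD", "HAIL", "FLOOD"),
  PROPDMG    = c(25, 2, 1.5),
  PROPDMGEXP = c("K", "M", "B"),
  CROPDMG    = c(1.5, 0, 0),
  CROPDMGEXP = c("M", "", "")
)

# Multipliers for the codes that appear above (blank -> 0, as in the table)
mult <- c(K = 1e3, M = 1e6, B = 1e9)
to_mult <- function(code) ifelse(code %in% names(mult), mult[code], 0)

# Damage in USD per record, then totals per event type
toy$TOT.DMG <- toy$PROPDMG * to_mult(toy$PROPDMGEXP) +
  toy$CROPDMG * to_mult(toy$CROPDMGEXP)
totals <- rowsum(toy$TOT.DMG, toy$EVTYPE)
totals[order(totals[, 1], decreasing = TRUE), , drop = FALSE]
```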

4. RESULTS

Health Impact

The 10 event types with the highest total fatalities and the 10 with the highest total injuries are plotted below.

library(ggplot2)
theme_set(theme_bw())

# Top 10 event types by total fatalities
g <- ggplot(dt11[1:10,], aes(EVTYPE, TOT.FATALITIES)) +
        geom_bar(stat = "identity", width = 0.5, fill = "red") +
        labs(x = "Event Type", y = "Fatalities") +
        theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1, size = 8),
              axis.text.y = element_text(size = 8)) +
        scale_y_continuous(breaks = seq(0, 8000, 1000))
g

# Top 10 event types by total injuries
f <- ggplot(dt12[1:10,], aes(EVTYPE, TOT.INJURIES)) +
        geom_bar(stat = "identity", width = 0.5, fill = "yellow") +
        labs(x = "Event Type", y = "Injuries") +
        theme(axis.text.x = element_text(angle = 90, vjust = 0.5, size = 8),
              axis.text.y = element_text(size = 8)) +
        scale_y_continuous(breaks = seq(0, 100000, 20000))
f

Economic Impact

The top 10 events with the highest total economic damages (property and crop combined) are shown graphically.

# Top 10 event types by total economic damage (property + crop, in USD)
h <- ggplot(dt21[1:10,], aes(EVTYPE, TOT.DMG)) +
        geom_bar(stat = "identity", width = 0.5, fill = "green") +
        labs(x = "Event Type", y = "Economic Impact (USD)") +
        theme(axis.text.x = element_text(angle = 90, vjust = 0.5, size = 8),
              axis.text.y = element_text(size = 8))
h
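Damage totals in the tens of billions produce long axis labels. One option, shown here as a sketch on made-up totals, is to rescale to billions of USD and let `reorder()` set the bar order instead of building the factor levels by hand; the names `n_top`, `top` and `DMG.BILLIONS` are illustrative only.

```r
# Made-up totals in USD (not results from the analysis)
dmg <- data.frame(
  EVTYPE  = c("HAIL", "FLOOD", "TORNADO", "DROUGHT"),
  TOT.DMG = c(2.0e9, 1.5e11, 5.7e10, 0)
)

n_top <- 3                                          # keep only the largest bars
top <- head(dmg[order(dmg$TOT.DMG, decreasing = TRUE), ], n_top)
top$DMG.BILLIONS <- top$TOT.DMG / 1e9               # rescale to billions of USD

# reorder() sorts factor levels by the negated value, i.e. largest first
top$EVTYPE <- reorder(top$EVTYPE, -top$DMG.BILLIONS)
levels(top$EVTYPE)
```

The resulting `top` data frame can be passed to `ggplot()` with `aes(EVTYPE, DMG.BILLIONS)` in the same way as `dt21` above, with the y axis labelled in billions of USD.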