Reproducible Research - Health and economic damages caused by weather events in the US

by Mauricio G. Melara Camargo

Synopsis

This report explores the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. More information can be found in the National Weather Service Storm Data Documentation.

This study shows that, over the period from 1950 to 2011, TORNADO is the event type with the highest accumulated health impact, while FLOOD is the one with the highest accumulated economic damage.

Data Processing

Load the libraries used in the analysis:

library(R.utils)  # bunzip2()
library(plyr)     # mapvalues(), ddply()
library(lattice)  # barchart()

First, the database is downloaded and decompressed if it is not already present in the current folder:

if (!file.exists("./repdata-data-StormData.csv")) {
    fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
    download.file(fileUrl, destfile = "repdata-data-StormData.csv.bz2", method = "curl")
    bunzip2("repdata-data-StormData.csv.bz2")
}
stormData <- read.csv("./repdata-data-StormData.csv")
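
As an alternative, `read.csv()` can read the compressed file directly: the `file()` connection it opens detects bzip2 compression and decompresses on the fly, which would make `bunzip2()` and the R.utils dependency optional. A minimal sketch, assuming nothing beyond base R, that writes and reads back a small compressed CSV in a temporary folder (the file `example.csv.bz2` and its contents are made up for illustration):

```r
# Write a tiny bzip2-compressed CSV to a temporary folder; this is a
# made-up stand-in for the real StormData.csv.bz2
path <- file.path(tempdir(), "example.csv.bz2")
con <- bzfile(path, open = "wt")
write.csv(data.frame(EVTYPE = c("FLOOD", "HAIL"), FATALITIES = c(1, 0)),
          con, row.names = FALSE)
close(con)

# read.csv() opens the path through file(), which recognises the bzip2
# compression and decompresses it while reading
example <- read.csv(path)
str(example)
```

Applied to this report, that would amount to calling `read.csv("repdata-data-StormData.csv.bz2")` right after the download, at the cost of decompressing the file on every run.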

Given the questions being addressed, only the relevant fields are selected from the database:

# select only the columns that contain relevant data
stormData <- stormData[c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", 
    "CROPDMG", "CROPDMGEXP")]

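Some transformations are needed to obtain the actual damage value in US dollars for both property and crops. The PROPDMGEXP and CROPDMGEXP columns hold a magnitude code (H = hundreds, K = thousands, M = millions, B = billions) by which PROPDMG and CROPDMG must be multiplied: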

# map each magnitude code to its multiplier, accepting upper and lower case
# (H = hundreds, K = thousands, M = millions, B = billions)
expCodes <- c("H", "h", "K", "k", "M", "m", "B", "b")
expValues <- c(100, 100, 1000, 1000, 1e+06, 1e+06, 1e+09, 1e+09)
stormData$PROPDMGEXP_VALUE <- mapvalues(stormData$PROPDMGEXP, from = expCodes, 
    to = expValues, warn_missing = FALSE)
stormData$CROPDMGEXP_VALUE <- mapvalues(stormData$CROPDMGEXP, from = expCodes, 
    to = expValues, warn_missing = FALSE)

# any other code gets a multiplier of 0, so the damage of that record is ignored
stormData$PROPDMGEXP_VALUE[!stormData$PROPDMGEXP %in% expCodes] <- 0
stormData$CROPDMGEXP_VALUE[!stormData$CROPDMGEXP %in% expCodes] <- 0

# convert the multipliers from factor (or character) to numeric
stormData$PROPDMGEXP_VALUE <- as.numeric(as.character(stormData$PROPDMGEXP_VALUE))
stormData$CROPDMGEXP_VALUE <- as.numeric(as.character(stormData$CROPDMGEXP_VALUE))

# compute the actual damage in US dollars
stormData$PROPDMG_TOTAL <- stormData$PROPDMG * stormData$PROPDMGEXP_VALUE
stormData$CROPDMG_TOTAL <- stormData$CROPDMG * stormData$CROPDMGEXP_VALUE
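
The same multiplier logic can also be written as a single named-vector lookup. A minimal sketch, assuming the same H/K/M/B codes and the same rule of treating any other code as 0; `exp_multiplier` and `damage_in_dollars()` are hypothetical names, not part of the original analysis:

```r
# Hypothetical lookup table: magnitude code -> multiplier
exp_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert a damage value and its magnitude code to US dollars
damage_in_dollars <- function(value, code) {
  # toupper() makes the lookup case-insensitive ("k" and "K" are treated alike);
  # as.character() lets the code column be a factor or a character vector
  mult <- unname(exp_multiplier[toupper(as.character(code))])
  # codes outside H/K/M/B (including "") give NA; use 0, as the analysis does
  mult[is.na(mult)] <- 0
  value * mult
}

damage_in_dollars(c(2.5, 1, 3), c("K", "m", ""))
```

For the codes handled above, `damage_in_dollars(stormData$PROPDMG, stormData$PROPDMGEXP)` should match PROPDMG_TOTAL; normalising with `toupper()` avoids listing every code twice.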

Property and crop damage are added to get the total damage per weather event:

# add property and crop damage to get the total economic damage per record
stormData$CROP_PROP_TOTAL <- stormData$PROPDMG_TOTAL + stormData$CROPDMG_TOTAL

The total accumulated damage over all years is then computed per event type, and a bar chart of the six event types with the highest economic damage is created:

# per event type: number of occurrences, mean damage and total damage
economicSummary <- ddply(stormData, "EVTYPE", summarise, N = length(CROP_PROP_TOTAL), 
    mean = mean(CROP_PROP_TOTAL), total = sum(CROP_PROP_TOTAL))

# order by total damage (over all years) and keep the 6 highest
# (n = 6 is also the default of tail())
plotData <- tail(economicSummary[order(economicSummary$total), ], n = 6)
plotData
##                EVTYPE      N      mean     total
## 147       FLASH FLOOD  54277    323565 1.756e+10
## 238              HAIL 288661     64984 1.876e+10
## 666       STORM SURGE    261 165990579 4.332e+10
## 830           TORNADO  60652    945593 5.735e+10
## 406 HURRICANE/TYPHOON     88 817201282 7.191e+10
## 164             FLOOD  25326   5935390 1.503e+11

barchart(total/1e+09 ~ EVTYPE, data = plotData, xlab = "Event Type", ylab = "Total Economic Damage (in Billion Dollars)", 
    main = "Economic Damage per Event Type (Top 6)")

Figure: Economic Damage per Event Type (bar chart)
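
Note that `tail()` and `head()` return six rows unless `n` is given, which makes it easy to keep one row more than intended. A minimal base-R sketch of the same summarise-and-rank step, swapping `ddply()` for `aggregate()` and passing `n` explicitly; the toy data and the helper `top_n_events()` are made up for illustration:

```r
# Made-up toy data standing in for stormData
toy <- data.frame(
  EVTYPE = c("FLOOD", "HAIL", "FLOOD", "TORNADO", "HAIL", "HAIL"),
  CROP_PROP_TOTAL = c(5e6, 1e4, 3e6, 2e6, 2e4, 5e3)
)

# Sum the damage per event type (base-R counterpart of ddply + summarise)
econ <- aggregate(CROP_PROP_TOTAL ~ EVTYPE, data = toy, FUN = sum)

# Hypothetical helper: keep the n event types with the largest value, largest first
top_n_events <- function(summary_df, value_col, n) {
  ordered <- summary_df[order(summary_df[[value_col]], decreasing = TRUE), ]
  head(ordered, n)
}

top_n_events(econ, "CROP_PROP_TOTAL", 2)
```

Sorting in decreasing order and using `head()` lists the largest total first, which also matches the reading order of a ranking table.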

The same approach is used for health damage: the data are summarized to compute the accumulated number of fatalities and injuries over all years per event type:

# calculates health summaries
healthSummary <- ddply(stormData, "EVTYPE", summarise, N = length(FATALITIES), 
    meanFat = mean(FATALITIES), totalFat = sum(FATALITIES), meanInj = mean(INJURIES), 
    totalInj = sum(INJURIES))

# keep the 6 event types with the most fatalities
plotData <- tail(healthSummary[order(healthSummary$totalFat), ], n = 6)
plotData
##             EVTYPE      N  meanFat totalFat meanInj totalInj
## 854      TSTM WIND 219940 0.002292      504 0.03163     6957
## 452      LIGHTNING  15754 0.051796      816 0.33198     5230
## 269           HEAT    767 1.221643      937 2.73794     2100
## 147    FLASH FLOOD  54277 0.018019      978 0.03274     1777
## 123 EXCESSIVE HEAT   1678 1.134088     1903 3.88856     6525
## 830        TORNADO  60652 0.092874     5633 1.50607    91346

barchart(totalFat ~ EVTYPE, data = plotData, xlab = "Event Type", ylab = "Total Number of Fatalities", 
    main = "Total Fatalities per Event Type (Top 6)")

Figure: Total Fatalities per Event Type (bar chart)

# keep the 6 event types with the most injuries
plotData <- tail(healthSummary[order(healthSummary$totalInj), ], n = 6)
plotData
##             EVTYPE      N  meanFat totalFat meanInj totalInj
## 269           HEAT    767 1.221643      937 2.73794     2100
## 452      LIGHTNING  15754 0.051796      816 0.33198     5230
## 123 EXCESSIVE HEAT   1678 1.134088     1903 3.88856     6525
## 164          FLOOD  25326 0.018558      470 0.26806     6789
## 854      TSTM WIND 219940 0.002292      504 0.03163     6957
## 830        TORNADO  60652 0.092874     5633 1.50607    91346

barchart(totalInj ~ EVTYPE, data = plotData, xlab = "Event Type", ylab = "Total Number of Injuries", 
    main = "Total Injuries per Event Type (Top 6)")

Figure: Total Injuries per Event Type (bar chart)
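
In these charts lattice's `barchart()` places the event types in factor-level order, which for the names here is alphabetical, so the bars are not sorted by their totals. A minimal sketch of ordering them with `reorder()`, using the six fatality totals printed in the fatalities table; the data frame `fat` and the column `EVTYPE_ORDERED` are hypothetical names:

```r
library(lattice)

# EVTYPE and totalFat copied from the six rows of the fatalities table
fat <- data.frame(
  EVTYPE = c("TSTM WIND", "LIGHTNING", "HEAT", "FLASH FLOOD",
             "EXCESSIVE HEAT", "TORNADO"),
  totalFat = c(504, 816, 937, 978, 1903, 5633)
)

# reorder() returns a factor whose levels are sorted by totalFat
# (ascending) instead of alphabetically
fat$EVTYPE_ORDERED <- reorder(factor(fat$EVTYPE), fat$totalFat)

# putting the factor on the left of ~ draws horizontal bars in ranking order
p <- barchart(EVTYPE_ORDERED ~ totalFat, data = fat,
              xlab = "Total Number of Fatalities",
              main = "Total Fatalities per Event Type")
# print(p) draws the chart on the current graphics device
```

Horizontal bars also leave more room for long event-type labels such as EXCESSIVE HEAT.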

Results

The Data Processing section shows that FLOOD is the weather event type with the highest economic impact: it caused more than 150 billion dollars of damage from 1950 to 2011. Floods combine a high mean damage per occurrence (about 5.9 million dollars) with a high frequency (more than 25,000 occurrences), and together these put FLOOD at the top of the ranking. HURRICANE/TYPHOON comes second (about 72 billion dollars): these events are rare (only 88 occurrences), but each causes very high damage on average (about 817 million dollars). TORNADO comes third.
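
The ranking argument can be checked with simple arithmetic: the total damage of an event type is its number of occurrences times its mean damage per occurrence. A minimal sketch using N and mean copied from the economic summary table (the printed means are rounded, so the products are approximate); the variable names are illustrative:

```r
# N and mean damage (US dollars) copied from the economic summary table
n_events    <- c(FLOOD = 25326, `HURRICANE/TYPHOON` = 88)
mean_damage <- c(FLOOD = 5935390, `HURRICANE/TYPHOON` = 817201282)

# total damage = number of occurrences x mean damage per occurrence
total_billion <- n_events * mean_damage / 1e9
round(total_billion, 1)
```

This gives about 150.3 billion dollars for FLOOD and 71.9 billion for HURRICANE/TYPHOON, matching the reported totals of 1.503e+11 and 7.191e+10.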

Regarding health damage, TORNADO is by far the event type that has caused the most fatalities and injuries over the period covered: 5,633 fatalities and 91,346 injuries in the US population. That is roughly three times the fatalities of the next event type (EXCESSIVE HEAT, 1,903) and about thirteen times the injuries of the next (TSTM WIND, 6,957).