Flood as a significant cause of harm to population and property/crop damage in the United States between 1950 and 2001

Synopsis

This simple analysis shows that floods were a very significant cause of both fatalities/injuries and property damage across the United States between 1950 and 2001. It also shows that the cataclysms causing the largest numbers of injuries and fatalities are not automatically responsible for the highest property damage, and vice versa.

Data and prerequisites

This project explores the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. The data used in the analysis can be downloaded from here (there's no code for downloading the file since it has been cached from RStudio).

setwd("/Users/barbara/Data/")
stormData <- read.csv("stormData.csv.bz2")

The lattice library has been used in the analysis.

library(lattice)

Data Preprocessing

In light of the questions posed, data preprocessing was needed mostly for a clearer identification of event types.

Here the preparation was limited to:

# Normalise event labels: upper-case first, then merge the most common variants
EVTYPE <- toupper(stormData$EVTYPE)
# Floods: the FLASH prefix is dropped, so FLASH FLOODING becomes FLOODING
EVTYPE <- gsub("FLASH FLOOD", "FLOOD", EVTYPE)
EVTYPE <- gsub("FLOOD/FLOOD", "FLOOD", EVTYPE)
# Wind
EVTYPE <- gsub("HIGH WIND", "STRONG WIND", EVTYPE)
EVTYPE <- gsub("TSTM WIND", "THUNDERSTORM WIND", EVTYPE)
# Heat: every HEAT gets an EXTREME prefix, then the resulting duplicates are collapsed
EVTYPE <- gsub("HEAT", "EXTREME HEAT", EVTYPE)
EVTYPE <- gsub("EXTREME EXTREME HEAT", "EXTREME HEAT", EVTYPE)
EVTYPE <- gsub("EXCESSIVE EXTREME HEAT", "EXTREME HEAT", EVTYPE)
EVTYPE <- gsub("EXTREME HEAT WAVE", "EXTREME HEAT", EVTYPE)
# Cold
EVTYPE <- gsub("EXTREME COLD/WIND CHILL", "EXTREME COLD", EVTYPE)
stormData$EVTYPE <- EVTYPE
PROPDMGEXP <- toupper(stormData$PROPDMGEXP)
stormData$PROPDMGEXP <- PROPDMGEXP
CROPDMGEXP <- toupper(stormData$CROPDMGEXP)
stormData$CROPDMGEXP <- CROPDMGEXP
# Allowed damage exponent codes
validExp <- c("", "0", "H", "K", "M", "B")
# Keep rows with an empty crop exponent or a property exponent of 0, H, K, M or B
stormDataCleaned <- stormData[(stormData$CROPDMGEXP == "") | (stormData$PROPDMGEXP %in% 
    c("0", "H", "K", "M", "B")), ]
# Then keep only rows whose crop exponent is one of the allowed codes
stormDataCleaned <- stormDataCleaned[stormDataCleaned$CROPDMGEXP %in% validExp, ]

dataLength <- nrow(stormDataCleaned)

The cleaned dataset consists of 897971 rows.
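
Because gsub() replaces substrings, the order of the calls matters when one label is a prefix of another. The following sketch illustrates this on a made-up vector of labels (the events, shortFirst and longFirst names and the sample values are illustrative assumptions, not part of the analysis):

```r
# Toy event labels (hypothetical, not taken from the NOAA file)
events <- toupper(c("Flash Flood", "flash flooding", "FLOOD/FLASH FLOOD", "Flood"))

# Short pattern first: the "ING" suffix survives the replacement
shortFirst <- gsub("FLASH FLOOD", "FLOOD", events)
shortFirst <- gsub("FLASH FLOODING", "FLOOD", shortFirst)  # matches nothing any more

# Long pattern first: all spellings collapse to the same label
longFirst <- gsub("FLASH FLOODING", "FLOOD", events)
longFirst <- gsub("FLASH FLOOD", "FLOOD", longFirst)
longFirst <- gsub("FLOOD/FLOOD", "FLOOD", longFirst)

print(table(shortFirst))
print(table(longFirst))
```

With the short pattern first, FLOODING and FLOOD/FLOOD remain as separate categories; with the long pattern first, every toy label ends up as FLOOD.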

Results

Results of the analysis refer to two questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

To answer the first question, fatalities and injuries were summed by event type and sorted, and the most harmful events were then examined.


fatalitiesByType <- sort(tapply(stormDataCleaned$FATALITIES, stormDataCleaned$EVTYPE, 
    sum), decreasing = TRUE)

injuriesByType <- sort(tapply(stormDataCleaned$INJURIES, stormDataCleaned$EVTYPE, 
    sum), decreasing = TRUE)

fatInj <- cbind(fatalitiesByType, injuriesByType)
eventFatInj <- data.frame(rownames(fatInj), fatInj)

colnames(eventFatInj) <- c("EventType", "Fatalities", "Injuries")
head(eventFatInj, n = 100)
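
One thing to keep in mind with this construction is that cbind() pairs the two independently sorted vectors by rank, not by event name, so the Injuries value in a row may belong to a different event type than the row label. A minimal sketch on made-up data, with aggregate() as one possible name-aligned alternative (the toy values and the byRank and byName names are illustrative assumptions):

```r
# Toy data (hypothetical values, not taken from the NOAA file)
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL", "LIGHTNING", "HAIL"),
  FATALITIES = c(3, 2, 0, 4, 0),
  INJURIES   = c(10, 5, 30, 8, 20)
)

# Positional pairing: each vector is sorted on its own, then glued together by rank
fat <- sort(tapply(toy$FATALITIES, toy$EVTYPE, sum), decreasing = TRUE)
inj <- sort(tapply(toy$INJURIES, toy$EVTYPE, sum), decreasing = TRUE)
byRank <- data.frame(EventType = names(fat),
                     Fatalities = as.vector(fat),
                     Injuries = as.vector(inj))

# Name-aligned pairing: aggregate both columns at once, then sort the rows
byName <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
byName <- byName[order(byName$FATALITIES, decreasing = TRUE), ]

print(byRank)
print(byName)
```

In the toy data, the rank-based table shows FLOOD with 50 injuries (the HAIL total), while the name-aligned table shows the actual 15.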

A quick examination of the 100 most harmful events shows that tornadoes are responsible for the highest number of deaths (5588) and injuries (90429) during the period of observation. No other event type comes close: in terms of injuries, the remaining events are roughly ten times less harmful. Tornado has therefore been removed from the chart as an outlier.

biggestFatInj <- eventFatInj[(eventFatInj$Injuries > 500) & (eventFatInj$Injuries < 
    10000), ]
percentageFatInj <- (mean(biggestFatInj[, 2]/biggestFatInj[, 3])) * 100


barchart(Fatalities + Injuries ~ EventType, data = biggestFatInj, scales = list(x = list(rot = 45, 
    cex = 0.8)), ylim = c(0, 10000), auto.key = list(columns = 2))

[Figure: bar chart of fatalities and injuries by event type, tornado excluded]

Apart from tornado, the largest numbers of injuries and fatalities are caused by flood and extreme heat, with further significant results for thunderstorm and strong wind, lightning and extreme cold. Averaged over these event types, the number of fatalities amounts to approximately 17.3868% of the number of injuries.
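
The percentage is computed as the mean of per-event ratios, so every event type counts equally regardless of its size. The ratio of totals is a different quantity and can give a rather different figure, as this small sketch on made-up counts suggests (the toy table and the meanOfRatios and ratioOfTotals names are illustrative assumptions):

```r
# Toy table (hypothetical counts)
toy <- data.frame(
  EventType  = c("A", "B", "C"),
  Fatalities = c(10, 50, 5),
  Injuries   = c(100, 100, 1000)
)

# Mean of per-event ratios: every event type weighs the same
meanOfRatios <- mean(toy$Fatalities / toy$Injuries) * 100

# Ratio of totals: every casualty weighs the same
ratioOfTotals <- sum(toy$Fatalities) / sum(toy$Injuries) * 100

print(round(c(meanOfRatios = meanOfRatios, ratioOfTotals = ratioOfTotals), 2))
```

Which of the two is more appropriate depends on whether the question is about a typical event type or about the overall casualty count.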

The second question, concerning economic consequences, has been answered based on the declared values of property and crop damage. The events for which the amount of loss was declared in billions were selected and sorted.

biggestDamage <- stormDataCleaned[(stormDataCleaned$PROPDMGEXP == "B") | (stormDataCleaned$CROPDMGEXP == 
    "B"), ]

billionDamage <- nrow(biggestDamage)

propdamageByType <- sort(tapply(biggestDamage$PROPDMG, biggestDamage$EVTYPE, 
    sum), decreasing = TRUE)

cropdamageByType <- sort(tapply(biggestDamage$CROPDMG, biggestDamage$EVTYPE, 
    sum), decreasing = TRUE)

propCrop <- cbind(propdamageByType, cropdamageByType)
eventPropCrop <- data.frame(rownames(propCrop), propCrop)
colnames(eventPropCrop) <- c("EventType", "PropertyDamage", "CropDamage")

There were 43 events for which property or crop damage in billions was declared.
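
The PROPDMG and CROPDMG values are only comparable across rows once their exponent codes are applied. A rough sketch of such a conversion, assuming the usual reading of the codes (H for hundreds, K for thousands, M for millions, B for billions) and treating an empty code or "0" as no multiplier; the toy records and the multiplier, factor and PropertyUSD names are made up for illustration:

```r
# Toy records (hypothetical values); exponent codes as kept by the cleaning step
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL", "HAIL"),
  PROPDMG    = c(2.5, 500, 10, 3),
  PROPDMGEXP = c("B", "M", "K", ""),
  stringsAsFactors = FALSE
)

# Assumed meaning of the codes; "" is handled separately because a
# vector element cannot be given an empty name
multiplier <- c("0" = 1, H = 1e2, K = 1e3, M = 1e6, B = 1e9)
toy$factor <- ifelse(toy$PROPDMGEXP == "", 1, multiplier[toy$PROPDMGEXP])

# Damage expressed in plain dollars, then summed by event type
toy$PropertyUSD <- toy$PROPDMG * toy$factor
usdByType <- sort(tapply(toy$PropertyUSD, toy$EVTYPE, sum), decreasing = TRUE)
print(usdByType)
```

After scaling, a row declared as 500 with code M and a row declared as 2.5 with code B can be summed on the same footing.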

For clarity, only the top 10 event types were selected for the bar chart.

biggestEventPropCrop <- eventPropCrop[1:10, ]

barchart(PropertyDamage + CropDamage ~ EventType, data = biggestEventPropCrop, 
    scales = list(x = list(rot = 45, cex = 0.8)), ylim = c(0, 1000), auto.key = list(columns = 2))

[Figure: bar chart of property and crop damage for the top 10 event types]

The losses caused by flood and ice storm stand out compared to the remaining bars. In both cases the crop damage exceeds the property damage; however, this does not seem to be the rule for the whole dataset.

As one can see, floods stand out in both plots. This seems important because, unlike sudden cataclysms, floods can be largely avoided with a careful and centralized water management policy.

biggestInjuriesAndDamage <- merge(biggestFatInj, biggestEventPropCrop, by = intersect(names(biggestFatInj), 
    names(biggestEventPropCrop)))

biggestInjuriesAndDamage
##      EventType Fatalities Injuries PropertyDamage CropDamage
## 1        FLOOD       1477     8581          123.5      729.7
## 2 WINTER STORM        206     1275            5.0        2.5
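
The merge() call above keeps only the event types present in both tables, since it performs an inner join on the shared column by default. A small sketch on made-up tables shows the difference between that default and a full join (the harm, damage, both and either names and all values are illustrative assumptions):

```r
# Toy tables (hypothetical values)
harm <- data.frame(EventType = c("FLOOD", "LIGHTNING", "WINTER STORM"),
                   Fatalities = c(30, 20, 10))
damage <- data.frame(EventType = c("FLOOD", "HURRICANE", "WINTER STORM"),
                     PropertyDamage = c(9, 7, 1))

# Default: inner join on the shared column, unmatched rows are dropped
both <- merge(harm, damage, by = "EventType")

# all = TRUE keeps unmatched rows from both sides, padded with NA
either <- merge(harm, damage, by = "EventType", all = TRUE)

print(both)
print(either)
```

The inner join is what makes the result usable as a list of event types that rank high on both criteria.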

There are only 3 event types that rank among the most harmful both to population and to property: flood, winter storm and tornado (the last excluded from the first chart but visible in the second bar chart). This means that the sets of most damaging cataclysms overlap only slightly.