Synopsis

This work analyses the effects of storms and other severe weather events on public health and economic activity, using public data covering the period 1950-2011. It answers two questions: which types of events are most harmful with respect to population health, and which types of events have the greatest economic consequences?

The research concludes that TORNADO, EXCESSIVE HEAT (and HEAT), FLASH FLOOD (and FLOOD) and LIGHTNING are the most devastating phenomena for population health. FLOOD, HURRICANE/TYPHOON, TORNADO and STORM SURGE cause the greatest total economic damage, while for crop damage alone DROUGHT and FLOOD (and RIVER FLOOD) are the key events.

Data Processing

Original Data

The data is provided as a compressed file, repdata_data_StormData.csv.bz2, which is assumed to be in the ./data directory. It is publicly available from the National Oceanic and Atmospheric Administration and can be downloaded from Storm Data. Accompanying documentation and an FAQ are available at Storm Data Documentation and FAQ.

Reading the Data

data.raw <- read.csv("data/repdata_data_StormData.csv.bz2")
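
A slightly more defensive variant of the call above, as a minimal sketch: it stops with a clear message when the archive is missing. The helper name read_storm_data is hypothetical and is not part of the original analysis. read.csv decompresses .bz2 files transparently, so no manual extraction step is needed.

```r
# Hypothetical helper: read the compressed storm data, failing early if the file is absent.
read_storm_data <- function(path = "data/repdata_data_StormData.csv.bz2") {
    if (!file.exists(path)) {
        stop("Storm data file not found: ", path)
    }
    # read.csv opens the path through file(), which detects bzip2 compression when reading
    read.csv(path)
}

# Usage (assumes the file is present in ./data):
# data.raw <- read_storm_data()
```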

Data Analysis

Because the data set is large and contains much information that the analysis does not need, the first task is to select the columns of interest and so reduce the data set (the dplyr library is used).

Two datasets are derived from the original: one for population damages and one for economic damages. Each is then summarised and ordered separately, by Fatalities and by Injuries for population consequences, and by Property and by Crop damages for economic consequences. A further dataset with total economic consequences is also created.
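
The ranking step described above can be illustrated on a toy table. This is a minimal base R sketch, not the code used for the report (which relies on dplyr); the function name top_events and the sample numbers are invented for illustration.

```r
# Hypothetical helper: total a numeric column per event type and keep the n largest.
top_events <- function(df, value, n = 10) {
    totals <- aggregate(df[[value]], by = list(EVTYPE = df$EVTYPE), FUN = sum)
    names(totals)[2] <- value                      # aggregate() names the summed column "x"
    totals <- totals[order(-totals[[value]]), ]    # largest total first
    rownames(totals) <- NULL
    head(totals, n)
}

# Toy data (invented numbers, for illustration only)
toy <- data.frame(
    EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
    FATALITIES = c(5, 1, 3, 4, 2)
)
top_events(toy, "FATALITIES", n = 2)
```

Ranking each measure separately, rather than on a combined score, keeps Fatalities and Injuries (or Property and Crop damage) from being mixed on a single scale.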

Data Transformation

For economic damages, the data use a special encoding (a base number plus an exponent), so the exponent must be converted from a qualitative code to a number before use. In the original data each exponent is stored in the PROPDMGEXP and CROPDMGEXP variables as a character, for example B (billions), M (millions) or K (thousands), or as a digit \(x\) (for \(10^x\)), and must be converted to a numeric multiplier (K=1000, M=1000000, etc.). The actual damage is then computed as the base number multiplied by this multiplier.
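
As a worked illustration of the conversion described above, the sketch below maps exponent codes to multipliers with a named lookup vector. It is an alternative formulation, not the function used in this report: the name exp_multiplier is hypothetical, and treating blank or unrecognised codes (including ?, + and -) as a multiplier of 1 is an assumption made only for this example.

```r
# Hypothetical helper: translate exponent codes (H, K, M, B or a digit x) into multipliers.
exp_multiplier <- function(code) {
    code   <- toupper(as.character(code))
    lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
    mult   <- unname(lookup[code])                 # NA for anything that is not a letter code
    digit  <- grepl("^[0-9]$", code)
    mult[digit] <- 10^as.numeric(code[digit])      # a digit x stands for 10^x
    mult[is.na(mult)] <- 1                         # assumption: blank/unknown codes leave the base unchanged
    mult
}

# A base of 2.5 with exponent "M" is 2.5 million
2.5 * exp_multiplier("M")
# Lower-case letters and digits are handled the same way as upper-case letters
exp_multiplier(c("k", "3", "B", ""))
```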

library(dplyr)
library(ggplot2)
library(gridExtra)
data   <- select(data.raw, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

# Analysis for population Health
pop    <- filter(data, FATALITIES!=0 | INJURIES!=0) %>% select(EVTYPE, FATALITIES, INJURIES)
popsum <- group_by(pop, EVTYPE) %>%  summarize(Fatalities=sum(FATALITIES), Injuries=sum(INJURIES))
sumf   <- arrange(popsum, -Fatalities) %>% select(-Injuries) %>% slice(1:10)
sumf$EVTYPE <- factor(sumf$EVTYPE, levels=sumf$EVTYPE[order(-sumf$Fatalities)])
sumi   <- arrange(popsum, -Injuries) %>% select(-Fatalities) %>% slice(1:10)
sumi$EVTYPE <- factor(sumi$EVTYPE, levels=sumi$EVTYPE[order(-sumi$Injuries)])

# Analysis for economic damage
# Function to replace each exponent code with the corresponding numeric multiplier
# Note: the codes "1" and "8" have no rule below and pass through unchanged
chgexp <- function(exp) {
    f <- factor(toupper(exp))
    f <- factor(sub("0", "X", f))
    f <- factor(sub("2", "100", f))
    f <- factor(sub("H", "100", f))
    f <- factor(sub("3", "1000", f))
    f <- factor(sub("K", "1000", f))
    f <- factor(sub("4", "10000", f))
    f <- factor(sub("5", "100000", f))
    f <- factor(sub("6", "1000000", f))
    f <- factor(sub("M", "1000000", f))
    f <- factor(sub("7", "10000000", f))
    f <- factor(sub("B", "1000000000", f))
    f <- factor(sub("\\+", "1", f))
    f <- factor(sub("-", "-1", f))
    f <- factor(sub("X", "1", f))
    f <- factor(sub("\\?", "0", f))
    f <- factor(sub("^$", "1", f))
    return(f)
}

# Subset data to rows and columns of interest for economic data, renaming the columns
eco <- filter(data, PROPDMG!=0 | CROPDMG!=0) %>%
        select(type=EVTYPE, propbase=PROPDMG, propexp=PROPDMGEXP, cropbase=CROPDMG, cropexp=CROPDMGEXP)
# Convert event types to upper case
eco$type    <- toupper(eco$type)
# Change the exponents from a symbolic representation (K, M, B, etc.) to a numeric multiplier
eco$propexp <- chgexp(eco$propexp)
eco$propdmg <- eco$propbase*as.numeric(as.character(eco$propexp))
eco$cropexp <- chgexp(eco$cropexp)
eco$cropdmg <- eco$cropbase*as.numeric(as.character(eco$cropexp))
# Group by event type, adding up the property damage and crop damage
ecosum      <- group_by(eco, type) %>%  summarize(Propdmg=sum(propdmg), Cropdmg=sum(cropdmg))
sumP        <- arrange(ecosum, -Propdmg) %>% slice(1:10)
sumP$type   <- factor(sumP$type, levels=sumP$type[order(-sumP$Propdmg)])
sumC        <- arrange(ecosum, -Cropdmg)  %>% slice(1:10)
sumC$type   <- factor(sumC$type, levels=sumC$type[order(-sumC$Cropdmg)])
sumT        <- mutate(ecosum, totalDmg=Propdmg+Cropdmg)
sumT        <- arrange(sumT, -totalDmg) %>% slice(1:10)
sumT$type   <- factor(sumT$type, levels=sumT$type[order(desc(sumT$totalDmg))])

Results

Across the United States, which types of events are most harmful with respect to population health?

As the following panel plot shows, TORNADO, EXCESSIVE HEAT, FLASH FLOOD, HEAT and LIGHTNING are the key events causing fatalities, while for injuries TORNADO, TSTM WIND, FLOOD, EXCESSIVE HEAT and LIGHTNING are the most important ones.

Although Fatalities and Injuries cannot be added together quantitatively, or even given an equivalence, the key events for both outcomes are mostly the same, in some cases in a different order of importance.

It is important to note that in both cases TORNADO is, by far, the most devastating event: 46.6% of Fatalities and 72.8% of Injuries. For Injuries, the consequences of tornadoes overwhelm those of any other event.
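
The percentages quoted above are shares of a total. The text does not state which total was used as the denominator (all event types or only the ten plotted), so the sketch below only shows the arithmetic on invented numbers; event_share is a hypothetical name.

```r
# Hypothetical helper: percentage of the overall total contributed by each event type.
event_share <- function(totals) {
    round(100 * totals / sum(totals), 1)
}

# Invented totals, for illustration only
fatalities <- c(TORNADO = 50, HEAT = 30, FLOOD = 20)
event_share(fatalities)
```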

In sum, TORNADO, EXCESSIVE HEAT (and HEAT), FLASH FLOOD (and FLOOD) and LIGHTNING are the most devastating phenomena.

# Graph to show events ordered by Fatalities
th     <- theme(axis.text=element_text(size=6))
#axis.text.y=element_text(angle=45,vjust=1)
g1     <- ggplot(sumf, aes(x=EVTYPE, y=Fatalities)) + 
          geom_bar(stat="identity", fill="red") + coord_flip() + xlab("") + th
# Graph to show events ordered by Injuries
g2     <- ggplot(sumi, aes(x=EVTYPE, y=Injuries)) + 
          geom_bar(stat="identity", fill="salmon4") + coord_flip() + xlab("") + th
# Panel plot (gridExtra >= 2.0 takes the title as `top`; older versions used `main`)
grid.arrange(g1, g2, ncol=2, top="Population Health consequences", widths=c(1,1))

Figure 1 Most harmful events for population health. Analysis by Fatalities and Injuries to show commonalities and differences

Across the United States, which types of events have the greatest economic consequences?

In this case the panel shows that:

  • If we consider total economic damage or property damage, FLOOD, HURRICANE/TYPHOON, TORNADO and STORM SURGE are the most devastating events.
  • When considering crop damage, the most devastating events are DROUGHT, FLOOD, RIVER FLOOD and ICE STORM. Clearly crops are more influenced by “water” events.

The data show a clear difference between property damage and crop damage, but overall property damage is the more important in economic terms.

# Graph to show events ordered by property damage
grid::grid.newpage()  # qualified call: recent gridExtra versions do not attach grid
g3          <- ggplot(sumP, aes(x=type, y=Propdmg)) + 
               geom_bar(stat="identity", fill="blue4") + coord_flip() + xlab("") + th
# Graph to show events ordered by crop damage
g4          <- ggplot(sumC, aes(x=type, y=Cropdmg)) + 
               geom_bar(stat="identity", fill="salmon4") + coord_flip() + xlab("") + th
# Graph to show events ordered by total damage
g5          <- ggplot(sumT, aes(x=type, y=totalDmg)) + geom_bar(stat="identity", fill="red") +
               coord_flip() + xlab("") + ylab("Total Damages") + th
# Plot the three graphs side by side (gridExtra >= 2.0 takes the title as `top`; older versions used `main`)
grid.arrange(g3, g4, g5, ncol=3, top="Damages to Property, Crops and Total", widths=c(1,1,1))

Figure 2 Events that have the greatest economic consequences. Analysis by Property and Crops as well as total damage