Tornadoes cause the most harm to public health, while hurricanes cause the highest economic damage

This study was carried out using the National Weather Service Storm Data, as described in the Storm Data Documentation. The data were accessed on 5/29/2014, and all manipulations are documented below.

The study aims to answer the following two questions:

  1. Across the United States, which types of events are the most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

The analysis revealed that tornadoes cause the most harm to public health, in terms of both fatalities and injuries.

Additionally, hurricanes/typhoons most often caused property damage on the order of billions of U.S. dollars, while drought did so most often for crops.

Data Processing

The bz2-compressed CSV file is read into a data frame, keeping only the necessary columns, because reading the full file takes a lot of processing power.

library(ggplot2)  # needed for the plots in the Results section

# read.csv decompresses the .bz2 file transparently; "NULL" in colClasses skips a column
stormdt <- read.csv("repdata-data-StormData.csv.bz2", header = TRUE, colClasses = c("factor", 
    "character", "NULL", "NULL", "factor", "character", "factor", "factor", 
    rep("NULL", 10), "numeric", "numeric", "factor", "factor", "numeric", "numeric", 
    "numeric", "factor", "numeric", "factor", rep("NULL", 9)))

head(stormdt, 3)
##   STATE__          BGN_DATE COUNTY COUNTYNAME STATE  EVTYPE LENGTH WIDTH F
## 1    1.00 4/18/1950 0:00:00  97.00     MOBILE    AL TORNADO   14.0   100 3
## 2    1.00 4/18/1950 0:00:00   3.00    BALDWIN    AL TORNADO    2.0   150 2
## 3    1.00 2/20/1951 0:00:00  57.00    FAYETTE    AL TORNADO    0.1   123 2
##    MAG FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 0.00          0       15    25.0          K       0           
## 2 0.00          0        0     2.5          K       0           
## 3 0.00          0        2    25.0          K       0
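The column-skipping approach can be seen on a toy file. This minimal sketch reads a made-up inline CSV (csv_text and toy are hypothetical names) and uses "NULL" in colClasses the same way the call above does, so the skipped columns never enter the data frame:

```r
# Made-up CSV with four columns; only EVTYPE and FATALITIES are wanted
csv_text <- "STATE__,EVTYPE,REMARKS,FATALITIES
1.00,TORNADO,some text,0
1.00,HAIL,more text,2"

# "NULL" in colClasses tells read.csv to skip that column entirely
toy <- read.csv(text = csv_text, header = TRUE,
                colClasses = c("NULL", "factor", "NULL", "numeric"))
str(toy)
```

In the real call, the 21 "NULL" entries play the same role for stormdt.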

Public Health

With the data frame in place, the next step is to decide how to measure harm to public health. This study uses the total number of fatalities (fat) and injuries (inj). Each measure has both direct and indirect components, and this study counts both.

For each event type (EVTYPE), the fatalities and the injuries are summed separately, and the two resulting data frames are then stacked by rows.

# Total fatalities per event type, sorted in decreasing order
fat <- aggregate(FATALITIES ~ EVTYPE, data = stormdt, FUN = sum, na.rm = TRUE)
fat <- fat[order(fat$FATALITIES, decreasing = TRUE), ]
fat$type <- "fatality"  # label the rows so they can be faceted after rbind()
names(fat) <- c("EvType", "Damage", "DmgType")
head(fat, 3)
##             EvType Damage  DmgType
## 834        TORNADO   5633 fatality
## 130 EXCESSIVE HEAT   1903 fatality
## 153    FLASH FLOOD    978 fatality
# Total injuries per event type, sorted in decreasing order
inj <- aggregate(INJURIES ~ EVTYPE, data = stormdt, FUN = sum, na.rm = TRUE)
inj <- inj[order(inj$INJURIES, decreasing = TRUE), ]
inj$type <- "injury"  # label the rows so they can be faceted after rbind()
names(inj) <- c("EvType", "Damage", "DmgType")
head(inj, 3)
##        EvType Damage DmgType
## 834   TORNADO  91346  injury
## 856 TSTM WIND   6957  injury
## 170     FLOOD   6789  injury
health <- rbind(fat, inj)
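The aggregate, sort and label steps are repeated once for fatalities and once for injuries. A minimal sketch, assuming a hypothetical helper summarise_outcome and made-up toy data, writes those steps once and applies them to both outcomes:

```r
# Made-up miniature of stormdt: three events, two event types
toy <- data.frame(EVTYPE = factor(c("TORNADO", "HAIL", "TORNADO")),
                  FATALITIES = c(2, 0, 3),
                  INJURIES = c(10, 1, 5))

# Hypothetical helper: sum one outcome per event type, sort descending, label it
summarise_outcome <- function(df, outcome, label) {
  f <- as.formula(paste(outcome, "~ EVTYPE"))
  out <- aggregate(f, data = df, FUN = sum, na.rm = TRUE)
  out <- out[order(out[[outcome]], decreasing = TRUE), ]
  out$DmgType <- label
  names(out) <- c("EvType", "Damage", "DmgType")
  out
}

toy_health <- rbind(summarise_outcome(toy, "FATALITIES", "fatality"),
                    summarise_outcome(toy, "INJURIES", "injury"))
toy_health
```

Factoring the steps into one function keeps the two outcome tables guaranteed to share the same column names, which rbind() requires.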

Because the data contain hundreds of distinct event types, only the most serious ones with respect to public health are kept, for readability. Here, injury totals of 1,000 or less and fatality totals of 100 or less are not considered serious; each threshold applies only to its own damage type.

Tornadoes lead by a wide margin in both fatalities and injuries, so the subset is filtered once more without tornadoes to see which event types follow them.

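# Keep injury totals above 1000 and fatality totals above 100
srshealth <- health[(health$DmgType == "injury" & health$Damage > 1000) | 
    (health$DmgType == "fatality" & health$Damage > 100), ]
# Drop every event type whose name contains "TORNADO"
sanstornado <- srshealth[!grepl("TORNADO", srshealth$EvType), ]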
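Two details of this filtering step are easy to get wrong: operator precedence in the combined condition, and negative indexing with grep(). A minimal sketch on made-up rows (toy, serious, sans and the no_match_* objects are hypothetical names) shows both:

```r
# Made-up summary rows in the same shape as srshealth
toy <- data.frame(EvType = c("TORNADO", "FLOOD", "HAIL", "FLOOD"),
                  Damage = c(5000, 1500, 50, 120),
                  DmgType = c("injury", "injury", "fatality", "fatality"))

# & binds tighter than |, so the parentheses only make the intent explicit
serious <- toy[(toy$DmgType == "injury" & toy$Damage > 1000) |
               (toy$DmgType == "fatality" & toy$Damage > 100), ]

# Removing tornadoes with a logical mask
sans <- serious[!grepl("TORNADO", serious$EvType), ]

# Pitfall: -grep() drops ALL rows when the pattern matches nothing,
# because -integer(0) is an empty index; !grepl() keeps every row instead
no_match_neg <- serious[-grep("BLIZZARD", serious$EvType), ]
no_match_not <- serious[!grepl("BLIZZARD", serious$EvType), ]
c(nrow(no_match_neg), nrow(no_match_not))
```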

Economic Consequences

With the data frame in place, the economic consequences are measured by property damage (propDmg) and crop damage (cropDmg).

Although the data contain detailed damage figures, the 'highest' economic consequences are defined here as those on the order of billions of U.S. dollars. The PROPDMGEXP and CROPDMGEXP variables give the magnitude of the PROPDMG and CROPDMG values, where 'K' stands for thousands, 'M' for millions, and 'B' for billions.
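This study only counts the 'B' records and does not convert the figures to dollars. For readers who want dollar amounts, a minimal sketch could look like the following. It assumes that only K, M and B are valid magnitude codes and maps anything else to NA. exp_to_multiplier is a hypothetical helper, and the values are made up:

```r
# Hypothetical helper: map the documented magnitude letters to multipliers.
# Only K, M and B are handled; any other code becomes NA (an assumption).
exp_to_multiplier <- function(exp_code) {
  code <- toupper(as.character(exp_code))
  unname(c(K = 1e3, M = 1e6, B = 1e9)[code])
}

# Made-up damage values and exponents
propdmg    <- c(25.0, 2.5, 1.5)
propdmgexp <- factor(c("K", "m", "B"))
dollars <- propdmg * exp_to_multiplier(propdmgexp)
dollars
```

Returning NA for undocumented codes keeps them visible instead of silently treating them as a multiplier of 1.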

For each damage type, the event types of all records whose PROPDMGEXP or CROPDMGEXP value is 'B' (or 'b') are extracted into a new data frame, and the two data frames are then stacked by rows.

# Event types of records with property damage in billions ("B" or "b")
propDmg.f <- factor(stormdt[grep("[Bb]", stormdt$PROPDMGEXP), "EVTYPE"])
propDmg <- data.frame(EvType = propDmg.f)
propDmg$DmgType <- "property"
head(propDmg, 3)
##                      EvType  DmgType
## 1              WINTER STORM property
## 2 HURRICANE OPAL/HIGH WINDS property
## 3            HURRICANE OPAL property
# Event types of records with crop damage in billions ("B" or "b")
cropDmg.f <- factor(stormdt[grep("[Bb]", stormdt$CROPDMGEXP), "EVTYPE"])
cropDmg <- data.frame(EvType = cropDmg.f)
cropDmg$DmgType <- "crop"
head(cropDmg, 3)
##        EvType DmgType
## 1        HEAT    crop
## 2 RIVER FLOOD    crop
## 3     DROUGHT    crop
HDmg <- rbind(propDmg, cropDmg)
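The "[Bb]" filter and the row-binding can be checked on a handful of made-up records before running them on the full data. In this sketch, toy, prop, crop and toy_hdmg are hypothetical stand-ins for stormdt, propDmg, cropDmg and HDmg:

```r
# Made-up records in the shape of stormdt (only the columns used here)
toy <- data.frame(EVTYPE = factor(c("HURRICANE", "FLOOD", "DROUGHT", "HAIL")),
                  PROPDMGEXP = factor(c("B", "K", "", "b")),
                  CROPDMGEXP = factor(c("", "M", "B", "")))

# Same pattern as above: an exponent containing B or b marks a billion-dollar record
prop <- data.frame(EvType = factor(toy[grep("[Bb]", toy$PROPDMGEXP), "EVTYPE"]),
                   DmgType = "property")
crop <- data.frame(EvType = factor(toy[grep("[Bb]", toy$CROPDMGEXP), "EVTYPE"]),
                   DmgType = "crop")
toy_hdmg <- rbind(prop, crop)
toy_hdmg
```

Each record with a 'B' exponent contributes one row, which is why the economic plot counts rows rather than summing dollar amounts.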

Results

Public Health

Two plots are drawn using the data that was cleaned above:

ggplot(srshealth, aes(x = EvType, y = Damage)) + 
    geom_col() +  # bar height is the precomputed total, no counting
    facet_wrap(~DmgType, scales = "free_y", nrow = 2) + 
    labs(title = "Total Number of Fatalities or Injuries per Event Type") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0))

Figure: Total Number of Fatalities or Injuries per Event Type (panels: fatality, injury)

ggplot(sanstornado, aes(x = EvType, y = Damage)) + 
    geom_col() +  # bar height is the precomputed total, no counting
    facet_wrap(~DmgType, scales = "free_y", nrow = 2) + 
    labs(title = "Total Number of Fatalities or Injuries per Event Type (w/o Tornadoes)") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0))

Figure: Total Number of Fatalities or Injuries per Event Type (w/o Tornadoes) (panels: fatality, injury)

The first plot shows that tornadoes caused by far the most fatalities and injuries in the data. The second plot shows which event types follow them: excessive heat ranks next in fatalities (1,903), while thunderstorm wind (TSTM WIND, 6,957) and floods (6,789) rank next in injuries.
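The rankings read off the second plot can be double-checked numerically. This minimal sketch uses only the non-tornado totals printed by head() earlier in this report, so it covers just those rows and not the full sanstornado frame. It picks the largest total per damage type; top_by_type is a hypothetical name:

```r
# Non-tornado totals copied from the head() output above (partial data)
toy <- data.frame(EvType = c("EXCESSIVE HEAT", "FLASH FLOOD", "TSTM WIND", "FLOOD"),
                  Damage = c(1903, 978, 6957, 6789),
                  DmgType = c("fatality", "fatality", "injury", "injury"))

# For each damage type, keep the row with the largest total
top_by_type <- do.call(rbind, lapply(split(toy, toy$DmgType), function(d) {
  d[which.max(d$Damage), ]
}))
top_by_type[, c("DmgType", "EvType", "Damage")]
```

Applied to the real sanstornado frame instead of toy, the same code would return the top non-tornado event for each damage type.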

Economic Consequences

A single plot is drawn from the data prepared above:

ggplot(HDmg, aes(x = EvType)) + 
    geom_bar() +  # bar height is the number of billion-dollar records
    facet_wrap(~DmgType, scales = "free_y", nrow = 2) + 
    labs(y = "Times Damage Exceeds USD Billions", 
        title = "Number of Times the Event Type Damage Exceeded USD Billions") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1))

Figure: Number of Times the Event Type Damage Exceeded USD Billions (panels: crop, property)

This plot shows that hurricanes/typhoons most often cause property damage on the order of billions of U.S. dollars, while drought is the most frequent source of billion-dollar crop damage.
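The top event type per damage type can also be read off a count table instead of the bar chart. A minimal sketch on made-up records in the shape of HDmg follows. The counts are illustrative, not the study's results, and with the real data HDmg would replace toy:

```r
# Made-up HDmg-shaped records (illustrative counts only)
toy <- data.frame(EvType = c("HURRICANE/TYPHOON", "HURRICANE/TYPHOON", "FLOOD",
                             "DROUGHT", "DROUGHT", "HEAT"),
                  DmgType = c("property", "property", "property",
                              "crop", "crop", "crop"))

# Count billion-dollar records per event type within each damage type,
# then take the event type with the highest count in each column
counts <- table(toy$EvType, toy$DmgType)
top_event <- apply(counts, 2, function(col) names(which.max(col)))
top_event
```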