1. Synopsis

This report uses the NOAA Storm Database to identify the types of events that are most harmful to population health, as measured by the number of fatalities and the number of injuries. It also identifies the types of events that have the greatest economic consequences, as measured by the sum of property damage and crop damage.

2. Data Processing

2.1. Load required libraries and the NOAA Storm Database from the Internet

First, the required libraries are loaded. The URL for the database is set, and the file is downloaded into the working directory, decompressed, and read into the data variable.

library(R.utils)
library(ggplot2)
theURL <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destFile <- "stormdata.csv.bz2"
csvFile <- "stormdata.csv"
download.file(theURL, destfile=destFile, method="auto")
bunzip2(destFile, csvFile, remove=FALSE, overwrite=TRUE)

data <- read.csv(csvFile)
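
Re-running the report repeats the download every time. A minimal sketch of a guard that skips the download when the file is already present is shown below. The helper name fetchOnce and its fetch argument are assumptions made for illustration and are not part of the original analysis.

```r
# fetchOnce is a hypothetical helper (not part of the original report).
# It calls the download function only when the destination file is missing.
# The fetch argument defaults to download.file and exists so that the
# helper can be exercised without network access.
fetchOnce <- function(url, dest, fetch = download.file) {
    if (!file.exists(dest)) {
        # mode = "wb" requests a binary transfer, which matters for
        # compressed files on Windows
        fetch(url, destfile = dest, mode = "wb")
    }
    invisible(dest)
}
```

Under these assumptions, the download line above could be written as fetchOnce(theURL, destFile).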

2.2. Define helper function for standardising the units used in measurement of damages

The expLetter2Number function is defined. It takes a single parameter, x, which is one value from data$PROPDMGEXP or data$CROPDMGEXP. The letters (B, M, K, H) are mapped to the corresponding number of zeroes (9, 6, 3, 2), and a value that is already numeric is returned as that number. Other characters (+, -, ?) and blanks are assumed to be erroneous and are mapped to no zeroes.

expLetter2Number <- function(x)
{
    if (is.na(suppressWarnings(as.numeric(x))))
    {
        switch(x,
            "b"=9,
            "B"=9,
            "m"=6,
            "M"=6,
            "k"=3,
            "K"=3,
            "h"=2,
            "H"=2,
            "-"=0,
            "?"=0,
            "+"=0,
            0
        )
    }
    else
    {
        as.numeric(x)
    }
}
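
Because the helper above handles one value at a time, it has to be applied element by element, which may be slow on a table of this size. The following is a sketch of a vectorised counterpart built on a named lookup vector. The name expLetter2NumberVec is hypothetical, and the sketch assumes that numeric codes appear as plain digits.

```r
# Hypothetical vectorised counterpart of expLetter2Number.
# Assumption: numeric exponent codes consist only of digits.
expLetter2NumberVec <- function(x) {
    x <- toupper(as.character(x))
    # Named lookup vector: letter code -> number of zeroes
    lookup <- c(B = 9, M = 6, K = 3, H = 2)
    out <- unname(lookup[x])            # NA for anything not in the lookup
    digits <- grepl("^[0-9]+$", x)      # digit codes are used as the exponent
    out[digits] <- as.numeric(x[digits])
    out[is.na(out)] <- 0                # "+", "-", "?", blank and NA -> 0
    out
}

expLetter2NumberVec(c("B", "m", "K", "h", "5", "+", ""))
```

The call returns one exponent per input value, so it can be assigned to a column directly without sapply.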

2.3. Calculate absolute amount of damages

The property and crop damage values are calculated by converting the labels indicating billions, millions, etc. into exponents with the expLetter2Number function, and then multiplying each damage figure by 10 raised to that exponent.

data$PROPDMGEXP <- toupper(data$PROPDMGEXP)
data$CROPDMGEXP <- toupper(data$CROPDMGEXP)

data$PROPDMGEXP_NUM <- sapply(data$PROPDMGEXP, expLetter2Number)
data$CROPDMGEXP_NUM <- sapply(data$CROPDMGEXP, expLetter2Number)

data$PROPDMG_VAL <- data$PROPDMG * 10 ^ data$PROPDMGEXP_NUM
data$CROPDMG_VAL <- data$CROPDMG * 10 ^ data$CROPDMGEXP_NUM
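
As a worked example of the arithmetic above, the sketch below applies the same formula to three made-up rows. The toy values are invented for illustration and do not come from the database.

```r
# Toy rows (invented for illustration): 25 with "K", 2.5 with "M", 10 with blank
toy <- data.frame(
    PROPDMG = c(25, 2.5, 10),
    PROPDMGEXP = c("K", "M", ""),
    stringsAsFactors = FALSE
)

# Exponents the helper is described as returning for "K", "M" and blank
toy$PROPDMGEXP_NUM <- c(3, 6, 0)

# Same formula as in the report: figure * 10 ^ exponent
toy$PROPDMG_VAL <- toy$PROPDMG * 10 ^ toy$PROPDMGEXP_NUM

toy$PROPDMG_VAL   # 25000, 2500000, 10
```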

2.4. Sorting the data

Fatality and injury numbers are aggregated by event type and sorted in decreasing order. The event type levels are then reordered to match, so that plots keep the same ranking.

fatalities <- aggregate(FATALITIES ~ EVTYPE, data, sum)
injuries <- aggregate(INJURIES ~ EVTYPE, data, sum)

fatalities <- fatalities[order(fatalities$FATALITIES, decreasing = TRUE),]
injuries <- injuries[order(injuries$INJURIES, decreasing = TRUE),]

fatalities <- transform(fatalities, EVTYPE=reorder(EVTYPE, -FATALITIES))
injuries <- transform(injuries, EVTYPE=reorder(EVTYPE, -INJURIES))
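
The aggregate, sort and reorder steps above can be traced on a small made-up data frame. The toy rows below are invented for illustration. The point of the sketch is that reorder sets the factor levels in decreasing order of the total, which is what a plot uses to order its bars.

```r
# Toy data (invented for illustration)
toy <- data.frame(
    EVTYPE = c("FLOOD", "TORNADO", "FLOOD", "HEAT", "TORNADO", "TORNADO"),
    FATALITIES = c(1, 5, 2, 4, 3, 2)
)

# Sum fatalities per event type: FLOOD 3, HEAT 4, TORNADO 10
totals <- aggregate(FATALITIES ~ EVTYPE, toy, sum)

# Sort the rows by decreasing total
totals <- totals[order(totals$FATALITIES, decreasing = TRUE), ]

# Reorder the factor levels so the largest total comes first
totals <- transform(totals, EVTYPE = reorder(EVTYPE, -FATALITIES))

levels(totals$EVTYPE)   # "TORNADO" "HEAT" "FLOOD"
```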

Property and crop damage figures are aggregated by the event type, and then sorted and reordered according to decreasing property and crop damage.

propDamage <- aggregate(PROPDMG_VAL ~ EVTYPE, data, sum)
cropDamage <- aggregate(CROPDMG_VAL ~ EVTYPE, data, sum)

propDamage <- propDamage[order(propDamage$PROPDMG_VAL, decreasing = TRUE),]
cropDamage <- cropDamage[order(cropDamage$CROPDMG_VAL, decreasing = TRUE),]

propDamage <- transform(propDamage, EVTYPE=reorder(EVTYPE, -PROPDMG_VAL))
cropDamage <- transform(cropDamage, EVTYPE=reorder(EVTYPE, -CROPDMG_VAL))

Finally, the total economic damage is defined in this report as the sum of the property and crop damage. The two data frames are therefore merged, and the property and crop damage columns are summed to give the total economic damage. The merged data frame is then sorted and reordered according to decreasing economic damage.

TOTALDMG <- merge(propDamage, cropDamage, by = "EVTYPE")
TOTALDMG$TOTALDMG <- TOTALDMG$PROPDMG_VAL + TOTALDMG$CROPDMG_VAL
TOTALDMG <- TOTALDMG[order(TOTALDMG$TOTALDMG, decreasing = TRUE),]

TOTALDMG <- transform(TOTALDMG, EVTYPE=reorder(EVTYPE, -TOTALDMG))
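
By default, merge keeps only the event types that appear in both data frames. In this report both frames are built from the same data, so this is not expected to drop anything. The sketch below, on made-up frames with hypothetical names, shows how the default differs from a merge with all = TRUE when the event types do not match.

```r
# Toy frames (invented for illustration) with partly different event types
prop <- data.frame(EVTYPE = c("FLOOD", "HAIL"), PROPDMG_VAL = c(100, 50))
crop <- data.frame(EVTYPE = c("FLOOD", "DROUGHT"), CROPDMG_VAL = c(20, 70))

# Default merge: only event types present in both frames survive (FLOOD)
innerJoin <- merge(prop, crop, by = "EVTYPE")

# all = TRUE keeps every event type and fills the gaps with NA
fullJoin <- merge(prop, crop, by = "EVTYPE", all = TRUE)

# Treat a missing damage figure as zero before summing
fullJoin$PROPDMG_VAL[is.na(fullJoin$PROPDMG_VAL)] <- 0
fullJoin$CROPDMG_VAL[is.na(fullJoin$CROPDMG_VAL)] <- 0
fullJoin$TOTALDMG <- fullJoin$PROPDMG_VAL + fullJoin$CROPDMG_VAL
```

Treating a missing figure as zero is itself an assumption, and whether it is appropriate depends on why the event type is absent from one frame.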

3. Results

3.1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

The 10 event types with the most fatalities are shown in a bar plot. Tornadoes are clearly the most harmful event type with respect to fatalities.

library(ggplot2)
ggplot(fatalities[1:10,], aes(y=FATALITIES, x=EVTYPE)) + 
    geom_bar(stat="identity", fill="red") + 
    labs(title="Fatalities", y="Fatalities", x="Event") +
    theme(axis.text.x = element_text(angle = 90))

[Figure: bar plot of fatalities for the top 10 event types]

The 10 event types with the most injuries are shown in a bar plot. Tornadoes are also clearly the most harmful event type with respect to injuries.

ggplot(injuries[1:10,], aes(y=INJURIES, x=EVTYPE)) + 
    geom_bar(stat="identity", fill="red") + 
    labs(title="Injuries", y="Injuries", x="Event") +
    theme(axis.text.x = element_text(angle = 90))

[Figure: bar plot of injuries for the top 10 event types]

3.2. Across the United States, which types of events have the greatest economic consequences?

The 10 event types with the highest economic damage (the sum of property and crop damage) are shown in a bar plot. Floods clearly have the greatest economic consequences.

ggplot(TOTALDMG[1:10,], aes(y=TOTALDMG/1000000, x=EVTYPE)) + 
    geom_bar(stat="identity", fill="blue") + 
    labs(title="Economic Damage", y="Damage in $millions", x="Event") +
    theme(axis.text.x = element_text(angle = 90))

[Figure: bar plot of economic damage in $millions for the top 10 event types]