library(knitr)
library(dplyr)
library(ggplot2)
library(reshape2)

Synopsis

The Storm Data Preparation Guide from the National Oceanic and Atmospheric Administration (NOAA) details how a storm event is classified. The classification is an elaborate process that produces an inordinately large list of somewhat subjective event names. Each record also documents fatalities, injuries and the cost of damage to property and crops. These variables are used here to assess the extent of harm to population health and of economic damage. The analysis in this document starts from the variables representing the damage that occurred. These variables are categorized and the weather events are associated with the different levels of these variables. This makes clear which weather events are attributable to the higher, more devastating levels of damage, which allows those in the appropriate position to prioritize contingencies and budget accordingly.

Data Cleaning

The raw data is a compressed csv file (bz2), which read.csv decompresses and loads directly into the variable wdata. The four damage variables, together with their associated event types, are then subsetted from the overall data. Each subset keeps only the records of wdata where at least one occurrence was recorded. For example, the data set is filtered for events that caused at least one injury, and the result is stored in the variable injury.

# Load the data into a variable.
wdata <- read.csv("repdata-data-StormData.csv.bz2", header = TRUE,
                  stringsAsFactors = FALSE)
# Subset the data for each of the four variables FATALITIES, INJURIES, PROPDMG
# and CROPDMG, keeping only the records where the variable is greater than 0.
fatal <- filter(select(wdata, EVTYPE, FATALITIES), FATALITIES > 0)
injury <- filter(select(wdata, EVTYPE, INJURIES), INJURIES > 0)
prop <- filter(select(wdata, EVTYPE, PROPDMG, PROPDMGEXP), PROPDMG > 0)
crop <- filter(select(wdata, EVTYPE, CROPDMG, CROPDMGEXP), CROPDMG > 0)

In the prop and crop subsets, the value of damage to property and crops is expressed over two columns. The first is a number given to a maximum of three decimal places; the second is the denomination of that number. The denominations are thousands (K), millions (M) or billions (B) of dollars in damage. A few records in these subsets do not clearly specify the denomination, so they are not included. The subsets are therefore reduced again to the records that have a clear value associated with the damage. First, all lower case denominations are converted to upper case characters. Then the two columns are combined into one column expressing the value of damage in thousands of dollars only. Finally, the two subsets are re-assigned with just the weather event and the value of the damages.

# Convert lower case denomination characters to upper case characters
prop$PROPDMGEXP <- toupper(prop$PROPDMGEXP)
crop$CROPDMGEXP <- toupper(crop$CROPDMGEXP)

# Subset the damage variables for only those records that have a properly
# designated denomination.

prop <- filter(prop, PROPDMGEXP == 'K'|PROPDMGEXP == 'M'|PROPDMGEXP == 'B')
crop <- filter(crop, CROPDMGEXP == 'K'|CROPDMGEXP == 'M'|CROPDMGEXP == 'B')

# Convert to a single damage value expressed in thousands of dollars
# (K = 1, M = 1,000, B = 1,000,000)
propval <- mapply(function(x) switch(x, 'K' =1, 'M' = 1000, 'B' = 1000000),
                  prop$PROPDMGEXP, USE.NAMES = FALSE)*prop$PROPDMG
prop <- mutate(prop, DMGVAL = propval)
cropval <- mapply(function(x) switch(x, 'K' =1, 'M' = 1000, 'B' = 1000000),
                  crop$CROPDMGEXP, USE.NAMES = FALSE)*crop$CROPDMG
crop <- mutate(crop, DMGVAL = cropval)

# Re-assign the damages subsets with just the event and damage value

prop <- select(prop, EVTYPE, DMGVAL)
crop <- select(crop, EVTYPE, DMGVAL)
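
As a check on the conversion above, the same steps can be sketched on a small toy table with a named lookup vector in place of switch. The data frame toy, its columns DMG and DMGEXP, and the vector multiplier are hypothetical names used only for illustration; the values are made up and do not come from the storm data.

```r
library(dplyr)

# Toy data (hypothetical values, not taken from the storm database).
toy <- data.frame(
  EVTYPE = c("FLOOD", "HAIL", "TORNADO", "WIND"),
  DMG    = c(2.5, 10, 1.2, 5),
  DMGEXP = c("k", "M", "B", "?"),
  stringsAsFactors = FALSE
)

# Multipliers that express each denomination in thousands of dollars.
multiplier <- c(K = 1, M = 1e3, B = 1e6)

toy_clean <- toy %>%
  mutate(DMGEXP = toupper(DMGEXP)) %>%               # "k" becomes "K"
  filter(DMGEXP %in% names(multiplier)) %>%          # drops the unclear "?" record
  mutate(DMGVAL = DMG * unname(multiplier[DMGEXP])) %>%
  select(EVTYPE, DMGVAL)

print(toy_clean)
```

The WIND record is dropped because its denomination is unclear, and the remaining values are all in thousands of dollars (2.5, 10,000 and 1,200,000).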

Exploratory Analysis

The variables are grouped by event type and summarized by the number of event occurrences and by the total of the variable associated with that event. This gives two angles to cover: ordering the event types by occurrence, and ordering them by total. Together they give a better idea of which events are high impact but low probability and which are low impact but high probability. Contingencies and the associated budgeting can then be stratified into those two groupings.
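
The two orderings can differ, which a small sketch on made-up numbers illustrates. The data frame toy and its values are hypothetical; n() is used here as the dplyr counterpart of counting rows per group.

```r
library(dplyr)

# Toy fatality records (hypothetical values, for illustration only).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(1, 2, 1, 10, 3),
  stringsAsFactors = FALSE
)

# One row per event type: number of records and total fatalities.
toy_summ <- toy %>%
  group_by(EVTYPE) %>%
  summarize(Occur = n(), Total = sum(FATALITIES))

# The same summary ordered from the two angles.
by_occ <- arrange(toy_summ, desc(Occur))  # most frequent event first
by_tot <- arrange(toy_summ, desc(Total))  # most harmful event first

print(by_occ)
print(by_tot)
```

In this toy case TORNADO leads by occurrence (3 records, 4 fatalities) while FLOOD leads by total (1 record, 10 fatalities), which is the kind of low frequency, high impact event the second ordering is meant to surface.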

The top 10 events by occurrence and by total are displayed below for each of the population health and economic damage variables. Figure 1 displays the number of occurrences of each event for each of the variables. Figure 2 displays the total fatalities and injuries for each event. Figure 3 displays the total damage (in thousands of dollars) attributable to property and crops.

# tabulate the number of occurrences and the total fatalities and injuries as a
# result of the weather events

fatal_group <- group_by(fatal, EVTYPE)
fatal_summ <- summarize(fatal_group, 'Occur' = length(FATALITIES),
                       'Total' = sum(FATALITIES))

injury_group <- group_by(injury, EVTYPE)
injury_summ <- summarize(injury_group, 'Occur' = length(INJURIES),
                       'Total' = sum(INJURIES))

# sort the fatalities and injuries by number of occurrences and keep the top 10

fatal_summ_occ <- arrange(fatal_summ, desc(Occur))[1:10,]
fatal_summ_occ <- mutate(fatal_summ_occ, vName = 'FatalOccur')
injury_summ_occ <- arrange(injury_summ, desc(Occur))[1:10,]
injury_summ_occ <- mutate(injury_summ_occ, vName = 'InjuryOccur')

# sort the fatalities and injuries by total

fatal_summ_tot <- arrange(fatal_summ, desc(Total))[1:10,]
fatal_summ_tot <- mutate(fatal_summ_tot, vName = 'FatalTotal')
injury_summ_tot <- arrange(injury_summ, desc(Total))[1:10,]
injury_summ_tot <- mutate(injury_summ_tot, vName = 'InjuryTotal')
# tabulate the number of occurrences and the associated costs of the weather
# events on property and crop damages
prop_group <- group_by(prop, EVTYPE)
prop_summ <- summarize(prop_group, 'Occur' = length(DMGVAL),
                       'Total' = sum(DMGVAL))

crop_group <- group_by(crop, EVTYPE)
crop_summ <- summarize(crop_group, 'Occur' = length(DMGVAL),
                       'Total' = sum(DMGVAL))

# sort the property and crop damages by number of occurrences and keep the top 10

prop_summ_occ <- arrange(prop_summ, desc(Occur))[1:10,]
prop_summ_occ <- mutate(prop_summ_occ, vName = 'PropDmgOccur')
crop_summ_occ <- arrange(crop_summ, desc(Occur))[1:10,]
crop_summ_occ <- mutate(crop_summ_occ, vName = 'CropDmgOccur')

# sort the property and crop damages by damage value

prop_summ_tot <- arrange(prop_summ, desc(Total))[1:10,]
prop_summ_tot <- mutate(prop_summ_tot, vName = 'PropTotal')
crop_summ_tot <- arrange(crop_summ, desc(Total))[1:10,]
crop_summ_tot <- mutate(crop_summ_tot, vName = 'CropTotal')
summ_occ <- rbind(fatal_summ_occ, injury_summ_occ, prop_summ_occ,
                  crop_summ_occ)

summ_occ$vName <- factor(summ_occ$vName)

g <- ggplot(summ_occ, aes(x = EVTYPE, y = Occur, fill = vName))
g <- g + geom_bar(stat = 'identity') + coord_flip()
g <- g + facet_wrap(~ vName)
g

Figure 1. Event Occurrences Per Variable

summ_health<- rbind(fatal_summ_tot, injury_summ_tot)

summ_health$vName <- factor(summ_health$vName)

g <- ggplot(summ_health, aes(x = EVTYPE, y = Total, fill = vName))
g <- g + geom_bar(stat = 'identity') + coord_flip()
g <- g + facet_wrap(~ vName)
g

Figure 2. Total Fatalities and Injuries Per Event

# Figure 3 plots totals, so combine the top 10 events by total damage
summ_dmg <- rbind(prop_summ_tot, crop_summ_tot)

summ_dmg$vName <- factor(summ_dmg$vName)

g <- ggplot(summ_dmg, aes(x = EVTYPE, y = Total, fill = vName))
g <- g + geom_bar(stat = 'identity') + coord_flip()
g <- g + facet_wrap(~ vName)
g

Figure 3. Total Property and Crop Damage Per Event

Results

The purpose of this document was twofold. The first aim was to see which event was most harmful to population health. Figure 1 shows that TORNADOES have the highest number of occurrences and so pose the greatest threat in terms of frequency. With respect to the total number of fatalities and injuries (Figure 2), TORNADOES have again caused the most harm to population health.

The second question was which event had the greatest economic consequences. Again Figure 1 is the starting point, and it shows that the most frequent damage to property came from WIND related events: TORNADOES and THUNDERSTORMS were the events that occurred most frequently in property damage cases. Second to those were HAIL and FLOODING, which were the events that occurred most frequently in crop damage cases. From Figure 3, the event that caused the greatest damage with relatively low frequency was FLOODING, with the higher frequency event TORNADOES coming in a distant second.
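
One way to put a number on "high impact with relatively low frequency" is the average damage per recorded occurrence, that is, Total divided by Occur. The sketch below uses a hypothetical summary table toy_summ with made-up values and a hypothetical column PerEvent; it is an illustration of the arithmetic, not a result from the storm data.

```r
library(dplyr)

# Hypothetical summary table in the same shape as prop_summ
# (Occur = number of records, Total = damage in thousands of dollars).
toy_summ <- data.frame(
  EVTYPE = c("FLOOD", "TORNADO", "HAIL"),
  Occur  = c(10, 100, 50),
  Total  = c(5000, 2000, 500),
  stringsAsFactors = FALSE
)

# Average damage per occurrence, highest first.
toy_ratio <- toy_summ %>%
  mutate(PerEvent = Total / Occur) %>%
  arrange(desc(PerEvent))

print(toy_ratio)
```

With these toy numbers FLOOD averages 500 per occurrence against 20 for TORNADO, even though TORNADO occurs ten times as often, which mirrors the distinction drawn between Figure 1 and Figure 3.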