Synopsis

In this project, we explore the hazardous effects of the different types of weather events that took place in the United States from 1950 to 2011. One aspect examined is the number of fatalities and injuries caused by such events. Another is the economic cost incurred, measured by both property and crop damage. The results point to tornadoes as the most physically harmful event type during this period. The analysis also indicates that floods are the most economically damaging event type.

Data Processing

The first step of data processing is to obtain the data file by downloading it from the hosting site. The data file is then loaded into R.

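setwd("/home/wshao/school/coursera/RepData-015/Proj02")
# Name the local file first so that download.file() has a destination
dataFileName <- "repdata_data_StormData.csv.bz2"
download.file("http://d396qusza40orc.cloudfront.net/repdata/data/StormData.csv.bz2", dataFileName)
# read.csv() decompresses the bzip2 file while reading
df <- read.csv(dataFileName)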
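
As a side note on the loading step, read.csv() can read a bzip2-compressed CSV directly, so no separate decompression is needed. The following is a minimal sketch on a throwaway file; the toy columns and values are invented for illustration and are not taken from the storm data.

```r
# Write a tiny CSV through a bzip2 connection (toy data, purely illustrative)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(1, 0)),
          con, row.names = FALSE)
close(con)

# read.csv() detects the compression and decompresses while reading
toy <- read.csv(tmp, stringsAsFactors = FALSE)
str(toy)
```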

Subsequently, some data cleaning procedures are performed. First, the exponent columns for property and crop damage have not been entered consistently as integers; many entries are letter symbols instead. For example, the entries “K”, “M”, and “B” mean the exponents “3”, “6”, and “9” respectively (thousands, millions, and billions). A routine is therefore used to enforce this consistency.

# Map each exponent symbol to its power of ten; '+', '-' and '?' are treated as missing
expmap <- list('+'='','-'='','?'='',B='9',b='9',H='2',h='2',K='3',k='3',M='6',m='6')

# Look up a symbol in expmap; anything not listed (e.g. a digit) is returned unchanged
lookup.exp <- function(key) {
  return(ifelse(key %in% names(expmap),expmap[[key]],key))
}

# Calculate the final dollar amount from one row of (damage value, exponent symbol)
lookup.exp.and.calc <- function(val) {
  # apply() passes each row as a character vector
  raw.val <- val[1]
  the.exp <- lookup.exp(as.character(val[2]))
  if ((raw.val == '') || (the.exp == '')) {
    return(0)
  } else {
    # as.numeric() keeps fractional values such as 2.5, which as.integer() would truncate
    return(as.numeric(raw.val) * 10^as.integer(the.exp))
  }
}
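
The arithmetic behind this conversion can be checked with a short, vectorized sketch. The names exp_power and to_dollars are hypothetical and are not part of the routine above. Treating blank or unrecognised symbols as zero damage is an assumption made here for simplicity.

```r
# Hypothetical lookup table: exponent symbol -> power of ten
exp_power <- c(H = 2, K = 3, M = 6, B = 9)

# Hypothetical helper: convert (value, symbol) pairs to dollar amounts
to_dollars <- function(value, symbol) {
  symbol <- toupper(as.character(symbol))
  # Letter symbols come from the table; unknown symbols give NA
  power <- unname(exp_power[symbol])
  # Single-digit symbols are used directly as the power of ten
  is_digit <- grepl("^[0-9]$", symbol)
  power[is_digit] <- as.numeric(symbol[is_digit])
  # Assumption: blank or unrecognised symbols count as zero damage
  ifelse(is.na(power), 0, value * 10^power)
}

# 25 with "K" is 25 * 10^3; 2.5 with "M" is 2.5 * 10^6
amounts <- to_dollars(c(25, 2.5, 10, 7), c("K", "M", "", "2"))
print(amounts)
```

Working on whole columns at once also avoids the row-by-row character coercion that apply() performs on a data frame.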

The final damage amounts in absolute dollars are stored in three new columns of the data frame:

df$Property.Damage.Amount <- apply(df[c('PROPDMG','PROPDMGEXP')], 1, lookup.exp.and.calc)
df$Crop.Damage.Amount <- apply(df[c('CROPDMG','CROPDMGEXP')], 1, lookup.exp.and.calc)
df$Total.Damage.Amount <- df$Property.Damage.Amount + df$Crop.Damage.Amount

Next, some work is needed to enforce consistency in the EVTYPE column of the data frame. To facilitate matching each raw entry in the EVTYPE column against the 48 standard event types, the comparisons are case insensitive. The shorthand notation “Tstm Wind” is taken to mean “Thunderstorm Wind”, and some other human-interpretable entries are listed in the helper mapping.

Some longer standard event type names literally include shorter ones. For example, “Excessive Heat” contains “Heat”, and “Marine Hail” contains “Hail”. We therefore search for the longer standard event type names first to avoid categorizing an entry under the wrong event type.

Since to err is human, it is entirely possible that a raw entry does not match any of the 48 standard event type names. For example, raw event types that start with the word “Summary” seem to have been entered for administrative purposes. An event type may also be misspelled. In all of these cases we simply keep the original event type as the final event type. These cases do not affect the end result of the analysis, since their fatality, injury, and economic damage tallies are minuscule in comparison to those of the leading standard event types in the Results section.

# Clean up the event type a little bit
evcate <- c('Marine Hail', 'Marine High Wind', 'Marine Strong Wind', 'Marine Thunderstorm Wind', 'Excessive Heat',
            'Extreme Cold/Wind Chill', 'Heavy Snow', 'Astronomical Low Tide', 'Avalanche', 'Blizzard',
            'Coastal Flood', 'Cold/Wind Chill', 'Debris Flow', 'Dense Fog', 'Dense Smoke',
            # 'Lakeshore Flood' is listed before 'Flood' so that the longer name is matched first
            'Drought', 'Dust Devil', 'Dust Storm', 'Flash Flood', 'Lakeshore Flood',
            'Flood', 'Freezing Fog', 'Frost/Freeze', 'Funnel Cloud', 'Hail',
            'Heat', 'Heavy Rain', 'High Surf', 'High Wind', 'Hurricane (Typhoon)',
            'Ice Storm', 'Snow', 'Lightning', 'Rip Current', 'Seiche',
            'Sleet', 'Storm Surge/Tide', 'Strong Wind', 'Thunderstorm Wind', 'Tornado',
            'Tropical Depression', 'Tropical Storm', 'Tsunami', 'Volcanic Ash', 'Waterspout',
            'Wildfire', 'Winter Storm', 'Winter Weather')

# The helper mapper
evcate.helpmap <- list('extreme cold' = 'Extreme Cold/Wind Chill',
                       'extreme wind chill' = 'Extreme Cold/Wind Chill',
                       'cold' = 'Cold/Wind Chill',
                       'wind chill' = 'Cold/Wind Chill',
                       'frost' = 'Frost/Freeze',
                       'freeze' = 'Frost/Freeze',                 
                       'hurricane' = 'Hurricane (Typhoon)',
                       'typhoon' = 'Hurricane (Typhoon)',
                       'storm surge' = 'Storm Surge/Tide',
                       'storm tide' = 'Storm Surge/Tide',
                       'tstm wind' = 'Thunderstorm Wind',
                       'blowing snow' = 'Blizzard')

# The cleaning routine
clean.evtype <- function(otype) {

    # Convert to lower case, trim whitespace for the event type name
    otype <- tolower(as.character(otype))
    otype <- gsub("^\\s+|\\s+$", "", otype)
    
    # First pass, scan inside the helper table evcate.helpmap
    for (possibleOtype in names(evcate.helpmap)) {
        if (grepl(possibleOtype, otype, ignore.case = TRUE)) {
            return(evcate.helpmap[[possibleOtype]])
        }
    }
    
    # Second pass, scan for the standard event type names themselves
    for (evc in evcate) {
        if (grepl(evc, otype, ignore.case = TRUE)) {
          return(evc)
        }
    }
    
    # Third option: nothing matched, so return the original event type in title case
    otype <- gsub("\\b([a-z])([a-z]+)", "\\U\\1\\L\\2" , otype, perl=TRUE)
    return(otype)
}

# Apply the cleaning function and create new column EVCATE, the cleaned version of EVTYPE
df$EVCATE <- apply(df[c('EVTYPE')], 1, clean.evtype)
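
The effect of the search order can be illustrated with a small, self-contained sketch. The helper first_match() is hypothetical and not part of the routine above. It uses plain substring matching instead of regular expressions. Sorting the candidates by length is shown only as one possible way to guarantee the ordering; it is not what the list above does.

```r
# Hypothetical helper: return the first candidate name found inside the raw entry
first_match <- function(raw, candidates) {
  for (cand in candidates) {
    # Case-insensitive literal substring match
    if (grepl(tolower(cand), tolower(raw), fixed = TRUE)) {
      return(cand)
    }
  }
  NA_character_
}

# Shorter name listed first: the entry lands in the wrong category
wrong <- first_match("EXCESSIVE HEAT", c("Heat", "Excessive Heat"))

# Sorting by decreasing length puts "Excessive Heat" ahead of "Heat"
candidates <- c("Heat", "Excessive Heat")
by_length <- candidates[order(-nchar(candidates))]
right <- first_match("EXCESSIVE HEAT", by_length)

print(c(wrong = wrong, right = right))
```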

Results

The total fatalities, injuries, and damage amounts are tallied for each event type.

library(plyr)
dfByEvcate <- ddply(df[c('EVCATE','FATALITIES','INJURIES','Total.Damage.Amount','Property.Damage.Amount','Crop.Damage.Amount')], .(EVCATE), numcolwise(sum))
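
For readers without plyr, the same kind of per-type total can be sketched in base R with aggregate(). The toy data frame below is invented for illustration and does not reflect the storm data.

```r
# Toy data (made up): several events per category
toy <- data.frame(EVCATE     = c("Tornado", "Flood", "Tornado", "Heat"),
                  FATALITIES = c(3, 1, 2, 4),
                  INJURIES   = c(10, 0, 5, 2))

# Sum every numeric column within each event category
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVCATE, data = toy, FUN = sum)

# Rank the categories by total fatalities, largest first
ranked <- totals[order(-totals$FATALITIES), ]
print(ranked)
```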

For the fatality analysis, we look at the top 10:

# Most harmful - fatalities
dfEvForFat <- dfByEvcate[with(dfByEvcate, order(-FATALITIES)),c('EVCATE','FATALITIES')]
dfEvForFat <- dfEvForFat[1:10,]
par(mai=c(1.5,3.0,1,1))
barplot(dfEvForFat$FATALITIES, col='red', horiz = TRUE,
        main = 'Top 10 Fatality-Causing Weather Event Types in the US',
        xlab = '# of Fatalities', las = 1,
        names.arg = dfEvForFat$EVCATE)

[Figure: horizontal bar plot, “Top 10 Fatality-Causing Weather Event Types in the US”; x-axis: # of Fatalities]

Tornadoes appear to be by far the most dangerous event type in this respect, having caused more than 5000 reported deaths from 1950 to 2011. Excessive heat and heat are the second and third leading causes of fatalities, respectively. We also look at the top 10 injury-causing events:

# Most harmful - injuries
dfEvForInj <- dfByEvcate[with(dfByEvcate, order(-INJURIES)),c('EVCATE','INJURIES')]
dfEvForInj <- dfEvForInj[1:10,]
par(mai=c(1.5,3.0,1,1))
barplot(dfEvForInj$INJURIES, col='yellow', horiz = TRUE,
        main = 'Top 10 Injury-Causing Weather Event Types in the US',
        xlab = '# of Injuries', las = 1,
        names.arg = dfEvForInj$EVCATE)

[Figure: horizontal bar plot, “Top 10 Injury-Causing Weather Event Types in the US”; x-axis: # of Injuries]

Tornadoes caused more than 90,000 injuries in this 61-year period, making them the leading cause of injuries as well. Thunderstorm wind and flood came second and third, respectively.

Turning to the economic effects of storm events, we graph the top 10 events causing the most financial damage, along with the breakdown into property damage and crop damage:

# Most economically damaging
dfEvForDam <- dfByEvcate[with(dfByEvcate, order(-Total.Damage.Amount)),c('EVCATE','Total.Damage.Amount','Property.Damage.Amount','Crop.Damage.Amount')]
dfEvForDam <- dfEvForDam[1:10,]
# Grouped bar plot
library(ggplot2)
library(reshape2)
dfEvForDamMelted <- melt(dfEvForDam[,c('EVCATE','Total.Damage.Amount','Property.Damage.Amount','Crop.Damage.Amount')],id.vars = 1)
g3 <- ggplot(dfEvForDamMelted, aes(x = EVCATE, y = value))
g3 <- g3 + geom_bar(aes(fill = variable),position = "dodge", stat="identity")
g3 <- g3 + ggtitle("Top 10 Economically Damaging Weather Event Types in the US")
g3 <- g3 + xlab("Event Type") + ylab("Cost in USD") + coord_flip()
print(g3)

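[Figure: grouped bar plot, “Top 10 Economically Damaging Weather Event Types in the US”; axes: Event Type, Cost in USD; bars: total, property, and crop damage]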

Floods caused the most economic damage in the US from 1950 to 2011. Notably, property damage accounts for over 90% of the total damage amount in each of the top four event categories: flood, hurricane, tornado, and storm surge/tide. In the top 10, only drought and ice storm have caused more crop damage than property damage.
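
The property share and the crop-versus-property comparison mentioned above can be computed along the following lines. This is a sketch on invented toy numbers with placeholder category names, not the actual tallies. The columns Property.Share and Crop.Exceeds.Property are hypothetical additions.

```r
# Toy totals (made up) with the same damage columns as the analysis
toy <- data.frame(EVCATE = c("Type A", "Type B"),
                  Property.Damage.Amount = c(900, 20),
                  Crop.Damage.Amount = c(100, 80))
toy$Total.Damage.Amount <- toy$Property.Damage.Amount + toy$Crop.Damage.Amount

# Fraction of the total damage that is property damage
toy$Property.Share <- toy$Property.Damage.Amount / toy$Total.Damage.Amount

# Flag the categories where crop damage exceeds property damage
toy$Crop.Exceeds.Property <- toy$Crop.Damage.Amount > toy$Property.Damage.Amount
print(toy)
```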