Purpose

Identify the weather events that most affect a country’s population health, as well as those that cause the most damage.

Synopsis

After downloading the storm data from the specified site, read in the data file and identify the weather events that have fatalities, property damage, or crop damage associated with them. Because the event descriptions are free-form text, they are likely to be inconsistent, duplicated, or misspelled. For that reason, group the weather events into more consistent categories before identifying the top 25 events resulting in the most fatalities, property damage, and crop damage.

Data Processing

Retrieve the storm data file from the specified source, e.g. https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2. Save the file as “repdata_data_StormData.csv.bz2”. The program reads the bz2 (Bzip2 compressed) formatted file directly as the initial processing step.

# Source file obtained from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2
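# bzfile() takes the archive name only; its second argument is the open mode,
# not an output file name, so it is left at its default here
storm_data <- read.csv(bzfile("repdata_data_StormData.csv.bz2")
                       , header = TRUE
                       , sep = ",")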
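
The retrieval step can also be scripted so the report is reproducible from a clean directory. The following is a minimal sketch, assuming the URL and file name given above; `fetch_if_missing` is a hypothetical helper name, not part of the original analysis.

```r
# Hypothetical helper: download the archive only when it is not already on disk.
fetch_if_missing <- function(url, dest) {
    if (!file.exists(dest)) {
        # mode = "wb" keeps the bzip2 archive byte-for-byte intact on Windows
        download.file(url, destfile = dest, mode = "wb")
    }
    invisible(dest)
}

storm_url  <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
storm_file <- "repdata_data_StormData.csv.bz2"
# fetch_if_missing(storm_url, storm_file)   # uncomment to download
```

The call is left commented out so that running the sketch does not trigger a large download by accident.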

A preliminary analysis of the file revealed over 150 unique weather events. Many were duplicates or were very similar to each other while others were misspelled. In an attempt to gain more meaningful results, the weather events were evaluated and regrouped into more generalized categories. The function, assignGroup, takes a weather event as an input parameter and returns a more consistent and standardized weather grouping in all capital letters.

# group weather events into consistent categories
assignGroup <- function(p_type) {
    if (!is.null(p_type)) {
        
        p_type <- toupper(p_type)
        
        # manually investigate and re-group items
        if (p_type == "AVALANCE") {
            p_type <- "AVALANCHE"
            }
        else if (p_type == "COASTAL FLOOD") {
            p_type <- "COASTAL FLOODING"
            }
        else if (p_type == "COASTALSTORM") {
            p_type <- "COASTAL STORM"
            }
        else if (p_type %in% c("COLD"
                               ,"COLD AND SNOW"
                               ,"COLD TEMPERATURE"
                               ,"COLD WAVE"
                               ,"COLD WEATHER"
                               ,"COLD/WINDS")) {
            p_type <- "COLD/WIND CHILL"
            }
        else if (p_type %in% c("DROUGHT/EXCESSIVE HEAT"
                               , "EXCESSIVE HEAT"
                               , "EXTREME HEAT"
                               , "RECORD/EXCESSIVE HEAT"
                               , "RECORD HEAT") 
                 || (regexpr("^UNSEASONABLY WARM", p_type) > 0)
                 ) {
            p_type <- "DROUGHT/EXCESSIVE HEAT"
            }
        else if (p_type %in% c("EXTENDED COLD"
                               , "EXTREME COLD/WIND CHILL"
                               , "LOW TEMPERATURE"
                               , "RECORD COLD"
                               , "UNSEASONABLY COLD")) {
            p_type <- "EXTREME COLD"
            }
        else if (p_type == "FALLING SNOW/ICE") {
            p_type <- "SNOW AND ICE"
            }
        else if (p_type %in% c("FLASH FLOOD"
                               , "FLASH FLOOD/FLOOD"
                               , "FLASH FLOODING"
                               , "FLASH FLOODING/FLOOD"
                               , "FLASH FLOODS"
                               , "FLOOD"
                               , "FLOOD & HEAVY RAIN"
                               , "FLOODING"
                               , "FLOOD/RIVER FLOOD"
                               , "MINOR FLOODING"
                               , "RAPIDLY RISING WATER"
                               , "URBAN/SML STREAM FLD"
                               , "URBAN AND SMALL STREAM FLOODIN") 
                 || (regexpr("^RIVER FLOOD", p_type) > 0)) {
            p_type <- "FLOOD/FLASH FLOOD"
            }
        else if (p_type == "FOG AND COLD TEMPERATURES") {
            p_type <- "FOG"
            }
        else if (regexpr("^FREEZING", p_type) > 0) {
            p_type <- "FREEZE"
            }
        else if (p_type == "GLAZE") {
            p_type <- "FROST"
            }
        else if (p_type == "GUSTY WIND") {
            p_type <- "GUSTY WINDS"
            }
        else if (regexpr("^HEAT WAVE", p_type) > 0) {
            p_type <- "HEAT"
            }
        else if (p_type == "HEAVY SNOW AND HIGH WINDS") {
            p_type <- "HEAVY SNOW"
            }
        else if (p_type %in% c("HEAVY SURF AND WIND"
                               , "HEAVY SURF/HIGH SURF"
                               , "ROUGH SURF")) {
            p_type <- "HEAVY SURF"
            }
        else if (p_type %in% c("HIGH SWELLS"
                               , "HIGH WATER"
                               , "HIGH WAVES")) {
            p_type <- "HIGH SURF"
            }
        else if (regexpr("^HIGH WIND", p_type) > 0) {
            p_type <- "HIGH WINDS"
            }
        else if (regexpr("^HURRICANE", p_type) > 0) {
            p_type <- "HURRICANE/TYPHOON"
            }
        else if (regexpr("^(HYPOTNERMIA|HYPOTHERMIA|HYPTHERMIA)", p_type) > 0) {
            p_type <- "HYPOTHERMIA/EXPOSURE"
            }
        else if (regexpr("^(ICE|ICY)", p_type) > 0) {
            p_type <- "ICE"
            }
        else if (regexpr("^LANDSLIDE", p_type) > 0) {
            p_type <- "LANDSLIDES"
            }
        else if (p_type %in% c("LIGHT SNOW"
                               , "HEAVY SNOW")) {
            p_type <- "SNOW"
            }
        else if (p_type == "LIGHTNING.") {
            p_type <- "LIGHTNING"
            }
        else if (p_type == "MARINE MISHAP") {
            p_type <- "MARINE ACCIDENT"
            }
        else if (regexpr("^MUDSLIDE", p_type) > 0) {
            p_type <- "MUDSLIDES"
            }
        else if (regexpr("^RAIN/", p_type) > 0) {
            p_type <- "MIXED PRECIP"        
            }
        else if (regexpr("^RIP CURRENT", p_type) > 0) {
            p_type <- "RIP CURRENTS"
            }
        else if (p_type == "SNOW/ BITTER COLD") {
            p_type <- "SNOW"
            }
        else if (regexpr("^STRONG WIND", p_type) > 0) {
            p_type <- "STRONG WINDS"
            }
        else if (regexpr("^(THUNDERSTORM WIND|THUNDERTORM)", p_type) > 0) {
            p_type <- "THUNDERSTORM WINDS"
            }
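        else if (regexpr("^(TORNADO|TSTM WIND|MARINE TSTM WIND)", p_type) > 0) {
            # TSTM WIND abbreviates thunderstorm wind; this rule nevertheless
            # counts it under TORNADOS, so those events are included in the
            # TORNADOS totals
            p_type <- "TORNADOS"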
            }
        else if (regexpr("^TROPICAL STORM", p_type) > 0) {
            p_type <- "TROPICAL STORMS"        
            }
        else if (regexpr("^WATERSPOUT", p_type) > 0) {
            p_type <- "WATERSPOUTS"        
            }
        else if (regexpr("^WILD", p_type) > 0) {
            p_type <- "WILDFIRES"        
            }
        else if (regexpr("^WIND", p_type) > 0) {
            p_type <- "WINDS"        
            }
        else if (regexpr("^WINTER STORM", p_type) > 0) {
            p_type <- "WINTER STORMS"        
            }
        else if (regexpr("^(WINTER WEATHER|WINTRY MIX)", p_type) > 0) {
            p_type <- "WINTER WEATHER/MIX"        
            }
        }
    return (p_type)
    }
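
The same idea can be written as a rule table instead of a long if/else chain, which also lets it handle a whole vector of event types at once. This is only a sketch under stated assumptions: `group_rules` and `assign_group_vec` are hypothetical names, and the table holds just a few of the prefix rules from assignGroup for illustration.

```r
# Hypothetical rule table: regular expression -> standardized group.
# Only a handful of the prefix rules from assignGroup are reproduced here.
group_rules <- c(
    "^HURRICANE"      = "HURRICANE/TYPHOON",
    "^RIP CURRENT"    = "RIP CURRENTS",
    "^WILD"           = "WILDFIRES",
    "^TROPICAL STORM" = "TROPICAL STORMS"
)

# Return the group of the first matching rule, or the upper-cased input
# unchanged when no rule matches.
assign_group_vec <- function(types, rules = group_rules) {
    out <- toupper(types)
    matched <- rep(FALSE, length(out))
    for (pattern in names(rules)) {
        hit <- !matched & grepl(pattern, out)
        out[hit] <- rules[[pattern]]
        matched <- matched | hit
    }
    out
}

assign_group_vec(c("Hurricane Opal", "rip currents", "WILD FIRES", "HAIL"))
```

As in assignGroup, an event type that matches no rule is returned unchanged apart from the upper-casing.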

Property damage (PROPDMG) and crop damage (CROPDMG) estimates must be read together with their unit columns, PROPDMGEXP and CROPDMGEXP, which give the magnitude of the dollar amount: “K” for thousands, “M” for millions, and “B” for billions of dollars. Any other unit value is treated as noise and ignored, so the amount is used as recorded. The function getDamageInDollars takes two input parameters, a dollar amount and its magnitude code, and returns the actual dollar amount.

getDamageInDollars <- function(p_amt, p_unit) {

    p_unit <- toupper(p_unit)    
    totalAmt <- p_amt
    
    if (p_unit %in% c("K", "M", "B")) {
        if (p_unit == "K") {
            totalAmt <- p_amt * 1000
            }
        else if (p_unit == "M") {
            totalAmt <- p_amt * 10^6
            }
        else if (p_unit == "B") {
            totalAmt <- p_amt * 10^9
            }
        } 
    
    return (totalAmt)
    }
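
Because the conversion is a fixed lookup, it can also be applied to whole columns at once rather than one row at a time. The sketch below assumes the same rules as getDamageInDollars (K, M, B in either case; anything else leaves the amount unchanged). `unit_multiplier` and `damage_in_dollars` are hypothetical names introduced for illustration.

```r
# Multipliers for the recognised magnitude codes; any other code maps to 1,
# i.e. the amount is left unchanged, matching getDamageInDollars.
unit_multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

damage_in_dollars <- function(amt, unit) {
    # Look up each code by name; unknown or empty codes yield NA
    mult <- unit_multiplier[toupper(as.character(unit))]
    mult[is.na(mult)] <- 1
    amt * unname(mult)
}

damage_in_dollars(c(25, 2.5, 1, 10), c("K", "m", "B", "?"))
```

With a function of this shape, a call such as `damage_in_dollars(damages$PROPDMG, damages$PROPDMGEXP)` would be one way to fill a whole column without a row-by-row loop.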

To identify the weather events that most affect population health, as well as those that cause the greatest amount of damage, create aggregate datasets from the following columns: FATALITIES, PROPDMG, and CROPDMG.

After grouping the data, graph the top 25 weather events that result in the greatest number of fatalities and those that result in the greatest amount of property and crop damage.
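
The aggregate, regroup, and rank steps described above can be tried on a small made-up data frame first. In this sketch, `toy`, the `group` column, and the `ifelse` stand-in for assignGroup are illustrative assumptions, not part of the analysis.

```r
# Toy data standing in for storm_data (values are made up for illustration).
toy <- data.frame(
    EVTYPE     = c("TORNADO", "TORNADO F2", "FLOOD", "FLASH FLOOD", "HAIL"),
    FATALITIES = c(3, 1, 2, 4, 0),
    stringsAsFactors = FALSE
)

# Stand-in for assignGroup: collapse spelling variants into one label.
toy$group <- ifelse(grepl("^TORNADO", toy$EVTYPE), "TORNADOS",
             ifelse(grepl("FLOOD", toy$EVTYPE), "FLOOD/FLASH FLOOD", toy$EVTYPE))

# Sum fatalities per group, drop groups with none, sort descending,
# and keep the top 2 (the report keeps the top 25).
by_group <- aggregate(FATALITIES ~ group, data = toy, FUN = sum)
by_group <- by_group[by_group$FATALITIES > 0, ]
top <- head(by_group[order(-by_group$FATALITIES), ], 2)
top
```

Referring to columns by bare name in the formula, with `data =` supplying the data frame, gives the result readable column names without a separate `colnames()` call.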

# -----------------------------------------------------------------------------
# aggregate by fatalities
# -----------------------------------------------------------------------------
storm_data.fatalities <- aggregate(storm_data$FATALITIES ~ storm_data$EVTYPE
                                   , data = storm_data
                                   , FUN = sum)
colnames(storm_data.fatalities) <- c("type", "fatalities")

# only display events where fatalities > 0
fatalities_above_zero <- subset(storm_data.fatalities, fatalities > 0)
fatalities_above_zero$new_group <- NULL
fatalities_above_zero$new_group <- toupper(as.character(fatalities_above_zero$type))
for (i in seq(from=1, to=nrow(fatalities_above_zero))) 
    fatalities_above_zero$new_group[i] <- assignGroup(as.character(fatalities_above_zero$type[i]))    

# regroup data    
fatalities_above_zero.regrouped <- aggregate(fatalities_above_zero$fatalities 
                                             ~ fatalities_above_zero$new_group
                                             , data = fatalities_above_zero
                                             , FUN = sum)
colnames(fatalities_above_zero.regrouped) <- c("type", "fatalities")

t1 <- fatalities_above_zero.regrouped[order(-fatalities_above_zero.regrouped$fatalities),]

# retrieve top 25 events
t1 <- head(t1, 25)
# -----------------------------------------------------------------------------
# create a subset of storm_data data representing only those records whose
# property and crop damage amounts are greater than 0
# -----------------------------------------------------------------------------
damages <- storm_data[storm_data$PROPDMG+storm_data$CROPDMG > 0
                 , c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
# create columns to hold total property and crop damage amounts
damages$totalpropdmg <- NULL
damages$totalcropdmg <- NULL

# Note: this is a very time-consuming operation
for (i in seq(from=1, to=nrow(damages))) {
    damages$totalpropdmg[i] <- getDamageInDollars(damages$PROPDMG[i], damages$PROPDMGEXP[i])
    damages$totalcropdmg[i] <- getDamageInDollars(damages$CROPDMG[i], damages$CROPDMGEXP[i])
    }

# split into property damage and crop damage subsets,
# keeping only those records whose dollar amount is > 0
damages_prop <- damages[damages$totalpropdmg > 0,]
damages_crop <- damages[damages$totalcropdmg > 0,]
# -----------------------------------------------------------------------------
# aggregate by property damage
# -----------------------------------------------------------------------------
storm_data.propdmg <- aggregate(damages_prop$totalpropdmg ~ damages_prop$EVTYPE
                                , data = damages_prop
                                , FUN = sum)
colnames(storm_data.propdmg) <- c("type", "propdmg")

storm_data.propdmg$new_group <- NULL
storm_data.propdmg$new_group <- toupper(as.character(storm_data.propdmg$type))
for (i in seq(from=1, to=nrow(storm_data.propdmg))) 
    storm_data.propdmg$new_group[i] <- assignGroup(as.character(storm_data.propdmg$type[i]))    
# regroup data    
storm_data.propdmg.regrouped <- aggregate(storm_data.propdmg$propdmg 
                                          ~ storm_data.propdmg$new_group
                                          , data = storm_data.propdmg
                                          , FUN = sum)
colnames(storm_data.propdmg.regrouped) <- c("type", "propdmg")

t2 <- storm_data.propdmg.regrouped[order(-storm_data.propdmg.regrouped$propdmg),]

# retrieve top 25 events
t2 <- head(t2, 25)
# set property damage value in millions of dollars
t2$propdmg <- t2$propdmg / 10^6
# -----------------------------------------------------------------------------
# aggregate by crop damage
# -----------------------------------------------------------------------------
storm_data.cropdmg <- aggregate(damages_crop$totalcropdmg ~ damages_crop$EVTYPE
                                , data = damages_crop
                                , FUN = sum)
colnames(storm_data.cropdmg) <- c("type", "cropdmg")

storm_data.cropdmg$new_group <- NULL
storm_data.cropdmg$new_group <- toupper(as.character(storm_data.cropdmg$type))
for (i in seq(from=1, to=nrow(storm_data.cropdmg))) 
    storm_data.cropdmg$new_group[i] <- assignGroup(as.character(storm_data.cropdmg$type[i]))    
# regroup data    
storm_data.cropdmg.regrouped <- aggregate(storm_data.cropdmg$cropdmg 
                                          ~ storm_data.cropdmg$new_group
                                          , data = storm_data.cropdmg
                                          , FUN = sum)
colnames(storm_data.cropdmg.regrouped) <- c("type", "cropdmg")

t3 <- storm_data.cropdmg.regrouped[order(-storm_data.cropdmg.regrouped$cropdmg),]

# retrieve top 25 events
t3 <- head(t3, 25)
# set crop damage value in millions of dollars
t3$cropdmg <- t3$cropdmg / 10^6
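
The plotting code is not reproduced in this report. The following is a minimal sketch of one way to chart a top-N table with base R's `barplot`, assuming a data frame shaped like t1. The toy values, title, and margin settings are illustrative and are not the author's.

```r
# Toy stand-in for t1 (values are made up for illustration).
top_events <- data.frame(
    type       = c("TORNADOS", "FLOOD/FLASH FLOOD", "LIGHTNING"),
    fatalities = c(30, 12, 5),
    stringsAsFactors = FALSE
)

# Draw to a null PDF device so the sketch also runs non-interactively;
# drop the pdf(NULL) and dev.off() lines to see the plot on screen.
pdf(NULL)
old_par <- par(mar = c(4, 10, 2, 1))   # widen the left margin for long labels
bar_mids <- barplot(
    rev(top_events$fatalities),         # reversed so the largest bar is on top
    names.arg = rev(top_events$type),
    horiz = TRUE, las = 1,
    xlab = "Fatalities",
    main = "Weather events with the most fatalities"
)
par(old_par)
dev.off()
```

Horizontal bars are used here because the event labels are long; the same call with `cropdmg` or `propdmg` as the height would cover the damage tables.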

Results

The types of weather events with the greatest number of fatalities:

[Figure: top 25 weather event types by number of fatalities]

##                      type fatalities
## 56               TORNADOS       6177
## 9  DROUGHT/EXCESSIVE HEAT       2060
## 17      FLOOD/FLASH FLOOD       1548
## 23                   HEAT       1118
## 36              LIGHTNING        817
## 43           RIP CURRENTS        577
## 15           EXTREME COLD        298
## 30             HIGH WINDS        293
## 1               AVALANCHE        225
## 63          WINTER STORMS        217
## 55     THUNDERSTORM WINDS        200
## 7         COLD/WIND CHILL        158
## 31      HURRICANE/TYPHOON        135
## 46                   SNOW        134
## 52           STRONG WINDS        111
## 29              HIGH SURF        109
## 3                BLIZZARD        101
## 34                    ICE        101
## 24             HEAVY RAIN         98
## 61              WILDFIRES         90
## 57        TROPICAL STORMS         66
## 18                    FOG         63
## 64     WINTER WEATHER/MIX         62
## 27             HEAVY SURF         57
## 35             LANDSLIDES         39

The types of weather events that produce the greatest amount of property damage (in millions of dollars):

[Figure: top 25 weather event types by property damage]

##                          type  propdmg
## 52          FLOOD/FLASH FLOOD 166975.6
## 119         HURRICANE/TYPHOON  84756.2
## 211                  TORNADOS  63077.3
## 190               STORM SURGE  43323.5
## 73                       HAIL  15732.3
## 230                 WILDFIRES   8491.6
## 214           TROPICAL STORMS   7714.4
## 232             WINTER STORMS   6749.0
## 118                HIGH WINDS   6003.4
## 201        THUNDERSTORM WINDS   5223.1
## 191          STORM SURGE/TIDE   4641.2
## 120                       ICE   3972.1
## 92  HEAVY RAIN/SEVERE WEATHER   2500.0
## 164       SEVERE THUNDERSTORM   1205.4
## 30                    DROUGHT   1046.1
## 170                      SNOW    950.0
## 132                 LIGHTNING    928.7
## 88                 HEAVY RAIN    694.2
## 11                   BLIZZARD    659.2
## 218                   TYPHOON    600.2
## 19           COASTAL FLOODING    392.5
## 125                LANDSLIDES    324.7
## 83                  HAILSTORM    241.0
## 192              STRONG WINDS    177.7
## 216                   TSUNAMI    144.1

The types of weather events that produce the greatest amount of crop damage (in millions of dollars):

[Figure: top 25 weather event types by crop damage]

##                       type  cropdmg
## 9                  DROUGHT 13972.57
## 18       FLOOD/FLASH FLOOD 12268.99
## 44       HURRICANE/TYPHOON  5515.29
## 46                     ICE  5027.11
## 28                    HAIL  3025.95
## 16            EXTREME COLD  1338.07
## 24            FROST/FREEZE  1094.19
## 68                TORNADOS  1036.17
## 38              HEAVY RAIN   733.40
## 69         TROPICAL STORMS   694.90
## 43              HIGH WINDS   686.30
## 64      THUNDERSTORM WINDS   605.69
## 10  DROUGHT/EXCESSIVE HEAT   497.42
## 22                  FREEZE   456.73
## 37                    HEAT   407.06
## 76               WILDFIRES   402.78
## 8          DAMAGING FREEZE   296.23
## 15       EXCESSIVE WETNESS   142.00
## 57                    SNOW   134.66
## 19        FLOOD/RAIN/WINDS   112.80
## 2                 BLIZZARD   112.06
## 60            STRONG WINDS    69.95
## 5  COLD AND WET CONDITIONS    66.00
## 23                   FROST    66.00
## 40             HEAVY RAINS    60.50