Synopsis

NOAA’s significant weather event data (also known as Storm Data) for the United States from 1950 through 2011 were analyzed for fatalities, injuries, property damage and crop damage. Event types in the source data required some tidying. The property damage and crop damage magnitude fields also contained some invalid values, but these affected only a very small percentage of records.

Results show that Tornado, Excessive Heat, Heat, Flash Flood and High Wind are the top five causes of fatalities. For injuries, the top five events are Tornado, High Wind, Flood, Excessive Heat and Lightning. In terms of damage to property and crops, Flood, Hurricane (Typhoon), Tornado, Storm Surge/Tide and Hail are the top five events. See the analysis and graphs for further details.

Data Processing

The data analysis addresses the following questions:
1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?

The analysis and illustrations are intended to help a government or municipal manager who is responsible for preparing for severe weather events and needs to prioritize resources across different types of events.

Please note that this is a first step. Further analysis could examine trends over the years and differences by geography; both are left for future work.

Set global options

Ensure all the needed libraries are loaded. The code and results are echoed to output.

library(knitr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(reshape2)
library(ggplot2)
opts_chunk$set(echo = TRUE)

Load Data

The data is read directly from the bzip2-compressed (“bz2”) CSV file, which was downloaded from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2.

## validate that the compressed data file exists
dataFileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
filePath <- "./repdata_data_StormData.csv.bz2"
if (!file.exists(filePath))
{
    ## stop() raises an error and halts the report; quit() would end the whole R session
    stop(paste0("File '", filePath, "' does not exist. ",
                "Download dataset from '", dataFileURL, "' to the working directory"))
}

## read the file
stormDF <- read.csv(filePath, stringsAsFactors = FALSE)
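As an alternative to halting when the file is absent, the download step can be scripted. The following is a minimal sketch, not the author's code: `ensureDataFile` is a hypothetical helper name, and `mode = "wb"` is assumed to be appropriate so that the compressed file is written in binary mode.

```r
## Hypothetical helper (not part of the original analysis): download the
## compressed file only when it is not already present locally.
ensureDataFile <- function(filePath, fileURL)
{
    if (!file.exists(filePath))
    {
        ## mode = "wb" writes the bz2 file as binary
        download.file(fileURL, destfile = filePath, mode = "wb")
    }
    invisible(filePath)
}

## possible usage with the names defined above:
## ensureDataFile(filePath, dataFileURL)
## stormDF <- read.csv(filePath, stringsAsFactors = FALSE)
```

The helper leaves an existing file untouched, so re-running the report does not trigger a new download.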

Data Transformation: Description and Justification

The source data needed some tidying before it could answer the questions.
1. NOAA documentation lists 48 event types. However, the source data contains 985 distinct event types. Of these, 57 event types account for 10 or more deaths each. Those event types are mapped to the 48 standard event types. As we could not find a good mapping for Landslide, it is added as a new event type.
2. In the source data, 84 event types account for 10 or more injuries each. Those event types are mapped to the resulting 49 event types (original 48 plus Landslide).
3. Fields PROPDMGEXP and CROPDMGEXP are supposed to be blank, K for thousands, M for millions or B for billions. However, about 0.04% of PROPDMGEXP values (321 rows) and about 0.003% of CROPDMGEXP values (27 rows) are invalid; these are ignored and the multiplier is treated as 1. Lowercase “k”, “m” and “b” are mapped to “K”, “M” and “B”. This affects a very small subset of the data (7 rows for PROPDMGEXP and 22 rows for CROPDMGEXP, each under 0.003% of rows).
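
The shares of unrecognized magnitude codes in item 3 can be rechecked from the PROPDMGEXP and CROPDMGEXP distribution tables printed later in this report. The sketch below copies those counts by hand; the vector and function names are illustrative only.

```r
## counts copied from the table(stormDF$PROPDMGEXP) output in this report
propCounts <- c(blank = 465934, "-" = 1, "?" = 8, "+" = 5, "0" = 216, "1" = 25,
                "2" = 13, "3" = 4, "4" = 4, "5" = 28, "6" = 4, "7" = 5, "8" = 1,
                B = 40, h = 1, H = 6, K = 424665, m = 7, M = 11330)
## counts copied from the table(stormDF$CROPDMGEXP) output in this report
cropCounts <- c(blank = 618413, "?" = 7, "0" = 19, "2" = 1, B = 9,
                k = 21, K = 281832, m = 1, M = 1994)

## codes accepted by the analysis (blank, or K/M/B in either case)
validCodes <- c("blank", "K", "k", "M", "m", "B", "b")

## percentage of rows whose code is not accepted
invalidPct <- function(counts)
{
    invalid <- counts[!(names(counts) %in% validCodes)]
    100 * sum(invalid) / sum(counts)
}

totalRows <- sum(propCounts)     # both tables cover the same rows
propInvalidPct <- invalidPct(propCounts)
cropInvalidPct <- invalidPct(cropCounts)
print(c(rows = totalRows, prop = propInvalidPct, crop = cropInvalidPct))
```

With these counts, the unrecognized share works out to roughly 0.04% for PROPDMGEXP and roughly 0.003% for CROPDMGEXP.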

Data Transformation: EVTYPE

Storm data documentation (https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf) lists 48 event types whereas there are 985 levels in the dataset. We need to map these events in a reasonable way.

## initial peek into data
evtype <- as.factor(stormDF$EVTYPE)
str(evtype)
##  Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## create a vector of standard event types
stdEvtype <- c("Astronomical Low Tide", "Avalanche", "Blizzard", "Coastal Flood", "Cold/Wind Chill",
               "Debris Flow", "Dense Fog", "Dense Smoke", "Drought", "Dust Devil", "Dust Storm",
               "Excessive Heat", "Extreme Cold/Wind Chill", "Flash Flood", "Flood", "Frost/Freeze",
               "Funnel Cloud", "Freezing Fog", "Hail", "Heat", "Heavy Rain", "Heavy Snow", "High Surf",
               "High Wind", "Hurricane (Typhoon)", "Ice Storm", "Lake-Effect Snow", "Lakeshore Flood",
               "Lightning", "Marine Hail", "Marine High Wind", "Marine Strong Wind",
               "Marine Thunderstorm Wind", "Rip Current", "Seiche", "Sleet", "Storm Surge/Tide",
               "Strong Wind", "Thunderstorm Wind", "Tornado", "Tropical Depression", "Tropical Storm",
               "Tsunami", "Volcanic Ash", "Waterspout", "Wildfire", "Winter Storm", "Winter Weather")
    
## create a mapping from various forms of standard event types to standard event types
## we are using parallel vectors to maintain this mapping
## keys are upper case for case insensitive comparison; toupper() is vectorized
events <- toupper(stdEvtype)
stdEvents <- stdEvtype

## Need to convert:
##  "Cold/Wind Chill", "Cold" and "Wind Chill" maps to "Cold/Wind Chill" (use of '/')
##  "Lake-Effect Snow", "Lake Effect Snow" to "Lake-Effect Snow" (use of '-')
##  "Hurricane (Typhoon)", "Hurricane", "Typhoon" to "Hurricane (Typhoon)" (use of '(' and ')')
## Tried doing the above by code. However, found there are exceptions to these.
## Hence doing these by hand mapping!
additionalEvents <- c("EXTREME COLD", "EXTREME WIND CHILL", "COLD",
                      "WIND CHILL", "FROST", "FREEZE", "HURRICANE",
                      "TYPHOON", "LAKE EFFECT SNOW", "STORM SURGE",
                      "TIDE")
additionalStdEvents <- c("Extreme Cold/Wind Chill", "Extreme Cold/Wind Chill", "Cold/Wind Chill",
                         "Cold/Wind Chill", "Frost/Freeze", "Frost/Freeze", "Hurricane (Typhoon)",
                         "Hurricane (Typhoon)", "Lake-Effect Snow", "Storm Surge/Tide",
                         "Storm Surge/Tide")
events <- c(events, additionalEvents)
stdEvents <- c(stdEvents, additionalStdEvents)

## display top 100 events causing most fatalities
peopleHealthDF <- stormDF %>%
                  select(EVTYPE, FATALITIES, INJURIES) %>%
                  group_by(EVTYPE) %>%
                  summarize (totalFatalities=sum(FATALITIES), totalInjuries=sum(INJURIES)) %>%
                  arrange(desc(totalFatalities))
head(peopleHealthDF$EVTYPE, n = 100)
##   [1] "TORNADO"                    "EXCESSIVE HEAT"            
##   [3] "FLASH FLOOD"                "HEAT"                      
##   [5] "LIGHTNING"                  "TSTM WIND"                 
##   [7] "FLOOD"                      "RIP CURRENT"               
##   [9] "HIGH WIND"                  "AVALANCHE"                 
##  [11] "WINTER STORM"               "RIP CURRENTS"              
##  [13] "HEAT WAVE"                  "EXTREME COLD"              
##  [15] "THUNDERSTORM WIND"          "HEAVY SNOW"                
##  [17] "EXTREME COLD/WIND CHILL"    "STRONG WIND"               
##  [19] "BLIZZARD"                   "HIGH SURF"                 
##  [21] "HEAVY RAIN"                 "EXTREME HEAT"              
##  [23] "COLD/WIND CHILL"            "ICE STORM"                 
##  [25] "WILDFIRE"                   "HURRICANE/TYPHOON"         
##  [27] "THUNDERSTORM WINDS"         "FOG"                       
##  [29] "HURRICANE"                  "TROPICAL STORM"            
##  [31] "HEAVY SURF/HIGH SURF"       "LANDSLIDE"                 
##  [33] "COLD"                       "HIGH WINDS"                
##  [35] "TSUNAMI"                    "WINTER WEATHER"            
##  [37] "UNSEASONABLY WARM AND DRY"  "URBAN/SML STREAM FLD"      
##  [39] "WINTER WEATHER/MIX"         "TORNADOES, TSTM WIND, HAIL"
##  [41] "WIND"                       "DUST STORM"                
##  [43] "FLASH FLOODING"             "DENSE FOG"                 
##  [45] "EXTREME WINDCHILL"          "FLOOD/FLASH FLOOD"         
##  [47] "RECORD/EXCESSIVE HEAT"      "HAIL"                      
##  [49] "COLD AND SNOW"              "FLASH FLOOD/FLOOD"         
##  [51] "MARINE STRONG WIND"         "STORM SURGE"               
##  [53] "WILD/FOREST FIRE"           "STORM SURGE/TIDE"          
##  [55] "UNSEASONABLY WARM"          "MARINE THUNDERSTORM WIND"  
##  [57] "WINTER STORMS"              "MARINE TSTM WIND"          
##  [59] "ROUGH SEAS"                 "TROPICAL STORM GORDON"     
##  [61] "FREEZING RAIN"              "GLAZE"                     
##  [63] "HEAVY SURF"                 "LOW TEMPERATURE"           
##  [65] "MARINE MISHAP"              "STRONG WINDS"              
##  [67] "FLOODING"                   "HURRICANE ERIN"            
##  [69] "ICE"                        "COLD WEATHER"              
##  [71] "FLASH FLOODING/FLOOD"       "HEAT WAVES"                
##  [73] "HIGH SEAS"                  "ICY ROADS"                 
##  [75] "RIP CURRENTS/HEAVY SURF"    "SNOW"                      
##  [77] "TSTM WIND/HAIL"             "GUSTY WINDS"               
##  [79] "HEAT WAVE DROUGHT"          "HIGH WIND/SEAS"            
##  [81] "Hypothermia/Exposure"       "Mudslide"                  
##  [83] "RAIN/SNOW"                  "ROUGH SURF"                
##  [85] "SNOW AND ICE"               "COASTAL FLOOD"             
##  [87] "COASTAL STORM"              "Cold"                      
##  [89] "COLD WAVE"                  "DRY MICROBURST"            
##  [91] "HEAVY SEAS"                 "Heavy surf and wind"       
##  [93] "High Surf"                  "HIGH WATER"                
##  [95] "HIGH WIND AND SEAS"         "HIGH WINDS/SNOW"           
##  [97] "HYPOTHERMIA/EXPOSURE"       "WATERSPOUT"                
##  [99] "WATERSPOUT/TORNADO"         "WILD FIRES"
head(peopleHealthDF$totalFatalities, n = 100)
##   [1] 5633 1903  978  937  816  504  470  368  248  224  206  204  172  160
##  [15]  133  127  125  103  101  101   98   96   95   89   75   64   64   62
##  [29]   61   58   42   38   35   35   33   33   29   28   28   25   23   22
##  [43]   19   18   17   17   17   15   14   14   14   13   12   11   11   10
##  [57]   10    9    8    8    7    7    7    7    7    7    6    6    6    5
##  [71]    5    5    5    5    5    5    5    4    4    4    4    4    4    4
##  [85]    4    3    3    3    3    3    3    3    3    3    3    3    3    3
##  [99]    3    3
## there are 57 EVTYPES where fatalities are greater than or equal to 10
## map mislabeled/mistyped events from these 57 to the standard events
additionalEvents <- c("HEAT WAVE", "EXTREME HEAT", "HURRICANE/TYPHOON", "FOG",
                      "HEAVY SURF/HIGH SURF", "LANDSLIDE", "UNSEASONABLY WARM AND DRY",
                      "URBAN/SML STREAM FLD", "WINTER WEATHER/MIX", "WIND", "EXTREME WINDCHILL",
                      "WILD/FOREST FIRE", "UNSEASONABLY WARM")
additionalStdEvents <- c("Heat", "Excessive Heat", "Hurricane (Typhoon)", "Dense Fog",
                         "High Surf", "Landslide", "Excessive Heat",
                         "Flash Flood", "Winter Weather", "High Wind", "Extreme Cold/Wind Chill",
                         "Wildfire", "Excessive Heat")
events <- c(events, additionalEvents)
stdEvents <- c(stdEvents, additionalStdEvents)

## display top 100 events causing most injuries
peopleHealthDF <- peopleHealthDF %>%
                  arrange(desc(totalInjuries))
head(peopleHealthDF$EVTYPE, n = 100)
##   [1] "TORNADO"                  "TSTM WIND"               
##   [3] "FLOOD"                    "EXCESSIVE HEAT"          
##   [5] "LIGHTNING"                "HEAT"                    
##   [7] "ICE STORM"                "FLASH FLOOD"             
##   [9] "THUNDERSTORM WIND"        "HAIL"                    
##  [11] "WINTER STORM"             "HURRICANE/TYPHOON"       
##  [13] "HIGH WIND"                "HEAVY SNOW"              
##  [15] "WILDFIRE"                 "THUNDERSTORM WINDS"      
##  [17] "BLIZZARD"                 "FOG"                     
##  [19] "WILD/FOREST FIRE"         "DUST STORM"              
##  [21] "WINTER WEATHER"           "DENSE FOG"               
##  [23] "TROPICAL STORM"           "HEAT WAVE"               
##  [25] "HIGH WINDS"               "RIP CURRENTS"            
##  [27] "STRONG WIND"              "HEAVY RAIN"              
##  [29] "RIP CURRENT"              "EXTREME COLD"            
##  [31] "GLAZE"                    "AVALANCHE"               
##  [33] "EXTREME HEAT"             "HIGH SURF"               
##  [35] "WILD FIRES"               "ICE"                     
##  [37] "TSUNAMI"                  "TSTM WIND/HAIL"          
##  [39] "WIND"                     "URBAN/SML STREAM FLD"    
##  [41] "WINTRY MIX"               "WINTER WEATHER/MIX"      
##  [43] "Heat Wave"                "WINTER WEATHER MIX"      
##  [45] "LANDSLIDE"                "RECORD HEAT"             
##  [47] "HEAVY SURF/HIGH SURF"     "COLD"                    
##  [49] "HURRICANE"                "TROPICAL STORM GORDON"   
##  [51] "WATERSPOUT/TORNADO"       "DUST DEVIL"              
##  [53] "HEAVY SURF"               "STORM SURGE"             
##  [55] "SNOW/HIGH WINDS"          "SNOW SQUALL"             
##  [57] "ICY ROADS"                "SNOW"                    
##  [59] "WATERSPOUT"               "DRY MICROBURST"          
##  [61] "THUNDERSTORMW"            "MARINE THUNDERSTORM WIND"
##  [63] "MIXED PRECIP"             "EXTREME COLD/WIND CHILL" 
##  [65] "BLACK ICE"                "FREEZING RAIN"           
##  [67] "MARINE STRONG WIND"       "STRONG WINDS"            
##  [69] "EXCESSIVE RAINFALL"       "HIGH WIND AND SEAS"      
##  [71] "UNSEASONABLY WARM"        "WINTER STORMS"           
##  [73] "TORNADO F2"               "FLOOD/FLASH FLOOD"       
##  [75] "HEAT WAVE DROUGHT"        "FREEZING DRIZZLE"        
##  [77] "WINTER STORM HIGH WINDS"  "GLAZE/ICE STORM"         
##  [79] "BLOWING SNOW"             "COLD/WIND CHILL"         
##  [81] "THUNDERSTORM"             "HEAVY SNOW/ICE"          
##  [83] "SMALL HAIL"               "THUNDERSTORM  WINDS"     
##  [85] "FLASH FLOODING"           "MARINE TSTM WIND"        
##  [87] "HIGH SEAS"                "GUSTY WINDS"             
##  [89] "NON-SEVERE WIND DAMAGE"   "HIGH WINDS/SNOW"         
##  [91] "EXTREME WINDCHILL"        "STORM SURGE/TIDE"        
##  [93] "ROUGH SEAS"               "MARINE MISHAP"           
##  [95] "COASTAL FLOODING/EROSION" "TYPHOON"                 
##  [97] "High Surf"                "DROUGHT"                 
##  [99] "HEAVY RAINS"              "HIGH WINDS/COLD"
head(peopleHealthDF$totalInjuries, n = 100)
##   [1] 91346  6957  6789  6525  5230  2100  1975  1777  1488  1361  1321
##  [12]  1275  1137  1021   911   908   805   734   545   440   398   342
##  [23]   340   309   302   297   280   251   232   231   216   170   155
##  [34]   152   150   137   129    95    86    79    77    72    70    68
##  [45]    52    50    48    48    46    43    42    42    40    38    36
##  [56]    35    31    29    29    28    27    26    26    24    24    23
##  [67]    22    21    21    20    17    17    16    15    15    15    15
##  [78]    15    13    12    12    10    10    10     8     8     8     8
##  [89]     7     6     5     5     5     5     5     5     4     4     4
## [100]     4
## there are 84 EVTYPES where injuries are greater than or equal to 10
## map mislabeled/mistyped events from these 84 to the standard events
## No equivalents exist for LANDSLIDE - hence, leaving that as is
additionalEvents <- c("TSTM WIND", "WILD FIRES", "ICE", "TSTM WIND/HAIL", "WINTRY MIX",
                      "RECORD HEAT", "SNOW/HIGH WINDS", "SNOW SQUALL", "ICY ROADS", "SNOW",
                      "DRY MICROBURST", "THUNDERSTORMW", "MIXED PRECIP", "BLACK ICE", "FREEZING RAIN",
                      "EXCESSIVE RAINFALL", "FREEZING DRIZZLE", "BLOWING SNOW", "SMALL HAIL")
additionalStdEvents <- c("Thunderstorm Wind", "Wildfire", "Ice Storm", "Thunderstorm Wind", "Winter Weather",
                         "Excessive Heat", "High Wind", "Winter Storm", "Frost/Freeze", "Heavy Snow",
                         "Thunderstorm Wind", "Thunderstorm Wind", "Heavy Rain", "Frost/Freeze", "Heavy Rain",
                         "Heavy Rain", "Heavy Rain", "Winter Storm", "Hail")
events <- c(events, additionalEvents)
stdEvents <- c(stdEvents, additionalStdEvents)
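
The number of event types at or above a threshold was read off the sorted listings above. A small check like the following avoids counting by eye. It is a sketch that uses only the tail of the printed injury totals, and `countAtLeast` is a hypothetical helper name.

```r
## hypothetical helper: number of totals at or above a threshold
countAtLeast <- function(totals, threshold)
{
    sum(totals >= threshold)
}

## positions 67 to 88 of the sorted injury totals printed above
injuryTail <- c(22, 21, 21, 20, 17, 17, 16, 15, 15, 15, 15,
                15, 13, 12, 12, 10, 10, 10, 8, 8, 8, 8)

## the list is sorted in descending order, so the 66 totals before
## this tail are all larger and all qualify
injuryEventCount <- 66 + countAtLeast(injuryTail, 10)
print(injuryEventCount)
```

On the full data, `countAtLeast(peopleHealthDF$totalInjuries, 10)` would give the count directly, without the manual offset.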

## the following is based on some observed typos and liberties taken with naming
additionalEvents <- c("AVALANCE", "DUST DEVEL", "DUSTSTORM", "HURRICANE", "LIGHTING",
                      "LIGNTNING", "TSTM", "THUNDERSTORM")
additionalStdEvents <- c("Avalanche", "Dust Devil", "Dust Storm", "Hurricane (Typhoon)", "Lightning",
                         "Lightning", "Thunderstorm Wind", "Thunderstorm Wind")
events <- c(events, additionalEvents)
stdEvents <- c(stdEvents, additionalStdEvents)

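## create new variable for the updated EVTYPE
## note: the first entry of `events` found inside EVTYPE wins, so order matters;
## a short entry such as "WIND" shadows later entries such as "TSTM WIND"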
stormDF$newEvtype <- apply(stormDF, 1, function(x)
                                    {
                                        eventToMatch <- toupper(x["EVTYPE"])
                                        ## literal substring matches first
                                        ## (fixed = TRUE disables regex; this is not an exact match)
                                        for (idx in 1:length(events))
                                        {
                                            if (grepl(events[idx], eventToMatch, fixed = TRUE))
                                            {
                                                return(stdEvents[idx])
                                            }
                                        }
                                        ## regular-expression matches next (entries of events are treated as patterns)
                                        for (idx in 1:length(events))
                                        {
                                            if (grepl(events[idx], eventToMatch))
                                            {
                                                return(stdEvents[idx])
                                            }
                                        }
                                        return("other")
                                    })
stormDF$newEvtype <- as.factor(stormDF$newEvtype)
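
Because the lookup returns the first entry of `events` that occurs anywhere inside the event name, the order of the parallel vectors affects the result. The toy example below is a sketch on a three-entry table, not the full mapping; `firstSubstringMatch` and `exactThenSubstring` are hypothetical names. It shows how a short key listed early can capture a name that a later, more specific key was meant for, and one possible way to give exact matches priority.

```r
## toy mapping in the same parallel-vector layout (subset, for illustration only)
toyEvents    <- c("HIGH WIND", "WIND", "TSTM WIND")
toyStdEvents <- c("High Wind", "High Wind", "Thunderstorm Wind")

## same strategy as the analysis: first key found as a literal substring wins
firstSubstringMatch <- function(evtype, keys, values)
{
    target <- toupper(evtype)
    for (idx in seq_along(keys))
    {
        if (grepl(keys[idx], target, fixed = TRUE)) return(values[idx])
    }
    "other"
}

## alternative (an assumption, not the author's method): try a true exact
## match with match() before falling back to the substring search
exactThenSubstring <- function(evtype, keys, values)
{
    target <- toupper(trimws(evtype))
    idx <- match(target, keys)
    if (!is.na(idx)) return(values[idx])
    firstSubstringMatch(target, keys, values)
}

print(firstSubstringMatch("TSTM WIND", toyEvents, toyStdEvents))
print(exactThenSubstring("TSTM WIND", toyEvents, toyStdEvents))
```

The first call returns "High Wind" and the second returns "Thunderstorm Wind". Switching strategies on the full data would be expected to shift totals between event types, so the figures in the Results section reflect the first-match behaviour.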

Data Transformation: PROPDMGEXP

Storm data documentation (https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf) lists 3 values (K for thousands, M for millions and B for billions), whereas there are 19 distinct values in the dataset. Blanks are acceptable. Our approach is to keep B/b, M/m and K/k for our calculation. Everything else is ignored, that is, treated as a multiplier of 1.

## show the distribution
table(stormDF$PROPDMGEXP)
## 
##             -      ?      +      0      1      2      3      4      5 
## 465934      1      8      5    216     25     13      4      4     28 
##      6      7      8      B      h      H      K      m      M 
##      4      5      1     40      1      6 424665      7  11330
## create a multiplier column
valK = 10.0 ^ 3
valM = 10.0 ^ 6
valB = 10.0 ^ 9
stormDF$propDmgMultiplier <- apply(stormDF, 1, function(x)
                                            {
                                                exp <- toupper(trimws(x["PROPDMGEXP"]))
                                                if (exp == "K") return (valK)
                                                if (exp == "M") return (valM)
                                                if (exp == "B") return (valB)
                                                return (1.0)
                                            })
                                            
## calculate property damage in millions
stormDF$propDmgMill <- apply(stormDF, 1, function(x)
                                    {
                                        damage <- as.numeric(x["PROPDMG"])
                                        multiplier <- as.numeric(x["propDmgMultiplier"])
                                        return (damage * multiplier / valM)
                                    })
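
The row-by-row `apply()` calls can be slow on a dataset of roughly 900,000 rows. A vectorized lookup is one possible alternative. The sketch below uses a named vector as the lookup table and a tiny made-up data frame; `damageMillions` is a hypothetical helper, and the same function would apply to CROPDMG and CROPDMGEXP.

```r
## hypothetical vectorized helper: damage in USD millions
damageMillions <- function(dmg, dmgExp)
{
    multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
    code <- toupper(trimws(dmgExp))
    ## named-vector lookup; codes not in the table come back as NA
    mult <- unname(multipliers[code])
    ## blank and unrecognized codes use a multiplier of 1
    mult[is.na(mult)] <- 1
    dmg * mult / 1e6
}

## made-up rows for illustration (not taken from the dataset)
toyDF <- data.frame(PROPDMG = c(25, 2.5, 1, 10, 3),
                    PROPDMGEXP = c("K", "m", "B", "", "?"),
                    stringsAsFactors = FALSE)
toyDF$propDmgMill <- damageMillions(toyDF$PROPDMG, toyDF$PROPDMGEXP)
print(toyDF)
```

Keeping the multipliers in one named vector also means a new code can be added in a single place.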

Data Transformation: CROPDMGEXP

Storm data documentation (https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf) lists 3 values (K for thousands, M for millions and B for billions), whereas there are 9 distinct values in the dataset. Blanks are acceptable. Our approach is to keep B/b, M/m and K/k for our calculation. Everything else is ignored, that is, treated as a multiplier of 1.

## show the distribution
table(stormDF$CROPDMGEXP)
## 
##             ?      0      2      B      k      K      m      M 
## 618413      7     19      1      9     21 281832      1   1994
## create a multiplier column
stormDF$cropDmgMultiplier <- apply(stormDF, 1, function(x)
                                            {
                                                exp <- toupper(trimws(x["CROPDMGEXP"]))
                                                if (exp == "K") return (valK)
                                                if (exp == "M") return (valM)
                                                if (exp == "B") return (valB)
                                                return (1.0)
                                            })
                                            
## calculate crop damage in millions
stormDF$cropDmgMill <- apply(stormDF, 1, function(x)
                                    {
                                        damage <- as.numeric(x["CROPDMG"])
                                        multiplier <- as.numeric(x["cropDmgMultiplier"])
                                        return (damage * multiplier / valM)
                                    })

Results

Fatality results

peopleDF <- stormDF %>%
            select(FATALITIES, INJURIES, newEvtype) %>%
            group_by(newEvtype) %>%
            summarize (totalFatalities = sum(FATALITIES), totalInjuries = sum(INJURIES))
            
## look at fatalities only
topFatalityCutoff <- 100
topFatalities <- peopleDF %>%
                 select(newEvtype, totalFatalities) %>%
                 filter(totalFatalities >= topFatalityCutoff) %>%
                 arrange(desc(totalFatalities))
sortedEvtype <- as.character(topFatalities$newEvtype)
topFatalities <- rename(topFatalities, eventType = newEvtype, fatalities = totalFatalities)
qplot(eventType, data = topFatalities, geom = "bar", weight = fatalities, fill = I("#f03b20"),
      xlab = paste0("Event Type (fatalities greater than or equal to ", topFatalityCutoff, ")"),
      ylab = "Total Fatalities",
      main = "Top Fatalities by Weather Events (US 1950-2011)") +
      scale_x_discrete(limits = sortedEvtype) +
      theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0.5))
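
`qplot()` is deprecated in recent ggplot2 releases (3.4.0 onward). A roughly equivalent chart can be built with `ggplot()` directly. The following is a sketch, using the top five fatality totals reported in this document as sample data and `reorder()` in place of `scale_x_discrete(limits = ...)`; the object names are illustrative.

```r
library(ggplot2)

## top five fatality totals as reported in this document
sampleFatalities <- data.frame(
    eventType = c("Tornado", "Excessive Heat", "Heat", "Flash Flood", "High Wind"),
    fatalities = c(5636, 1960, 1212, 1063, 864),
    stringsAsFactors = FALSE)

## reorder() sorts the bars by descending fatalities
fatalityPlot <- ggplot(sampleFatalities,
                       aes(x = reorder(eventType, -fatalities), y = fatalities)) +
    geom_col(fill = "#f03b20") +
    labs(x = "Event Type", y = "Total Fatalities",
         title = "Top Fatalities by Weather Events (US 1950-2011)") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
print(fatalityPlot)
```

`geom_col()` takes the bar heights from the `y` aesthetic, which replaces the `weight` argument used with `geom = "bar"`.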

## display all fatalities
allFatalities <- peopleDF %>%
                 select(newEvtype, totalFatalities) %>%
                 arrange(desc(totalFatalities))
allFatalities <- rename(allFatalities, event.type = newEvtype, fatalities = totalFatalities)
print.data.frame(allFatalities)
##                  event.type fatalities
## 1                   Tornado       5636
## 2            Excessive Heat       1960
## 3                      Heat       1212
## 4               Flash Flood       1063
## 5                 High Wind        864
## 6                 Lightning        817
## 7               Rip Current        577
## 8                     Flood        484
## 9           Cold/Wind Chill        289
## 10                Avalanche        225
## 11             Winter Storm        219
## 12        Thunderstorm Wind        203
## 13  Extreme Cold/Wind Chill        162
## 14                High Surf        146
## 15               Heavy Snow        143
## 16      Hurricane (Typhoon)        133
## 17               Heavy Rain        111
## 18              Strong Wind        111
## 19                Ice Storm        102
## 20                 Blizzard        101
## 21                 Wildfire         90
## 22                Dense Fog         80
## 23                    other         76
## 24           Tropical Storm         66
## 25           Winter Weather         62
## 26                     Hail         45
## 27                Landslide         39
## 28                  Tsunami         33
## 29         Storm Surge/Tide         24
## 30               Dust Storm         22
## 31       Marine Strong Wind         14
## 32 Marine Thunderstorm Wind         10
## 33             Frost/Freeze          7
## 34            Coastal Flood          6
## 35                  Drought          6
## 36               Waterspout          3
## 37               Dust Devil          2
## 38                    Sleet          2
## 39    Astronomical Low Tide          0
## 40              Dense Smoke          0
## 41             Freezing Fog          0
## 42             Funnel Cloud          0
## 43         Lake-Effect Snow          0
## 44                   Seiche          0
## 45      Tropical Depression          0
## 46             Volcanic Ash          0

Injury results

## look at injuries only
topInjuryCutoff <- 500
topInjuries <- peopleDF %>%
               select(newEvtype, totalInjuries) %>%
               filter(totalInjuries >= topInjuryCutoff) %>%
               arrange(desc(totalInjuries))
sortedEvtype <- as.character(topInjuries$newEvtype)
topInjuries <- rename(topInjuries, eventType = newEvtype, injuries = totalInjuries)
qplot(eventType, data = topInjuries, geom = "bar", weight = injuries, fill = I("#fc9272"),
      xlab = paste0("Event Type (injuries greater than or equal to ", topInjuryCutoff, ")"),
      ylab = "Total Injuries",
      main = "Top Injuries by Weather Events (US 1950-2011)") +
      scale_x_discrete(limits = sortedEvtype) +
      theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0.5))

## display all injury values
allInjuries <- peopleDF %>%
               select(newEvtype, totalInjuries) %>%
               arrange(desc(totalInjuries))
allInjuries <- rename(allInjuries, event.type = newEvtype, injuries = totalInjuries)
print.data.frame(allInjuries)
##                  event.type injuries
## 1                   Tornado    91407
## 2                 High Wind     8615
## 3                     Flood     6795
## 4            Excessive Heat     6542
## 5                 Lightning     5232
## 6                      Heat     2684
## 7         Thunderstorm Wind     2468
## 8                 Ice Storm     2154
## 9               Flash Flood     1881
## 10                 Wildfire     1606
## 11                     Hail     1467
## 12             Winter Storm     1373
## 13      Hurricane (Typhoon)     1333
## 14               Heavy Snow     1086
## 15                Dense Fog     1076
## 16                 Blizzard      805
## 17           Winter Weather      615
## 18              Rip Current      529
## 19               Dust Storm      440
## 20           Tropical Storm      383
## 21               Heavy Rain      340
## 22              Strong Wind      301
## 23                    other      297
## 24  Extreme Cold/Wind Chill      231
## 25                High Surf      204
## 26                Avalanche      171
## 27                  Tsunami      129
## 28          Cold/Wind Chill       85
## 29                Landslide       53
## 30               Dust Devil       43
## 31         Storm Surge/Tide       43
## 32             Frost/Freeze       34
## 33               Waterspout       29
## 34 Marine Thunderstorm Wind       26
## 35       Marine Strong Wind       22
## 36                  Drought       19
## 37            Coastal Flood        7
## 38             Funnel Cloud        3
## 39    Astronomical Low Tide        0
## 40              Dense Smoke        0
## 41             Freezing Fog        0
## 42         Lake-Effect Snow        0
## 43                   Seiche        0
## 44                    Sleet        0
## 45      Tropical Depression        0
## 46             Volcanic Ash        0

Economic results

## create groupings based on event types
## look at combined property damage and crop damage
economicDF <- stormDF %>%
            select(propDmgMill, cropDmgMill, newEvtype) %>%
            group_by(newEvtype) %>%
            summarize (totalPropDmg = sum(propDmgMill), totalCropDmg = sum(cropDmgMill))
economicDF <- economicDF %>%
            mutate(totalEconomicDmg = totalPropDmg + totalCropDmg)
topDamagesCutoff <- 500
topDamages <- economicDF %>%
              filter(totalEconomicDmg >= topDamagesCutoff) %>%
              arrange(desc(totalEconomicDmg))
sortedEvtype <- as.character(topDamages$newEvtype)

## to enable better presentation, we need to melt the dataframe
## using reshape2 library package
temp <- topDamages %>%
        select(newEvtype, totalPropDmg, totalCropDmg)
temp <- rename(temp, eventType = newEvtype, property = totalPropDmg, crop = totalCropDmg)
damages <- melt(temp, id=c("eventType"))
damages <- rename(damages, damageType = variable, damages = value)
                 
qplot(eventType, data = damages, geom = "bar", weight = damages, fill = damageType,
      xlab = paste0("Event Type (damages greater than or equal to USD ", topDamagesCutoff, "M)"),
      ylab = "Total Damages (USD Millions)",
      main = "Top Damages by Weather Events (US 1950-2011)") +
      scale_x_discrete(limits = sortedEvtype) +
      theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0.5)) + 
      labs(fill="Damage Type") +
      guides(fill=guide_legend(reverse=TRUE))

## display all the damages
allDamages <- economicDF %>%
              arrange(desc(totalEconomicDmg))
allDamages <- rename(allDamages, event.type = newEvtype,
                     property.damage = totalPropDmg,
                     crop.damage = totalCropDmg,
                     total.damage = totalEconomicDmg)
                     
## ALL THE DAMAGES ARE IN USD (MILLIONS)
print.data.frame(allDamages)
##                  event.type property.damage crop.damage total.damage
## 1                     Flood    150205.21668 10847.85595 161053.07263
## 2       Hurricane (Typhoon)     85256.41001  5506.11780  90762.52781
## 3                   Tornado     56993.09803   414.96152  57408.05955
## 4          Storm Surge/Tide     47974.14915     0.85500  47975.00415
## 5                      Hail     17619.99107  3114.21287  20734.20394
## 6               Flash Flood     16965.21784  1540.68525  18505.90309
## 7                   Drought      1046.30600 13972.62178  15018.92778
## 8                 High Wind     10666.33960  1273.38025  11939.71985
## 9                 Ice Storm      3964.13941  5022.11430   8986.25371
## 10                 Wildfire      8491.56350   402.78163   8894.34513
## 11           Tropical Storm      7714.39055   694.89600   8409.28655
## 12        Thunderstorm Wind      6436.56211   652.80039   7089.36250
## 13             Winter Storm      6690.41225    27.44400   6717.85625
## 14               Heavy Rain      3242.60864   795.40980   4038.01844
## 15             Frost/Freeze        19.54120  1997.06100   2016.60220
## 16  Extreme Cold/Wind Chill        67.78740  1312.97300   1380.76040
## 17               Heavy Snow       970.76770   134.66310   1105.43080
## 18                Lightning       933.98495    12.09709    946.08204
## 19                 Blizzard       659.91395   112.06000    771.97395
## 20           Excessive Heat         7.75370   492.41200    500.16570
## 21                     Heat        12.37205   412.01150    424.38355
## 22            Coastal Flood       417.61606     0.05600    417.67206
## 23                Landslide       324.70100    20.01700    344.71800
## 24              Strong Wind       181.17424    69.95350    251.12774
## 25                    other        20.74675   159.28895    180.03570
## 26          Cold/Wind Chill        68.34200    96.79250    165.13450
## 27                  Tsunami       144.06200     0.02000    144.08200
## 28                High Surf       100.02500     0.00000    100.02500
## 29           Winter Weather        27.31050    15.00000     42.31050
## 30         Lake-Effect Snow        40.18200     0.00000     40.18200
## 31                Dense Fog        22.82950     0.00000     22.82950
## 32               Waterspout         9.56420     0.00000      9.56420
## 33               Dust Storm         5.59900     3.60000      9.19900
## 34                Avalanche         8.72180     0.00000      8.72180
## 35             Freezing Fog         2.18200     0.00000      2.18200
## 36      Tropical Depression         1.73700     0.00000      1.73700
## 37                    Sleet         1.50000     0.00000      1.50000
## 38                   Seiche         0.98000     0.00000      0.98000
## 39               Dust Devil         0.71913     0.00000      0.71913
## 40             Volcanic Ash         0.50000     0.00000      0.50000
## 41 Marine Thunderstorm Wind         0.43640     0.05000      0.48640
## 42       Marine Strong Wind         0.41833     0.00000      0.41833
## 43    Astronomical Low Tide         0.32000     0.00000      0.32000
## 44             Funnel Cloud         0.19460     0.00000      0.19460
## 45              Rip Current         0.16300     0.00000      0.16300
## 46              Dense Smoke         0.10000     0.00000      0.10000
## THE END:-)