Synopsis

In this analysis, we take a brief look at the NOAA Storm Database, which covers events from 1950 through 2011, to identify the event types with the most consequential health and economic impacts.

To assess health impacts, we examine fatality and injury counts; to assess economic impacts, we examine property damage (the PROPDMG field). For each measure, the ten event types with the largest totals are listed, which together account for over 90% of that measure's total, and the top five are plotted.
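The 90% figure can be checked from the percentages printed in the Results tables. A minimal sketch, assuming the top-ten percentages are copied by hand from those tables; `n_to_cover` is a hypothetical helper, not part of the original analysis:

```r
# Hypothetical helper: how many top-ranked event types are needed before
# their combined share of the total first exceeds `threshold` percent
n_to_cover <- function(pct, threshold = 90) {
  cum <- cumsum(sort(pct, decreasing = TRUE))
  which(cum > threshold)[1]   # NA if the supplied shares never exceed the threshold
}

# Top-ten percentages copied from the printed StormFat, StormInj and StormEcon tables
fatal_pct  <- c(37.194, 20.720, 10.056, 9.548, 5.388, 2.430, 1.479, 1.360, 1.347, 1.056)
injury_pct <- c(65.0020, 8.1799, 6.5140, 6.1219, 3.7217, 1.4054, 0.9685, 0.9400, 0.9073, 0.7265)
damage_pct <- c(29.5122, 28.7880, 22.3625, 6.3273, 5.5432, 1.2194, 1.1232, 0.7760, 0.6064, 0.4671)

coverage <- sapply(list(fatalities = fatal_pct, injuries = injury_pct, damage = damage_pct),
                   n_to_cover)
coverage
```

Fatalities need all ten listed event types to pass 90%, while injuries need six and property damage needs five.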

It is not surprising that the five event types causing the most deaths (tornado, heat, flood, wind, and lightning) are also the five causing the most injuries, although their order differs. It is interesting to note that four of these deadly and dangerous event types (tornado, wind, flood, and lightning) also appear among the five event types with the most property damage. The exception is heat, which appears on the fatality and injury lists but is replaced by hail on the property damage list. This makes sense: heat events cause comparatively little direct property damage, whereas hail and the other four event types physically damage property.
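The overlap described above is a set comparison. A short sketch using base R set functions on the top-five lists copied from the printed Results tables:

```r
# Top-five event types, in rank order, copied from the printed Results tables
fatal_top5  <- c("TORNADO", "HEAT", "FLOOD", "WIND", "LIGHTNING")
injury_top5 <- c("TORNADO", "WIND", "HEAT", "FLOOD", "LIGHTNING")
damage_top5 <- c("TORNADO", "WIND", "FLOOD", "HAIL", "LIGHTNING")

same_health <- setequal(fatal_top5, injury_top5)   # same five event types, different order
shared      <- intersect(fatal_top5, damage_top5)  # on both the fatality and damage lists
damage_only <- setdiff(damage_top5, fatal_top5)    # on the damage list only
health_only <- setdiff(fatal_top5, damage_top5)    # on the fatality/injury lists only

same_health   # TRUE
damage_only   # "HAIL"
health_only   # "HEAT"
```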

Data Processing

library(dplyr)

# Download the compressed data file only if it is not already present
dataFile <- "repdata_data_StormData.csv.bz2"
if (!file.exists(dataFile)) {
    download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                  dataFile, mode = "wb")
}

# Read the data into a tibble (read.csv decompresses .bz2 files transparently;
# as_tibble() replaces the deprecated tbl_df())
StormData <- as_tibble(read.csv(dataFile, stringsAsFactors = FALSE))

# First re-categorize more specific events into more general categories for more meaningful summaries.
# Since there are so many categories, the goal here is to only modify those with the most impact.
# We also don't want to be excessive in our manipulation of the raw data as this increases the potential
# for biasing the data inappropriately.
StormData$EVTYPE <- gsub(".*FLOOD.*", "FLOOD", StormData$EVTYPE, perl=TRUE)
StormData$EVTYPE <- gsub(".*HEAT.*", "HEAT", StormData$EVTYPE, perl=TRUE)
StormData$EVTYPE <- gsub(".*WIND.*", "WIND", StormData$EVTYPE, perl=TRUE)

# Keep only the columns of interest and group by event type for the summaries below
StormData <- StormData %>%
    select(EVTYPE, FATALITIES, INJURIES, PROPDMG) %>%
    group_by(EVTYPE)
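The effect of the three `gsub()` rules can be seen on a handful of labels. A sketch on a hypothetical vector of raw EVTYPE strings (illustrative labels, not drawn from the file): the rules run in order and are case-sensitive, so a label matching two rules goes to whichever runs first, and a mixed-case label such as the hypothetical "Flood" is left untouched unless `ignore.case = TRUE` is added.

```r
# Hypothetical raw EVTYPE labels, for illustration only
raw <- c("FLASH FLOOD", "FLOOD/STRONG WIND", "EXCESSIVE HEAT",
         "THUNDERSTORM WIND", "Flood", "TORNADO")

# Same three rules, applied in the same order as in Data Processing
collapsed <- gsub(".*FLOOD.*", "FLOOD", raw, perl = TRUE)
collapsed <- gsub(".*HEAT.*",  "HEAT",  collapsed, perl = TRUE)
collapsed <- gsub(".*WIND.*",  "WIND",  collapsed, perl = TRUE)
collapsed
# "FLOOD" "FLOOD" "HEAT" "WIND" "Flood" "TORNADO"

# A case-insensitive variant of the first rule also catches the mixed-case label
collapsed_ci <- gsub(".*FLOOD.*", "FLOOD", raw, ignore.case = TRUE, perl = TRUE)
collapsed_ci[5]
# "FLOOD"
```

"FLOOD/STRONG WIND" ends up as FLOOD rather than WIND only because the flood rule runs first.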

Results

Health Effects

Which event types have caused the most fatalities over the full 1950 to 2011 record?

library(dplyr)
library(ggplot2)

StormFat <- StormData %>%
    # (1) Select the columns of interest
    select(EVTYPE, FATALITIES) %>%
    # (2) Sum the number of fatalities for each event type
    summarise(TotalFatal = sum(FATALITIES)) %>%
    # (3) Arrange the sums in descending order
    arrange(desc(TotalFatal)) %>%
    # (4) Calculate each event type's percentage of all fatalities
    mutate(Percentage = TotalFatal / sum(TotalFatal) * 100)

# Look at the most impactful events w.r.t. fatalities
StormFat
## Source: local data frame [672 x 3]
## 
##          EVTYPE TotalFatal Percentage
## 1       TORNADO       5633     37.194
## 2          HEAT       3138     20.720
## 3         FLOOD       1523     10.056
## 4          WIND       1446      9.548
## 5     LIGHTNING        816      5.388
## 6   RIP CURRENT        368      2.430
## 7     AVALANCHE        224      1.479
## 8  WINTER STORM        206      1.360
## 9  RIP CURRENTS        204      1.347
## 10 EXTREME COLD        160      1.056
## ..          ...        ...        ...
# Keep the events in descending order for plotting
StormFat$EVTYPE <- factor(StormFat$EVTYPE, levels = StormFat$EVTYPE, ordered = TRUE)

# Generate a plot summarizing the top 5 most deadly events
ggplot(StormFat[1:5, 1:2], aes(x=EVTYPE, y=TotalFatal)) +
    geom_bar(stat="identity") +
    xlab("Event Type") +
    ylab("Total Fatalities") +
    ggtitle("Weather Events With the Greatest Fatalities")

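Figure: Weather Events With the Greatest Fatalities (bar chart of the top five event types; x axis: Event Type, y axis: Total Fatalities)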
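The percentage step works because `StormData` is grouped by `EVTYPE` and `summarise()` drops that single grouping level, so `sum(TotalFatal)` inside `mutate()` is the grand total over all event types rather than a per-group total. A sketch of the same pipeline on a small hypothetical tibble (values invented for illustration):

```r
library(dplyr)

# Hypothetical toy records standing in for StormData
toy <- tibble(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(4, 1, 2, 1, 2)
)

toy_fatal <- toy %>%
  group_by(EVTYPE) %>%                                     # as in Data Processing
  summarise(TotalFatal = sum(FATALITIES)) %>%              # one row per event type; grouping dropped
  arrange(desc(TotalFatal)) %>%                            # largest totals first
  mutate(Percentage = TotalFatal / sum(TotalFatal) * 100)  # share of the grand total

toy_fatal
```

The result has TORNADO (6, 60%), HEAT (3, 30%) and FLOOD (1, 10%), and the percentages sum to 100.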

Which event types have caused the most injuries over the full 1950 to 2011 record?

library(dplyr)
library(ggplot2)

StormInj <- StormData %>%
    # (1) Select the columns of interest
    select(EVTYPE, INJURIES) %>%
    # (2) Sum the number of injuries for each event type
    summarise(TotalInj = sum(INJURIES)) %>%
    # (3) Arrange the sums in descending order
    arrange(desc(TotalInj)) %>%
    # (4) Calculate each event type's percentage of all injuries
    mutate(Percentage = TotalInj / sum(TotalInj) * 100)

# Look at the most impactful events w.r.t. injuries
StormInj
## Source: local data frame [672 x 3]
## 
##               EVTYPE TotalInj Percentage
## 1            TORNADO    91346    65.0020
## 2               WIND    11495     8.1799
## 3               HEAT     9154     6.5140
## 4              FLOOD     8603     6.1219
## 5          LIGHTNING     5230     3.7217
## 6          ICE STORM     1975     1.4054
## 7               HAIL     1361     0.9685
## 8       WINTER STORM     1321     0.9400
## 9  HURRICANE/TYPHOON     1275     0.9073
## 10        HEAVY SNOW     1021     0.7265
## ..               ...      ...        ...
# Keep the events in descending order for plotting
StormInj$EVTYPE <- factor(StormInj$EVTYPE, levels = StormInj$EVTYPE, ordered = TRUE)

# Generate a plot summarizing the top 5 most dangerous events
ggplot(StormInj[1:5, 1:2], aes(x=EVTYPE, y=TotalInj)) +
    geom_bar(stat="identity") +
    xlab("Event Type") +
    ylab("Total Injuries") +
    ggtitle("Weather Events With the Greatest Injuries")

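Figure: Weather Events With the Greatest Injuries (bar chart of the top five event types; x axis: Event Type, y axis: Total Injuries)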
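The `factor(..., levels = ...)` step matters because ggplot2 draws a discrete axis in factor-level order, and a plain `factor()` sorts levels alphabetically. A short sketch using the top three injury event types from the StormInj table; `reorder()` is shown as an alternative, not as what the analysis uses:

```r
# Event types in the order produced by arrange(desc(...)), with their totals
ev     <- c("TORNADO", "WIND", "HEAT")
totals <- c(91346, 11495, 9154)

# Default factor(): levels sorted alphabetically, so bars would not follow the ranking
lv_default <- levels(factor(ev))
lv_default   # "HEAT" "TORNADO" "WIND"

# As in the analysis: reuse the existing (descending) row order as the level order
lv_ranked <- levels(factor(ev, levels = ev, ordered = TRUE))
lv_ranked    # "TORNADO" "WIND" "HEAT"

# Alternative: order levels by decreasing total with reorder()
lv_reorder <- levels(reorder(ev, -totals))
lv_reorder   # "TORNADO" "WIND" "HEAT"
```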

Economic Effects

Which events cause the most property damage?

library(dplyr)
library(ggplot2)

StormEcon <- StormData %>%
    # (1) Select the columns of interest
    select(EVTYPE, PROPDMG) %>%
    # (2) Sum the damage cost for each event type
    summarise(TotalDmg = sum(PROPDMG)) %>%
    # (3) Arrange the sums in descending order
    arrange(desc(TotalDmg)) %>%
    # (4) Calculate each event type's percentage of all damage costs
    mutate(Percentage = TotalDmg / sum(TotalDmg) * 100)

# Look at the most impactful events w.r.t. property damage costs
StormEcon
## Source: local data frame [672 x 3]
## 
##          EVTYPE TotalDmg Percentage
## 1       TORNADO  3212258    29.5122
## 2          WIND  3133429    28.7880
## 3         FLOOD  2434047    22.3625
## 4          HAIL   688693     6.3273
## 5     LIGHTNING   603352     5.5432
## 6  WINTER STORM   132721     1.2194
## 7    HEAVY SNOW   122252     1.1232
## 8      WILDFIRE    84459     0.7760
## 9     ICE STORM    66001     0.6064
## 10   HEAVY RAIN    50842     0.4671
## ..          ...      ...        ...
# Keep the events in descending order for plotting
StormEcon$EVTYPE <- factor(StormEcon$EVTYPE, levels = StormEcon$EVTYPE, ordered = TRUE)

# Generate a plot summarizing the top 5 most costly events
ggplot(StormEcon[1:5, 1:2], aes(x=EVTYPE, y=TotalDmg)) +
    geom_bar(stat="identity") +
    xlab("Event Type") +
    ylab("Total Property Damage (PROPDMG, unscaled)") +
    ggtitle("Weather Events With the Greatest Economic Impact")

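Figure: Weather Events With the Greatest Economic Impact (bar chart of the top five event types by total property damage; x axis: Event Type)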
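The totals above sum the raw `PROPDMG` column. In the NOAA storm data, `PROPDMG` is recorded together with a magnitude code in `PROPDMGEXP` ("K" for thousands, "M" for millions, "B" for billions), so raw sums mix magnitudes and are not dollar amounts. A minimal sketch of applying the multiplier, assuming only K/M/B are meaningful and treating every other code as 1 (the column may contain other codes whose meaning should be checked against NOAA's documentation); `exp_multiplier` and the toy rows are hypothetical:

```r
# Hypothetical helper: map PROPDMGEXP codes to numeric multipliers.
# Assumption: "K", "M", "B" (any case) mean thousands, millions, billions;
# every other code, including "", is treated as a multiplier of 1.
exp_multiplier <- function(code) {
  code <- toupper(trimws(code))
  mult <- rep(1, length(code))
  mult[which(code == "K")] <- 1e3
  mult[which(code == "M")] <- 1e6
  mult[which(code == "B")] <- 1e9
  mult
}

# Toy rows (invented values) showing how ignoring the code can change a ranking
toy <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "HAIL"),
  PROPDMG    = c(2.5, 250, 10),
  PROPDMGEXP = c("B", "K", "m"),
  stringsAsFactors = FALSE
)
toy$PropDmgUSD <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)

rank_raw    <- toy$EVTYPE[order(-toy$PROPDMG)]     # ranking by raw PROPDMG
rank_scaled <- toy$EVTYPE[order(-toy$PropDmgUSD)]  # ranking after scaling
rank_raw
rank_scaled
```

Applied to the full data, the multiplication would have to happen on the data as read in Data Processing, before the `select()` step drops `PROPDMGEXP`.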