This analysis takes a brief look at the NOAA storm database, which covers events from 1950 through 2011, and identifies the event types with the most consequential health and economic impacts.
To assess health impacts, fatality and injury data are examined. To assess economic impacts, property damage figures are examined. For each impact measure, a summary table of the leading event types is reported; the ten types listed account for over 90% of that measure’s total.
Unsurprisingly, the five event types that produced the most deaths are also the five that produced the most injuries. Four of them (tornado, wind, flood, and lightning) also appear among the five most costly event types in terms of property damage. The fifth entry on the property damage list is hail, which takes the place of heat-related events on the fatality and injury lists. This makes sense, as heat-related events typically cause little property damage, whereas the others do.
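The overlap between the three top-five lists can be checked with base R set operations. This is a minimal sketch: the event names are copied by hand from the three summary tables printed later in this report, and the variable names are made up for illustration.

```r
# Top-five event types, transcribed from the summary tables in this report
fatal_top5  <- c("TORNADO", "HEAT", "FLOOD", "WIND", "LIGHTNING")
injury_top5 <- c("TORNADO", "WIND", "HEAT", "FLOOD", "LIGHTNING")
damage_top5 <- c("TORNADO", "WIND", "FLOOD", "HAIL", "LIGHTNING")

# Same five categories for deaths and injuries (order ignored)?
same_health_lists <- setequal(fatal_top5, injury_top5)

# Categories shared between the fatality and property damage lists
shared <- intersect(fatal_top5, damage_top5)

# Categories unique to each list
only_damage <- setdiff(damage_top5, fatal_top5)
only_fatal  <- setdiff(fatal_top5, damage_top5)

print(same_health_lists)  # TRUE
print(shared)             # "TORNADO" "FLOOD" "WIND" "LIGHTNING"
print(only_damage)        # "HAIL"
print(only_fatal)         # "HEAT"
```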
library(dplyr)
# Download the data file (comment out if already downloaded)
#download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "repdata_data_StormData.csv.bz2")
# Read the data into a dataframe
# (as_tibble() replaces tbl_df(), which is deprecated in current dplyr)
StormData <- as_tibble(read.csv("repdata_data_StormData.csv.bz2", stringsAsFactors = FALSE))
# First re-categorize more specific events into more general categories for more meaningful summaries.
# Since there are so many categories, the goal here is to only modify those with the most impact.
# We also don't want to be excessive in our manipulation of the raw data as this increases the potential
# for biasing the data inappropriately.
StormData$EVTYPE <- gsub(".*FLOOD.*", "FLOOD", StormData$EVTYPE, perl=TRUE)
StormData$EVTYPE <- gsub(".*HEAT.*", "HEAT", StormData$EVTYPE, perl=TRUE)
StormData$EVTYPE <- gsub(".*WIND.*", "WIND", StormData$EVTYPE, perl=TRUE)
# Keep only the columns of interest and group by event type for the summaries below
StormData <- StormData %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG) %>%
  group_by(EVTYPE)
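To see what the three `gsub()` calls do and do not catch, here is a small sketch that applies the same patterns, in the same order, to a handful of labels. The labels are hypothetical examples chosen for illustration, not an excerpt of the dataset.

```r
# Hypothetical event labels, chosen only to illustrate the pattern behaviour
evtype <- c("FLASH FLOOD", "EXCESSIVE HEAT", "TSTM WIND",
            "HIGH WIND/FLOOD", "Heat Wave", "RIP CURRENTS")

# Same patterns and order as in the analysis code
evtype <- gsub(".*FLOOD.*", "FLOOD", evtype, perl = TRUE)
evtype <- gsub(".*HEAT.*", "HEAT", evtype, perl = TRUE)
evtype <- gsub(".*WIND.*", "WIND", evtype, perl = TRUE)

print(evtype)
# "FLOOD" "HEAT" "WIND" "FLOOD" "Heat Wave" "RIP CURRENTS"
```

Two things stand out. A label that matches more than one pattern goes to whichever pattern runs first, so the order of the calls matters. The patterns are also case-sensitive and limited to three keywords, so mixed-case labels and variants such as plural forms are left alone. That is consistent with RIP CURRENT and RIP CURRENTS showing up as separate rows in the fatality table.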
Which events have the largest number of fatalities over all time?
library(dplyr)
library(ggplot2)
StormFat <- StormData %>%
  # (1) Select the columns of interest
  select(EVTYPE, FATALITIES) %>%
  # (2) Sum the number of fatalities for each event type
  summarise(TotalFatal = sum(FATALITIES)) %>%
  # (3) Arrange the sums in descending order
  arrange(desc(TotalFatal)) %>%
  # (4) Calculate the percentage of all fatalities for the corresponding event type
  mutate(Percentage = TotalFatal/sum(TotalFatal)*100)
# Look at the most impactful events w.r.t. fatalities
StormFat
## Source: local data frame [672 x 3]
##
## EVTYPE TotalFatal Percentage
## 1 TORNADO 5633 37.194
## 2 HEAT 3138 20.720
## 3 FLOOD 1523 10.056
## 4 WIND 1446 9.548
## 5 LIGHTNING 816 5.388
## 6 RIP CURRENT 368 2.430
## 7 AVALANCHE 224 1.479
## 8 WINTER STORM 206 1.360
## 9 RIP CURRENTS 204 1.347
## 10 EXTREME COLD 160 1.056
## .. ... ... ...
# Keep the events in descending order for plotting
StormFat$EVTYPE <- factor(StormFat$EVTYPE, levels = StormFat$EVTYPE, ordered = TRUE)
# Generate a plot summarizing the top 5 most deadly events
ggplot(StormFat[1:5, 1:2], aes(x=EVTYPE, y=TotalFatal)) +
  geom_bar(stat="identity") +
  xlab("Event Type") +
  ylab("Total Fatalities") +
  ggtitle("Weather Events With the Greatest Fatalities")
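The coverage of the fatality summary can be checked directly from the percentages printed in the table above. The sketch below transcribes those ten values by hand and accumulates them; the variable names are illustrative only.

```r
# Percentages transcribed from the fatality table above
fatal_pct <- c(TORNADO = 37.194, HEAT = 20.720, FLOOD = 10.056,
               WIND = 9.548, LIGHTNING = 5.388, `RIP CURRENT` = 2.430,
               AVALANCHE = 1.479, `WINTER STORM` = 1.360,
               `RIP CURRENTS` = 1.347, `EXTREME COLD` = 1.056)

# Running total of the share of all fatalities
cum_pct <- cumsum(fatal_pct)

# Number of leading event types needed to reach 90% of all fatalities
n_needed <- unname(which(cum_pct >= 90)[1])

print(round(cum_pct, 3))
print(n_needed)  # 10
```

The five plotted event types together cover about 82.9% of recorded fatalities, and it takes all ten listed rows to pass 90% (about 90.6%).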
Which events have the largest number of injuries over all time?
library(dplyr)
library(ggplot2)
StormInj <- StormData %>%
  # (1) Select the columns of interest
  select(EVTYPE, INJURIES) %>%
  # (2) Sum the number of injuries for each event type
  summarise(TotalInj = sum(INJURIES)) %>%
  # (3) Arrange the sums in descending order
  arrange(desc(TotalInj)) %>%
  # (4) Calculate the percentage of all injuries for the corresponding event type
  mutate(Percentage = TotalInj/sum(TotalInj)*100)
# Look at the most impactful events w.r.t. injuries
StormInj
## Source: local data frame [672 x 3]
##
## EVTYPE TotalInj Percentage
## 1 TORNADO 91346 65.0020
## 2 WIND 11495 8.1799
## 3 HEAT 9154 6.5140
## 4 FLOOD 8603 6.1219
## 5 LIGHTNING 5230 3.7217
## 6 ICE STORM 1975 1.4054
## 7 HAIL 1361 0.9685
## 8 WINTER STORM 1321 0.9400
## 9 HURRICANE/TYPHOON 1275 0.9073
## 10 HEAVY SNOW 1021 0.7265
## .. ... ... ...
# Keep the events in descending order for plotting
StormInj$EVTYPE <- factor(StormInj$EVTYPE, levels = StormInj$EVTYPE, ordered = TRUE)
# Generate a plot summarizing the top 5 most dangerous events
ggplot(StormInj[1:5, 1:2], aes(x=EVTYPE, y=TotalInj)) +
  geom_bar(stat="identity") +
  xlab("Event Type") +
  ylab("Total Injuries") +
  ggtitle("Weather Events With the Greatest Injuries")
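The fatality and injury summaries follow the same four steps (select, sum by event type, sort, convert to percentages), so the logic could be wrapped in one helper. The sketch below is a base R version run on made-up toy rows; `summarise_impact` is a hypothetical name and not part of the analysis itself, which uses dplyr pipelines.

```r
# Hypothetical helper: total a measure by event type, sort, add percentages
summarise_impact <- function(data, measure) {
  totals <- aggregate(data[[measure]], by = list(EVTYPE = data$EVTYPE), FUN = sum)
  names(totals)[2] <- "Total"
  totals <- totals[order(-totals$Total), ]
  totals$Percentage <- totals$Total / sum(totals$Total) * 100
  rownames(totals) <- NULL
  totals
}

# Toy rows with made-up numbers, for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(3, 2, 1, 0, 4),
  INJURIES   = c(10, 0, 5, 2, 1),
  stringsAsFactors = FALSE
)

fat <- summarise_impact(toy, "FATALITIES")
inj <- summarise_impact(toy, "INJURIES")
print(fat)
print(inj)
```

Passing the column name as a string keeps the helper free of non-standard evaluation, at the cost of a generic `Total` column name rather than `TotalFatal` or `TotalInj`.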
Which events cause the most property damage?
library(dplyr)
library(ggplot2)
StormEcon <- StormData %>%
  # (1) Select the columns of interest
  select(EVTYPE, PROPDMG) %>%
  # (2) Sum the raw PROPDMG values for each event type
  #     (the PROPDMGEXP magnitude codes are not applied here)
  summarise(TotalDmg = sum(PROPDMG)) %>%
  # (3) Arrange the sums in descending order
  arrange(desc(TotalDmg)) %>%
  # (4) Calculate the percentage of damage costs for the corresponding event type
  mutate(Percentage = TotalDmg/sum(TotalDmg)*100)
# Look at the most impactful events w.r.t. property damage costs
StormEcon
## Source: local data frame [672 x 3]
##
## EVTYPE TotalDmg Percentage
## 1 TORNADO 3212258 29.5122
## 2 WIND 3133429 28.7880
## 3 FLOOD 2434047 22.3625
## 4 HAIL 688693 6.3273
## 5 LIGHTNING 603352 5.5432
## 6 WINTER STORM 132721 1.2194
## 7 HEAVY SNOW 122252 1.1232
## 8 WILDFIRE 84459 0.7760
## 9 ICE STORM 66001 0.6064
## 10 HEAVY RAIN 50842 0.4671
## .. ... ... ...
# Keep the events in descending order for plotting
StormEcon$EVTYPE <- factor(StormEcon$EVTYPE, levels = StormEcon$EVTYPE, ordered = TRUE)
# Generate a plot summarizing the top 5 most costly events
ggplot(StormEcon[1:5, 1:2], aes(x=EVTYPE, y=TotalDmg)) +
  geom_bar(stat="identity") +
  xlab("Event Type") +
  ylab("Property Damage (sum of raw PROPDMG)") +
  ggtitle("Weather Events With the Greatest Economic Impact")
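One caveat on the damage totals: the pipeline sums `PROPDMG` as recorded. In the NOAA file, `PROPDMG` is paired with a `PROPDMGEXP` column holding a magnitude code such as K, M, or B, so the raw sums mix thousands, millions, and billions. The sketch below shows one possible way to convert to dollars before summing. It uses made-up toy rows, and treating any code other than K, M, or B as a multiplier of 1 is an assumption made here for simplicity, not a rule taken from the dataset documentation.

```r
# Toy rows with made-up numbers, for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL", "FLOOD"),
  PROPDMG    = c(25, 1.5, 500, 2),
  PROPDMGEXP = c("K", "M", "", "B"),
  stringsAsFactors = FALSE
)

# Magnitude codes handled here; anything else is ASSUMED to mean "no scaling"
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)
scale <- unname(multiplier[toupper(toy$PROPDMGEXP)])
scale[is.na(scale)] <- 1

# Damage in dollars, then totals per event type in descending order
toy$PropDollars <- toy$PROPDMG * scale
totals <- aggregate(PropDollars ~ EVTYPE, data = toy, FUN = sum)
totals <- totals[order(-totals$PropDollars), ]
print(totals)
```

In the toy data, HAIL has the largest raw `PROPDMG` value (500) but the smallest dollar total, which shows how the ranking can shift once the magnitude codes are applied.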