Meteorological events often have significant impacts on the US economy and public health. The purpose of this analysis is to provide insight into the following questions:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
library(data.table) # Use the data.table package for speed
library(ggplot2) # Use ggplot2 for pretty graphs
# Save/Load processed file as a .RData file for speed
if (!file.exists('storm.RData'))
{
# Download the compressed data; read.csv() decompresses the .bz2 file as it reads
dataUrl <- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
zipFile <- 'repdata_data_StormData.csv.bz2'
download.file(dataUrl,zipFile,method='curl')
storm <- as.data.table(read.csv(zipFile))
# The PROPDMGEXP is coded using a set of numbers, symbols and letters.
# The following transformation is applied to the column in order to
# determine a numeric value for damage.
storm$PropDmgExponent[storm$PROPDMGEXP == ""] <- 1e+00
storm$PropDmgExponent[storm$PROPDMGEXP == "-"] <- 0
storm$PropDmgExponent[storm$PROPDMGEXP == "?"] <- 0
storm$PropDmgExponent[storm$PROPDMGEXP == "+"] <- 0
storm$PropDmgExponent[storm$PROPDMGEXP == "0"] <- 1e+00
storm$PropDmgExponent[storm$PROPDMGEXP == "1"] <- 1e+01
storm$PropDmgExponent[storm$PROPDMGEXP == "2"] <- 1e+02
storm$PropDmgExponent[storm$PROPDMGEXP == "3"] <- 1e+03
storm$PropDmgExponent[storm$PROPDMGEXP == "4"] <- 1e+04
storm$PropDmgExponent[storm$PROPDMGEXP == "5"] <- 1e+05
storm$PropDmgExponent[storm$PROPDMGEXP == "6"] <- 1e+06
storm$PropDmgExponent[storm$PROPDMGEXP == "7"] <- 1e+07
storm$PropDmgExponent[storm$PROPDMGEXP == "8"] <- 1e+08
storm$PropDmgExponent[storm$PROPDMGEXP == "B"] <- 1e+09
storm$PropDmgExponent[storm$PROPDMGEXP == "h"] <- 1e+02
storm$PropDmgExponent[storm$PROPDMGEXP == "H"] <- 1e+02
storm$PropDmgExponent[storm$PROPDMGEXP == "K"] <- 1e+03
storm$PropDmgExponent[storm$PROPDMGEXP == "m"] <- 1e+06
storm$PropDmgExponent[storm$PROPDMGEXP == "M"] <- 1e+06
storm$PropDmgValue <- storm$PROPDMG * storm$PropDmgExponent
# Save data so you don't have to do the formatting again.
save(storm, file='storm.RData')
} else
{
load('storm.RData')
}
# Get damage values by event
propDmgByEvent <- storm[,list(PropDmgValue = sum(PropDmgValue)),by=EVTYPE]
FatalitiesByEvent <- storm[,list(FATALITIES = sum(FATALITIES)), by=EVTYPE]
InjuriesByEvent <- storm[,list(INJURIES = sum(INJURIES)), by=EVTYPE]
# Order (using data.table keys) and extract the worst 10 events
setkey(propDmgByEvent, PropDmgValue)
setkey(FatalitiesByEvent, FATALITIES)
setkey(InjuriesByEvent, INJURIES)
propDmgByEvent <- tail(propDmgByEvent, 10)
FatalitiesByEvent <- tail(FatalitiesByEvent, 10)
InjuriesByEvent <- tail(InjuriesByEvent, 10)
# Plot Injuries by Event
g <- ggplot(data=InjuriesByEvent, aes(x=as.factor(EVTYPE), y=INJURIES))
g <- g + geom_bar(stat='identity')
g <- g + theme(axis.text.x = element_text(angle = 90, hjust=1))
g <- g + labs(title='Top 10 Events causing Injury', x='Event Type', y='Injuries')
g
Tornadoes appear to cause more injuries than any other type of meteorological event. Every other event type in the top 10 accounts for far fewer injuries.
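One way to put a number on how far the leading event type is ahead is to compute each type's share of the total. The following is a minimal sketch on a hypothetical toy table (`toy`, `injuries_by_event` and the `Share` column are illustrative names, and the values are invented rather than taken from the storm data):

```r
library(data.table)

# Hypothetical toy records; values are illustrative only.
toy <- data.table(
  EVTYPE   = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "TORNADO"),
  INJURIES = c(50, 10, 30, 5, 5, 0)
)

# Total injuries per event type, sorted from most to least harmful.
injuries_by_event <- toy[, .(INJURIES = sum(INJURIES)), by = EVTYPE][order(-INJURIES)]

# Fraction of all injuries attributable to each event type.
injuries_by_event[, Share := INJURIES / sum(INJURIES)]

print(injuries_by_event)
```

Applied to the full `storm` table instead of `toy`, the same two steps would give the share of injuries for each `EVTYPE`.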
# Plot Fatalities by Event
g <- ggplot(data=FatalitiesByEvent, aes(x=as.factor(EVTYPE), y=FATALITIES))
g <- g + geom_bar(stat='identity')
g <- g + theme(axis.text.x = element_text(angle = 90, hjust=1))
g <- g + labs(title='Top 10 Events causing Fatality', x='Event Type', y='Fatalities')
g
Fatalities follow a pattern similar to injuries. Tornadoes are by far the most damaging meteorological event for public health. Excessive heat and strong winds are also major dangers, but tornado-related injuries and fatalities far outpace those from the other event types.
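Because `as.factor(EVTYPE)` orders the bars alphabetically, the ranking can be hard to read from the plots. A possible alternative, sketched here on hypothetical toy totals (the `toy` data frame and its values are invented for illustration), is to sort the factor levels by count with `reorder()` before plotting:

```r
library(ggplot2)

# Hypothetical toy totals; values are illustrative only.
toy <- data.frame(
  EVTYPE     = c("FLOOD", "HEAT", "TORNADO"),
  FATALITIES = c(15, 5, 80)
)

# reorder() sorts the factor levels by FATALITIES (ascending),
# so the bars are drawn from smallest to largest.
toy$EVTYPE <- reorder(toy$EVTYPE, toy$FATALITIES)

g <- ggplot(data = toy, aes(x = EVTYPE, y = FATALITIES))
g <- g + geom_bar(stat = 'identity')
g <- g + theme(axis.text.x = element_text(angle = 90, hjust = 1))
g <- g + labs(title = 'Fatalities by Event (toy data)', x = 'Event Type', y = 'Fatalities')
g
```

This changes only the bar order, not the underlying totals.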
# Plot Property Damages by Event
g <- ggplot(data=propDmgByEvent, aes(x=as.factor(EVTYPE), y=PropDmgValue))
g <- g + geom_bar(stat='identity')
g <- g + theme(axis.text.x = element_text(angle = 90, hjust=1))
g <- g + labs(title='Top 10 Events causing Property Damage',
x='Event Type', y='Property Damage (USD)')
g
Flooding appears to be the predominant cause of property damage, followed closely by tornadoes and hurricanes.
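The damage figures depend on how the `PROPDMGEXP` codes are decoded in the processing step. As a compact sketch of that mapping, the same code-to-multiplier table can be written as a named lookup vector. The names `exp_lookup` and `decode_exponent` are hypothetical, and upper-casing the input is an assumption that also accepts lowercase `k` and `b`, which the original mapping does not list:

```r
# Hypothetical lookup table mirroring the multipliers used in the processing step.
exp_lookup <- c(
  "-" = 0, "?" = 0, "+" = 0,
  "0" = 1e0, "1" = 1e1, "2" = 1e2, "3" = 1e3, "4" = 1e4,
  "5" = 1e5, "6" = 1e6, "7" = 1e7, "8" = 1e8,
  "H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9
)

# Hypothetical helper: turn exponent codes into numeric multipliers.
decode_exponent <- function(code) {
  code <- toupper(as.character(code))   # treat "h"/"m" like "H"/"M"
  mult <- unname(exp_lookup[code])      # unknown codes become NA
  mult[!is.na(code) & code == ""] <- 1  # an empty code means "no multiplier"
  mult
}

# Example: multiply toy damage amounts by their decoded exponents.
codes   <- c("", "K", "m", "B", "5", "?")
amounts <- c(10, 2.5, 3, 1, 2, 7)
amounts * decode_exponent(codes)
```

Codes that are not in the table come back as `NA`, which makes unexpected values visible instead of silently treating them as zero.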