This analysis draws on NOAA storm data to assess the impact of natural disasters on the population and economy of the United States. A review of the events that cause the most deaths and injuries shows that tornadoes have, by far, the greatest impact on the US population. Floods have the greatest financial impact, accounting for more than $150 billion in damage.
Data Processing
Loading the data.
library(R.utils)
library(data.table)
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url, destfile = "./storm_data.csv.bz2")
bunzip2("./storm_data.csv.bz2", "storm_data.csv")
df <- fread("storm_data.csv")
## Read 902297 rows and 37 (of 37) columns from 0.523 GB file in 00:00:13
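Re-running the loading step repeats the full download each time. A minimal sketch of a guard against that, assuming a hypothetical helper named `fetch_once` (not part of the original analysis):

```r
# Hypothetical helper: download a file only when no local copy exists yet.
fetch_once <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the compressed archive intact on Windows
    download.file(url, destfile = destfile, mode = "wb")
  }
  invisible(destfile)
}

# Usage (commented out because it would fetch the full archive):
# fetch_once(url, "./storm_data.csv.bz2")
```

As far as I know, recent versions of data.table can also read a `.bz2` file directly with `fread()` when R.utils is installed, which would make the separate `bunzip2()` step optional.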
Cleaning data for analysis.
The dataset contains some discrepancies in how the EVTYPE variable is coded. I originally grouped the event types into a few main categories (wind, flood, fire, tornado, etc.) but lost some detail in that process, so the original labels are kept here. We’ll begin by converting the text columns to upper case.
df$EVTYPE <- toupper(df$EVTYPE)
df$PROPDMGEXP <- toupper(df$PROPDMGEXP)
df$CROPDMGEXP <- toupper(df$CROPDMGEXP)
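To illustrate what case conversion buys, here is a small sketch on made-up labels (not taken from the dataset). The `trimws()` call is an extra step that the analysis above does not perform; it is included only to show how stray whitespace could be handled in the same pass.

```r
# Toy labels (hypothetical, for illustration only)
evtype <- c("Flood", "FLOOD", " flood", "Tstm Wind", "TSTM WIND")

# Number of distinct labels before cleaning
n_before <- length(unique(evtype))

# Upper-case the labels and strip leading/trailing whitespace
cleaned <- trimws(toupper(evtype))

# Number of distinct labels after cleaning
n_after <- length(unique(cleaned))

c(before = n_before, after = n_after)
```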
The economic harm variables are stored as a base amount plus a multiplier code (H, K, M, or B) held in a separate column. I’ve recoded that column to its numeric value and multiplied the two together in a new column.
df$PROPDMGEXP[df$PROPDMGEXP == "H"] <- 100
df$PROPDMGEXP[df$PROPDMGEXP == "K"] <- 1000
df$PROPDMGEXP[df$PROPDMGEXP == "M"] <- 1000000
df$PROPDMGEXP[df$PROPDMGEXP == "B"] <- 1000000000
df$CROPDMGEXP[df$CROPDMGEXP == "K"] <- 1000
df$CROPDMGEXP[df$CROPDMGEXP == "M"] <- 1000000
df$CROPDMGEXP[df$CROPDMGEXP == "B"] <- 1000000000
df$PROPDMG2 <- df$PROPDMG * as.numeric(df$PROPDMGEXP)
## Warning: NAs introduced by coercion
df$CROPDMG2 <- df$CROPDMG * as.numeric(df$CROPDMGEXP)
## Warning: NAs introduced by coercion
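The coercion warnings arise because any code that was not recoded cannot be turned into a multiplier: blanks and symbols become NA, and a digit code, if the column holds any, would be used directly as the multiplier. One alternative is a lookup table, sketched below. The helper name `exp_to_multiplier` and the choice of default are assumptions for illustration, not part of the original analysis.

```r
# Named lookup vector for the letter codes handled above
multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert multiplier codes to numeric values.
# Assumption: codes missing from the table (blank, digits, symbols)
# are replaced by `default`.
exp_to_multiplier <- function(code, default = NA_real_) {
  out <- unname(multipliers[toupper(code)])
  out[is.na(out)] <- default
  out
}

# Toy values (hypothetical)
dmg  <- c(25, 2.5, 1, 10)
code <- c("K", "m", "B", "")

strict  <- dmg * exp_to_multiplier(code)               # blank code gives NA
lenient <- dmg * exp_to_multiplier(code, default = 1)  # blank code means "as is"
```

Whether an unrecognised code should count as NA or as a multiplier of 1 is a judgment call. The lenient version keeps those rows in later sums, while the strict version drops them.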
Creating a new data frame, “harm”, that contains the number of fatalities, injuries, and total casualties (fatalities + injuries) for each event type.
injuries <- aggregate(INJURIES ~ EVTYPE, data = df, sum)
fatalities <- aggregate(FATALITIES ~ EVTYPE, data = df, sum)
harm <- merge(injuries, fatalities, by = "EVTYPE")
harm$SUM <- harm$INJURIES + harm$FATALITIES
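The same per-event totals can also be produced in a single `aggregate()` call by binding both columns on the left-hand side of the formula. A small sketch on made-up numbers (the `toy` data frame is hypothetical):

```r
# Toy data (hypothetical values, for illustration only)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT"),
  INJURIES   = c(10, 5, 2, 0),
  FATALITIES = c(1, 0, 3, 4)
)

# Sum both columns per event type in one call
harm_toy <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = toy, FUN = sum)

# Casualties = injuries + fatalities
harm_toy$SUM <- harm_toy$INJURIES + harm_toy$FATALITIES
harm_toy
```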
Creating a new data frame, “cost”, that contains the monetary cost of damage to property, the cost of damage to crops, and the sum of those two amounts for each event type.
prop <- aggregate(PROPDMG2 ~ EVTYPE, data = df, sum)
crop <- aggregate(CROPDMG2 ~ EVTYPE, data = df, sum)
cost <- merge(prop, crop, by = "EVTYPE")
cost$SUM <- cost$PROPDMG2 + cost$CROPDMG2
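One thing to keep in mind with this step: the formula interface of `aggregate()` drops rows with NA values by default, and `merge()` keeps only event types present in both inputs. An event type with only one kind of damage recorded can therefore disappear from “cost”. The sketch below shows the effect on made-up numbers and one possible workaround. Treating a missing amount as zero is an assumption.

```r
# Toy data (hypothetical values, for illustration only)
toy <- data.frame(
  EVTYPE   = c("FLOOD", "FLOOD", "DROUGHT", "HAIL"),
  PROPDMG2 = c(100, 50, NA, 20),
  CROPDMG2 = c(10, NA, 30, NA)
)

prop_toy <- aggregate(PROPDMG2 ~ EVTYPE, data = toy, FUN = sum)  # DROUGHT is dropped
crop_toy <- aggregate(CROPDMG2 ~ EVTYPE, data = toy, FUN = sum)  # HAIL is dropped

# Default merge: only event types found in both tables survive
inner <- merge(prop_toy, crop_toy, by = "EVTYPE")

# Keep every event type, then treat a missing amount as zero (assumption)
full <- merge(prop_toy, crop_toy, by = "EVTYPE", all = TRUE)
full$PROPDMG2[is.na(full$PROPDMG2)] <- 0
full$CROPDMG2[is.na(full$CROPDMG2)] <- 0
full$SUM <- full$PROPDMG2 + full$CROPDMG2
```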
Results
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Events that cause the most fatalities:
fatal <- head(harm[order(-harm$FATALITIES), ], n = 10)
print(fatal$EVTYPE)
## [1] "TORNADO" "EXCESSIVE HEAT" "FLASH FLOOD" "HEAT"
## [5] "LIGHTNING" "TSTM WIND" "FLOOD" "RIP CURRENT"
## [9] "HIGH WIND" "AVALANCHE"
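The top-ten pattern used here recurs for every ranking in this section, so it could be wrapped in a small function. A minimal sketch, where the helper name `top_by` and the toy data are hypothetical:

```r
# Hypothetical helper: the n rows of `data` with the largest values in `column`
top_by <- function(data, column, n = 10) {
  head(data[order(-data[[column]]), , drop = FALSE], n = n)
}

# Toy data (hypothetical values, for illustration only)
toy <- data.frame(EVTYPE = c("A", "B", "C"), FATALITIES = c(3, 9, 1))

top2 <- top_by(toy, "FATALITIES", n = 2)
top2$EVTYPE
```

With the real data this would be called as, for example, `top_by(harm, "FATALITIES")`.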
Events that cause the most injuries:
injure <- head(harm[order(-harm$INJURIES), ], n = 10)
print(injure$EVTYPE)
## [1] "TORNADO" "TSTM WIND" "FLOOD"
## [4] "EXCESSIVE HEAT" "LIGHTNING" "HEAT"
## [7] "ICE STORM" "FLASH FLOOD" "THUNDERSTORM WIND"
## [10] "HAIL"
Events that cause the most casualties (fatalities + injuries):
total <- head(harm[order(-harm$SUM), ], n = 10)
print(total$EVTYPE)
## [1] "TORNADO" "EXCESSIVE HEAT" "TSTM WIND"
## [4] "FLOOD" "LIGHTNING" "HEAT"
## [7] "FLASH FLOOD" "ICE STORM" "THUNDERSTORM WIND"
## [10] "WINTER STORM"
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.4
title <- "Most Harmful Events to Population Health"
x <- "Event Type"
y <- "Total Fatalities and Injuries"
lab <- theme(axis.text.x=element_text(angle=60, hjust=1))
# Named p rather than sum to avoid masking base::sum
p <- ggplot(data = total, aes(EVTYPE, SUM)) + geom_point()
p + geom_smooth(method = "lm") + ggtitle(title) + xlab(x) + ylab(y) + lab
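Because EVTYPE is a discrete variable, each event type contributes a single point, so a linear smoother is unlikely to draw a meaningful trend line. A bar chart ordered by size is one alternative way to show the same totals. The sketch below uses made-up numbers; the `toy` data frame is hypothetical, and with the real data `total` would take its place.

```r
library(ggplot2)

# Toy data (hypothetical values, for illustration only)
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "HEAT"),
  SUM    = c(50, 20, 30)
)

# reorder() sorts the bars from largest to smallest total
bars <- ggplot(toy, aes(x = reorder(EVTYPE, -SUM), y = SUM)) +
  geom_col() +
  labs(
    title = "Most Harmful Events to Population Health",
    x = "Event Type",
    y = "Total Fatalities and Injuries"
  ) +
  theme(axis.text.x = element_text(angle = 60, hjust = 1))

bars
```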

Across the United States, which types of events have the greatest economic consequences?
Events that cause the most property damage:
props <- head(cost[order(-cost$PROPDMG2), ], n = 10)
print(props$EVTYPE)
## [1] "FLOOD" "HURRICANE/TYPHOON" "TORNADO"
## [4] "STORM SURGE" "FLASH FLOOD" "HAIL"
## [7] "HURRICANE" "TROPICAL STORM" "WINTER STORM"
## [10] "HIGH WIND"
Events that cause the most damage to crops:
crops <- head(cost[order(-cost$CROPDMG2), ], n = 10)
print(crops$EVTYPE)
## [1] "DROUGHT" "FLOOD" "RIVER FLOOD"
## [4] "ICE STORM" "HAIL" "HURRICANE"
## [7] "HURRICANE/TYPHOON" "FLASH FLOOD" "EXTREME COLD"
## [10] "FROST/FREEZE"
Events that have the greatest economic consequences:
totals <- head(cost[order(-cost$SUM), ], n = 10)
print(totals$EVTYPE)
## [1] "FLOOD" "HURRICANE/TYPHOON" "TORNADO"
## [4] "STORM SURGE" "HAIL" "FLASH FLOOD"
## [7] "DROUGHT" "HURRICANE" "RIVER FLOOD"
## [10] "ICE STORM"
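The ranking above prints only the event names. To read the dollar amounts behind it, the totals can be rescaled to billions. A small sketch on made-up values (the `toy` data frame and the column name `SUM_BILLIONS` are hypothetical):

```r
# Toy data (hypothetical values, not results from the dataset)
toy <- data.frame(EVTYPE = c("A", "B"), SUM = c(2.5e9, 7.5e8))

# Express the totals in billions of dollars, rounded to two decimals
toy$SUM_BILLIONS <- round(toy$SUM / 1e9, 2)
toy[, c("EVTYPE", "SUM_BILLIONS")]
```

Applied to `totals`, the same two lines would show the amount for each of the ten event types.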
title <- "Most Harmful Events to the Economy"
x <- "Event Type"
y <- "Total Property and Crop Damage"
lab <- theme(axis.text.x=element_text(angle=60, hjust=1))
# Named p rather than sum to avoid masking base::sum
p <- ggplot(data = totals, aes(EVTYPE, SUM)) + geom_point()
p + geom_smooth(method = "lm") + ggtitle(title) + xlab(x) + ylab(y) + lab
