This analysis uses Storm Data collected by the U.S. National Oceanic and Atmospheric Administration (NOAA) to determine which types of storms and weather events cause the most fatalities, injuries, property damage, and crop damage. To keep the analysis reproducible, we provide code and prose describing how the NOAA data was processed, cleaned, and subset, followed by details of the analysis and visualization of the resulting data.
First we set our library requirements and load in the raw data.
require(plyr)
require(ggplot2)
noaa <- read.csv('repdata%2Fdata%2FStormData.csv.bz2')
names(noaa) <- tolower(names(noaa))
Then we subset the data so that it only covers 1996 onward. From 1950 to 1954 only tornados were recorded, and from 1955 to 1995 only tornados, thunderstorm wind, and hail. All event types have been recorded only since 1996, so our study covers the period from 1996 to 2011.
noaa$bgn_date <- as.Date(noaa$bgn_date,"%m/%d/%Y")
noaa2 <- subset(noaa, bgn_date >= "1996-01-01")
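A minimal sketch of the same date filter on a toy data frame, comparing Date to Date explicitly. The `toy` data frame and the `cutoff` variable are illustrative and not part of the original analysis, and the date strings are assumed to use a month/day/year layout followed by a time.

```r
# Toy stand-in for the raw data (illustrative values only)
toy <- data.frame(
  bgn_date = c("4/18/1950 0:00:00", "12/31/1995 0:00:00",
               "1/1/1996 0:00:00", "11/28/2011 0:00:00"),
  evtype = c("TORNADO", "HAIL", "FLOOD", "DROUGHT"),
  stringsAsFactors = FALSE
)

# as.Date() parses the month/day/year part and ignores the trailing time
toy$bgn_date <- as.Date(toy$bgn_date, format = "%m/%d/%Y")

# Keep events that began on or after the cutoff date
cutoff <- as.Date("1996-01-01")
toy_recent <- toy[toy$bgn_date >= cutoff, ]
toy_recent$evtype
```

Converting the cutoff with `as.Date` makes the comparison independent of how R coerces a bare string.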
Next we subset so our data only includes columns for variables related to event types, fatalities, injuries, property damage, and crop damage.
# keep only the columns needed for the analysis (base R indexing; plyr has no select())
noaa_sub <- noaa2[, c("evtype", "fatalities", "injuries", "propdmg", "propdmgexp", "cropdmg", "cropdmgexp")]
Unfortunately the event type data is extremely messy and would require considerable work to categorize. We can see evidence of this by briefly checking for the number of evtypes that begin with the letter “C”.
sample <- noaa_sub$evtype
sample <- grep("^[cC].*", sample, value = T)
table(sample)
## sample
##     COASTAL FLOODING/EROSION              COASTAL EROSION
##                            1                            1
##                Coastal Flood                COASTAL FLOOD
##                            6                          589
##             coastal flooding             Coastal Flooding
##                            2                           38
##             COASTAL FLOODING     COASTAL FLOODING/EROSION
##                          107                            5
##                Coastal Storm                COASTAL STORM
##                            2                            8
##                 COASTALFLOOD                 COASTALSTORM
##                            1                            1
##                         Cold                         COLD
##                           10                           34
##               Cold and Frost               COLD AND FROST
##                            6                            1
##                COLD AND SNOW             Cold Temperature
##                            1                            2
##            COLD TEMPERATURES                 COLD WEATHER
##                            4                            1
## COLD WIND CHILL TEMPERATURES              COLD/WIND CHILL
##                            6                          539
##                   COOL SPELL        CSTL FLOODING/EROSION
##                            1                            2
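A simple first step is to standardize the case of evtype using ‘toupper’. As the output below shows, the number of factor levels falls from 985 to 438, although the first figure still includes levels that no longer occur after the 1996 subset. Lazy, but effective.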
ev_all <- noaa_sub$evtype
str(ev_all)
## Factor w/ 985 levels " HIGH SURF ADVISORY",..: 972 834 856 856 856 244 359 856 856 856 ...
noaa_sub$evtype <- as.factor(toupper(noaa_sub$evtype))
str(noaa_sub$evtype)
## Factor w/ 438 levels " HIGH SURF ADVISORY",..: 432 358 365 365 365 113 142 365 365 365 ...
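To see how much of the reduction comes from case alone, one option is to drop unused levels before counting. The sketch below does this on a toy factor with illustrative values. It also trims surrounding whitespace, since the first level in the `str` output above has a leading space; that `trimws` step is an extra that the original analysis does not apply.

```r
# Toy factor: 5 levels declared, but "HAIL" never occurs in the data
ev <- factor(c("Coastal Flood", "COASTAL FLOOD", " COASTAL FLOOD", "Cold"),
             levels = c(" COASTAL FLOOD", "COASTAL FLOOD", "Coastal Flood",
                        "Cold", "HAIL"))

n_declared <- nlevels(ev)              # counts unused levels too
n_used     <- nlevels(droplevels(ev))  # levels that actually occur

# Upper-casing alone leaves the leading-space variant as its own level
ev_upper <- factor(toupper(as.character(ev)))

# Trimming whitespace as well merges it with the other spellings
ev_clean <- factor(trimws(toupper(as.character(ev))))

c(declared = n_declared, used = n_used,
  upper = nlevels(ev_upper), clean = nlevels(ev_clean))
```

Comparing `used` with `upper` isolates the effect of case, while `clean` shows what whitespace trimming would add.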
Next we need to clean the property and crop damage fields and their corresponding exponent fields, so that each record has one clean field per damage type giving the cost in dollars.
table(noaa_sub$propdmgexp)
##
##             -      ?      +      0      1      2      3      4      5
## 276185      0      0      0      1      0      0      0      0      0
##      6      7      8      B      h      H      K      m      M
##      0      0      0     32      0      0 369938      0   7374
table(noaa_sub$cropdmgexp)
##
##             ?      0      2      B      k      K      m      M
## 373069      0      0      0      4      0 278686      0   1771
We can see that there are only a few exponent codes we need to account for: B, K, and M (billions, thousands, and millions). We’ll write a function called ‘calcfunc’ that handles those cases plus the many records where the field is blank. We then apply it to the property and crop fields to create new fields giving the cost of each.
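# Scale a damage amount by its exponent code
calcfunc <- function(damage, exp) {
  if (exp == "") {
    damage            # blank: no multiplier
  } else if (exp == "K") {
    damage * 1000     # thousands
  } else if (exp == "M") {
    damage * 1e+06    # millions
  } else if (exp == "B") {
    damage * 1e+09    # billions
  } else {
    0                 # any other code is treated as zero cost
  }
}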
noaa_sub$propcost <- unlist(mapply(calcfunc, noaa_sub$propdmg, noaa_sub$propdmgexp, SIMPLIFY=FALSE))
noaa_sub$cropcost <- unlist(mapply(calcfunc, noaa_sub$cropdmg, noaa_sub$cropdmgexp, SIMPLIFY=FALSE))
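Because calcfunc is called once per row through `mapply`, it can be slow on several hundred thousand records. A vectorized alternative is sketched below using a named lookup vector. The names `exp_multiplier` and `calc_cost` are hypothetical, and the mapping is assumed to mirror calcfunc: a blank code means no scaling and any unrecognized code yields 0.

```r
# Multipliers keyed by exponent code; mirrors calcfunc above
exp_multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

calc_cost <- function(damage, exp) {
  exp <- as.character(exp)             # accepts factor or character input
  mult <- unname(exp_multiplier[exp])  # NA for codes not in the lookup
  mult[is.na(mult)] <- 0               # unrecognized codes contribute 0
  mult[exp == ""] <- 1                 # blank exponent: use damage as-is
  damage * mult
}

# Toy inputs (illustrative values only)
toy_cost <- calc_cost(c(25, 2.5, 1, 10, 5), c("K", "M", "B", "", "?"))
toy_cost
```

Under these assumptions the result could be assigned directly, for example `noaa_sub$propcost <- calc_cost(noaa_sub$propdmg, noaa_sub$propdmgexp)`, without `mapply` or `unlist`.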
First we look at the five event types with the most fatalities.
fatal <- arrange(aggregate(fatalities ~ evtype, noaa_sub, sum), fatalities, decreasing = T)
fatal <- head(fatal, 5)
names(fatal) <- c("evtype", "total")
fatal
## evtype total
## 1 EXCESSIVE HEAT 1797
## 2 TORNADO 1511
## 3 FLASH FLOOD 887
## 4 LIGHTNING 651
## 5 FLOOD 414
Next we do the same for injuries.
injure <- arrange(aggregate(injuries ~ evtype, noaa_sub, sum), injuries, decreasing = T)
injure <- head(injure, 5)
names(injure) <- c("evtype", "total")
injure
## evtype total
## 1 TORNADO 20667
## 2 FLOOD 6758
## 3 EXCESSIVE HEAT 6391
## 4 LIGHTNING 4141
## 5 TSTM WIND 3629
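The fatalities and injuries rankings follow the same pattern: aggregate, sort, truncate. As a sketch, that pattern could be wrapped in a small base R helper. The name `top_events` is hypothetical, and the toy data frame holds illustrative values, not figures from the NOAA data.

```r
# Sum `column` by event type and return the n largest totals
top_events <- function(data, column, n = 5) {
  totals <- aggregate(data[[column]], by = list(evtype = data$evtype), FUN = sum)
  names(totals) <- c("evtype", "total")
  totals <- totals[order(-totals$total), ]  # largest total first
  rownames(totals) <- NULL
  head(totals, n)
}

toy <- data.frame(
  evtype     = c("TORNADO", "FLOOD", "TORNADO", "LIGHTNING", "FLOOD"),
  fatalities = c(3, 1, 2, 1, 0),
  injuries   = c(10, 0, 5, 2, 4)
)

top_fatal  <- top_events(toy, "fatalities", n = 2)
top_injure <- top_events(toy, "injuries", n = 2)
top_fatal
top_injure
```

A helper like this avoids repeating the column renaming step and does not depend on plyr.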
Next we plot them together.
harmful <- rbind(mutate(fatal, factor="fatalities"), mutate(injure, factor="injuries"))
ggplot(harmful, aes(evtype, total, fill = factor)) +
  geom_bar(position = 'stack', stat = 'identity') +
  labs(title = "Public Health - Most Harmful Events since 1996", fill = "", x = "Event Type", y = "# of Fatalities / Injuries") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
From this we can see that the events most detrimental to public health are tornados, which caused by far the most injuries (20,667), and excessive heat, which caused the most fatalities (1,797).
Turning to economic impact, we first look at the five event types with the most property damage.
property <- aggregate(propcost ~ evtype, noaa_sub, sum)
property <- property[order(-property$propcost),]
property <- head(property, 5)
names(property) <- c("evtype", "total")
property
## evtype total
## 88 FLOOD 1.439e+11
## 150 HURRICANE/TYPHOON 6.931e+10
## 279 STORM SURGE 4.319e+10
## 358 TORNADO 2.462e+10
## 85 FLASH FLOOD 1.522e+10
Next we do the same for crops.
crop <- aggregate(cropcost ~ evtype, noaa_sub, sum)
crop <- crop[order(-crop$cropcost),]
crop <- head(crop, 5)
names(crop) <- c("evtype", "total")
crop
## evtype total
## 53 DROUGHT 1.337e+10
## 88 FLOOD 4.975e+09
## 148 HURRICANE 2.741e+09
## 150 HURRICANE/TYPHOON 2.608e+09
## 113 HAIL 2.476e+09
Next we plot them together.
ecoimpact <- rbind(mutate(property, factor="property damage"), mutate(crop, factor="crop damage"))
ggplot(ecoimpact, aes(evtype, total, fill = factor)) +
  geom_bar(position = 'stack', stat = 'identity') +
  labs(title = "Economic Impact - Most Harmful Events since 1996", fill = "", x = "Event Type", y = "Damage (USD)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
From this analysis we can conclude that floods and hurricanes/typhoons have caused the most property damage, while drought has caused the most crop damage.
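Because the property and crop rankings are computed separately, an event type that appears in only one top-five list is plotted without its other component. One way to rank by combined cost is sketched below on toy data. The `totalcost` column is hypothetical and the figures are illustrative, not values from the NOAA data.

```r
# Toy per-event costs (illustrative values only)
toy <- data.frame(
  evtype   = c("FLOOD", "DROUGHT", "FLOOD", "HAIL"),
  propcost = c(2e9, 1e8, 5e8, 3e8),
  cropcost = c(1e8, 4e9, 0, 2e8)
)

# Combined cost per event, then summed by event type
toy$totalcost <- toy$propcost + toy$cropcost
combined <- aggregate(totalcost ~ evtype, toy, sum)

# Largest combined cost first
combined <- combined[order(-combined$totalcost), ]
combined
```

Ranking on the combined column gives every event type both components before the top five are selected.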
From 1996 to 2011, the severe weather events most detrimental to public health were tornados and excessive heat. Floods and hurricanes caused the most property damage, and drought caused the most crop damage.