Hurricanes and other severe weather events can have public health and economic consequences. These can range from fatalities and injuries to property and crop damage. Identifying the most dangerous events can help to appropriate limited resources to the most vulnerable areas in hopes of preventing possible future damage.
This paper will explore the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database, which “tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.” [1] This dataset covers the period from 1950 to November 2011.
# Download the bz2 compressed file if it does not exist in your working
# directory.
if (!file.exists("repdata-data-StormData.csv.bz2")) {
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
"repdata-data-StormData.csv.bz2")
}
# Initially read in data with csv. fread has trouble with character escapes
# and double quotations in the REMARKS column. Read from the bz2file, this
# may take awhile.
stormcsv <- read.csv(bzfile("repdata-data-StormData.csv.bz2", "repdata-data-StormData.csv"))
library(data.table)
## Convert to data.table for easier dataframe manipulation.
storm <- data.table(stormcsv)
## Warning: closing unused connection 5 (repdata-data-StormData.csv.bz2)
# Remove the old data.frame
rm(stormcsv)
## There are 902297 unique entries for each row in REFNUM. The sum of each
## unique row also matches the sum of 1:902297.
length(unique(storm$REFNUM))
## [1] 902297
sum(storm$REFNUM)
## [1] 4.071e+11
sum(as.numeric(seq(902297)))
## [1] 4.071e+11
# These match so it appears that there are 902297 unique records in this
# dataset.
The data start with 985 different entries for severe weather type (EVTYPE)
Many are overlapping and misspelled versions of the same thing.
We will consolidate down to about 150 before continuin to process the data.
This is done by systematically looking at tables of word stems to try to find values to compress to, while accounting for mispellings. We might get this even lower, or preserve granularity in a longer exploratory paper.
length(table(storm$EVTYPE))
## [1] 985
storm$EVTYPE <- tolower(storm$EVTYPE)
# length(table(storm$EVTYPE))
library(tm)
storm$EVTYPE <- removePunctuation(storm$EVTYPE)
# length(table(storm$EVTYPE)) Summary EVTYPEs contain no injuries or
# fatalities, and refer to individual Storm Data entries for detailed info
# in their REMARKS sections. They also provide no EVTPE information, and
# only 0's in most columns, with only long REMARKS referring to individual
# records for details. These can be removed while preserving the actual data
# in the data entries the REMARKS refer to. subset(storm, grepl('summary',
# storm$EVTYPE))
storm <- subset(storm, !grepl("summary", storm$EVTYPE))
# length(table(storm$EVTYPE))
# table(grep('.*water.*spout', storm$EVTYPE, value = TRUE))
storm$EVTYPE <- gsub(".*water.*spout.*", "waterspout", storm$EVTYPE)
# length(table(storm$EVTYPE))
# thunderstorm winds has 20846 entries and thunderstorm wind has 82565
# entries. There are various misspellinged versions of this as well.
# Consolidate a lot of these into by searching for entries containing 'thun
# st wi' and combining all into one 'thunderstorm winds' label. marine
# thunderstorm winds (third largest thun st win category) is also included
# since it only has 5812 which is very small in relation to the overall set
# of 900k+, and we are simply looking for overall trends in this assessment.
# table(storm$EVTYPE) table(grep('.*thun.*st.*win.*', storm$EVTYPE, value =
# TRUE))
storm$EVTYPE <- gsub(".*thun.*st.*win.*", "thunderstorm winds", storm$EVTYPE)
# length(table(storm$EVTYPE))
# There is a very small amount of granular tstm data with mph(?) numbers
# attached. With most the vast majority having single digit numbers of
# entries, these can be folded into the larger tstm wind cateogry that has
# 219942 entries. table(grep('.*tstm.*', storm$EVTYPE, value = TRUE))
storm$EVTYPE <- gsub(".*tstm.*", "tstm wind", storm$EVTYPE)
# length(table(storm$EVTYPE))
# lightning is the largest category 15755, misspelled version are folded in.
# table(grep('.*li.*nin.*', storm$EVTYPE, value = TRUE))
storm$EVTYPE <- gsub(".*li.*nin.*", "lightning", storm$EVTYPE)
# length(table(storm$EVTYPE))
## high wind 20214, high winds 1534 table(grep('.*hi.*win.*', storm$EVTYPE,
## value = TRUE))
storm$EVTYPE <- gsub(".*hi.*win.*", "high wind", storm$EVTYPE)
# length(table(storm$EVTYPE))
## hail table(storm$EVTYPE) table(grep('.*hail.*', storm$EVTYPE, value =
## TRUE))
storm$EVTYPE <- gsub(".*hail.*", "hail", storm$EVTYPE)
# length(table(storm$EVTYPE))
## coastal flood table(storm$EVTYPE) table(grep('.*coast.*flood.*',
## storm$EVTYPE, value = TRUE))
storm$EVTYPE <- gsub(".*coast.*flood.*", "coastal flood", storm$EVTYPE)
# length(table(storm$EVTYPE))
## flash flood table(storm$EVTYPE) table(grep('.*flas.*flood.*',
## storm$EVTYPE, value = TRUE))
storm$EVTYPE <- gsub(".*flas.*flood.*", "flash flood", storm$EVTYPE)
# length(table(storm$EVTYPE))
## river flood table(storm$EVTYPE) table(grep('.*riv.*flood.*', storm$EVTYPE,
## value = TRUE))
storm$EVTYPE <- gsub(".*riv.*flood.*", "river flood", storm$EVTYPE)
# length(table(storm$EVTYPE))
## urban flood table(storm$EVTYPE) table(grep('.*urban.*flood.*',
## storm$EVTYPE, value = TRUE))
storm$EVTYPE <- gsub(".*urban.*flood.*", "urban flood", storm$EVTYPE)
# length(table(storm$EVTYPE))
## tidal flood table(storm$EVTYPE) table(grep('.*tidal.*flood.*',
## storm$EVTYPE, value = TRUE))
storm$EVTYPE <- gsub(".*tidal.*flood.*", "tidal flood", storm$EVTYPE)
# length(table(storm$EVTYPE))
## lakeshore flood table(storm$EVTYPE) table(grep('.*lake.*flood.*',
## storm$EVTYPE, value = TRUE))
storm$EVTYPE <- gsub(".*lake.*flood.*", "lakeshore flood", storm$EVTYPE)
# length(table(storm$EVTYPE))
## flash flood again, flash at end table(storm$EVTYPE)
## table(grep('.*flood.*flash.*', storm$EVTYPE, value = TRUE))
storm$EVTYPE <- gsub(".*flood.*flash.*", "flash flood", storm$EVTYPE)
# length(table(storm$EVTYPE))
## flash flood again, floooding table(storm$EVTYPE) table(grep('.*flash.*',
## storm$EVTYPE, value = TRUE))
storm$EVTYPE <- gsub(".*flash.*", "flash flood", storm$EVTYPE)
# length(table(storm$EVTYPE))
## flood again, flash at end table(storm$EVTYPE) table(grep('^flood.*',
## storm$EVTYPE, value = TRUE))
storm$EVTYPE <- gsub("^flood.*", "flood", storm$EVTYPE)
# length(table(storm$EVTYPE))
## misc. floods table(storm$EVTYPE)
## table(grep('.*(.*(str|cst|ice|heavy|break|rural|beach|highway|major|minor|small|snow|street|stream|local).*)flood.*',
## storm$EVTYPE, value = TRUE))
storm$EVTYPE <- gsub(".*(.*(str|cst|ice|heavy|break|rural|beach|highway|major|minor|small|snow|street|stream|local).*)flood.*",
"flood", storm$EVTYPE)
# length(table(storm$EVTYPE))
## heat table(storm$EVTYPE) table(grep('.*excess.*heat.*', storm$EVTYPE,
## value = TRUE))
storm$EVTYPE <- gsub(".*excess.*heat.*", "heat", storm$EVTYPE)
# length(table(storm$EVTYPE))
## rip current table(storm$EVTYPE) table(grep('.*rip curr.*', storm$EVTYPE,
## value = TRUE))
storm$EVTYPE <- gsub(".*rip curr.*", "rip current", storm$EVTYPE)
# length(table(storm$EVTYPE))
## extreme table(storm$EVTYPE) table(grep('.*extreme.*chil.*', storm$EVTYPE,
## value = TRUE))
storm$EVTYPE <- gsub(".*extreme.*chil.*", "extreme wind chill", storm$EVTYPE)
# length(table(storm$EVTYPE))
## tornado sort(table(storm$EVTYPE)) table(grep('.*tornado.*', storm$EVTYPE,
## value = TRUE))
storm$EVTYPE <- gsub(".*tornado.*", "tornado", storm$EVTYPE)
# length(table(storm$EVTYPE))
## heavy snow sort(table(storm$EVTYPE)) table(grep('.*heavy.*sno.*',
## storm$EVTYPE, value = TRUE))
storm$EVTYPE <- gsub(".*heavy.*sno.*", "heavy snow", storm$EVTYPE)
# length(table(storm$EVTYPE))
## heavy rain sort(table(storm$EVTYPE)) table(grep('.*heavy.*rai.*',
## storm$EVTYPE, value = TRUE))
storm$EVTYPE <- gsub(".*heavy.*rain.*", "heavy rain", storm$EVTYPE)
# length(table(storm$EVTYPE))
## winter storm sort(table(storm$EVTYPE)) table(grep('.*win.*sto.*',
## storm$EVTYPE, value = TRUE))
storm$EVTYPE <- gsub(".*win.*sto.*", "winter storm", storm$EVTYPE)
# length(table(storm$EVTYPE))
## heavy snow sort(table(storm$EVTYPE)) table(grep('.*win.*wea.*',
## storm$EVTYPE, value = TRUE))
storm$EVTYPE <- gsub(".*win.*wea.*", "winter weather", storm$EVTYPE)
# length(table(storm$EVTYPE))
## funnel cloud sort(table(storm$EVTYPE)) table(grep('.*funnel.*',
## storm$EVTYPE, value = TRUE))
storm$EVTYPE <- gsub(".*funnel.*", "funnel cloud", storm$EVTYPE)
# length(table(storm$EVTYPE))
## strong wind sort(table(storm$EVTYPE)) table(grep('.*str.*win.*',
## storm$EVTYPE, value = TRUE))
storm$EVTYPE <- gsub(".*str.*win.*", "strong wind", storm$EVTYPE)
# length(table(storm$EVTYPE))
## these streams turned out to be floods sort(table(storm$EVTYPE))
## table(grep('.*stream.*', storm$EVTYPE, value = TRUE))
storm$EVTYPE <- gsub(".*stream.*", "stream flood", storm$EVTYPE)
# length(table(storm$EVTYPE))
## wild fire sort(table(storm$EVTYPE)) table(grep('.*wild*', storm$EVTYPE,
## value = TRUE))
storm$EVTYPE <- gsub(".*wild.*", "wildfire", storm$EVTYPE)
# length(table(storm$EVTYPE))
## blizzard sort(table(storm$EVTYPE)) table(grep('.*bliz*', storm$EVTYPE,
## value = TRUE))
storm$EVTYPE <- gsub(".*bliz.*", "blizzard", storm$EVTYPE)
# length(table(storm$EVTYPE))
## drought sort(table(storm$EVTYPE)) table(grep('.*drou.*', storm$EVTYPE,
## value = TRUE))
storm$EVTYPE <- gsub(".*drou.*", "drought", storm$EVTYPE)
# length(table(storm$EVTYPE))
## HEAT sort(table(storm$EVTYPE)) table(grep('.*heat.*', storm$EVTYPE, value
## = TRUE))
storm$EVTYPE <- gsub(".*heat.*", "heat", storm$EVTYPE)
# length(table(storm$EVTYPE))
## ice sort(table(storm$EVTYPE)) table(grep('.*ice.*', storm$EVTYPE, value =
## TRUE))
storm$EVTYPE <- gsub(".*ice.*", "ice storm", storm$EVTYPE)
# length(table(storm$EVTYPE))
## frost freeze sort(table(storm$EVTYPE)) table(grep('.*fros.*',
## storm$EVTYPE, value = TRUE))
storm$EVTYPE <- gsub(".*fros.*", "frost freeze", storm$EVTYPE)
# length(table(storm$EVTYPE))
## fog sort(table(storm$EVTYPE)) table(grep('.*fog.*', storm$EVTYPE, value =
## TRUE))
storm$EVTYPE <- gsub(".*fog.*", "dense fog", storm$EVTYPE)
# length(table(storm$EVTYPE))
## chill sort(table(storm$EVTYPE)) table(grep('.*chill.*', storm$EVTYPE,
## value = TRUE))
storm$EVTYPE <- gsub(".*chill.*", "extreme wind chill", storm$EVTYPE)
# length(table(storm$EVTYPE))
## thunderstorm sort(table(storm$EVTYPE)) table(grep('.*understorm$',
## storm$EVTYPE, value = TRUE))
storm$EVTYPE <- gsub(".*understorm$", "thunderstorm", storm$EVTYPE)
# length(table(storm$EVTYPE))
## thunderstorm sort(table(storm$EVTYPE)) table(grep('.*understorms$',
## storm$EVTYPE, value = TRUE))
storm$EVTYPE <- gsub(".*understorms$", "thunderstorm", storm$EVTYPE)
# length(table(storm$EVTYPE))
## snow sort(table(storm$EVTYPE)) table(grep('.*snow.*', storm$EVTYPE, value
## = TRUE))
storm$EVTYPE <- gsub(".*snow.*", "heavy snow", storm$EVTYPE)
# length(table(storm$EVTYPE))
## snow sort(table(storm$EVTYPE)) table(grep('.*rain.*', storm$EVTYPE, value
## = TRUE))
storm$EVTYPE <- gsub(".*rain.*", "heavy rain", storm$EVTYPE)
# length(table(storm$EVTYPE))
## dry microburst sort(table(storm$EVTYPE)) table(grep('.*dry.*micro.*',
## storm$EVTYPE, value = TRUE))
storm$EVTYPE <- gsub(".*dry.*micro.*", "dry microburst", storm$EVTYPE)
# length(table(storm$EVTYPE))
## hurricane sort(table(storm$EVTYPE)) table(grep('.*hurri.*', storm$EVTYPE,
## value = TRUE))
storm$EVTYPE <- gsub(".*hurri.*", "hurricane", storm$EVTYPE)
# length(table(storm$EVTYPE))
## storm surge folded into hurricane as they occur during hurricanes [6]
## sort(table(storm$EVTYPE)) table(grep('.*stor.*surg.*', storm$EVTYPE, value
## = TRUE))
storm$EVTYPE <- gsub(".*stor.*surg.*", "hurricane", storm$EVTYPE)
# length(table(storm$EVTYPE))
## tstm = thunderstorm sort(table(storm$EVTYPE)) table(grep('.*thund.*win.*',
## storm$EVTYPE, value = TRUE))
storm$EVTYPE <- gsub(".*thund.*win.*", "thunderstorm winds", storm$EVTYPE)
# length(table(storm$EVTYPE))
## tstm = thunderstorm sort(table(storm$EVTYPE)) table(grep('.*tstm*',
## storm$EVTYPE, value = TRUE))
storm$EVTYPE <- gsub(".*tstm.*", "thunderstorm winds", storm$EVTYPE)
# length(table(storm$EVTYPE))
## Types of Floods are being consolidated into one 'Flood' category for this
## overview analysis. Leave this step out in future data exploration for more
## granular information. sort(table(storm$EVTYPE)) table(grep('.*flood.*',
## storm$EVTYPE, value = TRUE))
storm$EVTYPE <- gsub(".*flood.*", "flood", storm$EVTYPE)
# length(table(storm$EVTYPE))
Storm surge was folded into hurricane as they occur during hurricanes [2]
## Only rows with at least 1 fatality, injuries, or dollar damage is
## important for this analysis
storm <- storm[storm$FATALITIES > 0 | storm$INJURIES > 0 | storm$PROPDMG > 0 |
storm$CROPDMG > 0, ]
length(table(storm$EVTYPE))
## [1] 144
We have dropped the different values of severe weather events (EVTYPE) from 985 to 144. This should be sufficient for this analysis to find overall trends in the data.
########### Add dates
storm$date <- as.Date(storm$BGN_DATE, "%m/%e/%Y %H:%M:%S")
storm$year.month <- strftime(storm$date, format = "%Y/%m")
storm$year <- strftime(storm$date, format = "%Y")
Here we sum up all fatalities and injuries over the 61 year period resulting from each of the 144 severe weather events.
fatal <- storm[, sum(FATALITIES), by = EVTYPE]
# Order in descending order by damage
fatal <- fatal[order(-V1)]
setnames(fatal, "V1", "Fatalities")
injury <- storm[, sum(INJURIES), by = EVTYPE]
# Order in descending order by damage
injury <- injury[order(-V1)]
setnames(injury, "V1", "Injuries")
head(injury, n = 20)
## EVTYPE Injuries
## 1: tornado 91364
## 2: thunderstorm winds 9505
## 3: heat 9209
## 4: flood 8683
## 5: lightning 5231
## 6: ice storm 2154
## 7: wildfire 1606
## 8: high wind 1523
## 9: hail 1371
## 10: hurricane 1371
## 11: winter storm 1338
## 12: heavy snow 1121
## 13: dense fog 1077
## 14: blizzard 805
## 15: winter weather 538
## 16: rip current 529
## 17: dust storm 440
## 18: tropical storm 340
## 19: strong wind 323
## 20: heavy rain 303
head(fatal, n = 20)
## EVTYPE Fatalities
## 1: tornado 5633
## 2: heat 3134
## 3: flood 1553
## 4: lightning 817
## 5: thunderstorm winds 754
## 6: rip current 577
## 7: high wind 300
## 8: extreme wind chill 237
## 9: avalanche 224
## 10: winter storm 217
## 11: extreme cold 162
## 12: heavy snow 159
## 13: hurricane 157
## 14: strong wind 125
## 15: heavy rain 108
## 16: high surf 104
## 17: ice storm 102
## 18: blizzard 101
## 19: wildfire 90
## 20: dense fog 81
Here we take the top 20 most dangerous events in terms of fatalities and injuries and combine them into one dataset to plot them side-by-side.
## Take the Top 20 from each set and merge into one set for plotting
harmful <- merge(head(injury, n = 20), head(fatal, n = 20), by = "EVTYPE", all = TRUE)
harmful <- harmful[order(-Injuries, -Fatalities)]
## Error: object 'Injuries' not found
library(stringi)
## Warning: package 'stringi' was built under R version 3.0.3
library(reshape2)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.0.3
##
## Attaching package: 'ggplot2'
##
## The following object is masked from 'package:stringi':
##
## %+%
## Here we reshape the dataset into long format so a dodged barplot can be
## made with the top 20 events.
dat.harm <- melt(harmful, id.vars = "EVTYPE")
## stringi package has a function for capitalizing all words in a string.
## This will make the event types for plot-friendly.
dat.harm$EVTYPE <- stri_trans_totitle(dat.harm$EVTYPE)
It looks as though tornados are by far the most dangerous natural disaster in terms of injuries and fatalities over the 61 year period from 1950 to 2011.
## We can make a dodged barplot containing fatalities and injuries and
## reporting the number of each above each bar.
ggplot(dat.harm, aes(x = reorder(EVTYPE, -value), y = value, fill = variable)) +
geom_bar(stat = "identity", position = "dodge") + theme(axis.text.x = element_text(angle = 90)) +
geom_text(aes(label = value, ymax = 94000), position = position_dodge(width = 0.9),
hjust = -0.2, vjust = 0.25, show_guide = F, angle = 90, color = "black",
size = 3.5) + ggtitle("Top 20 Severe Weather Events and reported Fatalities and Injuries \n from 1950 to 2011") +
ylab("Number of Fatalities/Injuries") + xlab("Sever Weather Events") + scale_fill_discrete("Type of Harm")
## Warning: Removed 8 rows containing missing values (geom_text).
Now we look at economic damage of sever weather events over this 61-year period.
But first, we must do some more data-processing to convert two columns of data for each property/crop damage economic measurement (PROPDM/CROPDMG and PROPDMGEXP/CROPDMGEXP) into numeric damage estimates.
One entry of $115 Billion in damages for California flooding in January 2006 looks incorrect, and is changed to $115 million.
## Sanity Check: The largest single event for property damage shows up as a
## flood in California at $115 Billion, but the remarks and some googling
## show this to be most likely keyed in wrong, and should be in millions
## (more in line with Crop Damage in millions as well)
# subset(storm, storm$PROPDMGEXP == 'B')[31] #too verbose, uncomment to read
storm[storm$PROPDMGEXP == "B" & year == 2006, ]$PROPDMGEXP <- "M"
# subset(storm, storm$PROPDMGEXP == 'B') names(storm) paste(storm$PROPDMG,
# storm$PROPDMGEXP, sep = '') storm2<-storm
storm$PROPDMGEXP <- tolower(storm$PROPDMGEXP)
storm[storm$PROPDMGEXP == "h", c("PROPDMGEXP")] <- "1e2"
storm[storm$PROPDMGEXP == "k", c("PROPDMGEXP")] <- "1e3"
storm[storm$PROPDMGEXP == "m", c("PROPDMGEXP")] <- "1e6"
storm[storm$PROPDMGEXP == "b", c("PROPDMGEXP")] <- "1e9"
storm[storm$PROPDMGEXP == "+", c("PROPDMGEXP")] <- ""
storm[storm$PROPDMGEXP == "-", c("PROPDMGEXP")] <- ""
## Record 2 shows $1500 damage with 150^0 EXP, which is actually a multiplier
## of 10. But then record 5 shows 0.5^0; if we applied *10, this would be $5
## damage for destroying a home, a trailer, utility poles.
## head(subset(storm,storm[,PROPDMGEXP == 0])) #too verbose, uncomment to
## read
## Record 2 shows 68^7 EXP which would be an absurd amount of dollar damage
## for eight inches of rain. head(subset(storm,storm[,PROPDMGEXP == 7]))#too
## verbose, uncomment to read
## In light of just these few examples, this makes the EXP reported in
## integer numbers (not k/m/h/b) wholly unreliable and thus these are removed
## from this analysis to be included included only after extensive research
## beyond the scope of this analysis. This will also be applied to crop
## damage figures.
storm[storm$PROPDMGEXP == "0", c("PROPDMGEXP")] <- ""
storm[storm$PROPDMGEXP == "2", c("PROPDMGEXP")] <- ""
storm[storm$PROPDMGEXP == "3", c("PROPDMGEXP")] <- ""
storm[storm$PROPDMGEXP == "4", c("PROPDMGEXP")] <- ""
storm[storm$PROPDMGEXP == "5", c("PROPDMGEXP")] <- ""
storm[storm$PROPDMGEXP == "6", c("PROPDMGEXP")] <- ""
storm[storm$PROPDMGEXP == "7", c("PROPDMGEXP")] <- ""
### Convert blanks to 0 for multiplication, and summations
storm[storm$PROPDMGEXP == "", c("PROPDMGEXP")] <- "0"
table(storm$PROPDMGEXP)
##
## 0 1e2 1e3 1e6 1e9
## 11831 7 231428 11328 39
storm$property.damage <- as.numeric(storm$PROPDMGEXP) * as.numeric(storm$PROPDMG)
###############
storm$CROPDMGEXP <- tolower(storm$CROPDMGEXP)
storm[storm$CROPDMGEXP == "h", c("CROPDMGEXP")] <- "1e2"
storm[storm$CROPDMGEXP == "k", c("CROPDMGEXP")] <- "1e3"
storm[storm$CROPDMGEXP == "m", c("CROPDMGEXP")] <- "1e6"
storm[storm$CROPDMGEXP == "b", c("CROPDMGEXP")] <- "1e9"
storm[storm$CROPDMGEXP == "+", c("CROPDMGEXP")] <- ""
storm[storm$CROPDMGEXP == "-", c("CROPDMGEXP")] <- ""
storm[storm$CROPDMGEXP == "0", c("CROPDMGEXP")] <- ""
storm[storm$CROPDMGEXP == "2", c("CROPDMGEXP")] <- ""
storm[storm$CROPDMGEXP == "3", c("CROPDMGEXP")] <- ""
storm[storm$CROPDMGEXP == "4", c("CROPDMGEXP")] <- ""
storm[storm$CROPDMGEXP == "5", c("CROPDMGEXP")] <- ""
storm[storm$CROPDMGEXP == "6", c("CROPDMGEXP")] <- ""
storm[storm$CROPDMGEXP == "7", c("CROPDMGEXP")] <- ""
storm[storm$CROPDMGEXP == "?", c("CROPDMGEXP")] <- ""
storm[storm$CROPDMGEXP == "", c("CROPDMGEXP")] <- "0"
storm$crop.damage <- as.numeric(storm$CROPDMGEXP) * as.numeric(storm$CROPDMG)
Here we sum up all property and crop damage over the 61-year period for each severe weather event and plot the totals for the top 20 most costly events side-by-side.
The two top 20 sets are created separately, then the top 20 are combined to plot side-by-side.
prop.damage <- storm[, sum(property.damage), by = EVTYPE]
# Order in descending order by damage
prop.damage <- prop.damage[order(-V1)]
setnames(prop.damage, "V1", "Property Damage")
crop.damage <- storm[, sum(crop.damage), by = EVTYPE]
# Order in descending order by damage
crop.damage <- crop.damage[order(-V1)]
setnames(crop.damage, "V1", "Crop Damage")
## take the Top 20 from each set and merge into one set for plotting
damage <- merge(head(prop.damage, n = 20), head(crop.damage, n = 20), by = "EVTYPE",
all = TRUE)
damage <- damage[order(-Property_Damage, -Crop_Damage)]
## Error: object 'Property_Damage' not found
damage
## EVTYPE Property Damage Crop Damage
## 1: blizzard 6.593e+08 1.121e+08
## 2: damaging freeze NA 2.962e+08
## 3: drought 1.046e+09 1.397e+10
## 4: excessive wetness NA 1.420e+08
## 5: extreme cold NA 1.313e+09
## 6: flood 5.270e+10 1.239e+10
## 7: freeze NA 4.567e+08
## 8: frost freeze NA 1.202e+09
## 9: hail 1.597e+10 3.047e+09
## 10: heat NA 9.044e+08
## 11: heavy rain 3.234e+09 8.062e+08
## 12: heavy snow 1.020e+09 1.347e+08
## 13: high wind 6.166e+09 7.018e+08
## 14: hurricane 1.326e+11 5.506e+09
## 15: ice storm 3.964e+09 5.022e+09
## 16: landslide 3.246e+08 NA
## 17: lightning 9.387e+08 NA
## 18: strong wind 1.816e+08 6.995e+07
## 19: thunderstorm 1.208e+09 NA
## 20: thunderstorm winds 1.137e+10 1.256e+09
## 21: tornado 5.694e+10 4.150e+08
## 22: tropical storm 7.704e+09 6.783e+08
## 23: tsunami 1.441e+08 NA
## 24: typhoon 6.002e+08 NA
## 25: wildfire 8.492e+09 4.028e+08
## 26: winter storm 6.690e+09 NA
## EVTYPE Property Damage Crop Damage
allDamage <- melt(damage, id.vars = "EVTYPE")
## stringi package has a function for capitalizing all words in a string
allDamage$EVTYPE <- stri_trans_totitle(allDamage$EVTYPE)
Hurricanes, floods, and tornados take the lion's share of the property damage tolls, while droughts and floods account for the largest portions of crop damage estimates.
## We can make a dodged barplot containing fatalities and injuries and
## reporting the number of each above each bar.
ggplot(allDamage, aes(x = reorder(EVTYPE, -value), y = value/1e+09, fill = variable)) +
geom_bar(stat = "identity", position = "dodge") + theme(axis.text.x = element_text(angle = 90)) +
geom_text(aes(label = paste("$", round(value/1e+09, 2), "B", sep = ""),
ymax = 150), position = position_dodge(width = 0.9), hjust = -0.2, vjust = 0.25,
show_guide = F, angle = 90, color = "black", size = 3.5) + ggtitle("Top 20 Severe Weather Events and reported damage to property and crops \n from 1950 to 2011") +
ylab("Damage in billions of dollars") + xlab("Sever Weather Events") + scale_fill_discrete("Type of Damage")
## Warning: Removed 12 rows containing missing values (geom_text).
There seems to be a disconnect between the most dangerous events to public health and the most costly in terms of economic impact. We can explore this more closely by breaking down the property damage into most costly months for further analysis.
prop.damage.year <- storm[, sum(property.damage), by = list(EVTYPE, year.month)]
# Order in descending order by damage
prop.damage.year <- prop.damage.year[order(-V1)]
setnames(prop.damage.year, "V1", "Property.Damage")
Here we see that while tornados are the most dangerous to public health, hurricanes seem to be what cause the most economic damage - Hurricanes Dennis, Emily, Katrina, Rita, and Wilma in August-October 2005 [3] and Hurricanes Ivan, Charley, Frances, and Jeanne in August to September of 2004. [4]
Tornados also don't seem to have excessively large one-month contributions (but possibly many smaller ones) while we seem to be unprepared for the hurricanes and storm surges which incur major costs in more infrequent but much larger monthly sums.
We also see that tornados had more large damage months between 1965 and 1985, and didn't contribute large month's of damages again until April of 2011 during the “2011 tornado outbreak.” [5]
Incidentally, this figure also does not include Hurricane Sandy from 2012, which was the second-costliest hurricane in United States history. [6]
prop50 <- head(prop.damage.year, n = 50)
prop50$EVTYPE
## [1] "hurricane" "hurricane" "hurricane"
## [4] "hurricane" "hurricane" "tornado"
## [7] "hurricane" "tropical storm" "flood"
## [10] "winter storm" "flood" "flood"
## [13] "hurricane" "hurricane" "flood"
## [16] "tornado" "hurricane" "hail"
## [19] "heavy rain" "flood" "high wind"
## [22] "flood" "wildfire" "hurricane"
## [25] "flood" "hail" "thunderstorm winds"
## [28] "tornado" "flood" "wildfire"
## [31] "tornado" "flood" "flood"
## [34] "tornado" "tornado" "tornado"
## [37] "thunderstorm winds" "hurricane" "thunderstorm"
## [40] "tornado" "hurricane" "high wind"
## [43] "flood" "tornado" "flood"
## [46] "ice storm" "thunderstorm winds" "flood"
## [49] "tornado" "tropical storm"
library(RColorBrewer)
## Create a custom palette gradient of 12 colors using RColorBrewer 'set1',
## since the actual set only contains 9 colors for use. We'll need more than
## 9 colors since there are more than 9 severe weather events types that will
## be plotted.
colourCount = 12
getPalette = colorRampPalette(brewer.pal(9, "Set1"))
ggplot(prop50, aes(x = year.month, y = Property.Damage/1e+09, fill = EVTYPE)) +
geom_bar() + theme(axis.text.x = element_text(angle = 0)) + scale_fill_manual(values = getPalette(colourCount)) +
geom_text(aes(label = paste("$", round(Property.Damage/1e+09, 2), "B", sep = ""),
ymax = 80), position = position_dodge(width = 0.9), hjust = -0.2, vjust = 0.25,
show_guide = F, angle = 0, color = "black", size = 3) + coord_flip() +
ggtitle("Top 50 Months in Terms of Property Damage from Severe Weather Events \n from 1950 - 2011") +
ylab("In Billions of Dollars \n (Each Weather Event's Property Damage is reported in each month's bar, smallest first if more than 1 in 1 month.)") +
xlab("")
## Mapping a variable to y and also using stat="bin".
## With stat="bin", it will attempt to set the y value to the count of cases in each group.
## This can result in unexpected behavior and will not be allowed in a future version of ggplot2.
## If you want y to represent counts of cases, use stat="bin" and don't map a variable to y.
## If you want y to represent values in the data, use stat="identity".
## See ?geom_bar for examples. (Deprecated; last used in version 0.9.2)
These data show that we seem to be unprepared for the economic devestation resulting from hurricanes over the last decade, while tornados are by far the largest source of recorded public health harm. Droughts and flooding also have major economic impacts on crops.
[1] Peng, R. https://class.coursera.org/repdata-002/human_grading/view/courses/972084/assessments/4/submissions Last Accessed: 5/24/2014.
[2] http://en.wikipedia.org/wiki/Hurricane_Katrina Last Accessed: 5/24/2014.
[3] http://en.wikipedia.org/wiki/2005_Atlantic_hurricane_season Last Accessed: 5/24/2014.
[4] http://en.wikipedia.org/wiki/April_25%E2%80%9328,_2011_tornado_outbreak Last Accessed: 5/24/2014.
[5] http://en.wikipedia.org/wiki/Hurricane_Sandy Last Accessed: 5/24/2014.
[6] http://education.nationalgeographic.com/education/encyclopedia/storm-surge/?ar_a=1 Last Accessed: 5/24/2014.