This report describes an analysis that sought to generate answers to the questions of: 1) which types of storm events are most harmful to population health in the U.S., and 2) which types of events have the greatest economic consequences in the U.S. To evaluate these questions, storm data were downloaded, loaded into an R data frame, and processed to remove all data from prior to when records began being kept on damages (both monetary and human) associated with all of the 48 currently-official storm types. To simplify the large data frame and thus make it easier to work with, records were removed in which no monetary damages, injuries, or fatalities were recorded. All variables other than event type and damages (of the monetary and health sorts) were removed. A list of the 48 official event types was loaded into a second data frame, and event types with names similar to these were matched to them using a multi-part process. Event-specific dollar damages (property and crop) were calculated, as well as “total” (property plus crop) damages. Damages (property, crop, and total), fatalities, and injuries were then aggregated over each of the 48 event types, and ranked in order from highest to lowest. The ranked lists were truncated to include only the top five event types in each category, then total damages, total fatalities, and total injuries for these top five storm event types were each displayed in barplot form.
# Loading needed packages, data table, and table of official event types
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(stringdist)
## Warning: package 'stringdist' was built under R version 3.5.3
stormdata <- read.csv("repdata_data_StormData.csv")
EventsTable <- read.csv("EventsTable.csv")
Storm data were downloaded from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2, unzipped to file “repdata_data_StormData.csv”, and loaded as data frame “stormdata” using the read.csv command. After stripping out the times from the date field, and converting dates to POSIXct format, the filter function in dplyr was used to create a subset of the data that contained only data for dates after December 31, 1995 (i.e., the approximate date when records began to be kept on damages - human and monetary - from all 48 currently-official storm types). This data subset was stored as data frame “stormdata.latr”. Using the dlpyr function “subset”, a more restricted version of the data frame was then generated by choosing (other than EVTYPE) only the variabes associated with monetary damage, fatalities, and injuries (i.e., FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP). This was saved as data frame “stormdata.wrk”, from which a further subset data frame (“stormdata.hrt”) was created, by using the filter function of dplyr to select entries in which one or more of these entries (other than EVTYPE) was non-zero.
A master event-type file containing the 48 official event type names was generated in Microsoft Excel from the entries in Table 2.1.1 of repdata_peer2.pdf, and saved in csv format. Event names in this file were saved in all lower case format. This file was then loaded into R as a second data frame using read.csv. The EVTYPE entries in stormdata.hrt were converted to all lower case format to facilitate comparison of event names with those in the master list. Several event types in stormdata.hrt that had names similar, but not identical, to names in the master list were renamed to conform with event names in the master list, using the gsub function and direct boolean name-matching and replacement. Partial event name-matching of the modified EVTYPE entries against the event names in the master list was then conducted using the amatch function in the stringdist package, and the matched name saved in a new column (“partmatch”) added to the modified data file.
Entry-specific damages (property and crop) in dollars were calculated, and stored in new columns stormdata.hrt\(PrpDmgAmt and stormdata.hrt\)CrpDmgAmt, respectively. An additional “total” (property plus crop) damages column was created by adding PrpDmgAmt to CrpDmgAmt, and added to the data frame as column stormdata.hrt$TotDmgAmt.
Damage amounts (property, crop, and total), fatalities, and injuries were each aggregated over all entries by event type in field “partmatch” (containing the 48 official event type names), and ranked from largest to smallest. The five resulting rank-ordered data frames were stored with under the names PropTotlDmg, CropTotlDmg, SumTotlDmg, TotalFatal, and TotalInjuries, respectively. In each of these data frames the damage/fatality/injury field was given a suitably descriptive name, e.g. “”PropDamage“. Truncated version of each of these data frames were then generated, containing only the top five storm event types for each damage type. Finally, barplots of total damage, fatalities, and injuries were generated.
stormdata$DATE <- strptime(stormdata$BGN_DATE, format = "%m/%d/%Y %H:%M:%S")
stormdata$DATE <- as.POSIXct(stormdata$DATE)
cutdate <- as.POSIXct("1995-12-31", origin="1970-01-01")
stormdata.latr <- filter(stormdata, DATE > cutdate)
stormdata.wrk <- select(stormdata.latr, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
stormdata.hrt <- filter(stormdata.wrk, FATALITIES>0 | INJURIES>0 | PROPDMG>0 | CROPDMG>0)
# Convert all letters to lower case to make name matching easier
stormdata.hrt$EVTYPE<- sapply(stormdata.hrt$EVTYPE, tolower)
# Fix specific problems
stormdata.hrt <- data.frame(lapply(stormdata.hrt, function(x){gsub("tstm", "thunderstorm", x)}))
stormdata.hrt <- data.frame(lapply(stormdata.hrt, function(x){gsub("freezing rain", "winter weather", x)}))
stormdata.hrt <- data.frame(lapply(stormdata.hrt, function(x){gsub("urban/sml stream fld", "flood", x)}))
stormdata.hrt <- data.frame(lapply(stormdata.hrt, function(x){gsub("heavy surf/high surf", "high surf", x)}))
stormdata.hrt <- data.frame(lapply(stormdata.hrt, function(x){gsub("hurricane/typhoon", "hurricane (typhoon)", x)}))
stormdata.hrt <- data.frame(lapply(stormdata.hrt, function(x){gsub("high wind/hail", "high wind", x)}))
stormdata.hrt <- data.frame(lapply(stormdata.hrt, function(x){gsub("high wind (g40)", "high wind", x)}))
stormdata.hrt <- data.frame(lapply(stormdata.hrt, function(x){gsub("dry microburst", "high wind", x)}))
stormdata.hrt <- data.frame(lapply(stormdata.hrt, function(x){gsub("wet microburst", "high wind", x)}))
stormdata.hrt <- data.frame(lapply(stormdata.hrt, function(x){gsub("strong winds", "high wind", x)}))
stormdata.hrt <- data.frame(lapply(stormdata.hrt, function(x){gsub("gusty wind", "high wind", x)}))
stormdata.hrt <- data.frame(lapply(stormdata.hrt, function(x){gsub("whirlwind", "high wind", x)}))
stormdata.hrt <- data.frame(lapply(stormdata.hrt, function(x){gsub("(g35)", "", x)}))
stormdata.hrt <- data.frame(lapply(stormdata.hrt, function(x){gsub("(g40)", "", x)}))
stormdata.hrt <- data.frame(lapply(stormdata.hrt, function(x){gsub("(g41)", "", x)}))
stormdata.hrt <- data.frame(lapply(stormdata.hrt, function(x){gsub("(g45)", "", x)}))
stormdata.hrt <- data.frame(lapply(stormdata.hrt, function(x){gsub("thunderstorm wind/hail", "thunderstorm wind", x)}))
stormdata.hrt <- data.frame(lapply(stormdata.hrt, function(x){gsub("thunderstorm wind ()", "thunderstorm wind", x)}))
stormdata.hrt <- data.frame(lapply(stormdata.hrt, function(x){gsub("river flood", "flood", x)}))
stormdata.hrt <- data.frame(lapply(stormdata.hrt, function(x){gsub("flooding", "flood", x)}))
stormdata.hrt <- data.frame(lapply(stormdata.hrt, function(x){gsub("rip currents", "rip current", x)}))
stormdata.hrt <- data.frame(lapply(stormdata.hrt, function(x){gsub("wild/forest fire", "wildfire", x)}))
stormdata.hrt <- data.frame(lapply(stormdata.hrt, function(x){gsub("winter weather/mix", "winter weather", x)}))
stormdata.hrt <- data.frame(lapply(stormdata.hrt, function(x){gsub("extreme windchill", "extreme cold/wind chill", x)}))
stormdata.hrt <- data.frame(lapply(stormdata.hrt, function(x){gsub("landspout", "tornado", x)}))
stormdata.hrt <- data.frame(lapply(stormdata.hrt, function(x){gsub("heavy surf", "high surf", x)}))
stormdata.hrt <- data.frame(lapply(stormdata.hrt, function(x){gsub("coastal flood/erosion", "coastal flood", x)}))
stormdata.hrt <- data.frame(lapply(stormdata.hrt, function(x){gsub("coastal flood/erosion", "coastal flood", x)}))
findwind <- stormdata.hrt$EVTYPE == "wind"
stormdata.hrt$EVTYPE[findwind] = "high wind"
findthunwind <- stormdata.hrt$EVTYPE == "thunderstorm wind()"
stormdata.hrt$EVTYPE[findthunwind] = "thunderstorm wind"
findhighwinds <- stormdata.hrt$EVTYPE == "high winds"
stormdata.hrt$EVTYPE[findhighwinds] = "high wind"
findhighwindx <- stormdata.hrt$EVTYPE == "high wind ()"
stormdata.hrt$EVTYPE[findhighwindx] = "high wind"
findhurricane <- stormdata.hrt$EVTYPE == "hurricane"
stormdata.hrt$EVTYPE[findhurricane] = "hurricane (typhoon)"
findhurricaneEd <- stormdata.hrt$EVTYPE == "hurricane (typhoon) edouard"
stormdata.hrt$EVTYPE[findhurricaneEd] = "hurricane (typhoon)"
findcold <- stormdata.hrt$EVTYPE == "cold"
stormdata.hrt$EVTYPE[findcold] = "cold/wind chill"
findextrcold <- stormdata.hrt$EVTYPE == "extreme cold"
stormdata.hrt$EVTYPE[findextrcold] = "extreme cold/wind chill"
findwindchill <- stormdata.hrt$EVTYPE == "wind chill"
stormdata.hrt$EVTYPE[findwindchill] = "extreme cold/wind chill"
findstormsurge <- stormdata.hrt$EVTYPE == "storm surge"
stormdata.hrt$EVTYPE[findstormsurge] = "storm surge/tide"
findflflflood <- stormdata.hrt$EVTYPE == "flash flood/flood"
stormdata.hrt$EVTYPE[findflflflood] = "flood"
findflflashfl <- stormdata.hrt$EVTYPE == "flood/flash/flood"
stormdata.hrt$EVTYPE[findflflashfl] = "flood"
# Match event types against the official list of 48 types
stormdata.hrt$partmatch <- EventsTable$EventName[amatch(stormdata.hrt$EVTYPE, EventsTable$EventName, method = "lcs", useBytes = FALSE, weight = c(d = 0.5, i = 0.5, s = 0.9, t = 0.9), maxDist=3.0)]
# Add columns for property and crop damage in $, and initialize them with zeroes
stormdata.hrt$PrpDmgAmt <- 0
stormdata.hrt$CrpDmgAmt <- 0
# PROPDMG, CRPDMG, FATALITIES and INJURIES are factor variables. Make them numeric instead.
stormdata.hrt$PROPDMG <- as.numeric(stormdata.hrt$PROPDMG)
stormdata.hrt$CROPDMG <- as.numeric(stormdata.hrt$CROPDMG)
stormdata.hrt$FATALITIES <- as.numeric(stormdata.hrt$FATALITIES)
stormdata.hrt$INJURIES <- as.numeric(stormdata.hrt$INJURIES)
# Calculate event property damages
for (i in 1:length(stormdata.hrt$EVTYPE)){
if (stormdata.hrt$PROPDMGEXP[i]=="K"){
stormdata.hrt$PrpDmgAmt <- stormdata.hrt$PROPDMG*1000
} else if (stormdata.hrt$PROPDMGEXP[i]=="M") {
stormdata.hrt$PrpDmgAmt <- stormdata.hrt$PROPDMG*1000000
} else if (stormdata.hrt$PROPDMGEXP[i]=="B") {
stormdata.hrt$PrpDmgAmt <- stormdata.hrt$PROPDMG*1000000000
}
}
# calculate event crop damages
for (i in 1:length(stormdata.hrt$EVTYPE)){
if (stormdata.hrt$CROPDMGEXP[i]=="K"){
stormdata.hrt$CrpDmgAmt <- stormdata.hrt$CROPDMG*1000
} else if (stormdata.hrt$CROPDMGEXP[i]=="M") {
stormdata.hrt$CrpDmgAmt <- stormdata.hrt$CROPDMG*1000000
} else if (stormdata.hrt$CROPDMGEXP[i]=="B") {
stormdata.hrt$CrpDmgAmt <- stormdata.hrt$CROPDMG*1000000000
}
}
# Add a column with total damage $ (property + crop)
stormdata.hrt$TotDmgAmt <- stormdata.hrt$PrpDmgAmt + stormdata.hrt$CrpDmgAmt
# Summarize the 5 damage types by partmatch
PropTotlDmg <- aggregate(stormdata.hrt$PrpDmgAmt, by=list(Category=stormdata.hrt$partmatch), FUN=sum)
CropTotlDmg <- aggregate(stormdata.hrt$CrpDmgAmt, by=list(Category=stormdata.hrt$partmatch), FUN=sum)
SumTotlDmg <- aggregate(stormdata.hrt$TotDmgAmt, by=list(Category=stormdata.hrt$partmatch), FUN=sum)
TotalFatal <- aggregate(stormdata.hrt$FATALITIES, by=list(Category=stormdata.hrt$partmatch), FUN=sum)
TotalInjuries <- aggregate(stormdata.hrt$INJURIES, by=list(Category=stormdata.hrt$partmatch), FUN=sum)
# Change $ to B$ to make plots easier
PropTotlDmg$x <- PropTotlDmg$x/10^9
CropTotlDmg$x <- CropTotlDmg$x/10^9
SumTotlDmg$x <- SumTotlDmg$x/10^9
# Order the 3 lists of events by $ amount (or number killed/injured), from highest to lowest
PropTotlDmg <- PropTotlDmg[order(-PropTotlDmg$x),]
CropTotlDmg <- CropTotlDmg[order(-CropTotlDmg$x),]
SumTotlDmg <- SumTotlDmg[order(-SumTotlDmg$x),]
TotalFatal <- TotalFatal[order(-TotalFatal$x),]
TotalInjuries <- TotalInjuries[order(-TotalInjuries$x),]
colnames(PropTotlDmg)[colnames(PropTotlDmg)=="x"] <- "PropDamage"
colnames(CropTotlDmg)[colnames(CropTotlDmg)=="x"] <- "CropDamage"
colnames(SumTotlDmg)[colnames(SumTotlDmg)=="x"] <- "SumDamage"
colnames(TotalFatal)[colnames(TotalFatal)=="x"] <- "Fatalities"
colnames(TotalInjuries)[colnames(TotalInjuries)=="x"] <- "Injuries"
print(Top5PropDmg <- PropTotlDmg[1:5,])
## Category PropDamage
## 37 thunderstorm wind 51.756704
## 13 flash flood 10.200296
## 18 hail 9.846887
## 38 tornado 6.950945
## 14 flood 5.591033
print(Top5CropDmg <- CropTotlDmg[1:5,])
## Category CropDamage
## 18 hail 1.501893
## 37 thunderstorm wind 0.852257
## 13 flash flood 0.340231
## 14 flood 0.310636
## 38 tornado 0.210644
print(Top5SumDmg <- SumTotlDmg[1:5,])
## Category SumDamage
## 37 thunderstorm wind 52.608961
## 18 hail 11.348780
## 13 flash flood 10.540527
## 38 tornado 7.161589
## 14 flood 5.901669
print(Top5Fatal <- TotalFatal[1:5,])
## Category Fatalities
## 37 thunderstorm wind 106230
## 18 hail 22686
## 13 flash flood 22084
## 38 tornado 16638
## 28 lightning 12241
print(Top5Injuries <- TotalInjuries[1:5,])
## Category Injuries
## 37 thunderstorm wind 154816
## 38 tornado 89144
## 28 lightning 49864
## 13 flash flood 27827
## 18 hail 26827
As seen in the above five lists, Thunderstorm Wind came in first in terms of property damage, total damage, fatalities and injuries. Crop damage was the only category in which Thunderstorm Wind did not come in first, and in that category Thunderstorm Wind came in second. When added to property damage to create total damage, Thunderstorm Wind was number one, indicating that thunderstorm winds caused more total monetary damage, fatalities, and injuries than any other storm event category.
The lists of top five damage causing events were remarkably similar among the five damage categories, with four event types - Thunderstorm Wind, Hail, Tornado, and Flash Flood - appearing in all of them. In addition to these four, the three monetary damage lists included Flood, while the fatality and injury lists both included Lightning.
Magnitudes of the total damage, fatality, and injury tallies are displayed in below graphically, in barplot form:
#Figure 1
yvec <- Top5SumDmg$SumDamage
names(yvec) <- Top5SumDmg$Category
plot1 <- barplot(yvec, col=rgb(0.3,0.4,0.3,0.7), xlab="Event Type", cex.xlab = 0.6, ylim=c(0,1.2*max(yvec)), ylab="Total Damages ($B)", cex.names=0.6, main="Property + Crop Damages in Billions of Dollars \n from Top 5 Storm Event Types After 1995", cex.main=0.8)
## Warning in plot.window(xlim, ylim, log = log, ...): "cex.xlab" is not a
## graphical parameter
## Warning in axis(if (horiz) 2 else 1, at = at.l, labels = names.arg, lty =
## axis.lty, : "cex.xlab" is not a graphical parameter
## Warning in title(main = main, sub = sub, xlab = xlab, ylab = ylab, ...):
## "cex.xlab" is not a graphical parameter
## Warning in axis(if (horiz) 1 else 2, cex.axis = cex.axis, ...): "cex.xlab"
## is not a graphical parameter
abline(h=0)
text(x=plot1,y=c(yvec[1:5]),labels=round(c(yvec[1:5]),1),pos=3)
# Figure 2
yvec <- Top5Fatal$Fatalities
names(yvec) <- Top5Fatal$Category
plot2 <- barplot(yvec, col=rgb(0.6,0.3,0.2,0.2), yaxt="n", xlab="Event Type", cex.xlab = 0.6,ylim=c(0,1.2*max(yvec)),ylab="Total Fatalities", cex.names=0.6, main="Total Fatalities from Top 5 Storm Event Types \n After 1995", cex.main=0.8)
## Warning in plot.window(xlim, ylim, log = log, ...): "cex.xlab" is not a
## graphical parameter
## Warning in axis(if (horiz) 2 else 1, at = at.l, labels = names.arg, lty =
## axis.lty, : "cex.xlab" is not a graphical parameter
## Warning in title(main = main, sub = sub, xlab = xlab, ylab = ylab, ...):
## "cex.xlab" is not a graphical parameter
## Warning in axis(if (horiz) 1 else 2, cex.axis = cex.axis, ...): "cex.xlab"
## is not a graphical parameter
axis(2, axTicks(2), format(axTicks(2), scientific = F))
abline(h=0)
text(x=plot2,y=c(yvec[1:5]),labels=c(yvec[1:5]),pos=3)
# Figure 3
yvec <- Top5Injuries$Injuries
names(yvec) <- Top5Injuries$Category
plot3 <- barplot(yvec, col=rgb(0.1,0.8,0.2,0.4), xlab="Event Type", cex.xlab = 0.4,ylim=c(0, 1.2*max(yvec)),ylab="Total Injuries", cex.names=0.6, main="Total Injuries from Top 5 Storm Event Types \n After 1995", cex.main=0.8)
## Warning in plot.window(xlim, ylim, log = log, ...): "cex.xlab" is not a
## graphical parameter
## Warning in axis(if (horiz) 2 else 1, at = at.l, labels = names.arg, lty =
## axis.lty, : "cex.xlab" is not a graphical parameter
## Warning in title(main = main, sub = sub, xlab = xlab, ylab = ylab, ...):
## "cex.xlab" is not a graphical parameter
## Warning in axis(if (horiz) 1 else 2, cex.axis = cex.axis, ...): "cex.xlab"
## is not a graphical parameter
abline(h=0)
text(x=plot3,y=c(yvec[1:5]),labels=c(yvec[1:5]),pos=3)