Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This report analyzes data from the US National Oceanic and Atmospheric Administration (NOAA) storm database. It compares injuries, fatalities, and monetary damage (to crops and property) across types of weather events.
First we will load the storm data.
##load the data
storm <- read.csv("repdata_data_StormData.csv.bz2")
Next we will do some processing to normalize the data. First, we need to clean up the way the monetary damages are organized.
##create a lookup table that maps each exponent code to its numeric multiplier
conversion <- data.frame(key = c("", "-", "?", "+", "0", "1", "2", "3", "4", "5", "6", "7", "8", "H", "h", "K", "k", "M", "m", "B", "b"), value = c(1, 0, 0, 0, 1, 10, 100, 1000, 10000, 1e+05, 1e+06, 1e+07, 1e+08, 100, 100, 1000, 1000, 1e+06, 1e+06, 1e+09, 1e+09), stringsAsFactors = FALSE)
##create a function that looks up the multiplier for each code by matching on the key, so the result does not depend on how factor levels happen to sort; codes missing from the table become NA
convert <- function(x)
{
    x <- as.character(x)
    return(conversion$value[match(x, conversion$key)])
}
##convert the property damage and crop damage exponent codes to numeric multipliers
storm$CROPDMGEXP <- convert(storm$CROPDMGEXP)
storm$PROPDMGEXP <- convert(storm$PROPDMGEXP)
##CROPDMGEXP and PROPDMGEXP are multipliers that modify the CROPDMG and PROPDMG variables, so multiply the two together to get the total for property and crop damage
storm$CROPDMGtotal <- storm$CROPDMG * storm$CROPDMGEXP
storm$PROPDMGtotal <- storm$PROPDMG * storm$PROPDMGEXP
##create a single variable to capture the total monetary damage
storm$DMGTOTAL <- storm$CROPDMGtotal + storm$PROPDMGtotal
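As a quick check of the multiplier logic, the sketch below applies it to a few made-up rows. The `toy.dmg` data frame and the three-row `exp.key` table are hypothetical and exist only for this illustration; the lookup uses `match()`, which is one of several ways to pair each code with its multiplier.

```r
## hypothetical lookup covering only three of the exponent codes
exp.key <- data.frame(key = c("K", "M", "B"), value = c(1e+03, 1e+06, 1e+09), stringsAsFactors = FALSE)
## made-up damage records, not taken from the NOAA data
toy.dmg <- data.frame(PROPDMG = c(2.5, 10, 1), PROPDMGEXP = c("K", "M", "B"), stringsAsFactors = FALSE)
## match() pairs each code with its own row of the key, whatever order the codes appear in
toy.dmg$multiplier <- exp.key$value[match(toy.dmg$PROPDMGEXP, exp.key$key)]
## damage in dollars is the reported figure times its multiplier
toy.dmg$PROPDMGtotal <- toy.dmg$PROPDMG * toy.dmg$multiplier
print(toy.dmg)
```

Here a reported value of 2.5 with code "K" becomes 2,500 dollars, and 10 with code "M" becomes 10 million dollars.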
Now we need to work on the event type variable to organize the data.
##EVTYPE values appear in upper case, lower case, and a mix of the two; convert everything to lower case so each event is counted in the right category
storm$EVTYPE <- tolower(storm$EVTYPE)
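##many similar events are recorded under different names, so group them into broad categories. The rules run in order and each one overwrites EVTYPE, so an event lands in the first category whose pattern it matches (for example, "high wind" matches "high" and "waterspout" matches "water" before the storm rule runs)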
storm$EVTYPE[grep("heat|drought|dry|fire|hot|warm|smoke|high", storm$EVTYPE)] <- "heat/fire/drought"
storm$EVTYPE[grep("avalan|blizzard|chill|cold|cool|glaze|hypothermia|ice|icy|freez|frost|low temp|snow|wint|low", storm$EVTYPE)] <- "cold/ice"
storm$EVTYPE[grep("fog|vog", storm$EVTYPE)] <- "fog"
storm$EVTYPE[grep("coast|cstl|current|dam|drizzle|drown|eros|flood|floood|fld|shower|water|wave|lake|landslump|marine|precip|rain|river|stream|sea|surf|swell|tide|tidal|torrent|wet", storm$EVTYPE)] <- "rain/flood/water"
storm$EVTYPE[grep("burst|cloud|depression|funnel|gust|hail|hurricane|landspout|storm|thunder|tornado|torndao|tstm|turbulence|typhoon|waterspout|wind|wnd|sleet", storm$EVTYPE)] <- "storm/tornado/wind"
storm$EVTYPE[grep("light", storm$EVTYPE)] <- "lightning"
storm$EVTYPE[grep("tsunami|volcan|slide", storm$EVTYPE)] <- "seismic/landslide"
storm$EVTYPE[grep("dust", storm$EVTYPE)] <- "dust"
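Because each rule above overwrites EVTYPE before the next one runs, the order of the rules decides where an event with several matching keywords ends up. The sketch below shows the mechanism on a few made-up event names with shortened, hypothetical patterns; it is an illustration only, not the full rule set.

```r
## made-up event names, already in lower case
toy.events <- c("high wind", "excessive heat", "waterspout", "flash flood", "dust devil")
## shortened, hypothetical versions of three of the rules, applied in order
toy.events[grep("heat|high", toy.events)] <- "heat/fire/drought"
toy.events[grep("flood|water", toy.events)] <- "rain/flood/water"
## by now "high wind" and "waterspout" have been relabeled, so this rule finds nothing
toy.events[grep("wind|waterspout", toy.events)] <- "storm/tornado/wind"
print(toy.events)
```

In this toy run, "high wind" is grouped with heat and "waterspout" with water, while "dust devil" matches none of the three patterns and keeps its original name. Reordering the rules or tightening the patterns would change those assignments.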
Because this data set is large, we will create a smaller one that keeps only the variables of interest.
##only include the variables of interest
vars <- names(storm) %in% c("EVTYPE", "FATALITIES", "INJURIES", "DMGTOTAL")
newstorm <- storm[vars]
##create a subset of the data excluding other events that don't fall into one of these grouped categories
categories <- c("heat/fire/drought", "cold/ice", "fog", "rain/flood/water", "storm/tornado/wind", "lightning", "seismic/landslide", "dust")
newstorm2 <- newstorm[newstorm$EVTYPE %in% categories, ]
library(data.table)
##convert the data to a data table
storm.table <- data.table(newstorm2)
##aggregate sums for injuries, fatalities, and damage for each event type
aggstorm <- as.data.frame(storm.table[, j = list(injuries = sum(INJURIES, na.rm = TRUE), deaths = sum(FATALITIES, na.rm = TRUE), damage = sum(DMGTOTAL, na.rm = TRUE)), by = EVTYPE])
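For readers without data.table installed, a similar per-category sum can be sketched in base R with `aggregate()`. The `toy.storm` data frame below is made up for illustration. One difference to keep in mind: the formula interface of `aggregate()` drops rows containing NA by default, rather than skipping NA values one column at a time as `sum(..., na.rm = TRUE)` does.

```r
## made-up records for illustration only, not taken from the NOAA data
toy.storm <- data.frame(EVTYPE = c("fog", "dust", "fog", "dust"), INJURIES = c(1, 0, 2, 4), FATALITIES = c(0, 1, 1, 0), DMGTOTAL = c(100, 0, 50, 25), stringsAsFactors = FALSE)
## sum each outcome within each event type; cbind() names the output columns
toy.agg <- aggregate(cbind(injuries = INJURIES, deaths = FATALITIES, damage = DMGTOTAL) ~ EVTYPE, data = toy.storm, FUN = sum)
print(toy.agg)
```

The result has one row per event type, with the same three summary columns used in the plots.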
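##create easier to read names for the category axis, looked up by EVTYPE so each label matches its bar whatever order the rows of aggstorm are in
label.key <- c("storm/tornado/wind" = "storms and wind", "cold/ice" = "cold and ice", "heat/fire/drought" = "heat, fire, and drought", "rain/flood/water" = "rain and floods", "lightning" = "lightning", "fog" = "fog", "dust" = "dust", "seismic/landslide" = "seismic and landslide")
labels <- unname(label.key[as.character(aggstorm$EVTYPE)])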
##function to wrap the axis label text so it all fits
wrap.it <- function(x, len)
{
    sapply(x, function(y) paste(strwrap(y, len), collapse = "\n"), USE.NAMES = FALSE)
}
##call this function with a list or vector
wrap.labels <- function(x, len)
{
    if (is.list(x))
    {
        lapply(x, wrap.it, len)
    } else {
        wrap.it(x, len)
    }
}
wr.lap <- wrap.labels(labels, 10)
##plot each category of harm
barplot(aggstorm$injuries, names.arg = wr.lap, horiz = TRUE, xlab = "Total Injuries", las = 2, cex.names = 0.7, main = "Injuries by Weather Event")
barplot(aggstorm$deaths, names.arg = wr.lap, horiz = TRUE, xlab = "Total Fatalities", las = 2, cex.names = 0.7, main = "Fatalities by Weather Event")
barplot(aggstorm$damage, names.arg = wr.lap, horiz = TRUE, xlab = "Total Damage in US Dollars", las = 2, cex.names = 0.7, main = "Crop and Property Damage by Weather Event")
Based on these totals, the storms and wind category accounts for by far the most fatalities, injuries, and crop and property damage of the eight event groups.
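A ranking like this can also be checked numerically rather than read off the plots. The sketch below orders categories by one outcome and computes each category's share of the total; the `toy.rank` numbers are made up for illustration and are not the NOAA results.

```r
## made-up totals for illustration only, not the NOAA results
toy.rank <- data.frame(EVTYPE = c("fog", "lightning", "dust"), injuries = c(30, 50, 10), stringsAsFactors = FALSE)
## order rows from most to fewest injuries
ranked <- toy.rank[order(toy.rank$injuries, decreasing = TRUE), ]
## share of all injuries attributable to each category
ranked$share <- ranked$injuries / sum(ranked$injuries)
print(ranked)
```

Applied to a table of per-category totals, the same two steps show how large the gap between the leading category and the rest is.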