Storms can devastate both human well-being and property. The damage they cause can be measured along two dimensions: the cost to human life and health, and the economic cost of damage to property and crops. In this paper, we use data from the National Weather Service to examine the costs of various weather events to the US.
We begin by loading the data from our compressed CSV file. We use the option strip.white = TRUE to eliminate extraneous spaces.
data <- read.csv(bzfile("StormData.csv.bz2"), strip.white = TRUE)
Cleaning the data is challenging with this dataset as there are inconsistencies in labels due to data entry variations.
Before cleaning the data, we have 985 types of events.
Let's start by normalizing spaces and other whitespace characters, as well as enforcing a consistent case.
# Remove extraneous whitespace
data$EVTYPE <- gsub("\\s+", " ", as.character(data$EVTYPE))
data$EVTYPE <- gsub("^\\s+|\\s+$", "", data$EVTYPE)
# Normalize case
data$EVTYPE <- tolower(data$EVTYPE)
After normalizing whitespace and case, we have 883 types of events.
We would also like to collapse small variations of the same event, such as the many labels containing the word “hurricane”:
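The event-type counts quoted here are counts of distinct EVTYPE labels. A minimal sketch of how such a before/after count can be obtained, using a small made-up vector rather than the real dataset (the labels and the names `evtype`, `n_raw`, and `n_clean` are illustrative assumptions):

```r
# Toy labels standing in for data$EVTYPE (hypothetical values)
evtype <- c("TSTM WIND", "Tstm Wind", " TSTM  WIND", "HAIL", "Hail", "FLOOD")

# Number of distinct labels before any cleaning
n_raw <- length(unique(evtype))

# Collapse runs of whitespace, trim the ends, and lowercase
evtype_clean <- gsub("\\s+", " ", evtype)
evtype_clean <- gsub("^\\s+|\\s+$", "", evtype_clean)
evtype_clean <- tolower(evtype_clean)

# Number of distinct labels after cleaning
n_clean <- length(unique(evtype_clean))

print(c(raw = n_raw, clean = n_clean))
```

On the toy vector, six raw labels collapse to three once whitespace and case are normalized.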
unique(data$EVTYPE[grep("hurricane", data$EVTYPE)])
## [1] "hurricane opal/high winds" "hurricane erin"
## [3] "hurricane opal" "hurricane"
## [5] "hurricane-generated swells" "hurricane emily"
## [7] "hurricane gordon" "hurricane felix"
## [9] "hurricane edouard" "hurricane/typhoon"
We can eliminate the variations by replacing all EVTYPE strings containing the word with the word itself.
words_to_search <- c("hurricane", "tornado", "flood", "wind", "heat")
for (word in words_to_search) {
    print(paste("Replacing", length(unique(data[grep(word, data$EVTYPE), ]$EVTYPE)),
        "variations with", word, "."))
    data[grep(word, data$EVTYPE), ]$EVTYPE <- word
}
## [1] "Replacing 10 variations with hurricane ."
## [1] "Replacing 15 variations with tornado ."
## [1] "Replacing 97 variations with flood ."
## [1] "Replacing 200 variations with wind ."
## [1] "Replacing 14 variations with heat ."
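Because the keywords are applied one after another, a label that contains two keywords ends up under whichever keyword comes first in the list. The sketch below illustrates this on a few made-up, already lowercased labels (the vector `evtype` and the name `collapsed` are illustrative assumptions, not part of the analysis):

```r
# Toy labels (hypothetical, already lowercased)
evtype <- c("hurricane opal/high winds", "flash flood", "high wind",
            "excessive heat", "hail")

# Keywords are applied in this order
keywords <- c("hurricane", "tornado", "flood", "wind", "heat")

collapsed <- evtype
for (word in keywords) {
  # fixed = TRUE matches the keyword as a literal string, not a regex
  collapsed[grepl(word, collapsed, fixed = TRUE)] <- word
}

print(collapsed)
```

Here "hurricane opal/high winds" is relabeled "hurricane" in the first pass, so the later "wind" pass no longer matches it, and labels that contain no keyword (such as "hail") are left unchanged.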
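Now we must ensure the units and scale are consistent. Property and crop damage are each stored as a value (PROPDMG, CROPDMG) plus a magnitude code (PROPDMGEXP, CROPDMGEXP), where h, k, m, and b stand for hundreds, thousands, millions, and billions, so both quantities must be scaled.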
scaleValue <- function(value, scaler) {
    # A missing exponent is treated as zero damage
    if (is.na(scaler)) {
        return(0)
    }
    scaler <- tolower(scaler)
    # Chain the tests with else-if so that exactly one branch sets the result
    if (scaler == "h") {
        100 * value
    } else if (scaler == "k") {
        1000 * value
    } else if (scaler == "m") {
        1e+06 * value
    } else if (scaler == "b") {
        1e+09 * value
    } else {
        # Any other code leaves the value unscaled
        value
    }
}
data$property_damage <- mapply(data$PROPDMG, data$PROPDMGEXP, FUN = scaleValue)
data$crop_damage <- mapply(data$CROPDMG, data$CROPDMGEXP, FUN = scaleValue)
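Calling a function once per row with mapply can be slow on a dataset of this size. As an alternative, the same scaling can be sketched with a named lookup vector applied to whole columns at once. The names `multipliers` and `scale_damage` are hypothetical, and the sketch assumes the same conventions as above: a missing exponent counts as zero damage, and any unrecognized code leaves the value unscaled.

```r
# Multipliers for the exponent codes handled in the text
multipliers <- c(h = 100, k = 1e3, m = 1e6, b = 1e9)

scale_damage <- function(value, exponent) {
  code <- tolower(as.character(exponent))
  mult <- unname(multipliers[code])  # NA where the code is not in the table
  mult[is.na(mult)] <- 1             # unknown or empty code: leave value as-is
  result <- value * mult
  result[is.na(exponent)] <- 0       # missing exponent: count as zero damage
  result
}

# Toy values and exponent codes (hypothetical)
scaled <- scale_damage(c(25, 2.5, 10, 3), c("K", "m", "", NA))
print(scaled)
```

On the real data this would be called as, for example, scale_damage(data$PROPDMG, data$PROPDMGEXP).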
We can then aggregate to find totals for fatalities, injuries, property damage, and crop damage.
agg_data <- aggregate(cbind(FATALITIES, INJURIES, property_damage, crop_damage) ~
EVTYPE, data, FUN = sum)
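To make the aggregation step concrete, here is a minimal sketch on a toy data frame (the values and the names `toy` and `totals` are made up for illustration). It uses the same formula interface: the columns inside cbind() are summed within each EVTYPE group.

```r
# Toy records (hypothetical values)
toy <- data.frame(
  EVTYPE     = c("flood", "flood", "tornado", "heat"),
  FATALITIES = c(1, 2, 5, 3),
  INJURIES   = c(0, 4, 10, 1)
)

# One row per event type, with per-column totals
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, toy, FUN = sum)
print(totals)
```

Note that the formula interface of aggregate drops rows with an NA in any of the listed columns by default, which matters if any of the totalled columns contain missing values.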
An event type with no injuries, fatalities, property damage, or crop damage is not interesting for this analysis, so we drop it.
interesting_data <- agg_data[agg_data$property_damage > 0 | agg_data$INJURIES >
0 | agg_data$FATALITIES > 0 | agg_data$crop_damage > 0, ]
top_n <- 10
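The filtering step, followed by a top-n selection on a single column, can be sketched on toy data as follows (the data frame `agg` and the names `keep` and `top` are illustrative assumptions). The sketch uses head() for the selection, which simply returns fewer rows when fewer than n are available, whereas indexing with 1:n would produce NA rows in that case.

```r
# Toy aggregated totals (hypothetical values)
agg <- data.frame(
  EVTYPE     = c("flood", "heat", "tornado", "fog"),
  FATALITIES = c(3, 3, 5, 0),
  INJURIES   = c(4, 1, 10, 0)
)

# Keep only event types with at least one non-zero total
keep <- agg[agg$FATALITIES > 0 | agg$INJURIES > 0, ]

# Take the n rows with the most fatalities
n <- 2
top <- head(keep[order(-keep$FATALITIES), ], n)
print(top)
```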
The cost to population health of storms can be measured with injuries and fatalities.
top_n_fatality <- interesting_data[order(-interesting_data$FATALITIES)[1:top_n],
]
# dotchart(top_n_fatality$FATALITIES, top_n_fatality$EVTYPE)
chart_matrix <- as.matrix(rbind(top_n_fatality$FATALITIES, top_n_fatality$INJURIES))
colnames(chart_matrix) <- top_n_fatality$EVTYPE
rownames(chart_matrix) <- c("fatalities", "injuries")
dotchart(chart_matrix, col = rainbow(2), xlab = "incidents", main = paste("Top",
top_n, "event types in terms of human health costs"))
legend("bottomright", c("fatalities", "injuries"), fill = rainbow(2))
From this, we can see that tornadoes have the highest cost to human health followed by heat and flash flooding.
The economic cost can be measured in the US dollar cost of the damage to property and crops.
top_n_crop <- interesting_data[order(-(interesting_data$property_damage + interesting_data$crop_damage))[1:top_n],
]
chart_matrix <- as.matrix(rbind(top_n_crop$crop_damage, top_n_crop$property_damage))
colnames(chart_matrix) <- top_n_crop$EVTYPE
rownames(chart_matrix) <- c("crop", "property")
dotchart(chart_matrix, col = rainbow(2), xlab = "Damage in USD", main = paste("Top",
top_n, "event types in terms of economic costs"))
legend("bottomright", c("crop damage", "property damage"), fill = rainbow(2))
From this, we can see that floods have the highest economic cost followed by hurricanes and storm surges.
Tornadoes appear to have the highest cost to human life and health, and floods have the highest economic cost. One thing to keep in mind is that both kinds of event are concentrated in specific geographic regions, so a realistic risk model should take into account the location being considered.
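As a rough illustration of a location-aware breakdown, the sketch below totals fatalities per state and event type on toy data, then picks the deadliest event type in each state. It assumes a STATE column alongside EVTYPE; the values and the names `toy`, `by_state`, and `worst` are hypothetical, and this is only a starting point, not a risk model.

```r
# Toy records (hypothetical values; STATE column is an assumption)
toy <- data.frame(
  STATE      = c("TX", "TX", "FL", "FL", "FL"),
  EVTYPE     = c("tornado", "flood", "hurricane", "hurricane", "flood"),
  FATALITIES = c(4, 1, 6, 2, 1)
)

# Total fatalities for every state / event type combination
by_state <- aggregate(FATALITIES ~ STATE + EVTYPE, toy, FUN = sum)

# Within each state, keep the event type with the most fatalities
worst <- do.call(rbind, lapply(split(by_state, by_state$STATE), function(d) {
  d[which.max(d$FATALITIES), ]
}))
print(worst)
```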