Synopsis

This document examines the effects of weather events on public health and economic wellbeing. There are interesting differences between (i) injuries and fatalities and (ii) property and crop damages. In addition, (iii) the significance of a given type of weather event depends on whether you consider the overall, cumulative effect of that type of event or the average effect of a typical instance of it. For example, tornados kill more people than tsunamis in the long run, but the average tsunami is deadlier than the average tornado.

Note that these findings are driven in part by decisions made in this analysis about grouping otherwise separate categories of weather events (e.g., associating “glaze”, which is a specific type of icing event, with the broader category of ice-related weather events), and they are subject to change given different grouping decisions.

Data Processing

We read the compressed dataset directly into R as a dataframe; read.table decompresses the .bz2 file as it reads.

dataFull <- read.table("repdata%2Fdata%2FStormData.csv.bz2",
                       header = TRUE, sep = ",")

For help interpreting the data, you can visit the National Weather Service Storm Data Documentation and the National Climatic Data Center Storm Events FAQ.

In our analysis, we are only interested in determining the effects of different weather events on (i) population health and (ii) economic indicators. As such, we simplify things by isolating the relevant variables in a smaller dataframe that is easier to manipulate.

Additionally, NOAA only started recording data for all weather event types at the start of 1996, as documented here. As such, we can further simplify our dataframe by including only those observations recorded in or after 1996. Before doing so, we convert the “beginning date” variable to the Date class.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
data <- select(dataFull,
               EVTYPE, BGN_DATE, FATALITIES, INJURIES,
               PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

data$BGN_DATE <- as.Date(data$BGN_DATE, format = "%m/%d/%Y")
data <- filter(data, BGN_DATE >= as.Date("1996-01-01"))
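
As a quick check of the date handling, here is a minimal sketch in base R. The sample strings are made up and only assume that the raw values look like a month/day/year date followed by a time; as.Date() stops parsing once the format is consumed, so the trailing time is ignored.

```r
# Made-up sample values; the "m/d/Y H:M:S" layout is an assumption.
raw <- c("4/18/1950 0:00:00", "1/1/1996 0:00:00", "12/31/2011 0:00:00")

# Parse only the date part; characters after the matched format are ignored.
parsed <- as.Date(raw, format = "%m/%d/%Y")

# Compare against an explicit Date rather than a bare string.
keep <- parsed >= as.Date("1996-01-01")

print(data.frame(raw, parsed, keep))
```

Only the second and third rows would survive the 1996 cutoff.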

We have to clean a few variables. The “exponent” variables associated with the economic costs of weather events are full of troublesome values, as described in this document. We convert all of those values to characters, replace those characters with the correct numeric value (e.g., ‘B’ and ‘b’ stand for ‘billions’ and are given a value of 9), and convert those variables to the numeric class.

The “event type” variable also needs a great deal of cleaning. To do so, we convert all the strings in this variable to uppercase, remove leading and trailing whitespace, attempt to correct the most common misspellings (e.g., “strom” to “storm”), and collect closely related observations (e.g., “heavy snow” versus “snow accumulation”). Note that, because the groups overlap, the order in which you make these changes will affect the final classification of event types (e.g., strong marine winds may be classified with winds, generally). Depending on your purposes, you may want to group sub-categories of weather event types differently.
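
To make the order dependence concrete, here is a small sketch. The helper apply_rules() and the sample labels are hypothetical and exist only for illustration; the helper mimics a sequence of grepl() assignments applied one after another.

```r
# Hypothetical helper: apply (pattern -> label) rules in the order given.
apply_rules <- function(x, rules) {
  for (pattern in names(rules)) {
    x[grepl(pattern, x, fixed = TRUE)] <- rules[[pattern]]
  }
  x
}

# Made-up labels; the second one matches both rules below.
events <- c("MARINE STRONG WIND", "EXTREME WIND CHILL", "WINTER STORM")

# The same two rules, applied in opposite orders.
wind_first  <- c(WIND = "WIND", CHILL = "COLD")
chill_first <- c(CHILL = "COLD", WIND = "WIND")

order_a <- apply_rules(events, wind_first)   # "WIND" "WIND" "WINTER STORM"
order_b <- apply_rules(events, chill_first)  # "WIND" "COLD" "WINTER STORM"
```

Once a label has been overwritten by the first matching rule, later rules no longer see the original text, so a label containing both “WIND” and “CHILL” ends up in whichever group is handled first.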

# Clean EXPONENTS

table(data$PROPDMGEXP)          # Table of PROP. EXPONENT values before cleaning
## 
##             -      ?      +      0      1      2      3      4      5 
## 276185      0      0      0      1      0      0      0      0      0 
##      6      7      8      B      h      H      K      m      M 
##      0      0      0     32      0      0 369938      0   7374
table(data$CROPDMGEXP)          # Table of CROP EXPONENT values before cleaning
## 
##             ?      0      2      B      k      K      m      M 
## 373069      0      0      0      4      0 278686      0   1771
data$PROPDMGEXP <- as.character(data$PROPDMGEXP)
data$CROPDMGEXP <- as.character(data$CROPDMGEXP)

data$PROPDMGEXP[data$PROPDMGEXP == 'B' | data$PROPDMGEXP == 'b'] <- '9'
data$PROPDMGEXP[data$PROPDMGEXP == 'M' | data$PROPDMGEXP == 'm'] <- '6'
data$PROPDMGEXP[data$PROPDMGEXP == 'K' | data$PROPDMGEXP == 'k'] <- '3'
data$PROPDMGEXP[data$PROPDMGEXP == 'H' | data$PROPDMGEXP == 'h'] <- '2'
data$PROPDMGEXP[data$PROPDMGEXP == '+'] <- '1'
data$PROPDMGEXP[data$PROPDMGEXP == '?' | data$PROPDMGEXP == '-' |
                  data$PROPDMGEXP == ''] <- '0'

data$CROPDMGEXP[data$CROPDMGEXP == 'B' | data$CROPDMGEXP == 'b'] <- '9'
data$CROPDMGEXP[data$CROPDMGEXP == 'M' | data$CROPDMGEXP == 'm'] <- '6'
data$CROPDMGEXP[data$CROPDMGEXP == 'K' | data$CROPDMGEXP == 'k'] <- '3'
data$CROPDMGEXP[data$CROPDMGEXP == 'H' | data$CROPDMGEXP == 'h'] <- '2'
data$CROPDMGEXP[data$CROPDMGEXP == '+'] <- '1'
data$CROPDMGEXP[data$CROPDMGEXP == '?' | data$CROPDMGEXP == '-' |
                  data$CROPDMGEXP == ''] <- '0'

data$PROPDMGEXP <- as.numeric(data$PROPDMGEXP)
data$CROPDMGEXP <- as.numeric(data$CROPDMGEXP)

table(data$PROPDMGEXP)          # Table of PROP. EXPONENT values after cleaning
## 
##      0      3      6      9 
## 276186 369938   7374     32
table(data$CROPDMGEXP)          # Table of CROP EXPONENT values after cleaning
## 
##      0      3      6      9 
## 373069 278686   1771      4
# Clean EVENT TYPES

length(unique(data$EVTYPE))     # Number of unique event types before cleaning
## [1] 516
data$EVTYPE <- toupper(data$EVTYPE)

trim <- function (x) gsub("^\\s+|\\s+$", "", x)
data$EVTYPE <- trim(data$EVTYPE)

data$EVTYPE[grepl("DUST", data$EVTYPE)] <- "DUST"
data$EVTYPE[grepl("HEAT", data$EVTYPE)] <- "HEAT"
data$EVTYPE[grepl("HOT", data$EVTYPE)] <- "HEAT"
data$EVTYPE[grepl("WARM", data$EVTYPE)] <- "HEAT"
data$EVTYPE[grepl("WIND", data$EVTYPE)] <- "WIND"
data$EVTYPE[grepl("WND", data$EVTYPE)] <- "WIND"
data$EVTYPE[grepl("ICE", data$EVTYPE)] <- "ICE"
data$EVTYPE[grepl("ICY", data$EVTYPE)] <- "ICE"
data$EVTYPE[grepl("GLAZE", data$EVTYPE)] <- "ICE"
data$EVTYPE[grepl("HAIL", data$EVTYPE)] <- "HAIL"
data$EVTYPE[grepl("TORNADO", data$EVTYPE)] <- "TORNADO"
data$EVTYPE[grepl("SLEET", data$EVTYPE)] <- "SLEET"
data$EVTYPE[grepl("WINTRY", data$EVTYPE)] <- "SLEET"
data$EVTYPE[grepl("MIX", data$EVTYPE)] <- "SLEET"
data$EVTYPE[grepl("FOG", data$EVTYPE)] <- "FOG"
data$EVTYPE[grepl("TIDE", data$EVTYPE)] <- "TIDE"
data$EVTYPE[grepl("FLOOD", data$EVTYPE)] <- "FLOOD"
data$EVTYPE[grepl("VOLC", data$EVTYPE)] <- "VOLCANO"
data$EVTYPE[grepl("CHILL", data$EVTYPE)] <- "COLD"
data$EVTYPE[grepl("COLD", data$EVTYPE)] <- "COLD"
data$EVTYPE[grepl("HYPO", data$EVTYPE)] <- "COLD"
data$EVTYPE[grepl("FROST", data$EVTYPE)] <- "COLD"
data$EVTYPE[grepl("FREEZ", data$EVTYPE)] <- "COLD"
data$EVTYPE[grepl("LOW TEMP", data$EVTYPE)] <- "COLD"
data$EVTYPE[grepl("SURF", data$EVTYPE)] <- "SURF"
data$EVTYPE[grepl("DROUGHT", data$EVTYPE)] <- "DROUGHT"
data$EVTYPE[grepl("LOW RAINFALL", data$EVTYPE)] <- "DROUGHT"
data$EVTYPE[grepl("RAIN", data$EVTYPE)] <- "RAIN"
data$EVTYPE[grepl("SNOW", data$EVTYPE)] <- "SNOW"
data$EVTYPE[grepl("DRY", data$EVTYPE)] <- "DRYNESS"
data$EVTYPE[grepl("HURRIC", data$EVTYPE)] <- "HURRICANE"
data$EVTYPE[grepl("TYPHOO", data$EVTYPE)] <- "HURRICANE"

length(unique(data$EVTYPE))     # Number of unique event types after cleaning
## [1] 171
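
The exponent recoding could also be written once and reused for both columns. The sketch below is an alternative, not the code used in this analysis: exp_lookup and clean_exponent() are hypothetical names, and the mapping simply mirrors the replacements made above (digits are kept as they are).

```r
# Hypothetical lookup table mirroring the replacements made above.
exp_lookup <- c(B = 9, M = 6, K = 3, H = 2, "+" = 1, "?" = 0, "-" = 0)

clean_exponent <- function(x) {
  x <- toupper(as.character(x))
  out <- unname(exp_lookup[x])           # NA for anything not in the table
  digits <- grepl("^[0-9]$", x)
  out[digits] <- as.numeric(x[digits])   # single digits keep their value
  out[x == ""] <- 0                      # blank means no exponent
  out
}

cleaned <- clean_exponent(c("K", "m", "B", "", "0", "5", "?"))
print(cleaned)
```

Any value that is neither in the table, a single digit, nor blank comes back as NA, which makes unexpected codes easy to spot.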

We then add variables to our dataframe that show the total casualties (i.e., fatalities plus injuries), as well as the total property and crop damage caused by weather events.

data <- mutate(data, CASUALTIES = FATALITIES + INJURIES)
data <- mutate(data, PROPDMG_TOT = PROPDMG * 10^PROPDMGEXP)
data <- mutate(data, CROPDMG_TOT = CROPDMG * 10^CROPDMGEXP)
data <- mutate(data, COST_TOT = PROPDMG_TOT + CROPDMG_TOT)
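
As a worked example of the damage arithmetic, the sketch below applies the same formula (amount times ten to the power of the exponent) to a few made-up rows. The values are illustrative only and are not taken from the NOAA data.

```r
# Made-up rows for illustration; not taken from the NOAA dataset.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2),
  PROPDMGEXP = c(3, 6, 9)       # thousands, millions, billions
)

# Same formula as above: amount * 10^exponent gives dollars.
toy$PROPDMG_TOT <- toy$PROPDMG * 10^toy$PROPDMGEXP

print(toy)
```

Here 25 with exponent 3 becomes 25,000 dollars, 2.5 with exponent 6 becomes 2.5 million, and 1.2 with exponent 9 becomes 1.2 billion. An exponent of 0 leaves the recorded amount unchanged.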

Now that we have read in, cleaned, and simplified the data for easy manipulation, we can start our analysis.

Results

There are many ways to get an idea of the health and economic effects of weather events. We can look at the total casualties and costs over a given time period to determine the overall significance of a specific type of weather event. Alternatively, we can look at the mean casualties and costs per recorded event over that period to determine the significance of a typical instance of that type of event.
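
The difference between the two views can be seen with a toy example. The data below are invented: one event type occurs often with few fatalities each time, the other occurs once with many.

```r
# Invented data: a frequent, mild event type and a rare, severe one.
toy <- data.frame(
  EVTYPE     = c(rep("FREQUENT", 4), "RARE"),
  FATALITIES = c(2, 0, 3, 1, 5)
)

totals <- tapply(toy$FATALITIES, toy$EVTYPE, sum)   # cumulative effect
means  <- tapply(toy$FATALITIES, toy$EVTYPE, mean)  # effect per event

print(totals)
print(means)
```

The frequent type has the larger total (6 versus 5), while the rare type has the larger mean (5 versus 1.5), so the ranking flips depending on which summary is used.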

We start by looking at the cumulative effects of weather events on public health and economic wellbeing.

library(ggplot2)
library(cowplot)
## Warning: package 'cowplot' was built under R version 3.3.3
## 
## Attaching package: 'cowplot'
## The following object is masked from 'package:ggplot2':
## 
##     ggsave
dataCasual <- data %>%
  group_by(EVTYPE) %>%
  summarize(casualties_tot = sum(CASUALTIES)) %>%
  filter(rank(desc(casualties_tot)) <= 5) %>%
  arrange(desc(casualties_tot))

dataInjur <- data %>%
  group_by(EVTYPE) %>%
  summarize(injuries_tot = sum(INJURIES)) %>%
  filter(rank(desc(injuries_tot)) <= 5) %>%
  arrange(desc(injuries_tot))

dataFatal <- data %>%
  group_by(EVTYPE) %>%
  summarize(fatalities_tot = sum(FATALITIES)) %>%
  filter(rank(desc(fatalities_tot)) <= 5) %>%
  arrange(desc(fatalities_tot))

dataCost <- data %>%
  group_by(EVTYPE) %>%
  summarize(cost_tot = sum(COST_TOT)) %>%
  filter(rank(desc(cost_tot)) <= 5) %>%
  arrange(desc(cost_tot))

dataProp <- data %>%
  group_by(EVTYPE) %>%
  summarize(prop_tot = sum(PROPDMG_TOT)) %>%
  filter(rank(desc(prop_tot)) <= 5) %>%
  arrange(desc(prop_tot))

dataCrop <- data %>%
  group_by(EVTYPE) %>%
  summarize(crop_tot = sum(CROPDMG_TOT)) %>%
  filter(rank(desc(crop_tot)) <= 5) %>%
  arrange(desc(crop_tot))


plotCasual <- dataCasual %>%
  ggplot(aes(reorder(EVTYPE, -casualties_tot), casualties_tot)) + 
  geom_bar(stat="identity", fill = "dodgerblue4") +
  theme(axis.text.x = element_text(size = 7, angle = 45, vjust = 0.5),
        axis.text.y = element_text(size = 7), 
        axis.title.y = element_text(size = 7), 
        axis.title.x = element_text(size = 7),
        plot.title = element_text(size=9)) +
  labs(x = "Weather event type", y = "Total casualties", 
       title = "Total CASUALTIES, 1996-2011")

plotFatal <- dataFatal %>%
  ggplot(aes(reorder(EVTYPE, -fatalities_tot), fatalities_tot)) + 
  geom_bar(stat="identity", fill = "dodgerblue3") +
  theme(axis.text.x = element_text(size = 7, angle = 45, vjust = 0.5),
        axis.text.y = element_text(size = 7), 
        axis.title.y = element_text(size = 7), 
        axis.title.x = element_text(size = 7),
        plot.title = element_text(size=9)) +
  labs(x = "Weather event type", y = "Total fatalities", 
       title = "Total FATALITIES, 1996-2011")

plotInjur <- dataInjur %>%
  ggplot(aes(reorder(EVTYPE, -injuries_tot), injuries_tot)) + 
  geom_bar(stat="identity", fill = "dodgerblue") +
  theme(axis.text.x = element_text(size = 7, angle = 45, vjust = 0.5),
        axis.text.y = element_text(size = 7), 
        axis.title.y = element_text(size = 7), 
        axis.title.x = element_text(size = 7),
        plot.title = element_text(size=9)) +
  labs(x = "Weather event type", y = "Total injuries", 
       title = "Total INJURIES, 1996-2011")

plotCost <- dataCost %>%
  ggplot(aes(reorder(EVTYPE, -cost_tot), cost_tot)) + 
  geom_bar(stat="identity", fill = "springgreen4") +
  theme(axis.text.x = element_text(size = 7, angle = 45, vjust = 0.5),
        axis.text.y = element_text(size = 7), 
        axis.title.y = element_text(size = 7), 
        axis.title.x = element_text(size = 7),
        plot.title = element_text(size=9)) +
  labs(x = "Weather event type", y = "Total damages", 
       title = "Total DAMAGES, 1996-2011")

plotProp <- dataProp %>%
  ggplot(aes(reorder(EVTYPE, -prop_tot), prop_tot)) + 
  geom_bar(stat="identity", fill = "springgreen2") +
  theme(axis.text.x = element_text(size = 7, angle = 45, vjust = 0.5),
        axis.text.y = element_text(size = 7), 
        axis.title.y = element_text(size = 7), 
        axis.title.x = element_text(size = 7),
        plot.title = element_text(size=9)) +
  labs(x = "Weather event type", y = "Total prop. damages", 
       title = "Total PROP. damages, 1996-2011")

plotCrop <- dataCrop %>%
  ggplot(aes(reorder(EVTYPE, -crop_tot), crop_tot)) + 
  geom_bar(stat="identity", fill = "springgreen") +
  theme(axis.text.x = element_text(size = 7, angle = 45, vjust=0.5, hjust=0.5),
        axis.text.y = element_text(size = 7), 
        axis.title.y = element_text(size = 7), 
        axis.title.x = element_text(size = 7),
        plot.title = element_text(size=9)) +
  labs(x = "Weather event type", y = "Total crop damages", 
       title = "Total CROP damages, 1996-2011")


plot_grid(plotCasual, plotFatal, plotInjur, 
          plotCost, plotProp, plotCrop, ncol = 3)

Looking at health totals for 1996 through 2011, we see that tornados have caused by far the most casualties (i.e., injuries and fatalities) of any weather event. Heat-related events, however, are responsible for more overall fatalities, so most of the casualties associated with tornados are injuries.

Looking at economic totals for 1996 through 2011, we see that floods caused by far the most overall damage, but they come in a distant second to drought when looking specifically at crop damage. Drought, however, does not cause much property damage, and so does not break into the top five weather-related causes of overall damage.
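
The six top-five summaries follow the same pattern, so they could be produced by one small function. The sketch below uses base R only; top_events() is a hypothetical helper and is shown on invented data. Unlike the rank() filter used in this analysis, it breaks ties by simply taking the first n rows after sorting.

```r
# Hypothetical helper: summarise one column by EVTYPE and keep the top n.
top_events <- function(df, value_col, fun = sum, n = 5) {
  stat <- tapply(df[[value_col]], df$EVTYPE, fun)
  out <- data.frame(EVTYPE = names(stat), value = as.vector(stat),
                    stringsAsFactors = FALSE)
  out <- out[order(out$value, decreasing = TRUE), ]
  rownames(out) <- NULL
  head(out, n)
}

# Invented data for illustration.
toy <- data.frame(
  EVTYPE   = c("A", "A", "B", "C", "C", "C"),
  INJURIES = c(1, 2, 10, 0, 1, 1),
  stringsAsFactors = FALSE
)

top_total <- top_events(toy, "INJURIES", sum, n = 2)
top_mean  <- top_events(toy, "INJURIES", mean, n = 2)
print(top_total)
print(top_mean)
```

Passing sum or mean as the fun argument switches between the cumulative and the per-event view without duplicating the pipeline.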

Next we look at the mean effects of weather events on public health and economic wellbeing.

dataCasualM <- data %>%
  group_by(EVTYPE) %>%
  summarize(casualties_mean = mean(CASUALTIES)) %>%
  filter(rank(desc(casualties_mean)) <= 5) %>%
  arrange(desc(casualties_mean))

dataInjurM <- data %>%
  group_by(EVTYPE) %>%
  summarize(injuries_mean = mean(INJURIES)) %>%
  filter(rank(desc(injuries_mean)) <= 5) %>%
  arrange(desc(injuries_mean))

dataFatalM <- data %>%
  group_by(EVTYPE) %>%
  summarize(fatalities_mean = mean(FATALITIES)) %>%
  filter(rank(desc(fatalities_mean)) <= 5) %>%
  arrange(desc(fatalities_mean))

dataCostM <- data %>%
  group_by(EVTYPE) %>%
  summarize(cost_mean = mean(COST_TOT)) %>%
  filter(rank(desc(cost_mean)) <= 5) %>%
  arrange(desc(cost_mean))

dataPropM <- data %>%
  group_by(EVTYPE) %>%
  summarize(prop_mean = mean(PROPDMG_TOT)) %>%
  filter(rank(desc(prop_mean)) <= 5) %>%
  arrange(desc(prop_mean))

dataCropM <- data %>%
  group_by(EVTYPE) %>%
  summarize(crop_mean = mean(CROPDMG_TOT)) %>%
  filter(rank(desc(crop_mean)) <= 5) %>%
  arrange(desc(crop_mean))


plotCasualM <- dataCasualM %>%
  ggplot(aes(reorder(EVTYPE, -casualties_mean), casualties_mean)) + 
  geom_bar(stat="identity", fill = "dodgerblue4") +
  theme(axis.text.x = element_text(size = 7, angle = 45, vjust = 0.5),
        axis.text.y = element_text(size = 7), 
        axis.title.y = element_text(size = 7), 
        axis.title.x = element_text(size = 7),
        plot.title = element_text(size=9)) +
  labs(x = "Weather event type", y = "Mean casualties", 
       title = "Mean CASUALTIES, 1996-2011")

plotFatalM <- dataFatalM %>%
  ggplot(aes(reorder(EVTYPE, -fatalities_mean), fatalities_mean)) + 
  geom_bar(stat="identity", fill = "dodgerblue3") +
  theme(axis.text.x = element_text(size = 7, angle = 45, vjust = 0.5),
        axis.text.y = element_text(size = 7), 
        axis.title.y = element_text(size = 7), 
        axis.title.x = element_text(size = 7),
        plot.title = element_text(size=9)) +
  labs(x = "Weather event type", y = "Mean fatalities", 
       title = "Mean FATALITIES, 1996-2011")

plotInjurM <- dataInjurM %>%
  ggplot(aes(reorder(EVTYPE, -injuries_mean), injuries_mean)) + 
  geom_bar(stat="identity", fill = "dodgerblue") +
  theme(axis.text.x = element_text(size = 7, angle = 45, vjust = 0.5),
        axis.text.y = element_text(size = 7), 
        axis.title.y = element_text(size = 7), 
        axis.title.x = element_text(size = 7),
        plot.title = element_text(size=9)) +
  labs(x = "Weather event type", y = "Mean injuries", 
       title = "Mean INJURIES, 1996-2011")

plotCostM <- dataCostM %>%
  ggplot(aes(reorder(EVTYPE, -cost_mean), cost_mean)) + 
  geom_bar(stat="identity", fill = "springgreen4") +
  theme(axis.text.x = element_text(size = 7, angle = 45, vjust = 0.5),
        axis.text.y = element_text(size = 7), 
        axis.title.y = element_text(size = 7), 
        axis.title.x = element_text(size = 7),
        plot.title = element_text(size=9)) +
  labs(x = "Weather event type", y = "Mean damages", 
       title = "Mean DAMAGES, 1996-2011")

plotPropM <- dataPropM %>%
  ggplot(aes(reorder(EVTYPE, -prop_mean), prop_mean)) + 
  geom_bar(stat="identity", fill = "springgreen2") +
  theme(axis.text.x = element_text(size = 7, angle = 45, vjust = 0.5),
        axis.text.y = element_text(size = 7), 
        axis.title.y = element_text(size = 7), 
        axis.title.x = element_text(size = 7),
        plot.title = element_text(size=9)) +
  labs(x = "Weather event type", y = "Mean prop. damages", 
       title = "Mean PROP. damages, 1996-2011")

plotCropM <- dataCropM %>%
  ggplot(aes(reorder(EVTYPE, -crop_mean), crop_mean)) + 
  geom_bar(stat="identity", fill = "springgreen") +
  theme(axis.text.x = element_text(size = 7, angle = 45, vjust = 0.5),
        axis.text.y = element_text(size = 7), 
        axis.title.y = element_text(size = 7), 
        axis.title.x = element_text(size = 7),
        plot.title = element_text(size=9)) +
  labs(x = "Weather event type", y = "Mean crop damages", 
       title = "Mean CROP damages, 1996-2011")

  
plot_grid(plotCasualM, plotFatalM, plotInjurM, 
          plotCostM, plotPropM, plotCropM, ncol = 3)

Looking at health means for 1996 through 2011, we see that, even though tornados have caused by far the most casualties of any weather event in total, the average tsunami, hurricane, marine accident, etc. causes more casualties than the average tornado. This pattern holds when looking at fatalities and injuries specifically, as well as at casualties overall.

A similar phenomenon holds when examining the mean economic impact of weather events for 1996 through 2011. While floods caused by far the most cumulative damage during that time, the average hurricane, storm surge, tropical storm, etc. caused more damage than the average flood. When focusing specifically on crops, the average hurricane remains the most damaging weather event, with the average drought a distant second.