The following analysis examines different weather event types and their effects on population health and economic damage. Over 900 weather event types are recorded in this dataset; this report focuses only on the 15 most frequent, which together account for over 90% of all recorded weather events. The outcome of interest for population health is a composite of weighted fatalities and injuries, with fatalities arbitrarily weighted 10 times as heavily as injuries. The outcome of interest for economic damage is a composite of property damage and crop damage summed together.
After completing the analysis, we see that tornadoes are associated with the highest average adverse population health outcome per event, whereas floods are clearly associated with the highest average economic damage per event.
The raw data is downloaded from the web address below and saved to the working directory:
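library(dplyr)
library(ggplot2)
# The file is bzip2-compressed; mode = "wb" keeps it intact on Windows, and
# read.csv() decompresses it transparently when reading
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
              destfile = "weatherdata.csv.bz2", mode = "wb")
df <- read.csv("weatherdata.csv.bz2")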
uniqueevents <- length(unique(df$EVTYPE))
#Group by event type and calculate the total count of each, arrange in desc order
dftemp <- df %>% group_by(EVTYPE) %>% summarize(count = n()) %>% arrange(desc(count))
totaltop15 <- sum(dftemp[1:15, ]$count)
percenttop15 <- totaltop15 *100 / nrow(df)
There are 902,297 observations and 985 distinct event types in the dataset. The figure below shows a bar plot of the event types that occur more than 100 times, arranged in descending order of frequency. Although there are many distinct event types, a small subset accounts for the overwhelming majority of events: the 15 most frequent event types account for approximately 93.89% of all recorded events. This analysis therefore focuses only on these 15 event types.
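The top-15 share above is the sum of the 15 largest frequency counts divided by the total number of rows. A minimal base-R sketch of that arithmetic on hypothetical toy labels (not the storm data; `top_n_share` is an illustrative helper, not part of the analysis):

```r
# Hypothetical toy labels standing in for df$EVTYPE
events <- c(rep("HAIL", 5), rep("TSTM WIND", 3), "TORNADO", "FOG")

# Frequency of each label, most frequent first
counts <- sort(table(events), decreasing = TRUE)

# Percentage of all observations covered by the n most frequent labels
top_n_share <- function(counts, n) {
  100 * sum(counts[seq_len(n)]) / sum(counts)
}

top_n_share(counts, 2)  # 80: (5 + 3) / 10 observations
```

The dplyr pipeline in the analysis computes the same quantity, with `n = 15` and the full event column.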
#Filter count > 100 to make axis readable
ggplot(data = dftemp %>% filter(count > 100) , aes(x = reorder(EVTYPE, -count), y = count)) +
geom_bar(stat = "identity") +
xlab("Weather Event Types") +
ylab("Frequency of Event") +
labs(title = "Frequency of Different Weather Event Types") +
theme_minimal() +
theme(axis.ticks.x = element_blank(), axis.text.x = element_blank(),
panel.background = element_rect(fill = "white"))
#Save the names of the EVTYPES into a vector
top15ev <- dftemp[1:15, ]$EVTYPE
#Filter for EVTYPE in that vector
df <- df %>% filter(EVTYPE %in% top15ev)
# Per-event summaries over the top-15 subset
meanfat <- mean(df$FATALITIES, na.rm = TRUE)
medianfat <- median(df$FATALITIES, na.rm = TRUE)
meaninj <- mean(df$INJURIES, na.rm = TRUE)
medianinj <- median(df$INJURIES, na.rm = TRUE)
# Same summaries, restricted to events with more than one fatality or more than one injury
dftemp <- df %>%
  filter(FATALITIES > 1 | INJURIES > 1) %>%
  summarize(meanfat = mean(FATALITIES, na.rm = TRUE),
            medianfat = median(FATALITIES, na.rm = TRUE),
            meaninj = mean(INJURIES, na.rm = TRUE),
            medianinj = median(INJURIES, na.rm = TRUE))
We now turn to the outcome data. This report focuses on two outcome measures: impacts on population health and economic consequences. For population health, the two reported outcomes of interest are fatalities and injuries. Among the top-15 event types, the mean number of fatalities per event is 0.01 and the median is 0; the mean number of injuries per event is 0.14 and the median is 0. The most common outcome for an event is therefore no fatalities and no injuries. Restricting to events with more than one fatality or more than one injury, the mean and median fatalities are 0.78 and 0 respectively, whereas the mean and median injuries are 12.27 and 3 respectively.
To estimate the effect of weather events on population health, we use a composite outcome of fatalities and injuries. Fatalities are far less frequent than injuries, but they are more severe and should therefore be weighted more heavily. The composite outcome is the number of fatalities multiplied by 10, plus the number of injuries (healthoutcome = 10 × FATALITIES + INJURIES).
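The gap between the means and medians above is typical of zero-inflated counts: most events record zero, and a few large values pull the mean upward. A small illustration with hypothetical fatality counts (not taken from the dataset), including a conditional mean that mirrors the `> 1` threshold used in the summary code:

```r
# Hypothetical fatality counts for ten events: most events cause none
fatalities <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 5)

mean(fatalities)    # 0.5: a single large value pulls the mean above zero
median(fatalities)  # 0: the typical event has no fatalities

# Conditioning on events with more than one fatality
mean(fatalities[fatalities > 1])  # 5
```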
#Creating the health outcome
fatweight <- 10
df <- df %>% mutate(healthoutcome = FATALITIES * fatweight + INJURIES)
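Because the factor of 10 is an arbitrary choice, it can change which events rank as more harmful. A sketch of the same formula with the weight exposed as a parameter, applied to two hypothetical events (`composite_health` is an illustrative name, not part of the analysis):

```r
# Same formula as healthoutcome above, with the fatality weight as a parameter
composite_health <- function(fatalities, injuries, weight = 10) {
  fatalities * weight + injuries
}

# Hypothetical events: A has 1 fatality, B has 5 injuries
fat <- c(A = 1, B = 0)
inj <- c(A = 0, B = 5)

composite_health(fat, inj)              # A = 10, B = 5: A ranks as more harmful
composite_health(fat, inj, weight = 1)  # A = 1,  B = 5: B ranks as more harmful
```

Re-running the event-type comparison with a few alternative weights would indicate how sensitive the ranking is to this choice.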
For the economic outcome, the data report property damage (PROPDMG) and crop damage (CROPDMG). Each has an associated variable (PROPDMGEXP, CROPDMGEXP) that gives the magnitude of the corresponding damage value (thousands, millions, or billions). The data dictionary specifies only “K”, “M”, and “B” as valid magnitude identifiers. PROPDMGEXP and CROPDMGEXP also contain other entries: some are lowercase versions, such as “k” and “m”, while others are digits, symbols (“+”, “-”, “?”), or “H”. The lowercase entries were likely data-entry errors for their uppercase counterparts and are treated as such. It is not possible to determine with any certainty what the digit and symbol entries were intended to mean, so these, along with “H” and blank entries, are treated as NA values. Tables for PROPDMGEXP and CROPDMGEXP are shown below:
table(df$PROPDMGEXP)
## 
##             -      ?      +      0      1      2      3      4      5      6 
## 441623      1      8      2    209     23     13      4      2     25      4 
##      7      8      B      H      K      m      M 
##      5      1     12      6 394974      6  10214 
table(df$CROPDMGEXP)
## 
##             ?      0      2      k      K      M 
## 588163      4     17      1     19 257401   1527 
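The normalization rule described above can also be sketched as a lookup table rather than a chain of conditions. This is an illustrative alternative, not the code the analysis uses; `multiplier` and `to_dollars` are hypothetical names, and the sketch assumes only the three dictionary codes are valid:

```r
# Dollar multiplier for each valid magnitude code in the data dictionary
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Scale a damage value by its magnitude code. toupper() maps "k"/"m" onto
# "K"/"M"; any other code (blank, digits, symbols, "H") matches no name,
# so the lookup returns NA
to_dollars <- function(value, exp_code) {
  value * unname(multiplier[toupper(as.character(exp_code))])
}

to_dollars(c(2.5, 1, 3, 4), c("K", "m", "", "5"))  # 2500, 1e6, NA, NA
```

Indexing a named vector with `""` or an unknown name yields NA in R, which reproduces the "treat as NA" rule without listing every invalid code.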
We will calculate a composite economic outcome that is the sum of the total value of damaged property and crops.
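# Convert PROPDMG to dollars using its magnitude code; lowercase codes are
# treated as their uppercase equivalents, and every other code (blank "",
# digits, "+", "-", "?", "H") falls through to NA
df <- df %>% mutate(prop = case_when(
  PROPDMGEXP %in% c("K", "k") ~ PROPDMG * 1e3,
  PROPDMGEXP %in% c("M", "m") ~ PROPDMG * 1e6,
  PROPDMGEXP %in% c("B", "b") ~ PROPDMG * 1e9,
  TRUE ~ NA_real_
))
# Same conversion for crop damage (scaling CROPDMG, not PROPDMG)
df <- df %>% mutate(crop = case_when(
  CROPDMGEXP %in% c("K", "k") ~ CROPDMG * 1e3,
  CROPDMGEXP %in% c("M", "m") ~ CROPDMG * 1e6,
  CROPDMGEXP %in% c("B", "b") ~ CROPDMG * 1e9,
  TRUE ~ NA_real_
))
# Composite economic outcome; NA whenever either component is NA
df <- df %>% mutate(econoutcome = prop + crop)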
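Because `econoutcome` is computed as `prop + crop`, an event contributes to the economic averages only when both of its magnitude codes were valid: R arithmetic propagates NA. A small illustration with hypothetical converted values; the `rowSums()` line is shown only for comparison, since it encodes a different assumption (missing treated as zero) that the analysis does not make:

```r
# Hypothetical converted dollar values for three events
prop <- c(5000, 2e6, NA)
crop <- c(NA, 1e3, 0)

# Base R arithmetic propagates NA: one missing component makes the sum missing
prop + crop  # NA, 2001000, NA

# For comparison only: rowSums(na.rm = TRUE) treats each NA as 0
rowSums(cbind(prop, crop), na.rm = TRUE)  # 5000, 2001000, 0
```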
We’ll first look at population health outcomes by event type.
ggplot(data = df, aes(x = reorder(EVTYPE, -healthoutcome), y = healthoutcome, fill = EVTYPE)) +
stat_summary(fun = mean, geom = "bar") +
xlab("Weather Event Type") +
ylab("Average Composite Adverse Health Events") +
labs(title = "Composite Adverse Health Effects by Event Type",
caption = "Average composite adverse health outcome per event (10 x fatalities + injuries) by weather event type") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1,vjust = 1),
legend.position = "none")
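The bars drawn by `stat_summary(fun = mean)` are per-event averages, so they rank event types by harm per occurrence rather than by cumulative harm. A base-R sketch with hypothetical data shows that the two rankings can disagree:

```r
# Hypothetical per-event outcomes for two event types
evtype  <- c("A", "A", "A", "B")
outcome <- c(2, 2, 2, 5)

# What stat_summary(fun = mean) draws: the mean outcome per event type
tapply(outcome, evtype, mean)  # A = 2, B = 5

# Summing instead ranks the types in the opposite order
tapply(outcome, evtype, sum)   # A = 6, B = 5
```

A frequent but individually mild event type can therefore rank low on these charts while still accounting for a large share of total harm.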
We’ll now look at economic outcomes by event type.
ggplot(data = df %>% filter(!is.na(econoutcome)), aes(x = reorder(EVTYPE, -econoutcome), y = econoutcome,
fill = EVTYPE)) +
stat_summary(fun = mean, geom = "bar") +
xlab("Weather Event Type") +
ylab("Average Economic Cost") +
labs(title = "Average Economic Cost by Weather Event Type",
caption = "Average economic cost per event (property plus crop damage) by weather event type") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1,vjust = 1),
legend.position = "none")