Synopsis: This report uses catastrophic weather data from the National Weather Service to identify the types of catastrophic weather that are most physically hazardous to population health and most economically detrimental. The code below, with its accompanying narrative, documents each step, from package loading and data retrieval through processing to the results.
Packages: The following code detects, installs, and loads required packages for analysis:
if(!require(pacman)){install.packages("pacman")}; library(pacman)
## Loading required package: pacman
packages <- c("readr", "dplyr", "scales", "stringr", "ggplot2", "lubridate")
p_load(packages, character.only = TRUE); rm(packages)
Data Retrieval & Reading: The directory, storm_data, is created if undetected, and the weather data are downloaded and read into R:
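url  <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
path <- "./storm_data"
# the download is bzip2-compressed, so keep the .bz2 extension
file <- file.path(path, "StormData.csv.bz2")
# create the directory if undetected; download only if the file is missing
if (!dir.exists(path)) dir.create(path)
# mode = "wb" keeps the binary file intact on Windows
if (!file.exists(file)) download.file(url, file, mode = "wb")
# read.csv() decompresses .bz2 files transparently
storms <- read.csv(file, stringsAsFactors = FALSE)
rm(url, path, file)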
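read.csv() can read the downloaded file without a separate decompression step because the file() connection it opens recognizes bzip2-compressed input when reading. The base-R sketch below demonstrates this round trip on a small temporary file; the toy columns are hypothetical and not taken from the storm data.

```r
# Write a tiny CSV through a bzip2-compressing connection (hypothetical toy data)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(0, 1)),
          con, row.names = FALSE)
close(con)

# read.csv() decompresses the file transparently, as in the download step
back <- read.csv(tmp, stringsAsFactors = FALSE)
print(back)
unlink(tmp)
```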
Formatting: The data are then formatted to have lowercase variable names, and variable bgn_date, which timestamps the beginning of each weather event, is converted from a date-time string to a Date. Lastly, the dataset is trimmed to the key variables of interest:
names(storms) <- tolower(names(storms))
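storms <- storms %>%
  # drop the time component of each "date time" string, then parse month/day/year
  mutate(bgn_date = str_split(bgn_date, " ", simplify = TRUE)[, 1],
         bgn_date = mdy(bgn_date)) %>%
  # keep the date, county, state, event type, and health/damage variables
  select(bgn_date, countyname:evtype, fatalities:cropdmgexp)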
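The bgn_date conversion can be illustrated in base R. This sketch assumes bgn_date values look like "4/18/1950 0:00:00" (month/day/year followed by a time); the sample strings are hypothetical. strsplit() plays the role of str_split(), and as.Date() with format "%m/%d/%Y" plays the role of mdy() for inputs of this shape.

```r
# Hypothetical values in the assumed "M/D/YYYY H:MM:SS" layout of bgn_date
raw <- c("4/18/1950 0:00:00", "11/15/1951 0:00:00")

# Keep only the date part, as str_split(..., simplify = TRUE)[, 1] does
date_part <- sapply(strsplit(raw, " "), `[`, 1)

# Parse month/day/year into Date objects
dates <- as.Date(date_part, format = "%m/%d/%Y")
print(dates)
```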
Filtering: The reader may view the unique values and total occurrences of propdmgexp and cropdmgexp, the multiplier codes that scale propdmg and cropdmg, which give the economic impact, in USD, of property damage and crop damage, respectively:
table(storms$propdmgexp)
##
##             -      ?      +      0      1      2      3      4      5
## 465934      1      8      5    216     25     13      4      4     28
##      6      7      8      B      h      H      K      m      M
##      4      5      1     40      1      6 424665      7  11330
table(storms$cropdmgexp)
##
##             ?      0      2      B      k      K      m      M
## 618413      7     19      1      9     21 281832      1   1994
Because the meaning of many of the codes in propdmgexp and cropdmgexp is unknown, only the codes described in the dataset documentation are preserved during filtering: H, K, M, and B, indicating hundreds, thousands, millions, and billions of USD, respectively. For cropdmgexp, the 618,413 blank values are preserved as well, because filtering them may remove valid entries for variables of interest. Note that blank values are not included in prop_exp, so the 465,934 events with a blank propdmgexp, as well as the lowercase h and m codes, are removed. Values of cropdmgexp are capitalized for uniformity before filtering:
prop_exp <- c("H", "K", "M", "B")
storms <- storms %>% filter(propdmgexp %in% prop_exp)
storms$cropdmgexp <- toupper(storms$cropdmgexp)
crop_exp <- c("", "H", "B", "K", "M")
storms <- storms %>% filter(cropdmgexp %in% crop_exp)
rm(crop_exp, prop_exp)
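The two filters can be illustrated on a small hypothetical data frame. In this base-R sketch (the toy codes are invented), blank and lowercase propdmgexp codes are dropped, while blank cropdmgexp codes survive, as do lowercase cropdmgexp codes once toupper() has been applied:

```r
# Toy exponent codes (hypothetical), mimicking the two columns
toy <- data.frame(propdmgexp = c("K", "", "m", "M", "B", "H"),
                  cropdmgexp = c("", "k", "K", "?", "", "m"),
                  stringsAsFactors = FALSE)

prop_exp <- c("H", "K", "M", "B")
crop_exp <- c("", "H", "B", "K", "M")

# Only cropdmgexp is capitalized, matching the filtering code
toy$cropdmgexp <- toupper(toy$cropdmgexp)

# Keep rows whose codes pass both filters
kept <- toy[toy$propdmgexp %in% prop_exp & toy$cropdmgexp %in% crop_exp, ]
print(kept)
```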
Post-filtering, observe the following counts for preserved values among variables propdmgexp and cropdmgexp:
table(storms$propdmgexp)
##
##      B      H      K      M
##     40      6 424645  11329
table(storms$cropdmgexp)
##
##             B      K      M
## 156484      5 277982   1549
Transformations: Though minor transformations occur later in Processing, the most notable transformation in Processing detects the multiplier codes in propdmgexp and cropdmgexp and applies them to the respective cost variables, propdmg and cropdmg, so that both variables express weather event costs in USD. The now-obsolete multiplier variables are then removed from the dataset, and scientific notation is disabled:
# Translate an exponent code into its numeric multiplier;
# blank codes (kept only for cropdmgexp) leave the value unchanged
exp_to_mult <- function(code) {
  case_when(code == "H" ~ 1e2,
            code == "K" ~ 1e3,
            code == "M" ~ 1e6,
            code == "B" ~ 1e9,
            TRUE        ~ 1)
}
# Scale both damage variables to USD, then drop the exponent codes
storms <- storms %>%
  mutate(propdmg = propdmg * exp_to_mult(propdmgexp),
         cropdmg = cropdmg * exp_to_mult(cropdmgexp)) %>%
  select(-propdmgexp, -cropdmgexp)
# Print large USD values in full rather than in scientific notation
options(scipen = 999); rm(exp_to_mult)
The results of this step are cached to speed up subsequent knitting.
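The multiplier conversion can also be written as a lookup into a named vector. A minimal base-R sketch, with hypothetical damage values and codes, assuming (as in the processing code) that a blank code leaves the value unchanged:

```r
# Multipliers for the documented exponent codes
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical damage values and their exponent codes ("" = blank)
dmg    <- c(25, 2.5, 1.2, 0.5)
dmgexp <- c("K", "M", "B", "")

# Look up each code by name; "" matches no name and yields NA, treated as 1
mult <- unname(multiplier[dmgexp])
mult[is.na(mult)] <- 1

# Damage in USD
usd <- dmg * mult
print(usd)
```

Indexing by name avoids both the explicit loop and a chain of if/else branches, and the NA handling makes the treatment of blank codes explicit.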
Aggregation, Arrangement, and Additional Transformations: The following creates a subset of the storm data, health, which aggregates casualties by event type: fatalities and injuries are summed and divided by 1,000 to create variable per_cap (casualties in thousands, not a per-capita rate), and summed without division to create variable total. The aggregate data are arranged in descending order, only the ten event types with the most casualties are kept, and evtype is converted to a factor whose levels follow that order for ease of visualization:
health <- storms %>%
  group_by(evtype) %>%
  summarize(per_cap = sum(fatalities, injuries)/1000,
            total = sum(fatalities, injuries)) %>%
  arrange(desc(per_cap)) %>%
  head(10)
health$evtype <- factor(health$evtype,
                        levels = health$evtype[order(health$per_cap,
                                                     decreasing = TRUE)])
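The health pipeline sums fatalities and injuries per event type, sorts, keeps ten rows, and fixes the factor level order. A base-R sketch of the same steps on a hypothetical four-event dataset, using aggregate() in place of group_by() and summarize():

```r
# Hypothetical events (not from the storm data)
toy <- data.frame(evtype     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
                  fatalities = c(2, 0, 1, 3),
                  injuries   = c(10, 4, 0, 2),
                  stringsAsFactors = FALSE)

# Casualties per event, then summed per event type
toy$casualties <- toy$fatalities + toy$injuries
health_toy <- aggregate(casualties ~ evtype, data = toy, FUN = sum)

# Sort in descending order, keep at most ten rows, express in thousands
health_toy <- head(health_toy[order(health_toy$casualties, decreasing = TRUE), ], 10)
health_toy$per_cap <- health_toy$casualties / 1000

# Factor levels follow the sorted order, so bar charts draw bars in that order
health_toy$evtype <- factor(health_toy$evtype, levels = health_toy$evtype)
print(health_toy)
```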
The last Processing step prepares another subset of the storm data, economy, which again aggregates by event type: propdmg and cropdmg are summed to create variable total, the combined cost in USD, and variable billions, the same sum in billions of USD. As before, the aggregate data are arranged in descending order, only the ten costliest event types are kept, and evtype is converted to a factor whose levels follow that order for ease of visualization:
economy <- storms %>%
  group_by(evtype) %>%
  summarize(billions = sum(propdmg, cropdmg)/1000000000,
            total = sum(propdmg, cropdmg)) %>%
  arrange(desc(billions)) %>%
  head(10)
economy$evtype <- factor(economy$evtype,
                         levels = economy$evtype[order(economy$billions,
                                                       decreasing = TRUE)])
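Inside summarize(), sum(propdmg, cropdmg) adds every element of both columns within each event type, so it returns one grand total per group rather than per-event sums. A base-R illustration with hypothetical damage figures for a single event type:

```r
# Hypothetical property and crop damage (USD) for three events of one type
propdmg <- c(2.5e9, 1.0e9, 0)
cropdmg <- c(0.5e9, 0,     2.0e9)

# sum() with several arguments adds all elements of all of them
total    <- sum(propdmg, cropdmg)   # one grand total for the group
billions <- total / 1e9             # the same total in billions of USD

# Element-wise addition, by contrast, gives one total per event
per_event <- propdmg + cropdmg
```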
Effects on Population Health: Per the above processing, the ten most catastrophic weather event types in terms of public health, measured in casualties (variable total, the sum of fatalities and injuries), are as follows:
print(health)
## # A tibble: 10 x 3
##    evtype            per_cap total
##    <fct>               <dbl> <dbl>
##  1 TORNADO             96.0  95981
##  2 FLOOD                7.15  7151
##  3 TSTM WIND            2.91  2910
##  4 FLASH FLOOD          2.27  2271
##  5 ICE STORM            1.89  1891
##  6 LIGHTNING            1.82  1817
##  7 THUNDERSTORM WIND    1.60  1599
##  8 HEAT                 1.38  1384
##  9 HURRICANE/TYPHOON    1.34  1337
## 10 EXCESSIVE HEAT       1.14  1137
Visualized, tornado events dominate casualties relative to the runners-up:
ggplot(health, aes(x = evtype, y = per_cap)) +
  geom_col(fill = "tomato") +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_y_continuous(labels = comma) +
  ylab("Casualties (In Thousands)") +
  xlab("Event Type") +
  ggtitle("Total casualties by event type",
          "Including injuries and fatalities")
Figure 1: “Total casualties by event type” shows the overwhelmingly devastating effect of “Tornado” events on public health, causing nearly 96,000 casualties. See Figure 2 for a more informative look at the remaining weather events.
For a more informative view of the remaining catastrophic weather events, the following repeats the visualization above with the y-axis truncated at 10 (thousand), i.e., 10,000 casualties:
ggplot(health, aes(x = evtype, y = per_cap)) +
  geom_col(fill = "tomato") +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_y_continuous(labels = comma) +
  ylab("Casualties (In Thousands)") +
  xlab("Event Type") +
  ggtitle("Total casualties by event type, truncated",
          "Truncated at 10,000 casualties") +
  coord_cartesian(ylim = c(0, 10))
Figure 2: “Total casualties by event type, truncated” shows the remaining contenders for catastrophic weather events in terms of public health impacts, with the y-axis truncated at 10,000 casualties. In second place, “Flood” events (7,151 casualties) are markedly more harmful than the remaining weather events, causing roughly 2.5 times the casualties of “TSTM Wind” (2,910) and more than triple those of “Flash Flood” (2,271) and “Lightning” (1,817).
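The comparison in Figure 2 can be checked directly against the casualty totals printed in the health table:

```r
# Casualty totals copied from the health table
flood       <- 7151
tstm_wind   <- 2910
flash_flood <- 2271
lightning   <- 1817

# How many times more casualties "Flood" caused than each comparison type
ratios <- round(flood / c(tstm_wind = tstm_wind, flash_flood = flash_flood,
                          lightning = lightning), 2)
print(ratios)
# tstm_wind 2.46, flash_flood 3.15, lightning 3.94
```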
Economic Consequences: Lastly, per the above processing, the following are the ten most catastrophic weather event types in terms of economic consequences, measured in USD as the sum of the multiplier-adjusted values of propdmg and cropdmg. Variable billions gives each event type's total cost in billions of USD, while variable total gives the same cost in USD:
print(economy)
## # A tibble: 10 x 3
##    evtype            billions        total
##    <fct>                <dbl>        <dbl>
##  1 FLOOD               150.   149828665250
##  2 HURRICANE/TYPHOON    71.9   71913712800
##  3 TORNADO              57.3   57278861940
##  4 STORM SURGE          43.3   43323541000
##  5 HAIL                 17.8   17756170123
##  6 FLASH FLOOD          17.5   17528250560
##  7 HURRICANE            14.6   14557229010
##  8 RIVER FLOOD          10.1   10147679500
##  9 ICE STORM             8.97   8967037810
## 10 TROPICAL STORM        8.16   8155601550
These costs are illustrated in the following visualization. Note that the y-axis is measured in billions of USD:
ggplot(economy, aes(x = evtype, y = billions)) +
  geom_col(fill = "limegreen") +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ylab("USD Costs (In Billions)") +
  xlab("Event Type") +
  ggtitle("Total economic damage by event type",
          "Including property and crop damage")
Figure 3: “Total economic damage by event type” shows that “Flood” events dominate the economic consequences of catastrophic weather, at nearly $150 B. “Hurricane/Typhoon” events rank second at about $72 B, less than half the cost of “Flood” events. “Tornado” ($57 B) and “Storm Surge” ($43 B) events follow, while the remaining events, though not insignificant, pale in comparison to the leading four.
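The comparisons in Figure 3 can be quantified from the USD totals printed in the economy table: the ratio of “Flood” to “Hurricane/Typhoon” costs, and the share of the top ten's combined cost contributed by the leading four event types.

```r
# USD totals copied from the economy table
totals <- c(FLOOD = 149828665250, `HURRICANE/TYPHOON` = 71913712800,
            TORNADO = 57278861940, `STORM SURGE` = 43323541000,
            HAIL = 17756170123, `FLASH FLOOD` = 17528250560,
            HURRICANE = 14557229010, `RIVER FLOOD` = 10147679500,
            `ICE STORM` = 8967037810, `TROPICAL STORM` = 8155601550)

# Ratio of "Flood" costs to the runner-up, "Hurricane/Typhoon"
flood_vs_hurricane <- totals[["FLOOD"]] / totals[["HURRICANE/TYPHOON"]]

# Share of the top-ten total contributed by the leading four event types
top4_share <- sum(totals[1:4]) / sum(totals)

round(c(flood_vs_hurricane = flood_vs_hurricane, top4_share = top4_share), 2)
# flood_vs_hurricane 2.08, top4_share 0.81
```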