The goal of this project is to determine which categorical and continuous variables exist in TidyTuesday’s nuclear explosions dataset, how they are distributed, and what questions they could help answer.
library(tidyverse)
library(readr)
nuclear_explosions <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2019/2019-08-20/nuclear_explosions.csv", show_col_types = FALSE)
nuclear_explosions |>
  pluck("type") |>
  unique()
## [1] "TOWER" "AIRDROP" "UW" "SURFACE" "CRATER" "SHIP"
## [7] "ATMOSPH" "BARGE" "BALLOON" "ROCKET" "SHAFT" "TUNNEL"
## [13] "WATERSUR" "SPACE" "GALLERY" "WATER SU" "UG" "SHAFT/GR"
## [19] "MINE" "SHAFT/LG"
nuclear_explosions |>
  group_by(type) |>
  summarize(num_explosions = n()) |>
  arrange(desc(num_explosions))
## # A tibble: 20 × 2
## type num_explosions
## <chr> <int>
## 1 SHAFT 1015
## 2 TUNNEL 310
## 3 ATMOSPH 185
## 4 SHAFT/GR 85
## 5 AIRDROP 78
## 6 TOWER 75
## 7 BALLOON 62
## 8 SURFACE 62
## 9 SHAFT/LG 56
## 10 BARGE 40
## 11 UG 32
## 12 GALLERY 13
## 13 ROCKET 13
## 14 CRATER 9
## 15 UW 8
## 16 SPACE 4
## 17 MINE 1
## 18 SHIP 1
## 19 WATER SU 1
## 20 WATERSUR 1
In total, 20 different explosion types are recorded in this dataset, but 4 of them have only a single entry and 7 others have fewer than 50. Shaft explosions are by far the most common type and the only one with over 1,000 entries; at 1,015 of the 2,051 explosions, they account for just under half (about 49%) of the data. This column also needs some cleaning: ‘WATERSUR’ and ‘WATER SU’ both likely refer to a water surface explosion, and according to the data documentation ‘TUNNEL’ and ‘GALLERY’ mean the same thing.
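The cleanup described above could be sketched as a small recoding step. This is a minimal sketch, assuming that ‘WATERSUR’ and ‘WATER SU’ are the same category and that ‘GALLERY’ can be folded into ‘TUNNEL’; the helper name `clean_type()` and the new column `type_clean` are hypothetical.

```r
library(dplyr)

# Hypothetical helper: collapse the duplicate type codes.
# Assumption: "WATERSUR" and "WATER SU" both mean water surface,
# and "GALLERY" is equivalent to "TUNNEL" per the data documentation.
clean_type <- function(df) {
  df |>
    mutate(type_clean = case_when(
      type %in% c("WATERSUR", "WATER SU") ~ "WATERSUR",
      type == "GALLERY" ~ "TUNNEL",
      TRUE ~ type  # leave every other code unchanged
    ))
}
```

Applied as `nuclear_explosions |> clean_type() |> count(type_clean, sort = TRUE)`, this should leave 18 types, with ‘TUNNEL’ rising to 323 (310 + 13). Keeping the original `type` column alongside `type_clean` makes the recoding easy to audit.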
nuclear_explosions |>
  pluck("purpose") |>
  unique()
## [1] "WR" "COMBAT" "WE" "ME" "SE" "FMS" "SB"
## [8] "SAM" "PNE:PLO" "TRANSP" "PNE:V" NA "PNE" "WR/SE"
## [15] "WR/WE" "WR/PNE" "WR/SAM" "PNE/WR" "SE/WR" "WR/P/SA" "WE/SAM"
## [22] "WE/WR" "WR/F/SA" "WR/FMS" "FMS/WR" "WR/P/S" "WR/F/S" "WR/WE/S"
nuclear_explosions |>
  group_by(purpose) |>
  summarize(num_explosions = n()) |>
  arrange(desc(num_explosions))
## # A tibble: 28 × 2
## purpose num_explosions
## <chr> <int>
## 1 WR 1495
## 2 WE 181
## 3 PNE 153
## 4 SE 71
## 5 FMS 33
## 6 PNE:PLO 27
## 7 SAM 25
## 8 WR/SE 11
## 9 PNE:V 7
## 10 WR/FMS 6
## # ℹ 18 more rows
As with explosion type, many purposes are recorded in this dataset, but most of them have only a few instances: 5 purposes have a single entry and 14 others have fewer than 10, leaving 8 major categories to focus on in visualizations. One explosion has no recorded purpose (`NA`). In addition, many purpose values actually combine several purposes delimited by a ‘/’ (for example ‘WR/SE’), and two of them (‘PNE:PLO’ and ‘PNE:V’) appear to be more specific versions of ‘PNE’ (Peaceful Nuclear Explosion).
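One way to handle the compound purposes is to give each component its own row and fold the ‘PNE:’ sub-codes into ‘PNE’. This is a minimal sketch, assuming every ‘/’ separates two independent purposes; the helper name `split_purposes()` and the `row_id` column are hypothetical. Short codes such as ‘P’, ‘F’, ‘S’ and ‘SA’ (from values like ‘WR/P/SA’) look truncated and would still need to be checked against the documentation.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: one row per component purpose.
split_purposes <- function(df) {
  df |>
    # remember which explosion each row came from before splitting
    mutate(row_id = row_number()) |>
    # "WR/SE" becomes two rows, "WR" and "SE"; NA stays a single NA row
    separate_rows(purpose, sep = "/") |>
    # assumption: "PNE:PLO" and "PNE:V" are sub-types of "PNE"
    mutate(purpose = if_else(startsWith(purpose, "PNE"), "PNE", purpose))
}
```

Because one explosion can now appear under several purposes, counts of explosions per purpose should use `n_distinct(row_id)` rather than `n()` when double counting matters.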
# This column holds surface wave magnitudes, so name it accordingly
surface_mag <- nuclear_explosions |>
  pluck("magnitude_surface")
surface_mag |>
  mean(na.rm = TRUE)
## [1] 0.3558264
surface_mag |>
  sd(na.rm = TRUE)
## [1] 1.202229
surface_mag |>
  quantile(na.rm = TRUE)
## 0% 25% 50% 75% 100%
## 0 0 0 0 6
surface_mag |>
  quantile(c(.75, .8, .85, .9, .95, 1), na.rm = TRUE)
## 75% 80% 85% 90% 95% 100%
## 0.0 0.0 0.0 0.0 4.2 6.0
Here we investigate the surface wave magnitudes (`magnitude_surface`) of the explosions. The mean is close to zero (about 0.36), but the standard deviation is more than three times larger (about 1.2), which suggests that most values are zero while a small number are much larger. The quantiles confirm this: the lower 75% of values are all 0, while the maximum is 6. Examining the probabilities above 75% in more detail shows that somewhere between 90% and 95% of the values are 0.
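Rather than bracketing the share of zeros between two quantiles, it can be computed directly. This is a minimal sketch; the helper name `zero_summary()` is hypothetical and uses base R only.

```r
# Hypothetical helper: describe how zero-inflated a numeric vector is.
zero_summary <- function(x) {
  c(
    n_missing  = sum(is.na(x)),              # values with no measurement
    share_zero = mean(x == 0, na.rm = TRUE), # exact share of zeros
    max        = max(x, na.rm = TRUE)        # largest observed value
  )
}
```

Calling `zero_summary(nuclear_explosions$magnitude_surface)` would give the exact proportion of zero magnitudes, which the quantiles above place somewhere between 0.90 and 0.95.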
1. What is the mean depth for each type of explosion?
2. What is the median yield range (upper minus lower yield estimate) for each type of explosion?
3. How many explosions occurred in each year included in the dataset?
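The third question reduces to a count per year. This is a minimal sketch, assuming the dataset's `year` column holds the year of each explosion; the helper name `explosions_per_year()` is hypothetical.

```r
library(dplyr)

# Hypothetical helper: number of explosions recorded in each year.
explosions_per_year <- function(df) {
  df |>
    count(year, name = "num_explosions") |>
    arrange(year)  # chronological order for plotting as a time series
}
```

The result could be plotted with `ggplot(aes(x = year, y = num_explosions)) + geom_col()` to show how testing activity changed over time.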
nuclear_explosions |>
  group_by(type) |>
  summarize(mean_depth = round(mean(depth), 8), num_explosions = n()) |>
  filter(num_explosions > 1) |>
  arrange(desc(mean_depth))
## # A tibble: 16 × 3
## type mean_depth num_explosions
## <chr> <dbl> <int>
## 1 SHAFT 0.0730 1015
## 2 UW 0.035 8
## 3 TUNNEL 0.0154 310
## 4 CRATER 0.00667 9
## 5 GALLERY 0 13
## 6 SHAFT/GR 0 85
## 7 SHAFT/LG 0 56
## 8 SPACE 0 4
## 9 UG 0 32
## 10 SURFACE -0.000258 62
## 11 BARGE -0.0009 40
## 12 ATMOSPH -0.00108 185
## 13 TOWER -0.0701 75
## 14 AIRDROP -0.322 78
## 15 BALLOON -0.582 62
## 16 ROCKET -78.2 13
This aggregation answers the first question by finding the mean depth for each explosion type with more than one entry in the dataset. All of the means are close to zero except for the ‘ROCKET’ type, at about -78.2. A negative depth denotes height above ground, so this indicates that rocket-delivered explosions were detonated far above the ground. As expected, explosions detonated in underground shafts and tunnels, underwater, and in craters have a positive mean depth, while explosions detonated on the surface, on barges and towers, in the atmosphere, and from airplanes or balloons have a negative mean depth (that is, a mean height above ground). The types with a mean depth of exactly zero need further investigation: it makes little sense for ‘GALLERY’ explosions (which the documentation describes as equivalent to ‘TUNNEL’ explosions), the alternate vertical shaft types (‘SHAFT/GR’ and ‘SHAFT/LG’), or ‘UG’ and ‘SPACE’ explosions to be detonated at exactly surface level. These zeros could indicate that the depth was not successfully measured.
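A first step in checking whether zeros stand in for missing measurements is to see how often each type records a depth of exactly zero. This is a minimal sketch; the helper name `zero_depth_by_type()` is hypothetical, and a high share of zeros is only a hint, not proof, that depth was not recorded.

```r
library(dplyr)

# Hypothetical helper: how often each type records a depth of exactly 0.
zero_depth_by_type <- function(df) {
  df |>
    group_by(type) |>
    summarize(
      num_explosions   = n(),
      num_zero_depth   = sum(depth == 0, na.rm = TRUE),
      share_zero_depth = num_zero_depth / num_explosions
    ) |>
    arrange(desc(share_zero_depth))
}
```

Types such as ‘GALLERY’ or ‘SPACE’ with a share of 1 would have every depth recorded as zero, which is more consistent with a placeholder value than with real detonation depths.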
nuclear_explosions |>
  group_by(type) |>
  summarize(median_yield_range = median(yield_upper - yield_lower),
            num_explosions = n()) |>
  filter(num_explosions > 1, !is.na(median_yield_range)) |>
  arrange(desc(median_yield_range), desc(num_explosions))
## # A tibble: 12 × 3
## type median_yield_range num_explosions
## <chr> <dbl> <int>
## 1 SHAFT/LG 150 56
## 2 SHAFT 20 1015
## 3 GALLERY 20 13
## 4 TUNNEL 20.0 310
## 5 SHAFT/GR 5 85
## 6 AIRDROP 0 78
## 7 TOWER 0 75
## 8 SURFACE 0 62
## 9 BARGE 0 40
## 10 CRATER 0 9
## 11 UW 0 8
## 12 SPACE 0 4
This aggregation answers the second question: what is the median yield range for each explosion type? After removing types with only a single entry and types whose median yield range is missing, the underground types (‘SHAFT/LG’, ‘SHAFT’, ‘GALLERY’ and ‘TUNNEL’) have the largest median yield ranges, meaning their lower and upper yield estimates tend to differ the most. Note that `median()` returns `NA` for a type if even one of its explosions is missing `yield_upper` or `yield_lower`, so ‘ATMOSPH’, ‘BALLOON’, ‘ROCKET’ and ‘UG’ are dropped entirely rather than summarized from their complete rows. Seven of the twelve remaining types have a median yield range of 0, which may mean that their yields were easier to estimate precisely, or simply that only a single yield estimate was recorded for them.
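To keep types that have only some missing yields, the missing ranges can be dropped row by row instead. This is a minimal sketch of that variant using `na.rm = TRUE`; the helper name `yield_range_by_type()` and the `num_missing_range` column are hypothetical.

```r
library(dplyr)

# Hypothetical variant: ignore individual missing yield ranges instead of
# discarding a whole type when any one of its explosions lacks an estimate.
yield_range_by_type <- function(df) {
  df |>
    mutate(yield_range = yield_upper - yield_lower) |>
    group_by(type) |>
    summarize(
      num_explosions     = n(),
      num_missing_range  = sum(is.na(yield_range)),  # how much was dropped
      median_yield_range = median(yield_range, na.rm = TRUE)
    ) |>
    arrange(desc(median_yield_range), desc(num_explosions))
}
```

Used as `yield_range_by_type(nuclear_explosions) |> filter(num_explosions > 1)`, this should also summarize the four dropped types, provided each has at least one explosion with both estimates; `num_missing_range` shows how many rows each median ignores.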
purposes <- nuclear_explosions |>
  group_by(purpose) |>
  summarize(num_explosions = n()) |>
  arrange(desc(num_explosions)) |>
  filter(num_explosions >= 10) |>
  pluck("purpose")
purposes
## [1] "WR" "WE" "PNE" "SE" "FMS" "PNE:PLO" "SAM"
## [8] "WR/SE"
nuclear_explosions |>
  filter(purpose %in% purposes) |>
  ggplot() +
  geom_boxplot(mapping = aes(x = purpose, y = magnitude_body)) +
  labs(title = "Body Wave Magnitude by Explosion Purpose",
       x = "Purpose", y = "Body Wave Magnitude") +
  theme_minimal()
Because there are too many explosion purposes to display legibly, we focus on the 8 most common purposes, each of which has at least 10 instances in the data. For most purposes, body wave magnitudes are concentrated at or near 0, and for ‘SE’ explosions they are entirely 0. Interestingly, ‘PNE’ (peaceful nuclear explosions) is the only major purpose with a median body wave magnitude above 0.
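The visual reading of the boxplot can be backed by a numeric summary. This is a minimal sketch; the helper name `body_magnitude_summary()` is hypothetical, and its `keep_purposes` argument is meant to receive the `purposes` vector computed above.

```r
library(dplyr)

# Hypothetical numeric companion to the boxplot: median body wave
# magnitude and share of exact zeros for each selected purpose.
body_magnitude_summary <- function(df, keep_purposes) {
  df |>
    filter(purpose %in% keep_purposes) |>
    group_by(purpose) |>
    summarize(
      median_body = median(magnitude_body, na.rm = TRUE),
      share_zero  = mean(magnitude_body == 0, na.rm = TRUE),
      .groups = "drop"
    ) |>
    arrange(desc(median_body))
}
```

`body_magnitude_summary(nuclear_explosions, purposes)` would show whether ‘PNE’ is indeed the only purpose with a positive median and whether every ‘SE’ explosion has a share of zeros equal to 1.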
types <- nuclear_explosions |>
  group_by(type) |>
  summarize(num_explosions = n()) |>
  arrange(desc(num_explosions)) |>
  filter(num_explosions >= 60) |>
  pluck("type")
types
## [1] "SHAFT" "TUNNEL" "ATMOSPH" "SHAFT/GR" "AIRDROP" "TOWER" "BALLOON"
## [8] "SURFACE"
nuclear_explosions |>
  filter(type %in% types, purpose %in% purposes) |>
  ggplot() +
  geom_bar(mapping = aes(x = purpose, fill = type)) +
  labs(title = "Number of Explosions by Type and Purpose",
       x = "Purpose", y = "Explosion Count") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set1")
As with the purposes, there are too many explosion types to include in a single visualization, so we focus on the 8 most common types: those with at least 60 instances in the dataset. Explosions with purpose ‘WR’, meaning they were detonated as part of a weapons development program, account for most of the data (1,495 of 2,051 explosions, about 73%) and include all of the main explosion types. Shaft explosions clearly make up most of the ‘WR’ explosions, and shaft and tunnel explosions are the most visible among the other purposes.
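The stacked bar chart makes the largest segments easy to see but hides exact proportions within each purpose. This is a minimal sketch that computes them; the helper name `type_share_by_purpose()` is hypothetical, and its arguments are meant to receive the `types` and `purposes` vectors computed above.

```r
library(dplyr)

# Hypothetical helper: share of each explosion type within each purpose.
type_share_by_purpose <- function(df, keep_types, keep_purposes) {
  df |>
    filter(type %in% keep_types, purpose %in% keep_purposes) |>
    count(purpose, type, name = "num_explosions") |>
    group_by(purpose) |>
    # proportions sum to 1 within each purpose
    mutate(share = num_explosions / sum(num_explosions)) |>
    ungroup() |>
    arrange(purpose, desc(share))
}
```

Running `type_share_by_purpose(nuclear_explosions, types, purposes)` would quantify how large the shaft share of ‘WR’ explosions actually is. Adding `position = "fill"` to `geom_bar()` would show the same proportions graphically.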