Week 2 Data Dive: Summaries

The goal of this project is to determine what types of categorical and continuous variables exist in Tidy Tuesday’s nuclear explosion data set, how they are distributed, and what kinds of potential questions they could answer.

library(tidyverse)
library(readr)

nuclear_explosions <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2019/2019-08-20/nuclear_explosions.csv", show_col_types = FALSE)

Numeric Summaries

Explosion Type

nuclear_explosions |> 
  pluck("type") |>
  unique()

##  [1] "TOWER"    "AIRDROP"  "UW"       "SURFACE"  "CRATER"   "SHIP"    
##  [7] "ATMOSPH"  "BARGE"    "BALLOON"  "ROCKET"   "SHAFT"    "TUNNEL"  
## [13] "WATERSUR" "SPACE"    "GALLERY"  "WATER SU" "UG"       "SHAFT/GR"
## [19] "MINE"     "SHAFT/LG"

nuclear_explosions |>
  group_by(type) |>
  summarize(num_explosions = n()) |>
  arrange(desc(num_explosions))

## # A tibble: 20 × 2
##    type     num_explosions
##    <chr>             <int>
##  1 SHAFT              1015
##  2 TUNNEL              310
##  3 ATMOSPH             185
##  4 SHAFT/GR             85
##  5 AIRDROP              78
##  6 TOWER                75
##  7 BALLOON              62
##  8 SURFACE              62
##  9 SHAFT/LG             56
## 10 BARGE                40
## 11 UG                   32
## 12 GALLERY              13
## 13 ROCKET               13
## 14 CRATER                9
## 15 UW                    8
## 16 SPACE                 4
## 17 MINE                  1
## 18 SHIP                  1
## 19 WATER SU              1
## 20 WATERSUR              1

In total, 20 different explosion types are recorded in this dataset, but 4 of them have only a single entry and 7 others have fewer than 50. Shaft explosions are by far the most common type, accounting for 1,015 of the 2,051 entries (just under half), and are the only type with over 1,000 entries. This column needs some cleaning: ‘WATERSUR’ and ‘WATER SU’ both likely refer to water surface, and ‘TUNNEL’ and ‘GALLERY’ mean the same thing according to the data documentation.
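One way the cleaning described above might be approached is sketched here on a small made-up tibble rather than the real data. The `toy_types` data and the `type_clean` column name are illustrative assumptions, and folding ‘GALLERY’ into ‘TUNNEL’ assumes the two labels really are interchangeable for the analysis at hand.

```r
library(dplyr)

# Toy stand-in for the `type` column; these rows are made up for illustration.
toy_types <- tibble(
  type = c("SHAFT", "TUNNEL", "GALLERY", "WATERSUR", "WATER SU")
)

# Hypothetical recoding: merge the two spellings of water surface and
# fold GALLERY into TUNNEL, leaving every other label unchanged.
cleaned_types <- toy_types |>
  mutate(type_clean = case_when(
    type == "WATER SU" ~ "WATERSUR",
    type == "GALLERY"  ~ "TUNNEL",
    TRUE               ~ type
  ))

# Count explosions per cleaned label.
cleaned_types |>
  count(type_clean, sort = TRUE)
```

Writing the result to a new column keeps the original labels available for comparison.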

Explosion Purpose

nuclear_explosions |> 
  pluck("purpose") |>
  unique()

##  [1] "WR"      "COMBAT"  "WE"      "ME"      "SE"      "FMS"     "SB"     
##  [8] "SAM"     "PNE:PLO" "TRANSP"  "PNE:V"   NA        "PNE"     "WR/SE"  
## [15] "WR/WE"   "WR/PNE"  "WR/SAM"  "PNE/WR"  "SE/WR"   "WR/P/SA" "WE/SAM" 
## [22] "WE/WR"   "WR/F/SA" "WR/FMS"  "FMS/WR"  "WR/P/S"  "WR/F/S"  "WR/WE/S"

nuclear_explosions |> 
  group_by(purpose) |>
  summarize(num_explosions = n()) |>
  arrange(desc(num_explosions))

## # A tibble: 28 × 2
##    purpose num_explosions
##    <chr>            <int>
##  1 WR                1495
##  2 WE                 181
##  3 PNE                153
##  4 SE                  71
##  5 FMS                 33
##  6 PNE:PLO             27
##  7 SAM                 25
##  8 WR/SE               11
##  9 PNE:V                7
## 10 WR/FMS               6
## # ℹ 18 more rows

As with explosion type, many explosion purposes are recorded in this dataset, but most of them have only a few instances. Five purposes have only a single entry and 14 others have fewer than 10 entries, leaving 8 major categories to focus on in visualizations. There is also a single explosion without a recorded purpose. However, many of the recorded purposes are actually combinations of several purposes delimited by a ‘/’, and two (‘PNE:PLO’ and ‘PNE:V’) appear to be more specific versions of ‘PNE’ (Peaceful Nuclear Explosion).
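If the combined purposes were to be analyzed individually, one possible approach is to split each entry on ‘/’ and strip the ‘:’ suffix from the PNE variants. The sketch below uses a made-up tibble; `toy_purposes`, `id`, and `purpose_main` are illustrative names. Note that after splitting, an explosion with several purposes is counted once per purpose, and truncated codes in the real data such as the ‘P’ and ‘SA’ in ‘WR/P/SA’ would still need to be mapped by hand.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy stand-in for the `purpose` column; these rows are made up.
toy_purposes <- tibble(
  id      = 1:4,
  purpose = c("WR", "WR/SE", "PNE:PLO", "WE/SAM")
)

split_purposes <- toy_purposes |>
  # One row per individual purpose: "WR/SE" becomes a "WR" row and an "SE" row.
  separate_rows(purpose, sep = "/") |>
  # Drop the ":" suffix so that "PNE:PLO" is grouped with "PNE".
  mutate(purpose_main = str_remove(purpose, ":.*$"))

split_purposes |>
  count(purpose_main, sort = TRUE)
```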

Explosion Surface Wave Magnitude

surface_magnitude <- nuclear_explosions |>
  pluck("magnitude_surface")

surface_magnitude |>
  mean(na.rm = TRUE)

## [1] 0.3558264

surface_magnitude |>
  sd(na.rm = TRUE)

## [1] 1.202229

surface_magnitude |>
  quantile(na.rm = TRUE)

##   0%  25%  50%  75% 100% 
##    0    0    0    0    6

surface_magnitude |>
  quantile(c(.75, .8, .85, .9, .95, 1), na.rm = TRUE)

##  75%  80%  85%  90%  95% 100% 
##  0.0  0.0  0.0  0.0  4.2  6.0

Here we investigate the surface wave magnitudes of the explosions. The mean (about 0.36) is close to zero, but the standard deviation (about 1.20) is more than three times as large, which suggests a heavily skewed distribution rather than values spread evenly around the mean. The quantiles confirm this: the default probabilities show that at least the lower 75% of values are 0, while the maximum value is 6. Looking at the probabilities above 75% in more detail, we can see that between 90% and 95% of the values are 0.
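Rather than bracketing the share of zeros with quantiles, it can be computed directly. The sketch below uses a made-up vector in place of the real magnitude column (`toy_magnitude` is an illustrative name); on the real column, `na.rm = TRUE` would be needed if any values are missing.

```r
# Toy stand-in for the surface wave magnitudes: mostly zeros, a few readings.
toy_magnitude <- c(rep(0, 18), 4.2, 6.0)

# Share of values that are exactly zero (the mean of a logical vector).
zero_share <- mean(toy_magnitude == 0)
zero_share

# Summary statistics restricted to the non-zero readings.
nonzero <- toy_magnitude[toy_magnitude != 0]
mean(nonzero)
quantile(nonzero)
```

Summarizing the non-zero values separately describes the explosions that actually have a recorded magnitude, on the assumption that a zero means no reading was recorded.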

Questions

What is the mean depth for each type of explosion?
What is the median yield range for each type of explosion?
How many explosions occurred in each year included in the data set?
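
The third question is a simple count per group. A minimal sketch on made-up data follows; it assumes the data set has a numeric `year` column, and `toy_years` is an illustrative name.

```r
library(dplyr)

# Toy stand-in for the `year` column; these values are made up.
toy_years <- tibble(year = c(1945, 1945, 1945, 1962, 1962, 1998))

# One row per year with the number of explosions recorded in that year.
explosions_per_year <- toy_years |>
  count(year, name = "num_explosions")

explosions_per_year
```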

Aggregation

Mean Depth by Explosion Type

nuclear_explosions |>
  group_by(type) |>
  summarize(mean_depth = round(mean(depth), 8), num_explosions = n()) |>
  filter(num_explosions > 1) |>
  arrange(desc(mean_depth))

## # A tibble: 16 × 3
##    type     mean_depth num_explosions
##    <chr>         <dbl>          <int>
##  1 SHAFT      0.0730             1015
##  2 UW         0.035                 8
##  3 TUNNEL     0.0154              310
##  4 CRATER     0.00667               9
##  5 GALLERY    0                    13
##  6 SHAFT/GR   0                    85
##  7 SHAFT/LG   0                    56
##  8 SPACE      0                     4
##  9 UG         0                    32
## 10 SURFACE   -0.000258             62
## 11 BARGE     -0.0009               40
## 12 ATMOSPH   -0.00108             185
## 13 TOWER     -0.0701               75
## 14 AIRDROP   -0.322                78
## 15 BALLOON   -0.582                62
## 16 ROCKET   -78.2                  13

This aggregation answers the first question by finding the mean depth for each type of explosion that has more than one entry in the data set. All of the values are small in magnitude except the mean of the ‘ROCKET’ type, which is about -78.2. Negative depth is height and indicates that the explosions were detonated above ground, so it makes sense that a rocket would detonate very high above the ground. Logically, explosions detonated in underground shafts and tunnels, underwater, and in craters have a mean depth above zero, while explosions detonated on the surface, on barges and towers, in the atmosphere, and from an airplane or balloon have a negative mean depth, meaning a mean height above ground.

What needs further investigation are the explosion types with a mean depth of exactly zero. It does not make sense for ‘GALLERY’ type explosions (which the documentation explains to be the same as ‘TUNNEL’ type explosions) or for the alternate vertical shaft types (‘SHAFT/GR’ and ‘SHAFT/LG’) to be detonated at exactly surface level. Zero values could potentially indicate that the depth was not recorded.
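To check how much the zeros influence these means, one option is to treat a depth of 0 as missing and compare the two summaries. The sketch below uses made-up values; `toy_depths` and `depth_known` are illustrative names. Treating every 0 as missing is itself an assumption, since true surface-level detonations would be masked as well.

```r
library(dplyr)

# Toy stand-in for the `type` and `depth` columns; these values are made up.
toy_depths <- tibble(
  type  = c("TUNNEL", "TUNNEL", "TUNNEL", "GALLERY", "GALLERY"),
  depth = c(0.3, 0, 0.1, 0, 0)
)

depth_summary <- toy_depths |>
  # Assumption: a depth of exactly 0 means "not recorded".
  mutate(depth_known = na_if(depth, 0)) |>
  group_by(type) |>
  summarize(
    mean_depth_all   = mean(depth),
    mean_depth_known = mean(depth_known, na.rm = TRUE),
    share_zero       = mean(depth == 0)
  )

depth_summary
```

A type whose depths are all zero, like the toy ‘GALLERY’ rows, gets `NaN` for `mean_depth_known`, which flags it as having no usable depth data.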

Median Yield Range by Explosion Type

nuclear_explosions |> 
  group_by(type) |>
  summarize(median_yield_range = median(yield_upper - yield_lower),
            num_explosions = n()) |>
  filter(num_explosions > 1, !is.na(median_yield_range)) |>
  arrange(desc(median_yield_range), desc(num_explosions))

## # A tibble: 12 × 3
##    type     median_yield_range num_explosions
##    <chr>                 <dbl>          <int>
##  1 SHAFT/LG              150               56
##  2 SHAFT                  20             1015
##  3 GALLERY                20               13
##  4 TUNNEL                 20.0            310
##  5 SHAFT/GR                5               85
##  6 AIRDROP                 0               78
##  7 TOWER                   0               75
##  8 SURFACE                 0               62
##  9 BARGE                   0               40
## 10 CRATER                  0                9
## 11 UW                      0                8
## 12 SPACE                   0                4

This aggregation answers the second question: what is the median yield range for each explosion type? After removing types with only a single entry and types whose median is NA (likely because either yield_upper or yield_lower is missing for at least one of their entries), we can see that ‘SHAFT/LG’ explosions have by far the largest median yield range (150), followed by ‘SHAFT’, ‘GALLERY’, and ‘TUNNEL’ at about 20 each. A large range means there tends to be a large difference between the lower and upper yield estimates. Seven of the twelve types with valid data have a median yield range of 0, which may indicate that their yields are much easier to estimate accurately.
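The guess about missing values can be checked by counting them per type and, if desired, computing the median over the entries that do have both estimates. The sketch below uses made-up values; `toy_yields`, `yield_range`, and `num_missing` are illustrative names.

```r
library(dplyr)

# Toy stand-in for the yield columns; these values are made up.
toy_yields <- tibble(
  type        = c("SHAFT", "SHAFT", "SHAFT", "BALLOON", "BALLOON"),
  yield_lower = c(0, 20, NA, 10, NA),
  yield_upper = c(20, 150, 20, NA, NA)
)

yield_summary <- toy_yields |>
  mutate(yield_range = yield_upper - yield_lower) |>
  group_by(type) |>
  summarize(
    # Entries where either bound is missing produce an NA range.
    num_missing        = sum(is.na(yield_range)),
    # Median over the entries that have both bounds.
    median_yield_range = median(yield_range, na.rm = TRUE)
  )

yield_summary
```

With `na.rm = TRUE`, a type drops out only when none of its entries has both bounds, rather than whenever a single entry is incomplete.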

Visualizations

Explosion Body Wave Magnitude by Purpose

purposes <- nuclear_explosions |>
  group_by(purpose) |>
  summarize(num_explosions = n()) |>
  arrange(desc(num_explosions)) |>
  filter(num_explosions >= 10) |>
  pluck("purpose")

purposes

## [1] "WR"      "WE"      "PNE"     "SE"      "FMS"     "PNE:PLO" "SAM"    
## [8] "WR/SE"

nuclear_explosions |>
  filter(purpose %in% purposes) |>
  ggplot() +
  geom_boxplot(mapping = aes(x = purpose, y = magnitude_body)) +
  labs(title = "Body Wave Magnitude by Explosion Purpose",
       x = "Purpose", y = "Body Wave Magnitude") +
  theme_minimal()

Since there are too many different explosion purposes to visualize at once, we focus on the top 8, all of which have at least 10 instances in the data. The body wave magnitude for most purposes leans heavily towards 0, and is entirely 0 in the case of ‘SE’ purpose explosions. Interestingly, ‘PNE’ (peaceful nuclear explosion) is the only main purpose with a median above 0.
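The medians read off the boxplot can also be confirmed numerically. The sketch below uses made-up values; `toy_body` and the summary column names are illustrative, and the real column would need `na.rm = TRUE` if it contains missing values.

```r
library(dplyr)

# Toy stand-in for the `purpose` and `magnitude_body` columns; values are made up.
toy_body <- tibble(
  purpose        = c("PNE", "PNE", "PNE", "WR", "WR", "WR"),
  magnitude_body = c(4.8, 5.1, 0, 0, 0, 5.5)
)

body_summary <- toy_body |>
  group_by(purpose) |>
  summarize(
    median_magnitude = median(magnitude_body),
    # Share of explosions with a body wave magnitude of exactly zero.
    share_zero       = mean(magnitude_body == 0)
  )

body_summary
```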

Count of Explosions by Type and Purpose

types <- nuclear_explosions |>
  group_by(type) |>
  summarize(num_explosions = n()) |>
  arrange(desc(num_explosions)) |>
  filter(num_explosions >= 60) |>
  pluck("type")

types

## [1] "SHAFT"    "TUNNEL"   "ATMOSPH"  "SHAFT/GR" "AIRDROP"  "TOWER"    "BALLOON" 
## [8] "SURFACE"

nuclear_explosions |>
  filter(type %in% types, purpose %in% purposes) |>
  ggplot() +  
  geom_bar(mapping = aes(x = purpose, fill = type)) +
  labs(title = "Number of Explosions by Type and Purpose",
       x = "Purpose", y = "Explosion Count") +
  theme_minimal() +
  scale_fill_brewer(palette = 'Set1')

As with the purposes, there are too many explosion types to include in a single visualization, so we focus on the top 8 explosion types, those with at least 60 instances in the data set. Explosions of purpose WR, meaning they were detonated as part of a weapons development program, make up the vast majority of the data and include all of the main explosion types. Shaft explosions clearly make up most of the WR explosions, and shaft and tunnel explosions are the most visible among the other explosion purposes.