Project 3

Author

Jude E. Abban

Introduction

Nuclear weapons testing represents one of the most effective and secretive chapters of 20th century. Between 1945 and 1998, seven nations detonated 1,000 nuclear devices, which changed geopolitics and accelerated an arms race that brought the world close to destruction. This project visualizes and explores that history through data. I choose this data set because we are seeing similar things in today’s political world where another arms race might be going on without the public knowledge. The dataset I used is the SIPRI Nuclear Explosions Database, put together by the Stockholm International Peace Research Institute (SIPRI).

library(tidyverse)
library(highcharter)
library(corrplot)
library(ggthemes)
library(leaflet)

nuclear_explosion <- read_csv("/Users/eabban/nuclear_explosions.csv")

Variables	Explanation
DOE	Department of Energy
yield_lower	Explosion yield lower estimate in kilotons of TNT
yield_higher	Explosion yield upper estimate in kilotons of TNT
Combat	WWII bombs dropped over Japan
FMS	Soviet test, study phenomenon of nuclear explosion
ME	Military Exercise
PNE	Peaceful nuclear explosion
SAM	Soviet test, accidental mode/emergency
SSE	French/US tests - testing safety of nuclear weapons in case of accident
TRANSP	Transportation-storage purposes
WE	British, French, US, evaluate effects of nuclear detonation on various targets
WR	Weapons development program

Cleaning

# Removing NA's from the values I'll be using
clean_nukes <- nuclear_explosion |>
  filter (!is.na(latitude)) |>
  filter (!is.na(name))|>
  filter (!is.na(yield_lower))|>
  filter (!is.na(yield_upper))|>
  filter (!is.na(latitude))|>
  filter (!is.na(longitude)) |>
  filter (!is.na(yield_lower)) |>
  filter (!is.na(purpose))

Research/Background Information

Limited Test Ban Treaty (LTBT)

Was a treaty signed by the USA, USSR, and UK in 1963. It banned nuclear testing in the atmosphere, underwater, and in space. Countries could still test underground.

Source: U.S. Department of State. (n.d.). Limited Test Ban Treaty (LTBT). Office of the Historian. https://history.state.gov/milestones/1961-1968/limited-ban

Comprehensive Nuclear Test Ban Treaty (CTBT)

Was a treaty signed in 1996 who’s aim was to ban ALL nuclear explosions everywhere, including underground. The US signed it but never ratified it, meaning it technically never entered into full legal force.

Reference: Arms Control Association. (2023). Comprehensive Test Ban Treaty at a glance. https://www.armscontrol.org/factsheets/comprehensive-test-ban-treaty-glance

PLot 1

This plot shows all the countries with their nuclear testing over the years

tests_by_year <- clean_nukes |>
  group_by(year, country) |>
  summarise(tests = n())

# Build chart
hchart(tests_by_year, "line",
       hcaes(x = year, y = tests, group = country)) |>
  hc_colors(c("#1f77b4", "#d62728", "#2ca02c",
              "#ff7f0e", "#9467bd", "#e377c2", "#bcbd22")) |>
  hc_title(text = "Nuclear Tests Per Year by Country (1945–1998)") |>
  hc_xAxis(
    title = list(text = "Year"),
    plotLines = list(
      list(value = 1963, color = "gray",
           dashStyle = "Dash", width = 2,
           label = list(text = "LTBT 1963", rotation = 0)),
      list(value = 1996, color = "gray",
           dashStyle = "Dash", width = 2,
           label = list(text = "CTBT 1996", rotation = 0))
    )
  ) |>
  hc_yAxis(title = list(text = "Number of Tests")) |>
  hc_tooltip(shared = TRUE, crosshairs = TRUE) |>
  hc_caption(text = "Source: SIPRI Nuclear Explosions Database") |>
  hc_legend(title = list(text = "Country")) |>
  hc_add_theme(hc_theme_flat())

This plot shows how many tests each country did per year. The US did the most tests overall, peaking in the late 1950s and early 1960s. After the LTBT in 1963, testing numbers went down for the U.S but was higher than the other countries. The other countries tested much less and at lower levels throughout the whole period.

Correlation

Predicting yield_upper using magnitude_body and depth

cor(clean_nukes$yield_upper, clean_nukes$magnitude_body)

[1] -0.02026928

fit1 <- lm(yield_upper ~ magnitude_body + depth, data = clean_nukes)
summary(fit1)


Call:
lm(formula = yield_upper ~ magnitude_body + depth, data = clean_nukes)

Residuals:
    Min      1Q  Median      3Q     Max 
 -743.3  -182.9  -181.8   -52.9 14797.1 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     202.866     31.887   6.362  2.7e-10 ***
magnitude_body   -7.034     10.510  -0.669   0.5034    
depth            -3.390      1.926  -1.760   0.0786 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 954.8 on 1379 degrees of freedom
Multiple R-squared:  0.002652,  Adjusted R-squared:  0.001205 
F-statistic: 1.833 on 2 and 1379 DF,  p-value: 0.1603

Remove magnitude_body because it has the biggest p-value (0.5034).

The Adjusted R-squared is 0.001205, meaning the model explains less than 1% of the variation in yield. This tells us that magnitude and depth are not good predictors of yield on their own.

fit2 <- lm(yield_upper ~ depth, data = clean_nukes)
summary(fit2)


Call:
lm(formula = yield_upper ~ depth, data = clean_nukes)

Residuals:
    Min      1Q  Median      3Q     Max 
 -740.4  -173.8  -170.3   -40.3 14809.7 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  190.252     25.716   7.398 2.39e-13 ***
depth         -3.451      1.923  -1.794    0.073 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 954.6 on 1380 degrees of freedom
Multiple R-squared:  0.002328,  Adjusted R-squared:  0.001605 
F-statistic:  3.22 on 1 and 1380 DF,  p-value: 0.07296

Equation for the model

yield_upper = (-3.390 x depth) + 202.866 This means that for every one unit increase in depth, the yield is predicted to go down by about 3.390 kilotons.

Multiple Linear Regression

multi_linear <- clean_nukes |>
  filter(yield_upper > 0) |>
  mutate(log_yield = log(yield_upper),
    purpose_clean = case_when(
      purpose == "COMBAT" ~ "Combat",
      purpose %in% c("WR", "WR/SE","WR/WE","SE/WR", "PNE/WR") ~ "Weapons Research",
      purpose %in% c("WE") ~ "Weapons Effects",
      purpose %in% c("PNE", "PNE:PLO", "PNE:V")  ~ "Peaceful Use",
      purpose %in% c("SE") ~ "Safety Experiment",
      purpose %in% c("ME") ~ "Military Effects",
      purpose %in% c("TRANSP") ~ "Transport/Storage",
      purpose %in% c("SB") ~ "Standby"),
    type_clean = case_when(
      type %in% c("SHAFT", "TUNNEL", "GALLERY", "SHAFT/GR", "SHAFT/LG", "UG", "CRATER", "MINE") ~ "Underground",
      type %in% c("ATMOSPH", "AIRDROP", "BALLOON", "TOWER", "ROCKET", "SPACE") ~ "Atmospheric",
      type %in% c("SURFACE", "BARGE", "UW", "SHIP", "WATER SU", "WATERSUR") ~ "Surface / Water"))

model <- lm(log_yield ~ country + purpose_clean + type_clean + depth
            + magnitude_body, data = multi_linear)
summary(model)


Call:
lm(formula = log_yield ~ country + purpose_clean + type_clean + 
    depth + magnitude_body, data = multi_linear)

Residuals:
     Min       1Q   Median       3Q      Max 
-10.6014  -0.5796  -0.1091   0.6781   6.0251 

Coefficients:
                                Estimate Std. Error t value Pr(>|t|)    
(Intercept)                     2.908346   1.227498   2.369    0.018 *  
countryINDIA                   -1.453187   1.731159  -0.839    0.401    
countryUK                       0.460068   0.291515   1.578    0.115    
countryUSA                     -0.036807   0.137557  -0.268    0.789    
countryUSSR                    -1.522185   0.368822  -4.127 3.90e-05 ***
purpose_cleanPeaceful Use      -0.635892   1.261635  -0.504    0.614    
purpose_cleanSafety Experiment -2.306406   1.253320  -1.840    0.066 .  
purpose_cleanStandby           -8.506000   1.447843  -5.875 5.34e-09 ***
purpose_cleanWeapons Effects   -0.659909   1.235446  -0.534    0.593    
purpose_cleanWeapons Research   0.127352   1.226789   0.104    0.917    
type_cleanSurface / Water       1.807773   0.250265   7.223 8.50e-13 ***
type_cleanUnderground           0.105963   0.144397   0.734    0.463    
depth                          -0.007912   0.003573  -2.214    0.027 *  
magnitude_body                  0.246653   0.021521  11.461  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.725 on 1332 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.2813,    Adjusted R-squared:  0.2743 
F-statistic:  40.1 on 13 and 1332 DF,  p-value: < 2.2e-16

par(mfrow = c(2,2))
plot(model)

Final Plot

country_totals <- clean_nukes |>
  filter(country %in% c("USA", "USSR", "FRANCE", "UK", "INDIA")) |>
  group_by(country) |>
  summarise(
    total_tests   = n(),
    biggest_blast = max(yield_upper),
    top_purpose   = names(sort(table(purpose), decreasing = TRUE))[1]
  )

decade_data <- clean_nukes |>
  filter(country %in% c("USA", "USSR", "FRANCE", "UK", "INDIA")) |>
  mutate(
    decade     = paste0(floor(year / 10) * 10, "s"),
    type_clean = case_when(
      type %in% c("SHAFT", "TUNNEL", "GALLERY",
                  "SHAFT/GR", "SHAFT/LG", "UG",
                  "CRATER", "MINE")                ~ "Underground",
      type %in% c("ATMOSPH", "AIRDROP", "BALLOON",
                  "TOWER", "ROCKET", "SPACE")       ~ "Atmospheric",
      type %in% c("SURFACE", "BARGE", "UW", "SHIP",
                  "WATER SU", "WATERSUR")           ~ "Surface / Water"
    )
  ) |>
  group_by(country, decade) |>
  summarise(
    tests       = n(),
    underground = sum(type_clean == "Underground"),
    atmospheric = sum(type_clean == "Atmospheric"),
    surface     = sum(type_clean == "Surface / Water")
  )
# Grouping the purpose
purpose_labels <- c(
  "WR"     = "Weapons Research",
  "WE"     = "Weapons Effects",
  "PNE"    = "Peaceful Use",
  "SE"     = "Safety Experiment",
  "ME"     = "Military Effects",
  "FMS"    = "Soviet Phenomenon Study",
  "SAM"    = "Accidental Mode",
  "TRANSP" = "Transport/Storage",
  "COMBAT" = "Combat"
)

main_data <- lapply(1:nrow(country_totals), function(i) {
  purpose_code <- country_totals$top_purpose[i]
  purpose_full <- ifelse(
    purpose_code %in% names(purpose_labels),
    purpose_labels[[purpose_code]],
    purpose_code
  )
  list(
    name      = country_totals$country[i],
    y         = country_totals$total_tests[i],
    drilldown = country_totals$country[i],
    biggest   = paste0(country_totals$biggest_blast[i], " kt"),
    purpose   = purpose_full
  )
})
highchart() |>
  hc_chart(type = "column", polar = TRUE) |>
  hc_title(text = "Nuclear Tests by Country (1945-1998)") |>
  hc_subtitle(text = "Click a bar to see breakdown by decade") |>
  hc_colors(c("#f5c518", "#ff4500", "#00cfff", "#ff8c00", "#00ff99")) |>
  hc_xAxis(
    type              = "category",
    tickmarkPlacement = "on",
    lineWidth         = 0
  ) |>
  hc_yAxis(
    min           = 0,
    endOnTick     = FALSE,
    showLastLabel = TRUE,
    title         = list(text = "Number of Tests")
  ) |>
  hc_series(
    list(
      name         = "Total Tests",
      data         = main_data,
      colorByPoint = TRUE
    )
  ) |>
  hc_tooltip(
    useHTML      = TRUE,
    headerFormat = "",
    pointFormat  = "
      <b>{point.name}</b><br/>
      Total tests: <b>{point.y}</b><br/>
      Biggest explosion: <b>{point.biggest}</b><br/>
      Most common purpose: <b>{point.purpose}</b><br/>
      <i>Click to see decade breakdown</i>
    ",
    style = list(fontSize = "13px")
  ) |>
  hc_drilldown(
    allowPointDrilldown = TRUE,
    activeAxisLabelStyle = list(
      textDecoration = "none",
      color          = "#333"
    ),
    series = lapply(country_totals$country, function(c) {
      d <- decade_data |> filter(country == c)
      list(
        id      = c,
        type    = "column",
        name    = c,
        tooltip = list(
          useHTML      = TRUE,
          headerFormat = "",
          pointFormat  = paste0(
            "<b>{point.name}</b><br/>",
            "Total tests: <b>{point.y}</b><br/>",
            "Underground: <b>{point.underground}</b><br/>",
            "Atmospheric: <b>{point.atmospheric}</b><br/>",
            "Surface/Water: <b>{point.surface}</b>"
          )
        ),
        data = lapply(1:nrow(d), function(i) {
          list(
            name        = d$decade[i],
            y           = d$tests[i],
            underground = d$underground[i],
            atmospheric = d$atmospheric[i],
            surface     = d$surface[i]
          )
        })
      )
    })
  ) |>
  hc_caption(text = "Source: SIPRI Nuclear Explosions Database") |>
  hc_add_theme(hc_theme_flat())

This is an interactive plot that shows the countries in a circular form. Hovering over each bar shows the biggest single explosion that country recorded and the most common reason they tested. Clicking on a bar drills takes you into a decade view, you can see how many of those tests were underground, atmospheric, or surface level.

Reflection

The visualizations in this project tracks the full arms race. The first plot shows how many tests were done each year and how the numbers went down after the treaties. The last plot is a circular bar chart that shows total tests per country, with a drilldown that breaks down testing by decade and type when you click on a bar.

One thing that stood out was how much testing U.S was doing compared to everyone else’s, even though the US did more tests overall. The map also made it easy to see that most countries tested far away from their own land except USSR. Another surprising thing was the lack of China in the dataset.

Something I wanted to try was an animation that showed the tests appearing on the map year by year, but I could not get it to work. I also thought it would be interesting to connect the test locations to health data from those areas to see how the radiation affected people, but would have been very complex.