library(tidyverse)
library(highcharter)
library(corrplot)
library(ggthemes)
library(leaflet)Project 3
Introduction
Nuclear weapons testing represents one of the most effective and secretive chapters of 20th century. Between 1945 and 1998, seven nations detonated 1,000 nuclear devices, which changed geopolitics and accelerated an arms race that brought the world close to destruction. This project visualizes and explores that history through data. I choose this data set because we are seeing similar things in today’s political world where another arms race might be going on without the public knowledge. The dataset I used is the SIPRI Nuclear Explosions Database, put together by the Stockholm International Peace Research Institute (SIPRI).
nuclear_explosion <- read_csv("/Users/eabban/nuclear_explosions.csv")| Variables | Explanation |
|---|---|
| DOE | Department of Energy |
| yield_lower | Explosion yield lower estimate in kilotons of TNT |
| yield_higher | Explosion yield upper estimate in kilotons of TNT |
| Combat | WWII bombs dropped over Japan |
| FMS | Soviet test, study phenomenon of nuclear explosion |
| ME | Military Exercise |
| PNE | Peaceful nuclear explosion |
| SAM | Soviet test, accidental mode/emergency |
| SSE | French/US tests - testing safety of nuclear weapons in case of accident |
| TRANSP | Transportation-storage purposes |
| WE | British, French, US, evaluate effects of nuclear detonation on various targets |
| WR | Weapons development program |
Cleaning
# Removing NA's from the values I'll be using
clean_nukes <- nuclear_explosion |>
filter (!is.na(latitude)) |>
filter (!is.na(name))|>
filter (!is.na(yield_lower))|>
filter (!is.na(yield_upper))|>
filter (!is.na(latitude))|>
filter (!is.na(longitude)) |>
filter (!is.na(yield_lower)) |>
filter (!is.na(purpose))Research/Background Information
Limited Test Ban Treaty (LTBT)
Was a treaty signed by the USA, USSR, and UK in 1963. It banned nuclear testing in the atmosphere, underwater, and in space. Countries could still test underground.
Source: U.S. Department of State. (n.d.). Limited Test Ban Treaty (LTBT). Office of the Historian. https://history.state.gov/milestones/1961-1968/limited-ban
Comprehensive Nuclear Test Ban Treaty (CTBT)
Was a treaty signed in 1996 who’s aim was to ban ALL nuclear explosions everywhere, including underground. The US signed it but never ratified it, meaning it technically never entered into full legal force.
Reference: Arms Control Association. (2023). Comprehensive Test Ban Treaty at a glance. https://www.armscontrol.org/factsheets/comprehensive-test-ban-treaty-glance
PLot 1
This plot shows all the countries with their nuclear testing over the years
tests_by_year <- clean_nukes |>
group_by(year, country) |>
summarise(tests = n())
# Build chart
hchart(tests_by_year, "line",
hcaes(x = year, y = tests, group = country)) |>
hc_colors(c("#1f77b4", "#d62728", "#2ca02c",
"#ff7f0e", "#9467bd", "#e377c2", "#bcbd22")) |>
hc_title(text = "Nuclear Tests Per Year by Country (1945–1998)") |>
hc_xAxis(
title = list(text = "Year"),
plotLines = list(
list(value = 1963, color = "gray",
dashStyle = "Dash", width = 2,
label = list(text = "LTBT 1963", rotation = 0)),
list(value = 1996, color = "gray",
dashStyle = "Dash", width = 2,
label = list(text = "CTBT 1996", rotation = 0))
)
) |>
hc_yAxis(title = list(text = "Number of Tests")) |>
hc_tooltip(shared = TRUE, crosshairs = TRUE) |>
hc_caption(text = "Source: SIPRI Nuclear Explosions Database") |>
hc_legend(title = list(text = "Country")) |>
hc_add_theme(hc_theme_flat())This plot shows how many tests each country did per year. The US did the most tests overall, peaking in the late 1950s and early 1960s. After the LTBT in 1963, testing numbers went down for the U.S but was higher than the other countries. The other countries tested much less and at lower levels throughout the whole period.
Correlation
Predicting yield_upper using magnitude_body and depth
cor(clean_nukes$yield_upper, clean_nukes$magnitude_body)[1] -0.02026928
fit1 <- lm(yield_upper ~ magnitude_body + depth, data = clean_nukes)
summary(fit1)
Call:
lm(formula = yield_upper ~ magnitude_body + depth, data = clean_nukes)
Residuals:
Min 1Q Median 3Q Max
-743.3 -182.9 -181.8 -52.9 14797.1
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 202.866 31.887 6.362 2.7e-10 ***
magnitude_body -7.034 10.510 -0.669 0.5034
depth -3.390 1.926 -1.760 0.0786 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 954.8 on 1379 degrees of freedom
Multiple R-squared: 0.002652, Adjusted R-squared: 0.001205
F-statistic: 1.833 on 2 and 1379 DF, p-value: 0.1603
Remove magnitude_body because it has the biggest p-value (0.5034).
The Adjusted R-squared is 0.001205, meaning the model explains less than 1% of the variation in yield. This tells us that magnitude and depth are not good predictors of yield on their own.
fit2 <- lm(yield_upper ~ depth, data = clean_nukes)
summary(fit2)
Call:
lm(formula = yield_upper ~ depth, data = clean_nukes)
Residuals:
Min 1Q Median 3Q Max
-740.4 -173.8 -170.3 -40.3 14809.7
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 190.252 25.716 7.398 2.39e-13 ***
depth -3.451 1.923 -1.794 0.073 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 954.6 on 1380 degrees of freedom
Multiple R-squared: 0.002328, Adjusted R-squared: 0.001605
F-statistic: 3.22 on 1 and 1380 DF, p-value: 0.07296
Equation for the model
yield_upper = (-3.390 x depth) + 202.866 This means that for every one unit increase in depth, the yield is predicted to go down by about 3.390 kilotons.
Multiple Linear Regression
multi_linear <- clean_nukes |>
filter(yield_upper > 0) |>
mutate(log_yield = log(yield_upper),
purpose_clean = case_when(
purpose == "COMBAT" ~ "Combat",
purpose %in% c("WR", "WR/SE","WR/WE","SE/WR", "PNE/WR") ~ "Weapons Research",
purpose %in% c("WE") ~ "Weapons Effects",
purpose %in% c("PNE", "PNE:PLO", "PNE:V") ~ "Peaceful Use",
purpose %in% c("SE") ~ "Safety Experiment",
purpose %in% c("ME") ~ "Military Effects",
purpose %in% c("TRANSP") ~ "Transport/Storage",
purpose %in% c("SB") ~ "Standby"),
type_clean = case_when(
type %in% c("SHAFT", "TUNNEL", "GALLERY", "SHAFT/GR", "SHAFT/LG", "UG", "CRATER", "MINE") ~ "Underground",
type %in% c("ATMOSPH", "AIRDROP", "BALLOON", "TOWER", "ROCKET", "SPACE") ~ "Atmospheric",
type %in% c("SURFACE", "BARGE", "UW", "SHIP", "WATER SU", "WATERSUR") ~ "Surface / Water"))
model <- lm(log_yield ~ country + purpose_clean + type_clean + depth
+ magnitude_body, data = multi_linear)
summary(model)
Call:
lm(formula = log_yield ~ country + purpose_clean + type_clean +
depth + magnitude_body, data = multi_linear)
Residuals:
Min 1Q Median 3Q Max
-10.6014 -0.5796 -0.1091 0.6781 6.0251
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.908346 1.227498 2.369 0.018 *
countryINDIA -1.453187 1.731159 -0.839 0.401
countryUK 0.460068 0.291515 1.578 0.115
countryUSA -0.036807 0.137557 -0.268 0.789
countryUSSR -1.522185 0.368822 -4.127 3.90e-05 ***
purpose_cleanPeaceful Use -0.635892 1.261635 -0.504 0.614
purpose_cleanSafety Experiment -2.306406 1.253320 -1.840 0.066 .
purpose_cleanStandby -8.506000 1.447843 -5.875 5.34e-09 ***
purpose_cleanWeapons Effects -0.659909 1.235446 -0.534 0.593
purpose_cleanWeapons Research 0.127352 1.226789 0.104 0.917
type_cleanSurface / Water 1.807773 0.250265 7.223 8.50e-13 ***
type_cleanUnderground 0.105963 0.144397 0.734 0.463
depth -0.007912 0.003573 -2.214 0.027 *
magnitude_body 0.246653 0.021521 11.461 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.725 on 1332 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.2813, Adjusted R-squared: 0.2743
F-statistic: 40.1 on 13 and 1332 DF, p-value: < 2.2e-16
par(mfrow = c(2,2))
plot(model)Final Plot
country_totals <- clean_nukes |>
filter(country %in% c("USA", "USSR", "FRANCE", "UK", "INDIA")) |>
group_by(country) |>
summarise(
total_tests = n(),
biggest_blast = max(yield_upper),
top_purpose = names(sort(table(purpose), decreasing = TRUE))[1]
)
decade_data <- clean_nukes |>
filter(country %in% c("USA", "USSR", "FRANCE", "UK", "INDIA")) |>
mutate(
decade = paste0(floor(year / 10) * 10, "s"),
type_clean = case_when(
type %in% c("SHAFT", "TUNNEL", "GALLERY",
"SHAFT/GR", "SHAFT/LG", "UG",
"CRATER", "MINE") ~ "Underground",
type %in% c("ATMOSPH", "AIRDROP", "BALLOON",
"TOWER", "ROCKET", "SPACE") ~ "Atmospheric",
type %in% c("SURFACE", "BARGE", "UW", "SHIP",
"WATER SU", "WATERSUR") ~ "Surface / Water"
)
) |>
group_by(country, decade) |>
summarise(
tests = n(),
underground = sum(type_clean == "Underground"),
atmospheric = sum(type_clean == "Atmospheric"),
surface = sum(type_clean == "Surface / Water")
)
# Grouping the purpose
purpose_labels <- c(
"WR" = "Weapons Research",
"WE" = "Weapons Effects",
"PNE" = "Peaceful Use",
"SE" = "Safety Experiment",
"ME" = "Military Effects",
"FMS" = "Soviet Phenomenon Study",
"SAM" = "Accidental Mode",
"TRANSP" = "Transport/Storage",
"COMBAT" = "Combat"
)
main_data <- lapply(1:nrow(country_totals), function(i) {
purpose_code <- country_totals$top_purpose[i]
purpose_full <- ifelse(
purpose_code %in% names(purpose_labels),
purpose_labels[[purpose_code]],
purpose_code
)
list(
name = country_totals$country[i],
y = country_totals$total_tests[i],
drilldown = country_totals$country[i],
biggest = paste0(country_totals$biggest_blast[i], " kt"),
purpose = purpose_full
)
})
highchart() |>
hc_chart(type = "column", polar = TRUE) |>
hc_title(text = "Nuclear Tests by Country (1945-1998)") |>
hc_subtitle(text = "Click a bar to see breakdown by decade") |>
hc_colors(c("#f5c518", "#ff4500", "#00cfff", "#ff8c00", "#00ff99")) |>
hc_xAxis(
type = "category",
tickmarkPlacement = "on",
lineWidth = 0
) |>
hc_yAxis(
min = 0,
endOnTick = FALSE,
showLastLabel = TRUE,
title = list(text = "Number of Tests")
) |>
hc_series(
list(
name = "Total Tests",
data = main_data,
colorByPoint = TRUE
)
) |>
hc_tooltip(
useHTML = TRUE,
headerFormat = "",
pointFormat = "
<b>{point.name}</b><br/>
Total tests: <b>{point.y}</b><br/>
Biggest explosion: <b>{point.biggest}</b><br/>
Most common purpose: <b>{point.purpose}</b><br/>
<i>Click to see decade breakdown</i>
",
style = list(fontSize = "13px")
) |>
hc_drilldown(
allowPointDrilldown = TRUE,
activeAxisLabelStyle = list(
textDecoration = "none",
color = "#333"
),
series = lapply(country_totals$country, function(c) {
d <- decade_data |> filter(country == c)
list(
id = c,
type = "column",
name = c,
tooltip = list(
useHTML = TRUE,
headerFormat = "",
pointFormat = paste0(
"<b>{point.name}</b><br/>",
"Total tests: <b>{point.y}</b><br/>",
"Underground: <b>{point.underground}</b><br/>",
"Atmospheric: <b>{point.atmospheric}</b><br/>",
"Surface/Water: <b>{point.surface}</b>"
)
),
data = lapply(1:nrow(d), function(i) {
list(
name = d$decade[i],
y = d$tests[i],
underground = d$underground[i],
atmospheric = d$atmospheric[i],
surface = d$surface[i]
)
})
)
})
) |>
hc_caption(text = "Source: SIPRI Nuclear Explosions Database") |>
hc_add_theme(hc_theme_flat())This is an interactive plot that shows the countries in a circular form. Hovering over each bar shows the biggest single explosion that country recorded and the most common reason they tested. Clicking on a bar drills takes you into a decade view, you can see how many of those tests were underground, atmospheric, or surface level.
Reflection
The visualizations in this project tracks the full arms race. The first plot shows how many tests were done each year and how the numbers went down after the treaties. The last plot is a circular bar chart that shows total tests per country, with a drilldown that breaks down testing by decade and type when you click on a bar.
One thing that stood out was how much testing U.S was doing compared to everyone else’s, even though the US did more tests overall. The map also made it easy to see that most countries tested far away from their own land except USSR. Another surprising thing was the lack of China in the dataset.
Something I wanted to try was an animation that showed the tests appearing on the map year by year, but I could not get it to work. I also thought it would be interesting to connect the test locations to health data from those areas to see how the radiation affected people, but would have been very complex.