Satan-2 test. ABC News photo

Satan-2 test. ABC News photo

About Satan-2

Very little is known about this ICBM. The open source information is here: https://en.wikipedia.org/wiki/RS-28_Sarmat.

Research goal

We don’t believe any speculations about this ICBM. We just try estimate Satan-2 operational range, applying linear regression both in frequentest and Bayesian paradigm on SLBM data.

SLBM data

We use aggregated and transformed SLBM data to explore the relationship between M - Mass (kg), R - Range (km), P - Payload (kg), S - Stage (1,2,3), D - Diameter (m), L - Length (m) and W - Type of Warhead (Single - 1, MIRV - 2) of submarine launched ballistic missile.

load("slbm.dat")
library(DT)
datatable(slbm)
#slbm

Correlations

Here we produce correlations matrix in graphical form.

library(car)
scatterplotMatrix(slbm[,2:6])

Linear regression

Here we produce classical multinomial linear regression model and pass some routine tests.

Scatter plots

library(ggplot2)
ggplot(data=slbm,aes(log(M),sqrt(R)))+geom_point()+geom_smooth(method="lm")

ggplot(data=slbm,aes(log(P),sqrt(R)))+geom_point()+geom_smooth(method="lm")

ggplot(data=slbm,aes(D,sqrt(R)))+geom_point()+geom_smooth(method="lm")

ggplot(data=slbm,aes(S,sqrt(R)))+geom_point()+geom_smooth(method="lm")

ggplot(data=slbm,aes(L,sqrt(R)))+geom_point()+geom_smooth(method="lm")

Model and tests

fit.lm <- lm(log(M)~log(P)+sqrt(R),data = slbm)
summary(fit.lm)
## 
## Call:
## lm(formula = log(M) ~ log(P) + sqrt(R), data = slbm)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.43582 -0.09245 -0.05523  0.06307  0.65331 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 6.443759   0.653801   9.856 1.00e-09 ***
## log(P)      0.344329   0.100484   3.427   0.0023 ** 
## sqrt(R)     0.017668   0.002243   7.877 5.59e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.236 on 23 degrees of freedom
## Multiple R-squared:  0.8453, Adjusted R-squared:  0.8318 
## F-statistic: 62.82 on 2 and 23 DF,  p-value: 4.788e-10
hist(fit.lm$residuals,col = "blue",main = "Histogram of residuals",
     xlab = "Residuals")

par(mfrow=c(2,2))
plot(fit.lm)

Linear regression model formula looks like

\[{Mass} = e^{{B0 + B1\sqrt(Range)} + B2\ln(Payload)}\]

or

\[{Mass} = e^{{6.443759 + 0.01767\sqrt(Range)} + 0.34433\ln(Payload)}\]

Range point estimate

Here we apply previous linear model formula for Range given Mass and Payload.

\[Range={(\frac{6.443759+0.34433ln(Payload)-ln(Mass)}{0.01767})}^2\]

B0 <- as.numeric(fit.lm$coefficients[1])
B1 <- as.numeric(fit.lm$coefficients[2])
B2 <- as.numeric(fit.lm$coefficients[3])
Mass <- 200000
Payload <- 10000
Range <- ((B0 + B1*log(Payload) - log(Mass))/B2)^2
cat("Range =",Range)
## Range = 21504.9

Bayesian linear regression

Model and tests

fit.slbm.bs.4<-stan_glm(sqrt(R)~S+D+L+W+log(M)+log(P),data=slbm,chains=4,iter=10000,seed=12345)
fit.slbm.bs.4
## stan_glm
##  family:       gaussian [identity]
##  formula:      sqrt(R) ~ S + D + L + W + log(M) + log(P)
##  observations: 26
##  predictors:   7
## ------
##             Median MAD_SD
## (Intercept) -40.6  106.7 
## S             9.4    5.1 
## D            37.4   19.8 
## L             0.9    1.7 
## W             3.3    5.6 
## log(M)        6.5   15.0 
## log(P)       -7.3    6.0 
## sigma        10.2    1.7 
## 
## Sample avg. posterior predictive distribution of y:
##          Median MAD_SD
## mean_PPD 69.0    2.8  
## 
## ------
## For info on the priors used see help('prior_summary.stanreg').
par(mfrow=c(1,1))
plot(fit.slbm.bs.4)

posterior_vs_prior(fit.slbm.bs.4)
## 
## Drawing from prior...

fit.slbm.bs42<-as.array(fit.slbm.bs.4)
mcmc_hist(fit.slbm.bs42)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

mcmc_trace(fit.slbm.bs42)

summary(fit.slbm.bs.4)
## 
## Model Info:
## 
##  function:     stan_glm
##  family:       gaussian [identity]
##  formula:      sqrt(R) ~ S + D + L + W + log(M) + log(P)
##  algorithm:    sampling
##  priors:       see help('prior_summary')
##  sample:       20000 (posterior sample size)
##  observations: 26
##  predictors:   7
## 
## Estimates:
##                 mean   sd     2.5%   25%    50%    75%    97.5%
## (Intercept)    -40.0  109.1 -254.9 -112.3  -40.6   31.7  175.7 
## S                9.4    5.4   -1.2    5.9    9.4   12.9   20.2 
## D               37.4   20.3   -2.6   24.1   37.4   50.9   77.4 
## L                0.9    1.8   -2.7   -0.3    0.9    2.0    4.4 
## W                3.2    5.8   -8.5   -0.5    3.3    7.1   14.8 
## log(M)           6.4   15.6  -24.3   -3.6    6.5   16.6   37.0 
## log(P)          -7.3    6.2  -19.5  -11.3   -7.3   -3.3    5.0 
## sigma           10.4    1.8    7.6    9.2   10.2   11.4   14.6 
## mean_PPD        69.0    2.9   63.2   67.1   69.0   70.9   74.7 
## log-posterior -110.5    2.4 -116.3 -111.8 -110.1 -108.8 -107.1 
## 
## Diagnostics:
##               mcse Rhat n_eff
## (Intercept)   1.2  1.0   8809
## S             0.0  1.0  12972
## D             0.2  1.0   9063
## L             0.0  1.0  12118
## W             0.1  1.0  11991
## log(M)        0.2  1.0   8316
## log(P)        0.1  1.0  13099
## sigma         0.0  1.0  10055
## mean_PPD      0.0  1.0  19069
## log-posterior 0.0  1.0   6234
## 
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

Predicting Range by posterior distribution

slbm.pp4 <- posterior_predict(fit.slbm.bs.4,
            newdata = data.frame(L=30,D=3,S=3,W=2,M=200000,P=10000),
            seed = 12345)
quantile(slbm.pp4^2,probs = c(0.1,0.5,0.9))
##      10%      50%      90% 
## 12397.96 20754.35 31240.97
hist(slbm.pp4^2,breaks = 50,col="blue",
     main = "Posterior distribution for R~f(L=30,D=3,S=3,W=2,M=200000,P=10000)",
     xlab = "R,km")
RR <- median(slbm.pp4^2)
abline(v = RR,col="red")

Conclusions

1.Given historical SLBM data (M,P,D,L,S,W) and linear regression model we can produce point estimate for Range as \({Range} = {{(\frac{6.443759+0.34433ln(Payload)-ln(Mass)}{0.01767})}^2}=21504\)

2.Applying Bayesian linear regression model for SLBM data we can produce posterior distribution sample given the same historical data with 80% credible interval \(P(12398\le Range\le31241)=0.8\) and the mean \(Range=20754\)

3.The Satan-2 has unlimited operational Range compared to the existing ICBMs to date.