Preface
Problem
Data
Frequentist linear regression
Bayesian linear regression
- Model
- Posterior sample
Reliability
- Frequentest case
- Baesian case
Conclusions
References

Pukkuksong-1/KN-11

Preface

Recently Pyongyang demonstrated for the whole world and especially for the USA and it’s allies the growing military power of North Korea ¹. The most shocking element of the parade seemed to be three types of long-range ballistic missiles, one of them apparently new SLBM KN-11 - clear message for Donald Trump and his administration. Though the following test of unknown missile failed the threat is still more than real ².

Problem

We want to estimate KN-11 Range (km), applying open source data on SLBM ³.

Data

We use aggregated and transformed SLBM data ⁴ to explore the relationship between M - Mass (kg), R - Range (km), P - Payload (kg), S - Stage (1,2,3), D - Diameter (m), L - Length (m) and W - Type of Warhead (Single - 1, MIRV - 2) of submarine launched ballistic missile.

load(file = "slbm.dat")
attach(slbm)
slbm

##    S     M     L    D     R    P W
## 1  1  5400 10.40 0.58   150  975 1
## 2  1 13700 11.80 1.30   650 1597 1
## 3  1 19650 14.20 1.30  1420 1179 1
## 4  1 14200  8.89 1.50  2400  650 1
## 5  1 14200  8.89 1.50  3000  650 1
## 6  1 14200  9.65 1.50  3000  650 2
## 7  2 33300 13.00 1.80  7800 1100 1
## 8  2 33300 13.00 1.80  9100 1100 1
## 9  2 26900 10.60 1.54  3900  450 1
## 10 2 35300 14.10 1.80  6500 1600 2
## 11 2 35300 14.10 1.80  8000 1600 1
## 12 2 35300 14.10 1.80  6500 1600 2
## 13 3 90000 16.00 2.40  8300 2550 2
## 14 3 40300 14.80 1.90  8300 2300 2
## 15 3 40800 14.80 1.90  8300 2800 2
## 16 3 36800 11.50 2.00  9300 1150 2
## 17 2 13600  9.45 1.37  2800  500 1
## 18 2 16200  9.86 1.37  4630  760 2
## 19 2 29485 10.36 1.88  5600 2000 2
## 20 3 32000 10.36 1.88  7400 1360 2
## 21 3 57500 13.42 2.11 11000 2880 2
## 22 2 20000 10.70 1.50  3000 1360 1
## 23 2 19500 10.70 1.50  3200 1360 1
## 24 2 19950 10.40 1.50  3200 1000 1
## 25 2 14700 10.70 1.40  2500  600 1
## 26 2 23000 13.00 2.00  8000  700 2

Pukkuksong-1 test flight

Frequentist linear regression

First, we try to estimate Mass of KN-11 \(M\), applying dimensions \(D\) and \(L\) ⁵.

fit.slbm.1<-lm(M~D + L, data = slbm)
predict(fit.slbm.1,newdata=data.frame(D=1.5,L=9), interval="conf")

##        fit      lwr      upr
## 1 14688.42 8668.755 20708.09

Note. We take as null hypothesis \(H_0:M_{KN11}=9000\) kg i.e. we can’t reject this fact of minimum possible mass for SLBM. Though alternative hypothesis maybe like this \(H_1: M_{KN11}>9000\). In fact KN-11 Mass is more likely to be \(9000{\le}M_{KN11}{\le}14000\). We can’t prove it. So we take \(H_0\).

Next, we estimate Range of KN-11. Box-Cox data transformation technique used to stabilize variance, make the data more normal distribution-like, improve the validity of measures of association such as the Pearson correlation between variables and for other data stabilization procedures ⁶.

fit.slbm.2<-lm(log(R)~S + D + M, data = slbm)
par(mfrow=c(2,2))
plot(fit.slbm.2)

exp(predict(fit.slbm.2,newdata = data.frame(S=2,D=1.5,M=9000),interval = "conf"))

##        fit      lwr      upr
## 1 4165.705 3145.233 5517.269

We have got 95% confident interval for KN-11 Range as \(3145{\le}R_{KN11}{\le}5517\).

Bayesian linear regression

This time we’ll use JAGS framework for MCMC with Bayesian estimates of posterior ⁷.

Model

model_string <- "model{

# Likelihood

for(i in 1:n){
R[i]   ~ dnorm(mu[i],inv.var)
mu[i] <- beta[1] + beta[2]*S[i] + beta[3]*D[i] + beta[4]*M[i]    
}

# Prior for beta
for(j in 1:4){
beta[j] ~ dnorm(0,0.0001)
}

# Prior for the inverse variance
inv.var   ~ dgamma(0.001, 0.001)
sigma     <- 1/sqrt(inv.var)

}"

Posterior sample

library(rjags)

## Loading required package: coda

## Linked to JAGS 4.1.0

## Loaded modules: basemod,bugs

library(bayesplot)

## This is bayesplot version 1.1.0

n <- length(slbm[,1])
model <- jags.model(textConnection(model_string), 
                    data = list(R=R,S=S,D=D,M=M,n=n),inits=list(.RNG.name= "base::Wichmann-Hill",.RNG.seed= 12341))

## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 26
##    Unobserved stochastic nodes: 5
##    Total graph size: 181
## 
## Initializing model

update(model, 10000, progress.bar="none"); # Burnin for 10000 samples

num.iter <- 20000
samp <- coda.samples(model, 
                     variable.names=c("beta","sigma"), 
                     n.iter=num.iter, progress.bar="none")

summary(samp)

## 
## Iterations = 10001:30000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 20000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##              Mean        SD  Naive SE Time-series SE
## beta[1]   20.9494  99.75493 0.7053739      0.7378177
## beta[2]   53.2238  99.64124 0.7045700      0.8491243
## beta[3]   38.6979  98.79655 0.6985971      0.7893841
## beta[4]    0.1678   0.01544 0.0001092      0.0001439
## sigma   2193.0419 332.89736 2.3539398      2.5253523
## 
## 2. Quantiles for each variable:
## 
##             2.5%       25%       50%       75%     97.5%
## beta[1] -175.809  -45.2761   20.3851   88.8852  217.2929
## beta[2] -142.302  -13.5470   52.5946  121.2410  246.0507
## beta[3] -154.500  -27.9163   38.5901  104.8859  232.7657
## beta[4]    0.138    0.1575    0.1677    0.1779    0.1984
## sigma   1659.235 1956.5066 2152.5594 2386.1029 2946.0285

plot(samp)

mcmc_hist(samp, par=c("beta[1]", "beta[2]","beta[3]","sigma"))

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

samp.dat <- as.data.frame(mcmc.list(samp)[[1]])
str(samp.dat)

## 'data.frame':    20000 obs. of  5 variables:
##  $ beta[1]: num  48.8 -131.6 -12.5 117 -10.4 ...
##  $ beta[2]: num  -178.3 17.7 41 -136 36.5 ...
##  $ beta[3]: num  95.3 -20.6 148.9 164.1 105.6 ...
##  $ beta[4]: num  0.188 0.185 0.167 0.169 0.181 ...
##  $ sigma  : num  2442 2365 2120 1880 2126 ...

B1 <- samp.dat$`beta[1]`
B2 <- samp.dat$`beta[2]`
B3 <- samp.dat$`beta[3]`
B4 <- samp.dat$`beta[4]`
SS <- samp.dat$sigma

RR <- c()
for (i in 1:10000) {
RR[i] <- B1[i] + B2[i]*2 + B3[i]*1.5 + B4[i]*9000
}

hist(RR,col = "red",main = "95% credible KN-11 range distribution",xlab = "Range, km")

SE <- SS/sqrt(num.iter)

hist(SE,col = "blue",main = "95% credible interval for Standard Error",xlab = "SE, km")

quantile(RR,c(0.025,0.975))

##     2.5%    97.5% 
## 1244.434 2132.004

quantile(RR+2*SE,c(0.025,0.975))

##     2.5%    97.5% 
## 1276.870 2163.935

quantile(RR-2*SE,c(0.025,0.975))

##     2.5%    97.5% 
## 1212.814 2102.162

Applying Bayesian regression we have got 95% credible interval for KN-11 Range as \(P(1244{\le}R_{KN11}{\le}2132)=0.95\) or \(1244{\le}R_{KN11}{\le}2132\) +(-) \(2*SE\) i.e. \(P(1213{\le}R_{KN11}{\le}2164)=0.95\) ⁸.

Reliability

Though very little is known about conducted tests of KN-11 we can try to do some estimation of reliability ⁹. So we have 12 trials with 8 successful and 4 failed tests.

Frequentest case

We postulate null hypothesis about KN-11 reliability as \(H_0: P_{KN11}=0.85\) and alternative hypothesis as \(H_1: P_{KN11}{\ne}0.85\)

binom.test(8,12,0.85)

## 
##  Exact binomial test
## 
## data:  8 and 12
## number of successes = 8, number of trials = 12, p-value = 0.09221
## alternative hypothesis: true probability of success is not equal to 0.85
## 95 percent confidence interval:
##  0.3488755 0.9007539
## sample estimates:
## probability of success 
##              0.6666667

As the p-value = 0.09221 we can’t reject null hypothesis \(H_0: P_{KN11}=0.85\). Not bad at all!

That is we can be 95% confident that \(0.35{\le}P_{KN11}{\le}0.9\) while current point estimate is close to 0.66.

What about power of test?

library(binom)
binom.power(p.alt = 0.5,n = 12,p = 0.85,alpha = 0.05)

##     alpha 
## 0.4884767

binom.plot(n=12,type="levelplot")

## Loading required package: lattice

The power of test \(1-\beta=0.488\) i.e. we can’t accept this estimation as reliable.

Baesian case

We try Bayesian binomial test for our data converting confidence into credibility.

model_string_2 <- "model{

# Likelihood
Y ~ dbinom(theta,n)

# Prior
theta ~ dbeta(a, b)
}"

Y <- 8
n <- 12
a <- 1
b <- 1
model <- jags.model(textConnection(model_string_2),                    inits=list(.RNG.name="base::Wichmann-Hill",.RNG.seed= 12341),
                    data = list(Y=Y,n=n,a=a,b=b))

## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 1
##    Unobserved stochastic nodes: 1
##    Total graph size: 5
## 
## Initializing model

update(model, 10000, progress.bar="none"); # Burnin for 10000 samples

samp <- coda.samples(model, 
                     variable.names=c("theta"), 
                     n.iter=20000, progress.bar="none")

summary(samp)

## 
## Iterations = 10001:30000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 20000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##           Mean             SD       Naive SE Time-series SE 
##      0.6420588      0.1242168      0.0008783      0.0008783 
## 
## 2. Quantiles for each variable:
## 
##   2.5%    25%    50%    75%  97.5% 
## 0.3815 0.5578 0.6486 0.7322 0.8610

mcmc_hist(samp,pars = "theta",binwidth = 0.05)

mcmc_intervals(samp,pars = "theta")

mcmc_combo(samp,pars = "theta")

samp.dat <- as.data.frame(mcmc.list(samp)[[1]])
mean(samp.dat$theta>0.85)

## [1] 0.0346

mean(samp.dat$theta<0.85)

## [1] 0.9654

Thanks to Bayes we have got 95% credible interval \(0.3815{\le}P_{KN11}{\le}0.8610\) where \((P_{KN11}>0.85)=0.0346\) and \((P_{KN11}<0.85)=0.9654\).

Conclusions

We take as null hypothesis \(H_0:M_{KN11}=9000\) kg i.e. we can’t reject this fact of minimum possible mass for SLBM.
Applying frequentest approach with linear regression we have got 95% confident interval for KN-11 Range as \(3145{\le}R_{KN11}{\le}5517\).
Applying Bayesian regression we have got 95% credible interval for KN-11 Range as \(P(1213{\le}R_{KN11}{\le}2164)=0.95\).
Bayesian regression provides more modest but more robust and credible results for KN-11 Range.
Bayesian 95% credible interval for KN-11 reliability looks like \(0.3815{\le}P_{KN11}{\le}0.8610\) where \((P_{KN11}>0.85)=0.0346\) and \((P_{KN11}<0.85)=0.9654\).
Anyway North Korean missile threat is not a fake story ¹⁰.

Missile threat

Alexander Levakov, Senior Research Fellow, Ph.D

April, 2017