Pukkuksong-1/KN-11

Pukkuksong-1/KN-11

Preface

Recently Pyongyang demonstrated for the whole world and especially for the USA and it’s allies the growing military power of North Korea 1. The most shocking element of the parade seemed to be three types of long-range ballistic missiles, one of them apparently new SLBM KN-11 - clear message for Donald Trump and his administration. Though the following test of unknown missile failed the threat is still more than real 2.

Problem

We want to estimate KN-11 Range (km), applying open source data on SLBM 3.

Data

We use aggregated and transformed SLBM data 4 to explore the relationship between M - Mass (kg), R - Range (km), P - Payload (kg), S - Stage (1,2,3), D - Diameter (m), L - Length (m) and W - Type of Warhead (Single - 1, MIRV - 2) of submarine launched ballistic missile.

load(file = "slbm.dat")
attach(slbm)
slbm
##    S     M     L    D     R    P W
## 1  1  5400 10.40 0.58   150  975 1
## 2  1 13700 11.80 1.30   650 1597 1
## 3  1 19650 14.20 1.30  1420 1179 1
## 4  1 14200  8.89 1.50  2400  650 1
## 5  1 14200  8.89 1.50  3000  650 1
## 6  1 14200  9.65 1.50  3000  650 2
## 7  2 33300 13.00 1.80  7800 1100 1
## 8  2 33300 13.00 1.80  9100 1100 1
## 9  2 26900 10.60 1.54  3900  450 1
## 10 2 35300 14.10 1.80  6500 1600 2
## 11 2 35300 14.10 1.80  8000 1600 1
## 12 2 35300 14.10 1.80  6500 1600 2
## 13 3 90000 16.00 2.40  8300 2550 2
## 14 3 40300 14.80 1.90  8300 2300 2
## 15 3 40800 14.80 1.90  8300 2800 2
## 16 3 36800 11.50 2.00  9300 1150 2
## 17 2 13600  9.45 1.37  2800  500 1
## 18 2 16200  9.86 1.37  4630  760 2
## 19 2 29485 10.36 1.88  5600 2000 2
## 20 3 32000 10.36 1.88  7400 1360 2
## 21 3 57500 13.42 2.11 11000 2880 2
## 22 2 20000 10.70 1.50  3000 1360 1
## 23 2 19500 10.70 1.50  3200 1360 1
## 24 2 19950 10.40 1.50  3200 1000 1
## 25 2 14700 10.70 1.40  2500  600 1
## 26 2 23000 13.00 2.00  8000  700 2
Pukkuksong-1 test flight

Pukkuksong-1 test flight

Frequentist linear regression

First, we try to estimate Mass of KN-11 \(M\), applying dimensions \(D\) and \(L\) 5.

fit.slbm.1<-lm(M~D + L, data = slbm)
predict(fit.slbm.1,newdata=data.frame(D=1.5,L=9), interval="conf")
##        fit      lwr      upr
## 1 14688.42 8668.755 20708.09

Note. We take as null hypothesis \(H_0:M_{KN11}=9000\) kg i.e. we can’t reject this fact of minimum possible mass for SLBM. Though alternative hypothesis maybe like this \(H_1: M_{KN11}>9000\). In fact KN-11 Mass is more likely to be \(9000{\le}M_{KN11}{\le}14000\). We can’t prove it. So we take \(H_0\).

Next, we estimate Range of KN-11. Box-Cox data transformation technique used to stabilize variance, make the data more normal distribution-like, improve the validity of measures of association such as the Pearson correlation between variables and for other data stabilization procedures 6.

fit.slbm.2<-lm(log(R)~S + D + M, data = slbm)
par(mfrow=c(2,2))
plot(fit.slbm.2)

exp(predict(fit.slbm.2,newdata = data.frame(S=2,D=1.5,M=9000),interval = "conf"))
##        fit      lwr      upr
## 1 4165.705 3145.233 5517.269

We have got 95% confident interval for KN-11 Range as \(3145{\le}R_{KN11}{\le}5517\).

Bayesian linear regression

This time we’ll use JAGS framework for MCMC with Bayesian estimates of posterior 7.

Model

model_string <- "model{

# Likelihood

for(i in 1:n){
R[i]   ~ dnorm(mu[i],inv.var)
mu[i] <- beta[1] + beta[2]*S[i] + beta[3]*D[i] + beta[4]*M[i]    
}

# Prior for beta
for(j in 1:4){
beta[j] ~ dnorm(0,0.0001)
}

# Prior for the inverse variance
inv.var   ~ dgamma(0.001, 0.001)
sigma     <- 1/sqrt(inv.var)

}"

Posterior sample

library(rjags)
## Loading required package: coda
## Linked to JAGS 4.1.0
## Loaded modules: basemod,bugs
library(bayesplot)
## This is bayesplot version 1.1.0
n <- length(slbm[,1])
model <- jags.model(textConnection(model_string), 
                    data = list(R=R,S=S,D=D,M=M,n=n),inits=list(.RNG.name= "base::Wichmann-Hill",.RNG.seed= 12341))
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 26
##    Unobserved stochastic nodes: 5
##    Total graph size: 181
## 
## Initializing model
update(model, 10000, progress.bar="none"); # Burnin for 10000 samples

num.iter <- 20000
samp <- coda.samples(model, 
                     variable.names=c("beta","sigma"), 
                     n.iter=num.iter, progress.bar="none")

summary(samp)
## 
## Iterations = 10001:30000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 20000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##              Mean        SD  Naive SE Time-series SE
## beta[1]   20.9494  99.75493 0.7053739      0.7378177
## beta[2]   53.2238  99.64124 0.7045700      0.8491243
## beta[3]   38.6979  98.79655 0.6985971      0.7893841
## beta[4]    0.1678   0.01544 0.0001092      0.0001439
## sigma   2193.0419 332.89736 2.3539398      2.5253523
## 
## 2. Quantiles for each variable:
## 
##             2.5%       25%       50%       75%     97.5%
## beta[1] -175.809  -45.2761   20.3851   88.8852  217.2929
## beta[2] -142.302  -13.5470   52.5946  121.2410  246.0507
## beta[3] -154.500  -27.9163   38.5901  104.8859  232.7657
## beta[4]    0.138    0.1575    0.1677    0.1779    0.1984
## sigma   1659.235 1956.5066 2152.5594 2386.1029 2946.0285
plot(samp)

mcmc_hist(samp, par=c("beta[1]", "beta[2]","beta[3]","sigma"))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

samp.dat <- as.data.frame(mcmc.list(samp)[[1]])
str(samp.dat)
## 'data.frame':    20000 obs. of  5 variables:
##  $ beta[1]: num  48.8 -131.6 -12.5 117 -10.4 ...
##  $ beta[2]: num  -178.3 17.7 41 -136 36.5 ...
##  $ beta[3]: num  95.3 -20.6 148.9 164.1 105.6 ...
##  $ beta[4]: num  0.188 0.185 0.167 0.169 0.181 ...
##  $ sigma  : num  2442 2365 2120 1880 2126 ...
B1 <- samp.dat$`beta[1]`
B2 <- samp.dat$`beta[2]`
B3 <- samp.dat$`beta[3]`
B4 <- samp.dat$`beta[4]`
SS <- samp.dat$sigma

RR <- c()
for (i in 1:10000) {
RR[i] <- B1[i] + B2[i]*2 + B3[i]*1.5 + B4[i]*9000
}

hist(RR,col = "red",main = "95% credible KN-11 range distribution",xlab = "Range, km")

SE <- SS/sqrt(num.iter)

hist(SE,col = "blue",main = "95% credible interval for Standard Error",xlab = "SE, km")

quantile(RR,c(0.025,0.975))
##     2.5%    97.5% 
## 1244.434 2132.004
quantile(RR+2*SE,c(0.025,0.975))
##     2.5%    97.5% 
## 1276.870 2163.935
quantile(RR-2*SE,c(0.025,0.975))
##     2.5%    97.5% 
## 1212.814 2102.162

Applying Bayesian regression we have got 95% credible interval for KN-11 Range as \(P(1244{\le}R_{KN11}{\le}2132)=0.95\) or \(1244{\le}R_{KN11}{\le}2132\) +(-) \(2*SE\) i.e. \(P(1213{\le}R_{KN11}{\le}2164)=0.95\) 8.

Reliability

Though very little is known about conducted tests of KN-11 we can try to do some estimation of reliability 9. So we have 12 trials with 8 successful and 4 failed tests.

Frequentest case

We postulate null hypothesis about KN-11 reliability as \(H_0: P_{KN11}=0.85\) and alternative hypothesis as \(H_1: P_{KN11}{\ne}0.85\)

binom.test(8,12,0.85)
## 
##  Exact binomial test
## 
## data:  8 and 12
## number of successes = 8, number of trials = 12, p-value = 0.09221
## alternative hypothesis: true probability of success is not equal to 0.85
## 95 percent confidence interval:
##  0.3488755 0.9007539
## sample estimates:
## probability of success 
##              0.6666667

As the p-value = 0.09221 we can’t reject null hypothesis \(H_0: P_{KN11}=0.85\). Not bad at all!

That is we can be 95% confident that \(0.35{\le}P_{KN11}{\le}0.9\) while current point estimate is close to 0.66.

What about power of test?

library(binom)
binom.power(p.alt = 0.5,n = 12,p = 0.85,alpha = 0.05)
##     alpha 
## 0.4884767
binom.plot(n=12,type="levelplot")
## Loading required package: lattice

The power of test \(1-\beta=0.488\) i.e. we can’t accept this estimation as reliable.

Baesian case

We try Bayesian binomial test for our data converting confidence into credibility.

model_string_2 <- "model{

# Likelihood
Y ~ dbinom(theta,n)

# Prior
theta ~ dbeta(a, b)
}"

Y <- 8
n <- 12
a <- 1
b <- 1
model <- jags.model(textConnection(model_string_2),                    inits=list(.RNG.name="base::Wichmann-Hill",.RNG.seed= 12341),
                    data = list(Y=Y,n=n,a=a,b=b))
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 1
##    Unobserved stochastic nodes: 1
##    Total graph size: 5
## 
## Initializing model
update(model, 10000, progress.bar="none"); # Burnin for 10000 samples

samp <- coda.samples(model, 
                     variable.names=c("theta"), 
                     n.iter=20000, progress.bar="none")

summary(samp)
## 
## Iterations = 10001:30000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 20000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##           Mean             SD       Naive SE Time-series SE 
##      0.6420588      0.1242168      0.0008783      0.0008783 
## 
## 2. Quantiles for each variable:
## 
##   2.5%    25%    50%    75%  97.5% 
## 0.3815 0.5578 0.6486 0.7322 0.8610
mcmc_hist(samp,pars = "theta",binwidth = 0.05)

mcmc_intervals(samp,pars = "theta")

mcmc_combo(samp,pars = "theta")

samp.dat <- as.data.frame(mcmc.list(samp)[[1]])
mean(samp.dat$theta>0.85)
## [1] 0.0346
mean(samp.dat$theta<0.85)
## [1] 0.9654

Thanks to Bayes we have got 95% credible interval \(0.3815{\le}P_{KN11}{\le}0.8610\) where \((P_{KN11}>0.85)=0.0346\) and \((P_{KN11}<0.85)=0.9654\).

Conclusions

Conclusions

  1. We take as null hypothesis \(H_0:M_{KN11}=9000\) kg i.e. we can’t reject this fact of minimum possible mass for SLBM.
  2. Applying frequentest approach with linear regression we have got 95% confident interval for KN-11 Range as \(3145{\le}R_{KN11}{\le}5517\).
  3. Applying Bayesian regression we have got 95% credible interval for KN-11 Range as \(P(1213{\le}R_{KN11}{\le}2164)=0.95\).
  4. Bayesian regression provides more modest but more robust and credible results for KN-11 Range.
  5. Bayesian 95% credible interval for KN-11 reliability looks like \(0.3815{\le}P_{KN11}{\le}0.8610\) where \((P_{KN11}>0.85)=0.0346\) and \((P_{KN11}<0.85)=0.9654\).
  6. Anyway North Korean missile threat is not a fake story 10.

References


  1. https://www.nytimes.com/2017/04/15/world/asia/north-korea-missiles-pyongyang-kim-jong-un.html?hp&action=click&pgtype=Homepage&clickSource=story-heading&module=first-column-region&region=top-news&WT.nav=top-news&_r=1

  2. http://www.bbc.com/news/world-asia-17399847

  3. https://en.wikipedia.org/wiki/Submarine-launched_ballistic_missile)

  4. https://en.wikipedia.org/wiki/Power_transform

  5. https://missilethreat.csis.org/missile/kn-11/

  6. https://en.wikipedia.org/wiki/Power_transform

  7. http://mcmc-jags.sourceforge.net/

  8. https://en.wikipedia.org/wiki/Standard_error

  9. https://en.wikipedia.org/wiki/Pukkuksong-1

  10. http://www.bbc.com/news/world-asia-39628223