Pukkuksong-1/KN-11
Recently Pyongyang demonstrated for the whole world and especially for the USA and it’s allies the growing military power of North Korea 1. The most shocking element of the parade seemed to be three types of long-range ballistic missiles, one of them apparently new SLBM KN-11 - clear message for Donald Trump and his administration. Though the following test of unknown missile failed the threat is still more than real 2.
We want to estimate KN-11 Range (km), applying open source data on SLBM 3.
We use aggregated and transformed SLBM data 4 to explore the relationship between M - Mass (kg), R - Range (km), P - Payload (kg), S - Stage (1,2,3), D - Diameter (m), L - Length (m) and W - Type of Warhead (Single - 1, MIRV - 2) of submarine launched ballistic missile.
load(file = "slbm.dat")
attach(slbm)
slbm
## S M L D R P W
## 1 1 5400 10.40 0.58 150 975 1
## 2 1 13700 11.80 1.30 650 1597 1
## 3 1 19650 14.20 1.30 1420 1179 1
## 4 1 14200 8.89 1.50 2400 650 1
## 5 1 14200 8.89 1.50 3000 650 1
## 6 1 14200 9.65 1.50 3000 650 2
## 7 2 33300 13.00 1.80 7800 1100 1
## 8 2 33300 13.00 1.80 9100 1100 1
## 9 2 26900 10.60 1.54 3900 450 1
## 10 2 35300 14.10 1.80 6500 1600 2
## 11 2 35300 14.10 1.80 8000 1600 1
## 12 2 35300 14.10 1.80 6500 1600 2
## 13 3 90000 16.00 2.40 8300 2550 2
## 14 3 40300 14.80 1.90 8300 2300 2
## 15 3 40800 14.80 1.90 8300 2800 2
## 16 3 36800 11.50 2.00 9300 1150 2
## 17 2 13600 9.45 1.37 2800 500 1
## 18 2 16200 9.86 1.37 4630 760 2
## 19 2 29485 10.36 1.88 5600 2000 2
## 20 3 32000 10.36 1.88 7400 1360 2
## 21 3 57500 13.42 2.11 11000 2880 2
## 22 2 20000 10.70 1.50 3000 1360 1
## 23 2 19500 10.70 1.50 3200 1360 1
## 24 2 19950 10.40 1.50 3200 1000 1
## 25 2 14700 10.70 1.40 2500 600 1
## 26 2 23000 13.00 2.00 8000 700 2
Pukkuksong-1 test flight
First, we try to estimate Mass of KN-11 \(M\), applying dimensions \(D\) and \(L\) 5.
fit.slbm.1<-lm(M~D + L, data = slbm)
predict(fit.slbm.1,newdata=data.frame(D=1.5,L=9), interval="conf")
## fit lwr upr
## 1 14688.42 8668.755 20708.09
Note. We take as null hypothesis \(H_0:M_{KN11}=9000\) kg i.e. we can’t reject this fact of minimum possible mass for SLBM. Though alternative hypothesis maybe like this \(H_1: M_{KN11}>9000\). In fact KN-11 Mass is more likely to be \(9000{\le}M_{KN11}{\le}14000\). We can’t prove it. So we take \(H_0\).
Next, we estimate Range of KN-11. Box-Cox data transformation technique used to stabilize variance, make the data more normal distribution-like, improve the validity of measures of association such as the Pearson correlation between variables and for other data stabilization procedures 6.
fit.slbm.2<-lm(log(R)~S + D + M, data = slbm)
par(mfrow=c(2,2))
plot(fit.slbm.2)
exp(predict(fit.slbm.2,newdata = data.frame(S=2,D=1.5,M=9000),interval = "conf"))
## fit lwr upr
## 1 4165.705 3145.233 5517.269
We have got 95% confident interval for KN-11 Range as \(3145{\le}R_{KN11}{\le}5517\).
This time we’ll use JAGS framework for MCMC with Bayesian estimates of posterior 7.
model_string <- "model{
# Likelihood
for(i in 1:n){
R[i] ~ dnorm(mu[i],inv.var)
mu[i] <- beta[1] + beta[2]*S[i] + beta[3]*D[i] + beta[4]*M[i]
}
# Prior for beta
for(j in 1:4){
beta[j] ~ dnorm(0,0.0001)
}
# Prior for the inverse variance
inv.var ~ dgamma(0.001, 0.001)
sigma <- 1/sqrt(inv.var)
}"
library(rjags)
## Loading required package: coda
## Linked to JAGS 4.1.0
## Loaded modules: basemod,bugs
library(bayesplot)
## This is bayesplot version 1.1.0
n <- length(slbm[,1])
model <- jags.model(textConnection(model_string),
data = list(R=R,S=S,D=D,M=M,n=n),inits=list(.RNG.name= "base::Wichmann-Hill",.RNG.seed= 12341))
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 26
## Unobserved stochastic nodes: 5
## Total graph size: 181
##
## Initializing model
update(model, 10000, progress.bar="none"); # Burnin for 10000 samples
num.iter <- 20000
samp <- coda.samples(model,
variable.names=c("beta","sigma"),
n.iter=num.iter, progress.bar="none")
summary(samp)
##
## Iterations = 10001:30000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 20000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## beta[1] 20.9494 99.75493 0.7053739 0.7378177
## beta[2] 53.2238 99.64124 0.7045700 0.8491243
## beta[3] 38.6979 98.79655 0.6985971 0.7893841
## beta[4] 0.1678 0.01544 0.0001092 0.0001439
## sigma 2193.0419 332.89736 2.3539398 2.5253523
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## beta[1] -175.809 -45.2761 20.3851 88.8852 217.2929
## beta[2] -142.302 -13.5470 52.5946 121.2410 246.0507
## beta[3] -154.500 -27.9163 38.5901 104.8859 232.7657
## beta[4] 0.138 0.1575 0.1677 0.1779 0.1984
## sigma 1659.235 1956.5066 2152.5594 2386.1029 2946.0285
plot(samp)
mcmc_hist(samp, par=c("beta[1]", "beta[2]","beta[3]","sigma"))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
samp.dat <- as.data.frame(mcmc.list(samp)[[1]])
str(samp.dat)
## 'data.frame': 20000 obs. of 5 variables:
## $ beta[1]: num 48.8 -131.6 -12.5 117 -10.4 ...
## $ beta[2]: num -178.3 17.7 41 -136 36.5 ...
## $ beta[3]: num 95.3 -20.6 148.9 164.1 105.6 ...
## $ beta[4]: num 0.188 0.185 0.167 0.169 0.181 ...
## $ sigma : num 2442 2365 2120 1880 2126 ...
B1 <- samp.dat$`beta[1]`
B2 <- samp.dat$`beta[2]`
B3 <- samp.dat$`beta[3]`
B4 <- samp.dat$`beta[4]`
SS <- samp.dat$sigma
RR <- c()
for (i in 1:10000) {
RR[i] <- B1[i] + B2[i]*2 + B3[i]*1.5 + B4[i]*9000
}
hist(RR,col = "red",main = "95% credible KN-11 range distribution",xlab = "Range, km")
SE <- SS/sqrt(num.iter)
hist(SE,col = "blue",main = "95% credible interval for Standard Error",xlab = "SE, km")
quantile(RR,c(0.025,0.975))
## 2.5% 97.5%
## 1244.434 2132.004
quantile(RR+2*SE,c(0.025,0.975))
## 2.5% 97.5%
## 1276.870 2163.935
quantile(RR-2*SE,c(0.025,0.975))
## 2.5% 97.5%
## 1212.814 2102.162
Applying Bayesian regression we have got 95% credible interval for KN-11 Range as \(P(1244{\le}R_{KN11}{\le}2132)=0.95\) or \(1244{\le}R_{KN11}{\le}2132\) +(-) \(2*SE\) i.e. \(P(1213{\le}R_{KN11}{\le}2164)=0.95\) 8.
Though very little is known about conducted tests of KN-11 we can try to do some estimation of reliability 9. So we have 12 trials with 8 successful and 4 failed tests.
We postulate null hypothesis about KN-11 reliability as \(H_0: P_{KN11}=0.85\) and alternative hypothesis as \(H_1: P_{KN11}{\ne}0.85\)
binom.test(8,12,0.85)
##
## Exact binomial test
##
## data: 8 and 12
## number of successes = 8, number of trials = 12, p-value = 0.09221
## alternative hypothesis: true probability of success is not equal to 0.85
## 95 percent confidence interval:
## 0.3488755 0.9007539
## sample estimates:
## probability of success
## 0.6666667
As the p-value = 0.09221 we can’t reject null hypothesis \(H_0: P_{KN11}=0.85\). Not bad at all!
That is we can be 95% confident that \(0.35{\le}P_{KN11}{\le}0.9\) while current point estimate is close to 0.66.
What about power of test?
library(binom)
binom.power(p.alt = 0.5,n = 12,p = 0.85,alpha = 0.05)
## alpha
## 0.4884767
binom.plot(n=12,type="levelplot")
## Loading required package: lattice
The power of test \(1-\beta=0.488\) i.e. we can’t accept this estimation as reliable.
We try Bayesian binomial test for our data converting confidence into credibility.
model_string_2 <- "model{
# Likelihood
Y ~ dbinom(theta,n)
# Prior
theta ~ dbeta(a, b)
}"
Y <- 8
n <- 12
a <- 1
b <- 1
model <- jags.model(textConnection(model_string_2), inits=list(.RNG.name="base::Wichmann-Hill",.RNG.seed= 12341),
data = list(Y=Y,n=n,a=a,b=b))
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1
## Unobserved stochastic nodes: 1
## Total graph size: 5
##
## Initializing model
update(model, 10000, progress.bar="none"); # Burnin for 10000 samples
samp <- coda.samples(model,
variable.names=c("theta"),
n.iter=20000, progress.bar="none")
summary(samp)
##
## Iterations = 10001:30000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 20000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## 0.6420588 0.1242168 0.0008783 0.0008783
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## 0.3815 0.5578 0.6486 0.7322 0.8610
mcmc_hist(samp,pars = "theta",binwidth = 0.05)
mcmc_intervals(samp,pars = "theta")
mcmc_combo(samp,pars = "theta")
samp.dat <- as.data.frame(mcmc.list(samp)[[1]])
mean(samp.dat$theta>0.85)
## [1] 0.0346
mean(samp.dat$theta<0.85)
## [1] 0.9654
Thanks to Bayes we have got 95% credible interval \(0.3815{\le}P_{KN11}{\le}0.8610\) where \((P_{KN11}>0.85)=0.0346\) and \((P_{KN11}<0.85)=0.9654\).
Conclusions