Hwasong-12

Hwasong-12

1 About missile

The Hwasong-12 (KN17 by US) first appeared in North Korea’s “Day of the Sun” parade on April 15, 2017. It was speculated at the time that it could be a shortened version of North Korea’s untested KN-08 ICBM. The missile was carried on a vehicle previously associated with the BM-25 Musudan IRBM. Evidence from the missile’s first test launch on May 15, however, indicates a single-stage design.

For more see:

https://missilethreat.csis.org/missile/hwasong-12/

https://en.wikipedia.org/wiki/Hwasong-12

https://www.youtube.com/watch?v=ct-0YyoSOsU

2 Missile Test Events Data

We can import Hwasong-12 test events from the open source WIKI data like this, applying substitution FAILURE=FALSE and SUCCESS=TRUE.

library(rvest)
## Loading required package: xml2
library(htmlTable)
library(DT)
url <- "https://en.wikipedia.org/wiki/Hwasong-12"

KN17_test <- url %>%
  read_html()%>%
  html_nodes(xpath='//*[@id="mw-content-text"]/div/table[3]') %>%
  html_table()
KN17_test_data <- data.frame(cbind(KN17_test[[1]]$Date,KN17_test[[1]]$Outcome))
names(KN17_test_data)=c("Date","Result")
KN17_test_data[,2] <- ifelse(KN17_test_data$Result=="Success",TRUE,FALSE)
datatable(KN17_test_data,caption = "Hwasong-12 test events")

3 Reliability

3.1 Point Estimate

summary(KN17_test_data$Result)
##    Mode   FALSE    TRUE 
## logical       3       3
NS <- sum(KN17_test_data$Result)
NALL <-length(KN17_test_data$Result) 
prop_success <- NS/NALL
prop_success
## [1] 0.5

As we see Hwasong-12 has got 50% reliability according to the test results to date.

Can we make out something else applying current data? Let’s see!

3.2 Confidence Interval

binom.test(NS,NALL,0.8)
## 
##  Exact binomial test
## 
## data:  NS and NALL
## number of successes = 3, number of trials = 6, p-value = 0.09888
## alternative hypothesis: true probability of success is not equal to 0.8
## 95 percent confidence interval:
##  0.1181172 0.8818828
## sample estimates:
## probability of success 
##                    0.5

One can see that Hwasong-12 reliability 95% confidence interval can be estimated as \(0.12\le P(KN17)\le0.88\). Maybe KIM JONG UN is satisfied by this interval? Let’s see why.

3.3 Proportion of success model

We apply KN17 test data for the proportion of success as function of the current test event using beta prior as Bayesian likelihood.

#The prop_model function - Rasmus Bååth R code

# This function takes a number of successes and failuers coded as a TRUE/FALSE
# or 0/1 vector. This should be given as the data argument.
# The result is a visualization of the how a Beta-Binomial
# model gradualy learns the underlying proportion of successes 
# using this data. The function also returns a sample from the
# posterior distribution that can be further manipulated and inspected.
# The default prior is a Beta(1,1) distribution, but this can be set using the
# prior_prop argument.

# Make sure the packages tidyverse and ggridges are installed, otherwise run:
# install.packages(c("tidyverse", "ggridges"))

# Example usage:
# data <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE)
# prop_model(data)
prop_model <- function(data = c(), prior_prop = c(1, 1), n_draws = 10000,
                       gr_name="Proportion graph") {
  library(tidyverse)
 
  data <- as.logical(data)
  # data_indices decides what densities to plot between the prior and the posterior
  # For 20 datapoints and less we're plotting all of them.
  data_indices <- round(seq(0, length(data), length.out = min(length(data) + 1, 40)))
  
  # dens_curves will be a data frame with the x & y coordinates for the 
  # denities to plot where x = proportion_success and y = probability
  proportion_success <- c(0, seq(0, 1, length.out = 100), 1)
  dens_curves <- map_dfr(data_indices, function(i) {
    value <- ifelse(i == 0, "Prior", ifelse(data[i], "Success", "Failure"))
    label <- paste0("n=", i)
    probability <- dbeta(proportion_success,
                         prior_prop[1] + sum(data[seq_len(i)]),
                         prior_prop[2] + sum(!data[seq_len(i)]))
    probability <- probability / max(probability)
    data_frame(value, label, proportion_success, probability)
  })
  # Turning label and value into factors with the right ordering for the plot
  dens_curves$label <- fct_rev(factor(dens_curves$label, levels =  paste0("n=", data_indices )))
  dens_curves$value <- factor(dens_curves$value, levels = c("Prior", "Success", "Failure"))
  
  graph_label <- paste("Prior likelihood distribution Beta(a =", 
                       as.character(prior_prop[1]),", b =",
                                    as.character(prior_prop[2]),")") 
  
  p <- ggplot(dens_curves, aes(x = proportion_success, y = label,
                               height = probability, fill = value)) +
    ggridges::geom_density_ridges(stat="identity", color = "white", alpha = 0.8,
                                  panel_scaling = TRUE, size = 1) +
    scale_y_discrete("", expand = c(0.01, 0)) +
    scale_x_continuous("Proportion of success") +
    scale_fill_manual(values = hcl(120 * 2:0 + 15, 100, 65), name = "", drop = FALSE,
                      labels =  c("Prior   ", "Success   ", "Failure   ")) +
    ggtitle(paste0(gr_name, ": ", sum(data),  " successes, ", sum(!data), " failures"),
            subtitle = graph_label) +
    labs(caption = "based on Rasmus Bååth R code") +
    theme_light() +
    theme(legend.position = "top")
  print(p)
  
  # Returning a sample from the posterior distribution that can be further 
  # manipulated and inspected
  posterior_sample <- rbeta(n_draws, prior_prop[1] + sum(data), prior_prop[2] + sum(!data))
  invisible(posterior_sample)
}

3.4 Proportion test results

3.4.1 Uniform prior Beta(1,1)

prop_model(data = KN17_test_data$Result,prior_prop = c(1,1),gr_name = "KN17 reliabilty")

3.4.2 Gaussian prior Beta(2,2)

prop_model(data = KN17_test_data$Result,prior_prop = c(2,2),gr_name = "KN17 reliability")

As we can see it makes no difference.

3.5 Bayesian STAN beta-bernouilli model

See relevant math stuff here:

  1. https://en.wikipedia.org/wiki/Bayesian_inference
  2. https://en.wikipedia.org/wiki/Beta_distribution
  3. https://en.wikipedia.org/wiki/Bernoulli_distribution
## 
## data { 
##   int<lower=0> N; 
##   int<lower=0,upper=1> y[N];
##   
##   real a;
##   real b;
## } 
## 
## transformed data{
##   int<lower=0,upper=1> yy[N];
##   for (k in 1:N)
##   yy[k]=abs(1-y[k]);
## }
## 
## 
## parameters {
##   real<lower=0,upper=1> theta[N];
## } 
## model {
##   for (j in 1:N){
##   theta[j] ~ beta(a+sum(y[1:j]),b+sum(yy[1:j]));
##       for (n in 1:j) //
##       y[j] ~ bernoulli(theta[j]);
##   }
## }
## 
## generated quantities {
##   real<lower=0,upper=1> pp[N]; 
##   for (j in 1:N)
##   pp[j]=beta_rng(a+sum(y[1:j]),b+sum(yy[1:j]));
## }

3.5.1 Posterior sample with Beta(2,2)

y <- as.integer(KN17_test_data$Result)
N <- length(y)
data_list <- list(N=N,y=y,a=2,b=2)
fit.ber <- stan(model_code = stan_code,data = data_list,chains = 2,verbose = FALSE,
            iter = 5000,warmup = 1000,seed=12345)

rstan::traceplot(fit.ber)

s1 <- as.array(fit.ber)
pp_par <- paste("pp[",1:NALL,"]",sep = "")
pp_par2 <- paste("theta[",1:NALL,"]",sep = "")
bayesplot::mcmc_intervals(x = s1,pars = pp_par)

bayesplot::mcmc_intervals(x = s1,pars = pp_par2)

3.6 Reliability crystall ball magic

last_par_pp<- paste("pp[",NALL,"]",sep = "")
last_par_theta<- paste("theta[",NALL,"]",sep = "")
bayesplot::mcmc_hist(x = s1,pars = c(last_par_pp,last_par_theta))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

plot(fit.ber,pars = c(last_par_pp,last_par_theta))
## ci_level: 0.8 (80% intervals)
## outer_level: 0.95 (95% intervals)

print(fit.ber,pars = c(last_par_pp,last_par_theta))
## Inference for Stan model: 8110aaf154812cced3e43a9d64740f79.
## 2 chains, each with iter=5000; warmup=1000; thin=1; 
## post-warmup draws per chain=4000, total post-warmup draws=8000.
## 
##          mean se_mean   sd 2.5%  25% 50%  75% 97.5% n_eff Rhat
## pp[6]    0.50       0 0.15 0.21 0.39 0.5 0.61  0.79  7609    1
## theta[6] 0.69       0 0.11 0.46 0.62 0.7 0.77  0.88  8000    1
## 
## Samples were drawn using NUTS(diag_e) at Fri Mar 22 15:08:44 2019.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

Here is a moment of truth - we see significant difference between two distributions of the posterior samples: \(theta\), produced by the data through Bernoulli experiment with the prior Beta(a,b) distribution as Bayesian likelihood and \(pp\), produced by Beta(a,b) random distribution only. The last one is similar to the proportion model above. Both samples use Gaussian like \(Beta(2,2)\) distribution. Now we are ready to make the final conclusion about Hwasong-12 (KN17) reliability.

4 Operational range

Here we apply SLBM data for estimating KN17 operational range. ##SLBM Data

load("slbm.dat")
library(DT)
datatable(slbm)
library(ggplot2)
library(GGally)
## 
## Attaching package: 'GGally'
## The following object is masked from 'package:dplyr':
## 
##     nasa
#Here we use function from https://www.r-bloggers.com/multiple-regression-lines-in-ggpairs/
my_fn <- function(data, mapping, ...){
  p <- ggplot(data = data, mapping = mapping) + 
    geom_point() + 
    geom_smooth(method=loess, fill="red", color="red", ...) +
    geom_smooth(method=lm, fill="blue", color="blue", ...)
  p
}

g = ggpairs(slbm,columns = 2:6, lower = list(continuous = my_fn))
g

4.1 Bayesian Linear Regression Model

library(rstanarm)
## Loading required package: Rcpp
## rstanarm (Version 2.17.3, packaged: 2018-02-17 05:11:16 UTC)
## - Do not expect the default priors to remain the same in future rstanarm versions.
## Thus, R scripts should specify priors explicitly, even if they are just the defaults.
## - For execution on a local, multicore CPU with excess RAM we recommend calling
## options(mc.cores = parallel::detectCores())
## - Plotting theme set to bayesplot::theme_default().
library(bayesplot)

options(mc.cores = parallel::detectCores())
set.seed(12345)
fit.slbm.bs<-stan_glm(sqrt(R)~S+D+L+W+log(M)+log(P),
                        data=slbm,chains=2,iter=10000)
fit.slbm.bs
## stan_glm
##  family:       gaussian [identity]
##  formula:      sqrt(R) ~ S + D + L + W + log(M) + log(P)
##  observations: 26
##  predictors:   7
## ------
##             Median MAD_SD
## (Intercept) -43.1  105.2 
## S             9.5    5.2 
## D            36.7   19.5 
## L             0.8    1.7 
## W             3.3    5.6 
## log(M)        6.9   15.0 
## log(P)       -7.5    6.0 
## sigma        10.2    1.7 
## 
## Sample avg. posterior predictive distribution of y:
##          Median MAD_SD
## mean_PPD 69.0    2.8  
## 
## ------
## For info on the priors used see help('prior_summary.stanreg').
plot(fit.slbm.bs)

posterior_vs_prior(fit.slbm.bs)
## 
## Drawing from prior...

fit.slbm.bs2<-as.array(fit.slbm.bs)
mcmc_hist(fit.slbm.bs2)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

mcmc_trace(fit.slbm.bs2)

summary(fit.slbm.bs)
## 
## Model Info:
## 
##  function:     stan_glm
##  family:       gaussian [identity]
##  formula:      sqrt(R) ~ S + D + L + W + log(M) + log(P)
##  algorithm:    sampling
##  priors:       see help('prior_summary')
##  sample:       10000 (posterior sample size)
##  observations: 26
##  predictors:   7
## 
## Estimates:
##                 mean   sd     2.5%   25%    50%    75%    97.5%
## (Intercept)    -41.8  109.6 -259.0 -112.4  -43.1   29.2  176.0 
## S                9.5    5.3   -1.0    6.0    9.5   13.0   20.1 
## D               36.8   20.1   -3.6   23.5   36.7   49.8   75.9 
## L                0.8    1.8   -2.7   -0.3    0.8    2.0    4.4 
## W                3.2    5.8   -8.2   -0.5    3.3    6.9   14.6 
## log(M)           6.8   15.6  -24.1   -3.2    6.9   17.0   37.6 
## log(P)          -7.4    6.3  -19.9  -11.5   -7.5   -3.4    5.0 
## sigma           10.5    1.8    7.6    9.2   10.2   11.5   14.6 
## mean_PPD        69.0    2.9   63.3   67.1   69.0   70.9   74.9 
## log-posterior -110.5    2.4 -116.2 -111.8 -110.1 -108.8 -107.1 
## 
## Diagnostics:
##               mcse Rhat n_eff
## (Intercept)   1.7  1.0   4072
## S             0.1  1.0   5834
## D             0.3  1.0   4403
## L             0.0  1.0   5603
## W             0.1  1.0   6586
## log(M)        0.2  1.0   3928
## log(P)        0.1  1.0   6953
## sigma         0.0  1.0   5258
## mean_PPD      0.0  1.0  10000
## log-posterior 0.0  1.0   3008
## 
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

4.2 KN17 Operational Range Estimation

Here we use open source data (see above) for the Bayesian linear regression model.

set.seed(12345)
slbm.pp <- rstanarm::posterior_predict(fit.slbm.bs,
          newdata = data.frame(L=16.5,D=1.5,S=1,W=1,
                               M=24700,P=500))

Range.km <- slbm.pp^2
Range.km <- Range.km[1:10000,]
quantile(Range.km,probs = c(0.1,0.5,0.9))
##      10%      50%      90% 
## 1862.303 3949.425 6758.593
ggplot(data=as.data.frame(Range.km), aes(Range.km)) + 
  geom_histogram(bins = 30,col="black",fill="green") + 
  geom_vline(xintercept = mean(Range.km), color = "red") + 
  geom_errorbarh(aes(y=-5, xmin=quantile(Range.km,0.1),
                     xmax=quantile(Range.km,0.9)), 
                 data=as.data.frame(Range.km), col="#0094EA", size=3) +
  ggtitle(label="Probability density of KN17 Range")

5 Conclusion

  1. The point estimate of KN17 reliability looks like \(P_{KN17}=0.5\) in accordance with the data we have got so far.
  2. The 95% confidence interval for KN17 reliability looks like \(0.12\le P_{KN17}\le0.88\).
  3. Proportion model makes our belief in KN17 reliability stronger and stronger from one event to another no matter what prior we use: \(Beta(1,1)\) or \(Beta(2,2)\).
  4. Bayesian estimate of the posterior sample for the KN17 reliability \(theta\) produced by Bernoulli experiment with data and \(Beta(2,2)\) prior as likelihood is significantly greater than \(Beta(2,2)\) random distribution for the proportion of success.
  5. We believe that true KN17 reliability to date looks like 95% credible interval \(0.46\le P_{KN17}\le0.88\) with the mean value \(M[P_{KN17}]=0.77\).
  6. All the reliability estimates made above are based on the assumption that KN17 flight test data are correct in terms of \(Success\) and \(Failure\) events.
  7. KN17 operational range according to the Bayesian linear regression model and some estimates of the missile data looks like \(P(1862\le R_{KN17}\le 6758)=0.8\) with the mean value \(M[R_{KN17}]=3949\) that is close to the NK and US official estimates.
  8. That is why KIM JONG UN is satisfied with these results making strong message to DONALD TRUMP for the future if NK-US deal would be canceled.