1 About missile

Lockheed Martin successfully tested its next-generation long-range missile designed for the Army’s Precision Strike Missile (PrSM) program at White Sands Missile Range, NM. All objectives were achieved in a flawless second performance following the missile’s inaugural flight last December.

https://news.lockheedmartin.com/2020-03-10-Lockheed-Martins-PrSM-Demonstrates-Pinpoint-Accuracy-in-Second-U-S-Army-Flight-Test#assets_all

Lockheed Martin’s PrSM test

For more information about PRSM see https://en.wikipedia.org/wiki/MGM-140_ATACMS

2 SLBM data

load("slbm.dat")
library(DT)
datatable(slbm)

library(ggplot2)
library(GGally)
#Here we use function from https://www.r-bloggers.com/multiple-regression-lines-in-ggpairs/
my_fn <- function(data, mapping, ...){
  p <- ggplot(data = data, mapping = mapping) + 
    geom_point() + 
    geom_smooth(method=loess, fill="red", color="red", ...) +
    geom_smooth(method=lm, fill="blue", color="blue", ...)
  p
}

g = ggpairs(slbm,columns = 2:6, lower = list(continuous = my_fn))
g

#heatmap(cor(slbm))

3 Bayesian Linear Regression Model

library(rstanarm)
library(bayesplot)

options(mc.cores = parallel::detectCores())
load("slbm.dat")


fit.slbm.bs<-stan_glm(sqrt(R)~S+D+L+W+log(M)+log(P),
                        data=slbm,chains=2,iter=5000,seed=12345)
fit.slbm.bs

## stan_glm
##  family:       gaussian [identity]
##  formula:      sqrt(R) ~ S + D + L + W + log(M) + log(P)
##  observations: 26
##  predictors:   7
## ------
##             Median MAD_SD
## (Intercept) -34.7  105.0 
## S             9.6    5.1 
## D            37.9   20.0 
## L             0.9    1.7 
## W             3.3    5.6 
## log(M)        5.7   15.0 
## log(P)       -7.5    6.0 
## 
## Auxiliary parameter(s):
##       Median MAD_SD
## sigma 10.2    1.7  
## 
## Sample avg. posterior predictive distribution of y:
##          Median MAD_SD
## mean_PPD 69.1    2.8  
## 
## ------
## * For help interpreting the printed output see ?print.stanreg
## * For info on the priors used see ?prior_summary.stanreg

plot(fit.slbm.bs)

posterior_vs_prior(fit.slbm.bs)

fit.slbm.bs2<-as.array(fit.slbm.bs)
mcmc_hist(fit.slbm.bs2)

mcmc_trace(fit.slbm.bs2)

summary(fit.slbm.bs)

## 
## Model Info:
## 
##  function:     stan_glm
##  family:       gaussian [identity]
##  formula:      sqrt(R) ~ S + D + L + W + log(M) + log(P)
##  algorithm:    sampling
##  priors:       see help('prior_summary')
##  sample:       5000 (posterior sample size)
##  observations: 26
##  predictors:   7
## 
## Estimates:
##                 mean   sd     2.5%   25%    50%    75%    97.5%
## (Intercept)    -36.0  109.4 -251.9 -107.5  -34.7   33.8  183.5 
## S                9.7    5.3   -1.0    6.2    9.6   13.0   20.0 
## D               37.5   20.3   -2.9   24.2   37.9   51.1   76.8 
## L                0.9    1.8   -2.6   -0.2    0.9    2.1    4.6 
## W                3.3    5.8   -8.1   -0.6    3.3    7.0   14.9 
## log(M)           6.0   15.6  -25.2   -4.2    5.7   16.0   37.4 
## log(P)          -7.5    6.3  -20.0  -11.6   -7.5   -3.4    4.9 
## sigma           10.5    1.9    7.6    9.2   10.2   11.5   14.9 
## mean_PPD        69.0    3.0   63.0   67.2   69.1   70.9   75.0 
## log-posterior -110.6    2.5 -116.6 -111.9 -110.1 -108.8 -107.0 
## 
## Diagnostics:
##               mcse Rhat n_eff
## (Intercept)   2.3  1.0  2200 
## S             0.1  1.0  3384 
## D             0.4  1.0  2253 
## L             0.0  1.0  3073 
## W             0.1  1.0  3107 
## log(M)        0.3  1.0  2064 
## log(P)        0.1  1.0  3589 
## sigma         0.0  1.0  2563 
## mean_PPD      0.0  1.0  4499 
## log-posterior 0.1  1.0  1474 
## 
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

4 PRSM Operational Range

slbm.pp <- posterior_predict(fit.slbm.bs,
          newdata = data.frame(L=4,D=0.61,S=1,W=1,
                               M=1670,P=225),seed=12345)

Range.km <- slbm.pp^2
Range.km <- Range.km[1:5000,]
quantile(Range.km,probs = c(0.25,0.5,0.75))

##       25%       50%       75% 
##  39.92534 175.50036 517.08411

5 Flight test data

Here we use open source historical record data set of ongoing RPSM flight test program.

5.1 Preparing data

All in all we got the final file with test results (true,false) only.

prsm_test_data <- c(TRUE,TRUE)

6 Reliability

6.1 Point Estimate

summary(prsm_test_data)

##    Mode    TRUE 
## logical       2

NS <- sum(prsm_test_data)
NALL <-length(prsm_test_data) 
prop_success <- NS/NALL
prop_success

## [1] 1

As we see PRSM has got 100% reliability according to the test results during ongoing flight test program (2019-2020). Can we make out something else applying current data? Let’s see!

6.2 Confidence interval

binom.test(NS,NALL,0.95)

## 
##  Exact binomial test
## 
## data:  NS and NALL
## number of successes = 2, number of trials = 2, p-value = 1
## alternative hypothesis: true probability of success is not equal to 0.95
## 95 percent confidence interval:
##  0.1581139 1.0000000
## sample estimates:
## probability of success 
##                      1

One can see that PRSM has made great success during ongoing flight test program so far.

6.3 Proportion of success model

We apply PRSM flight test data for the proportion of success as function of the current test event using beta prior as Bayesian likelihood.

#The prop_model function - Rasmus Bååth R code

# This function takes a number of successes and failuers coded as a TRUE/FALSE
# or 0/1 vector. This should be given as the data argument.
# The result is a visualization of the how a Beta-Binomial
# model gradualy learns the underlying proportion of successes 
# using this data. The function also returns a sample from the
# posterior distribution that can be further manipulated and inspected.
# The default prior is a Beta(1,1) distribution, but this can be set using the
# prior_prop argument.

# Make sure the packages tidyverse and ggridges are installed, otherwise run:
# install.packages(c("tidyverse", "ggridges"))

# Example usage:
# data <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE)
# prop_model(data)
prop_model <- function(data = c(), prior_prop = c(1, 1), n_draws = 10000,
                       gr_name="Proportion graph") {
  library(tidyverse)
  
  data <- as.logical(data)
  # data_indices decides what densities to plot between the prior and the posterior
  # For 20 datapoints and less we're plotting all of them.
  data_indices <- round(seq(0, length(data), length.out = min(length(data) + 1, 40)))
  
  # dens_curves will be a data frame with the x & y coordinates for the 
  # denities to plot where x = proportion_success and y = probability
  proportion_success <- c(0, seq(0, 1, length.out = 100), 1)
  dens_curves <- map_dfr(data_indices, function(i) {
    value <- ifelse(i == 0, "Prior", ifelse(data[i], "Success", "Failure"))
    label <- paste0("n=", i)
    probability <- dbeta(proportion_success,
                         prior_prop[1] + sum(data[seq_len(i)]),
                         prior_prop[2] + sum(!data[seq_len(i)]))
    probability <- probability / max(probability)
    data_frame(value, label, proportion_success, probability)
  })
  # Turning label and value into factors with the right ordering for the plot
  dens_curves$label <- fct_rev(factor(dens_curves$label, levels =  paste0("n=", data_indices )))
  dens_curves$value <- factor(dens_curves$value, levels = c("Prior", "Success", "Failure"))
  
  graph_label <- paste("Prior likelihood distribution Beta(a =", 
                       as.character(prior_prop[1]),", b =",
                       as.character(prior_prop[2]),")") 
  
  p <- ggplot(dens_curves, aes(x = proportion_success, y = label,
                               height = probability, fill = value)) +
    ggridges::geom_density_ridges(stat="identity", color = "white", alpha = 0.8,
                                  panel_scaling = TRUE, size = 1) +
    scale_y_discrete("", expand = c(0.01, 0)) +
    scale_x_continuous("Proportion of success") +
    scale_fill_manual(values = hcl(120 * 2:0 + 15, 100, 65), name = "", drop = FALSE,
                      labels =  c("Prior   ", "Success   ", "Failure   ")) +
    ggtitle(paste0(gr_name, ": ", sum(data),  " successes, ", sum(!data), " failures"),
            subtitle = graph_label) +
    labs(caption = "based on Rasmus Bååth R code") +
    theme_light() +
    theme(legend.position = "top")
  print(p)
  
  # Returning a sample from the posterior distribution that can be further 
  # manipulated and inspected
  posterior_sample <- rbeta(n_draws, prior_prop[1] + sum(data), prior_prop[2] + sum(!data))
  invisible(posterior_sample)
}

6.3.1 Uniform prior Beta (1,1)

prop_model(data = prsm_test_data,
           prior_prop = c(1,1),gr_name = "PRSM 2019-2020 test events")

6.3.2 Gaussian prior Beta (2,2)

prop_model(data = prsm_test_data,
           prior_prop = c(2,2),gr_name = "PRSM 2019-2020 test events")

6.4 Bayesian STAN beta-Bernoulli model

See relevant math stuff here:

https://en.wikipedia.org/wiki/Bayesian_inference https://en.wikipedia.org/wiki/Beta_distribution https://en.wikipedia.org/wiki/Bernoulli_distribution

library(rstan)

## Loading required package: StanHeaders

## rstan (Version 2.19.2, GitRev: 2e1f913d3ca3)

## For execution on a local, multicore CPU with excess RAM we recommend calling
## options(mc.cores = parallel::detectCores()).
## To avoid recompilation of unchanged Stan programs, we recommend calling
## rstan_options(auto_write = TRUE)

## 
## Attaching package: 'rstan'

## The following object is masked from 'package:tidyr':
## 
##     extract

library(rstan)

options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)
y <- as.integer(prsm_test_data)
N <- length(y)
data_list <- list(N=N,y=y,a=2,b=2)
fit.ber <- stan(file = "stan_code.stan",data = data_list,chains = 2,verbose = FALSE,
                iter = 5000,warmup = 1000,seed=12345)
stan_model(file = "stan_code.stan")

## S4 class stanmodel 'stan_code' coded as follows:
##    
##    data { 
##      int<lower=0> N; 
##      int<lower=0,upper=1> y[N];
##      
##      real a;
##      real b;
##    } 
##    
##    transformed data{
##      int<lower=0,upper=1> yy[N];
##      for (k in 1:N)
##      yy[k]=abs(1-y[k]);
##    }
##    
##    
##    parameters {
##      real<lower=0,upper=1> theta[N];
##    } 
##    model {
##      for (j in 1:N){
##      theta[j] ~ beta(a+sum(y[1:j]),b+sum(yy[1:j]));
##          for (n in 1:j) //
##          y[j] ~ bernoulli(theta[j]);
##      }
##    }
##    
##    generated quantities {
##      real<lower=0,upper=1> pp[N]; 
##      for (j in 1:N)
##      pp[j]=beta_rng(a+sum(y[1:j]),b+sum(yy[1:j]));
##    }
##

rstan::traceplot(fit.ber)

6.4.1 Posterior sample with Beta(2,2)

s1 <- as.array(fit.ber)
pp_par <- paste("pp[",1:NALL,"]",sep = "")
pp_par2 <- paste("theta[",1:NALL,"]",sep = "")
bayesplot::mcmc_intervals(x = s1,pars = pp_par)

bayesplot::mcmc_intervals(x = s1,pars = pp_par2)

6.4.2 Reliability crystall ball magic

last_par_pp<- paste("pp[",NALL,"]",sep = "")
last_par_theta<- paste("theta[",NALL,"]",sep = "")
bayesplot::mcmc_hist(x = s1,pars = c(last_par_pp,last_par_theta))

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

plot(fit.ber,pars = c(last_par_pp,last_par_theta))

## ci_level: 0.8 (80% intervals)

## outer_level: 0.95 (95% intervals)

print(fit.ber,pars = c(last_par_pp,last_par_theta))

## Inference for Stan model: stan_code.
## 2 chains, each with iter=5000; warmup=1000; thin=1; 
## post-warmup draws per chain=4000, total post-warmup draws=8000.
## 
##          mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
## pp[2]    0.66       0 0.18 0.28 0.54 0.68 0.81  0.95  8229    1
## theta[2] 0.75       0 0.14 0.43 0.66 0.77 0.86  0.96  5645    1
## 
## Samples were drawn using NUTS(diag_e) at Thu Mar 12 17:25:58 2020.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

Here is a moment of truth - we see significant difference between two distributions of the posterior samples: theta, produced by the data through Bernoulli experiment with the prior Beta(a,b) distribution as Bayesian likelihood and pp, produced by Beta(a,b) random distribution only. The last one is similar to the proportion model above. Both samples use Gaussian like Beta(2,2) distribution. Now we are ready to make the final conclusion about PRSM reliability.

7 Conclusion

Bayesian linear regression model based on SLBM data given PRSM data produced operational range as \(38.52\le R\le 513.22\) with the mean value \(R=177.47\).
The point estimate of PRSM reliability looks like \(P=1\) in accordance with the data we have so far.
The 95% confidence interval for PRSM reliability looks like \(0.16\le P \le1.0\).
Proportion of succes model makes our belief in PRSM reliability stronger and stronger from one event to another no matter what prior we use: Beta(1,1) or Beta(2,2).
All the estimates made above are based on the assumption that PRSM flight test data are correct in terms of Success and Failure events.

PRSM flight test story

Alexander Levakov, Senior Research Fellow, Ph.D

March, 2020