Sejjil test

1 About Sejjil

The Sejjil missile is a two-stage, solid-propellant, medium-range ballistic missile domestically designed and built by Iran. Very little is known about this Sejjil MRBM though operational since 2012. The open source information is here: https://missilethreat.csis.org/missile/sejjil/.

2 Research goal

We don’t believe any speculations about this MRBM. We just try estimate Sejjil operational range, applying multiple linear regression both in frequentest and Bayesian paradigm on SLBM data.

3 SLBM data

We use aggregated and transformed SLBM data to explore the relationship between M - Mass (kg), R - Range (km), P - Payload (kg), S - Stage (1,2,3), D - Diameter (m), L - Length (m) and W - Type of Warhead (Single - 1, MIRV - 2) of submarine launched ballistic missile.

load("slbm.dat")
library(DT)
datatable(slbm)
#slbm

4 Correlations

Here we produce correlations matrix in graphical form.

library(ggplot2)
library(GGally)
#Here we use function from https://www.r-bloggers.com/multiple-regression-lines-in-ggpairs/
my_fn <- function(data, mapping, ...){
  p <- ggplot(data = data, mapping = mapping) + 
    geom_point() + 
    geom_smooth(method=loess, fill="red", color="red", ...) +
    geom_smooth(method=lm, fill="blue", color="blue", ...)
  p
}

g = ggpairs(slbm,columns = 2:6, lower = list(continuous = my_fn))
g

5 Linear regression

Here we produce classical multiple linear regression model in reduced form (two independent variables) and pass some routine tests.

5.1 Model and tests

fit.lm <- lm(log(M)~log(P)+sqrt(R),data = slbm)
summary(fit.lm)
## 
## Call:
## lm(formula = log(M) ~ log(P) + sqrt(R), data = slbm)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.43582 -0.09245 -0.05523  0.06307  0.65331 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 6.443759   0.653801   9.856 1.00e-09 ***
## log(P)      0.344329   0.100484   3.427   0.0023 ** 
## sqrt(R)     0.017668   0.002243   7.877 5.59e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.236 on 23 degrees of freedom
## Multiple R-squared:  0.8453, Adjusted R-squared:  0.8318 
## F-statistic: 62.82 on 2 and 23 DF,  p-value: 4.788e-10
hist(fit.lm$residuals,col = "blue",main = "Histogram of residuals",
     xlab = "Residuals")

par(mfrow=c(2,2))
plot(fit.lm)

Linear regression model formula looks like

\[{Mass} = e^{{B0 + B1\sqrt(Range)} + B2\ln(Payload)}\]

or

\[{Mass} = e^{{6.443759 + 0.01767\sqrt(Range)} + 0.34433\ln(Payload)}\]

5.2 Range point estimate

Here we apply previous linear model formula for Range given Mass and Payload.

\[Range={(\frac{6.443759+0.34433ln(Payload)-ln(Mass)}{0.01767})}^2\]

B0 <- as.numeric(fit.lm$coefficients[1])
B1 <- as.numeric(fit.lm$coefficients[2])
B2 <- as.numeric(fit.lm$coefficients[3])
Mass <- 23600
Payload <- 1500
Range <- ((B0 + B1*log(Payload) - log(Mass))/B2)^2
cat("Range =",Range)
## Range = 3926.368

6 Bayesian linear regression

6.1 Model and tests

fit.slbm.bs.4<-stan_glm(sqrt(R)~S+D+L+W+log(M)+log(P),data=slbm,chains=4,iter=10000,seed=12345)
fit.slbm.bs.4
## stan_glm
##  family:       gaussian [identity]
##  formula:      sqrt(R) ~ S + D + L + W + log(M) + log(P)
##  observations: 26
##  predictors:   7
## ------
##             Median MAD_SD
## (Intercept) -40.6  106.7 
## S             9.4    5.1 
## D            37.4   19.8 
## L             0.9    1.7 
## W             3.3    5.6 
## log(M)        6.5   15.0 
## log(P)       -7.3    6.0 
## 
## Auxiliary parameter(s):
##       Median MAD_SD
## sigma 10.2    1.7  
## 
## Sample avg. posterior predictive distribution of y:
##          Median MAD_SD
## mean_PPD 69.0    2.8  
## 
## ------
## * For help interpreting the printed output see ?print.stanreg
## * For info on the priors used see ?prior_summary.stanreg
par(mfrow=c(1,1))
plot(fit.slbm.bs.4)

posterior_vs_prior(fit.slbm.bs.4)
## 
## Drawing from prior...

fit.slbm.bs42<-as.array(fit.slbm.bs.4)
mcmc_hist(fit.slbm.bs42)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

mcmc_trace(fit.slbm.bs42)

summary(fit.slbm.bs.4)
## 
## Model Info:
## 
##  function:     stan_glm
##  family:       gaussian [identity]
##  formula:      sqrt(R) ~ S + D + L + W + log(M) + log(P)
##  algorithm:    sampling
##  priors:       see help('prior_summary')
##  sample:       20000 (posterior sample size)
##  observations: 26
##  predictors:   7
## 
## Estimates:
##                 mean   sd     2.5%   25%    50%    75%    97.5%
## (Intercept)    -40.0  109.1 -254.9 -112.3  -40.6   31.7  175.7 
## S                9.4    5.4   -1.2    5.9    9.4   12.9   20.2 
## D               37.4   20.3   -2.6   24.1   37.4   50.9   77.4 
## L                0.9    1.8   -2.7   -0.3    0.9    2.0    4.4 
## W                3.2    5.8   -8.5   -0.5    3.3    7.1   14.8 
## log(M)           6.4   15.6  -24.3   -3.6    6.5   16.6   37.0 
## log(P)          -7.3    6.2  -19.5  -11.3   -7.3   -3.3    5.0 
## sigma           10.4    1.8    7.6    9.2   10.2   11.4   14.6 
## mean_PPD        69.0    2.9   63.2   67.1   69.0   70.9   74.7 
## log-posterior -110.5    2.4 -116.3 -111.8 -110.1 -108.8 -107.1 
## 
## Diagnostics:
##               mcse Rhat n_eff
## (Intercept)   1.2  1.0   8838
## S             0.0  1.0  12972
## D             0.2  1.0   9062
## L             0.0  1.0  12215
## W             0.1  1.0  11906
## log(M)        0.2  1.0   8349
## log(P)        0.1  1.0  13073
## sigma         0.0  1.0  10097
## mean_PPD      0.0  1.0  19001
## log-posterior 0.0  1.0   6237
## 
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

6.2 Predicting Range by posterior distribution

set.seed(12345)
slbm.pp <- rstanarm::posterior_predict(fit.slbm.bs.4,
          newdata = data.frame(L=18,D=1.25,S=2,W=1,
                               M=Mass,P=Payload),seed=12345)

Range.km <- slbm.pp^2
Range.km <- Range.km[1:10000,]
quantile(Range.km,probs = c(0.1,0.5,0.9))
##      10%      50%      90% 
## 1393.838 3080.234 5552.154
ggplot(data=as.data.frame(Range.km), aes(Range.km)) + 
  geom_histogram(bins = 30,col="black",fill="green") + 
  geom_vline(xintercept = mean(Range.km), color = "red") + 
  geom_errorbarh(aes(y=-5, xmin=quantile(Range.km,0.1),
                     xmax=quantile(Range.km,0.9)), 
                 data=as.data.frame(Range.km), col="#0094EA", size=3) +
  ggtitle(label="Probability density of Sejjil Range")

mean(Range.km>2000)
## [1] 0.7797

7 Conclusions

  1. Given historical SLBM data (M,P,D,L,S,W) and linear regression model we can produce point estimate for Sejjil MRBM Range as \({Range} = {{(\frac{6.443759+0.34433ln(Payload)-ln(Mass)}{0.01767})}^2}=3926\)

  2. Applying Bayesian linear regression model for SLBM data we can produce posterior distribution sample given the same historical data with 80% credible interval for Sejjil MRBM Range as \(P(1393\le Range\le5552)=0.8\) and the mean \(Range=3080\).

  3. We can be 78% assured that Sejjil MRBM \(Range\ge2000\).