Sejjil test
The Sejjil missile is a two-stage, solid-propellant, medium-range ballistic missile domestically designed and built by Iran. Very little is known about this Sejjil MRBM though operational since 2012. The open source information is here: https://missilethreat.csis.org/missile/sejjil/.
We don’t believe any speculations about this MRBM. We just try estimate Sejjil operational range, applying multiple linear regression both in frequentest and Bayesian paradigm on SLBM data.
We use aggregated and transformed SLBM data to explore the relationship between M - Mass (kg), R - Range (km), P - Payload (kg), S - Stage (1,2,3), D - Diameter (m), L - Length (m) and W - Type of Warhead (Single - 1, MIRV - 2) of submarine launched ballistic missile.
load("slbm.dat")
library(DT)
datatable(slbm)
#slbm
Here we produce correlations matrix in graphical form.
library(ggplot2)
library(GGally)
#Here we use function from https://www.r-bloggers.com/multiple-regression-lines-in-ggpairs/
my_fn <- function(data, mapping, ...){
p <- ggplot(data = data, mapping = mapping) +
geom_point() +
geom_smooth(method=loess, fill="red", color="red", ...) +
geom_smooth(method=lm, fill="blue", color="blue", ...)
p
}
g = ggpairs(slbm,columns = 2:6, lower = list(continuous = my_fn))
g
Here we produce classical multiple linear regression model in reduced form (two independent variables) and pass some routine tests.
fit.lm <- lm(log(M)~log(P)+sqrt(R),data = slbm)
summary(fit.lm)
##
## Call:
## lm(formula = log(M) ~ log(P) + sqrt(R), data = slbm)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.43582 -0.09245 -0.05523 0.06307 0.65331
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.443759 0.653801 9.856 1.00e-09 ***
## log(P) 0.344329 0.100484 3.427 0.0023 **
## sqrt(R) 0.017668 0.002243 7.877 5.59e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.236 on 23 degrees of freedom
## Multiple R-squared: 0.8453, Adjusted R-squared: 0.8318
## F-statistic: 62.82 on 2 and 23 DF, p-value: 4.788e-10
hist(fit.lm$residuals,col = "blue",main = "Histogram of residuals",
xlab = "Residuals")
par(mfrow=c(2,2))
plot(fit.lm)
Linear regression model formula looks like
\[{Mass} = e^{{B0 + B1\sqrt(Range)} + B2\ln(Payload)}\]
or
\[{Mass} = e^{{6.443759 + 0.01767\sqrt(Range)} + 0.34433\ln(Payload)}\]
Here we apply previous linear model formula for Range given Mass and Payload.
\[Range={(\frac{6.443759+0.34433ln(Payload)-ln(Mass)}{0.01767})}^2\]
B0 <- as.numeric(fit.lm$coefficients[1])
B1 <- as.numeric(fit.lm$coefficients[2])
B2 <- as.numeric(fit.lm$coefficients[3])
Mass <- 23600
Payload <- 1500
Range <- ((B0 + B1*log(Payload) - log(Mass))/B2)^2
cat("Range =",Range)
## Range = 3926.368
fit.slbm.bs.4<-stan_glm(sqrt(R)~S+D+L+W+log(M)+log(P),data=slbm,chains=4,iter=10000,seed=12345)
fit.slbm.bs.4
## stan_glm
## family: gaussian [identity]
## formula: sqrt(R) ~ S + D + L + W + log(M) + log(P)
## observations: 26
## predictors: 7
## ------
## Median MAD_SD
## (Intercept) -40.6 106.7
## S 9.4 5.1
## D 37.4 19.8
## L 0.9 1.7
## W 3.3 5.6
## log(M) 6.5 15.0
## log(P) -7.3 6.0
##
## Auxiliary parameter(s):
## Median MAD_SD
## sigma 10.2 1.7
##
## Sample avg. posterior predictive distribution of y:
## Median MAD_SD
## mean_PPD 69.0 2.8
##
## ------
## * For help interpreting the printed output see ?print.stanreg
## * For info on the priors used see ?prior_summary.stanreg
par(mfrow=c(1,1))
plot(fit.slbm.bs.4)
posterior_vs_prior(fit.slbm.bs.4)
##
## Drawing from prior...
fit.slbm.bs42<-as.array(fit.slbm.bs.4)
mcmc_hist(fit.slbm.bs42)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
mcmc_trace(fit.slbm.bs42)
summary(fit.slbm.bs.4)
##
## Model Info:
##
## function: stan_glm
## family: gaussian [identity]
## formula: sqrt(R) ~ S + D + L + W + log(M) + log(P)
## algorithm: sampling
## priors: see help('prior_summary')
## sample: 20000 (posterior sample size)
## observations: 26
## predictors: 7
##
## Estimates:
## mean sd 2.5% 25% 50% 75% 97.5%
## (Intercept) -40.0 109.1 -254.9 -112.3 -40.6 31.7 175.7
## S 9.4 5.4 -1.2 5.9 9.4 12.9 20.2
## D 37.4 20.3 -2.6 24.1 37.4 50.9 77.4
## L 0.9 1.8 -2.7 -0.3 0.9 2.0 4.4
## W 3.2 5.8 -8.5 -0.5 3.3 7.1 14.8
## log(M) 6.4 15.6 -24.3 -3.6 6.5 16.6 37.0
## log(P) -7.3 6.2 -19.5 -11.3 -7.3 -3.3 5.0
## sigma 10.4 1.8 7.6 9.2 10.2 11.4 14.6
## mean_PPD 69.0 2.9 63.2 67.1 69.0 70.9 74.7
## log-posterior -110.5 2.4 -116.3 -111.8 -110.1 -108.8 -107.1
##
## Diagnostics:
## mcse Rhat n_eff
## (Intercept) 1.2 1.0 8838
## S 0.0 1.0 12972
## D 0.2 1.0 9062
## L 0.0 1.0 12215
## W 0.1 1.0 11906
## log(M) 0.2 1.0 8349
## log(P) 0.1 1.0 13073
## sigma 0.0 1.0 10097
## mean_PPD 0.0 1.0 19001
## log-posterior 0.0 1.0 6237
##
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
set.seed(12345)
slbm.pp <- rstanarm::posterior_predict(fit.slbm.bs.4,
newdata = data.frame(L=18,D=1.25,S=2,W=1,
M=Mass,P=Payload),seed=12345)
Range.km <- slbm.pp^2
Range.km <- Range.km[1:10000,]
quantile(Range.km,probs = c(0.1,0.5,0.9))
## 10% 50% 90%
## 1393.838 3080.234 5552.154
ggplot(data=as.data.frame(Range.km), aes(Range.km)) +
geom_histogram(bins = 30,col="black",fill="green") +
geom_vline(xintercept = mean(Range.km), color = "red") +
geom_errorbarh(aes(y=-5, xmin=quantile(Range.km,0.1),
xmax=quantile(Range.km,0.9)),
data=as.data.frame(Range.km), col="#0094EA", size=3) +
ggtitle(label="Probability density of Sejjil Range")
mean(Range.km>2000)
## [1] 0.7797
Given historical SLBM data (M,P,D,L,S,W) and linear regression model we can produce point estimate for Sejjil MRBM Range as \({Range} = {{(\frac{6.443759+0.34433ln(Payload)-ln(Mass)}{0.01767})}^2}=3926\)
Applying Bayesian linear regression model for SLBM data we can produce posterior distribution sample given the same historical data with 80% credible interval for Sejjil MRBM Range as \(P(1393\le Range\le5552)=0.8\) and the mean \(Range=3080\).
We can be 78% assured that Sejjil MRBM \(Range\ge2000\).