In previous topic we discussed linear regression for ICBM verifying model for V-2. ICBM case can be found here 1. Now our focus is on SLBM. Is it different from ICBM case?
We use aggregated and transformed SLBM data 2 to explore the relationship between M - Mass (kg), R - Range (km), P - Payload (kg), S - Stage (1,2,3), D - Diameter (m), L - Length (m) and W - Type of Warhead (Single - 1, MIRV - 2) of submarine launched ballistic missile.
library(car)
library(MASS)
library(broom)
library(popbio)
library(bayesboot)
library(FactoMineR)
library(scatterplot3d)
library(MCMCpack)
## Loading required package: coda
## ##
## ## Markov Chain Monte Carlo Package (MCMCpack)
## ## Copyright (C) 2003-2016 Andrew D. Martin, Kevin M. Quinn, and Jong Hee Park
## ##
## ## Support provided by the U.S. National Science Foundation
## ## (Grants SES-0350646 and SES-0350613)
## ##
Polaris SLBM
load(file = "slbm.dat")
attach(slbm)
slbm
## S M L D R P W
## 1 1 5400 10.40 0.58 150 975 1
## 2 1 13700 11.80 1.30 650 1597 1
## 3 1 19650 14.20 1.30 1420 1179 1
## 4 1 14200 8.89 1.50 2400 650 1
## 5 1 14200 8.89 1.50 3000 650 1
## 6 1 14200 9.65 1.50 3000 650 2
## 7 2 33300 13.00 1.80 7800 1100 1
## 8 2 33300 13.00 1.80 9100 1100 1
## 9 2 26900 10.60 1.54 3900 450 1
## 10 2 35300 14.10 1.80 6500 1600 2
## 11 2 35300 14.10 1.80 8000 1600 1
## 12 2 35300 14.10 1.80 6500 1600 2
## 13 3 90000 16.00 2.40 8300 2550 2
## 14 3 40300 14.80 1.90 8300 2300 2
## 15 3 40800 14.80 1.90 8300 2800 2
## 16 3 36800 11.50 2.00 9300 1150 2
## 17 2 13600 9.45 1.37 2800 500 1
## 18 2 16200 9.86 1.37 4630 760 2
## 19 2 29485 10.36 1.88 5600 2000 2
## 20 3 32000 10.36 1.88 7400 1360 2
## 21 3 57500 13.42 2.11 11000 2880 2
## 22 2 20000 10.70 1.50 3000 1360 1
## 23 2 19500 10.70 1.50 3200 1360 1
## 24 2 19950 10.40 1.50 3200 1000 1
## 25 2 14700 10.70 1.40 2500 600 1
## 26 2 23000 13.00 2.00 8000 700 2
summary(slbm)
## S M L D
## Min. :1 Min. : 5400 Min. : 8.89 Min. :0.580
## 1st Qu.:2 1st Qu.:15075 1st Qu.:10.37 1st Qu.:1.500
## Median :2 Median :24950 Median :11.10 Median :1.670
## Mean :2 Mean :28253 Mean :11.88 Mean :1.651
## 3rd Qu.:2 3rd Qu.:35300 3rd Qu.:13.93 3rd Qu.:1.880
## Max. :3 Max. :90000 Max. :16.00 Max. :2.400
## R P W
## Min. : 150 Min. : 450 Min. :1.000
## 1st Qu.: 3000 1st Qu.: 715 1st Qu.:1.000
## Median : 5115 Median :1164 Median :1.000
## Mean : 5306 Mean :1326 Mean :1.462
## 3rd Qu.: 8000 3rd Qu.:1600 3rd Qu.:2.000
## Max. :11000 Max. :2880 Max. :2.000
The most familiar measure of dependence between two quantities is the Pearson product-moment correlation coefficient, or “Pearson’s correlation coefficient”, commonly called simply “the correlation coefficient”. It is obtained by dividing the covariance of the two variables by the product of their standard deviations 3, 4.
scatterplotMatrix(slbm,ellipse = TRUE,smoother = F)
cor(slbm)
## S M L D R P W
## S 1.0000000 0.7227514 0.4733140 0.7372275 0.7974522 0.6056825 0.5678210
## M 0.7227514 1.0000000 0.7279571 0.8306791 0.7512824 0.7402274 0.5098905
## L 0.4733140 0.7279571 1.0000000 0.5686223 0.6036077 0.6991290 0.3569790
## D 0.7372275 0.8306791 0.5686223 1.0000000 0.8688213 0.5706833 0.6072926
## R 0.7974522 0.7512824 0.6036077 0.8688213 1.0000000 0.5672620 0.5995672
## P 0.6056825 0.7402274 0.6991290 0.5706833 0.5672620 1.0000000 0.4993946
## W 0.5678210 0.5098905 0.3569790 0.6072926 0.5995672 0.4993946 1.0000000
cov(slbm)
## S M L D R
## S 0.4800 8.642000e+03 6.820000e-01 0.1804000 1.679200e+03
## M 8642.0000 2.978581e+08 2.612913e+04 5063.5180769 3.940806e+07
## L 0.6820 2.612913e+04 4.325425e+00 0.4176886 3.815455e+03
## D 0.1804 5.063518e+03 4.176886e-01 0.1247466 9.326571e+02
## R 1679.2000 3.940806e+07 3.815455e+03 932.6570769 9.237497e+06
## P 293.5600 8.937192e+06 1.017192e+03 141.0070308 1.206124e+06
## W 0.2000 4.473831e+03 3.774462e-01 0.1090462 9.264308e+02
## P W
## S 293.5600 0.2000000
## M 8937192.2538 4473.8307692
## L 1017.1920 0.3774462
## D 141.0070 0.1090462
## R 1206124.3538 926.4307692
## P 489398.3215 177.6123077
## W 177.6123 0.2584615
Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. The number of principal components is less than or equal to the number of original variables 5.
slbm.pca<-PCA(slbm)
summary(slbm.pca)
##
## Call:
## PCA(X = slbm)
##
##
## Eigenvalues
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6
## Variance 4.867 0.782 0.521 0.355 0.239 0.159
## % of var. 69.524 11.177 7.449 5.065 3.413 2.266
## Cumulative % of var. 69.524 80.701 88.150 93.215 96.628 98.894
## Dim.7
## Variance 0.077
## % of var. 1.106
## Cumulative % of var. 100.000
##
## Individuals (the 10 first)
## Dist Dim.1 ctr cos2 Dim.2 ctr cos2 Dim.3 ctr
## 1 | 4.265 | -3.816 11.506 0.800 | 0.980 4.720 0.053 | 1.105 9.010
## 2 | 2.718 | -2.133 3.597 0.616 | 1.392 9.532 0.262 | 0.766 4.324
## 3 | 2.712 | -1.699 2.283 0.393 | 1.873 17.254 0.477 | 0.423 1.320
## 4 | 2.823 | -2.637 5.494 0.872 | -0.292 0.420 0.011 | -0.275 0.557
## 5 | 2.760 | -2.555 5.158 0.857 | -0.337 0.557 0.015 | -0.339 0.850
## 6 | 2.641 | -1.792 2.537 0.460 | -1.126 6.228 0.182 | 1.152 9.788
## 7 | 1.498 | 0.416 0.137 0.077 | 0.454 1.013 0.092 | -1.241 11.364
## 8 | 1.779 | 0.594 0.278 0.111 | 0.357 0.628 0.040 | -1.381 14.071
## 9 | 1.792 | -1.326 1.390 0.548 | -0.281 0.387 0.025 | -1.002 7.409
## 10 | 1.742 | 1.372 1.487 0.620 | 0.185 0.168 0.011 | 0.695 3.565
## cos2
## 1 0.067 |
## 2 0.079 |
## 3 0.024 |
## 4 0.009 |
## 5 0.015 |
## 6 0.190 |
## 7 0.687 |
## 8 0.603 |
## 9 0.313 |
## 10 0.159 |
##
## Variables
## Dim.1 ctr cos2 Dim.2 ctr cos2 Dim.3 ctr cos2
## S | 0.848 14.766 0.719 | -0.235 7.048 0.055 | -0.160 4.940 0.026 |
## M | 0.914 17.150 0.835 | 0.170 3.679 0.029 | -0.116 2.575 0.013 |
## L | 0.757 11.777 0.573 | 0.541 37.474 0.293 | 0.037 0.264 0.001 |
## D | 0.899 16.593 0.808 | -0.186 4.403 0.034 | -0.221 9.326 0.049 |
## R | 0.898 16.569 0.806 | -0.196 4.885 0.038 | -0.232 10.289 0.054 |
## P | 0.799 13.122 0.639 | 0.363 16.802 0.131 | 0.313 18.746 0.098 |
## W | 0.698 10.024 0.488 | -0.449 25.710 0.201 | 0.530 53.861 0.281 |
Note. We see correlations between all pairs as well as two components of data explain about 80% of variance: {L, P, M} and {W, S, R, D}.
In linear regression, the relationships are modeled using linear predictor functions whose unknown model parameters are estimated from the data 6.
Box-Cox data transformation technique used to stabilize variance, make the data more normal distribution-like, improve the validity of measures of association such as the Pearson correlation between variables and for other data stabilization procedures 7.
fit.slbm<-lm(log(M)~.,data=slbm) # all variables
summary(fit.slbm)
##
## Call:
## lm(formula = log(M) ~ ., data = slbm)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.39208 -0.07248 0.01379 0.04525 0.31972
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.289e+00 2.949e-01 24.716 6.58e-16 ***
## S 1.431e-01 7.815e-02 1.831 0.0829 .
## L 6.084e-02 2.245e-02 2.710 0.0139 *
## D 1.143e+00 1.786e-01 6.401 3.88e-06 ***
## R -1.552e-06 2.381e-05 -0.065 0.9487
## P 8.102e-05 6.928e-05 1.170 0.2566
## W -1.306e-01 7.868e-02 -1.660 0.1134
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1501 on 19 degrees of freedom
## Multiple R-squared: 0.9483, Adjusted R-squared: 0.932
## F-statistic: 58.08 on 6 and 19 DF, p-value: 3.293e-11
boxCox(fit.slbm) # Box-Cox transformation
fit.slbm<-lm(log(M)~.,data=slbm) # all variables with transformation
summary(fit.slbm)
##
## Call:
## lm(formula = log(M) ~ ., data = slbm)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.39208 -0.07248 0.01379 0.04525 0.31972
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.289e+00 2.949e-01 24.716 6.58e-16 ***
## S 1.431e-01 7.815e-02 1.831 0.0829 .
## L 6.084e-02 2.245e-02 2.710 0.0139 *
## D 1.143e+00 1.786e-01 6.401 3.88e-06 ***
## R -1.552e-06 2.381e-05 -0.065 0.9487
## P 8.102e-05 6.928e-05 1.170 0.2566
## W -1.306e-01 7.868e-02 -1.660 0.1134
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1501 on 19 degrees of freedom
## Multiple R-squared: 0.9483, Adjusted R-squared: 0.932
## F-statistic: 58.08 on 6 and 19 DF, p-value: 3.293e-11
fit.lm.slbm<-lm(log(M)~L + D,data=slbm) # only variables with p-value <0.05
summary(fit.lm.slbm)
##
## Call:
## lm(formula = log(M) ~ L + D, data = slbm)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.57674 -0.03643 0.01608 0.06303 0.34667
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.08397 0.19867 35.657 < 2e-16 ***
## L 0.07882 0.01920 4.105 0.000433 ***
## D 1.25568 0.11306 11.107 1.02e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1642 on 23 degrees of freedom
## Multiple R-squared: 0.925, Adjusted R-squared: 0.9185
## F-statistic: 141.9 on 2 and 23 DF, p-value: 1.149e-13
tidy(anova(fit.lm.slbm)) #anova for linear model
## term df sumsq meansq statistic p.value
## 1 L 1 4.3287094 4.32870942 160.4785 7.440858e-12
## 2 D 1 3.3273690 3.32736896 123.3558 1.018482e-10
## 3 Residuals 23 0.6203965 0.02697376 NA NA
par(mfrow=c(2,2))
plot(fit.lm.slbm) # Plot Diagnostics for linear model
leveragePlots(fit.lm.slbm) # leverage plots
par(mfrow=c(1,2))
# Normality of Residuals
# qq plot for studentized resid
qqPlot(fit.lm.slbm, main="QQ Plot")
# distribution of studentized residuals
sresid <- studres(fit.lm.slbm)
hist(sresid, freq=FALSE, main="Studentized Residuals")
xfit<-seq(min(sresid),max(sresid),length=40)
yfit<-dnorm(xfit)
lines(xfit, yfit,col="red")
fit.lm.slbm.2<-lm(log(M)~sqrt(R) + log(P),data=slbm) # another pair of variables: see ICBM case
summary(fit.lm.slbm.2)
##
## Call:
## lm(formula = log(M) ~ sqrt(R) + log(P), data = slbm)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.43582 -0.09245 -0.05523 0.06307 0.65331
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.443759 0.653801 9.856 1.00e-09 ***
## sqrt(R) 0.017668 0.002243 7.877 5.59e-08 ***
## log(P) 0.344329 0.100484 3.427 0.0023 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.236 on 23 degrees of freedom
## Multiple R-squared: 0.8453, Adjusted R-squared: 0.8318
## F-statistic: 62.82 on 2 and 23 DF, p-value: 4.788e-10
tidy(anova(fit.lm.slbm.2))
## term df sumsq meansq statistic p.value
## 1 sqrt(R) 1 6.3420323 6.34203231 113.90236 2.207979e-10
## 2 log(P) 1 0.6538129 0.65381292 11.74242 2.304246e-03
## 3 Residuals 23 1.2806297 0.05567955 NA NA
par(mfrow=c(2,2))
plot(fit.lm.slbm.2) # Plot Diagnostics for linear model
leveragePlots(fit.lm.slbm.2) # leverage plots
par(mfrow=c(1,2))
# Normality of Residuals
# qq plot for studentized resid
qqPlot(fit.lm.slbm.2, main="QQ Plot")
## Warning in rlm.default(x, y, weights, method = method, wt.method =
## wt.method, : 'rlm' failed to converge in 20 steps
# distribution of studentized residuals
sresid <- studres(fit.lm.slbm.2)
hist(sresid, freq=FALSE, main="Studentized Residuals")
xfit<-seq(min(sresid),max(sresid),length=40)
yfit<-dnorm(xfit)
lines(xfit, yfit,col="red")
par(mfrow=c(1,1))
s3d<-scatterplot3d(slbm$L, slbm$D, log(slbm$M) , main="3D plot for M=f(L,D)",pch=16, highlight.3d=TRUE,type="h", grid = T, xlab = "L, m", ylab = "D, m", zlab = "log(M), kg")
s3d$plane3d(fit.lm.slbm,draw_lines = T,draw_polygon = T)
s23d<-scatterplot3d(sqrt(slbm$R), log(slbm$P), log(slbm$M), main="3D plot for for M=f(R,P)",pch=16, highlight.3d=TRUE,type="h", grid = T,xlab = "sqrt(R), m", ylab = "log(P), kg", zlab = "log(M), kg")
s23d$plane3d(fit.lm.slbm.2$coefficients,draw_lines = T,draw_polygon = T)
Note. So we got the following expression for SLBM Mass (M) given Length (L) and Diameter (D), that explains more than 90% of Mass dispersion: \({M} = e^{7.08 + 0.08*L + 1.26*D}\). We also got alternative expression \({M} = e^{6.444 + 0.018*\sqrt{R} + 0.344*log{P}}\), that explains about 85% of Mass dispersion by Range (R) and Payload (P).
We can try Bayesian linear regression 8 for our data converting confidence into credibility.
par(mfrow=c(1,1))
set.seed(12345)
lm.coefs <- function(d, w) {
coef( lm(log(M)~L + D, data = d, weights = w) )
}
blm <- bayesboot(slbm, lm.coefs, R = 10000, use.weights = TRUE)
plot(blm)
summary(blm)
## Bayesian bootstrap
##
## Number of posterior draws: 10000
##
## Summary of the posterior (with 95% Highest Density Intervals):
## statistic mean sd hdi.low hdi.high
## (Intercept) 7.10021330 0.15395052 6.82715209 7.42776385
## L 0.07841134 0.01034658 0.05770256 0.09768141
## D 1.24943814 0.09291084 1.05880524 1.40822699
##
## Quantiles:
## statistic q2.5% q25% median q75% q97.5%
## (Intercept) 6.84446481 6.99693113 7.08309384 7.1842325 7.45805049
## L 0.05681407 0.07193504 0.07885521 0.0853734 0.09706943
## D 1.01578148 1.20919684 1.26650756 1.3091089 1.38424576
##
## Call:
## bayesboot(data = slbm, statistic = lm.coefs, R = 10000, use.weights = TRUE)
lm.coefs.2 <- function(d, w) {
coef( lm(log(M)~sqrt(R) + log(P), data = d, weights = w) )
}
blm2 <- bayesboot(slbm, lm.coefs.2, R = 10000, use.weights = TRUE)
plot(blm2)
summary(blm2)
## Bayesian bootstrap
##
## Number of posterior draws: 10000
##
## Summary of the posterior (with 95% Highest Density Intervals):
## statistic mean sd hdi.low hdi.high
## (Intercept) 6.41170352 0.753961570 4.83932884 7.85042698
## sqrt(R) 0.01739646 0.002278476 0.01325511 0.02203054
## log(P) 0.35165112 0.115214612 0.11489972 0.57259407
##
## Quantiles:
## statistic q2.5% q25% median q75% q97.5%
## (Intercept) 4.85979888 5.93694581 6.41855098 6.89239887 7.88161395
## sqrt(R) 0.01325785 0.01575855 0.01722693 0.01898698 0.02203061
## log(P) 0.12439872 0.27749172 0.35207329 0.42514610 0.58359108
##
## Call:
## bayesboot(data = slbm, statistic = lm.coefs.2, R = 10000, use.weights = TRUE)
Note.We see some interesting results for Bayesian linear model: \({M} = e^{7.1 + 0.078*L + 1.249*D}\), \({M} = e^{6.411 + 0.017*\sqrt{R} + 0.351*log{P}}\). Here we got 95% credible intervals for linear regression coefficients as well as distributions too. In fact Bayesian models produce results very close to frequentist models. Anyway these results are more robust.
Logistic regression measures the relationship between the categorical dependent variable and one or more independent variables by estimating probabilities using a logistic function, which is the cumulative logistic distribution 9.
Here we use M as the categorical dependent variable indicating presence of MIRV for SLBM and all other variables as independent ones.
slbm.fit.all<-glm(W-1~.,data=slbm,family = binomial(link = "logit")) # all factors are included
summary(slbm.fit.all)
##
## Call:
## glm(formula = W - 1 ~ ., family = binomial(link = "logit"), data = slbm)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.1711 -0.7198 -0.0604 0.3307 2.2444
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.457e+01 1.267e+01 -1.150 0.250
## S 7.455e-01 1.682e+00 0.443 0.658
## M -9.776e-05 1.276e-04 -0.766 0.444
## L -3.908e-01 6.867e-01 -0.569 0.569
## D 1.103e+01 8.529e+00 1.293 0.196
## R -6.798e-05 5.980e-04 -0.114 0.909
## P 1.705e-03 2.081e-03 0.819 0.413
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 35.890 on 25 degrees of freedom
## Residual deviance: 19.733 on 19 degrees of freedom
## AIC: 33.733
##
## Number of Fisher Scoring iterations: 6
slbm.fit<-glm(W-1~D,data=slbm,family = binomial(link = "logit")) # only significant factor
summary(slbm.fit)
##
## Call:
## glm(formula = W - 1 ~ D, family = binomial(link = "logit"), data = slbm)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.4782 -0.6579 -0.1837 0.6683 2.2022
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -11.942 4.394 -2.718 0.00658 **
## D 7.014 2.574 2.725 0.00643 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 35.890 on 25 degrees of freedom
## Residual deviance: 22.175 on 24 degrees of freedom
## AIC: 26.175
##
## Number of Fisher Scoring iterations: 5
coefficients(slbm.fit)
## (Intercept) D
## -11.941940 7.014425
confint(slbm.fit)
## Waiting for profiling to be done...
## 2.5 % 97.5 %
## (Intercept) -22.413791 -4.664713
## D 2.736875 13.080207
plot(profile(slbm.fit)) #likelihood profiling
1-slbm.fit$deviance/slbm.fit$null.deviance # The pseudo R-squared for logistic model
## [1] 0.3821408
glance(slbm.fit.all)
## null.deviance df.null logLik AIC BIC deviance df.residual
## 1 35.88966 25 -9.866566 33.73313 42.53981 19.73313 19
glance(slbm.fit)
## null.deviance df.null logLik AIC BIC deviance df.residual
## 1 35.88966 25 -11.08738 26.17475 28.69095 22.17475 24
anova(slbm.fit.all, slbm.fit, test = "Chisq")# comparing full and reduced models by anova
## Analysis of Deviance Table
##
## Model 1: W - 1 ~ S + M + L + D + R + P
## Model 2: W - 1 ~ D
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 19 19.733
## 2 24 22.175 -5 -2.4416 0.7853
x<-seq(from=1,to=2.5,by=0.1)
y=predict(slbm.fit,data.frame(D=x),type = "response")
par(mfrow=c(1,1))
logi.hist.plot(D,W-1,boxp=FALSE,type="hist",col="gray",xlabel = "Diameter of SLBM, m",mainlabel = "Probability of MIRV")
points(D,fitted(slbm.fit),pch=10)
grid()
Here we apply Markov Chain Monte Carlo for Logistic Regression (MCMCpack) for slbm.fit model to generate posterior distributions for intercept ans D coefficients.
logmcmc = MCMClogit(slbm.fit$formula, burnin=1000, mcmc=30000, b0=0, B0=.04,data = slbm,seed = 12345)
summary(logmcmc)
##
## Iterations = 1001:31000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 30000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## (Intercept) -7.347 2.549 0.014718 0.04147
## D 4.330 1.500 0.008662 0.02429
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## (Intercept) -12.703 -8.976 -7.198 -5.556 -2.727
## D 1.589 3.288 4.255 5.298 7.487
plot(logmcmc)
hist(logmcmc[,1],main = "Intercept coefficient density for posterior",xlab = "Intercept coefficient", col="blue",freq = F)
hist(logmcmc[,2],main = "D coefficient density for posterior",xlab = "D coefficient", col="blue", freq = F)
Note. As we see the value of Diameter of SLBM does matter for the Type of warhead: \({P}_{MIRV} = 1 - \frac{1}{1 + e^{(-11.942 + 7.014*D)}}\). The pseudo R-squared for logistic model tells us that our model explains about 38% of the deviance. Number of Fisher Scoring iterations is equal 5 as well as AIC and BIC for single factor (D) have decreased compared to all factors included slbm.fit.all i.e. the model slbm.fit is valid. Anova provides evidence against the full model in favor of the reduced model by Chi-Square, showing no significant reduction in deviance when single factor (D) is used. Bayesian logistic regression gives 95% credible interval for coefficients (intercept B0 and B1 for D) as \({-12.703}\leqslant{B}_{0}\leqslant{-2.727}\) and \({1.589}\leqslant{B}_{1}\leqslant{7.487}\) with the mean values \(M({B}_{0})=-7.347\) and \(M({B}_{1})=4.330\).
M-51
Let’s verify our linear regression models (frequentist and Bayesian) applying french SLBM M-51 data 10, that are not contained in slbm data frame: L=12 m, D=2.3 m, R=10000 km, P=2500 kg (estimate).
exp(predict(fit.lm.slbm,data.frame(L=12, D=2.3),interval = "confidence"))
## fit lwr upr
## 1 55153.46 46847.27 64932.37
exp(predict(fit.lm.slbm.2,data.frame(R=10000, P=2500),interval = "confidence"))
## fit lwr upr
## 1 54426.24 45258.54 65450.99
predict(slbm.fit,data.frame(D=2.3),type = "response")
## 1
## 0.9850979
M1<-c(1:1000)
for (i in 1:1000) {
M1[i]<-exp(blm$`(Intercept)`[i]+blm$L[i]*12+blm$D[i]*2.3)
}
quantile(M1,probs = c(0.025,0.5,0.975))
## 2.5% 50% 97.5%
## 45097.88 55850.48 61448.69
M2<-c(1:1000)
for (i in 1:1000) {
M2[i]<-exp(blm2$`(Intercept)`[i]+blm2$`sqrt(R)`[i]*sqrt(10000)+blm2$`log(P)`[i]*log(2500))
}
quantile(M2,probs = c(0.025,0.5,0.975))
## 2.5% 50% 97.5%
## 47056.05 53803.23 69109.73
par(mfrow=c(1,2))
hist(M1,breaks = 20,col="blue",main = "Mass1 distribution",xlab = "Mass1, kg")
hist(M2,breaks = 20,col="green",main = "Mass2 distribution",xlab = "Mass2, kg")
par(mfrow=c(1,1))
MM<-cbind(M1,M2)
hist(MM,freq = F, breaks = 20,col="grey",main = "95% Credible M51 mass distribution",xlab = "Mass,kg")
x<-MM
curve(dnorm(x, mean=mean(MM), sd=sd(MM)), col="blue", add=TRUE, lwd=1)
abline(v=mean(MM), col="red")
quantile(MM,probs = c(0.025,0.5, 0.975))
## 2.5% 50% 97.5%
## 45965.80 54974.63 67039.54
PW<-c()
PW<-1-(1/(1+exp(logmcmc[,1]+logmcmc[,2]*2)))#Distribution of MIRV posterior probability from MCMC iterations for D=2 m
hist(PW,main = "Distribution of posterior probability of MIRV for D=2 m", xlab="Probability of MIRV",col="magenta",freq = F)
summary(PW)
##
## Iterations = 1001:31000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 30000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## 0.7682797 0.1104233 0.0006375 0.0017922
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## 0.5152 0.7016 0.7830 0.8499 0.9386
quantile(PW,probs = c(0.025,0.5, 0.975))#95% credible distribution for MIRV probability
## 2.5% 50% 97.5%
## 0.5152189 0.7830142 0.9385730
Note. As we see the results (\({Mass}_1=55153.46\),\({Mass}_2=54426.24\), \({P}_{MIRV}={0.985}\)) as well as their distributions are very close for M-51 parameters: Mass=52000, MIRV. 95% credible interval produced by Bayesian regression is \({45965}\leqslant{M}\leqslant{67039}\) with the mean \(54974\). The mean value for the \(M2=53803.23\) i.e. very close to real value of M51 mass. Binomial test for M-51 can be found here 11. Bayesian logistic regression gives 95% credible interval for D=2 m as \({0.5152}\leqslant{P}_{MIRV}\leqslant{0.9386}\) with the mean value \(M({P}_{MIRV})=0.7830\) i.e. more modest than frequentist logistic but more credible. Anyway two logistic models prove high probability of MIRV for SLBM with diameter D=2 meters.
Ohio SSBN
Here we quote some interesting facts from the book “SILVER BULLET?” by James M. Acton from Carnegie Endowment for International Peace.12
“In May 2003, the U.S. Department of Defense formally initiated efforts to acquire high precision conventional weapons capable of striking targets around the globe within “minutes or hours. In early 2012, the U.S. Department of Defense expressed an interest in developing the Sea-Launched Intermediate-Range Ballistic Missile (SLIRBM) as an option to Hypersonic Boost-Glide Weapons program. One possible way of attaining an earlier initial operational capability would be interim basing of SLIRBM in Ohio-class SSGNs.”(page 1)
“Then secretary of defense Leon Panetta indicated that, if developed, this missile would be deployed in the Virginia Payload Module—an “additional mid-body section” that the U.S. Navy wishes to add to Virginia-class submarines procured in FY 2019 and beyond. A shortened missile, if equipped with a steerable reentry vehicle weighing about 700 kg (1,500 lb.), would probably have a range of around 2,400 km (1,500 mi.) if four missiles were fitted in each tube and 3,700 km (2,300 mi.) with three missiles in each tube.” (page 50)
“According to the 2008 NRC report, the Sea-Launched Global Strike Missile was to have a diameter of 0.97 m (38 in.) and a length-to-diameter ratio of “substantially greater than 12,” probably making it similar in length to a 13.4 m (44.0 ft.) Trident D5 missile. It would, therefore, have to be shortened to fit inside a Virginia-class submarine.”(page 155)
Let’s apply data to our models both frequentist and Bayesian. Here it is.
exp(predict(fit.lm.slbm,data.frame(L=13.4, D=0.97),interval = "confidence"))
## fit lwr upr
## 1 11593.18 9389.52 14314.03
exp(predict(fit.lm.slbm.2,data.frame(R=2400, P=700),interval = "confidence"))
## fit lwr upr
## 1 14257.49 12384.53 16413.7
SB1<-c(1:1000)
for (i in 1:1000) {
SB1[i]<-exp(blm$`(Intercept)`[i]+blm$L[i]*13.4+blm$D[i]*0.97)
}
quantile(SB1,probs = c(0.025,0.5, 0.975))
## 2.5% 50% 97.5%
## 10646.00 11576.85 13235.97
SB2<-c(1:1000)
for (i in 1:1000) {
SB2[i]<-exp(blm2$`(Intercept)`[i]+blm2$`sqrt(R)`[i]*sqrt(2400)+blm2$`log(P)`[i]*log(700))
}
quantile(SB2,probs = c(0.025,0.5, 0.975))
## 2.5% 50% 97.5%
## 12569.38 14177.62 16534.93
par(mfrow=c(1,2))
hist(SB1,breaks = 20,col="blue",main = "SLIRBM Mass1 distribution",xlab = "Mass, kg")
hist(SB2,breaks = 20,col="green",main = "SLIRBM Mass2 distribution",xlab = "Mass, kg")
Note. As we see the value of Mass for SLIRBM could be somewhere in the 95% confidence interval \({9389.52}\leqslant{M_1}\leqslant{14314.03}\) for the first linear model fit.lm.slbm and in the 95% confidence interval \({12384.53}\leqslant{M_2}\leqslant{16413.7}\) for the second model fit.lm.slbm.2. Pay attention that the upper bound for Mass in the first model 14314.03 is less than the fitted value of Mass in the second model 14257.49 i.e. two 95% confidence intervals overlap each other that makes intrigue for the probable value of SLIRBM’s Mass. 95% credible intervals produced by Bayesian regression for two models are: \({10646.00}\leqslant{M_1}\leqslant{13235.97}\) and \({12569.38}\leqslant{M_2}\leqslant{16534.93}\).