Fidel Castro and Nikita Khrushchev, 1961

Fidel Castro and Nikita Khrushchev, 1961

Research goal

In previous topic we discussed linear and logistic regression for SLBM: https://rpubs.com/alex-lev/209272. So what else can we do applying linear regression for SLBM data? Here is some interesting historical case - Cuban missile crisis. See https://en.wikipedia.org/wiki/Cuban_Missile_Crisis.

R-12

R-12 Dvina

R-12 Dvina

The R-12 was a theatre ballistic missile developed and deployed by the Soviet Union during the Cold War. The R-12 provided the capability to attack targets at medium ranges with a megaton-class thermonuclear warhead and constituted the bulk of the Soviet offensive missile threat to Western Europe. Deployments of the R-12 missile in Cuba caused the Cuban Missile Crisis in 1962. See https://en.wikipedia.org/wiki/SS-4_Sandal.

SLBM Data

Imagine we are in October 1962 and suppose we have got some confident intelligence information about R-12: S - number of stages (1), D - diameter (1.65 m) and M - weight (41.7 ton): https://en.wikipedia.org/wiki/SS-4_Sandal.

Could we estimate for president J.F.Kennedy potential range of R-12 using SLBM data?

load(file = "slbm.dat")
attach(slbm)
slbm
##    S     M     L    D     R    P W
## 1  1  5400 10.40 0.58   150  975 1
## 2  1 13700 11.80 1.30   650 1597 1
## 3  1 19650 14.20 1.30  1420 1179 1
## 4  1 14200  8.89 1.50  2400  650 1
## 5  1 14200  8.89 1.50  3000  650 1
## 6  1 14200  9.65 1.50  3000  650 2
## 7  2 33300 13.00 1.80  7800 1100 1
## 8  2 33300 13.00 1.80  9100 1100 1
## 9  2 26900 10.60 1.54  3900  450 1
## 10 2 35300 14.10 1.80  6500 1600 2
## 11 2 35300 14.10 1.80  8000 1600 1
## 12 2 35300 14.10 1.80  6500 1600 2
## 13 3 90000 16.00 2.40  8300 2550 2
## 14 3 40300 14.80 1.90  8300 2300 2
## 15 3 40800 14.80 1.90  8300 2800 2
## 16 3 36800 11.50 2.00  9300 1150 2
## 17 2 13600  9.45 1.37  2800  500 1
## 18 2 16200  9.86 1.37  4630  760 2
## 19 2 29485 10.36 1.88  5600 2000 2
## 20 3 32000 10.36 1.88  7400 1360 2
## 21 3 57500 13.42 2.11 11000 2880 2
## 22 2 20000 10.70 1.50  3000 1360 1
## 23 2 19500 10.70 1.50  3200 1360 1
## 24 2 19950 10.40 1.50  3200 1000 1
## 25 2 14700 10.70 1.40  2500  600 1
## 26 2 23000 13.00 2.00  8000  700 2

Linear regression model estimate

Here is the model for \(R\) - range of missile: \(R=f(S,D,M) + E\), where \(E\) - error of estimate.

library(car)
fit.slbm<-lm(log(R)~S + D + M, data = slbm) #linear regression model
summary(fit.slbm) # summary of the model
## 
## Call:
## lm(formula = log(R) ~ S + D + M, data = slbm)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.82063 -0.23408  0.00621  0.18996  0.67479 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.442e+00  4.336e-01   7.938 6.72e-08 ***
## S            3.220e-01  1.547e-01   2.082  0.04922 *  
## D            2.984e+00  3.766e-01   7.922 6.94e-08 ***
## M           -2.517e-05  7.535e-06  -3.340  0.00296 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3461 on 22 degrees of freedom
## Multiple R-squared:  0.8829, Adjusted R-squared:  0.867 
## F-statistic: 55.32 on 3 and 22 DF,  p-value: 2.066e-10
par(mfrow=c(2,2))
plot(fit.slbm) #main plots

shapiro.test(fit.slbm$residuals) #test for normality of the residuals
## 
##  Shapiro-Wilk normality test
## 
## data:  fit.slbm$residuals
## W = 0.98117, p-value = 0.8978
leveragePlots(fit.slbm) # leverage plots

exp(predict(fit.slbm,data.frame(S=1,D=1.65,M=41700),interval = "conf",level = 0.95))
##       fit      lwr      upr
## 1 2073.87 1316.593 3266.716

Not bad! The true value for \(R\) is lying in the 95% confident interval \({1316.593}\leqslant{R}\leqslant{3266.716}\) with the mean value \(2073.87\) that is very close to the original one - \(R=2080\).

Bayesian linear regression model estimate

President Kennedy and Secretary of Defense McNamara

President Kennedy and Secretary of Defense McNamara

But what if we want to produce the probability distribution of \(P(R)\) to be sure about possible deviations? In other words we want to have credible interval for \(R\). Remember that JFK in 1962 had to take the most urgent decision for the whole mankind based on intelligence!

We can try Bayesian linear regression for our data converting confidence into credibility.

library(bayesboot)

set.seed(12345)
lm.coefs <- function(d, w) {
  coef( lm(log(R)~S + D + M, data = d, weights = w) )
}
blm <- bayesboot(slbm, lm.coefs, R = 10000, use.weights = TRUE)
plot(blm)

summary(blm)
## Bayesian bootstrap
## 
## Number of posterior draws: 10000 
## 
## Summary of the posterior (with 95% Highest Density Intervals):
##    statistic          mean           sd       hdi.low      hdi.high
##  (Intercept)  3.567428e+00 5.150899e-01  2.703757e+00  4.665412e+00
##            S  3.123405e-01 1.323728e-01  5.670599e-02  5.774338e-01
##            D  2.880673e+00 4.071315e-01  2.012368e+00  3.542198e+00
##            M -2.224895e-05 7.933479e-06 -3.617302e-05 -5.798076e-06
## 
## Quantiles:
##    statistic         q2.5%          q25%        median          q75%
##  (Intercept)  2.775375e+00  3.196509e+00  3.479382e+00  3.873703e+00
##            S  7.635106e-02  2.237353e-01  3.029292e-01  3.905000e-01
##            D  2.002252e+00  2.617766e+00  2.931485e+00  3.177034e+00
##            M -3.523914e-05 -2.793057e-05 -2.321088e-05 -1.758096e-05
##         q97.5%
##   4.754934e+00
##   6.042734e-01
##   3.539850e+00
##  -4.053415e-06
## 
## Call:
##  bayesboot(data = slbm, statistic = lm.coefs, R = 10000, use.weights = TRUE)
B0<-c()
B1<-c()
B2<-c()
B3<-c()
R<-c()

for (i in 1:10000) {
  B0[i]<-blm$`(Intercept)`[i]
  B1[i]<-blm$S[i]
  B2[i]<-blm$D[i]
  B3[i]<-blm$M[i]
  R[i]<-exp(B0[i] + B1[i]*1 + B2[i]*1.65 + B3[i]*41700)
}


summary(R)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   811.7  1930.0  2210.0  2275.0  2538.0  5598.0
hist(R,breaks = 50,freq = F,col="lightblue")
curve(dnorm(x,mean = mean(R),sd = sd(R)),add = T)
abline(v = mean(R),col="blue")
abline(v = 2080,col="red")

We see some interesting results for Bayesian linear model \({R} = e^{{B_0} + {B_1}*L + {B_2}*D+ {B_3}*M}\). We have got 95% credible intervals for linear regression coefficients \({B_0}, {B_1}, {B_2}, {B_3}\) as well as distributions too. In fact Bayesian models produce results very close to frequentest models. Anyway these results are more robust.

Conclusions

  1. Applying linear regression on SLBM data we have got very interesting model for the range of the missile \({R} = e^{{3.442} + {0.322}{L} + {2.984}{D} - {0.00002517}{M}}\).
  2. We have got point and interval estimates for R12 range: 95% confident interval \({1316.593}\leqslant{R}\leqslant{3266.716}\) with the mean value \(2073.87\) that is very close to the original one - \(R=2080\).
  3. Applying Bayesian linear model \({R} = e^{{B_0} + {B_1}*L + {B_2}*D+ {B_3}*M}\) we have got 95% credible intervals for linear regression coefficients \({B_0}, {B_1}, {B_2}, {B_3}\) as well as distributions too.
  4. Applying Bayesian linear model we have got 95% credible interval for R as \({1930.0}\leqslant{R}\leqslant{2538.0}\) with the mean value \(2275.0\).
  5. In fact Bayesian models produce results very close to frequentest models. Anyway these results are more robust.