Fidel Castro and Nikita Khrushchev, 1961
In a previous post we discussed linear and logistic regression for SLBM data: https://rpubs.com/alex-lev/209272. What else can we do with linear regression on the SLBM data? Here is an interesting historical case, the Cuban Missile Crisis. See https://en.wikipedia.org/wiki/Cuban_Missile_Crisis.
R-12 Dvina
The R-12 was a theatre ballistic missile developed and deployed by the Soviet Union during the Cold War. The R-12 provided the capability to attack targets at medium ranges with a megaton-class thermonuclear warhead and constituted the bulk of the Soviet offensive missile threat to Western Europe. Deployments of the R-12 missile in Cuba caused the Cuban Missile Crisis in 1962. See https://en.wikipedia.org/wiki/SS-4_Sandal.
Imagine it is October 1962 and suppose we have obtained reliable intelligence about the R-12: S, the number of stages (1), D, the diameter (1.65 m), and M, the launch weight (41.7 t, i.e. 41700 kg): https://en.wikipedia.org/wiki/SS-4_Sandal.
Could we estimate the potential range of the R-12 for President J. F. Kennedy using the SLBM data?
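We can keep that intelligence in a small helper data frame (a minimal sketch; the name r12 is introduced here for convenience and is not used by the code below, with the weight in kilograms to match the M column of the SLBM data):
r12 <- data.frame(S = 1, D = 1.65, M = 41700) # reported R-12 characteristics: 1 stage, 1.65 m diameter, 41700 kg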
load(file = "slbm.dat") # load the SLBM data set
attach(slbm) # make its columns directly accessible
slbm # print the data
## S M L D R P W
## 1 1 5400 10.40 0.58 150 975 1
## 2 1 13700 11.80 1.30 650 1597 1
## 3 1 19650 14.20 1.30 1420 1179 1
## 4 1 14200 8.89 1.50 2400 650 1
## 5 1 14200 8.89 1.50 3000 650 1
## 6 1 14200 9.65 1.50 3000 650 2
## 7 2 33300 13.00 1.80 7800 1100 1
## 8 2 33300 13.00 1.80 9100 1100 1
## 9 2 26900 10.60 1.54 3900 450 1
## 10 2 35300 14.10 1.80 6500 1600 2
## 11 2 35300 14.10 1.80 8000 1600 1
## 12 2 35300 14.10 1.80 6500 1600 2
## 13 3 90000 16.00 2.40 8300 2550 2
## 14 3 40300 14.80 1.90 8300 2300 2
## 15 3 40800 14.80 1.90 8300 2800 2
## 16 3 36800 11.50 2.00 9300 1150 2
## 17 2 13600 9.45 1.37 2800 500 1
## 18 2 16200 9.86 1.37 4630 760 2
## 19 2 29485 10.36 1.88 5600 2000 2
## 20 3 32000 10.36 1.88 7400 1360 2
## 21 3 57500 13.42 2.11 11000 2880 2
## 22 2 20000 10.70 1.50 3000 1360 1
## 23 2 19500 10.70 1.50 3200 1360 1
## 24 2 19950 10.40 1.50 3200 1000 1
## 25 2 14700 10.70 1.40 2500 600 1
## 26 2 23000 13.00 2.00 8000 700 2
Here is the model for \(R\), the range of the missile: \(R = f(S, D, M) + E\), where \(E\) is the error of the estimate. In the code below we fit a linear model on the log scale, \(\log R = B_0 + B_1 S + B_2 D + B_3 M + E\), and exponentiate the predictions to get back to kilometres.
library(car)
fit.slbm<-lm(log(R)~S + D + M, data = slbm) #linear regression model
summary(fit.slbm) # summary of the model
##
## Call:
## lm(formula = log(R) ~ S + D + M, data = slbm)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.82063 -0.23408 0.00621 0.18996 0.67479
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.442e+00 4.336e-01 7.938 6.72e-08 ***
## S 3.220e-01 1.547e-01 2.082 0.04922 *
## D 2.984e+00 3.766e-01 7.922 6.94e-08 ***
## M -2.517e-05 7.535e-06 -3.340 0.00296 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3461 on 22 degrees of freedom
## Multiple R-squared: 0.8829, Adjusted R-squared: 0.867
## F-statistic: 55.32 on 3 and 22 DF, p-value: 2.066e-10
par(mfrow = c(2, 2)) # arrange the diagnostic plots in a 2x2 grid
plot(fit.slbm) # standard regression diagnostic plots
shapiro.test(fit.slbm$residuals) #test for normality of the residuals
##
## Shapiro-Wilk normality test
##
## data: fit.slbm$residuals
## W = 0.98117, p-value = 0.8978
leveragePlots(fit.slbm) # leverage plots
exp(predict(fit.slbm, data.frame(S = 1, D = 1.65, M = 41700), interval = "conf", level = 0.95)) # predicted range with 95% confidence interval, back-transformed from the log scale
## fit lwr upr
## 1 2073.87 1316.593 3266.716
Not bad! The true value of \(R\) lies within the 95% confidence interval \({1316.593}\leqslant{R}\leqslant{3266.716}\), and the fitted value \(2073.87\) is very close to the actual one, \(R = 2080\) km.
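Note that interval = "conf" gives the confidence interval for the mean response; for a single new missile one could also quote the wider prediction interval, which adds the residual variance of an individual observation (a quick sketch using the same fitted model, not part of the original analysis):
exp(predict(fit.slbm, data.frame(S = 1, D = 1.65, M = 41700), interval = "prediction", level = 0.95)) # wider interval for an individual missile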
President Kennedy and Secretary of Defense McNamara
But what if we want the whole probability distribution of \(R\), to judge the possible deviations? In other words, we want a credible interval for \(R\). Remember that in 1962 JFK had to make one of the most urgent decisions for all mankind based on intelligence!
We can try a Bayesian take on the linear regression (via the Bayesian bootstrap), converting confidence into credibility.
library(bayesboot) # Bayesian bootstrap
set.seed(12345) # for reproducibility
lm.coefs <- function(d, w) { # statistic for bayesboot: coefficients of the weighted regression
  coef( lm(log(R) ~ S + D + M, data = d, weights = w) )
}
blm <- bayesboot(slbm, lm.coefs, R = 10000, use.weights = TRUE) # 10000 posterior draws of the coefficients
plot(blm)
summary(blm)
## Bayesian bootstrap
##
## Number of posterior draws: 10000
##
## Summary of the posterior (with 95% Highest Density Intervals):
## statistic mean sd hdi.low hdi.high
## (Intercept) 3.567428e+00 5.150899e-01 2.703757e+00 4.665412e+00
## S 3.123405e-01 1.323728e-01 5.670599e-02 5.774338e-01
## D 2.880673e+00 4.071315e-01 2.012368e+00 3.542198e+00
## M -2.224895e-05 7.933479e-06 -3.617302e-05 -5.798076e-06
##
## Quantiles:
## statistic q2.5% q25% median q75%
## (Intercept) 2.775375e+00 3.196509e+00 3.479382e+00 3.873703e+00
## S 7.635106e-02 2.237353e-01 3.029292e-01 3.905000e-01
## D 2.002252e+00 2.617766e+00 2.931485e+00 3.177034e+00
## M -3.523914e-05 -2.793057e-05 -2.321088e-05 -1.758096e-05
## q97.5%
## 4.754934e+00
## 6.042734e-01
## 3.539850e+00
## -4.053415e-06
##
## Call:
## bayesboot(data = slbm, statistic = lm.coefs, R = 10000, use.weights = TRUE)
# Posterior draws of the regression coefficients
B0 <- blm$`(Intercept)`
B1 <- blm$S
B2 <- blm$D
B3 <- blm$M
# Posterior distribution of the predicted range for S = 1, D = 1.65 m, M = 41700 kg
R <- exp(B0 + B1*1 + B2*1.65 + B3*41700)
summary(R)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 811.7 1930.0 2210.0 2275.0 2538.0 5598.0
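To report an explicit 95% credible interval for \(R\), we can take equal-tailed quantiles of the posterior draws (a minimal sketch; the result can be compared directly with the 95% confidence interval obtained above):
quantile(R, probs = c(0.025, 0.975)) # 95% equal-tailed credible interval for the predicted range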
hist(R, breaks = 50, freq = FALSE, col = "lightblue") # posterior distribution of the predicted range
curve(dnorm(x, mean = mean(R), sd = sd(R)), add = TRUE) # normal approximation for comparison
abline(v = mean(R), col = "blue") # posterior mean
abline(v = 2080, col = "red") # actual R-12 range
We see some interesting results for the Bayesian linear model \({R} = e^{{B_0} + {B_1} S + {B_2} D + {B_3} M}\). We have obtained 95% credible intervals for the regression coefficients \({B_0}, {B_1}, {B_2}, {B_3}\) as well as their full posterior distributions. In fact the Bayesian model produces results very close to the frequentist one, while being more robust.