Multiple linear regression for V-2 and KN-11

Research goal
R packages
Data
Correlations
Linear model test
Model after transformation
V-2 case
KN-11 case
Conclusions

Research goal

We use ICBM data (http://en.wikipedia.org/wiki/Comparison_of_ICBMs) to explore the relationship between Mass (kg), Range (km) and Payload (kg) of ballistic missile. Some interesting fact will be found in the end!

R packages

library(car)
library(MASS)
library(ggplot2)
library(broom)
library(binom)

V-2

Data

ICBM <- read.table("icbmdat.txt", header=TRUE, quote="\"")
attach(ICBM)
ICBM

##    Range   Mass Payload
## 1  16000 209600   20000
## 2  11200 211100   20000
## 3  10000 105600    5000
## 4  10500  45100     800
## 5  11000  47200     800
## 6   5750  49000    2000
## 7   6500  35300    1650
## 8   6500  34388    1650
## 9   9000  35300    1650
## 10  8300  40300    2800
## 11 13000  35300     510
## 12  9660  32000    1200
## 13 11300  58500    1900
## 14  5000  36000     900
## 15  6000  35000     660
## 16 10000  52000    1000
## 17 12000 183000    5000
## 18 15000 183000    4000
## 19  7000  82000    3300
## 20  8000  42000    1000
## 21  1250  12000    1000
## 22  3500  16000    1000
## 23  5000  22000    2500
## 24  4000  17000    1000
## 25  5800  50000    1500
## 26 10000  55000    1000

Correlations

One can see correlations between Mass, Range and Payload. Let’s prove it significance.

scatterplotMatrix(ICBM,ellipse = TRUE)

cor(ICBM)

##             Range      Mass   Payload
## Range   1.0000000 0.7012246 0.4796463
## Mass    0.7012246 1.0000000 0.8234138
## Payload 0.4796463 0.8234138 1.0000000

cor.test(Mass,Range)

## 
##  Pearson's product-moment correlation
## 
## data:  Mass and Range
## t = 4.8185, df = 24, p-value = 6.589e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4309193 0.8560544
## sample estimates:
##       cor 
## 0.7012246

cor.test(Mass,Payload)

## 
##  Pearson's product-moment correlation
## 
## data:  Mass and Payload
## t = 7.1089, df = 24, p-value = 2.389e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.6402797 0.9179763
## sample estimates:
##       cor 
## 0.8234138

Note. We proved significance of correlations both for \(Mass,Range\) and \(Mass,Payload\) pairs.

Linear model test

Now we can use lm for our research goal. But we need some sort of test to be confident that our model is relevant.

icbm.fit<-lm(Mass~Range+Payload)
summary(icbm.fit)

## 
## Call:
## lm(formula = Mass ~ Range + Payload)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -40817 -16562  -4474  10515  80057 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -14797.376  14823.221  -0.998  0.32854    
## Range            6.700      1.791   3.742  0.00107 ** 
## Payload          7.467      1.255   5.950 4.57e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 28050 on 23 degrees of freedom
## Multiple R-squared:  0.7998, Adjusted R-squared:  0.7824 
## F-statistic: 45.96 on 2 and 23 DF,  p-value: 9.241e-09

par(mfrow=c(2,2))
plot(icbm.fit)

par(mfrow=c(1,1))
qqPlot(icbm.fit)

shapiro.test(icbm.fit$res)

## 
##  Shapiro-Wilk normality test
## 
## data:  icbm.fit$res
## W = 0.87024, p-value = 0.003606

Note. We conclude icbm.fit is not relevant for our research goal due to non normal residuals. So we need some transformation using boxCox.

Model after transformation

boxCox(icbm.fit)

icbm.fit.2<-lm(log(Mass)~sqrt(Range)+log(Payload))
summary(icbm.fit.2)

## 
## Call:
## lm(formula = log(Mass) ~ sqrt(Range) + log(Payload))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.53912 -0.23756  0.01578  0.23230  0.47421 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.668222   0.504820  11.228 8.23e-11 ***
## sqrt(Range)  0.020685   0.003258   6.348 1.77e-06 ***
## log(Payload) 0.435456   0.072690   5.991 4.15e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3082 on 23 degrees of freedom
## Multiple R-squared:  0.8496, Adjusted R-squared:  0.8365 
## F-statistic: 64.95 on 2 and 23 DF,  p-value: 3.461e-10

par(mfrow=c(2,2))
plot(icbm.fit.2)

par(mfrow=c(1,1))
qqPlot(icbm.fit.2)

shapiro.test(icbm.fit.2$res)

## 
##  Shapiro-Wilk normality test
## 
## data:  icbm.fit.2$res
## W = 0.95722, p-value = 0.3397

tidy(anova(icbm.fit))

##        term df       sumsq      meansq statistic      p.value
## 1     Range  1 44459301501 44459301501  56.50369 1.227730e-07
## 2   Payload  1 27860040544 27860040544  35.40756 4.569196e-06
## 3 Residuals 23 18097293216   786838835        NA           NA

tidy(anova(icbm.fit.2))

##           term df    sumsq     meansq statistic      p.value
## 1  sqrt(Range)  1 8.929760 8.92976013  94.01220 1.364374e-09
## 2 log(Payload)  1 3.408786 3.40878557  35.88757 4.149067e-06
## 3    Residuals 23 2.184658 0.09498512        NA           NA

glance(icbm.fit)

##   r.squared adj.r.squared    sigma statistic      p.value df    logLik
## 1 0.7998455     0.7824408 28050.65  45.95563 9.240612e-09  3 -301.5845
##       AIC      BIC    deviance df.residual
## 1 611.169 616.2014 18097293216          23

glance(icbm.fit.2)

##   r.squared adj.r.squared     sigma statistic      p.value df    logLik
## 1 0.8495747     0.8364942 0.3081966  64.94989 3.460919e-10  3 -4.696116
##        AIC      BIC deviance df.residual
## 1 17.39223 22.42462 2.184658          23

Note. The new model icbm.fit.2 after Box-Cox transformation is as good as relevant too: pay attention for Adjusted R-squared, R-squared, Residual standard error, F-statistic, p.value, sigma, sum of squares (sumsq), meansq, Residuals for sumsq, log-likelihood (logLik), Akaike Information Criterion(AIC), Bayesian Information Criterion (BIC), deviance.

coefficients(icbm.fit.2)

##  (Intercept)  sqrt(Range) log(Payload) 
##   5.66822165   0.02068521   0.43545588

g<-ggplot(ICBM,aes(x=sqrt(Range),y=log(Mass)))
g<-g+geom_point()
g<-g+geom_smooth(method="lm")
g

g2<-ggplot(ICBM,aes(x=log(Payload),y=log(Mass)))
g2<-g2+geom_point()
g2<-g2+geom_smooth(method="lm")
g2

V-2 case

Let’s prove it. We use V-2 data for \(Range\)=320 km and \(Payload\)=1000 kg and two our linear models: icbm.fit and icbm.fit.2.

predict(icbm.fit,data.frame(Range=320,Payload=1000),interval="confidence")

##         fit       lwr      upr
## 1 -5186.383 -35440.53 25067.77

exp(predict(icbm.fit.2,data.frame(Range=320,Payload=1000),interval="confidence"))

##        fit      lwr      upr
## 1 8486.844 5288.784 13618.73

As we see icbm.fit is not good due to the result we got for Mass. On the contrary icbm.fit.2 produced very close result we need: with 95% confidence V-2 Mass is between 5280 kg and 13618 kg. See V-2 Weight=12,500 kg here: https://en.wikipedia.org/wiki/V-2_rocket.

\[{Mass} = e^{5.66 + {0.021*sqrt(Range)} + 0.435*log(Payload)}\]

KN-11 case

Using the formula above we can estimate the weight of North Korean submarine launched KN-11 missile (see https://en.wikipedia.org/wiki/KN-11). Here is a quote from Reuters: “North Korea fired a submarine-launched ballistic missile (SLBM) on Wednesday (24.8.2016) which flew about 500 km (300 miles) towards Japan.” (see http://www.reuters.com/article/us-northkorea-missiles-idUSKCN10Z2TX). Suppose Payload=500 kg.

exp(predict(icbm.fit.2,data.frame(Range=500,Payload=500),interval="confidence"))

##        fit      lwr      upr
## 1 6883.927 4463.102 10617.83

We can be 95% confident that North Korean submarine launched KN-11 missile Mass is between 4463 kg and 10618 kg i.e. less than 11000 kg.

Anyway the current reliability of KN-11 is not appropriate for permanent use:

binom.test(6,11,0.8)

## 
##  Exact binomial test
## 
## data:  6 and 11
## number of successes = 6, number of trials = 11, p-value = 0.05041
## alternative hypothesis: true probability of success is not equal to 0.8
## 95 percent confidence interval:
##  0.2337936 0.8325119
## sample estimates:
## probability of success 
##              0.5454545

hpd <- binom.bayes(6, 11, type = "highest", conf.level = 0.95, tol = 1e-9)
print(hpd)

##   method x  n shape1 shape2      mean    lower     upper  sig
## 1  bayes 6 11    6.5    5.5 0.5416667 0.274871 0.8049937 0.05

binom.bayes.densityplot(hpd)

Conclusions

We come to very interesting conclusion: modern missiles have the same relationship between Mass, Range and Payload as the old V-2! It means Wernher von Braun made some sort of technical miracle that is actual 70 years after. This relationship makes possible to deliver payloads to the Earth orbits as well as to the Moon and other planets of our solar system.