We use ICBM data (http://en.wikipedia.org/wiki/Comparison_of_ICBMs) to explore the relationship between Mass (kg), Range (km) and Payload (kg) of ballistic missile. Some interesting fact will be found in the end!
library(car)
library(MASS)
library(ggplot2)
library(broom)
library(binom)
ICBM <- read.table("icbmdat.txt", header=TRUE, quote="\"")
attach(ICBM)
ICBM
## Range Mass Payload
## 1 16000 209600 20000
## 2 11200 211100 20000
## 3 10000 105600 5000
## 4 10500 45100 800
## 5 11000 47200 800
## 6 5750 49000 2000
## 7 6500 35300 1650
## 8 6500 34388 1650
## 9 9000 35300 1650
## 10 8300 40300 2800
## 11 13000 35300 510
## 12 9660 32000 1200
## 13 11300 58500 1900
## 14 5000 36000 900
## 15 6000 35000 660
## 16 10000 52000 1000
## 17 12000 183000 5000
## 18 15000 183000 4000
## 19 7000 82000 3300
## 20 8000 42000 1000
## 21 1250 12000 1000
## 22 3500 16000 1000
## 23 5000 22000 2500
## 24 4000 17000 1000
## 25 5800 50000 1500
## 26 10000 55000 1000
One can see correlations between Mass, Range and Payload. Let’s prove it significance.
scatterplotMatrix(ICBM,ellipse = TRUE)
cor(ICBM)
## Range Mass Payload
## Range 1.0000000 0.7012246 0.4796463
## Mass 0.7012246 1.0000000 0.8234138
## Payload 0.4796463 0.8234138 1.0000000
cor.test(Mass,Range)
##
## Pearson's product-moment correlation
##
## data: Mass and Range
## t = 4.8185, df = 24, p-value = 6.589e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4309193 0.8560544
## sample estimates:
## cor
## 0.7012246
cor.test(Mass,Payload)
##
## Pearson's product-moment correlation
##
## data: Mass and Payload
## t = 7.1089, df = 24, p-value = 2.389e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.6402797 0.9179763
## sample estimates:
## cor
## 0.8234138
Note. We proved significance of correlations both for \(Mass,Range\) and \(Mass,Payload\) pairs.
Now we can use lm for our research goal. But we need some sort of test to be confident that our model is relevant.
icbm.fit<-lm(Mass~Range+Payload)
summary(icbm.fit)
##
## Call:
## lm(formula = Mass ~ Range + Payload)
##
## Residuals:
## Min 1Q Median 3Q Max
## -40817 -16562 -4474 10515 80057
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -14797.376 14823.221 -0.998 0.32854
## Range 6.700 1.791 3.742 0.00107 **
## Payload 7.467 1.255 5.950 4.57e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 28050 on 23 degrees of freedom
## Multiple R-squared: 0.7998, Adjusted R-squared: 0.7824
## F-statistic: 45.96 on 2 and 23 DF, p-value: 9.241e-09
par(mfrow=c(2,2))
plot(icbm.fit)
par(mfrow=c(1,1))
qqPlot(icbm.fit)
shapiro.test(icbm.fit$res)
##
## Shapiro-Wilk normality test
##
## data: icbm.fit$res
## W = 0.87024, p-value = 0.003606
Note. We conclude icbm.fit is not relevant for our research goal due to non normal residuals. So we need some transformation using boxCox.
boxCox(icbm.fit)
icbm.fit.2<-lm(log(Mass)~sqrt(Range)+log(Payload))
summary(icbm.fit.2)
##
## Call:
## lm(formula = log(Mass) ~ sqrt(Range) + log(Payload))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.53912 -0.23756 0.01578 0.23230 0.47421
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.668222 0.504820 11.228 8.23e-11 ***
## sqrt(Range) 0.020685 0.003258 6.348 1.77e-06 ***
## log(Payload) 0.435456 0.072690 5.991 4.15e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3082 on 23 degrees of freedom
## Multiple R-squared: 0.8496, Adjusted R-squared: 0.8365
## F-statistic: 64.95 on 2 and 23 DF, p-value: 3.461e-10
par(mfrow=c(2,2))
plot(icbm.fit.2)
par(mfrow=c(1,1))
qqPlot(icbm.fit.2)
shapiro.test(icbm.fit.2$res)
##
## Shapiro-Wilk normality test
##
## data: icbm.fit.2$res
## W = 0.95722, p-value = 0.3397
tidy(anova(icbm.fit))
## term df sumsq meansq statistic p.value
## 1 Range 1 44459301501 44459301501 56.50369 1.227730e-07
## 2 Payload 1 27860040544 27860040544 35.40756 4.569196e-06
## 3 Residuals 23 18097293216 786838835 NA NA
tidy(anova(icbm.fit.2))
## term df sumsq meansq statistic p.value
## 1 sqrt(Range) 1 8.929760 8.92976013 94.01220 1.364374e-09
## 2 log(Payload) 1 3.408786 3.40878557 35.88757 4.149067e-06
## 3 Residuals 23 2.184658 0.09498512 NA NA
glance(icbm.fit)
## r.squared adj.r.squared sigma statistic p.value df logLik
## 1 0.7998455 0.7824408 28050.65 45.95563 9.240612e-09 3 -301.5845
## AIC BIC deviance df.residual
## 1 611.169 616.2014 18097293216 23
glance(icbm.fit.2)
## r.squared adj.r.squared sigma statistic p.value df logLik
## 1 0.8495747 0.8364942 0.3081966 64.94989 3.460919e-10 3 -4.696116
## AIC BIC deviance df.residual
## 1 17.39223 22.42462 2.184658 23
Note. The new model icbm.fit.2 after Box-Cox transformation is as good as relevant too: pay attention for Adjusted R-squared, R-squared, Residual standard error, F-statistic, p.value, sigma, sum of squares (sumsq), meansq, Residuals for sumsq, log-likelihood (logLik), Akaike Information Criterion(AIC), Bayesian Information Criterion (BIC), deviance.
coefficients(icbm.fit.2)
## (Intercept) sqrt(Range) log(Payload)
## 5.66822165 0.02068521 0.43545588
g<-ggplot(ICBM,aes(x=sqrt(Range),y=log(Mass)))
g<-g+geom_point()
g<-g+geom_smooth(method="lm")
g
g2<-ggplot(ICBM,aes(x=log(Payload),y=log(Mass)))
g2<-g2+geom_point()
g2<-g2+geom_smooth(method="lm")
g2
Let’s prove it. We use V-2 data for \(Range\)=320 km and \(Payload\)=1000 kg and two our linear models: icbm.fit and icbm.fit.2.
predict(icbm.fit,data.frame(Range=320,Payload=1000),interval="confidence")
## fit lwr upr
## 1 -5186.383 -35440.53 25067.77
exp(predict(icbm.fit.2,data.frame(Range=320,Payload=1000),interval="confidence"))
## fit lwr upr
## 1 8486.844 5288.784 13618.73
As we see icbm.fit is not good due to the result we got for Mass. On the contrary icbm.fit.2 produced very close result we need: with 95% confidence V-2 Mass is between 5280 kg and 13618 kg. See V-2 Weight=12,500 kg here: https://en.wikipedia.org/wiki/V-2_rocket.
\[{Mass} = e^{5.66 + {0.021*sqrt(Range)} + 0.435*log(Payload)}\]
Using the formula above we can estimate the weight of North Korean submarine launched KN-11 missile (see https://en.wikipedia.org/wiki/KN-11). Here is a quote from Reuters: “North Korea fired a submarine-launched ballistic missile (SLBM) on Wednesday (24.8.2016) which flew about 500 km (300 miles) towards Japan.” (see http://www.reuters.com/article/us-northkorea-missiles-idUSKCN10Z2TX). Suppose Payload=500 kg.
exp(predict(icbm.fit.2,data.frame(Range=500,Payload=500),interval="confidence"))
## fit lwr upr
## 1 6883.927 4463.102 10617.83
We can be 95% confident that North Korean submarine launched KN-11 missile Mass is between 4463 kg and 10618 kg i.e. less than 11000 kg.
Anyway the current reliability of KN-11 is not appropriate for permanent use:
binom.test(6,11,0.8)
##
## Exact binomial test
##
## data: 6 and 11
## number of successes = 6, number of trials = 11, p-value = 0.05041
## alternative hypothesis: true probability of success is not equal to 0.8
## 95 percent confidence interval:
## 0.2337936 0.8325119
## sample estimates:
## probability of success
## 0.5454545
hpd <- binom.bayes(6, 11, type = "highest", conf.level = 0.95, tol = 1e-9)
print(hpd)
## method x n shape1 shape2 mean lower upper sig
## 1 bayes 6 11 6.5 5.5 0.5416667 0.274871 0.8049937 0.05
binom.bayes.densityplot(hpd)
We come to very interesting conclusion: modern missiles have the same relationship between Mass, Range and Payload as the old V-2! It means Wernher von Braun made some sort of technical miracle that is actual 70 years after. This relationship makes possible to deliver payloads to the Earth orbits as well as to the Moon and other planets of our solar system.