NASA Space Rockets

1 Top 10 List

Recently I came across some interesting presentation “Top 10 Tallest and Powerful Rockets Ever Launched” https://www.youtube.com/watch?v=WmVHC96OCmU. Though this top list is questionable anyhow one can apply linear regression for the data in the presentation.

2 Rockets Data

Here is the data: Thrust,kN; Mass, tons; Height, meters.

library(DT)
library(dplyr)
library(ggplot2)
ten_rockets <- read.csv("ten_rockets", comment.char="#")
attach(ten_rockets)
#ten_rockets %>% datatable
ten_rockets

##         Name Height   Mass Thrust
## 1    Shuttle  56.14 2030.0  35000
## 2    Atlas-V  58.30  334.5  10600
## 3   Ariane-4  58.72  470.0  12120
## 4    Delta-4  62.50  404.6   3560
## 5  Angara-A5  64.00  790.0   7680
## 6   Falcon-H  70.00 1420.0  16000
## 7   Delta-4H  72.00  733.0   6200
## 8     Ares-1  94.20  900.0  15000
## 9        N-1 105.00 2750.0  45000
## 10  Saturn-V 110.60 2970.0  35100

str(ten_rockets)

## 'data.frame':    10 obs. of  4 variables:
##  $ Name  : Factor w/ 10 levels "Angara-A5","Ares-1",..: 10 4 3 5 1 7 6 2 8 9
##  $ Height: num  56.1 58.3 58.7 62.5 64 ...
##  $ Mass  : num  2030 334 470 405 790 ...
##  $ Thrust: int  35000 10600 12120 3560 7680 16000 6200 15000 45000 35100

2.1 Correlation Matrix

round(cor(ten_rockets[,2:4]),3)

##        Height  Mass Thrust
## Height  1.000 0.719  0.605
## Mass    0.719 1.000  0.933
## Thrust  0.605 0.933  1.000

2.2 Correlation Charts

ggplot(ten_rockets,aes(x = Mass, y = Thrust)) +
    geom_point(data = ten_rockets, colour = "darkseagreen4", size = 3) +
      geom_text(aes(label=Name),hjust=0, vjust=0)+
    stat_smooth(method = "lm", col = "red",data =ten_rockets) +
    scale_fill_brewer(palette = "Greys") + 
      xlab(label = "Mass, tons") + ylab(label="Thrust, kN") + 
      ggtitle("Top 10 Powerfull Space Rockets")+ theme_bw()+
      theme(plot.title = element_text(hjust = .5))

    ggplot(ten_rockets,aes(x = Height, y = Thrust)) +
      geom_point(data = ten_rockets, colour = "darkseagreen4", size = 3) +
      geom_text(aes(label=Name),hjust=0, vjust=0)+
      stat_smooth(method = "lm", col = "red",data =ten_rockets) +
      scale_fill_brewer(palette = "Greys") + 
      xlab(label = "Height, m") + ylab(label="Thrust, kN") + 
      ggtitle("Top 10 Powerful Space Rockets")+ theme_bw()+
      theme(plot.title = element_text(hjust = .5))

    ggplot(ten_rockets,aes(x = Height, y = Mass)) +
      geom_point(data = ten_rockets, colour = "darkseagreen4", size = 3) +
      geom_text(aes(label=Name),hjust=0, vjust=0)+
      stat_smooth(method = "lm", col = "red",data =ten_rockets) +
      scale_fill_brewer(palette = "Greys") + 
      xlab(label = "Height, m") + ylab(label="Mass, tons") + 
      ggtitle("Top 10 Powerful Space Rockets")+ theme_bw()+
      theme(plot.title = element_text(hjust = .5))

3 Linear Regression Model

3.1 Frequentist case

3.1.1 Fitting Best Model

fit_r <- lm(Thrust~Mass+Height,ten_rockets) # full model
summary(fit_r)

## 
## Call:
## lm(formula = Thrust ~ Mass + Height, data = ten_rockets)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5762.2 -4539.1    83.2  4154.1  6937.9 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 6435.207   8039.682   0.800 0.449769    
## Mass          15.182      2.776   5.469 0.000937 ***
## Height       -96.420    133.002  -0.725 0.491997    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5657 on 7 degrees of freedom
## Multiple R-squared:  0.8797, Adjusted R-squared:  0.8454 
## F-statistic: 25.61 on 2 and 7 DF,  p-value: 0.000603

fit_r.2 <- lm(Thrust~Mass,ten_rockets) # reduced model
summary(fit_r.2)

## 
## Call:
## lm(formula = Thrust ~ Mass, data = ten_rockets)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6735.1 -4462.8  -721.7  4878.0  6186.6 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1042.421   2957.865   0.352    0.734    
## Mass          13.735      1.871   7.340 8.07e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5487 on 8 degrees of freedom
## Multiple R-squared:  0.8707, Adjusted R-squared:  0.8546 
## F-statistic: 53.88 on 1 and 8 DF,  p-value: 8.068e-05

anova(fit_r,fit_r.2) # comparing models

## Analysis of Variance Table
## 
## Model 1: Thrust ~ Mass + Height
## Model 2: Thrust ~ Mass
##   Res.Df       RSS Df Sum of Sq      F Pr(>F)
## 1      7 224038556                           
## 2      8 240859246 -1 -16820690 0.5256  0.492

AIC(fit_r,fit_r.2)

##         df      AIC
## fit_r    4 205.6262
## fit_r.2  3 204.3502

BIC(fit_r,fit_r.2)

##         df      BIC
## fit_r    4 206.8365
## fit_r.2  3 205.2579

# log transformation
fit_r.3 <- lm(log(Thrust)~ log(Mass),ten_rockets) # reduced log model
summary(fit_r.3)

## 
## Call:
## lm(formula = log(Thrust) ~ log(Mass), data = ten_rockets)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.59335 -0.36011  0.03895  0.26630  0.66547 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   3.4789     1.3643   2.550  0.03418 * 
## log(Mass)     0.8816     0.1971   4.474  0.00207 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4654 on 8 degrees of freedom
## Multiple R-squared:  0.7144, Adjusted R-squared:  0.6787 
## F-statistic: 20.01 on 1 and 8 DF,  p-value: 0.002073

fit_r.4 <- lm(log(Thrust)~ log(Mass) + log(Height),ten_rockets) # full log model
summary(fit_r.4)

## 
## Call:
## lm(formula = log(Thrust) ~ log(Mass) + log(Height), data = ten_rockets)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.58711 -0.37846  0.08124  0.26022  0.66599 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   4.0311     2.8421   1.418   0.1990  
## log(Mass)     0.9227     0.2779   3.321   0.0127 *
## log(Height)  -0.1948     0.8616  -0.226   0.8276  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4957 on 7 degrees of freedom
## Multiple R-squared:  0.7165, Adjusted R-squared:  0.6355 
## F-statistic: 8.845 on 2 and 7 DF,  p-value: 0.01213

anova(fit_r.3,fit_r.4) # comparing log models

## Analysis of Variance Table
## 
## Model 1: log(Thrust) ~ log(Mass)
## Model 2: log(Thrust) ~ log(Mass) + log(Height)
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1      8 1.7327                           
## 2      7 1.7202  1  0.012557 0.0511 0.8276

AIC(fit_r.3,fit_r.4)

##         df      AIC
## fit_r.3  3 16.84980
## fit_r.4  4 18.77707

BIC(fit_r.3,fit_r.4)

##         df      BIC
## fit_r.3  3 17.75755
## fit_r.4  4 19.98741

NOTE: When comparing models fitted by maximum likelihood to the same data, the smaller the AIC or BIC, the better the fit.

3.1.2 Predicting with Model

Here we apply our linear model fit_r.3 as log(Thrust) ~ log(Mass) for V2, MX and RS28 missiles to find Thrust,kN. For more about missiles see: V2 - https://rpubs.com/alex-lev/68593, MX - https://rpubs.com/alex-lev/488966, RS28 - https://en.wikipedia.org/wiki/RS-28_Sarmat

3.1.2.1 V2 Thrust

V2 <- predict(fit_r.3,newdata=data.frame(Mass=12.5),type="response",se.fit=T)
V2.mean <- exp(V2$fit)
V2.se <- exp(V2$se.fit)

V2.mean.up <- V2.mean+2*V2.se
V2.mean.lo <- V2.mean-2*V2.se

V2.mean

##        1 
## 300.5233

V2.mean.up

##        1 
## 305.3026

V2.mean.lo

##       1 
## 295.744

3.1.2.2 MX Thrust

MX <- predict(fit_r.3,newdata=data.frame(Mass=96),type="response",se.fit=T)
MX.mean <- exp(MX$fit)
MX.se <- exp(MX$se.fit)

MX.mean.up <- MX.mean+2*MX.se
MX.mean.lo <- MX.mean-2*MX.se

MX.mean

##       1 
## 1812.92

MX.mean.up

##        1 
## 1816.153

MX.mean.lo

##        1 
## 1809.688

3.1.2.3 RS28 Thrust

RS28 <- predict(fit_r.3,newdata=data.frame(Mass=208.1),type="response",se.fit=T)
RS28.mean <- exp(RS28$fit)
RS28.se <- exp(RS28$se.fit)

RS28.mean.up <- RS28.mean+2*RS28.se
RS28.mean.lo <- RS28.mean-2*RS28.se

RS28.mean

##        1 
## 3585.785

RS28.mean.up

##       1 
## 3588.59

RS28.mean.lo

##       1 
## 3582.98

3.2 Bayesian case

3.2.1 Fitting Best Model

library(rstan)
library(rstanarm)
library(bayesplot)
library(rstantools)
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)
fit_rs <- stan_glm(log(Thrust)~log(Mass),ten_rockets, family = "gaussian",
                   prior = student_t(7,0),
                   prior_intercept = student_t(7,0))# reduced log model
summary(fit_rs)

## 
## Model Info:
## 
##  function:     stan_glm
##  family:       gaussian [identity]
##  formula:      log(Thrust) ~ log(Mass)
##  algorithm:    sampling
##  priors:       see help('prior_summary')
##  sample:       4000 (posterior sample size)
##  observations: 10
##  predictors:   2
## 
## Estimates:
##                 mean   sd    2.5%   25%   50%   75%   97.5%
## (Intercept)     3.5    1.5   0.6    2.6   3.5   4.5   6.6  
## log(Mass)       0.9    0.2   0.4    0.7   0.9   1.0   1.3  
## sigma           0.5    0.1   0.3    0.4   0.5   0.6   0.9  
## mean_PPD        9.5    0.2   9.1    9.4   9.5   9.7  10.0  
## log-posterior -13.0    1.3 -16.4  -13.5 -12.7 -12.0 -11.5  
## 
## Diagnostics:
##               mcse Rhat n_eff
## (Intercept)   0.0  1.0  2958 
## log(Mass)     0.0  1.0  2992 
## sigma         0.0  1.0  1975 
## mean_PPD      0.0  1.0  3697 
## log-posterior 0.0  1.0  1256 
## 
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

plot(fit_rs)

mcmc_pairs(fit_rs)

mcmc_trace(fit_rs)

pp_check(fit_rs, plotfun = "stat", stat = "mean")

pp_check(fit_rs, plotfun = "dens_overlay")

fit_rs.2 <- stan_glm(log(Thrust)~log(Mass)+log(Height),ten_rockets, 
                     family = "gaussian",prior = student_t(7,0),
                     prior_intercept = student_t(7,0))# full log model
summary(fit_rs.2)

## 
## Model Info:
## 
##  function:     stan_glm
##  family:       gaussian [identity]
##  formula:      log(Thrust) ~ log(Mass) + log(Height)
##  algorithm:    sampling
##  priors:       see help('prior_summary')
##  sample:       4000 (posterior sample size)
##  observations: 10
##  predictors:   3
## 
## Estimates:
##                 mean   sd    2.5%   25%   50%   75%   97.5%
## (Intercept)     4.0    3.4  -2.7    2.0   4.0   6.0  10.9  
## log(Mass)       0.9    0.3   0.3    0.7   0.9   1.1   1.5  
## log(Height)    -0.2    1.0  -2.2   -0.8  -0.2   0.4   1.9  
## sigma           0.6    0.2   0.3    0.4   0.5   0.6   1.0  
## mean_PPD        9.5    0.3   9.0    9.4   9.5   9.7  10.1  
## log-posterior -14.7    1.7 -19.0  -15.5 -14.3 -13.4 -12.6  
## 
## Diagnostics:
##               mcse Rhat n_eff
## (Intercept)   0.1  1.0  2686 
## log(Mass)     0.0  1.0  2215 
## log(Height)   0.0  1.0  1859 
## sigma         0.0  1.0  1633 
## mean_PPD      0.0  1.0  2896 
## log-posterior 0.0  1.0  1186 
## 
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

plot(fit_rs.2)

mcmc_pairs(fit_rs.2)

mcmc_trace(fit_rs.2)

pp_check(fit_rs.2, plotfun = "stat", stat = "mean")

pp_check(fit_rs.2, plotfun = "dens_overlay")

3.2.2 Predicting with models

Here we apply both models for predicting posterior samples.

3.2.2.1 V2 Thrust

library(rstantools)

## This is rstantools version 1.5.1

res_pred_v2 <- posterior_predict(fit_rs.2,newdata = data.frame(Mass=12.5,Height=14),
                              draws = 3000,seed=12345)

## Registered S3 method overwritten by 'xts':
##   method     from
##   as.zoo.xts zoo

quantile(exp(res_pred_v2),probs = 0.5)

##      50% 
## 342.6689

res_pred_v2_2 <- posterior_predict(fit_rs,newdata = data.frame(Mass=12.5),
                                 draws = 3000,seed = 12345)

quantile(exp(res_pred_v2_2),probs = 0.5)

##      50% 
## 309.5671

3.2.2.2 MX Thrust

res_pred_mx_2 <- posterior_predict(fit_rs.2,newdata = data.frame(Mass=96,Height=22),
                              draws = 3000,seed=12345)

quantile(exp(res_pred_mx_2),probs = 0.5)

##      50% 
## 2026.561

res_pred_mx <- posterior_predict(fit_rs,newdata = data.frame(Mass=96),
                                 draws = 3000,seed=12345)

quantile(exp(res_pred_mx),probs = 0.5)

##      50% 
## 1854.822

3.2.2.3 RS28 Thrust

res_pred_RS28_2 <- posterior_predict(fit_rs.2,newdata = data.frame(Mass=208.1,Height=35.5), draws = 3000,seed=12345)

quantile(exp(res_pred_RS28_2),probs = 0.5)

##      50% 
## 3794.158

res_pred_RS28 <- posterior_predict(fit_rs,newdata = data.frame(Mass=208.1),
                                 draws = 3000,seed=12345)

quantile(exp(res_pred_RS28),probs = 0.5)

##     50% 
## 3642.81

3.3 Frequentist versus Bayesian

Here we compare both cases as regard of produced results out of linear models.

3.3.1 Frequentist

3.3.1.1 V2

\(295.7\le M[T]_{V2,freq} \le305.3\) True mean thrust for V2 with 95% confidence

3.3.1.2 MX

\(1809.6\le M[T]_{MX,freq} \le1816.1\) True mean thrust for MX with 95% confidence

3.3.1.3 RS28

\(3583.0\le M[T]_{RS28,freq} \le3588.6\) True mean thrust for RS28 with 95% confidence

3.3.2 Bayesian

3.3.2.1 V2

\(303.7\le M[T]_{V2,bayes}\le 330\) Mean thrust for V2 by two Bayesian models, 95% credible

3.3.2.2 MX

\(1830.64\le M[T]_{MX,freq} \le2030.5\) Mean thrust for MX by two Bayesian models, 95% credible

3.3.2.3 RS28

\(1213.13\le M[T]_{RS28,freq} \le10704.7\) Mean thrust for RS28 by two Bayesian models, 95% credible

3.3.3 True historical values

3.3.3.1 V2

The V-2 was 14 meters (47 feet) long, weighed 12,700–13,200 kg (28,000–29,000 pounds) at launching, and developed about 60,000 pounds of thrust, burning alcohol and liquid oxygen. https://www.britannica.com/technology/V-2-missile

V2_THRUST=60000*0.45*9.8/1000
V2_THRUST

## [1] 264.6

3.3.3.2 MX

Three-stage solid-fuel rocket. First stage: 500,000 lbf (2.2 MN thrust) Thiokol SR118 solid-fuel rocket motor https://en.wikipedia.org/wiki/LGM-118_Peacekeeper

MX_THRUST=500000*0.45*9.8/1000
MX_THRUST

## [1] 2205

3.3.3.3 RS28

The RS-28 Sarmat is a Russian liquid-fueled, MIRV-equipped, heavy thermonuclear ICBM, mass 208.1 tons, length 35.5 meters. https://en.wikipedia.org/wiki/RS-28_Sarmat

No information is currently available about real test flights except ejection tests.

For some operational range estimations see: https://rpubs.com/alex-lev/376345

As we can see RS28 has mass twice than that of the MX, so the real thrust for the first stage could be twice too, that is about 4000 kN that makes our estimates reliable.

4 Conclusions

Both frequentist and Bayesian models can be applied for linear regression.
Bayesian models are more robust with few data.
Priors should be chosen carefully for Bayesian models as well as transformations for frequentist models.

Top 10 Space Rockets. Linear regression

Alexander Levakov, Senior Research Fellow, Ph.D

February, 2020