Terminal High Altitude Area Defense interceptor
Terminal High Altitude Area Defense (THAAD) is an American anti-ballistic missile defense system designed to shoot down short-, medium-, and intermediate-range ballistic missiles in their terminal phase (descent or reentry) by intercepting with a hit-to-kill approach. THAAD was developed after the experience of Iraq’s Scud missile attacks during the Gulf War in 1991. The THAAD interceptor carries no warhead, but relies on its kinetic energy of impact to destroy the incoming missile. A kinetic energy hit minimizes the risk of exploding conventional-warhead ballistic missiles, and the warhead of nuclear-tipped ballistic missiles will not detonate upon a kinetic-energy hit.
For more see:
https://en.wikipedia.org/wiki/Terminal_High_Altitude_Area_Defense
We can import the THAAD test events from the tables in the Wikipedia article, coding each result as a logical value (SUCCESS=TRUE, FAILURE=FALSE). This gives two data sets of THAAD test events: one for the Demonstration and Validation (DV) phase and one for the Engineering and Manufacturing (EM) phase.
library(rvest)
## Loading required package: xml2
library(htmlTable)
library(DT)
url <- "https://en.wikipedia.org/wiki/Terminal_High_Altitude_Area_Defense"
#Demonstration and validation phase
thaad_test <- url %>%
read_html()%>%
html_nodes(xpath='//*[@id="mw-content-text"]/div/table[2]') %>%
html_table()
# Keep the date and result columns; any result other than "Failure" is coded as TRUE
thaad_test_data <- data.frame(cbind(thaad_test[[1]]$Date,thaad_test[[1]]$Result))
thaad_test_data[,2] <- ifelse(thaad_test[[1]]$Result=="Failure",FALSE,TRUE)
names(thaad_test_data)=c("Date","Result")
datatable(thaad_test_data,caption = "Demonstration and validation")
#print("Demonstration and validation")
#print(thaad_test_data)
#Engineering and manufacturing phase
thaad_test_2 <- url %>%
read_html()%>%
html_nodes(xpath='//*[@id="mw-content-text"]/div/table[3]') %>%
html_table()
# Keep the date and result columns; only "Success" is coded as TRUE, anything else as FALSE
thaad_test_data_2 <- data.frame(cbind(thaad_test_2[[1]]$Date,thaad_test_2[[1]]$Result))
thaad_test_data_2[,2] <- ifelse(thaad_test_2[[1]]$Result=="Success",TRUE,FALSE)
names(thaad_test_data_2)=c("Date","Result")
datatable(thaad_test_data_2,caption = "Engineering and manufacturing")
#print("Engineering and manufacturing")
#print(thaad_test_data_2)
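The two phases are coded with slightly different rules: the first treats everything except "Failure" as TRUE, the second treats only "Success" as TRUE. The sketch below shows how the two rules diverge on a label that is neither. The vector `results` is hypothetical and is not taken from the Wikipedia tables.

```r
# Hypothetical result labels, for illustration only (not scraped data)
results <- c("Success", "Failure", "Aborted")

# Rule used for the Demonstration and Validation table:
# everything that is not "Failure" becomes TRUE
rule_dv <- ifelse(results == "Failure", FALSE, TRUE)

# Rule used for the Engineering and Manufacturing table:
# only "Success" becomes TRUE
rule_em <- ifelse(results == "Success", TRUE, FALSE)

comparison <- data.frame(results, rule_dv, rule_em)
print(comparison)

# Labels on which the two rules disagree
disagree <- results[rule_dv != rule_em]
print(disagree)
```

If a scraped table may contain such labels, inspecting `table(thaad_test[[1]]$Result)` before choosing a rule is a cheap check.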
summary(thaad_test_data$Result)
## Mode FALSE TRUE
## logical 6 5
NS <- sum(thaad_test_data$Result)
NALL <-length(thaad_test_data$Result)
prop_success <- NS/NALL
prop_success
## [1] 0.4545455
The observed success rate in the Demonstration and Validation phase is 45.5% (5 successes in 11 tests).
summary(thaad_test_data_2$Result)
## Mode FALSE TRUE
## logical 4 13
NS2 <- sum(thaad_test_data_2$Result)
NALL2 <-length(thaad_test_data_2$Result)
prop_success2 <- NS2/NALL2
prop_success2
## [1] 0.7647059
The observed success rate in the Engineering and Manufacturing phase is 76.5% (13 successes in 17 tests). Can the current data tell us anything more? Let’s see, by testing each phase against a hypothesized success probability of 0.85.
binom.test(NS,NALL,0.85)
##
## Exact binomial test
##
## data: NS and NALL
## number of successes = 5, number of trials = 11, p-value = 0.002657
## alternative hypothesis: true probability of success is not equal to 0.85
## 95 percent confidence interval:
## 0.1674881 0.7662064
## sample estimates:
## probability of success
## 0.4545455
binom.test(NS2,NALL2,0.85)
##
## Exact binomial test
##
## data: NS2 and NALL2
## number of successes = 13, number of trials = 17, p-value = 0.3075
## alternative hypothesis: true probability of success is not equal to 0.85
## 95 percent confidence interval:
## 0.5010067 0.9318923
## sample estimates:
## probability of success
## 0.7647059
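One can see that THAAD performed much better during the Engineering and Manufacturing phase than during the Demonstration and Validation phase: the exact binomial test rejects a success probability of 0.85 for the first phase (p-value = 0.002657) but not for the second (p-value = 0.3075).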
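The 95 percent intervals printed by `binom.test` are exact (Clopper-Pearson) intervals, which can be written in terms of Beta quantiles. A minimal sketch, using only the success counts reported above; the helper name `clopper_pearson` is ours, not part of any package.

```r
# Exact (Clopper-Pearson) confidence interval from Beta quantiles.
# x = number of successes, n = number of trials.
clopper_pearson <- function(x, n, conf_level = 0.95) {
  alpha <- (1 - conf_level) / 2
  lower <- if (x == 0) 0 else qbeta(alpha, x, n - x + 1)
  upper <- if (x == n) 1 else qbeta(1 - alpha, x + 1, n - x)
  c(lower = lower, upper = upper)
}

ci_dv <- clopper_pearson(5, 11)   # Demonstration and Validation: 5 of 11
ci_em <- clopper_pearson(13, 17)  # Engineering and Manufacturing: 13 of 17
print(ci_dv)
print(ci_em)

# Cross-check against the interval computed by binom.test itself
print(all.equal(unname(ci_dv), as.numeric(binom.test(5, 11)$conf.int)))
print(all.equal(unname(ci_em), as.numeric(binom.test(13, 17)$conf.int)))
```

The two intervals overlap between roughly 0.50 and 0.77, a reminder that both success rates are estimated from a small number of tests.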
We now apply the THAAD test data (both phases) to model the proportion of success as a function of the number of test events observed so far, using a Beta prior combined with a Bernoulli likelihood.
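For reference, the conjugate update behind such a model can be written out. Here \(s\) and \(f\) denote the numbers of successes and failures among the first \(n\) tests, and \(a\), \(b\) are the prior parameters; the symbols are ours, chosen to match `prior_prop = c(a, b)`.

```latex
\begin{aligned}
\theta &\sim \mathrm{Beta}(a, b), \qquad
y_i \mid \theta \sim \mathrm{Bernoulli}(\theta), \quad i = 1, \dots, n, \\
p(\theta \mid y_{1:n}) &\propto \theta^{s}(1-\theta)^{f}\,\theta^{a-1}(1-\theta)^{b-1}
  = \theta^{a+s-1}(1-\theta)^{b+f-1}, \\
\theta \mid y_{1:n} &\sim \mathrm{Beta}(a+s,\; b+f), \qquad
\mathbb{E}[\theta \mid y_{1:n}] = \frac{a+s}{a+b+n}, \quad n = s + f.
\end{aligned}
```

Each new test therefore adds one to the first parameter on a success and one to the second on a failure.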
#The prop_model function - Rasmus Bååth R code
# This function takes a number of successes and failures coded as a TRUE/FALSE
# or 0/1 vector. This should be given as the data argument.
# The result is a visualization of how a Beta-Binomial
# model gradually learns the underlying proportion of successes
# using this data. The function also returns a sample from the
# posterior distribution that can be further manipulated and inspected.
# The default prior is a Beta(1,1) distribution, but this can be set using the
# prior_prop argument.
# Make sure the packages tidyverse and ggridges are installed, otherwise run:
# install.packages(c("tidyverse", "ggridges"))
# Example usage:
# data <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE)
# prop_model(data)
prop_model <- function(data = c(), prior_prop = c(1, 1), n_draws = 10000,
gr_name="Proportion graph") {
library(tidyverse)
data <- as.logical(data)
# data_indices decides what densities to plot between the prior and the posterior
# For fewer than 40 data points we're plotting all of them.
data_indices <- round(seq(0, length(data), length.out = min(length(data) + 1, 40)))
# dens_curves will be a data frame with the x & y coordinates for the
# densities to plot where x = proportion_success and y = probability
proportion_success <- c(0, seq(0, 1, length.out = 100), 1)
dens_curves <- map_dfr(data_indices, function(i) {
value <- ifelse(i == 0, "Prior", ifelse(data[i], "Success", "Failure"))
label <- paste0("n=", i)
probability <- dbeta(proportion_success,
prior_prop[1] + sum(data[seq_len(i)]),
prior_prop[2] + sum(!data[seq_len(i)]))
probability <- probability / max(probability)
tibble(value, label, proportion_success, probability)
})
# Turning label and value into factors with the right ordering for the plot
dens_curves$label <- fct_rev(factor(dens_curves$label, levels = paste0("n=", data_indices )))
dens_curves$value <- factor(dens_curves$value, levels = c("Prior", "Success", "Failure"))
graph_label <- paste("Prior distribution Beta(a =",
as.character(prior_prop[1]),", b =",
as.character(prior_prop[2]),")")
p <- ggplot(dens_curves, aes(x = proportion_success, y = label,
height = probability, fill = value)) +
ggridges::geom_density_ridges(stat="identity", color = "white", alpha = 0.8,
panel_scaling = TRUE, size = 1) +
scale_y_discrete("", expand = c(0.01, 0)) +
scale_x_continuous("Proportion of success") +
scale_fill_manual(values = hcl(120 * 2:0 + 15, 100, 65), name = "", drop = FALSE,
labels = c("Prior ", "Success ", "Failure ")) +
ggtitle(paste0(gr_name, ": ", sum(data), " successes, ", sum(!data), " failures"),
subtitle = graph_label) +
labs(caption = "based on Rasmus Bååth R code") +
theme_light() +
theme(legend.position = "top")
print(p)
# Returning a sample from the posterior distribution that can be further
# manipulated and inspected
posterior_sample <- rbeta(n_draws, prior_prop[1] + sum(data), prior_prop[2] + sum(!data))
invisible(posterior_sample)
}
prop_model(data = thaad_test_data$Result,prior_prop = c(1,1),gr_name = "Demonstration and validation")
prop_model(data = thaad_test_data$Result,prior_prop = c(2,2),gr_name = "Demonstration and validation")
prop_model(data = thaad_test_data_2$Result,prior_prop = c(1,1),gr_name = "Engineering and manufacturing")
prop_model(data = thaad_test_data_2$Result,prior_prop = c(2,2),gr_name = "Engineering and manufacturing")
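The ridgelines drawn by `prop_model` are Beta densities whose parameters grow by one with each test. A minimal sketch of that bookkeeping, using the example vector from the function's own comment rather than the THAAD data; the helper name `beta_path` is ours.

```r
# Parameters of the Beta posterior after each observation,
# starting from the prior Beta(a, b) at n = 0.
beta_path <- function(data, prior_prop = c(1, 1)) {
  data <- as.logical(data)
  a <- prior_prop[1] + c(0, cumsum(data))    # prior a + successes so far
  b <- prior_prop[2] + c(0, cumsum(!data))   # prior b + failures so far
  data.frame(n = seq(0, length(data)), a = a, b = b,
             post_mean = a / (a + b))
}

example_data <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE)
path <- beta_path(example_data)
print(path)

# Final posteriors for the two phases under a Beta(2, 2) prior, from the
# counts reported above: 5 successes / 6 failures and 13 successes / 4 failures.
cred_dv <- qbeta(c(0.025, 0.975), 2 + 5, 2 + 6)    # 95% equal-tailed interval
cred_em <- qbeta(c(0.025, 0.975), 2 + 13, 2 + 4)
print(cred_dv)
print(cred_em)
```

Because `pp` in the Stan model further down is drawn from the same Beta distributions, these intervals should be close to the 2.5% and 97.5% quantiles printed for `pp[11]` and `pp[17]`.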
The corresponding Stan model, stored in the string stan_code that is passed to stan() below, is:
##
## data {
## int<lower=0> N;
## int<lower=0,upper=1> y[N];
##
## real a;
## real b;
## }
##
## transformed data{
## int<lower=0,upper=1> yy[N];
## for (k in 1:N)
## yy[k]=abs(1-y[k]);
## }
##
##
## parameters {
## real<lower=0,upper=1> theta[N];
## }
## model {
## for (j in 1:N){
## theta[j] ~ beta(a+sum(y[1:j]),b+sum(yy[1:j]));
## for (n in 1:j) //
## y[j] ~ bernoulli(theta[j]);
## }
## }
##
## generated quantities {
## real<lower=0,upper=1> pp[N];
## for (j in 1:N)
## pp[j]=beta_rng(a+sum(y[1:j]),b+sum(yy[1:j]));
## }
library(rstan) # provides stan(); stan_code must hold the Stan model text printed above
y <- as.integer(thaad_test_data$Result)
N <- length(y)
data_list <- list(N=N,y=y,a=2,b=2)
fit.ber <- stan(model_code = stan_code,data = data_list,chains = 2,verbose = FALSE,
iter = 5000,warmup = 1000,seed=12345)
rstan::traceplot(fit.ber)
s1 <- as.array(fit.ber)
pp_par <- paste("pp[",1:NALL,"]",sep = "")
theta_par <- paste("theta[",1:NALL,"]",sep = "")
bayesplot::mcmc_intervals(x = s1,pars = pp_par)
bayesplot::mcmc_intervals(x = s1,pars = theta_par)
y2 <- as.integer(thaad_test_data_2$Result)
N2 <- length(y2)
data_list <- list(N=N2,y=y2,a=2,b=2)
fit.ber.2 <- stan(model_code = stan_code,data = data_list,chains = 2,verbose = FALSE,
iter = 5000,warmup = 1000,seed=12345)
rstan::traceplot(fit.ber.2)
s2 <- as.array(fit.ber.2)
pp_par_2 <- paste("pp[",1:NALL2,"]",sep = "")
theta_par_2 <- paste("theta[",1:NALL2,"]",sep = "")
bayesplot::mcmc_intervals(x = s2,pars = pp_par_2)
bayesplot::mcmc_intervals(x = s2,pars = theta_par_2)
last_par_pp<- paste("pp[",NALL,"]",sep = "")
last_par_theta<- paste("theta[",NALL,"]",sep = "")
bayesplot::mcmc_hist(x = s1,pars = c(last_par_pp,last_par_theta))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
plot(fit.ber,pars = c(last_par_pp,last_par_theta))
## ci_level: 0.8 (80% intervals)
## outer_level: 0.95 (95% intervals)
print(fit.ber,pars = c(last_par_pp,last_par_theta))
## Inference for Stan model: 8110aaf154812cced3e43a9d64740f79.
## 2 chains, each with iter=5000; warmup=1000; thin=1;
## post-warmup draws per chain=4000, total post-warmup draws=8000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## pp[11] 0.47 0 0.12 0.23 0.38 0.46 0.55 0.71 8000 1
## theta[11] 0.69 0 0.09 0.51 0.63 0.70 0.76 0.85 8000 1
##
## Samples were drawn using NUTS(diag_e) at Tue Mar 19 16:26:59 2019.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
last_par_pp_2<- paste("pp[",NALL2,"]",sep = "")
last_par_theta_2<- paste("theta[",NALL2,"]",sep = "")
bayesplot::mcmc_hist(x = s2,pars = c(last_par_pp_2,last_par_theta_2))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
plot(fit.ber.2,pars = c(last_par_pp_2,last_par_theta_2))
## ci_level: 0.8 (80% intervals)
## outer_level: 0.95 (95% intervals)
print(fit.ber.2,pars = c(last_par_pp_2,last_par_theta_2))
## Inference for Stan model: 8110aaf154812cced3e43a9d64740f79.
## 2 chains, each with iter=5000; warmup=1000; thin=1;
## post-warmup draws per chain=4000, total post-warmup draws=8000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## pp[17] 0.71 0 0.10 0.50 0.65 0.72 0.78 0.89 8000 1
## theta[17] 0.84 0 0.06 0.72 0.81 0.85 0.88 0.94 8000 1
##
## Samples were drawn using NUTS(diag_e) at Tue Mar 19 16:27:20 2019.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
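The posterior means printed above can be cross-checked in closed form. In the Stan model, `pp[j]` is drawn from Beta(a + successes, b + failures), while `theta[j]` has that same Beta distribution as its prior and, because of the inner `for (n in 1:j)` loop, also receives the Bernoulli likelihood of `y[j]` j times. A minimal sketch, assuming the last test in each phase was coded as a success (an assumption, since the individual outcomes are not listed here); the helper name `closed_form_means` is ours.

```r
# Closed-form posterior means implied by the Stan model for the last index n.
# Assumption: last_is_success states how the final test y[n] was coded.
closed_form_means <- function(successes, failures, last_is_success,
                              a = 2, b = 2) {
  n <- successes + failures
  # pp[n]: direct draw from the conjugate Beta posterior
  pp_a <- a + successes
  pp_b <- b + failures
  # theta[n]: the same Beta as prior, plus y[n] counted n more times
  th_a <- pp_a + if (last_is_success) n else 0
  th_b <- pp_b + if (last_is_success) 0 else n
  c(pp = pp_a / (pp_a + pp_b), theta = th_a / (th_a + th_b))
}

means_dv <- closed_form_means(5, 6, last_is_success = TRUE)
means_em <- closed_form_means(13, 4, last_is_success = TRUE)
print(round(means_dv, 2))  # compare with the means of pp[11] and theta[11]
print(round(means_em, 2))  # compare with the means of pp[17] and theta[17]
```

Under this assumption the rounded values agree with the printed means, which suggests that the gap between `theta` and `pp` comes from the data entering `theta` twice rather than from sampling noise.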
Here is the moment of truth: we see a marked difference between the two posterior samples. \(pp\) is drawn directly from the Beta distribution updated with the observed successes and failures, so it is similar to the proportion model above. \(theta\) uses that same data-updated Beta distribution as its prior and then also conditions on the Bernoulli outcomes in the model block, which moves it away from \(pp\). Both samples start from the symmetric, bell-shaped \(Beta(2,2)\) prior. Now we are ready to make the final conclusion about THAAD reliability.