U-576

U-576

U-boat

During World War II, U-boat warfare was the major component of the Battle of the Atlantic, which lasted the duration of the war. Prime Minister Winston Churchill wrote “The only thing that really frightened me during the war was the U-boat peril.”

See https://en.wikipedia.org/wiki/U-boat#World_War_II

Data

Here we can get some interesting data about German submarine losses during 1943 - climax of the Battle for Atlantic: http://uboat.net/fates/losses/1943.htm. Let’s do it!

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(BEST)
load(file = "ub1943.dat")
ub1943
## # A tibble: 236 <U+00D7> 7
##    Month Uboat  Plus  Crew  Dead  Surv    Cause
##    <chr> <chr> <int> <int> <int> <int>    <chr>
## 1    Jan U-337     1    47    47     0  Missing
## 2    Jan U-164     0    56    54     2 Aircraft
## 3    Jan U-224     0    46    45     1  Warship
## 4    Jan U-507     1    54    54     0 Aircraft
## 5    Jan U-553     1    47    47     0  Missing
## 6    Jan U-301     0    46    45     1  Warship
## 7    Jan U-519     1    50    50     0  Missing
## 8    Feb U-265     1    46    46     0 Aircraft
## 9    Feb U-187     0    54     9    45  Warship
## 10   Feb U-609     1    47    47     0  Warship
## # ... with 226 more rows

Note: \(Plus=1\) means that u-boat crew was lost for 100%: \(Crew - Dead=0\)

Research goal

We want to compare Bayesian t-test and frequentist t-test for the mean values of survived members of the crews by month. We have null hypthesis \(H_0|M1-M2=0\) and alternative - \(H_1|M1-M2{\neq}0\)

Filtering data for analysis

#AIRCRAFT
#Aircraft absolute values of the survived members by month
by_month_air_0<-filter(ub1943,Plus==0, Cause=="Aircraft") %>% group_by(Month) %>% summarise(min=min(Surv), mean=mean(Surv), max=max(Surv),sd=sd(Surv), var=sd/mean)


#WARSHIP
#Warship absolute values of the survived members by month
by_month_ship_0<-filter(ub1943,Plus==0, Cause=="Warship") %>% group_by(Month) %>% summarise(min=min(Surv), mean=mean(Surv), max=max(Surv),sd=sd(Surv), var=sd/mean)

Frequentist t-test

#Warships and aircraft
#Normality test
shapiro.test(by_month_ship_0$mean)
## 
##  Shapiro-Wilk normality test
## 
## data:  by_month_ship_0$mean
## W = 0.98097, p-value = 0.9713
shapiro.test(by_month_air_0$mean)
## 
##  Shapiro-Wilk normality test
## 
## data:  by_month_air_0$mean
## W = 0.90676, p-value = 0.1939
#Variance equality test
var.test(by_month_ship_0$mean, by_month_air_0$mean)
## 
##  F test to compare two variances
## 
## data:  by_month_ship_0$mean and by_month_air_0$mean
## F = 0.49941, num df = 10, denom df = 11, p-value = 0.2841
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.1416498 1.8302967
## sample estimates:
## ratio of variances 
##          0.4994106
#Student test for mean values 
t.test(by_month_ship_0$mean, by_month_air_0$mean)
## 
##  Welch Two Sample t-test
## 
## data:  by_month_ship_0$mean and by_month_air_0$mean
## t = -0.93669, df = 19.79, p-value = 0.3602
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -17.963811   6.835432
## sample estimates:
## mean of x mean of y 
##  22.90909  28.47328

Bayesian t-test

#Warships and Aircraft means
set.seed(12345)
y1<-c()
y2<-c()
y1 <- by_month_ship_0$mean
y2 <- by_month_air_0$mean
BESTout <- BESTmcmc(y1, y2)
## 
## Processing function input....... 
## 
## Done. 
##  
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 23
##    Unobserved stochastic nodes: 5
##    Total graph size: 69
## 
## Initializing model
## 
## Adaptive phase, 500 iterations x 3 chains 
## If no progress bar appears JAGS has decided not to adapt 
##  
## 
##  Burn-in phase, 1000 iterations x 3 chains 
##  
## 
## Sampling from joint posterior, 33334 iterations x 3 chains 
##  
## 
## MCMC took 0.866 minutes.
summary(BESTout)
##             mean median   mode HDI%   HDIlo  HDIup compVal %>compVal
## mu1       22.929 22.931 22.993   95  14.706 31.381                  
## mu2       28.585 28.592 28.500   95  17.333 39.900                  
## muDiff    -5.655 -5.660 -5.321   95 -19.606  8.161       0      20.2
## sigma1    12.959 12.296 11.037   95   6.972 20.420                  
## sigma2    18.460 17.653 15.720   95  10.690 28.034                  
## sigmaDiff -5.500 -5.261 -4.501   95 -17.986  6.199       0      15.2
## nu        36.081 27.589 11.325   95   1.479 95.522                  
## log10nu    1.415  1.441  1.501   95   0.687  2.099                  
## effSz     -0.364 -0.361 -0.321   95  -1.231  0.482       0      20.2
plot(BESTout, "mean")

Conclusions

So what these math, tables and diagrams are all about? Do we see something interesting?

  1. Frequentist t-test proves null hypothesis about equality of two means with p-value = 0.3602, i.e. we can’t reject null hypothesis. If we reject null there is 36% chance that it is true - error type I. So we can’t do it.
  2. Frequentist t-test gives 95% confident interval for the difference of two means as \(-17.963811 < M1 - M2 < 6.835432\)
  3. Bayesian t-test gives us high density 95% credible interval for the difference of two means as \(-19.4 < M1 - M2 < 8.5\).
  4. Bayes gives us more - there is 20% chance that true difference would be greater than zero \(M1-M2>0\), and 80% - less than zero $M1-M2<0.
  5. As frequentists we accept null hypothesis about equality of two means, as bayesians we believe that 80% of two means true difference is less than zero - warships make less survivors in u-boats than aircraft with 80% probability. That’s it.