The German Tank Problem

A Shiny Solution

What is the German Tank Problem?

Formally, the problem of estimating the maximum of a discrete uniform distribution from a sample drawn without replacement.
Named after its use in WW2: the Allies wanted to estimate the total number of German tanks from the serial numbers of captured tanks.
The reasoning relies on the mediocrity principle: it is very unlikely that a random sample of serial numbers would all be clustered at the beginning or the end of the range.
It has also been used to estimate iPod and Commodore 64 production, and the same idea can be applied to randomly observed user IDs to estimate traffic to websites, among other things.
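To make the setup concrete, here is a small simulation sketch (N = 100 and k = 4 are arbitrary illustrative values): it captures k serial numbers at random from 1..N and averages the sample maximum. Because captured serials tend to be spread across the range rather than bunched at the top, the maximum on its own falls short of N on average; its expected value is k(N + 1)/(k + 1).

```r
# Illustrative setup: serial numbers 1..N, k of them captured at random
# (sample.int samples without replacement by default).
set.seed(7)
N <- 100
k <- 4
maxima <- replicate(100000, max(sample.int(N, k)))
mean(maxima)            # simulated average of the sample maximum
k * (N + 1) / (k + 1)   # theoretical expected maximum, below N
```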

Frequentist Approach

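obs<-c(2,6,7,14)   # observed serial numbers
m<-max(obs)        # sample maximum
k<-length(obs)     # number of observations
# Sample maximum plus the average gap between observations, m/k - 1
freqN<-m+(m/k)-1
freqN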

[1] 16.5
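One way to see why the m/k - 1 correction is added is to check by simulation that the estimator averages out to the true total. This is a minimal sketch; `freq_estimate` is a hypothetical helper name and the true total N = 250 is an arbitrary choice.

```r
# Hypothetical helper: the frequentist estimate m + m/k - 1
freq_estimate <- function(obs) {
  m <- max(obs)
  k <- length(obs)
  m + m / k - 1
}

set.seed(42)
N <- 250   # hypothetical true number of tanks
k <- 4     # serial numbers captured per repetition
# Each repetition captures k distinct serials from 1..N and estimates N
estimates <- replicate(50000, freq_estimate(sample.int(N, k)))
mean(estimates)  # should land close to N
```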

# Two-sided 95% interval, treating m/N as continuous with CDF x^k on [0,1]
lowconfinv<-m/(0.975^(1/k))
highconfinv<-m/(0.025^(1/k))
paste0("[",format(lowconfinv,digits=5),",",
       format(highconfinv,digits=5),"]")

[1] "[14.089,35.208]"

Gives a point estimate (the minimum-variance unbiased estimator) together with a 95% confidence interval.
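The bounds above are consistent with treating m/N as a continuous variable with CDF x^k on [0, 1], which gives equal-tailed limits m/(1 - a/2)^(1/k) and m/(a/2)^(1/k) for a confidence level of 1 - a. The sketch below wraps that in a hypothetical `tank_ci` helper for any level and checks its coverage by simulation; N, k and the number of repetitions are arbitrary choices.

```r
# Hypothetical helper generalising the interval above to any confidence level
tank_ci <- function(obs, level = 0.95) {
  m <- max(obs)
  k <- length(obs)
  alpha <- 1 - level
  c(lower = m / (1 - alpha / 2)^(1 / k),
    upper = m / (alpha / 2)^(1 / k))
}

tank_ci(c(2, 6, 7, 14))  # reproduces the 95% interval above

# Coverage check: how often does the interval contain the true N?
set.seed(1)
N <- 100; k <- 4; reps <- 20000
covered <- replicate(reps, {
  ci <- tank_ci(sample.int(N, k))
  ci[["lower"]] <= N && N <= ci[["upper"]]
})
mean(covered)
```

Because serial numbers are discrete, the coverage need not equal the nominal 95%. For N = 100 and k = 4 it works out to about 0.94: whenever the sample happens to contain serial number N itself (probability k/N = 0.04), the lower bound exceeds N.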

Bayesian Approach

obs<-c(2,6,7,14)
m<-max(obs)
k<-length(obs)
# Closed-form posterior mean and standard deviation of N
bayesMean<-(m-1)*((k-1)/(k-2))
bayesSD<-sqrt(((k-1)*(m-1)*(m-k+1))
              /((k-3)*((k-2)^2)))
paste0(format(bayesMean,digits=5),"±",
       format(bayesSD,digits=5))

[1] "19.5±10.356"
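The closed forms above are only finite when there are enough observations: the posterior mean needs k > 2 and the standard deviation k > 3. A minimal sketch of a guarded helper (the name `bayes_tank` is hypothetical) that reuses the same formulas:

```r
# Hypothetical helper: posterior mean and SD of N from the closed forms above
bayes_tank <- function(obs) {
  m <- max(obs)
  k <- length(obs)
  # With k <= 2 the posterior mean is infinite, so refuse to compute it
  if (k <= 2) stop("the posterior mean needs at least 3 observations")
  post_mean <- (m - 1) * (k - 1) / (k - 2)
  # With k == 3 the posterior variance is infinite
  post_sd <- if (k > 3) {
    sqrt((k - 1) * (m - 1) * (m - k + 1) / ((k - 3) * (k - 2)^2))
  } else {
    Inf
  }
  c(mean = post_mean, sd = post_sd)
}

bayes_tank(c(2, 6, 7, 14))
```

For the four serial numbers above it returns a mean of 19.5 and a standard deviation of about 10.36, matching the slide.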

Gives a full posterior probability distribution for N; only its mean and standard deviation are computed here, since evaluating and plotting the whole distribution can be computationally intensive.
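As a sketch of what the full distribution looks like, the code below evaluates the posterior on a truncated grid. It assumes a flat (improper) prior on N >= m; under that assumption the likelihood of the observed maximum, choose(m - 1, k - 1)/choose(n, k), makes the posterior proportional to 1/choose(n, k), and its mean and standard deviation agree with the closed forms above. The truncation point `n_max` is an arbitrary choice; for k = 4 the tail beyond it contributes very little.

```r
# Assumed model: flat improper prior on N, posterior proportional to 1/choose(n, k)
obs <- c(2, 6, 7, 14)
m <- max(obs)
k <- length(obs)

n_max <- 1e5                    # arbitrary truncation point for the grid
n <- m:n_max                    # candidate totals (N cannot be below m)
log_w <- -lchoose(n, k)         # unnormalised log posterior
post <- exp(log_w - max(log_w)) # rescale before exponentiating
post <- post / sum(post)        # normalise over the grid

post_mean <- sum(n * post)
post_sd <- sqrt(sum(n^2 * post) - post_mean^2)

# Equal-tailed 95% credible interval from the cumulative posterior
cdf <- cumsum(post)
cred95 <- c(n[which(cdf >= 0.025)[1]], n[which(cdf >= 0.975)[1]])

# Uncomment to plot the posterior over a readable range:
# plot(n[n <= 80], post[n <= 80], type = "h", xlab = "N", ylab = "posterior")
```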

Warnings

Does not perform well with small numbers of observations (the Bayesian mean is only finite for k > 2 and the standard deviation for k > 3).
Sampling bias must be accounted for (e.g. what if the Germans sent all the old tanks to Africa?).
What if there are several different sets of ID numbers?
Be careful with your own data: are you giving away more information than you realise?
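To put a number on the small-sample warning, the variance of the frequentist estimator m + m/k - 1 is (N - k)(N + 1)/(k(k + 2)). The sketch below, with a hypothetical true total of N = 250, tabulates the resulting standard deviation for a few sample sizes; with k = 4 it is about 50.7, roughly a fifth of N, and it shrinks only roughly in proportion to 1/k.

```r
N <- 250                   # hypothetical true total
k <- c(2, 4, 8, 16, 32)    # sample sizes to compare
# Standard deviation of m + m/k - 1 from its variance (N - k)(N + 1) / (k (k + 2))
sd_hat <- sqrt((N - k) * (N + 1) / (k * (k + 2)))
data.frame(k = k, sd = round(sd_hat, 1), relative_sd = round(sd_hat / N, 3))
```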