Suppose the dogs are clustered so that exactly a proportion propOccupied of the blocks are occupied by dogs, and assume that every occupied block holds the same number of dogs.
Define the function dog, which first calculates two vectors for a given sample size sample and a given propOccupied. The vector db gives the probability that the sample will include each possible number of occupied blocks, from 0 to the sample size. The vector under says, for the corresponding number of occupied blocks, whether that number as a proportion of the total sample is less than propOccupied, i.e. would lead to an underestimate.
The function returns the sum of the product of these two vectors, i.e. the total of the probabilities of just those outcomes which lead to underestimates.
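dog = function(sample, propOccupied) {
  # P(k occupied blocks in the sample), for k = 0, ..., sample
  db = dbinom(0:sample, sample, propOccupied)
  # TRUE where the sample proportion k/sample falls below propOccupied
  under = ((0:sample) / sample) < propOccupied
  # total probability of the outcomes that underestimate
  sum(db * under)
}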
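Because the sample proportion grows with the number of occupied blocks, under is TRUE for a run of counts starting at 0 and FALSE afterwards, so the same quantity can also be read off the binomial distribution function. The following is a minimal sketch of that cross-check; dog_cdf and k_max are hypothetical names introduced here for illustration, and dog is repeated so the block runs on its own.

```r
# dog as defined in the text, repeated so this block is self-contained
dog = function(sample, propOccupied) {
  db = dbinom(0:sample, sample, propOccupied)
  under = ((0:sample) / sample) < propOccupied
  sum(db * under)
}

# Hypothetical helper (not part of the original post):
# the same probability via the binomial CDF.
dog_cdf = function(sample, propOccupied) {
  # largest count of occupied blocks whose sample proportion is still
  # below propOccupied (count 0 always qualifies when propOccupied > 0)
  k_max = sum(((0:sample) / sample) < propOccupied) - 1
  pbinom(k_max, sample, propOccupied)
}

# Both versions should agree up to floating-point tolerance
all.equal(dog(3, 1/3), dog_cdf(3, 1/3))
all.equal(dog(50, 0.06), dog_cdf(50, 0.06))
```

The cutoff k_max is found with the same comparison that dog uses, rather than with ceiling(sample * propOccupied), so the two versions cannot disagree because of rounding at the boundary.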
So, for example, if:
sample = 3
propOccupied = 1/3
then dog(sample, propOccupied) returns 0.2963.
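This value can be checked by hand. With three blocks sampled, the possible sample proportions are 0, 1/3, 2/3 and 1, and only 0 is strictly below 1/3, so the answer should be the probability of drawing no occupied blocks, (2/3)^3 = 8/27. A small sketch of that check follows; by_function and by_hand are names introduced here for illustration.

```r
# dog as defined in the text, repeated so this block is self-contained
dog = function(sample, propOccupied) {
  db = dbinom(0:sample, sample, propOccupied)
  under = ((0:sample) / sample) < propOccupied
  sum(db * under)
}

by_function = dog(3, 1/3)
by_hand = (2/3)^3  # P(0 occupied blocks out of 3 sampled)

# the two numbers should match
round(c(by_function, by_hand), 4)
```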
Plotting the probability of an underestimate for sample sizes from 1 to 100 in steps of 1, and for propOccupied from 0.01 up to 0.5 in steps of 0.05 (so 0.01, 0.06, ..., 0.46), gives a surprising figure.
draws = seq(1, 100, 1)
steps = seq(0.01, 0.5, 0.05)
s = sapply(draws, function(x) sapply(steps, function(y) dog(x, y)))
rownames(s) = steps
colnames(s) = draws
require(pheatmap)
## Loading required package: pheatmap
main = "Prob of underestimation: x axis=sample size; y axis=prop occupied blocks"
pheatmap(s, cluster_rows = FALSE, cluster_cols = FALSE, main = main)
There is a sequential pattern based on the sample size that I don't understand.
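One possible reading of the pattern, offered as a sketch rather than a settled explanation: under is TRUE only for counts from 0 up to some cutoff, and for a fixed propOccupied that cutoff stays the same over a run of sample sizes, then jumps by one once sample * propOccupied passes a whole number. Within a run the probability falls as the sample grows, and at each jump it rises again, which would produce a sawtooth along the sample-size axis. The code below tabulates this for one row of the heatmap; k_max, p, n and prob are names introduced here for illustration.

```r
# dog as defined in the text, repeated so this block is self-contained
dog = function(sample, propOccupied) {
  db = dbinom(0:sample, sample, propOccupied)
  under = ((0:sample) / sample) < propOccupied
  sum(db * under)
}

p = 0.26  # one of the propOccupied values used in the heatmap
n = 1:20  # a short range of sample sizes

# largest count of occupied blocks that still gives an underestimate
k_max = sapply(n, function(x) sum(((0:x) / x) < p) - 1)
prob = sapply(n, function(x) dog(x, p))

# prob falls while k_max is constant and rises when k_max steps up
data.frame(n, k_max, prob = round(prob, 3))
```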
In this first graphic, it is hard to see the exact values. So in the following graphic, the probabilities are binned into three ranges: 0 to 0.4 (red), 0.4 to 0.6 (yellow) and 0.6 to 1 (blue).
s = matrix(as.numeric(cut(s, c(0, 0.4, 0.6, 1), labels = 1:3)),
           nrow = nrow(s))
rownames(s) = steps
colnames(s) = draws
pheatmap(s, cluster_rows = FALSE, cluster_cols = FALSE, main = main, legend = FALSE)
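When reading the three ranges, it may help to know where the boundary values land. By default cut uses intervals that are open on the left and closed on the right, so a probability of exactly 0.4 goes into the first range and exactly 0.6 into the second. A small sketch with made-up values (vals and bins are illustrative names, not data from the figure):

```r
# Made-up probabilities, chosen to include the two boundary values
vals = c(0.2, 0.4, 0.5, 0.6, 0.9)

# same breaks and labels as in the text; as.integer() returns the bin index
bins = as.integer(cut(vals, c(0, 0.4, 0.6, 1), labels = 1:3))

data.frame(vals, bins)
```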