Probability of underestimating the total dog population based on a sample of city blocks when the dogs are clustered.

Suppose the dogs are clustered so that exactly a proportion propOccupied of the blocks are occupied by dogs, and assume that every occupied block holds the same number of dogs.

Define the function dog, which, for a given sample size sample and a given propOccupied, first calculates two vectors. The vector db holds the probability that the sample includes each possible number of occupied blocks, from 0 to the sample size. The vector under says, for each of those numbers, whether the corresponding proportion of the sample is less than propOccupied, i.e. whether it would lead to an underestimate.
The function returns the sum of the product of these two vectors, i.e. the total probability of just those outcomes that lead to underestimates.

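dog = function(sample, propOccupied) {
    # Probability of 0, 1, ..., sample occupied blocks in the sample
    db = dbinom(0:sample, sample, propOccupied)
    # TRUE where the sampled proportion of occupied blocks is below propOccupied
    under = ((0:sample)/sample) < propOccupied
    # Total probability of the outcomes that give an underestimate
    sum(db * under)
}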

For example, with:

sample = 3
propOccupied = 1/3
dog(sample, propOccupied)

the answer is 0.2963.
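
The value 0.2963 can be checked by hand. With three blocks the possible sample proportions are 0, 1/3, 2/3 and 1, and only 0 is strictly below 1/3, so the result should equal the probability that none of the three sampled blocks is occupied, (2/3)^3. A minimal sketch, repeating dog so the block runs on its own; byHand is a name introduced here for illustration only:

```r
# Original function from the text, repeated so this block is self-contained.
dog = function(sample, propOccupied) {
    db = dbinom(0:sample, sample, propOccupied)
    under = ((0:sample)/sample) < propOccupied
    sum(db * under)
}

sample = 3
propOccupied = 1/3

# Probability of each possible count of occupied blocks (0, 1, 2, 3).
print(dbinom(0:sample, sample, propOccupied))

# Only a count of 0 gives a proportion strictly below 1/3,
# so the result should match P(no occupied block) = (2/3)^3.
result = dog(sample, propOccupied)
byHand = (2/3)^3   # illustrative name, not in the original
print(round(c(result, byHand), 4))
```

Both printed values are 0.2963.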

Plotting the probability of an underestimate for every sample size from 1 to 100, and with propOccupied from 0.01 to 0.46 in steps of 0.05, gives a surprising figure.

draws = seq(1, 100, 1)
steps = seq(0.01, 0.5, 0.05)
s = sapply(draws, function(x) sapply(steps, function(y) dog(x, y)))
rownames(s) = steps
colnames(s) = draws
library(pheatmap)

main = "Prob of underestimation: x axis=sample size; y axis=prop occupied blocks"
pheatmap(s, cluster_rows = FALSE, cluster_cols = FALSE, main = main)

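[Figure: heatmap of the probability of underestimation, with sample size on the x axis and proportion of occupied blocks on the y axis]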

There is a sequential pattern based on the sample size that I don't understand.
In this first graphic it is hard to see the exact values, so in the following graphic the probabilities are broken into three ranges: 0 to 0.4 (red), 0.4 to 0.6 (yellow) and 0.6 to 1 (blue).
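
As a small sketch of how this kind of binning behaves, here it is applied to a handful of values. The numbers in probs are made up for illustration and are not taken from the matrix s; bins is likewise a name introduced here.

```r
# Illustrative probabilities (made-up values, not taken from the matrix s).
probs = c(0.30, 0.40, 0.50, 0.60, 0.90)

# cut() uses right-closed intervals by default:
# (0, 0.4], (0.4, 0.6] and (0.6, 1].
bins = as.numeric(cut(probs, c(0, 0.4, 0.6, 1), labels = 1:3))
print(bins)
```

This prints 1 1 2 2 3, so a value sitting exactly on a boundary (0.4 or 0.6) goes into the lower range.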

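# Recode each probability as 1, 2 or 3 according to its range,
# keeping the original matrix shape (this overwrites s)
s = matrix(as.numeric(cut(s, c(0, 0.4, 0.6, 1), labels = 1:3)), 
    nrow = nrow(s))
rownames(s) = steps
colnames(s) = draws
pheatmap(s, cluster_rows = FALSE, cluster_cols = FALSE, main = main, legend = FALSE)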

[Figure: heatmap of the same probabilities grouped into the three ranges]
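
One way to probe the pattern across sample sizes, offered as a sketch rather than as part of the original analysis: the under vector picks out the counts k with k/sample < propOccupied, so dog is a binomial cumulative probability up to the largest such count. That cutoff stays flat over a run of sample sizes and then jumps by one. While it is flat the probability falls as the sample grows, and at each jump it steps back up. The helper cutoff and the choice p = 0.26 are assumptions introduced here for illustration.

```r
# Original function from the text, repeated so this block is self-contained.
dog = function(sample, propOccupied) {
    db = dbinom(0:sample, sample, propOccupied)
    under = ((0:sample)/sample) < propOccupied
    sum(db * under)
}

# Hypothetical helper (not in the original): the largest count of occupied
# blocks that still counts as an underestimate, using the same comparison
# as dog.
cutoff = function(sample, propOccupied) {
    sum(((0:sample)/sample) < propOccupied) - 1
}

p = 0.26   # illustrative value of propOccupied
n = 1:20   # sample sizes to inspect

probs = sapply(n, function(x) dog(x, p))
cuts = sapply(n, function(x) cutoff(x, p))

# The same probabilities written as a binomial CDF evaluated at the cutoff.
viaCdf = pbinom(cuts, n, p)

# Probability and cutoff side by side: the probability steps up
# whenever the cutoff increases.
print(data.frame(n = n, cutoff = cuts, prob = round(probs, 4)))
```

For p = 0.26 the cutoff is 0 for sample sizes 1 to 3 and becomes 1 at sample size 4, which is where the probability first steps up.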