Due: 3/11/2014
Feynman Liang (feynman.liang@gmail.com)
She must buy at least 262 packages. The expected number of packages she will have to buy is \( \mathbb{E}(X) \), which can be calculated to be
N <- 262
N * sum(1/c(1:N))
## [1] 1611
Simulating the distribution of \( X \)
# collector simulation
nbought <- function(N) {
coupons <- matrix(0, 1, N) # initially have no coupons
for (k in 1:1e+05) {
# collect until complete collection
x <- sample.int(N, 1, replace = TRUE)
coupons[x] <- 1 # mark that coupon type as collected
ifelse(sum(coupons) == N, return(k), -1)
}
}
N <- 262 # set to desired number of different coupons
y <- sapply(matrix(N, 1000, 1), nbought) # set number of samples
hist(y, main = paste("Coupon collector, mean=", mean(y)), breaks = 30)
The sample mean of 1612.907 is fairly close to the theoretical mean from (1).
A particular coupon's waiting time \( t \) can be modeled by a Poisson process with rate \( \lambda = m/N \). This means that \[ \Pr \{ T_{\text{one coupon}} \leq t \} = 1 - \Pr \{ T > t \} = 1 - e^{-(m/N) t} \]
Given that the process for each coupon is i.i.d, the probability for all coupon processes completing before \( T \) can be found by taking the product over each individual coupon \[ F(t) = \Pr \{T \leq t \} = (\Pr \{ T_{\text{one coupon}} \leq t \})^N = (1 - e^{-m t / N})^N \]
Using the substitution \( u = 1 - e^{-m t / N} \), we have that \( t = \frac{-N \log(1 - u)}{m} \) and therefore \( dt = \frac{N}{m (1-u)} du \). Expanding the given expression for expected waiting time \[ \mathbb{E}[T] = \int_0^\infty (1 - F(t)) dt = \int_0^\infty (1 - (1 - e^{-m t / N})^N) dt = \int_0^1 (1-u^N) \frac{N}{m (1-u)} du = \frac{N}{m} \int_0^1 \frac{1-u^N}{1-u} du \] Identifying \( \frac{1-u^N}{1-u} = \sum_{k=0}^{N-1}u^k \) to be the geometric sum formula (summation and integration can be switched because the finite sum is absolutely convergent) \[ \mathbb{E}[T] = \frac{N}{m} \int_0^1 \sum_{k=0}^{N-1} u^k du = \frac{N}{m} \sum_{k=0}^{N-1} \int_0^1 u^k du = \frac{N}{m} \sum_{k=0}^{N-1} \frac{1}{k+1} (1^k-0^k) = \frac{N}{m} \sum_{n=1}^{N} \frac{1}{n} \] Compared to equation (1) for the geometric distribution \[ \mathbb{E}[X] = N \sum_{n=1}^N \frac{1}{n} \] we see that the two are very similar with only an additional \( \frac{1}{m} \) factor. This makes sense because \( X \) represents the number of coupons bought before all \( N \) distinct coupons are obtained whereas \( T \) represents the time until all \( N \) distinct coupons are obtained. If coupons are bought at a rate \( m \), then the relationship \[ \mathbb{E}[T] \cdot m = \sum_{n=1}^n \frac{1}{n} = \mathbb{E}[X] \] says that the expected number of packages bought until all coupons are collected is equal to the rate \( m \) times the expected time until all coupons are collected if buying at rate \( m \).