Coupon Collector's Problem

Q：Assuming there are $n$ different types of coupons, each with an equal probability of being obtained, and an unlimited supply of coupons. On average, how many times does one need to draw to collect all $n$ types of coupons?

Solution

The problem is to find the expected number of draws needed to collect all $n$ types of coupons. Let $T$ be the number of draws required to collect all $n$ types of coupons, and let $t_{i}$ be the number of draws required to collect the $i$-th type of coupon after collecting the first $i-1$ types. $T=\sum\limits_{i=1}^{n}{t_{i}}$, $t_{i}(i=1,2,…,n)$ are independent, and $t_{i}\sim Ge(p_{i})$, where $p_{i}=\frac{n-i+1}{n}$. Therefore, $E(T)=\sum\limits_{i=1}^{n}E(t_{i})=\sum\limits_{i=1}^{n}\frac{1}{p_{i}}=\frac{n}{n}+\frac{n}{n-1}+…+\frac{n}{1}=nH_{n}$

As an advanced topic, let’s calculate the variance of $T$:

$$ \begin{aligned} Var(T) & = \sum_{i=1}^{n}Var(t_{i}) = \sum_{i=1}^{n}\frac{1-p_{i}}{p_{i}^2} \\ & = \sum_{i=1}^{n}\frac{n(i-1)}{(n-i+1)^2} = n\sum_{i=1}^{n}\frac{n-(n-i+1)}{(n-i+1)^2} \\ & = n\sum_{i=1}^{n}(\frac{n}{(n-i+1)^2}-\frac{1}{(n-i+1)}) \\ & = n^2\sum_{i=1}^{n}\frac{1}{i^2}-n\sum_{i=1}^{n}\frac{1}{i} \\ & = n^2H_n^{(2)}-nH_n \end{aligned} $$

Enhancements

Enhancement 1

Lemma：Prove$\sum\limits_{k=1}^{n}(-1)^{k+1} \frac{1}{k} C_{n}^{k}=1+\frac{1}{2}+\cdots+\frac{1}{n}$

Note that

$$ \frac{1}{k} C_{n}^{k}=\frac{1}{k}\left(C_{n}^{k}+C_{n-1}^{k-1}\right)=\frac{1}{k} C_{n-1}^{k}+\frac{1}{n} C_{n}^{k} $$

Let $f_{n}=\sum\limits_{k=1}^{n}(-1)^{k-1} \frac{1}{k} C_{n}^{k}$, Thus

$$ \begin{aligned} f_{n}&=\sum_{k=1}^{n}(-1)^{k+1} \frac{1}{k} C_{n}^{k}+(-1)^{n+1} \frac{1}{n}\\ &=\sum_{k=1}^{n-1}(-1)^{k+1} \frac{1}{k} C_{n-1}^{k}+\frac{1}{n} \sum_{k=1}^{n-1}(-1)^{k+1} C_{n}^{k}+(-1)^{n-1} \frac{1}{n} \\ & =f_{n-1}+\frac{1}{n} \sum_{k=1}^{n}(-1)^{k+1} C_{n}^{k}\\&=f_{n-1}+\frac{1}{n} \end{aligned} $$

The last step is due to $$ \sum_{i=1}^{n}(-1)^{k+1}C_n^k=\sum_{i=0}^{n}(-1)^{k+1}C_n^k+C_n^0=C_n^0-\sum_{i=0}^{n}(-1)^{k}C_n^k=1 $$ Thus $f_n=f_{n-1}+\frac{1}{n}$, and$f_1=1$, we have $f_n=1+\frac{1}{2}+\cdots+\frac{1}{n}$.

Now let’s look at some enhanced questions:

Q: Suppose there are $n$ types of coupons, each with an equal chance of being obtained, and the coupons are also supplied infinitely. If t coupons are drawn each time, on average, how many times will it take to collect all $n$ types of coupons?

A：Let $A_{k,i}$ be the event that the $i$-th type of coupon has not been obtained after the $k$-th draw, and let $T$ be the number of draws. Then,

$$ \begin{align*} &\quad\ P(T>k)\\ &=P\left(\mathop{\cup}\limits_{i=1}^{n} A_{k, i}\right)\\ &=\sum_{i=1}^{n} P\left(A_{k, i}\right)-\sum_{1 \leqslant i< j \leqslant n_{i}} P\left(A_{k, i} \cap A_{k, j}\right)+\cdots+(-1)^{n+1} P\left(\cap_{i=1}^{n} A_{k, i}\right) \end{align*} $$

Note that,

$$ P\left(\cap_{s=1}^r A_{k, i_{s}}\right)=\left(\frac{C_{n-r}^{t}}{C_{n}^{t}}\right)^{k} , 1 \leqslant r \leqslant n $$

Substituting this into the previous equation, we have,

$$ P(T>k)=\sum_{r=1}^{n}(-1)^{r+1} C_{n}^{r}\left(\frac{C_{n-r}^{t}}{C_{n}^{t}}\right)^{k} $$

$$ \begin{aligned} E(T) &=\sum_{k=0}^{\infty} P(T>k)=\sum_{k=0}^{\infty} \sum_{k=1}^{n}(-1)^{r+1} C_{n}^{t}\left(\frac{C_{n-r}^{t}}{C_{n}^{t}}\right)^{k}\\ &=\sum_{r=1}^{n}(-1)^{r+1} C_{n}^{r} \sum_{k=0}^{\infty}\left(\frac{C_{n-r}^{t}}{C_{n}^{t}}\right)^{k} \\ &=\sum_{r=1}^{n}(-1)^{r+1} C_{n}^{r} \frac{1}{1-C_{n-r}^{t} / C_{n}^{t}} \end{aligned} $$

We verify the case where t=1,

$$ \begin{aligned} E(T)&=\sum_{k=1}^{n}(-1)^{r+1} \frac{n}{r} C_{n}^{r}=n \sum_{r=1}^{n}(-1)^{r+1} \frac{1}{r} C_n^r\\ &=n\left(1+\frac{1}{2}+\cdots+\frac{1}{n}\right) \end{aligned} $$

Enhancement 2

Q: Suppose there are $n$ types of gift vouchers, each with $a_i$ items, and each time you draw one out and put it back. What is the expected number of draws to get all $n$ types of gift vouchers?

A：Let $p_i=\frac{a_i}{\sum\limits_{i=1}^na_i}(i=1,2,…,n)$, and let $A_{k,i}$ be the event that the $i$-th gift voucher has not been obtained after $k$ draws. Let $T$ be the number of draws. It is easy to see that $\sum\limits_{i=1}^np_i=1$.

$$ \begin{aligned} &\quad\ P(T>k)\\ &=P\left(\cup_{i=1}^{n} A_{k, i}\right)\\ &=\sum_{i=1}^{n} P\left(A_{k, i}\right)-\sum_{1 \leqslant i< j \leqslant n_{i}} P\left(A_{k, i} \cap A_{k, j}\right)+\cdots+(-1)^{n+1} P\left(\cap_{i=1}^{n} A_{k, i}\right) \end{aligned} $$

And $P\left(\sum\limits_{s=1}^{r} A_{k, i_{s}}\right)=\left(1-\sum\limits_{j \in J, J= \{ 1.2, \cdots, n \} \backslash\left \{ i_{1}, \ldots, i_{r}\right \} } p_{j}\right)^{k}$
Thus,

$$ \begin{aligned} E(T)&=\sum_{k=0}^{\infty}P(T>k)\\ &=\sum_{k=0}^\infty(-1)^{r+1}\sum_{r=1}^n\sum_{1\leqslant i_1 < …<i_r \leqslant n}\left(1-\sum_{j \in J, J= \{ 1.2, \cdots, n \} \backslash\left \{ i_{1}, \ldots, i_{r}\right \} } p_{j}\right)^{k}\\ &=\sum_{r=1}^n(-1)^{r+1}\sum_{1\leqslant i_1 < …<i_r \leqslant n}\sum_{k=0}^\infty\left(1-\sum_{j \in J, J= \{ 1.2, \cdots, n \} \backslash\left \{ i_{1}, \ldots, i_{r}\right \} } p_{j}\right)^{k} \\ &=\sum_{r=1}^n(-1)^{r+1}\sum_{1\leqslant i_1 < …<i_r \leqslant n}\frac{1}{\sum\limits_{j \in J, J= \{ 1.2, \cdots, n \} \backslash\left \{ i_{1}, \ldots, i_{r}\right \} } p_{j}}\\ &=\sum_{r=1}^n\sum_{1\leqslant i_1 < …<i_r \leqslant n}(-1)^{r+1}\frac{1}{p_{i_{1}}+p_{i_{2}}+\cdots+p_{i_{r}}} \end{aligned} $$