Saturday, October 18, 2014

a bit of stats: Relationship between Poisson and Binomial Distribution

I came across a problem in the book "All of Statistics" that seems very counterintuitive. It goes roughly like this:
Let N be a random variable following a Poisson distribution with mean \(\lambda\), and suppose we flip a coin N times, where each flip lands heads with probability p. Let X and Y be the random variables representing the total number of heads and tails, respectively. Prove that X and Y are independent.

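Wait, what? Doesn't X affect Y? For a fixed number of flips n it certainly does, since Y = n - X. But once the number of flips is itself Poisson, it apparently does not, and here is the proof: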

If we know \(N = n\) in advance, then X given N follows a binomial distribution: \(P(X=x|N=n) = {n \choose x} p^x (1-p)^{n-x} \). In general, however, the random variable \(X\) can take any nonnegative integer value, so it certainly does not follow a binomial distribution. To find its probability mass function, we use the law of total probability over \(N\):
\(
\begin{eqnarray*}
P(X=x) &=& \sum_{n=0}^{\infty} P(X=x|N=n) P(N=n)  \\
     &=& \sum_{n=x}^{\infty} {n \choose x} p^x (1-p)^{n-x} e^{-\lambda} \frac{\lambda^n}{n!} \qquad \text{(terms with } n < x \text{ are zero)} \\
     &=& \sum_{n=x}^{\infty} \left( \frac{(p\lambda)^xe^{-p\lambda}}{x!} \right) \left( \frac{((1-p)\lambda)^{n-x}e^{-(1-p)\lambda}}{(n-x)!} \right) \\
     &=& \left( \frac{(p\lambda)^xe^{-p\lambda}}{x!} \right)  \sum_{n=x}^{\infty} \left( \frac{((1-p)\lambda)^{n-x}e^{-(1-p)\lambda}}{(n-x)!} \right)
\end{eqnarray*}
\)
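The terms with \(n < x\) contribute nothing because \(P(X=x|N=n) = 0\) there, so the remaining summation effectively runs over \(k = n-x\) from 0 to infinity. It equals 1, since it is the sum of the probability mass function of \(\text{Poisson}((1-p)\lambda)\) over all its values. Hence we arrive at the conclusion that \(X \sim \text{Poisson}(p\lambda)\). By the same argument with the roles of heads and tails swapped, \(Y \sim \text{Poisson}((1-p)\lambda) \).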
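As a sanity check on this marginal result, here is a minimal Python sketch that evaluates the total-probability sum numerically and compares it with the \(\text{Poisson}(p\lambda)\) probability mass function. The function names, the truncation point `n_max`, and the example values of `lam` and `p` are my own choices for illustration, not anything from the book.

```python
from math import comb, exp, factorial


def poisson_pmf(k, rate):
    """P(K = k) for K ~ Poisson(rate)."""
    return exp(-rate) * rate**k / factorial(k)


def marginal_heads_pmf(x, lam, p, n_max=100):
    """Approximate P(X = x) by truncating the sum over N at n_max.

    The truncation is an assumption: it is only accurate when lam is
    small enough that P(N > n_max) is negligible.
    """
    total = 0.0
    for n in range(x, n_max + 1):  # terms with n < x are zero
        binomial_term = comb(n, x) * p**x * (1 - p) ** (n - x)
        total += binomial_term * poisson_pmf(n, lam)
    return total


if __name__ == "__main__":
    lam, p = 4.0, 0.3  # illustrative values only
    for x in range(6):
        print(x, marginal_heads_pmf(x, lam, p), poisson_pmf(x, p * lam))
```

The two printed columns should agree up to floating-point rounding, which is what the derivation predicts.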
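Lastly, we need to show that \(P(X=x \text{ and } Y=y) = P(X=x)P(Y=y) \). Using the law of total probability, we have
\(P(X=x \text{ and } Y=y) = \sum_{n=0}^{\infty} P(X=x \text{ and } Y=y | N=n) P(N=n) \)
Here, every term with \(n \neq x+y\) is zero, so only the term with \(n = x+y\) survives:
\(P(X=x \text{ and } Y=y) = {x+y \choose x} p^x (1-p)^y e^{-\lambda} \frac{\lambda^{x+y}}{(x+y)!} = \left( \frac{(p\lambda)^x e^{-p\lambda}}{x!} \right) \left( \frac{((1-p)\lambda)^y e^{-(1-p)\lambda}}{y!} \right) = P(X=x)P(Y=y) \)
The \((x+y)!\) in the binomial coefficient cancels against the one in the Poisson term, and splitting \(e^{-\lambda} = e^{-p\lambda} e^{-(1-p)\lambda}\) and \(\lambda^{x+y} = \lambda^x \lambda^y\) leaves exactly the product of the two Poisson probability mass functions, which is the desired conclusion.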
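The factorization can also be checked numerically. The sketch below, again with hypothetical function names and illustrative parameter values of my own, computes the joint probability from the single surviving term \(n = x+y\) and measures how far it is from the product of the two Poisson marginals over a small grid.

```python
from math import comb, exp, factorial


def poisson_pmf(k, rate):
    """P(K = k) for K ~ Poisson(rate)."""
    return exp(-rate) * rate**k / factorial(k)


def joint_pmf(x, y, lam, p):
    """P(X = x and Y = y): only the n = x + y term of the sum is nonzero."""
    n = x + y
    return comb(n, x) * p**x * (1 - p) ** y * poisson_pmf(n, lam)


def max_independence_gap(lam, p, limit=15):
    """Largest |P(X=x, Y=y) - P(X=x) P(Y=y)| over 0 <= x, y < limit."""
    gap = 0.0
    for x in range(limit):
        for y in range(limit):
            product = poisson_pmf(x, p * lam) * poisson_pmf(y, (1 - p) * lam)
            gap = max(gap, abs(joint_pmf(x, y, lam, p) - product))
    return gap


if __name__ == "__main__":
    # Illustrative values only; the gap should be at rounding-error level.
    print(max_independence_gap(lam=4.0, p=0.3))
```

A gap at the level of floating-point rounding is consistent with X and Y being independent; it is a check of the algebra on a finite grid, not a proof.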