Say you have some real or virtual coin that you can flip infinitely many times. Beforehand, you do not know if the coin is biased. Additionally, you can't see the flips (or implementation) but the reported results are honest and generation procedure never changes. How can you turn the resulting sequence of possibly biased coin flips into a sequence of definitely unbiased ones?
I can't remember who asked me this rhetorical question first, but the lesson stuck. I think it was a stats class, so the word "infinitely" made me start thinking about asymptotics. That is, at the limit, I could infer what the underlying distribution was and use that to synthesize an unbiased one. But, if you think about that for a bit you'll find lots of complications.
- How many samples do I have to burn before I have a good estimate?
- Does good mean certainly unbiased?
- How do I create a coding from one sample to one unbiased outcome?
You'll suddenly find yourself realizing that you don't want statistical thinking, you want probabilistic thinking. That's the stuff of the Von Neumann extractor, the elegant solution to this puzzle.
Nowhere in the problem specification was there a constraint on memory or the requirement that you must emit 1 output for each coin flip. If you instead operate on pairs of (non-overlapping) samples at a time, you'll have one of four outcomes,
X | P(X) |
---|---|
HH | pp |
TH | pq |
HT | qp |
TT |
You do not know p=P(H) or q=1-P(H). But, if you look at the middle two rows, you'll notice that they are equal: pq = qp. This is exactly the unbiased distribution demanded. So if you want to translate the possibly biased sequence into a definitely unbiased one, emit nothing for HH and TT; 1 for TH and 0 for HT.
X | P(X) | Y |
---|---|---|
HH | pp | Ø |
TH | pq | 1 |
HT | qp | 0 |
TT | Ø |
The caveat of course, is that sampling may be slow. For example, if P(H) = 0.99999999999, you're going to be discarding HH pairs most of the time. But, the transformation gets you exactly the distribution you want with three memory registers (two for the pair, one for the bit that says when to clear the pair.).
Anyways, probability is fun.