Removing Bias

Removing bias

It's possible to remove bias from a stream of random numbers. Three popular methods are:

1, The Von Neumann method.
Take two independent 1-bit samples. If the bits are the same, discard them and take two more. If they are different, use 1 of the bits as the output. This method has the nice property that it's theoretically perfect - regardless of how biased the original source is, the output is exactly as likely to be a 1 as a 0. But it has the unnice property that it might never produce any bits. So, there's method
2, Exclusive Or.
Exclusive or (XOR) involves taking a chunk of the stream, and reducing the number of bits by XOR'ing them together. If the original source is 1 90% of the time, and 0 10%, then the XOR of 10 bits will be 1 about 55% of the time, a vast improvement. XOR'ing enough bits brings the probably of a 1 close enough to 50% for all practical purposes. This is guaranteed to produce numbers in a known amount of time, but it uses up a lot of bits, and so there's method:
3, Hashing.
Hashing involves running a chunk of bits through a hashing function. CRC16 only takes about 6 bits to equal the performance of 10 XOR'ed bits, and larger hashing functions work better. Plus, you can feed almost anything into a hashing function, and distill it down to just the randomness.
Keyrand uses a simple CRC function to hash key stroke timing. SHA256 and MD5 are popular hashing functions in cryptographic random number generators like Fortuna SHA256 is used mostly for security reasons, not because it produces a higher quality of random number. However, it does do a better job of mixing than CRC does.