In celebration of my recent birthday, I'd like to ask everyone a question. How big a crowd do you need before you find two people with the same birthday?
The easy answer is 366 people. Since there are only 365 days in the year, any larger crowd will guarantee that two people in the group share a birthday. Surprisingly, you really only need 23 people to have a better than even chance that two people in the group will share a birthday. For a lot of cases, a 50% chance of finding a match is more than enough.
Let's consider a group of 23 people. We want to calculate the probability that they all have different birthdays. Remember, in probability, an event that will happen has a probability of 1 and you count down from there to zero as the event gets increasingly improbable. We'll ignore leap years and assume all birthdays are equally likely. Taking those into account would put the math even more in our favor.
The probability that the first person has a unique birthday, since you haven't recorded anyone else's birthday, is 1. The probability that the second person has a different birthday, out of the 364 days left, is (1 - 1/365). The third person, with two days already marked out, as a probability of (1 - 2/365) to have a birthday different from the first two. To get the probability that all three people have different birthdays, you multiply the individual probabilities, like so: 1(1 - 1/365)(1 - 2/365).
Continuing this way for the entire group, (1 - 1/365)(1 - 2/365)...(1 - 22/365) = 0.493. With a little extra math, 1 - .493, the probability that two people in a group of 23 will have the same birthday is 0.507 or 50.7%! That's interesting but how can it be applied?
Generalizing from the birthday collision example above, let's suppose we have n objects in a room and r people in the hallway. Each person walks in, picks up one of the objects, puts it back down and walks out. If r = 1.177√n then there's about 50% probability that two people selected the same object.
Applying to cryptography, like digitally signing contracts, requires a slightly different set-up. Suppose there are now two groups of r people. The probability that someone from the second group chooses the same object as someone from the first group can be calculated. If r = √dn then the probability that two people chose the same object, i.e. that there is a collision, is 1 - e-d.
Let's introduce a concept called a hash function. Essentially, it's the equation at the heart of encryption. Hashes, the output from hash functions, underpin all electronic communications and commerce. A hash function calculates a small fixed-length output, like an index value, from any large, possibly arbitrary amount, of data. These outputs are supposed to be unique. Collisions occur when they aren't.
Let's suppose we have two individuals who need to digitally sign a contract, Notch and a Creeper. Obviously Notch doesn't trust the creeper not to change the contract to, say, blow up Notch's world. But Notch feels confident here because the hash function he's using to make the digital signature outputs a 50 bit hash. The chances that the creeper can come up with fraudulent contract with a matching hash are around 1 in 1,000,000,000,000,000.
The creeper, having studied birthday attacks, knows it needs to find 30 places it can make minor changes to the contract. Deciding at each place to either make the change or keep the original, it comes up with around one billion essentially identical documents. We'll have to assume a billion different versions is nothing for contract negotiations of this scale. Now, using high-powered redstone computers, the creeper calculates the hashes from all one billion legitimate versions of the contract. Then, using the same 30 changes with the fraudulent contract, it also calculates the hashes of the billion fraudulent versions.
Using the probability equations derived above where r = 230 and n = 250, we see that d = 1024 and the probability of finding a collision now is 1 - e-1024 or nearly equal to one.
The creeper asks Notch to sign the slightly edited good version and then copies his signature onto the fraudulent version with the matching hash. Since both the good and fraudulent versions have the same hash, the signature would still be valid and the creeper could claim that Notch wanted to be blown up.
Notch, having followed this discussion on birthday attacks so far, knows he can foil the creeper by making another minor change to the contract and then immediately signing the new version. Since the signed version has a different hash than the fraudulent contract the creeper has, there is no collision.
But, since it's a creeper, there will be an explosion. hisssssss...