A hash is cryptographic algorithm that attempts to uniquely describe inputted content by outputting a value that is unique for a given piece of inputted content. A good hash algorithm has several characteristics, including:
- It will output the same hash result for the same exact input, every time.
- It will output a different hash result for a different input.
- Given only the hash output result, it should be non-trivial to impossible to recreate the input that generated the output.
If two different inputs create the same output, it's called a hash collision. In 2004 and 2005, Chinese Professor Xiaoyun Wang and other colleagues demonstrated that the hash algorithms MD5 and SHA-1 had weaknesses that allowed previously unexpected collisions at iteration rounds deemed cryptographically weak. It sent a minor shockwave through the cryptographic community and nearly everyone else paying attention.
MD5 and SHA-1, although not alone in a crowded hash field, are easily the most used cryptographic algorithms used to compare two pieces of stand-alone content in the PC world. Name a popular operating system (e.g. Windows, Macintosh, Linux, Unix, Solaris, BSD, etc.) and you'll find it full of MD5 and SHA-1 cryptographic comparisons. For example SHA-1 is used in Apple's OS for password hashing.
It sent a shockwave, but it didn't freak out the cryptographic community. First, a community full of double mathematical doctorates is a staid crowd to begin with. Second, being cryptographers, they're predicting this sort of stuff all the time. The discovery of the MD5 and SHA1 collisions was just a confirmation of something cryptographers repeat in every statement, about "...no algorithm can be proved to be secure," and other pronouncements like that. Plus, there had been other previous MD5 and SHA-1 findings that pointed to potential collision weaknesses.
The cryptographic community and NIST responded by telling vendors and users to use better and stronger hash algorithms, including bigger variants of SHA-1 called SHA-256 and SHA-512. The increase in bit size theoretically makes the collisions more difficult to generate (although, interestingly, the July 2007 draft recommendation still includes the smaller 160-bit SHA-1 algorithm).
Many vendors (i.e. any with common sense) have started to abandon MD-5 and SHA-1. For example, Microsoft made a formal announcement during the Windows Vista beta program to begin removing all instances of MD-5 and SHA-1 from Windows. Although MD-5 and SHA-1 remnants remain in use, many cryptographic functions have been upgraded, and what remains is on the chopping block for future Windows versions. Most other popular vendors are doing, or have plans to do, the same.
One of the biggest fear scenarios from a hash collision is that two programs, one legitimate and one malicious, could end up with the same hash output value (i.e. the bogus program could be substituted for the legitimate one with no one the wiser). But in the real world, it would be non-trivial for an attacker to find a legitimate target program and then set about making a malicious clone that would produce precisely the same hash value. Making a second program with the same hash is hard enough, but making a second malicious program that mimics the legitimate program well enough that people run it while it performs malicious instructions the attacker wants instead is orders of magnitude harder to pull off. Even I, a crypto-hobbyist, not a crypto expert, felt like the theoretical collisions of MD5 and SHA-1 were more of a mathematical issue and not a real problem.
Maybe there's more to this...
Then come along a few demonstrations that make me realize that I'm just a crypto hobbyist and I should really just stay out of making cryptographic pronouncements. One site that reminded of this is http://www.win.tue.nl/hashclash/Nostradamus. I've seen the demonstration before, but this site is the most recent one I've come across and it explains the involved issues pretty well, and does so with humor.